Tuesday, May 19th 2020

Possible AMD "Vermeer" Clock Speeds Hint at IPC Gain

The bulk of AMD's 4th generation Ryzen desktop processors will consist of "Vermeer," a high core-count socket AM4 processor and successor to the current-generation "Matisse." These chips combine up to two "Zen 3" CCDs with a cIOD (client I/O controller die). While the maximum core count of each chiplet isn't known, the chiplets will implement the "Zen 3" microarchitecture, which reportedly does away with the CCX arrangement so that all cores on a CCD share a single large L3 cache, a change expected to improve inter-core latencies. AMD's generational IPC-uplift efforts could also include improving bandwidth between the various on-die components (something we saw signs of in the "Zen 2"-based "Renoir"). The company is also expected to leverage a newer 7 nm-class silicon fabrication node at TSMC (either N7P or N7+) to increase clock speeds - or so we thought.

An Igor's Lab report points to the possibility of AMD gunning for efficiency, letting IPC gains rather than clock speeds carry the bulk of "Vermeer's" competitiveness against Intel's offerings. The report decodes the OPNs (ordering part numbers) of two upcoming "Vermeer" parts, one 8-core and the other 16-core. While the 8-core part shows a generational clock-speed increase of around 200 MHz on the base clock, the 16-core part has a lower maximum boost clock than the 3950X. Then again, the OPNs reference the A0 revision, which could mean that these are engineering samples meant to help AMD's ecosystem partners (think motherboard or memory vendors) build their products around these processors, and that the retail product could come with higher clock speeds after all. We'll find out in September, when AMD is expected to debut its 4th generation Ryzen desktop processor family, around the same time NVIDIA launches GeForce "Ampere."
Sources: Igor's Lab, VideoCardz

37 Comments on Possible AMD "Vermeer" Clock Speeds Hint at IPC Gain

#26
efikkan
BoboOOZAny large workload can be parallelized; if it's big enough, it can be broken into pieces that can be dealt with separately.
Everything has a cost. If you divide the work into chunks that are too small, you'll end up with too much overhead. There are also dependencies which require things to be executed in sequence (like a pipeline), limiting how large the work chunks can be before you have to synchronize. Another complication is mutation of shared data, and anyone who thinks throwing in mutexes everywhere will solve it is being naive; you'll quickly end up with stalled threads and even deadlocks. Additionally, there is OS scheduling overhead, which can quickly add latency of 1 ms or more. That becomes significant when your entire workload lives within a window of a few milliseconds and you're trying to sync up hundreds or thousands of times per frame, but not when you have giant work chunks taking several seconds or even minutes.
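To make the chunk-size trade-off concrete, here is a minimal C++ sketch (purely illustrative, not code from either poster or from any engine): a fixed pool of worker threads sums a large array while pulling chunks off a shared atomic counter, so the only synchronization cost is one atomic increment per chunk. The array size and chunk sizes are arbitrary assumptions; shrinking the chunks multiplies the number of synchronization points and the overhead with them.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

// Sum 'data' with a fixed pool of worker threads that grab work 'chunk_size'
// elements at a time from a shared atomic cursor. Every grab is a sync point.
double parallel_sum(const std::vector<double>& data, std::size_t chunk_size)
{
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::atomic<std::size_t> cursor{0};          // shared "work queue" position
    std::vector<double> partial(workers, 0.0);   // one slot per worker, so no mutex
    std::vector<std::thread> pool;

    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (;;) {
                const std::size_t begin = cursor.fetch_add(chunk_size);  // sync point
                if (begin >= data.size()) break;
                const std::size_t end = std::min(begin + chunk_size, data.size());
                partial[w] += std::accumulate(data.begin() + begin,
                                              data.begin() + end, 0.0);
            }
        });
    }
    for (auto& t : pool) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}

int main()
{
    const std::vector<double> data(1u << 22, 1.0);        // ~4M elements
    for (std::size_t chunk : {16u, 1024u, 1u << 20}) {    // tiny vs. huge chunks
        const auto t0 = std::chrono::steady_clock::now();
        const double sum = parallel_sum(data, chunk);
        const double ms = std::chrono::duration<double, std::milli>(
                              std::chrono::steady_clock::now() - t0).count();
        std::printf("chunk %7zu: sum=%.0f  %.2f ms\n", chunk, sum, ms);
    }
}
```

The same structure with a per-frame deadline instead of a one-off sum is where the scheduling latency mentioned above starts to hurt: hundreds of tiny grabs per frame leave little room for a millisecond of OS-induced jitter.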
BoboOOZAre you really trying to say that the PS5 will get by with doing most of the work on one 2GHz core?
What? I've never said anything like that.
I pointed out that we've had 8-core consoles for nearly 7 years now. Many predicted this would cause "fully multithreaded games" within "a couple of years" back when Xbox One and PS4 launched, just like you are doing now. But it didn't happen, because rendering doesn't work that way.
BoboOOZI think what you do not understand is that having all the calls to the graphics API come from a single thread doesn't mean at all that that thread is doing all the computing.
I understand fully, and I never claimed so either.
As I said, having multiple threads building a single queue makes no sense, which is why the number of threads interfacing with the GPU is limited by the number of distinct tasks (rendering passes or compute workloads) that can be separated and potentially parallelized.
Additionally, the driver may use several threads on its side, but that's out of our control.
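As a rough sketch of what that looks like in practice (the pass names, the CommandList type and the record/submit functions below are hypothetical stand-ins, not a real graphics API): each distinct pass gets its own recording thread and its own command list, and the only cross-thread synchronization is waiting for the lists before submitting them, on one thread, in dependency order.

```cpp
#include <cstdint>
#include <cstdio>
#include <future>
#include <vector>

// Placeholder command list; a real renderer would record API commands into it.
struct CommandList { std::vector<std::uint32_t> commands; };

// Hypothetical per-pass recording functions. Each touches only its own list,
// so they can run on separate threads without locking.
CommandList record_shadow_pass()   { return {{1, 2, 3}}; }
CommandList record_gbuffer_pass()  { return {{4, 5}}; }
CommandList record_lighting_pass() { return {{6}}; }
CommandList record_post_pass()     { return {{7, 8}}; }

// Placeholder submission; in a real engine this feeds the GPU queue.
void submit(const CommandList& list)
{
    std::printf("submitting %zu commands\n", list.commands.size());
}

void render_frame()
{
    // One thread per distinct pass: the work is separable, so it parallelizes.
    auto shadow   = std::async(std::launch::async, record_shadow_pass);
    auto gbuffer  = std::async(std::launch::async, record_gbuffer_pass);
    auto lighting = std::async(std::launch::async, record_lighting_pass);
    auto post     = std::async(std::launch::async, record_post_pass);

    // Submission stays on a single thread, in dependency order. Having several
    // threads push into one queue would just serialize on a lock anyway.
    submit(shadow.get());
    submit(gbuffer.get());
    submit(lighting.get());
    submit(post.get());
}

int main() { render_frame(); }
```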
BoboOOZThe rest is more blurry, and it depends entirely on what direction the gaming industry will take from now on. At present, we are GPU bound in most games with high settings.
Well, that's the goal.
Ideally, all games should be GPU bound. That way you get the graphics performance you paid for.
BoboOOZDevelopers could take the approach of shifting more load towards the CPU and massively parallelizing their code, or could choose to do the minimum necessary to see performance improvements. Since I don't have my crystal ball, I have no way to predict which will occur.
This clearly illustrates that you don't understand how games work.
GPUs have specialized hardware for all kinds of rendering tasks, like vertex processing, tessellation, rasterization, texture mapping, etc. While all of these can technically be emulated in software, the performance would be terrible.

Offloading work from the GPU to the CPU would be going backwards. And what would be the purpose? GPUs are massively powerful at dense math, while CPUs are better at logic. Games typically do most of the logic on the CPU while building a queue, then send batches of data to the GPU, which handles that sort of work best.

I wouldn't advise anyone to base anything on a crystal ball, but rather to understand the field before attempting qualified predictions. But I can list some of the current trends:
* GPUs are becoming more flexible, and the rendering pipeline is increasingly programmable. Some games will take advantage of this, and thus become less CPU bottlenecked.
* Most games use a few generalized game engines. These engines are increasingly bloated and will, to some extent, counteract the improvements on the GPU side. Some of them may spawn a lot of threads, most of which do fairly little work at all. The ever-increasing bloat of these engines is also responsible for the lack of benefit from the lower API overhead in DirectX 12.
* Resource streaming will see more use, but adoption will be slow.
* GPU accelerated audio will probably become common eventually.
BoboOOZBut technically, it is perfectly possible to code an application in such a way that it runs faster on a 16-core at 85% clock speed than on a 10-core at 100% clock speed. It has already been done for many applications and it can be done for the game too.
I'm sorry, but this is where you go wrong. An arbitrary piece of code can't be parallelized any way you want; see my first paragraph.
A workload which mostly consists of large work chunks that can be processed mostly asynchronously can scale to almost any core count. The need for synchronization increases the overhead of additional threads, so with more synchronization a workload will always see diminishing returns with increasing thread count. Games are among the workloads at the "worst" end of this scale.
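To put rough numbers on that, here is a back-of-the-envelope Amdahl's-law comparison of the two hypothetical chips from the quote above (16 cores at 85% of the clock versus 10 cores at 100%). The parallel fractions are assumptions chosen purely for illustration, not measurements of any game:

```cpp
#include <cstdio>

// Amdahl's law: a workload with parallel fraction p on n cores speeds up by
// 1 / ((1 - p) + p / n) relative to one core; scale that by per-core speed.
double throughput(double per_core_speed, int cores, double p)
{
    return per_core_speed / ((1.0 - p) + p / cores);
}

int main()
{
    for (double p : {0.50, 0.90, 0.99}) {                // parallel fraction
        const double c16 = throughput(0.85, 16, p);      // 16 slower cores
        const double c10 = throughput(1.00, 10, p);      // 10 faster cores
        std::printf("p=%.2f: 16c@85%% = %.2fx, 10c@100%% = %.2fx -> %s wins\n",
                    p, c16, c10, c16 > c10 ? "16-core" : "10-core");
    }
}
```

With a modest parallel fraction the 10 faster cores come out ahead; the 16-core only pulls ahead once the workload is roughly 90% parallel or more, which is exactly the distinction between game-like and embarrassingly parallel workloads.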
BoboOOZI am quite sure future improvements in processor performance will rely much more on core counts than on IPC. And clock frequencies will stagnate at best; as both Intel and AMD continue to shrink their nodes, we will have 64 cores in home computers in a few years, but we will never reach 6 GHz.
You're right about clock speeds stagnating (at least until we have different types of semiconductors), but the way forward is a balanced approach between more cores, wider superscalar execution and more SIMD. Many forget that performance per core is the base scaling factor for multithreaded performance. Since most non-server workloads do not scale linearly, having faster cores actually helps you scale to more cores and suffer less from OS overhead. The key is to strike the right balance between core count and core speed for your workload.
#27
BoboOOZ
Revival of an old thread, but here's a new game that lists 8 cores in its recommended settings. We'll see more and more of these, especially in open-world games:
www.techspot.com/news/86731-youll-need-serious-hardware-play-serious-sam-4.html
#28
Vayra86
BoboOOZRevival of an old thread, but here's a new game that lists 8 cores in its recommended settings. We'll see more and more of these, especially in open-world games:
www.techspot.com/news/86731-youll-need-serious-hardware-play-serious-sam-4.html
If a simple shooter requires 8 cores at 3.3 GHz, I seriously question this dev's sanity, capability and overall product. They say it's needed for thousands of actors. Hello, Total War wants a word?

It's also a nice way to identify shit console ports.
#29
BoboOOZ
Vayra86If a simple shooter requires 8 cores at 3.3 GHz, I seriously question this dev's sanity, capability and overall product. They say it's needed for thousands of actors. Hello, Total War wants a word?
SS is a great single-player shooter; can't wait to see the new one.
Total War is a great game too, but all those soldiers do not have individual AI; the AI operates at unit level. The complexity is left only to the GPU for rendering. In SS, all these monsters are trying to find you and come at you.
Vayra86It's also a nice way to identify shit console ports.
I'm not so sure; all the previous versions were PC-first. You can play on a 4-core too, you'll just play at 720p/30 fps :D
#30
Vayra86
BoboOOZSS is a great single-player shooter; can't wait to see the new one.
Total War is a great game too, but all those soldiers do not have individual AI; the AI operates at unit level. The complexity is left only to the GPU for rendering. In SS, all these monsters are trying to find you and come at you.

I'm not so sure; all the previous versions were PC-first. You can play on a 4-core too, you'll just play at 720p/30 fps :D
I'll give you another one


Enter the Matrix ;) Of course it's obviously not fully dynamic. But I strongly doubt Serious Sam will do that with a supposedly infinite number of actors. That is why I'm saying... this does not have to be done on 8-core machines. And from the POV of being a capable game on most mainstream systems... I'd say it's optimistic.

I did play the early SS games; they weren't bad at all, but very straightforward. That is why, again, this requirement seems so questionable. This game ran on a toaster CPU.
#31
BoboOOZ
Vayra86I'll give you another one


Enter the Matrix ;) Of course it's obviously not fully dynamic. But I strongly doubt Serious Sam will do that with a supposedly infinite number of actors. That is why I'm saying... this does not have to be done on 8-core machines. And from the POV of being a capable game on most mainstream systems... I'd say it's optimistic.

I did play the early SS games; they weren't bad at all, but very straightforward. That is why, again, this requirement seems so questionable. This game ran on a toaster CPU.
I don't know Matrix, so I can't comment on it, but I would imagine those agents are pretty dumb, otherwise Neo dies :cool:. But if next-gen games can run on 10-year-old equipment, isn't that an indication that the developers aren't trying to give you more with higher requirements? What's the point of having good, new equipment if it remains unused?
#32
efikkan
BoboOOZRevival of an old thread, but here's a new game that lists 8 cores in its recommended settings. We'll see more and more of these, especially in open-world games:
www.techspot.com/news/86731-youll-need-serious-hardware-play-serious-sam-4.html
Considering the requirements don't even bother listing what class of CPU, just the ambiguous "8 cores" and "3.3 GHz", they probably didn't put much thought into it. I'm pretty sure a 4- or 6-core Comet Lake would outperform a good old 8-core Bulldozer in this game.
#33
BoboOOZ
efikkanConsidering the requirements don't even bother listing what class of CPU, just the ambiguous "8 cores" and "3.3 GHz", they probably didn't put much thought into it. I'm pretty sure a 4- or 6-core Comet Lake would outperform a good old 8-core Bulldozer in this game.
Bulldozer was not an 8-core; that was settled by a class action a while ago... And given that the required GPUs are at most two generations old, that gives a decent ballpark for what "8 cores" means.
Anyway, I'm looking forward to seeing whether the required oomph will also translate into better graphics and gameplay, or whether it's just a lack of optimization.
#34
Vayra86
BoboOOZI don't know Matrix, so I can't comment on it, but I would imagine those agents are pretty dumb, otherwise Neo dies :cool:. But if next-gen games can run on 10-year-old equipment, isn't that an indication that the developers aren't trying to give you more with higher requirements? What's the point of having good, new equipment if it remains unused?
The point is that 9 times out of 10 you never really needed the good new equipment. It's just a cost- or quality-cutting measure that you pay for. Optimization and writing great software is an art form. Not everyone is talented, and lots of software is being written. It's close to being a factory product that rolls off the line and into a box.

In many cases a lack of talent, time and/or optimization is solved by iterative development. You get a game, and a day one patch to make it work. You get a patch every other week. Etc.

Make no mistake, everything you see up to and including specs like these is just cold, hard business, nothing else. New technology? Man, we had accurate reflections as early as Unreal 1, and given enough work on a rasterized approach we can already create scenes that rival ray-traced content. Or that simply are ray-traced content, baked in. It's 2020 and we're now thinking of automation. Why? Apparently there is an economic reality where it generates profit, or is likely to do so.

NPCs and AI are of a similar nature. The groundwork is decades old and still being iterated on. If they just took that and made it 'a lot bigger', then it's easy to arrive at an 8-core requirement like this. You said it right, Total War found a trick around it. Enter the Matrix does something similar: at any given time the game picks the 4-5 actors surrounding Neo and makes them 'active', while the rest dance around creating an illusion of density. Yes, you see through it. And I guarantee you... even in SS4 with its fabulous system you will see through it. None of this is new. Dying Light, for example... how many zombies exactly? Exactly. And again... that game is not CPU-intensive.

Another example... that Vulkan / Mantle demo, what was it called? It did NOT melt CPUs. With tens of thousands of actors.
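For what it's worth, the 'handful of active actors' trick described a couple of paragraphs up fits in a few lines; this is a toy sketch with made-up names and numbers (the budget of 5, the drift factor), not code from any of these games:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Actor {
    float x = 0.0f, y = 0.0f;
    bool  active = false;
};

// Squared distance from an actor to the player (no sqrt needed to compare).
static float dist2(const Actor& a, float px, float py)
{
    const float dx = a.x - px, dy = a.y - py;
    return dx * dx + dy * dy;
}

// Only the 'budget' actors closest to the player get full AI this frame;
// everyone else gets a cheap ambient update that merely suggests activity.
void update_ai(std::vector<Actor>& actors, float px, float py, std::size_t budget = 5)
{
    const std::size_t n = std::min(budget, actors.size());
    std::partial_sort(actors.begin(), actors.begin() + n, actors.end(),
                      [&](const Actor& a, const Actor& b) {
                          return dist2(a, px, py) < dist2(b, px, py);
                      });

    for (std::size_t i = 0; i < actors.size(); ++i) {
        actors[i].active = (i < n);
        if (actors[i].active) {
            // expensive pathfinding / attack logic would run here
        } else {
            // cheap crowd behaviour: drift vaguely toward the player
            actors[i].x += (px - actors[i].x) * 0.001f;
            actors[i].y += (py - actors[i].y) * 0.001f;
        }
    }
}

int main()
{
    std::vector<Actor> horde(10000);    // thousands of on-screen enemies...
    update_ai(horde, 0.0f, 0.0f);       // ...but heavy AI runs for only 5 of them
}
```

The per-frame cost of the heavy AI stays constant no matter how large the horde gets, which is why 'thousands of actors' on its own doesn't force an 8-core requirement.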
#35
efikkan
BoboOOZBulldozer was not an 8-core; that was settled by a class action a while ago...
That's not how lawsuits work. AMD settled because a settlement is cheaper than the alternative, not because the claim was correct.
Bulldozer was an 8-core design, there is no doubt about that, albeit one with major "shortcomings".
BoboOOZAnd given that the required GPUs are at most two generations old, that gives a decent ballpark for what "8 cores" means.
Anyway, I'm looking forward to seeing whether the required oomph will also translate into better graphics and gameplay, or whether it's just a lack of optimization.
There is usually no relation between the age of the CPU and the GPU in these recommendations. A 7-year-old Haswell, or even older CPUs, can still be more relevant than GPUs of a similar age, not to mention able to compete with brand-new AMD CPUs.

People are generally putting way too much thought into these recommendations. Usually they are derived from the test systems which were primarily used during development, and can sometimes be very optimistic or conservative. Look at reviews if you want to see the reality, or gamble and buy the game yourself.
#36
BoboOOZ
efikkanBulldozer was an 8-core design, there is no doubt about that, albeit one with major "shortcomings".
The shortcoming was that there were only 4 FP units, so for many workloads it effectively became a 4-core. It is accurate to describe it as a 4-core with 2 integer units and one FP unit per core, because I'm pretty sure they didn't make cores with 1 integer unit and half an FP unit.
efikkanThere is usually no relation between the age of the CPU and the GPU in these recommendations. A 7-year-old Haswell, or even older CPUs, can still be more relevant than GPUs of a similar age, not to mention able to compete with brand-new AMD CPUs.
I would say it is quite the contrary, and what you are citing is the exception: the epic Intel stagnation of the last decade.
efikkanPeople are generally putting way too much thought into these recommendations. Usually they are derived from the test systems which were primarily used during development, and can sometimes be very optimistic or conservative. Look at reviews if you want to see the reality, or gamble and buy the game yourself.
I'm not a game developer, but judging by the way you and a couple of other guys describe them around the forum, they must be a bunch of lazy ignoramuses who aren't even capable of monitoring their thread/core utilization in the Windows Task Manager during their testing sessions...
I will probably buy it though; we'll see what that legion mode is all about.
#37
seronx
BoboOOZThe shortcoming was that there were only 4 FP units, so for many workloads it effectively became a 4-core. It is accurate to describe it as a 4-core with 2 integer units and one FP unit per core, because I'm pretty sure they didn't make cores with 1 integer unit and half an FP unit.
FPUs aren't part of the core.

K7 doesn't have an FPU in the core.
K8 doesn't have an FPU in the core.
Greyhound doesn't have an FPU in the core.
Husky doesn't have an FPU in the core.
Bobcat doesn't have an FPU in the core.
Jaguar doesn't have an FPU in the core.
Zen doesn't have an FPU in the core.

The only modern design from AMD to have an FPU inside the core is this one:


Single control unit, single instruction bus, single data bus, single superscalar datapath => one core.

AMD's Orochi design is more accurately described as four processors with two cores each, going by the architectural definition in use since before the '90s.

Retire unit (C0) & Retire unit (C1) => Two control units
Scheduler (C0) & Scheduler (C1) => Two instruction buses
Datapath (C0) & Datapath (C1) => Two datapaths
Load/Store (C0) & Load/Store (C1) => Two data buses
A Bulldozer processor is a dual-core design.

The general marketing consensus is: one core in a processor, just call the processor a core. In this case, AMD had two cores in a processor, and thus it is a dual-core unit.

Imagine reading a "technical" document... where a "___ core" contains a core, when previous documents had a "___ processor" containing or building on a processor core.