Tuesday, May 19th 2020

Possible AMD "Vermeer" Clock Speeds Hint at IPC Gain

The bulk of AMD's 4th generation Ryzen desktop processors will consist of "Vermeer," a high core-count socket AM4 processor and successor to the current-generation "Matisse." These chips combine up to two "Zen 3" CCDs with a cIOD (client I/O controller die). While the maximum core count of each chiplet isn't known, they will implement the "Zen 3" microarchitecture, which reportedly does away with the CCX so that all cores on the CCD share a single large L3 cache; this is expected to improve inter-core latencies. AMD's generational IPC uplift efforts could also include improving bandwidth between the various on-die components (something we saw signs of in the "Zen 2" based "Renoir"). The company is also expected to leverage a newer 7 nm-class silicon fabrication node at TSMC (either N7P or N7+) to increase clock speeds - or so we thought.

An Igor's Lab report points to the possibility of AMD gunning for efficiency, letting IPC gains rather than clock speeds carry the bulk of Vermeer's competitiveness against Intel's offerings. The report decodes OPNs (ordering part numbers) of two upcoming Vermeer parts, one 8-core and the other 16-core. While the 8-core part has some generational clock speed increases (around 200 MHz on the base clock), the 16-core part has lower max boost clock speeds than the 3950X. Then again, the OPNs reference the A0 revision, which could mean that these are engineering samples meant to help AMD's ecosystem partners (think motherboard or memory vendors) build their products around these processors, and that the retail product could come with higher clock speeds after all. We'll find out in September, when AMD is expected to debut its 4th generation Ryzen desktop processor family, around the same time NVIDIA launches GeForce "Ampere."
Sources: Igor's Lab, VideoCardz

37 Comments on Possible AMD "Vermeer" Clock Speeds Hint at IPC Gain

#1
Darmok N Jalad
Vermeer is not just the name of a Baroque artist; it is also a company that makes commercial-grade wood chippers. Will the 4000 series grind up heavy work with ease? :)
Posted on Reply
#2
Jism
They need to work out that 7 nm process so that all-core clocks exceed today's boost levels. When AMD can accomplish that, then it's pretty much bye bye Intel.

It's always a binning choice: low current > high clocks OR high current > low clocks. But an FX could easily consume up to 220 W, and even 300 W in extreme conditions.
Posted on Reply
#3
SIGSEGV
We'll find out in September, when AMD is expected to debut its 4th generation Ryzen desktop processor family, around the same time NVIDIA launches GeForce "Ampere."
Perfect.
However, I will eagerly wait for AMD's next-gen Radeon GPU offerings.
Posted on Reply
#4
dicktracy
Get rid of the crappy glue and put 8 cores in a single CCD and it might finally slay the legendary Skylake in gaming. We all know Ampere will increase GPU threshold in high res and AMD needs to do exactly this to close the gap. Can you imagine still losing to Skylake while using a 7nm+ process this time around? Put 16 cores in a single CCD with Zen 4 and that should be a beast.
Posted on Reply
#5
medi01
dicktracy
legendary Skylake in gaming
XCOM: Chimera Squad at a totally reasonable 1080p resolution, while equipped with a $1.3k GPU:

Posted on Reply
#6
dicktracy
medi01
XCOM: Chimera Squad at a totally reasonable 1080p resolution, while equipped with a $1.3k GPU:


Cool nitpick. Now let's look at multiple games in an average:

The RTX 2080 Ti bottlenecks the fastest Skylake model, while the fastest Zen 2 model clearly bottlenecks the RTX 2080 Ti. You know Ampere is going to make Zen 2 look even worse, right?
Posted on Reply
#7
efikkan
Estimates for Zen 3's IPC gains are all over the place. Some claim Zen 3 is a minor improvement, while others claim it's a major architectural overhaul, but we'll see.
Nevertheless, IPC improvement is the area to focus on going forward, and any IPC improvement is appreciated.
Jism
They need to work out that 7 nm process so that all-core clocks exceed today's boost levels. When AMD can accomplish that, then it's pretty much bye bye Intel.
Intel's rated boost clock speeds are too optimistic, and they usually throttle quite a bit when the power limit kicks in. So in reality, with high sustained load on multiple cores, AMD often matches or exceeds Intel in actual clock speeds.

I don't think AMD should be pushing too hard on unstable boost speeds; what we need is good sustained performance. AMD needs to work on the areas where they fall behind Intel, primarily the CPU front-end and memory controller latency. The CPU front-end is one of the largest areas of improvement in Sunny Cove over Skylake, so AMD needs to step up here.
Posted on Reply
#8
Mouth of Sauron
High core-count CPUs will, of course, once again see their main usage in playing games in 1080p, coupled with 1200g GPU...
Posted on Reply
#9
Vayra86
"Wait darling, are you seriously saying 'more IPC' now?"

"Indeed!"

Posted on Reply
#10
Alexandrus
dicktracy
Get rid of the crappy glue and put 8 cores in a single CCD and it might finally slay the legendary Skylake in gaming. We all know Ampere will increase GPU threshold in high res and AMD needs to do exactly this to close the gap. Can you imagine still losing to Skylake while using a 7nm+ process this time around? Put 16 cores in a single CCD with Zen 4 and that should be a beast.
They already do have 8 cores in a CCD, perhaps you do not know what a CCD is ;)
Posted on Reply
#11
ARF
A 15-20% IPC gain and +200 MHz, because the CCX will now have 8 cores, up from just 4, and the caches will be unified across the whole CCD.
Posted on Reply
#12
medi01
dicktracy
Now let's look at multiple games in an average:
Note the following, stranger:

1) 1080p gaming on a $1.3k GPU makes no sense. We do it to figure out "how CPUs will behave in the future". It's an arguable theory that assumes future performance can be deduced by testing stuff at unrealistic resolutions
2) New games show completely different behavior. Games that actually do use CPU power (lots of AI stuff going on in that game) get us to that picture, which is outright embarrassing to Intel

Mkay?
Low resolution tests of archaic games are only good for easing the pain of the blue fans.
On top of low resolution tests using the fastest card money can buy being questionable on its own.
Posted on Reply
#13
BoboOOZ
medi01
Note the following, stranger:

1) 1080p gaming on a $1.3k GPU makes no sense. We do it to figure out "how CPUs will behave in the future". It's an arguable theory that assumes future performance can be deduced by testing stuff at unrealistic resolutions
Just to play devil's advocate, it makes sense for competitive FPS players. The guys that use 240+ FPS monitors like this one: www.techpowerup.com/267368/alienware-announces-aw2521h-360hz-gaming-monitor
Other than that, I agree with you that more threads will probably age better in gaming, given the architecture of next-gen consoles.
Posted on Reply
#14
Cheeseball
Not a Potato
medi01
1) 1080p gaming on a $1.3k GPU makes no sense. We do it to figure out "how CPUs will behave in the future". It's an arguable theory that assumes future performance can be deduced by testing stuff at unrealistic resolutions
Flawed argument. There are many that prefer 1080p at high FPS for competitive shooters (e.g. PUBG, R6S and Doom Eternal speedruns). That point would make sense if you're just playing and want to enjoy the visual fidelity.
Posted on Reply
#15
efikkan
BoboOOZ
Other than that, I agree with you that more threads will probably age better in gaming, given the architecture of next-gen consoles.
This has been predicted for many years, but people fail to understand that it's the nature of the workload which dictates how it can scale across multiple threads.

While we probably will continue to see games use a few more threads in general, this is mostly for non-rendering tasks: audio, networking, video encoding, etc. It doesn't make sense to do rendering (which consists of building queues for the GPU) over more than 1-3 threads, with each thread doing its separate task like a render pass, viewport, particle simulation or resource loading. While it is technically possible to have multiple threads build a single GPU queue, the synchronization overhead would certainly kill any perceived "performance advantage".

In the coming years single thread performance will continue to be important, but only to the point where the GPU is fully saturated. So with the next generations from AMD and Intel we should expect Intel's gaming advantage to shrink a bit.
Posted on Reply
#16
BoboOOZ
efikkan
This has been predicted for many years, but people fail to understand that it's the nature of the workload which dictates how it can scale across multiple threads.

While we probably will continue to see games use a few more threads in general, this is mostly for non-rendering tasks: audio, networking, video encoding, etc. It doesn't make sense to do rendering (which consists of building queues for the GPU) over more than 1-3 threads, with each thread doing its separate task like a render pass, viewport, particle simulation or resource loading. While it is technically possible to have multiple threads build a single GPU queue, the synchronization overhead would certainly kill any perceived "performance advantage".
I'm a senior software engineer, so I also speak from experience. It is not the nature of the workload that dictates the scaling upon multiple threads, it is the way the software is written. 10-15 years ago software was completely monolithic, so the only way to use multiple cores or threads was to have multiple applications running at one time.

Since then, due to the plateau in frequency, in many domains, software has been written differently so that it can take advantage of massive parallelization. Of course, parallelization requires a shift in paradigm, and software, firmware and hardware advances. And it is much more difficult to write fully parallel software than monolithic, but it is feasible and it has already been done in many applications.
In gaming, until now there has not been a strong drive in this direction, because the average consumer computer thread count wasn't that high. But that is over with the PS5 & co.; these consoles have 8 cores clocked rather low. If games that are being written right now only used 2-3 cores, they would suck big time. So I'm pretty sure that next-gen games will be quite good at using multiple threads, and we will start feeling this in PC gaming in less than 2 years' time.
Posted on Reply
#17
seronx
efikkan
Estimates for Zen 3's IPC gains are all over the place. Some claim Zen 3 is a minor improvement, while others claim it's a major architectural overhaul, but we'll see.
Family 17h = Zen to Zen2
Family 19h = Zen3 speculatively to Zen4.

It is very likely going to be an architectural overhaul at least on the scale of the Bobcat-to-Jaguar overhaul:
www.extremetech.com/gaming/142163-amds-next-gen-bobcat-apu-could-win-big-in-notebooks-and-tablets-if-it-launches-on-time
www.techpowerup.com/180394/amd-jaguar-micro-architecture-takes-the-fight-to-atom-with-avx-sse4-quad-core

Ex:
Bobcat dual-core (14h) => two separate L2s
Jaguar dual-core (16h) => one unified L2
::
Zen2 octo-core (17h) => two separate CCXs
Zen3 octo-core (19h) => one unified CCX
Posted on Reply
#18
efikkan
BoboOOZ
I'm a senior software engineer, so I also speak from experience. It is not the nature of the workload that dictates the scaling upon multiple threads, it is the way the software is written.
<snip>
And it is much more difficult to write fully parallel software than monolithic, but it is feasible and it has already been done in many applications.
Since we're flashing credentials, so am I, with a thesis in graphics programming :)
That depends on your definition of being "fully parallel". If you have a workload of independent work chunks that can be processed without synchronization, you can scale almost linearly until you reach a bottleneck in hardware or software. This mostly applies to large workloads of independent chunks, and the overhead of thread communication is negligible because of the chunk size vs. time scale. Examples include large encoding jobs, web servers, software rendering etc.
On the opposite end of the spectrum are highly synchronized workloads, where any workload will reach the point of diminishing returns due to overhead as threading isn't free.
There is also instruction level parallelism, but that's a topic of its own.
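To put rough numbers on the two ends of that spectrum, here is a toy model (my own simplification, not a measurement): the work divides evenly across threads, and every extra thread adds a fixed synchronization cost. The function names and the cost figures are made up for illustration.

```python
def run_time_ms(work_ms: float, threads: int, sync_ms_per_thread: float) -> float:
    """Toy model: work splits evenly, each extra thread adds a fixed sync cost."""
    return work_ms / threads + sync_ms_per_thread * (threads - 1)


def best_thread_count(work_ms: float, sync_ms_per_thread: float, max_threads: int = 16) -> int:
    """Thread count (1..max_threads) that minimizes the modeled run time."""
    return min(
        range(1, max_threads + 1),
        key=lambda n: run_time_ms(work_ms, n, sync_ms_per_thread),
    )


# Assumed figures: a 60 s encoding job vs. 4 ms of work inside a single frame.
print(best_thread_count(work_ms=60_000, sync_ms_per_thread=0.05))
print(best_thread_count(work_ms=4, sync_ms_per_thread=0.5))
```

With these assumed numbers the long encode keeps scaling to all 16 threads, while the 4 ms chunk of per-frame work is best split across only 3.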
BoboOOZ
In gaming, until now there has not been a strong drive in this direction, because the average consumer computer thread count wasn't that high. But that is over with the PS5 & co.; these consoles have 8 cores clocked rather low. If games that are being written right now only used 2-3 cores, they would suck big time. So I'm pretty sure that next-gen games will be quite good at using multiple threads, and we will start feeling this in PC gaming in less than 2 years' time.
These are common misconceptions, even among programmers. While games have been using more than one thread for a long time, using many threads for rendering hasn't happened despite the Xbox One and PS4 launching nearly 7 years ago with 8 cores.

Firstly, games work on a very small time scale, e.g. 8.3 ms if you want 120 Hz, so there is very little room for overhead before you encounter serious stutter. Rendering with DirectX, OpenGL or Vulkan works by using API calls to build a queue for the GPU pipeline. The GPU pipeline itself isn't fully controlled by the programmer, but at certain points in the pipeline it executes programmable pieces of code called "shader programs" (the name is misleading, as it's much more than shading). While it is technically possible to have multiple GPU queues (doing different things) or even to have multiple threads cooperate in building a single queue, it wouldn't make sense to do so since the API calls need to be executed in order, so you need synchronization, and the overhead of synchronization is much more substantial than building the entire queue from a single thread. This is the reason why even after all these years of multi-core CPUs all games use 1 thread per GPU workload. Having a pool of worker threads to build a single queue makes no sense today or several years from now. If you need to offload something, you should offload the non-rendering stuff, but even then do limited synchronization, as the individual steps in rendering live within <1 ms, which leaves very little time for constantly syncing threads to do a tiny bit of work.

As someone who has been using OpenGL and DirectX since the early 2000s, I've seen the transition from a fixed function pipeline to a gradually more programmable pipeline. The long term trend (10+ years) is to continue offloading the rendering logic to the GPU, hopefully one day achieving a completely programmable pipeline from the GPU. As we continue to take steps in that direction, the CPU will become less of a bottleneck. The need for more threads for games will be dictated by whatever non-rendering work the games need.
Posted on Reply
#19
Punkenjoy
Few things to consider:

- Just because an app doesn't use 100% of the CPU on an 8-core chip doesn't mean that the 2-4 extra cores over a 4- or 6-core chip aren't helpful. The goal is always to run a specific list of things in the shortest time frame. That may mean running something on a different core even though it won't utilise 100% of that core for the whole frame. Overall, the latency is reduced, but that core won't be used at 100%.

- Having more threads in a program adds an overhead that requires more CPU power to overcome. A faster CPU will be able to overcome that better than a slower one.

- Latency is still king in games. Core-to-core latency is still something that needs to be taken into consideration. And depending on the workload, that core-to-core latency can turn into core-to-L3-cache or core-to-memory latency, slowing things down quite a bit.

- AMD FX CPUs had a lot of cores/threads and still do reasonably well in some titles in terms of frame time consistency (meaning no big FPS drops), but that does not mean at all that they can run these games faster, just more smoothly at a lower FPS. They had a hard time against an Intel 2500K at the time. A 2500K can have some difficulties with frame time consistency in modern titles, but still delivers a better average FPS in many titles.

- On that subject, a 2600K, which is very similar to a 2500K, does way better in many games these days than it did at launch. Set aside some minor MHz differences and the main difference is going from 4 cores/4 threads to 4 cores/8 threads.

So to recap, in my opinion, a 3950X might do better in the future, when game developers are used to the 8 cores/16 threads of the next-gen consoles (right now they are on a slow 8-core/8-thread Jaguar CPU). But a newer CPU with fewer threads but better IPC and frequency could also do a much better job at running these games.

This is why CPUs like the 3300X and the 3600 make so much sense right now. I do not think, except in very specific cases, that these super high-end parts are really worth it. If the CPU race has restarted, spending 300 bucks every 1.5 years will give better results than spending 600 bucks every 3+ years.
Posted on Reply
#20
HenrySomeone
dicktracy
Get rid of the crappy glue and put 8 cores in a single CCD and it might finally slay the legendary Skylake in gaming. We all know Ampere will increase GPU threshold in high res and AMD needs to do exactly this to close the gap. Can you imagine still losing to Skylake while using a 7nm+ process this time around? Put 16 cores in a single CCD with Zen 4 and that should be a beast.
Yet that is exactly what I am expecting; granted, the gap will probably finally be in the single digits, but it will remain and what's worse (for AMD fan(bois) anyway) soon after comes Rocket Lake...
Posted on Reply
#21
BoboOOZ
efikkan
Since we're flashing credentials, so am I, with a thesis in graphics programming :)
That depends on your definition of being "fully parallel". If you have a workload of independent work chunks that can be processed without synchronization, you can scale almost linearly until you reach a bottleneck in hardware or software. This mostly applies to large workloads of independent chunks, and the overhead of thread communication is negligible because of the chunk size vs. time scale. Examples include large encoding jobs, web servers, software rendering etc.
On the opposite end of the spectrum are highly synchronized workloads, where any workload will reach the point of diminishing returns due to overhead as threading isn't free.
There is also instruction level parallelism, but that's a topic of its own.
Any large workload can be parallelized: if it's big enough, it can be broken into pieces that can be dealt with separately. There is overhead for splitting work between worker threads and recomposing the result, but it can be optimized so that there are still gains from distributing workloads. People have been doing this for years, in all types of applications. It simply works, although it may be complicated.
efikkan
These are common misconceptions, even among programmers. While games have been using more than one thread for a long time, using many threads for rendering hasn't happened despite the Xbox One and PS4 launching nearly 7 years ago with 8 cores.
Are you really trying to say that the PS5 will get by with doing most of the work on one 2GHz core?
efikkan
Firstly, games work on a very small time scale, e.g. 8.3 ms if you want 120 Hz, so there is very little room for overhead before you encounter serious stutter.
I imagine you do realize that a single frame at 120Hz translates to over 8 million clock cycles on your average AMD APU SMT core?
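For reference, the arithmetic behind that figure is sketched below. The clock speeds are assumptions picked for illustration, not the specs of any particular APU.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / refresh_hz


def cycles_per_frame(clock_ghz: float, refresh_hz: float) -> float:
    """Clock cycles one core gets through during a single frame."""
    return clock_ghz * 1e9 / refresh_hz


print(f"frame budget at 120 Hz: {frame_budget_ms(120):.1f} ms")
for clock_ghz in (1.0, 2.0, 3.5):  # assumed clock speeds
    millions = cycles_per_frame(clock_ghz, 120) / 1e6
    print(f"{clock_ghz} GHz -> {millions:.1f} million cycles per frame")
```

At 1 GHz that is about 8.3 million cycles per frame and at 2 GHz about 16.7 million, so "over 8 million" holds for anything clocked at roughly 1 GHz or above.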
efikkan
Rendering with DirectX, OpenGL or Vulkan works by using API calls to build a queue for the GPU pipeline. The GPU pipeline itself isn't fully controlled by the programmer, but at certain points in the pipeline it executes programmable pieces of code called "shader programs" (the name is misleading, as it's much more than shading). While it is technically possible to have multiple GPU queues (doing different things) or even to have multiple threads cooperate in building a single queue, it wouldn't make sense to do so since the API calls need to be executed in order, so you need synchronization, and the overhead of synchronization is much more substantial than building the entire queue from a single thread. This is the reason why even after all these years of multi-core CPUs all games use 1 thread per GPU workload. Having a pool of worker threads to build a single queue makes no sense today or several years from now. If you need to offload something, you should offload the non-rendering stuff, but even then do limited synchronization, as the individual steps in rendering live within <1 ms, which leaves very little time for constantly syncing threads to do a tiny bit of work.

As someone who has been using OpenGL and DirectX since the early 2000s, I've seen the transition from a fixed function pipeline to a gradually more programmable pipeline. The long term trend (10+ years) is to continue offloading the rendering logic to the GPU, hopefully one day achieving a completely programmable pipeline from the GPU. As we continue to take steps in that direction, the CPU will become less of a bottleneck. The need for more threads for games will be dictated by whatever non-rendering work the games need.
I haven't looked at OpenGL code for quite a while, but I am sure that APIs have transitioned to more asynchronous workflows (because everything is becoming asynchronous these days, even good ol' HTTP), but that is not the main point here. I think what you do not understand is that having all the calls to the graphics API come from a single thread doesn't at all mean that that thread is doing all the computing.
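A minimal sketch of that structure, under stated assumptions: `prepare_draw` is a hypothetical stand-in for per-object CPU work (culling, animation and so on), and the plain list stands in for a GPU command queue. No real graphics API is involved. Worker threads do the computing, while only the calling thread appends commands, and it does so in a fixed order.

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_draw(obj_id: int) -> tuple:
    """Hypothetical per-object CPU work; the square is a placeholder result."""
    return obj_id, obj_id * obj_id


def build_frame(object_ids, workers: int = 4) -> list:
    command_queue = []  # stands in for the GPU command queue
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, whichever worker finishes first.
        for obj_id, param in pool.map(prepare_draw, object_ids):
            # Only the calling thread ever touches the queue.
            command_queue.append(("draw", obj_id, param))
    return command_queue


print(build_frame([3, 1, 2]))
```

Note that in CPython the GIL keeps threads from speeding up pure-Python compute, so this only illustrates the shape of the design, not an actual performance gain.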
Posted on Reply
#22
medi01
BoboOOZ
Just to play devil's advocate, it makes sense for competitive FPS players. The guys that use 240+ FPS monitors like this one: www.techpowerup.com/267368/alienware-announces-aw2521h-360hz-gaming-monitor
Other than that, I agree with you that more threads will probably age better in gaming, given the architecture of next-gen consoles.
That is fair enough, but note how different "niche gamers that are into 240 Hz+" are from simply "gamers".
Posted on Reply
#23
BoboOOZ
medi01
That is fair enough, but note how different "niche gamers that are into 240 Hz+" are from simply "gamers".
Frankly, I have no idea what percentage of the market they represent. I play a bit of Fortnite and I understand there's a small advantage at playing at very high framerates (even higher than the monitor), but I also happen to know lots of kids who play at 60Hz and play very well...
But, as usual, there's a Gaussian distribution, so it's rather normal to have the highest end of the market comprised of only 5%-10% of the total.
Posted on Reply
#24
HenrySomeone
BoboOOZ
Just to play devil's advocate, it makes sense for competitive FPS players. The guys that use 240+ FPS monitors like this one: www.techpowerup.com/267368/alienware-announces-aw2521h-360hz-gaming-monitor
Other than that, I agree with you that more threads will probably age better in gaming, given the architecture of next-gen consoles.
Only up to a point. For instance, the 3800X might, and I repeat might, offer a slightly better experience than, let's say, the 10600K at the very end of both chips' usability, around 5 years from now, but both will be struggling by then, since single thread advancements will continue to be important despite what legions of AMD fan(boy)s would tell you. The 3900X (or 3950X for that matter) will never get you better (while still objectively good enough) framerates than the 9900K / 10700K though, of that I am completely certain. A fine example is the old 16 core Opterons compared even with 8 core FX chips (that clocked much better), not to mention something like a 2600K.
Posted on Reply
#25
BoboOOZ
HenrySomeone
Only up to a point. For instance, the 3800X might, and I repeat might, offer a slightly better experience than, let's say, the 10600K at the very end of both chips' usability, around 5 years from now, but both will be struggling by then, since single thread advancements will continue to be important despite what legions of AMD fan(boy)s would tell you. The 3900X (or 3950X for that matter) will never get you better (while still objectively good enough) framerates than the 9900K / 10700K though, of that I am completely certain. A fine example is the old 16 core Opterons compared even with 8 core FX chips (that clocked much better), not to mention something like a 2600K.
The "only up to a point" part I definitely agree with.

The rest is more blurry, and it depends entirely on what direction the gaming industry will take from now on. At present, we are GPU bound in most games with high settings. Developers could take the approach of shifting more load towards the CPU and massively parallelizing their code, or could choose to do the minimum necessary to see performance improvements. Since I don't have my crystal ball, I have no way to predict which will occur.
But technically, it is perfectly possible to code an application in such a way that it runs faster on a 16-core chip at 85% clock speed than on a 10-core chip at 100% clock speed. It has already been done for many applications and it can be done for games too. When exactly this will happen is hard to predict, but it has to happen if we are to play open-world games at 400 fps in the future. Your example with Opterons vs. FX is flawed, because it is based on insufficiently parallelized applications.
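As a rough sanity check on the 16-core vs. 10-core claim, here is a sketch using Amdahl's law. It assumes performance scales linearly with clock speed and ignores memory and synchronization effects, so treat the numbers as illustrative only; the function names are my own.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(parallel_fraction: float, cores: int, clock_ratio: float) -> float:
    """Speedup scaled by relative clock speed (assumes linear clock scaling)."""
    return clock_ratio * amdahl_speedup(parallel_fraction, cores)


for p in (0.50, 0.80, 0.90, 0.95):
    wide = relative_throughput(p, cores=16, clock_ratio=0.85)
    narrow = relative_throughput(p, cores=10, clock_ratio=1.00)
    winner = "16 cores" if wide > narrow else "10 cores"
    print(f"parallel fraction {p:.2f}: 16c@85% = {wide:.2f}, 10c@100% = {narrow:.2f} -> {winner}")
```

Under these assumptions the break-even sits at a parallel fraction of roughly 87%: below that the 10-core chip at full clock wins, above it the 16-core chip does.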

The last part I am pretty sure I completely disagree with: I am quite sure future improvements in processor performance will rely much more on core counts than on IPC. And clock frequencies will stagnate at best as both Intel and AMD continue to shrink their nodes; we will have 64 cores in home computers in a few years, but we will never reach 6GHz.
Posted on Reply