
Possible AMD "Vermeer" Clock Speeds Hint at IPC Gain

btarunr

Editor & Senior Moderator
The bulk of AMD's 4th generation Ryzen desktop processors will consist of "Vermeer," a high core-count socket AM4 processor and successor to the current-generation "Matisse." These chips combine up to two "Zen 3" CCDs with a cIOD (client I/O die). While the maximum core count of each chiplet isn't known, the chiplets will implement the "Zen 3" microarchitecture, which reportedly does away with the CCX arrangement so that all cores on a CCD share a single large L3 cache; this is expected to improve inter-core latencies. AMD's generational IPC uplift could also come from improved bandwidth between the various on-die components (something we saw signs of in the "Zen 2"-based "Renoir"). The company is also expected to leverage a newer 7 nm-class silicon fabrication node at TSMC (either N7P or N7+) to increase clock speeds - or so we thought.

An Igor's Lab report points to the possibility of AMD gunning for efficiency, letting IPC gains rather than clock speeds handle the bulk of Vermeer's competitiveness against Intel's offerings. The report decodes the OPNs (ordering part numbers) of two upcoming Vermeer parts, one 8-core and the other 16-core. While the 8-core part sees some generational clock-speed increases (around 200 MHz on the base clock), the 16-core part has a lower maximum boost clock than the 3950X. Then again, the OPNs reference the A0 revision, which could mean these are engineering samples meant to help AMD's ecosystem partners (think motherboard or memory vendors) build their products around these processors, and that the retail product could come with higher clock speeds after all. We'll find out in September, when AMD is expected to debut its 4th generation Ryzen desktop processor family, around the same time NVIDIA launches GeForce "Ampere."



View at TechPowerUp Main Site
 
Vermeer is not just the name of a Baroque artist; it is also a company that makes commercial-grade wood chippers. Will the 4000 series grind up heavy work with ease? :)
 
They need to work out that 7 nm process so that all-core clocks can exceed today's boost levels. When AMD can accomplish that, it's pretty much bye-bye Intel.

It's always been low current > high clocks OR high current > low clocks. But an FX could easily consume up to 220 W, and even 300 W under extreme conditions.
 
We'll find out in September, when AMD is expected to debut its 4th generation Ryzen desktop processor family, around the same time NVIDIA launches GeForce "Ampere."
Perfect.
However, I'm eagerly waiting for AMD's next-gen Radeon GPU offerings.
 
Get rid of the crappy glue and put 8 cores in a single CCD and it might finally slay the legendary Skylake in gaming. We all know Ampere will raise the GPU ceiling at high resolutions, and AMD needs to do exactly this to close the gap. Can you imagine still losing to Skylake while using a 7 nm+ process this time around? Put 16 cores in a single CCD with Zen 4 and that should be a beast.
 
XCOM: Chimera Squad at a totally reasonable 1080p resolution, while equipped with a $1.3k GPU:

[attached benchmark chart]
Cool nitpick. Now let's look at multiple games in an average:

[attached charts: multi-game average benchmark results]



The RTX 2080 Ti bottlenecks the fastest Skylake model, while the fastest Zen 2 model clearly bottlenecks the RTX 2080 Ti. You know Ampere is going to make Zen 2 look even worse, right?
 
Estimates for Zen 3's IPC gains are all over the place. Some claim Zen 3 is a minor improvement, while others claim it's a major architectural overhaul, but we'll see.
Nevertheless, IPC improvement is the area to focus on going forward, and any IPC improvement is appreciated.

They need to work out that 7 nm process so that all-core clocks can exceed today's boost levels. When AMD can accomplish that, it's pretty much bye-bye Intel.
Intel's rated boost clock speeds are too optimistic, and they usually throttle quite a bit when the power limit kicks in. So in reality, with high sustained load on multiple cores, AMD often matches or exceeds Intel in actual clock speeds.

I don't think AMD should be pushing too hard on unstable boost speeds; what we need is good sustained performance. AMD needs to work on the areas where they fall behind Intel, primarily the CPU front-end and memory controller latency. The CPU front-end is one of the largest areas of improvement in Sunny Cove over Skylake, so AMD needs to step up here.
 
"Wait darling, are you seriously saying 'more IPC' now?"

[attached reaction image]


"Indeed!"

[attached reaction image]
 
Get rid of the crappy glue and put 8 cores in a single CCD and it might finally slay the legendary Skylake in gaming. We all know Ampere will raise the GPU ceiling at high resolutions, and AMD needs to do exactly this to close the gap. Can you imagine still losing to Skylake while using a 7 nm+ process this time around? Put 16 cores in a single CCD with Zen 4 and that should be a beast.
They already do have 8 cores in a CCD; perhaps you do not know what a CCD is ;)
 
15-20% IPC and a +200 MHz improvement, because the CCX will now have 8 cores, up from just 4, and the cache will be unified across the whole CCD.
 
Now let's look at multiple games in an average:

Note the following, stranger:

1) 1080p gaming on a $1.3k GPU makes no sense. We do it to figure out "how CPUs will behave in the future". It's an arguable theory that assumes future performance can be deduced by testing stuff at unrealistic resolutions
2) New games show completely different behavior. Games that actually do use CPU power (there's lots of AI stuff going on in that game) get us to that picture, which is outright embarrassing for Intel

Mkay?
Low-resolution tests of archaic games are only good for easing the pain of the blue fans.
That's on top of low-resolution tests using the fastest card money can buy being questionable in their own right.
 
Note the following, stranger:

1) 1080p gaming on a $1.3k GPU makes no sense. We do it to figure out "how CPUs will behave in the future". It's an arguable theory that assumes future performance can be deduced by testing stuff at unrealistic resolutions

Just to play devil's advocate, it makes sense for competitive FPS players. The guys that use 240+ Hz monitors like this one: https://www.techpowerup.com/267368/alienware-announces-aw2521h-360hz-gaming-monitor
Other than that, I agree with you that more threads will probably age better in gaming, given the architecture of next-gen consoles.
 
1) 1080p gaming on a $1.3k GPU makes no sense. We do it to figure out "how CPUs will behave in the future". It's an arguable theory that assumes future performance can be deduced by testing stuff at unrealistic resolutions

Flawed argument. There are many who prefer 1080p at high FPS for competitive shooters (e.g. PUBG, R6S and Doom Eternal speedruns). That point would make sense if you're just playing and want to enjoy the visual fidelity.
 
Other than that, I agree with you that more threads will probably age better in gaming, given the architecture of next-gen consoles.
This has been predicted for many years, but people fail to understand that it's the nature of the workload which dictates how it can scale across multiple threads.

While we will probably continue to see games use a few more threads in general, this is mostly for non-rendering tasks: audio, networking, video encoding, etc. It doesn't make sense to do rendering (which consists of building queues for the GPU) over more than 1-3 threads, with each thread doing its separate task like a render pass, viewport, particle simulation or resource loading. While it is technically possible to have multiple threads build a single GPU queue, the synchronization overhead would certainly kill any perceived "performance advantage".
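To make that concrete, here's a rough C++ sketch (my own toy illustration, not code from any real engine) of the thread layout I mean: one thread owns the GPU queue and issues every draw call in order, worker threads handle non-rendering jobs, and data is handed over once per frame. submitDrawCall(), FrameData and the timings are all made up for the example; the real thing would be DirectX/OpenGL/Vulkan calls.

[CODE=cpp]
// Toy layout: one render thread owns the ordered "GPU queue", workers do the rest.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

struct FrameData { std::vector<int> visibleObjects; };  // toy per-frame payload

std::mutex handoff;              // guards the frame data handed to the render thread
FrameData pendingFrame;          // produced by workers, consumed by the render thread
std::atomic<bool> running{true};

void submitDrawCall(int object) {              // stand-in for ordered GPU API calls
    std::printf("draw object %d\n", object);
}

void audioWorker() {                           // non-rendering task, runs independently
    while (running)
        std::this_thread::sleep_for(std::chrono::milliseconds(2));  // "mix audio"
}

void simulationWorker() {                      // produces data the render thread consumes
    int frame = 0;
    while (running) {
        FrameData fd;
        fd.visibleObjects = {frame, frame + 1, frame + 2};   // "culling" result
        {
            std::lock_guard<std::mutex> lock(handoff);       // one sync point per frame
            pendingFrame = std::move(fd);
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(8));
        ++frame;
    }
}

int main() {
    std::thread audio(audioWorker), sim(simulationWorker);
    for (int frame = 0; frame < 3; ++frame) {                // render thread: sole queue owner
        FrameData fd;
        {
            std::lock_guard<std::mutex> lock(handoff);
            fd = pendingFrame;
        }
        for (int obj : fd.visibleObjects) submitDrawCall(obj);  // API calls stay in order, one thread
        std::this_thread::sleep_for(std::chrono::milliseconds(8));  // ~120 Hz frame budget
    }
    running = false;
    audio.join();
    sim.join();
}
[/CODE]

The point is that the only synchronization on the render path is a single hand-off per frame, not per draw call.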

In the coming years, single-thread performance will continue to be important, but only up to the point where the GPU is fully saturated. So with the next generations from AMD and Intel, we should expect Intel's gaming advantage to shrink a bit.
 
This has been predicted for many years, but people fail to understand that it's the nature of the workload which dictates how it can scale across multiple threads.

While we will probably continue to see games use a few more threads in general, this is mostly for non-rendering tasks: audio, networking, video encoding, etc. It doesn't make sense to do rendering (which consists of building queues for the GPU) over more than 1-3 threads, with each thread doing its separate task like a render pass, viewport, particle simulation or resource loading. While it is technically possible to have multiple threads build a single GPU queue, the synchronization overhead would certainly kill any perceived "performance advantage".

I'm a senior software engineer, so I also speak from experience. It is not the nature of the workload that dictates the scaling across multiple threads; it is the way the software is written. 10-15 years ago software was completely monolithic, so the only way to use multiple cores or threads was to have multiple applications running at the same time.

Since then, due to the plateau in frequency, software in many domains has been written differently so that it can take advantage of massive parallelization. Of course, parallelization requires a shift in paradigm, along with software, firmware and hardware advances. And it is much more difficult to write fully parallel software than monolithic software, but it is feasible and it has already been done in many applications.
In gaming, there hasn't been a strong drive in this direction until now, because the average consumer computer's thread count wasn't that high. But that is over with the PS5 & co.; these consoles have 8 cores clocked rather low. If games being written right now only used 2-3 cores, they would suck big time. So I'm pretty sure that next-gen games will be quite good at using multiple threads, and we will start feeling this in PC gaming in less than 2 years' time.
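To illustrate what I mean by writing software for parallelism, here's a bare-bones C++ sketch (my own, with made-up names like Job and JobQueue, not from any actual game engine): a fixed pool of worker threads, say one per console core, pulling independent jobs off a shared queue.

[CODE=cpp]
// Minimal job system: N workers drain a shared queue of independent tasks.
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

using Job = std::function<void()>;

class JobQueue {
public:
    void push(Job job) {
        { std::lock_guard<std::mutex> lock(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
    bool pop(Job& job) {                        // blocks until a job arrives or shutdown
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return done_ || !jobs_.empty(); });
        if (jobs_.empty()) return false;        // shutting down and drained
        job = std::move(jobs_.front());
        jobs_.pop();
        return true;
    }
    void shutdown() {                           // signal "no more jobs"; workers drain what's left
        { std::lock_guard<std::mutex> lock(m_); done_ = true; }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Job> jobs_;
    bool done_ = false;
};

int main() {
    JobQueue queue;
    const unsigned workers = 8;                 // e.g. one worker per console core
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i)
        pool.emplace_back([&queue] {
            Job job;
            while (queue.pop(job)) job();       // each worker runs independent jobs
        });

    for (int j = 0; j < 32; ++j)                // pathfinding, physics islands, audio, ...
        queue.push([j] { std::printf("job %d done\n", j); });

    queue.shutdown();
    for (auto& t : pool) t.join();
}
[/CODE]

Real engines are far more sophisticated (work stealing, dependencies between jobs, etc.), but this is the basic shape of it.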
 
Estimates for Zen 3's IPC gains are all over the place. Some claim Zen 3 is a minor improvement, while others claim it's a major architectural overhaul, but we'll see.
Family 17h = Zen to Zen2
Family 19h = Zen3 speculatively to Zen4.

It is very likely going to be an architectural overhaul at least on the order of the Bobcat-to-Jaguar overhaul;

Ex:
Bobcat dual-core (14h) => two separate L2s
Jaguar dual-core (16h) => one unified L2
::
Zen2 octo-core (17h) => two separate CCXs
Zen3 octo-core (19h) => one unified CCX
 
I'm a senior software engineer, so I also speak from experience. It is not the nature of the workload that dictates the scaling across multiple threads; it is the way the software is written.
<snip>
And it is much more difficult to write fully parallel software than monolithic software, but it is feasible and it has already been done in many applications.
Since we're flashing credentials, so am I, with a thesis in graphics programming :)
That depends on your definition of "fully parallel". If you have a workload of independent work chunks that can be processed without synchronization, you can scale almost linearly until you reach a bottleneck in hardware or software. This mostly applies to large workloads of independent chunks, where the overhead of thread communication is negligible because of the chunk size vs. time scale. Examples include large encoding jobs, web servers, software rendering, etc.
On the opposite end of the spectrum are highly synchronized workloads, which will reach the point of diminishing returns due to overhead, as threading isn't free.
There is also instruction level parallelism, but that's a topic of its own.
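A quick self-contained C++ toy (my own example and numbers, nothing to do with any specific application) showing the two extremes: the same summation done over independent chunks with a single join at the end, versus forcing every update through a shared lock.

[CODE=cpp]
// Independent chunks vs. a fully synchronized update path.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    const std::size_t N = 1 << 22;              // ~4M elements
    std::vector<std::uint32_t> data(N, 1);
    unsigned T = std::thread::hardware_concurrency();
    if (T == 0) T = 4;

    auto time = [](auto&& fn) {
        auto start = std::chrono::steady_clock::now();
        fn();
        return std::chrono::duration<double, std::milli>(
                   std::chrono::steady_clock::now() - start).count();
    };

    // Independent chunks: each thread sums its own slice; one join at the end.
    std::uint64_t chunkedSum = 0;
    double chunkedMs = time([&] {
        std::vector<std::uint64_t> partial(T, 0);
        std::vector<std::thread> threads;
        for (unsigned t = 0; t < T; ++t)
            threads.emplace_back([&, t] {
                std::size_t begin = N / T * t;
                std::size_t end = (t == T - 1) ? N : N / T * (t + 1);
                for (std::size_t i = begin; i < end; ++i) partial[t] += data[i];
            });
        for (auto& th : threads) th.join();
        for (auto p : partial) chunkedSum += p;
    });

    // Highly synchronized: every single update goes through a shared lock.
    std::uint64_t lockedSum = 0;
    double lockedMs = time([&] {
        std::mutex m;
        std::vector<std::thread> threads;
        for (unsigned t = 0; t < T; ++t)
            threads.emplace_back([&, t] {
                std::size_t begin = N / T * t;
                std::size_t end = (t == T - 1) ? N : N / T * (t + 1);
                for (std::size_t i = begin; i < end; ++i) {
                    std::lock_guard<std::mutex> lock(m);
                    lockedSum += data[i];
                }
            });
        for (auto& th : threads) th.join();
    });

    std::printf("chunked: %llu in %.1f ms, locked: %llu in %.1f ms\n",
                (unsigned long long)chunkedSum, chunkedMs,
                (unsigned long long)lockedSum, lockedMs);
}
[/CODE]

On a typical multi-core machine the chunked version scales with core count, while the locked version is usually slower than running single-threaded, which is the "threading isn't free" point.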

In gaming, there hasn't been a strong drive in this direction until now, because the average consumer computer's thread count wasn't that high. But that is over with the PS5 & co.; these consoles have 8 cores clocked rather low. If games being written right now only used 2-3 cores, they would suck big time. So I'm pretty sure that next-gen games will be quite good at using multiple threads, and we will start feeling this in PC gaming in less than 2 years' time.
These are common misconceptions, even among programmers. While games have been using more than one thread for a long time, using many threads for rendering hasn't happened, despite the Xbox One and PS4 launching nearly 7 years ago with 8 cores.

Firstly, games work on a very small time scale, e.g. 8.3 ms if you want 120 Hz, so there is very little room for overhead before you encounter serious stutter. Rendering with DirectX, OpenGL or Vulkan works by using API calls to build a queue for the GPU pipeline. The GPU pipeline itself isn't fully controlled by the programmer, but at certain points in the pipeline it executes programmable pieces of code called "shader programs" (the name is misleading, as it's much more than shading). While it is technically possible to have multiple GPU queues (doing different things), or even to have multiple threads cooperate on building a single queue, it wouldn't make sense to do so, since the API calls need to be executed in order; you need synchronization, and the overhead of synchronization is much more substantial than building the entire queue from a single thread. This is the reason why, even after all these years of multi-core CPUs, games use one thread per GPU workload. Having a pool of worker threads build a single queue makes no sense today, or several years from now. If you need to offload something, you should offload the non-rendering stuff, but even then with limited synchronization, as the individual steps in rendering live within <1 ms, which leaves very little time for constantly syncing threads to do a tiny bit of work.
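If you want to see why, here's a deliberately silly C++ toy (mine, not real graphics code): recording "commands" into one ordered list from a single thread versus from four threads that still have to respect the global order. The ordering constraint serializes the multi-threaded version anyway, and it pays locking and wake-up overhead on top.

[CODE=cpp]
// Single-threaded ordered recording vs. four threads forced to take turns.
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    const int kCommands = 100000;               // "command" stands in for a recorded API call

    auto time = [](auto&& fn) {
        auto start = std::chrono::steady_clock::now();
        fn();
        return std::chrono::duration<double, std::milli>(
                   std::chrono::steady_clock::now() - start).count();
    };

    // One thread records every command, in order, with no synchronization.
    std::vector<int> singleQueue;
    double singleMs = time([&] {
        singleQueue.reserve(kCommands);
        for (int c = 0; c < kCommands; ++c) singleQueue.push_back(c);
    });

    // Four threads share one queue but must still append in global order.
    std::vector<int> sharedQueue;
    double sharedMs = time([&] {
        sharedQueue.reserve(kCommands);
        std::mutex m;
        std::condition_variable cv;
        int next = 0;                            // next command allowed to be recorded
        auto worker = [&](int offset) {
            for (int c = offset; c < kCommands; c += 4) {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return next == c; });  // wait for our turn
                sharedQueue.push_back(c);
                ++next;
                cv.notify_all();
            }
        };
        std::thread w0(worker, 0), w1(worker, 1), w2(worker, 2), w3(worker, 3);
        w0.join(); w1.join(); w2.join(); w3.join();
    });

    std::printf("single thread: %.2f ms, 4 threads + ordering: %.2f ms\n",
                singleMs, sharedMs);
}
[/CODE]

The toy exaggerates the effect, but the direction is the point: when calls must land in a fixed order, extra threads mostly buy you synchronization overhead.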

As someone who has been using OpenGL and DirectX since the early 2000s, I've seen the transition from a fixed-function pipeline to a gradually more programmable pipeline. The long-term trend (10+ years) is to continue offloading rendering logic to the GPU, hopefully one day achieving a completely programmable pipeline on the GPU. As we continue to take steps in that direction, the CPU will become less of a bottleneck. The need for more threads in games will be dictated by whatever non-rendering work the game needs.
 
A few things to consider:

- Just because an app isn't using 100% of the CPU on an 8-core chip doesn't mean the 2-4 extra cores versus a 4- or 6-core part aren't helpful. The goal is always to run a specific list of things in the shortest timeframe. That may mean running something that won't utilize 100% of a core for the whole frame on a different core. Overall the latency is reduced, but it won't use 100% of that core.

- Having more threads in a program adds overhead that requires more CPU power to overcome. A faster CPU will be able to absorb that better than a slower one.

- Latency is still king in games. Core-to-core latency is still something that needs to be taken into consideration. And depending on the workload, that core-to-core latency can turn into core-to-L3-cache or core-to-memory latency, slowing things down quite a bit.

- AMD FX CPUs had a lot of cores/threads and still do reasonably well in some titles in terms of frame-time consistency (meaning no big FPS drops), but that does not mean at all that they can run these games faster, just smoother with lower FPS. They had a hard time against an Intel 2500K at the time. A 2500K can have some difficulties with frame-time consistency in modern titles, but still delivers better average FPS in many titles.

- On that subject, a 2600K, which is very similar to a 2500K, does way better in many games these days than it did at launch. Remove some minor MHz differences and the main difference is going from 4 cores/4 threads to 4 cores/8 threads.

So to recap, in my opinion: a 3950X right now might do better in the future, when game developers are used to the 8 cores / 16 threads of the next-gen consoles (right now they are on a slow 8-core/8-thread Jaguar CPU). But a newer CPU with fewer threads and better IPC and frequency could also do a much better job of running these games.

This is why CPUs like the 3300X and the 3600 make so much sense right now. Except in very specific cases, I do not think these super-high-end parts are really worth it. If the CPU race is restarted, spending 300 bucks every 1.5 years will give better results than spending 600 bucks for 3+ years.
 
Get rid of the crappy glue and put 8 cores in a single CCD and it might finally slay the legendary Skylake in gaming. We all know Ampere will raise the GPU ceiling at high resolutions, and AMD needs to do exactly this to close the gap. Can you imagine still losing to Skylake while using a 7 nm+ process this time around? Put 16 cores in a single CCD with Zen 4 and that should be a beast.
Yet that is exactly what I am expecting; granted, the gap will probably finally be in the single digits, but it will remain, and what's worse (for AMD fan(boi)s anyway), soon after comes Rocket Lake...
 
Since we're flashing credentials, so am I, with a thesis in graphics programming :)
That depends on your definition of "fully parallel". If you have a workload of independent work chunks that can be processed without synchronization, you can scale almost linearly until you reach a bottleneck in hardware or software. This mostly applies to large workloads of independent chunks, where the overhead of thread communication is negligible because of the chunk size vs. time scale. Examples include large encoding jobs, web servers, software rendering, etc.
On the opposite end of the spectrum are highly synchronized workloads, which will reach the point of diminishing returns due to overhead, as threading isn't free.
There is also instruction level parallelism, but that's a topic of its own.
Any large workload can be parallelized; if it's big enough, it can be broken into pieces that are dealt with separately. There is overhead in splitting work between worker threads and recomposing the result, but it can be optimized so that there are still gains from distributing workloads; people have been doing this for years, in all types of applications. It simply works, although it may be complicated.
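For example, the split/recompose pattern looks roughly like this in C++ (my own toy example; countEvens and the numbers are made up): break the input into pieces, process each piece on its own std::async task, then combine the partial results.

[CODE=cpp]
// Fork-join sketch: split a big input, process pieces concurrently, recompose.
#include <algorithm>
#include <cstdio>
#include <future>
#include <numeric>
#include <vector>

// Count even numbers in one piece of the input (the per-chunk work).
static std::size_t countEvens(const int* begin, const int* end) {
    return static_cast<std::size_t>(
        std::count_if(begin, end, [](int v) { return v % 2 == 0; }));
}

int main() {
    std::vector<int> data(1'000'000);
    std::iota(data.begin(), data.end(), 0);

    const std::size_t pieces = 8;
    const std::size_t chunk = data.size() / pieces;

    // Split: launch one asynchronous task per piece.
    std::vector<std::future<std::size_t>> results;
    for (std::size_t p = 0; p < pieces; ++p) {
        const int* begin = data.data() + p * chunk;
        const int* end = (p + 1 == pieces) ? data.data() + data.size() : begin + chunk;
        results.push_back(std::async(std::launch::async, countEvens, begin, end));
    }

    // Recompose: gather the partial results into the final answer.
    std::size_t total = 0;
    for (auto& f : results) total += f.get();
    std::printf("even numbers: %zu\n", total);   // expected: 500000
}
[/CODE]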

These are common misconceptions, even among programmers. While games have been using more than one thread for a long time, using many threads for rendering hasn't happened, despite the Xbox One and PS4 launching nearly 7 years ago with 8 cores.
Are you really trying to say that the PS5 will get by with doing most of the work on one 2GHz core?
Firstly, games work on a very small time scale, e.g. 8.3 ms if you want 120 Hz, so there is very little room for overhead before you encounter serious stutter.
I imagine you do realize that 120 Hz translates to over 8 million clock cycles on your average AMD APU SMT core?
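(For reference, and assuming the core isn't stalled: 1/120 Hz ≈ 8.3 ms, which is already about 8.3 million cycles at a lowly 1 GHz, and closer to 29 million at a ~3.5 GHz boost clock.)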
Rendering with DirectX, OpenGL or Vulkan works by using API calls to build a queue for the GPU pipeline. The GPU pipeline itself isn't fully controlled by the programmer, but at certain points in the pipeline it executes programmable pieces of code called "shader programs" (the name is misleading, as it's much more than shading). While it is technically possible to have multiple GPU queues (doing different things), or even to have multiple threads cooperate on building a single queue, it wouldn't make sense to do so, since the API calls need to be executed in order; you need synchronization, and the overhead of synchronization is much more substantial than building the entire queue from a single thread. This is the reason why, even after all these years of multi-core CPUs, games use one thread per GPU workload. Having a pool of worker threads build a single queue makes no sense today, or several years from now. If you need to offload something, you should offload the non-rendering stuff, but even then with limited synchronization, as the individual steps in rendering live within <1 ms, which leaves very little time for constantly syncing threads to do a tiny bit of work.

As someone who has been using OpenGL and DirectX since the early 2000s, I've seen the transition from a fixed-function pipeline to a gradually more programmable pipeline. The long-term trend (10+ years) is to continue offloading rendering logic to the GPU, hopefully one day achieving a completely programmable pipeline on the GPU. As we continue to take steps in that direction, the CPU will become less of a bottleneck. The need for more threads in games will be dictated by whatever non-rendering work the game needs.

I haven't looked at OpenGL code for quite a while, but I'm sure the APIs have transitioned to more asynchronous workflows (because everything is becoming asynchronous these days, even good ol' HTTP), but that is not the main point here. I think what you don't understand is that having all the calls to the graphics API come from a single thread doesn't mean that thread is doing all the computing.
 
That is fair enough, but note how different "niche gamers who are into 240 Hz+" are from simply "gamers".
Frankly, I have no idea what percentage of the market they represent. I play a bit of Fortnite and I understand there's a small advantage to playing at very high framerates (even higher than the monitor's), but I also happen to know lots of kids who play at 60 Hz and play very well...
But, as usual, there's a Gaussian distribution, so it's rather normal for the highest end of the market to comprise only 5-10% of the total.
 
Just to play devil's advocate, it makes sense for competitive FPS players. The guys that use 240+ Hz monitors like this one: https://www.techpowerup.com/267368/alienware-announces-aw2521h-360hz-gaming-monitor
Other than that, I agree with you that more threads will probably age better in gaming, given the architecture of next-gen consoles.
Only up to a point. For instance, a 3800X might, and I repeat might, offer a slightly better experience than, let's say, a 10600K at the very end of both chips' usability, around 5 years from now, but both will be struggling by then, since single-thread advancements will continue to be important despite what leagues of AMD fan(boy)s would tell you. A 3900X (or 3950X for that matter) will never get you better (while still objectively good enough) framerates than a 9900K / 10700K, though; of that I am completely certain. A fine example is old 16-core Opterons compared even with 8-core FX chips (which clocked much better), not to mention something like a 2600K.
 