Possible AMD "Vermeer" Clock Speeds Hint at IPC Gain

Joined
May 15, 2020
Messages
511 (1.39/day)
Location
France
System Name Home
Processor Ryzen 3600X
Motherboard MSI Tomahawk 450 MAX
Cooling Noctua NH-U14S
Memory 16GB Crucial Ballistix 3600 MHz DDR4 CAS 16
Video Card(s) MSI RX 5700XT EVOKE OC
Storage Samsung 970 PRO 512 GB
Display(s) ASUS VA326HR + MSI Optix G24C4
Case MSI - MAG Forge 100M
Power Supply Aerocool Lux RGB M 650W
Only up to a point. For instance, 3800x might and I repeat might offer slightly better experience than let's say 10600k at the very end of both chips' usability, like around 5 years from now, but both will be struggling by then, since single thread advancements will continue to be important despite what leagues of AMD fan(boy)s would tell you. 3900x (or 3950x for that matter) will never get you better (while still objectively good enough) framerates than 9900k / 10700k though, of that I am completely certain. A fine example are old 16 core Opterons compared even with 8 core FX chips (that clocked much better), not to mention something like a 2600k.
The "only up to a point" part I definitely agree with.

The rest is more blurry, and it depends entirely on what direction the gaming industry will take from now on. At present, we are GPU bound in most games with high settings. Developers could take the approach of shifting more load towards the CPU and massively parallelize their code, or could choose to do the minimum necessary to see performance improvements. Since I don't have my crystal ball, I have no way to predict which will occur.
But technically, it is perfectly possible to code an application in such a way that it runs faster on a 16 core with 85% clock speed versus on a 10 core at 100% clock speed. It has already been done for many applications and it can be done for games too. When exactly this will happen is hard to predict, but it has to happen if we are to play open-world games at 400 fps in the future. Your example with Opterons vs. FX is flawed, because it is based on insufficiently parallelized applications.
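
To put rough numbers on the 16-core vs. 10-core claim, here is a minimal sketch. It assumes Amdahl's law as the scaling model, which is my assumption and not something established in this thread, and it ignores synchronization and memory effects entirely. The function names are made up for illustration.

```python
def relative_throughput(cores: int, clock: float, parallel_fraction: float) -> float:
    """Amdahl's-law speedup over one full-speed core, scaled by relative clock."""
    serial_fraction = 1.0 - parallel_fraction
    return clock / (serial_fraction + parallel_fraction / cores)


def break_even_fraction(cores_a: int, clock_a: float, cores_b: int, clock_b: float) -> float:
    """Parallel fraction at which chip A and chip B deliver equal throughput."""
    # Setting the two throughputs equal and solving for p gives
    # p = (clock_b - clock_a) / (clock_b * (1 - 1/cores_a) - clock_a * (1 - 1/cores_b))
    k_a = 1.0 - 1.0 / cores_a
    k_b = 1.0 - 1.0 / cores_b
    return (clock_b - clock_a) / (clock_b * k_a - clock_a * k_b)


# 16 cores at 85% clock vs. 10 cores at 100% clock (the figures from the post).
p_star = break_even_fraction(16, 0.85, 10, 1.00)
print(f"break-even parallel fraction: {p_star:.3f}")
for p in (0.50, 0.80, 0.95):
    wide = relative_throughput(16, 0.85, p)
    fast = relative_throughput(10, 1.00, p)
    print(f"p={p:.2f}: 16-core {wide:.2f}x, 10-core {fast:.2f}x")
```

Under this simplified model, the 16-core chip only pulls ahead once roughly 87% of the work runs in parallel, so the real question is how much of a game's frame can be parallelized that far.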

I am pretty sure I completely disagree with the last part. I am quite sure future improvements in processor performance will rely much more on core counts than on IPC, and clock frequencies will stagnate at best. As both Intel and AMD continue to shrink their nodes, we will have 64 cores in home computers in a few years, but we will never reach 6 GHz.
 
Joined
Jun 10, 2014
Messages
2,433 (0.96/day)
Any large workload can be parallelized, if it's big enough, it means it can be broken into pieces that can be dealt with separately.
Everything has a cost. If you divide the work into chunks that are too small, you'll end up with too much overhead. There are also dependencies which require things to be executed in sequence (like a pipeline), limiting how large a work chunk you can create before synchronizing. Another implication of this is mutation of shared data, and anyone thinking that throwing in mutexes everywhere will solve it is ignorant; you'll quickly end up with stalled threads and even deadlocks. Additionally, there is OS scheduling overhead, which can quickly add latency of 1 ms or more. That becomes significant when your entire workload lives within a window of a few ms and you're trying to sync up hundreds or thousands of times per frame, but it is not significant if you have giant work chunks taking several seconds or even minutes.
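
A back-of-the-envelope sketch of why the same synchronization cost matters for a frame but not for a batch job. The per-sync cost and the sync count below are illustrative assumptions, not measurements, and `sync_overhead_share` is a hypothetical helper.

```python
def sync_overhead_share(work_ms: float, syncs: int, cost_per_sync_us: float) -> float:
    """Fraction of total run time spent on synchronization rather than work."""
    overhead_ms = syncs * cost_per_sync_us / 1000.0
    return overhead_ms / (work_ms + overhead_ms)


# Illustrative, assumed numbers: 500 sync points at 5 microseconds each.
frame_share = sync_overhead_share(work_ms=5.0, syncs=500, cost_per_sync_us=5.0)
batch_share = sync_overhead_share(work_ms=60_000.0, syncs=500, cost_per_sync_us=5.0)

print(f"5 ms frame: {frame_share:.1%} of the time is sync overhead")
print(f"60 s batch: {batch_share:.4%} of the time is sync overhead")
```

The absolute overhead is identical in both cases; only the size of the work it is amortized over changes.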

Are you really trying to say that the PS5 will get by with doing most of the work on one 2GHz core?
What? I've never said anything like that.
I pointed out that we've had 8-core consoles for nearly 7 years now. Many predicted this would cause "fully multithreaded games" within "a couple of years" back when Xbox One and PS4 launched, just like you are doing now. But it didn't happen, because rendering doesn't work that way.

I think what you do not understand is the fact that having all the calls to the graphic API coming from a single thread doesn't equate at all to the fact that that thread is doing all the computing.
I understand fully, and I never claimed so either.
As I said, having multiple threads building a single queue makes no sense. Which is why the number of threads interfacing with the GPU is limited to the distinct tasks (rendering passes or compute workloads) which can be separated and potentially parallelized.
Additionally, the driver uses several threads on its side, but that's out of our control.
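
Roughly, the structure being described could be sketched like this: each distinct pass builds its own command list on a worker thread, and a single thread does the submission in order. This is a toy illustration only. `build_commands` and `render_frame` are hypothetical names, the strings stand in for real graphics API calls, and Python threads show the shape of the design rather than real performance.

```python
from concurrent.futures import ThreadPoolExecutor


def build_commands(pass_name, objects):
    """CPU-side work for one rendering pass; runs on a worker thread."""
    return [f"{pass_name}:draw({obj})" for obj in objects]


def render_frame(passes):
    """Build each pass's command list in parallel, then submit from one thread."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(build_commands, name, objs) for name, objs in passes]
        # Collect results in the order the passes were given, not completion order.
        command_lists = [future.result() for future in futures]

    submitted = []
    for commands in command_lists:
        # Stand-in for the single thread that talks to the graphics API.
        submitted.extend(commands)
    return submitted


frame = render_frame([("shadow", ["tree", "rock"]), ("main", ["tree"])])
print(frame)
```

Note that the parallelism is bounded by the number of separable passes, not by the number of cores.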

The rest is more blurry, and it depends entirely on what direction the gaming industry will take from now on. At present, we are GPU bound in most games with high settings.
Well, that's the goal.
Ideally, all games should be GPU bound. That way you get the graphics performance you paid for.

Developers could take the approach of shifting more load towards the CPU and massively parallelize their code, or could choose to do the minimum necessary to see performance improvements. Since I don't have my crystal ball, I have no way to predict which will occur.
This clearly illustrates that you don't understand how games work.
GPUs have specialized hardware to do all kinds of rendering tasks, like creating vertices, tessellation, rasterization, texture mapping, etc. While all of these can technically be emulated in software, the performance would be terrible.

Offloading GPU work to the CPU would be going backwards. And what would be the purpose? GPUs are massively powerful at dense math, while CPUs are better at logic. Games typically do a lot of the logic parts on the CPU while building a queue, then send batches of data to the GPU, which does what it does best.

I wouldn't advise anyone to base anything on their crystal balls, but rather to understand the field of expertise before attempting to make qualified predictions. But I can list some of the current trends:
* GPUs are becoming more flexible, and the rendering pipeline is increasingly programmable. Some games will take advantage of this, and thus become less CPU bottlenecked.
* Most games are using a few generalized game engines. These engines are increasingly bloated, and will to some extent counteract the improvements on the GPU side. Some of these may spawn a lot of threads, most of which do fairly little work at all. The ever-increasing bloat of these engines is also responsible for the lack of benefits from lower API overhead in DirectX 12.
* Resource streaming will be more utilized, but slowly.
* GPU accelerated audio will probably become common eventually.


But technically, it is perfectly possible to code an application in such a way that it runs faster on a 16 core with 85% clock speed versus on a 10 core at 100% clock speed. It has already been done for many applications and it can be done for games too.
I'm sorry, but this is where you go wrong. An arbitrary piece of code can't be parallelized any way you want, see my first paragraph.
A workload which mostly consists of large work chunks that can be processed mostly asynchronously can scale to almost any core count. The need for synchronization increases the overhead of each additional thread, so the more synchronization a workload needs, the faster its returns diminish with increasing thread count. Games are among the workloads on the "worst" end of this scale.
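
The diminishing return can be illustrated with a toy cost model. It assumes a fixed serial part, a perfectly divisible parallel part, and a synchronization cost that grows linearly with each extra thread; that linear term and all the numbers are assumptions chosen for illustration, not measured values.

```python
def frame_time_ms(threads, parallel_ms, serial_ms, sync_ms_per_thread):
    """Toy model: serial part + divided parallel part + per-thread sync cost."""
    return serial_ms + parallel_ms / threads + sync_ms_per_thread * (threads - 1)


def best_thread_count(max_threads, **workload):
    """Thread count (1..max_threads) that minimizes the modeled time."""
    return min(range(1, max_threads + 1), key=lambda n: frame_time_ms(n, **workload))


# Sync-heavy, game-like workload vs. one with almost no synchronization.
game_like = dict(parallel_ms=12.0, serial_ms=2.0, sync_ms_per_thread=0.2)
batch_like = dict(parallel_ms=12.0, serial_ms=2.0, sync_ms_per_thread=0.01)

print("game-like best thread count: ", best_thread_count(16, **game_like))
print("batch-like best thread count:", best_thread_count(16, **batch_like))
```

In this model the sync-heavy workload stops benefiting well before 16 threads, while the nearly sync-free one keeps improving all the way up.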

I am quite sure future improvements in processor performance will rely much more on core counts than on IPC, and clock frequencies will stagnate at best. As both Intel and AMD continue to shrink their nodes, we will have 64 cores in home computers in a few years, but we will never reach 6 GHz.
You're right about clock speeds stagnating (at least until we have different types of semiconductors), but the way forward is a balanced approach between more cores, wider superscalar execution and more SIMD. Many are forgetting that the performance per core is the base scaling factor for multithreaded performance. Since most non-server workloads do not scale linearly, having faster cores actually helps you scale to more cores and suffer less from OS overhead. The key here is to strike the right balance between core count and core speed for your workload.
 
Joined
May 15, 2020
Messages
511 (1.39/day)
Location
France
Revival of an old thread, but here's a new game that requires 8 cores as recommended settings. We'll see more and more of these, especially in open-world games:
 
Joined
Sep 17, 2014
Messages
14,842 (6.10/day)
Location
The Washing Machine
Processor i7 8700k 4.6Ghz @ 1.24V
Motherboard AsRock Fatal1ty K6 Z370
Cooling beQuiet! Dark Rock Pro 3
Memory 16GB Corsair Vengeance LPX 3200/C16
Video Card(s) MSI GTX 1080 Gaming X @ 2100/5500
Storage Samsung 850 EVO 1TB + Samsung 830 256GB + Crucial BX100 250GB + Toshiba 1TB HDD
Display(s) Gigabyte G34QWC (3440x1440)
Case Fractal Design Define C TG
Audio Device(s) Situational :)
Power Supply EVGA G2 750W
Mouse Logitech G502 Protheus Spectrum
Keyboard Lenovo Thinkpad Trackpoint II (Best K/B ever... <3)
Software W10 x64
Revival of an old thread, but here's a new game that requires 8 cores as recommended settings. We'll see more and more of these, especially in open-world games:

If a simple shooter requires 8 cores at 3.3 GHz, I seriously question these devs' sanity, capability and overall product. They say it's needed for thousands of actors. Hello, Total War wants a word?

It's also a nice way to identify shit console ports.
 
Joined
May 15, 2020
Messages
511 (1.39/day)
Location
France
If a simple shooter requires 8 cores at 3.3 GHz, I seriously question these devs' sanity, capability and overall product. They say it's needed for thousands of actors. Hello, Total War wants a word?
SS is a great single player shooter, can't wait to see the new one.
Total War is a great game too, but all those soldiers do not have individual AI; the AI is at unit level. The complexity is left only to the GPU for rendering. In SS all these monsters are trying to find you and come at you.
It's also a nice way to identify shit console ports.
I'm not that sure, all the previous versions were PC first. You can play on a 4 core, too, you'll just play at 720p/30fps :D
 
Joined
Sep 17, 2014
Messages
14,842 (6.10/day)
Location
The Washing Machine
I'll give you another one

[Attached image: 1600097769397.png]

Enter the Matrix ;) Of course it's obviously not fully dynamic. But I strongly doubt Serious Sam will do that with a supposedly infinite number of actors. That is why I'm saying... this does not have to be done on 8 core machines. And from the pov of being a capable game on most mainstream systems... I'd say it's optimistic.

I did play the early SS games, they weren't bad at all, but very straightforward. That is why, again, this req seems so questionable. This game ran on a toaster CPU.
 
Joined
May 15, 2020
Messages
511 (1.39/day)
Location
France
I don't know Enter the Matrix, so I can't comment on it, but I would imagine those agents are pretty dumb; otherwise, Neo dies :cool:. But if next-gen games can run on 10-year-old equipment, isn't that an indication that the developers aren't trying to give you more, with higher requirements? What's the point of having good, new equipment if it remains unused?
 
Joined
Jun 10, 2014
Messages
2,433 (0.96/day)
Revival of an old thread, but here's a new game that requires 8 cores as recommended settings. We'll see more and more of these, especially in open-world games:
Considering the requirements don't even bother listing what class of CPU, just the ambiguous "8 cores" and "3.3 GHz", they probably didn't put much thought into it. I'm pretty sure a 4/6 core Comet Lake would outperform a good old 8-core Bulldozer in this game.
 
Joined
May 15, 2020
Messages
511 (1.39/day)
Location
France
Considering the requirements don't even bother listing what class of CPU, just the ambiguous "8 cores" and "3.3 GHz", they probably didn't put much thought into it. I'm pretty sure a 4/6 core Comet Lake would outperform a good old 8-core Bulldozer in this game.
Bulldozer was not an 8 core, that was settled by a class-action a while ago... And given that the required GPUs are at most 2 generations old, that gives a decent ballpark for what "8 cores" means.
Anyway, I'm looking forward to seeing whether the required oomph will also translate to better graphics and gameplay, or whether it's just a lack of optimization.
 
Joined
Sep 17, 2014
Messages
14,842 (6.10/day)
Location
The Washing Machine
I don't know Enter the Matrix, so I can't comment on it, but I would imagine those agents are pretty dumb; otherwise, Neo dies :cool:. But if next-gen games can run on 10-year-old equipment, isn't that an indication that the developers aren't trying to give you more, with higher requirements? What's the point of having good, new equipment if it remains unused?

The point is 9 times out of 10 you really never needed the good new equipment. It's just a cost- or quality-cutting measure that you pay for. Optimization and writing great software is an art form. Not everyone is talented, and lots of software is being written. It's close to being a factory product that rolls off the line and into a box.

In many cases a lack of talent, time and/or optimization is solved by iterative development. You get a game, and a day one patch to make it work. You get a patch every other week. Etc.

Make no mistake: everything you see, up to and including specs like these, is just cold hard business, nothing else. New technology? Man, we had accurate reflections as early as Unreal 1, and given enough work on a rasterized approach we can already create scenes that rival ray traced content. Or are just ray traced content, baked in. It's 2020 and we're now thinking of automation. Why? Apparently there is an economic reality where it generates profit, or is likely to do so.

NPCs and AI are of a similar nature. The groundwork is decades old and still being iterated on. If they just took that and made it 'a lot bigger', then it's easy to arrive at an 8 core requirement like this. You said it right, Total War found a trick around it. Enter the Matrix does something similar: at any given time the game picks the 4-5 actors surrounding Neo and makes them 'active', while the rest dance around them, creating an illusion of density. Yes, you see through it. And I guarantee you... even in SS4 with its fabulous system you will see through it. None of this is new. Dying Light for example... how many zombies exactly? Exactly. And again... that game is not CPU intensive.
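
The "only a handful of actors are really active" trick could look something like this in outline. This is a hypothetical sketch, not how any of the games mentioned actually implement it; the actor dictionaries, `split_actors`, and the cutoff of 5 are all made up for illustration.

```python
import heapq
import math


def split_actors(actors, player_pos, active_count=5):
    """Return (active, background): only the nearest few actors get full AI."""
    def distance(actor):
        return math.dist(actor["pos"], player_pos)

    # nsmallest avoids sorting the whole crowd just to find the closest few.
    active = heapq.nsmallest(active_count, actors, key=distance)
    active_ids = {actor["id"] for actor in active}
    background = [actor for actor in actors if actor["id"] not in active_ids]
    return active, background


# 1000 actors in a line; the expensive per-actor AI would run for 5 of them only.
crowd = [{"id": i, "pos": (float(i), 0.0)} for i in range(1000)]
active, background = split_actors(crowd, player_pos=(0.0, 0.0))
print([actor["id"] for actor in active], len(background))
```

The expensive decision logic then scales with the cutoff rather than with the size of the crowd, which is why such scenes need not be CPU intensive.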

Another example... that Vulkan / Mantle demo, what was it called? It did NOT melt CPUs. With tens of thousands of actors.
 
Joined
Jun 10, 2014
Messages
2,433 (0.96/day)
Bulldozer was not an 8 core, that was settled by a class-action a while ago...
That's not how lawsuits work. AMD settled because a settlement is cheaper than the alternative, not because the claim was correct.
Bulldozer was an 8-core design, there is no doubt about that, albeit with major "shortcomings" in the design.

And given that the required GPUs are at most 2 generations old, that gives a decent ballpark for what "8 cores" means.
Anyway, I'm looking forward to seeing whether the required oomph will also translate to better graphics and gameplay, or whether it's just a lack of optimization.
There is usually no relation between the age of the CPU and the GPU in the recommendations. A 7 year old Haswell, or even older CPUs can still be more relevant than GPUs of a similar age, not to mention being able to compete with brand new AMD CPUs.

People are generally putting way too much thought into these recommendations. Usually they are derived from the test systems which were primarily used during development, and can sometimes be very optimistic or conservative. Look at reviews if you want to see the reality, or gamble and buy the game yourself.
 
Joined
May 15, 2020
Messages
511 (1.39/day)
Location
France
Bulldozer was an 8-core design, there is no doubt about that, albeit with major "shortcomings" in the design.
The shortcoming was that there were only 4 FP units, so for many workloads it effectively became a 4 core. It is accurate to describe it as a 4 core with 2 integer units and one FP unit per core, because I'm pretty sure they didn't make cores with 1 integer unit and half an FP unit.
There is usually no relation between the age of the CPU and the GPU in the recommendations. A 7 year old Haswell, or even older CPUs can still be more relevant than GPUs of a similar age, not to mention being able to compete with brand new AMD CPUs.
I would say it is quite the contrary, and what you are quoting is the exception: the epic Intel stagnation of the last decade.
People are generally putting way too much thought into these recommendations. Usually they are derived from the test systems which were primarily used during development, and can sometimes be very optimistic or conservative. Look at reviews if you want to see the reality, or gamble and buy the game yourself.
I'm not a game developer, but judging by the way you and a couple of other guys describe them around the forum, they must be a bunch of lazy ignoramuses who aren't even capable of monitoring their thread/core utilization in the Windows Task Manager during their testing sessions...
I will probably buy it though; we'll see what that legion mode is all about.
 
Joined
Jul 10, 2010
Messages
1,216 (0.31/day)
Location
USA, Arizona
System Name SolarwindMobile
Processor AMD FX-9800P RADEON R7, 12 COMPUTE CORES 4C+8G
Motherboard Acer Wasp_BR
Cooling It's Copper.
Memory 2 x 8GB SK Hynix/HMA41GS6AFR8N-TF
Video Card(s) ATI/AMD Radeon R7 Series (Bristol Ridge FP4) [ACER]
Storage TOSHIBA MQ01ABD100 1TB + KINGSTON RBU-SNS8152S3128GG2 128 GB
Display(s) ViewSonic XG2401 SERIES
Case Acer Aspire E5-553G
Audio Device(s) Realtek ALC255
Power Supply PANASONIC AS16A5K
Mouse SteelSeries Rival
Keyboard Ducky Channel Shine 3
Software Windows 10 Home 64-bit (Version 1607, Build 14393.969)
The shortcoming was that there were only 4 FP units, so for many workloads it effectively became a 4 core. It is accurate to describe it as a 4 core with 2 integer units and one FP unit per core, because I'm pretty sure they didn't make cores with 1 integer unit and half an FP unit.
FPUs aren't part of the core.

K7 doesn't have a FPU in the core.
K8 doesn't have a FPU in the core.
Greyhound doesn't have a FPU in the core.
Husky doesn't have a FPU in the core.
Bobcat doesn't have a FPU in the core.
Jaguar doesn't have a FPU in the core.
Zen doesn't have a FPU in the core.

The only modern design from AMD to have a FPU inside the core is this one:
[Attached image: d30641.gif]

Single control unit, single instruction bus, single data bus, single superscalar datapath => one core.

AMD's Orochi design is more accurately described as four processors with two cores each, as per the architectural definition used since before the 90s.

Retire unit (C0) & Retire unit (C1) => Two control units
Scheduler (C0) & Scheduler (C1) => Two instruction buses
Datapath (C0) & Datapath (C1) => Two datapaths
Load/Store (C0) & Load/Store (C1) => Two data buses
A Bulldozer processor is a dual-core design.

The general marketing consensus is that with one core per processor, you just call the processor a core. In this case, AMD had two cores in a processor, and thus it is a dual-core unit.

Imagine reading a "technical" document... where ___ core contains core. When previous documents have... where ___ processor contains/builds on processor core/core.
 