First Signs of AMD Zen 3 "Vermeer" CPUs Surface, Ryzen 7 5800X Tested

Freaky_Snuke · Sep 29, 2020

*RDNA 1 and Zen 3 naming scheme consumer confusion incoming*
But by the time those CPUs come out, RDNA 1 is gonna be mostly irrelevant anyway.

If that turns out to be true, an all-AMD, truly high-end gaming rig might become a reality by the end of this year, for the first time in, like... decades.

Turmania · Sep 29, 2020

Just waiting for an 8-core, 16-thread CPU that can boost to a minimum of 5 GHz on all cores and stay there on air cooling without overclocking. I would prefer it to be AMD, since they have the per-clock performance advantage. Till then, I see no point in upgrading, at least for my main system.

Rahnak · Sep 29, 2020

Steevo said:
I remember the debate, as no one uses 720 unless we are comparing low power CPUs in tablets.

So feel free to use a resolution that's unused for a comparison if you feel better about it. But it's like comparing which jet fighter is better at being a submarine. Or which sports car does best off-road. Or which network switch makes the best cricket bat.

If you remember the debate, then surely you remember the reason why tests at lower resolutions are relevant. And again, I said 1080p and 1440p.

Steevo said:
Again, you addressed one point, that is meaningless. Any thoughts on core counts, memory latency, power consumption? Typically AMD gets you more overall performance for the dollar.

I addressed the only point relevant in the article. It's a gaming benchmark. At 4K. Which says very little of CPU performance and that was my point. It doesn't say anything about Zen 3's memory latency, power consumption or prices, so I have no opinion on those.

ZoneDymo · Sep 29, 2020

Freaky_Snuke said:
*RDNA 1 and Zen 3 naming scheme consumer confusion incoming*
But by the time those CPUs come out, RDNA 1 is gonna be mostly irrelevant anyway.

If that turns out to be true, an all-AMD, truly high-end gaming rig might become a reality by the end of this year, for the first time in, like... decades.

Well, in fairness, an all-Intel or all-Nvidia truly high-end gaming rig has never been a reality.

RandallFlagg · Sep 29, 2020

Rahnak said:
The fact that the benchmark was run at 4K rather than 1440p or 1080p is a little suspicious. And the fact that while having much higher cpu frames, it was still marginally behind in actual framerate in 2 out of 3.

That - why run a game at 4k as a CPU test as that gets GPU limited - and also that Ashes was developed in partnership with AMD, and originally ran much better on R9 290s than on any of that generation's Nvidia cards.

Then you have that the 10900k only lost in synthetic 'cpu framerate', it won in 2 out of 3 on actual framerate (which is what you'd actually see)...

This really looks more like a planned marketing stunt than an objective benchmark to me. We will know in a few weeks either way.

InVasMani · Sep 29, 2020

DemonicRyzen666 said:
monolithic die

Apparent AMD Ryzen 5 4650G benchmarks put it close to the 3600

It looks like the 4650G won't quite be a 'Ryzen 5 3600 killer', but it should still offer great value

www.pcgamesn.com

Ryzen 7 Pro 4750G Review: Renoir Ushers in a New Era for 7nm Desktop APUs

Zen 2, Meet 7nm Vega

www.tomshardware.com

Lower latency

AMD Ryzen 7 4700GE Memory Benchmarked: Extremely Low Latency Explains Tiny L3 Caches

AMD's 7 nm "Renoir" APU silicon, which features eight "Zen 2" CPU cores, has only a quarter of the L3 cache of the 8-core "Zen 2" CCD used in "Matisse," "Rome," and "Castle Peak" processors, with each of its two quad-core compute complexes (CCXs) featuring just 4 MB of it (compared to 16 MB per...

www.techpowerup.com

None of it seems to do anything for Ryzen.

Yeah, it's an odd discrepancy at first glance, 16GB vs 32GB. It would seem that TUM_APISAK might've chosen that comparison to show AMD's performance with higher-density modules in play, not only to highlight the higher performance of the AMD chip but also to hint at memory latency playing a role in it. The highest-density RAM modules often require looser timings, which could be what is being represented here. If the performance advantage of the new Ryzen chip portrayed here is coming from the larger RAM capacity, that would be the worst-case scenario and a bit unlikely, but with a limited number of benchmarks to compare for both chips paired with that GPU, it could perhaps be the case. This could simply be the closest comparison available to the leaker at present; tough to say.

RandallFlagg said:
That - why run a game at 4k as a CPU test as that gets GPU limited - and also that Ashes was developed in partnership with AMD, and originally ran much better on R9 290s than on any of that generation's Nvidia cards.

Then you have that the 10900k only lost in synthetic 'cpu framerate', it won in 2 out of 3 on actual framerate (which is what you'd actually see)...

This really looks more like a planned marketing stunt than an objective benchmark to me. We will know in a few weeks either way.

My take on it is that 4K is actually more CPU-computational than 1080p, but it's harder and less exciting to benchmark and account for. It would be interesting, perhaps, to set a 30FPS/45FPS/60FPS GPU limit, do some PhysX testing assigned to the CPU from 1080p up through 8K, and see what the scaling ends up like and whether it's linear or non-linear. I don't see how it could be linear; it seems it would vary and fluctuate a lot depending on the type of scene. It would be rather insightful to see which parts of the CPU design present more bottlenecks for PhysX as well. Seeing just how much multi-core performance impacts PhysX would be cool too; that might show an upside to AMD's design if heavy use of PhysX can be exploited by developers. If there are advantages to the multi-core approach for stuff like PhysX, it just goes to show that AMD's approach should only continue to blossom in those areas moving forward, especially since Intel has followed suit in order to try to keep pace. If anything, that's a clear indicator that Intel knows the vital importance of the multi-core design approach, and that if they had simply stuck with quad cores they'd already be left in the dust. In fact, I want to see how Intel's chips perform limited to 4c/8t versus AMD's latest Ryzen chips; let's just see where Intel would be today if they hadn't grudgingly glued sh*t together at 14nm+++++++++++++++ because of AMD.

Bansaku · Sep 29, 2020

CmdrLaw said:
Watching this intently.

Did a build for a friend recently, they wanted to go Intel and the 10850K OCing very easily @5.3 All core on 10 cores was a hell of an incentive to switch back to blue.

Does your friend pay their own power bill? Because at that clock speed the CPU is pulling well over 300W! And at that speed, what does it REALLY do for their gaming experience? Gaming at 1440p/4K 60Hz, I saw little to no performance difference between my old i7 3770K and my new 3700X, despite 4x the benchmark scores.

arbiter · Sep 30, 2020

InVasMani said:
Yeah, it's an odd discrepancy at first glance, 16GB vs 32GB. It would seem that TUM_APISAK might've chosen that comparison to show AMD's performance with higher-density modules in play, not only to highlight the higher performance of the AMD chip but also to hint at memory latency playing a role in it. The highest-density RAM modules often require looser timings, which could be what is being represented here. If the performance advantage of the new Ryzen chip portrayed here is coming from the larger RAM capacity, that would be the worst-case scenario and a bit unlikely, but with a limited number of benchmarks to compare for both chips paired with that GPU, it could perhaps be the case. This could simply be the closest comparison available to the leaker at present; tough to say.

That assumes they wouldn't use the most expensive RAM for their side and the cheapest brand for the other. AMD has historically pulled shenanigans with their benchmark releases, so I would say this isn't outside the realm of possibility. The benchmark doesn't tell us what timings were used or what MHz the RAM is running at.

InVasMani · Sep 30, 2020

arbiter said:
That assumes they wouldn't use the most expensive RAM for their side and the cheapest brand for the other. AMD has historically pulled shenanigans with their benchmark releases, so I would say this isn't outside the realm of possibility. The benchmark doesn't tell us what timings were used or what MHz the RAM is running at.

It's an unofficial benchmark comparison; it really doesn't matter at this point, and pricing for both could change at any point between now and launch. I get what you're alluding to, and yeah, obviously memory latency and density can skew perceptions, and AMD has pulled shenanigans, as have Intel and Nvidia. It's a common industry trend; they all do it. Wait till things are verified and the dust settles. I'm sure I'll be satisfied with Zen 3, to be honest; it certainly can't be any worse than Zen 2, which itself isn't bad.

Deleted member 185088 · Sep 30, 2020

efikkan said:
Just because a piece of software runs better on one CPU doesn't mean it's optimized for it; it could be that the hardware just handles the workload better due to resource balancing and advantages of that architecture, advantages which usually are hard or impossible to exploit directly from software.

You can target the strengths of one architecture and the program will run faster on it.
This guy made different workloads and ran them on a Phenom and an 8th-gen i7; even though the Phenom is so old, it's still faster in some:

I find it hard to believe that game engines don't do that at least to some extent.

RandallFlagg · Sep 30, 2020

Xex360 said:
You can target the strengths of one architecture and the program will run faster on it.
This guy made different workloads and ran them on a Phenom and an 8th-gen i7; even though the Phenom is so old, it's still faster in some:

I find it hard to believe that game engines don't do that at least to some extent.

Yep, and it's also possible to do the reverse - design hardware to run specific instructions or even a specific sequence of instructions very quickly. You could target your CPU at a use case where you have multiple threads doing the exact same thing to different parts of a large data set, where said threads don't need to interact with each other's data much.

For example, Cinebench.

arbiter · Sep 30, 2020

RandallFlagg said:
Yep, and it's also possible to do the reverse - design hardware to run specific instructions or even a specific sequence of instructions very quickly. You could target your CPU to a use case where you have multiple threads doing the exact same thing to different parts of a large data set where said threads did not need to interact with each others data set much.

For example, Cinebench.

The game they used had direct AMD funding for a lot of it, and when you look at player charts, that game only averages 60-70 players, so a game no one plays is really not a good metric to use. As the other guy said, you can code things for a certain CPU and get great results. Apple used to do the same thing back when they used PowerPC processors, to make them look better than x86 PCs.

DemonicRyzen666 · Sep 30, 2020

wait what ? it's fake lol

AMD Ryzen 7 5800X Zen 3 Benchmark Leak Shows Big IPC Gains

An upcoming AMD Zen 3 CPU has made a cameo in Ashes of the Singularity with very interesting results.

hothardware.com

Does anyone else think that single-core score is FAR too low?

RandallFlagg · Sep 30, 2020

arbiter said:
The game they used had direct AMD funding for a lot of it, and when you look at player charts, that game only averages 60-70 players, so a game no one plays is really not a good metric to use. As the other guy said, you can code things for a certain CPU and get great results. Apple used to do the same thing back when they used PowerPC processors, to make them look better than x86 PCs.

I know, I agree 100%. For people who know the history of Ashes (I was one of the pre-release buyers) it is one of the most suspect benchmarks. What was particularly embarrassing for AMD in regards to Ashes was how despite their partnership in creating the game and the use of the AMD Vulkan API, when Pascal (10xx series) came out they got obliterated in Ashes anyway.

Looking beyond the surface and clickbait article titles of this "leak" - if Zen 3 is so good, why is an AMD co-sponsored title being used at 4k for pre-release hype and still losing in actual FPS 2/3 of the time? And why are both recent leaks - one on 5700U a week ago and now this one on 5800X - for that *same* AMD sponsored title which very few play regularly? Why not use something a bit more mainstream at settings that don't go GPU limited? Hmmm.....

InVasMani · Sep 30, 2020

Parallel single threading seems entirely plausible: phase the clock skew peaks and dips on two chips and synchronize the oscillation, switching between one and the other. In theory you should get a 100% increase in performance with two chips like that, but clock skew frequency oscillation is in constant motion, so you move from peaks to dips; with the switching in mind, to maximize both you end up at 50% in the best-case scenario, and since the synchronizing and sequencing might not be 100% perfect it could be closer to 48%. I don't know if they can execute it perfectly in practice, but in theory it's definitely within the scope of possibilities. You can actually mimic that with a pair of music sequencers; it's functionally possible.

I mentioned the concept in the Intel bigLITTLE TPU thread not that far back: in theory, you can manipulate clock skew or duty cycles in a clever manner to get more performance, in a similar fashion to what MOS Technology did with the SID chip, where arpeggios were used to simulate playing chords with polyphony. It was a clever hardware trick at the time. It seems far-fetched and somewhat unimaginable to actually be applied, but innovation always does; you have to think outside the box or you'll always be stuck in a box.

This is a quadruple LFO; what is allegedly being done is a twin LFO. If you look at the intersection points, that's half a duty cycle of rising and falling voltages/frequencies. If you look at the blue and green, or the yellow and purple, they intersect perfectly. What's being done is switching at the intersection, so you've got two peaks closer together, and the base of the mountain, so to speak, isn't as far down. That's assuming this is in fact being done and put into practice by AMD. I see it within the oscilloscope of possibilities for certain. That's basically what DDR memory did in practice. The big question is whether they can pull it off within the dynamic complexity of software. Then again, why can't they!? I can't see why they can't divert it like a railroad track at that intersection point. That nets you a roughly 50% performance gain; with 4 chips the valley dips would be reduced more, the peaks would happen more routinely, and you'd end up with 100% more performance. I think that's what DDR5 is supposed to do, actually, on the data rate, hence the phrase quad data rate.

Thinking about it further, I really don't see a problem with the I/O die managing that type of load switching quickly in real time, and the data would already be present in the CPU memory; it's not like it gets flushed instantly. Yeah, maybe it could become a bit of a materialized reality. If not now, certainly later. I have to think AMD will incorporate an I/O die for the GPU soon as well if they want to pursue multi-chip GPUs.

PanicLake · Sep 30, 2020

arbiter said:
The game they used had direct AMD funding for a lot of it, and when you look at player charts, that game only averages 60-70 players, so a game no one plays is really not a good metric to use. As the other guy said, you can code things for a certain CPU and get great results. Apple used to do the same thing back when they used PowerPC processors, to make them look better than x86 PCs.

Yes but if you compare the 5800X with the "old" 3800X, it is still a big improvement...

Crazy 4K Batch	Ryzen 7 5800X	Ryzen 7 3800X	Core i9-10900K
Normal	167fps	125fps	136fps
Medium	135fps	111fps	119fps
Heavy	110fps	87fps	96fps
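
For anyone who wants to put a number on "big improvement", here is a minimal sketch of the arithmetic. The function name `uplift_percent` is made up for illustration, and the inputs are the leaked figures from the table above, not independently measured results.

```cpp
#include <cassert>
#include <cmath>

// Relative uplift of `newer` over `older`, in percent.
// Hypothetical helper; feed it the per-batch figures from the table.
double uplift_percent(double newer, double older) {
    assert(older > 0.0);  // avoid dividing by zero
    return (newer / older - 1.0) * 100.0;
}
```

With the table's figures, `uplift_percent(167, 125)` gives about 33.6 for the Normal batch (5800X over 3800X), `uplift_percent(135, 111)` about 21.6 for Medium, and `uplift_percent(110, 87)` about 26.4 for Heavy.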

RandallFlagg · Sep 30, 2020

PanicLake said:
Yes but if you compare the 5800X with the "old" 3800X, it is still a big improvement...

Crazy 4K Batch	Ryzen 7 5800X	Ryzen 7 3800X	Core i9-10900K
Normal	167fps	125fps	136fps
Medium	135fps	111fps	119fps
Heavy	110fps	87fps	96fps

That should be "CPU Framerate" not "FPS".

If this were a car, what you are doing would be like calculating 0-60 time based on engine HP and car weight, while ignoring *actual* 0-60 time. No one does that. In real FPS Ryzen 3800X loses all 3 and 5800X loses 2 out of 3.

arbiter · Sep 30, 2020

RandallFlagg said:
That should be "CPU Framerate" not "FPS".

If this were a car, what you are doing would be like calculating 0-60 time based on engine HP and car weight, while ignoring *actual* 0-60 time. No one does that. In real FPS Ryzen 3800X loses all 3 and 5800X loses 2 out of 3.

Even if all those numbers say AMD is faster, look at "avg (all batches)": Intel still wins the CPU frame rate and has a score of 5900 vs. AMD's 5800, IN a benchmark that is known to favor AMD. So to me those numbers in general mean NOTHING. They need to use a benchmark that isn't slanted instead of a game that is pretty much a glorified tech demo for their hardware.

DemonicRyzen666 · Sep 30, 2020

RandallFlagg said:
That should be "CPU Framerate" not "FPS".

If this were a car, what you are doing would be like calculating 0-60 time based on engine HP and car weight, while ignoring *actual* 0-60 time. No one does that. In real FPS Ryzen 3800X loses all 3 and 5800X loses 2 out of 3.

They already do that for cars when they build them; it's called estimated 0-60 times, and they do it with computer simulations.

Hell, some cars are so fast they don't even do 0-60 mph anymore; they do 0-100 mph.

arbiter said:
Even if all those numbers say AMD is faster, look at "avg (all batches)": Intel still wins the CPU frame rate and has a score of 5900 vs. AMD's 5800, IN a benchmark that is known to favor AMD. So to me those numbers in general mean NOTHING. They need to use a benchmark that isn't slanted instead of a game that is pretty much a glorified tech demo for their hardware.

How is it AMD-glorified if Intel is winning?

Where do you see 5900X? This is a 5800X, 8 cores/16 threads, vs. 10 cores/20 threads.

The game is supposed to be really good at using multiple threads; it even shows the Threadripper 3960X is quite good at it.

efikkan · Sep 30, 2020

Xex360 said:
You can target the strengths of one architecture and the program will run faster on it.
This guy made different workloads and ran them on a Phenom and an 8th-gen i7; even though the Phenom is so old, it's still faster in some:

I find it hard to believe that game engines don't do that at least to some extent.

Sure, individual instructions can be slightly faster or slower on various architectures. In my tests, I've seen some cases where Haswell is slower than Sandy Bridge, but in most cases it's faster. The problem here is that this is a benchmark of a single operation in a loop; it's a synthetic test case which will exaggerate the real-world difference. The reason he runs the loop 1.000.000.000.000 times is to get a measurable difference. Also, it's not like these operations are different alternatives for solving the same problem. It's not unlikely that you can find older architectures which can do certain simple operations like this faster, while modern architectures are optimized for saturating several execution ports and doing a mix of various types of operations. This is why such benchmarks can be very misleading.

When doing real optimization of code, it's common to benchmark whole algorithms or larger pieces of code to see the real world difference of different approaches. It's very rare that you'll find a larger piece of code that performs much better on Skylake and a competing alternative which performs much better on let's say Zen 2. Any difference that you'll find for single instructions will be less important than the overall improvements of the architecture. And it's not like there will be an "Intel optimization", Intel has changed the resource balancing for every new architecture, so has AMD.

Interestingly the sample code in that video scales poorly with many cores, but should be able to scale nearly linearly if the work queue is implemented smarter.
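
One common way to get closer to linear scaling is to hand out work in chunks from a shared atomic counter instead of synchronizing once per item. The following is only a sketch under assumptions: `work_item` is a made-up stand-in for the per-iteration work (the code from the video is not reproduced here), and `run_chunked`, `chunk` and `workers` are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-item work; a placeholder, not the workload from the video.
inline std::uint64_t work_item(std::uint64_t i) { return (i * i) % 7; }

// Each worker repeatedly claims a block of `chunk` consecutive items with a
// single atomic fetch_add, then processes that block without touching any
// shared state. Larger chunks mean less contention on the counter.
std::uint64_t run_chunked(std::uint64_t total, std::uint64_t chunk, unsigned workers) {
    assert(chunk > 0 && workers > 0);
    std::atomic<std::uint64_t> next{0};
    std::vector<std::uint64_t> partial(workers, 0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            std::uint64_t local = 0;  // thread-private accumulator
            for (;;) {
                const std::uint64_t begin = next.fetch_add(chunk);
                if (begin >= total) break;  // nothing left to claim
                const std::uint64_t end = std::min(begin + chunk, total);
                for (std::uint64_t i = begin; i < end; ++i) local += work_item(i);
            }
            partial[w] = local;  // one write per worker, after all work is done
        });
    }
    for (auto& t : pool) t.join();
    std::uint64_t sum = 0;
    for (std::uint64_t p : partial) sum += p;
    return sum;
}
```

The chunk size is the tuning knob: too small and the threads fight over the counter, too large and the last few chunks leave cores idle. How close this gets to linear depends on the actual workload.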

InVasMani said:
Parallel single threading seems entirely plausible phase the clock skew peaks and dips on two chips and synchronize oscillation switching between one and the other. <snip>

Instruction level parallelism is already heavily used, there is no need to spread the ALUs, FPUs, etc. across several cores, the distance would make a synchronization nightmare. We should expect future architectures to continue to scale their superscalar abilities. But I don't doubt that someone will find a clever way to utilize "idle transistors" in some of these by manipulating clock cycles etc.

The problem with superscalar scaling is keeping execution units fed. Both Intel and AMD currently have four integer pipelines. Integer pipelines are cheap (both in transistors and power usage), so why not double or quadruple them? Because they would struggle to utilize them properly. Both of them have been increasing instruction windows with every generation to try to exploit more parallelism, and Intel's next-gen Sapphire Rapids/Golden Cove allegedly features a massive 800-entry instruction window (Skylake has 224, Sunny Cove 352 for comparison). And even with these massive CPU front-ends, execution units are generally under-utilized due to branch mispredictions and cache misses. Sooner or later the ISA needs to improve to help the CPU, which should be theoretically possible, as the compiler has much more context than is passed on through the x86 ISA, as well as eliminating more branching.
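
To illustrate what "keeping execution units fed" means at the source level, here is a small sketch contrasting one long dependency chain with several independent ones. The function names are made up for illustration, and whether the second form is actually faster depends on the compiler (which may unroll or vectorize either loop on its own) and on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: every addition depends on the result of the previous one,
// so the adds form a single serial dependency chain.
std::uint64_t sum_chained(const std::vector<std::uint64_t>& v) {
    std::uint64_t s = 0;
    for (std::size_t i = 0; i < v.size(); i++) s += v[i];
    return s;
}

// Four accumulators: the four additions in each iteration do not depend on
// each other, so an out-of-order core is free to issue them in parallel.
std::uint64_t sum_split(const std::vector<std::uint64_t>& v) {
    std::uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < v.size(); i++) s0 += v[i];  // leftover elements
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result; the only difference is how much independent work is exposed to the core per iteration.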

dragontamer5788 · Sep 30, 2020

efikkan said:
Sooner or later the ISA needs to improve to help the CPU, which should be theoretically possible, as the compiler has much more context than is passed on through the x86 ISA, as well as eliminating more branching.

I'm not sure how much a compiler can help:

Code:

if(blah()){
    foo();
} else {
    bar();
}

The above is the easy case. There's lots of pattern matching and heuristics that help the pipelines figure out if foo() needs to be shoved into the pipelines, or if bar() needs to be shoved into the pipelines (while calculating blah() in parallel).

Now consider the following instead:

Code:

for(int i=0; i<array.size(); i++){
    array[i]->virtualFunctionCall();
}

You simply can't "branch predict" the virtualFunctionCall() much better than what we're doing today. Today, there are ~4 or 5 histories stored into the Branch Target Buffer (BTB), so the most common 3 or 4 classes will have their virtualFunctionCall() successfully branch-predicted without much issue. There are also 3 levels of branch predictor pattern-matchers running in parallel, giving the CPU three different branch targets (L1 branch predictor is fastest but least accurate. L3 branch predictor is most accurate but almost the slowest: only slightly faster than a mispredicted branch).

This demonstrates the superiority of runtime information (if there's only 2 or 3 classes in the array[], the CPU will branch predict the virtualFunctionCall() pretty well). The compiler cannot make any assumptions about the contents of array.
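
A compilable version of the loop above, as a sketch: the classes `Base`, `Foo` and `Bar` and the helpers `call_all` and `make_mixed` are made up for illustration. The point is only that the call target is decided by what happens to be in the array at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual int virtualFunctionCall() const = 0;
};
struct Foo : Base { int virtualFunctionCall() const override { return 1; } };
struct Bar : Base { int virtualFunctionCall() const override { return 2; } };

// The compiler only sees Base pointers here. Which override runs is looked
// up through the vtable on every iteration, so the branch target is runtime
// information.
int call_all(const std::vector<std::unique_ptr<Base>>& array) {
    int total = 0;
    for (std::size_t i = 0; i < array.size(); i++) {
        total += array[i]->virtualFunctionCall();
    }
    return total;
}

// Hypothetical helper: fills an array with an alternating Foo/Bar pattern.
std::vector<std::unique_ptr<Base>> make_mixed(std::size_t n) {
    std::vector<std::unique_ptr<Base>> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; i++) {
        if (i % 2 == 0) v.push_back(std::make_unique<Foo>());
        else v.push_back(std::make_unique<Bar>());
    }
    return v;
}
```

With only two concrete classes in the array, this is the case where the predictor has a good chance; the more distinct types that end up in `array`, the more targets it has to track.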

---------

By the way: most "small branches" are compiled into CMOV sequences on x86, no branch at all.

--------------

The only things being done grossly different seem to be the GPU architectures, which favor no branch prediction at all, and instead just focus on wider-and-wider SMT to fill their pipelines (and non-uniform branches are very, very inefficient because of thread divergence. Uniform branches are efficient on both CPUs and GPUs, because CPUs will branch-predict a uniform branch while GPUs will not have any divergence). Throughput vs Latency strikes again: GPUs can optimize throughput but CPUs must optimize latency to be competitive.

efikkan · Sep 30, 2020

dragontamer5788 said:
You simply can't "branch predict" the virtualFunctionCall() much better than what we're doing today.

Of course not, you will never be able to do that, that's not what I meant.
I was thinking of branching logic inside a single scope, like a lot of ifs in a loop. Compilers already turn some of these into branchless alternatives, but I'm sure there is more potential here, especially if the ISA could express dependencies so the CPU could do things out of order more efficiently and hopefully some day limit the stalls in the CPU. As you know, with ever more superscalar CPUs, the relative cost of a cache miss or branch misprediction is growing.
Ideally code should be free of unnecessary branching, and there are a lot of clever tricks with and without AVX, which I believe we have discussed previously.
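
As a small illustration of an "if in a loop" and a branchless alternative, here is a sketch with made-up function names. An optimizing compiler may well turn the first form into the second by itself, so treat this as showing the idea rather than a guaranteed speedup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Branchy form: written with a conditional inside the loop body.
int count_above_branchy(const std::vector<int>& v, int limit) {
    int n = 0;
    for (std::size_t i = 0; i < v.size(); i++) {
        if (v[i] > limit) n++;
    }
    return n;
}

// Branchless form: the comparison yields 0 or 1, which is added directly,
// so the loop body contains no data-dependent control flow at source level.
int count_above_branchless(const std::vector<int>& v, int limit) {
    int n = 0;
    for (std::size_t i = 0; i < v.size(); i++) {
        n += static_cast<int>(v[i] > limit);
    }
    return n;
}
```

Both return the same count; the branchless form trades a possible misprediction for an unconditional add on every element.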

But about your virtual function calls. If your critical path is filled with virtual function calls and multiple levels of inheritance, you're pretty much screwed performance wise, no compiler will be able to untangle this at compile time. And in most cases (at least how most programmers use OOP), these function calls can't be statically analysed, inlined or dereferenced at compile time.

arbiter · Sep 30, 2020

DemonicRyzen666 said:
How is it AMD-glorified if Intel is winning?

Look up the history of the game; it was funded by AMD. That means it will overperform on AMD hardware vs. what would happen in other games that aren't coded for one side.

DemonicRyzen666 said:
Where do you see 5900X? This is a 5800X, 8 cores/16 threads, vs. 10 cores/20 threads.

Read what I said; I never said 5900X. Go back to the OP images, where it shows the two CPUs on the right side with a summary. There are two Score numbers: the Intel CPU scored 5900 points and the AMD CPU scored 5800. How could AMD win with higher fps but a lower score?

DemonicRyzen666 · Oct 1, 2020

@ arbiter Oh, I missed that, because everyone was comparing CPU frame rates.

@ efikkan I kept hearing him talk about switching in that video. I remember something about that being why AMD multi-threading always ended up feeling more responsive than Intel. It was something about hitting Alt+Tab in Windows while gaming; it just seems to be quicker at odd stuff like that.

@dragontamer5788 There are some benches that show there is some bottleneck with Zen 2. Everyone says it's Infinity Fabric. The best way to get around the Infinity Fabric bottleneck would be to add another link, if it's only one link, because sometimes you get that lowly 3300X landing in between things like the 3900X and 3950X. We know that is usually because it's a single CCX. Then again, if the 3900X is ahead, that would put it down to it having a larger cache-to-core ratio.

InVasMani · Oct 1, 2020

efikkan said:
Sure, individual instructions can be slightly faster or slower on various architectures. In my tests, I've seen some cases where Haswell is slower than Sandy Bridge, but in most cases it's faster. The problem here is that this is a benchmark of a single operation in a loop; it's a synthetic test case which will exaggerate the real-world difference. The reason he runs the loop 1.000.000.000.000 times is to get a measurable difference. Also, it's not like these operations are different alternatives for solving the same problem. It's not unlikely that you can find older architectures which can do certain simple operations like this faster, while modern architectures are optimized for saturating several execution ports and doing a mix of various types of operations. This is why such benchmarks can be very misleading.

When doing real optimization of code, it's common to benchmark whole algorithms or larger pieces of code to see the real world difference of different approaches. It's very rare that you'll find a larger piece of code that performs much better on Skylake and a competing alternative which performs much better on let's say Zen 2. Any difference that you'll find for single instructions will be less important than the overall improvements of the architecture. And it's not like there will be an "Intel optimization", Intel has changed the resource balancing for every new architecture, so has AMD.

Interestingly the sample code in that video scales poorly with many cores, but should be able to scale nearly linearly if the work queue is implemented smarter.

Instruction level parallelism is already heavily used, there is no need to spread the ALUs, FPUs, etc. across several cores, the distance would make a synchronization nightmare. We should expect future architectures to continue to scale their superscalar abilities. But I don't doubt that someone will find a clever way to utilize "idle transistors" in some of these by manipulating clock cycles etc.

The problem with superscalar scaling is keeping execution units fed. Both Intel and AMD currently have four integer pipelines. Integer pipelines are cheap (both in transistors and power usage), so why not double or quadruple them? Because they would struggle to utilize them properly. Both of them have been increasing instruction windows with every generation to try to exploit more parallelism, and Intel's next-gen Sapphire Rapids/Golden Cove allegedly features a massive 800-entry instruction window (Skylake has 224, Sunny Cove 352 for comparison). And even with these massive CPU front-ends, execution units are generally under-utilized due to branch mispredictions and cache misses. Sooner or later the ISA needs to improve to help the CPU, which should be theoretically possible, as the compiler has much more context than is passed on through the x86 ISA, as well as eliminating more branching.

Couldn't AMD take the chip dies and use the I/O die to modulate between them, much like double data rate or quad data rate system memory, to speed up single-thread performance? They'd each retain their own cache, so that itself is a perk of modulating between them in a synchronized way, controlled through the I/O die, to complete a single-thread task load. For all intents and purposes, the CPU would behave as if it were a single faster chip. It could basically fill the L1 cache on one die, then swap to the next, and the same with the L2 and L3 caches. In fact, they could each be synchronized much like the numerous memory latency timings. On top of that, if you need multi-thread performance, it could have some type of first-served access priority, possibly based on some condition criteria. It could be a bit like the Windows setting for foreground/background tasks, with time slices between single-thread and multi-threaded performance that the I/O die manages and takes advantage of when it really needs the multi-threaded performance.

The cache misses definitely are harsh when they happen, but wouldn't automatically cycling between the individual L1/L2/L3 caches in different chip dies through the I/O die get around that? Basically, cycle between the ones available. Perhaps they'd only do it with the larger L2/L3 caches, though; maybe it doesn't make enough practical sense with the L1 cache being so small, given switch times and such. Perhaps in a future design, at some level or another; I don't know.

Something else the I/O die could do when switching between cores or dies, at the core level in particular: poll the chips, select whichever can Precision Boost the highest for the single-thread work, then poll again after a set period and select whichever core gives the best results, and keep repeating that. Basically, no matter what, it could always try to select the highest boost speed to optimize single-thread performance. Perhaps it does that between cores and dies as well, so if one gets a little hot, let it cool off while making use of the coolest die, though switching between those might be less frequent.

Deleted member 185088

Guest