Tuesday, November 6th 2018

AMD Unveils "Zen 2" CPU Architecture and 7 nm Vega Radeon Instinct MI60 at New Horizon

AMD today held its "New Horizon" event for investors, offering guidance and "color" on what the company's near future could look like. At the event, the company formally launched its Radeon Instinct MI60 GPU-based compute accelerator and disclosed a few interesting tidbits about its next-generation "Zen 2" microarchitecture. The Instinct MI60 is the world's first GPU built on the 7 nanometer silicon fabrication process, and among the first commercially available products built on 7 nm. "Rome," based on the "Zen 2" architecture, is on track to become the first 7 nm processor.

The Radeon Instinct MI60 is based on a 7 nm rendition of the "Vega" architecture. It is not an optical shrink of "Vega 10": it could pack more number-crunching machinery, and it has an HBM2 memory interface that's twice as wide and holds double the memory. It also features on-die logic for hardware virtualization, which could be a boon for cloud-computing providers.
If you've been paying attention to our "Zen 2" coverage over the past couple of weeks, you would've read our recent article citing a Singapore-based VLSI engineer claiming that AMD could disintegrate the northbridge for its high core-count enterprise CPUs, in an attempt to make the memory I/O "truly" wide, without compromising on the idea of MCM CPU chiplets. All of that is true.

"Rome" is the codename for a multi-chip module of four to eight 7 nm CPU dies, wired to a centralized die over Infinity Fabric. This 14 nm die, called the "I/O die," handles memory and PCIe, providing a monolithic 8-channel memory interface and overcoming the memory-bandwidth bottlenecks of current-generation 4-die MCMs. The CPU dies and the I/O die probably share an interposer. Assuming each die has 8 CPU cores, "Rome" could have up to 64 cores, an 8-channel DDR4 memory interface, and a 96-lane PCI-Express gen 4.0 root complex per socket. If AMD has increased the core count per CPU die, Rome's core count could be even higher.
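As a back-of-the-envelope check, the per-socket figures follow directly from the assumed die count and per-die core count; both are assumptions from the rumors discussed here, not disclosed specs:

```python
# Hypothetical "Rome" topology math; die and per-die core counts are
# assumptions from the article, not figures confirmed by AMD.
CPU_DIES = 8          # up to eight 7 nm CPU chiplets per package
CORES_PER_DIE = 8     # assumed: same per-die core count as "Zen"
MEM_CHANNELS = 8      # monolithic 8-channel DDR4 on the 14 nm I/O die

total_cores = CPU_DIES * CORES_PER_DIE
print(f"cores per socket: {total_cores}")                          # 64
print(f"cores per memory channel: {total_cores // MEM_CHANNELS}")  # 8
```

If AMD raised the per-die core count, only the `CORES_PER_DIE` assumption changes and the totals scale accordingly.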
The broader memory I/O, assuming Infinity Fabric does its job, could significantly improve the performance of multi-threaded workloads that scale across as many cores as you can throw at them. AMD also speaks of "increased IPC," which bodes well for the client segment: the company has managed to raise per-core performance through several on-die enhancements to the core design.

With "Zen" and "Zen+," AMD identified several components of the core that could be widened or made faster to bring about tangible IPC improvements. This includes a significantly redesigned front-end; Zen/Zen+ feature a front-end that's not much different from those of AMD's past microarchitectures. The new front-end includes an improved branch predictor, a faster instruction prefetcher, an improved and enlarged L1 instruction cache, and an improved prefetch cache (L2).

The number-crunching machinery, the floating-point unit, also receives a massive overhaul. "Zen 2" features 256-bit FPUs, double the width of "Zen's." Load/store/dispatch/retire bandwidths have also been doubled over the current generation. These changes are massive. Given that even without such core-level changes, simply improving cache latencies let AMD eke out a ~3% IPC uplift with "Zen+," one can expect double-digit percentage IPC gains with "Zen 2." Higher IPC, combined with possibly increased core counts, higher clock speeds, and the power benefits of switching to 7 nm, completes AMD's "Zen 2" proposition. Source: Tom's Hardware
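How those factors might compound can be sketched with purely illustrative numbers; neither the IPC uplift nor the clock bump below is a confirmed figure:

```python
# Illustrative compounding of per-core and per-socket gains; all three
# factors are hypothetical placeholders, not announced specs.
ipc_gain   = 1.10   # assumed double-digit IPC uplift (10%)
clock_gain = 1.05   # assumed modest 7 nm frequency bump (5%)
core_gain  = 2.00   # doubled core count per socket

single_thread_speedup = ipc_gain * clock_gain           # per-core gain
throughput_speedup = single_thread_speedup * core_gain  # ideal scaling
print(f"{single_thread_speedup:.3f}x per core")   # 1.155x
print(f"{throughput_speedup:.3f}x per socket")    # 2.310x
```

The per-socket number is an ideal upper bound; real multi-threaded scaling depends on the workload and the memory subsystem.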
Add your own comment

57 Comments on AMD Unveils "Zen 2" CPU Architecture and 7 nm Vega Radeon Instinct MI60 at New Horizon

#26
efikkan
Imsochobo
But thankfully faster memory always gives IPC gains, and that'll come with Zen 3 (DDR5).
How does increased memory bandwidth help IPC? (hint: it doesn't)

DDR4 supports up to 3200 MHz, Zen+ up to 2933 MHz, and Intel up to 2666 MHz, all at JEDEC 1.2 V. But I haven't yet found any DIMMs supporting speeds beyond 2666 MHz at the 1.2 V JEDEC spec.

DDR5 at 1.1V(?) is probably still far away.
Posted on Reply
#27
Tomorrow
efikkan
How does increased memory bandwidth help IPC? (hint: it doesn't)

DDR4 supports up to 3200 MHz, Zen+ up to 2933 MHz, and Intel up to 2666 MHz, all at JEDEC 1.2 V. But I haven't yet found any DIMMs supporting speeds beyond 2666 MHz at the 1.2 V JEDEC spec.

DDR5 at 1.1V(?) is probably still far away.
IPC is not a fixed number. IPC increases if the user uses faster memory, overclocks, etc. Looking at single-threaded benchmarks, the same CPU can have scores that vary 20% in either direction.
Posted on Reply
#28
efikkan
Tomorrow
IPC is not a fixed number. IPC increases if the user uses faster memory, overclocks, etc. Looking at single-threaded benchmarks, the same CPU can have scores that vary 20% in either direction.
IPC doesn't improve with memory bandwidth, and overclocking memory doesn't impact latency.
AMD does see some impact because Infinity Fabric is tied to memory speed, but memory speed itself doesn't impact IPC.
Posted on Reply
#29
Imsochobo
efikkan
How does increased memory bandwidth help IPC? (hint: it doesn't)

DDR4 supports up to 3200 MHz, Zen+ up to 2933 MHz, and Intel up to 2666 MHz, all at JEDEC 1.2 V. But I haven't yet found any DIMMs supporting speeds beyond 2666 MHz at the 1.2 V JEDEC spec.

DDR5 at 1.1V(?) is probably still far away.
If we increase memory frequency, we can feed the CPU more data, thus improving performance, which shows up as an increase in IPC; just like Zen+'s L2(?) cache latency being reduced from 17 cycles to 12, which improved IPC.
Or the old Pentiums getting vastly better performance by utilizing cache vs. having no cache previously (this is the biggest example of feeding a CPU data = more IPC, because it was so apparent at the time).
DDR4 memory is just L4 cache for a CPU; some tasks will see no improvement as long as the entire workload fits inside, say, the L1 cache.
A bit simplified, but it should tell the story :)
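That "DRAM is just another cache level" framing maps onto the textbook average-memory-access-time (AMAT) formula; the latencies below are hypothetical, chosen only to show the direction of the effect:

```python
# AMAT = hit_time + miss_rate * miss_penalty: why feeding the CPU data
# faster raises *measured* IPC even though peak IPC is a fixed property
# of the core. All cycle counts here are hypothetical.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Hypothetical: 4-cycle L1 hit, 5% of accesses miss all the way to DRAM.
slow_dram = amat(hit_time=4, miss_rate=0.05, miss_penalty=200)
fast_dram = amat(hit_time=4, miss_rate=0.05, miss_penalty=120)
print(slow_dram, fast_dram)  # 14.0 10.0 -> fewer stall cycles per access
```

A workload that fits entirely in L1 never pays the miss penalty, which is why some tasks see no benefit from faster DRAM.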


As for 3200 MHz kits at 1.2 V:
https://www.gskill.com/en/product/f4-3200c16d-16gtzr
https://www.kingston.com/dataSheets/HX432C18FBK2_32.pdf
Posted on Reply
#30
efikkan
Imsochobo
If we increase memory frequency, we can feed the CPU more data, thus improving performance, which shows up as an increase in IPC; just like Zen+'s L2(?) cache latency being reduced from 17 cycles to 12, which improved IPC.
Increasing memory bandwidth doesn't decrease memory latency, and doesn't impact cache.

Imsochobo
As for 3200 MHz kits at 1.2 V:

https://www.gskill.com/en/product/f4-3200c16d-16gtzr
It clearly says 3200 MHz at 1.35 V, SPD speed 2133 MHz at 1.2 V.
Imsochobo
https://www.kingston.com/dataSheets/HX432C18FBK2_32.pdf
This one seems more promising; I don't see it in the standard JEDEC configurations, but there may be more recent additions than the list on Wikipedia.
Posted on Reply
#31
Vya Domus
Didn't think I would see the day when AMD would outclass Intel in terms of innovation on just about every front. Very impressed.

Imsochobo
If we increase memory frequency, we can feed the CPU more data, thus improving performance, which shows up as an increase in IPC; just like Zen+'s L2(?) cache latency being reduced from 17 cycles to 12, which improved IPC.
Well, there are two metrics here: an "absolute" ideal IPC value when there are no bottlenecks in the system, and a real IPC metric which can indeed be improved by higher memory bandwidth.
Posted on Reply
#32
Xzibit
AMD previewed a single EPYC "Rome" (air-cooled, not overclocked) vs. a dual-socket Intel 8180M setup.

Posted on Reply
#33
RH92
Rumors say Intel won't be using HT on Cascade Lake for TDP reasons, so we are potentially looking at a 48/48 vs. 64/128 battle. This will be a slaughter!
Posted on Reply
#34
moproblems99
Tomorrow
IPC is not a fixed number. IPC increases if the user uses faster memory, overclocks, etc. Looking at single-threaded benchmarks, the same CPU can have scores that vary 20% in either direction.
Correct me if I am wrong, but isn't IPC = instructions per cycle? Memory bandwidth and overclocking are not going to affect IPC. I suppose memory bandwidth could, if the processor were ridiculously starved by the memory pipeline, but I doubt any modern processors are. Overclocking is not going to increase IPC; rather, it is going to increase the number of cycles... thus allowing you to do more in the same time.
Posted on Reply
#35
krykry
moproblems99
Correct me if I am wrong, but isn't IPC = instructions per cycle? Memory bandwidth and overclocking are not going to affect IPC. I suppose memory bandwidth could, if the processor were ridiculously starved by the memory pipeline, but I doubt any modern processors are. Overclocking is not going to increase IPC; rather, it is going to increase the number of cycles... thus allowing you to do more in the same time.
People at some point started confusing IPC with single-core performance. IPC is a statistic describing the number of instructions a CPU core can execute in one cycle.
Although IPC does affect single-core performance, it does not by itself determine final performance, which is also affected by latencies, timings, bandwidth, and so on.
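The distinction drawn above comes down to how IPC is counted in practice: retired instructions divided by elapsed core cycles. The counter readings below are made up for illustration:

```python
# Measured IPC = retired instructions / elapsed core cycles.
def measured_ipc(instructions, cycles):
    return instructions / cycles

# Same workload, hypothetical counter readings: memory stalls add extra
# cycles, lowering measured IPC while the core's peak IPC is unchanged.
no_stalls   = measured_ipc(4_000_000, 1_000_000)
with_stalls = measured_ipc(4_000_000, 1_600_000)
print(no_stalls, with_stalls)  # 4.0 2.5
```

This is why the same CPU can post different "IPC" numbers in single-threaded benchmarks depending on the memory configuration: the instruction count is fixed, but the cycle count isn't.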
Posted on Reply
#36
chaosmassive
bug
Idk, I'm rather disappointed. This wasn't aimed at the consumers at all :(
Threadripper is as close as a consumer can get to a server-grade CPU; as far as I know, Threadripper SKUs are based on EPYC chips with some imperfections here and there.
Posted on Reply
#37
HTC
chaosmassive
Threadripper is as close as a consumer can get to a server-grade CPU; as far as I know, Threadripper SKUs are based on EPYC chips with some imperfections here and there.
One imperfection is soon to be out of the way: the NUMA necessity (for chips that have more than 16 cores). There may be others that follow suit.
Posted on Reply
#38
stimpy88
Exciting times, thanks to AMD! Interesting that they made no mention of IPC increases (there have to be some)... AMD mentioned a 2X performance increase over the previous-generation EPYC, but considering that Rome has 2X the core count, this 2X performance number is a given.

This makes me conclude that Rome runs at clocks at least 10% lower than Naples's.

This has me so excited for Zen 2: 2X the floating-point performance, PCIe 4, hopefully a 10% IPC uplift, and maybe even a couple hundred MHz of extra clock speed to close the deal! Great stuff, take my money, AMD!
Posted on Reply
#39
HTC
stimpy88
Exciting times, thanks to AMD! Interesting that they made no mention of IPC increases (there have to be some)... AMD mentioned a 2X performance increase over the previous-generation EPYC, but considering that Rome has 2X the core count, this 2X performance number is a given.

This makes me conclude that Rome runs at clocks at least 10% lower than Naples's.

This has me so excited for Zen 2: 2X the floating-point performance, PCIe 4, hopefully a 10% IPC uplift, and maybe even a couple hundred MHz of extra clock speed to close the deal! Great stuff, take my money, AMD!
Could actually be even lower for TDP reasons, at least for the 64c/128t flagship: chips with fewer cores will likely have higher speeds, though.
Posted on Reply
#40
bug
stimpy88
Exciting times, thanks to AMD! Interesting that they made no mention of IPC increases (there have to be some)... AMD mentioned a 2X performance increase over the previous-generation EPYC, but considering that Rome has 2X the core count, this 2X performance number is a given.

This makes me conclude that Rome runs at clocks at least 10% lower than Naples's.

This has me so excited for Zen 2: 2X the floating-point performance, PCIe 4, hopefully a 10% IPC uplift, and maybe even a couple hundred MHz of extra clock speed to close the deal! Great stuff, take my money, AMD!
HTC
Could actually be even lower for TDP reasons, at least for the 64c/128t flagship: chips with fewer cores will likely have higher speeds, though.
Frequencies are about the last thing to be set in stone for a CPU. AMD themselves don't know at this point how fast these will go. Unlike us, though, they have a ballpark figure.
Posted on Reply
#41
stimpy88
bug
Frequencies are about the last thing to be set in stone for a CPU. AMD themselves don't know at this point how fast these will go. Unlike us, though, they have a ballpark figure.
Very true, and they did stress that we were not seeing final production silicon. I would love AMD to claw back some more raw MHz on top of the new architecture, so as not to come out with lower clocks than Naples, while also keeping it at a nice power and TDP level.

I suppose AMD wants to keep the IPC improvements under its hat, as this 2X performance number was really a Homer Simpson "D'oh!" moment! Anything less than 2X performance, when the core count is doubled, is a regression from Naples.
Posted on Reply
#43
R0H1T
They stressed the power/performance characteristics of the TSMC 7 nm node more. The final perf/W number could end up higher or lower than 2x coming from Naples, since clocks aren't finalized yet and IPC is unknown.
Posted on Reply
#44
Vayra86
bug
Frequencies are about the last thing to be set in stone for a CPU. AMD themselves don't know at this point how fast these will go. Unlike us, though, they have a ballpark figure.
Frequency is also the playground of product differentiation, besides XFR. And you can bet something is left in the tank if they can be competitive at slightly lower clocks.
Posted on Reply
#45
Mysteoa
WikiFM
Perhaps it is not functional, who knows?
So the chiplets were real; now let's wait for the reviews.
Will Rome be compatible with the same motherboards as current EPYC?
Lisa said it is a drop-in replacement; they will just need to validate the board.
Posted on Reply
#46
bug
Mysteoa
Lisa said it is drop in replacement, they will just need to validate the board.
So it's a drop-in replacement that won't work on all boards? That's not a drop-in replacement, it's just reusing the socket where possible.
This is stuff people like to bash Intel about, but since you can't see into the future, you can never really guarantee a newly released CPU will work with boards built before it existed. So you must either change the socket (even if it's not really needed) or go through this revalidation, which may still leave you unable to use your old board if the VRMs aren't up to the task or whatever.
Of course, you can reuse the socket if you stick to the exact same power spec. But more often than not, you're holding back the CPU by doing so.

Bottom line: reusing a socket is more complex than it seems. I'll happily take it when possible, but I won't fault chip makers for making me change the motherboard.
Posted on Reply
#47
stimpy88
bug
So it's a drop-in replacement that won't work on all boards? That's not a drop-in replacement, it's just reusing the socket where possible.
This is stuff people like to bash Intel about, but since you can't see into the future, you can never really guarantee a newly released CPU will work with boards built before it existed. So you must either change the socket (even if it's not really needed) or go through this revalidation, which may still leave you unable to use your old board if the VRMs aren't up to the task or whatever.
Of course, you can reuse the socket if you stick to the exact same power spec. But more often than not, you're holding back the CPU by doing so.

Bottom line: reusing a socket is more complex than it seems. I'll happily take it when possible, but I won't fault chip makers for making me change the motherboard.
Don't you think it's most likely a BIOS-related thing? I can't imagine the power requirements of this new CPU will increase much, and then only by an amount that AMD allowed for in the VRM design specs for Naples. If there are any boards out there with borderline VRM designs or components, then maybe that could cause an issue.
Posted on Reply
#48
bug
stimpy88
Don't you think it's most likely a BIOS-related thing? I can't imagine the power requirements of this new CPU will increase much, and then only by an amount that AMD allowed for in the VRM design specs for Naples. If there are any boards out there with borderline VRM designs or components, then maybe that could cause an issue.
Sometimes it can be fixed with a BIOS update. But even then, if you buy your new CPU together with an old board, not carrying the new BIOS, how do you flash it? There are workarounds, but sometimes they're simply not worth the hassle (for the manufacturer, that is).

Also, the BIOS update route tends to work between incremental updates like Sandy to Ivy Bridge. It's trickier when you need to squeeze more cores into the same power envelope with voltage and current already set in stone.
Posted on Reply
#49
Mysteoa
bug
Sometimes it can be fixed with a BIOS update. But even then, if you buy your new CPU together with an old board, not carrying the new BIOS, how do you flash it? There are workarounds, but sometimes they're simply not worth the hassle (for the manufacturer, that is).

Also, the BIOS update route tends to work between incremental updates like Sandy to Ivy Bridge. It's trickier when you need to squeeze more cores into the same power envelope with voltage and current already set in stone.
Since 7 nm brings a 50% power reduction, adding 50% more cores gets you back to the same power envelope, or close to it. The other thing you need to consider is that Rome has PCIe 4, and I'm not sure how that will work with current boards. We know AMD was already working on Rome/Zen 2 while designing the socket, so they could have made it support 64 cores.
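That power arithmetic can be checked with hypothetical wattages; only the ~50% process-power claim comes from the thread, the per-core figure is made up, and the sketch uses the full 32-to-64-core doubling:

```python
# Hypothetical power-envelope check for a 32 -> 64 core transition.
naples_cores, watts_per_core = 32, 5.0  # made-up per-core power draw
power_scale_7nm = 0.5                   # claimed ~50% power reduction

naples_power = naples_cores * watts_per_core
rome_power = (naples_cores * 2) * (watts_per_core * power_scale_7nm)
print(naples_power, rome_power)  # 160.0 160.0 -> same envelope, 2x cores
```

Halved per-core power times doubled core count cancels out exactly, which is why the doubling can plausibly fit the existing socket's power budget (I/O-die power and clock targets aside).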
Posted on Reply
#50
bug
Mysteoa
Since 7 nm brings a 50% power reduction, adding 50% more cores gets you back to the same power envelope, or close to it. The other thing you need to consider is that Rome has PCIe 4, and I'm not sure how that will work with current boards. We know AMD was already working on Rome/Zen 2 while designing the socket, so they could have made it support 64 cores.
It's also a mix of 7 nm core chiplets and a 14 nm I/O die. I know I wouldn't want to be on the team sorting this out :P
Posted on Reply
Add your own comment