Thursday, November 10th 2022

AMD Launches 4th Gen EPYC "Genoa" Zen 4 Server Processors: 100% Performance Uplift for 50% More Cores

AMD, at a special media event titled "together we advance_data centers," formally launched its 4th generation EPYC "Genoa" server processors based on the "Zen 4" microarchitecture. These processors debut an all-new platform, with modern I/O connectivity that includes PCI-Express Gen 5, CXL, and DDR5 memory. The processors come in CPU core-counts of up to 96-core/192-thread. There are as many as 18 processor SKUs, differentiated not just by CPU core-count, but also by the way the cores are spread across the up to 12 "Zen 4" chiplets (CCDs). Each chiplet features up to 8 "Zen 4" CPU cores, depending on the model, and up to 32 MB of L3 cache, and is built on the 5 nm EUV process at TSMC. The CCDs talk to a centralized server I/O die (sIOD), which is built on the 6 nm process.

The processors AMD is launching today are the EPYC "Genoa" series, targeting general purpose servers, although they can be deployed in large cloud data-centers, too. For large-scale cloud providers such as AWS, Azure, and Google Cloud, AMD is readying a different class of processor, codenamed "Bergamo," which it plans to launch later. In 2023, the company will launch the "Genoa-X" line of processors for technical-compute and HPC applications that benefit from large on-die caches, as these chips feature the 3D Vertical Cache technology. There will also be "Siena," a class of EPYC processors targeting the telecom and edge-computing markets, which could see an integration of more Xilinx IP.
The EPYC "Genoa" processor, as we mentioned, comes in core-counts of up to 96-core/192-thread, dominating the 40-core/80-thread counts of the 3rd Gen Xeon Scalable "Ice Lake-SP," and also staying ahead of the 60-core/120-thread counts of the upcoming Xeon Scalable "Sapphire Rapids." The new AMD processor also sees a significant buff of its I/O capabilities, featuring a 12-channel (24 sub-channel) DDR5 memory interface, and a gargantuan 160-lane PCI-Express Gen 5 interface (that's ten Gen 5 x16 slots running at full bandwidth). The platform also supports CXL and 2P xGMI links, which are enabled by repurposing some of those multipurpose lanes.
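
To put the lane count in perspective, here is a rough back-of-the-envelope sketch. It assumes the standard PCIe 5.0 figures of 32 GT/s per lane with 128b/130b encoding, and it ignores protocol overhead, so real-world throughput will be lower. The function names are made up for illustration.

```python
# Rough PCIe Gen 5 arithmetic, per direction, before protocol overhead.
GT_PER_S_PER_LANE = 32.0          # PCIe 5.0 signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding
TOTAL_LANES = 160                 # lane count quoted in the article
LANES_PER_SLOT = 16               # a full-width x16 slot


def lane_bandwidth_gbytes() -> float:
    """Approximate usable GB/s for one lane in one direction."""
    return GT_PER_S_PER_LANE * ENCODING_EFFICIENCY / 8


def slot_count(total_lanes: int = TOTAL_LANES, width: int = LANES_PER_SLOT) -> int:
    """How many full-width slots the lane budget can feed."""
    return total_lanes // width


def aggregate_bandwidth_gbytes(total_lanes: int = TOTAL_LANES) -> float:
    """Approximate GB/s across all lanes in one direction."""
    return total_lanes * lane_bandwidth_gbytes()


if __name__ == "__main__":
    print(f"x16 slots: {slot_count()}")
    print(f"per lane:  {lane_bandwidth_gbytes():.2f} GB/s")
    print(f"per x16:   {LANES_PER_SLOT * lane_bandwidth_gbytes():.1f} GB/s")
    print(f"all lanes: {aggregate_bandwidth_gbytes():.0f} GB/s")
```

Under these assumptions each x16 slot gets roughly 63 GB/s in each direction, and the full lane budget adds up to roughly 630 GB/s.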
The new 6 nm server I/O die (sIOD) has a significantly higher transistor count than the 12 nm one powering past-gen EPYC processors. The high transistor count is due to two large 80-lane configurable SERDES (serializer-deserializer) components, which can be made to put out PCIe Gen 5 lanes, CXL 1.1 lanes, SATA 6 Gbps ports, or even the inter-socket Infinity Fabric enabling 2P platforms. The processor supports up to 64 CXL 1.1 lanes that can be used to connect to networked memory-pooling devices. 3rd generation Infinity Fabric connects the various components inside the sIOD, the sIOD to the twelve "Zen 4" CCDs via IFOP, and as an inter-socket interconnect. The processor features a 12-channel (24 x 40-bit sub-channels) memory interface, which supports up to 6 TB of ECC DDR5-4800 memory per socket. The latest generation Secure Processor provides SEV-SNP (secure nested paging), and AES-256-XTS, for a larger number of secure VMs.
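
The memory figures above can be turned into a theoretical peak bandwidth with a short sketch. It assumes that each 40-bit sub-channel carries 32 data bits plus 8 ECC bits, so each channel moves 64 data bits per transfer, and that "6 TB" is counted in binary units. These are peak numbers, not measured ones.

```python
# Theoretical peak DDR5 bandwidth and capacity per channel for one socket.
CHANNELS = 12                  # memory channels per socket (from the article)
TRANSFERS_PER_SECOND = 4800e6  # DDR5-4800
DATA_BITS_PER_CHANNEL = 64     # assumption: 2 sub-channels x 32 data bits (8 of 40 bits are ECC)
MAX_CAPACITY_TB = 6            # maximum memory per socket (from the article)


def peak_bandwidth_gb_s() -> float:
    """Peak data bandwidth in GB/s (decimal) across all channels."""
    bytes_per_transfer = DATA_BITS_PER_CHANNEL / 8
    return CHANNELS * TRANSFERS_PER_SECOND * bytes_per_transfer / 1e9


def capacity_per_channel_gb() -> float:
    """Memory each channel must hold to reach the maximum, in GB (binary)."""
    return MAX_CAPACITY_TB * 1024 / CHANNELS


if __name__ == "__main__":
    print(f"peak bandwidth: {peak_bandwidth_gb_s():.1f} GB/s")
    print(f"per channel:    {capacity_per_channel_gb():.0f} GB")
```

With these inputs the peak works out to 460.8 GB/s per socket, and reaching 6 TB means populating 512 GB on every channel.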
Each of the 5 nm CPU complex dies (CCDs) is physically identical to the ones you find in Ryzen 7000-series "Raphael" desktop processors. It packs 8 "Zen 4" CPU cores, each with 1 MB of dedicated L2 cache, and 32 MB of L3 cache shared among the 8 cores. Each "Zen 4" core provides a 14% generational performance uplift compared to "Zen 3," with clock-speed kept constant. Much of this uplift comes from updates to the core's front-end and load/store unit, while the branch predictor, larger L2 cache, and execution engine make smaller contributions. The biggest generational change is the ISA, which sees the introduction of support for the AVX-512 instruction-set, VNNI, and bfloat16. The new instructions should accelerate AVX-512 math workloads as well as AI applications. AMD says that its AVX-512 implementation is more die-efficient than Intel's, as it uses the existing 256-bit wide FPU in a double-pumped fashion to enable 512-bit operations.
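
The "double-pumped" idea can be pictured with a purely conceptual sketch: a 512-bit vector operation is split into two halves, and each half goes through a 256-bit-wide unit in turn. This is only a software analogy written for illustration. It says nothing about how AMD's hardware actually schedules the two passes, and the function names are invented.

```python
# Conceptual model of a 512-bit add executed on a 256-bit-wide unit.
from typing import List

LANES_512 = 16  # sixteen 32-bit floats make up one 512-bit vector
LANES_256 = 8   # eight 32-bit floats fit through the 256-bit unit per pass


def add_256(a: List[float], b: List[float]) -> List[float]:
    """One pass through the (hypothetical) 256-bit unit: 8 lanes at once."""
    if len(a) != LANES_256 or len(b) != LANES_256:
        raise ValueError("256-bit unit takes exactly 8 lanes per operand")
    return [x + y for x, y in zip(a, b)]


def add_512_double_pumped(a: List[float], b: List[float]) -> List[float]:
    """A 512-bit add carried out as two consecutive 256-bit passes."""
    if len(a) != LANES_512 or len(b) != LANES_512:
        raise ValueError("512-bit operands need exactly 16 lanes")
    low = add_256(a[:LANES_256], b[:LANES_256])    # first pass: lanes 0-7
    high = add_256(a[LANES_256:], b[LANES_256:])   # second pass: lanes 8-15
    return low + high


if __name__ == "__main__":
    x = [float(i) for i in range(LANES_512)]
    y = [1.0] * LANES_512
    print(add_512_double_pumped(x, y))
```

The trade-off this illustrates is that the wider instruction is supported without a wider execution unit, at the cost of taking two passes per 512-bit operation.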
AMD is launching a total of 18 processor SKUs today, all meant for the Socket SP5 platform. Model numbers follow the nomenclature described in the slide below. EPYC is the top-level brand, and "9" is the product series. The next digit indicates core-count, with "0" denoting 8 cores, "1" denoting 16, "2" denoting 24, "3" denoting 32, "4" denoting 48, "5" denoting 64, and "6" denoting 84-96. The digit after that denotes relative performance, with a higher digit meaning a faster part, and the final digit, "4," marks the generation. Some model numbers end in a letter suffix, which can be either "P" or "F," with "P" denoting SKUs limited to single-socket (1P) platforms, and "F" denoting special SKUs that use fewer cores per CCD to improve per-core performance. The configurable TDP of the SKUs is rated at up to 400 W, which seems high, but one should take into account the CPU core-count, and the impact it has on the number of server blades per rack. This is one of the reasons AMD isn't scaling beyond 2 sockets per server. AMD claims that its core density translates into 67% fewer servers and 52% less power.
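
The naming scheme lends itself to a small decoder. The sketch below is a hypothetical helper, not an AMD tool. It uses the core-count mapping given in the paragraph above, assumes four-digit model numbers like the 9174F, 9374F and 9474F mentioned in this article, and treats the fourth digit as a generation marker. The suffix letter is returned as-is without interpretation.

```python
# Hypothetical decoder for 4th Gen EPYC model numbers such as "EPYC 9474F".
import re
from dataclasses import dataclass
from typing import Optional

# Core-count digit mapping as listed in the article.
CORE_DIGIT = {
    "0": "8", "1": "16", "2": "24", "3": "32",
    "4": "48", "5": "64", "6": "84-96",
}

# Optional "EPYC " prefix, series digit 9, core digit, performance digit,
# fourth digit (assumed to be the generation), optional P/F suffix.
_PATTERN = re.compile(r"^(?:EPYC\s+)?(9)([0-6])(\d)(\d)([PF]?)$", re.IGNORECASE)


@dataclass
class EpycModel:
    series: str
    cores: str
    performance: int
    fourth_digit: str       # taken to be the generation marker (assumption)
    suffix: Optional[str]   # "P", "F", or None when there is no letter


def decode(name: str) -> EpycModel:
    """Split a model name into the fields described in the article."""
    match = _PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    series, core_digit, perf, fourth, suffix = match.groups()
    return EpycModel(
        series=series,
        cores=CORE_DIGIT[core_digit],
        performance=int(perf),
        fourth_digit=fourth,
        suffix=suffix.upper() or None,
    )


if __name__ == "__main__":
    for model in ("EPYC 9174F", "EPYC 9374F", "EPYC 9474F"):
        print(model, "->", decode(model))
```

Run against the three "F" models compared later in the article, the core-count digits decode to 16, 32 and 48 cores, which matches the core counts the article quotes for them.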
In terms of performance, AMD only has Intel's dated 3rd Gen Xeon Scalable "Ice Lake-SP" processors for comparison, since "Sapphire Rapids" is still unreleased. With core-counts matched as closely as Intel's lineup allows, the 16-core EPYC 9174F is shown being 47% faster than the Xeon Gold 6346; the 32-core EPYC 9374F is 55% faster than the Xeon Platinum 8362; and the 48-core EPYC 9474F is 51% faster than the 40-core Xeon Platinum 8380, the largest "Ice Lake-SP" part. The same test group also sees 58-96% floating-point performance leadership in favor of AMD.
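
Since the last of those comparisons pits 48 cores against 40, it helps to normalize the quoted speedups per core. The sketch below does only that arithmetic on the figures in this article. It assumes performance scales linearly with core count, which real workloads rarely do, so treat the output as a rough guide.

```python
# Normalize a quoted speedup by the core counts of the two chips compared.
def per_core_ratio(speedup: float, new_cores: int, old_cores: int) -> float:
    """Speedup per core, assuming linear scaling with core count.

    speedup is the overall ratio, e.g. 1.51 for "51% faster".
    """
    return speedup * old_cores / new_cores


if __name__ == "__main__":
    comparisons = [
        ("EPYC 9174F vs Xeon Gold 6346", 1.47, 16, 16),
        ("EPYC 9374F vs Xeon Platinum 8362", 1.55, 32, 32),
        ("EPYC 9474F vs Xeon Platinum 8380", 1.51, 48, 40),
        # Headline claim: 100% uplift for 50% more cores (96 vs 64).
        ("Headline, gen-on-gen", 2.00, 96, 64),
    ]
    for label, speedup, new_cores, old_cores in comparisons:
        ratio = per_core_ratio(speedup, new_cores, old_cores)
        print(f"{label}: {ratio:.2f}x per core")
```

On this basis the 9474F's 51% lead shrinks to roughly 26% per core, and the headline claim of a 100% uplift for 50% more cores corresponds to roughly 33% more throughput per core.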

The complete slide-deck follows.

22 Comments on AMD Launches 4th Gen EPYC "Genoa" Zen 4 Server Processors: 100% Performance Uplift for 50% More Cores

#1
Tek-Check
An onslaught of slides. Need a few hours to digest this news.
#3
zlobby
Given how Zen4 fares so far, it's a safe bet these will be monsters!
Tek-Check wrote: "An onslaught of slides. Need a few hours to digest this news."
Them endnotes, though.
#5
Hofnaerrchen
zlobby wrote: "Given how Zen4 fares so far, it's a safe bet these will be monsters!"
They'd better be. Desktop CPU sales are down, AM5 is still too expensive, and I doubt that will change in the near future. The launch of the 7600/7700 non-X will not change the problem of high motherboard and RAM prices.
#6
CapNemo72
I think that AMD has a big stock of 5000 series CPUs, so it is not being very aggressive with 7000 series pricing. Once those stocks are gone, they will probably start to lower their prices.
By that time, there will be cheaper motherboards, and DDR5 should go down in price too (I am aiming to get 64 GB of DDR5-6000).

As for Epyc, now let's hope that OEMs will be pushing them more.
#8
Wirko
That guy in the blue Ferrari, he might need to fit larger rearview mirrors to it very soon.
#10
Tek-Check
AnotherReader wrote: "As expected, these are monsters that'll probably increase AMD's server market share."
Not probably, but surely. A conservative prediction is 23-25% server market penetration by the end of next year. And this comes on top of ARM's entry into the game. ARM is predicted to take 8-9% by Q4 2023. So, Intel's share is being eaten by two companies. See below.

Performance efficiency is the mantra in server now. Why? Well, if your company can save millions every year on electricity bills, it's a no-brainer what to do. In 5-6 years, 2017-2023, Intel is on track to lose ~30% of server market share. It's a massive and rapid shift.
#11
Minus Infinity
Can someone explain why v-cache for Epyc is being touted for HPC, but in Zen3 it only seemed to benefit gaming? I know there must be non-gaming software that surely will benefit, but TechP doesn't seem to have anything in their benchmarks. I would be far more tempted to get a 7900X3D, for example, if I saw tangible gains in productivity apps like COMSOL, Ansys, and other physics/chemistry simulations where currently Raptor Lake is much stronger than Zen 4 in general.
#12
Wirko
Minus Infinity wrote: "Can someone explain why v-cache for Epyc is being touted for HPC, but in Zen3 it only seemed to benefit gaming? I know there must be non-gaming software that surely will benefit, but TechP doesn't seem to have anything in their benchmarks. I would be far more tempted to get a 7900X3D, for example, if I saw tangible gains in productivity apps like COMSOL, Ansys, and other physics/chemistry simulations where currently Raptor Lake is much stronger than Zen 4 in general."
There are some Epyc 7003 X3D benchmarks out there, like this one at Phoronix. Some of the results are impressive.
#13
evernessince
160 lanes of integrated IO? I want that on the consumer end. Leaves space on the board for plenty of PCIe and M.2 slots.
#14
Patriot
evernessince wrote: "160 lanes of integrated IO? I want that on the consumer end. Leaves space on the board for plenty of PCIe and M.2 slots."
Only if you use 3 links instead of 4 between the cpus. 128-160 lanes depending on configuration.
#15
Minus Infinity
Wirko wrote: "There are some Epyc 7003 X3D benchmarks out there, like this one at Phoronix. Some of the results are impressive."
Cheers, very informative. I see OpenFoam loves cache. Given Zen 4 v-cache runs cooler and faster and there will be minimal clock speed regression this time around, Zen 4 X3D models should be very strong and, at least for gaming, wipe the floor with RL.
#16
Wirko
evernessince wrote: "160 lanes of integrated IO? I want that on the consumer end. Leaves space on the board for plenty of PCIe and M.2 slots."
I won't comment on CPUs but given the price increases on the consumer end, a good Zen 3 Epyc board by Supermicro has become as cheap as an average X670E board.
#17
Patriot
Wirko wrote: "I won't comment on CPUs but given the price increases on the consumer end, a good Zen 3 Epyc board by Supermicro has become as cheap as an average X670E board."
You can actually get a Gen3 H11 board plus a 16-core Rome off eBay for mid 500s. YMMV
Personally... I have an H12 for my Milan. :)
#18
Jism
evernessince wrote: "160 lanes of integrated IO? I want that on the consumer end. Leaves space on the board for plenty of PCIe and M.2 slots."
They multiply the number based on the additional CCDs added to the chip. You can't get so many lanes on a regular desktop CPU unless you opt for Threadripper.
#20
Wirko
dgianstefani wrote: "Bruh."

I didn't see that (even if I often catch missspellings), however, the increased L2 and L3 latency I did notice. Doubled size may be an excuse for L2 but what about L3? And it will probably be 4 cycles more for the 3D cache die.
#21
Xajel
I wonder how Zen4 based Threadripper will be.

Will it be based on the same socket as SP5 but repackaged for TR? Like TR5?

Or will it be smaller, targeting 64 cores and 8 channels max?

Will they have versions with AI, ML & FPGA chiplets there as well, or might these come with Zen5?
#22
Wirko
Xajel wrote: "I wonder how Zen4 based Threadripper will be.

Will it be based on the same socket as SP5 but repackaged for TR? Like TR5?

Or will it be smaller, targeting 64 cores and 8 channels max?"
There's an 80% probability that AMD will screw everything up. They are so good at that.
Xajel wrote: "Will they have versions with AI, ML & FPGA chiplets there as well, or might these come with Zen5?"
It's also possible that even the generally available Epycs won't have any special-purpose chiplets. Just the semi-custom models.