AMD 64-core EPYC "Milan" Based on "Zen 3" Could Ship with 3.00 GHz Clocks

btarunr · Jul 13, 2020

AMD's 3rd generation EPYC line of enterprise processors, which leverages the "Zen 3" microarchitecture, could innovate in two directions: towards increasing performance by doing away with the CCX (compute complex) multi-core topology, and towards taking advantage of a newer/refined 7 nm-class node to increase clock speeds. Igor's Lab decoded as many as three OPNs of the upcoming 3rd gen EPYC series, including a 64-core/128-thread part that ships with a frequency of 3.00 GHz. The top 2nd gen EPYC 64-core part, the 7662, ships with a 2.00 GHz base frequency, a 3.30 GHz boost, and a 225 W TDP. AMD is expected to unveil its "Zen 3" microarchitecture within 2020.

Crackong · Jul 13, 2020

Base 1.6 Boost 3.0 ?

Vya Domus · Jul 13, 2020

Obviously, since this is stepping A0 it means it's the first tape out, meaning it's a waste of time to look at clock speeds.

Anyway, Zen 3 is clearly on its way to the EPYC line, meanwhile still no Xeon competitor in sight.

Rares · Jul 13, 2020

Epyc Xeon destroyer

Dredi · Jul 13, 2020

btarunr said:
towards increasing performance by doing away with the CCX

The CCX is just going to be 8 cores instead of 4. They cannot "do away with the CCX" unless they go for monolithic dies.

Steevo · Jul 13, 2020

Dredi said:
The CCX is just going to be 8 cores instead of 4. They cannot "do away with the CCX" unless they go for monolithic dies.

Considering the latency penalty for the CCX and node refinement they could put the CCX on multiple dies and stack them together at higher speed.

Dredi · Jul 13, 2020

Steevo said:
Considering the latency penalty for the CCX and node refinement they could put the CCX on multiple dies and stack them together at higher speed.

They would still have a CCX-based architecture unless they unify the L3 cache. CCX = group of processor cores that share L3 cache.

Punkenjoy · Jul 13, 2020

Dredi said:
They would still have a CCX-based architecture unless they unify the L3 cache. CCX = group of processor cores that share L3 cache.

Yes, this is true. But the main thing about the CCX is how inter-core communication is done. Within a CCX, core-to-core communication can be done directly, whereas between CCXs the data has to go through the Infinity Fabric to the I/O die and back to the other CCX.

This leads to greatly increased latency. If we ignore thermals, frequency and other factors, a CPU with two 4+0 CCDs (chiplets) will behave the same way as a single 4+4 CCD/chiplet.

A full die is considered a CCD. Since you not only remove the requirement to use the Infinity Fabric but also merge the L3 cache, you end up removing the concept of the CCX altogether and keeping only an 8-core CCD. Although at that point it's more a play on words. The whole concept of the CCX is to split a CCD. If the CCD is no longer split, what is the CCX?

It doesn't really matter in the end, but the good thing is that we will get rid of the inter-core latency between CCXs. That is the main thing.

A shared L3 cache will probably help if the latency isn't affected too much. The problem with a bigger cache is that the larger it is, the longer it takes to perform the cache lookup, hence increasing the latency.

The rumors are that AMD will use a new technology based on hashing to manage the larger cache. We will see.

For workloads like Blender rendering, Cinebench or other highly multithreaded applications, inter-core latency probably has little impact because the threads don't have much to exchange anyway. But for things like video games that use a lot of cores, it's very probable that there will be significant gains from that change alone.
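
To put a rough number on the inter-CCX point, here is a minimal Python sketch that counts how many core pairs on an 8-core part would have to cross a CCX boundary. The function names and the contiguous core-to-CCX numbering are assumptions made for illustration, not AMD's actual core enumeration, and the sketch says nothing about the real latency values involved.

```python
from itertools import combinations

def ccx_of(core, cores_per_ccx):
    """Hypothetical mapping: cores are numbered contiguously within each CCX."""
    return core // cores_per_ccx

def cross_ccx_pairs(total_cores, cores_per_ccx):
    """Return (pairs that span two CCXs, total number of core pairs)."""
    pairs = list(combinations(range(total_cores), 2))
    cross = sum(
        1 for a, b in pairs
        if ccx_of(a, cores_per_ccx) != ccx_of(b, cores_per_ccx)
    )
    return cross, len(pairs)

print(cross_ccx_pairs(8, 4))  # two 4-core CCXs -> (16, 28)
print(cross_ccx_pairs(8, 8))  # one 8-core CCX  -> (0, 28)
```

With two 4-core CCXs, 16 of the 28 possible core pairs sit in different CCXs; with a single 8-core CCX, none do.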

Dredi · Jul 13, 2020

CCX is a memory topology concept while CCD is a physical component. Both can be the same size, and their being the same size in no way invalidates the concept of the CCX. Of course, for an 8-core gaming PC this means no more dealing with CCX peculiarities, but for the 64-core EPYCs in this news item it makes only a small-ish difference: 8 CCXs vs 16. The change has more to do with effectively having more cache, as less data needs to be copied into multiple L3 caches in some workloads. Also, for single/low-threaded tasks you effectively double the amount of cache available.
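
The arithmetic behind "8 CCXs vs 16" and the doubled cache can be sketched in a few lines of Python. The helper name `ccx_layout` is hypothetical, and the 4 MB of L3 per core is an assumption taken from the 2nd gen parts (16 MB per 4-core CCX); applying the same per-core figure to an 8-core CCX is an extrapolation, not a confirmed "Milan" spec.

```python
def ccx_layout(total_cores, cores_per_ccx, l3_mb_per_core=4):
    """Summarise a CCX layout under the assumed L3-per-core figure."""
    ccx_count = total_cores // cores_per_ccx
    l3_per_ccx = cores_per_ccx * l3_mb_per_core  # L3 one thread can reach locally
    return {
        "ccx_count": ccx_count,
        "l3_mb_per_ccx": l3_per_ccx,
        "total_l3_mb": ccx_count * l3_per_ccx,
    }

print(ccx_layout(64, 4))  # 16 CCXs, 16 MB each, 256 MB total
print(ccx_layout(64, 8))  # 8 CCXs, 32 MB each, 256 MB total
```

Under these assumptions the total L3 stays the same, but a single thread can reach twice as much of it without leaving its CCX.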
