Intel Xe HPC Architecture Detailed, Has Dual-Use as Compute and Cloud-Gaming Accelerator

btarunr · Aug 20, 2021

Intel's Xe HPC (high performance compute) architecture powers the company's most powerful vector compute device to date, codenamed "Ponte Vecchio." The processor is designed for massive HPC and AI compute applications, but also features raster graphics and real-time raytracing hardware, giving it a dual use as a cloud-gaming GPU. Our Xe HPG architecture article covers the basics of how Intel is laying out its client discrete GPUs. The Xe HPC architecture both scales up and scales out from that. The Xe-core, the basic indivisible sub-unit of the Xe HPC architecture, is different from that of Xe HPG. While Xe HPG cores contain sixteen 256-bit vector engines alongside sixteen 1024-bit matrix engines, Xe HPC cores feature eight 512-bit vector engines and eight 4096-bit matrix engines. They also feature higher load/store throughput and a larger 512 KB L1 cache.
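To make the Xe-core comparison concrete, here is a minimal sketch that multiplies out the engine counts and widths quoted above. The `XeCore` class and its field names are illustrative, not an Intel API, and aggregate bit width is only a rough proxy for throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XeCore:
    """Engine counts and widths per Xe-core, as quoted in the article."""
    name: str
    vector_engines: int
    vector_width_bits: int
    matrix_engines: int
    matrix_width_bits: int

    def vector_bits(self) -> int:
        # Total vector datapath width per core.
        return self.vector_engines * self.vector_width_bits

    def matrix_bits(self) -> int:
        # Total matrix datapath width per core.
        return self.matrix_engines * self.matrix_width_bits


HPG = XeCore("Xe HPG", 16, 256, 16, 1024)
HPC = XeCore("Xe HPC", 8, 512, 8, 4096)

for core in (HPG, HPC):
    print(f"{core.name}: {core.vector_bits()} vector bits, "
          f"{core.matrix_bits()} matrix bits per core")
```

By this count both cores carry the same 4096 bits of vector width, while the Xe HPC core has twice the matrix width of the Xe HPG core.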

The Xe HPC core vector unit is designed for full-rate FP64 performance of 256 ops per clock, which is identical to its FP32 throughput. It also offers 512 ops/clock FP16. The matrix unit, on the other hand, packs a punch: 2,048 TF32 ops/clock, up to 4,096 FP16 and BFloat16 ops/clock, and 8,192 INT8 ops/clock. Things get interesting as we scale up from here. The Xe HPC Slice is a grouping of 16 Xe HPC cores, along with 16 dedicated Raytracing Units that are just as capable as the ones on the Xe HPG (calculating ray traversal, bounding box intersection, and triangle intersection). The Xe HPC Slice cumulatively has 8 MB of L1 cache on its own.
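The slice-level figures follow from the per-core numbers by simple multiplication. The sketch below scales the quoted ops/clock values to one 16-core slice and checks the 8 MB L1 total. The dictionary and function names are illustrative, and the results are peak per-clock figures derived from the article, not measured throughput.

```python
CORES_PER_SLICE = 16
L1_PER_CORE_KB = 512

# Per-core ops per clock, as quoted in the article.
VECTOR_OPS_PER_CLOCK = {"FP64": 256, "FP32": 256, "FP16": 512}
MATRIX_OPS_PER_CLOCK = {"TF32": 2048, "FP16": 4096, "BF16": 4096, "INT8": 8192}


def slice_ops_per_clock(per_core: dict) -> dict:
    """Scale per-core ops/clock to one 16-core slice."""
    return {fmt: ops * CORES_PER_SLICE for fmt, ops in per_core.items()}


slice_vector = slice_ops_per_clock(VECTOR_OPS_PER_CLOCK)
slice_matrix = slice_ops_per_clock(MATRIX_OPS_PER_CLOCK)
slice_l1_mb = CORES_PER_SLICE * L1_PER_CORE_KB / 1024  # 16 x 512 KB

print("Slice vector ops/clock:", slice_vector)
print("Slice matrix ops/clock:", slice_matrix)
print("Slice L1 cache (MB):", slice_l1_mb)
```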

An Xe HPC compute tile, or Xe HPC Stack, contains four such Xe HPC Slices, for 64 Xe HPC cores, 64 Raytracing Units, and 4 hardware contexts, sharing a large 144 MB L2 cache. The uncore components include a PCI-Express 5.0 x16 interface, a 4096-bit wide HBM2E memory interface, a media-acceleration engine with fixed-function hardware to accelerate decode (and possibly encode) of popular video formats, and Xe Link, an interconnect designed to interface with up to 8 other Xe HPC dual-stacks, for a total of up to 16 stacks. Each dual-stack uses a low-latency stack-to-stack interconnect. A dual-stack hence ends up with 128 Xe HPC cores, 128 Raytracing Units, two media engines, and an 8192-bit wide HBM2E interface. The dual-stack is a relevant grouping here, as the "Ponte Vecchio" processor features two compute tiles (two Xe HPC Stacks), and eight HBM2E memory stacks.
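The stack and dual-stack totals can be reproduced from the slice counts in a few lines. This is a minimal sketch using only figures from the article; the `stack_totals` helper and its keys are hypothetical names chosen for illustration.

```python
SLICES_PER_STACK = 4
CORES_PER_SLICE = 16
RT_UNITS_PER_SLICE = 16
HBM2E_BUS_BITS_PER_STACK = 4096


def stack_totals(stacks: int = 1) -> dict:
    """Resource totals for a given number of Xe HPC stacks."""
    slices = stacks * SLICES_PER_STACK
    return {
        "slices": slices,
        "xe_cores": slices * CORES_PER_SLICE,
        "rt_units": slices * RT_UNITS_PER_SLICE,
        "media_engines": stacks,  # one media engine per stack
        "hbm2e_bus_bits": stacks * HBM2E_BUS_BITS_PER_STACK,
    }


single_stack = stack_totals(1)
dual_stack = stack_totals(2)

print("Stack:     ", single_stack)
print("Dual-stack:", dual_stack)
```

Calling `stack_totals(2)` gives the dual-stack figures quoted in the text: 128 Xe HPC cores, 128 Raytracing Units, two media engines, and an 8192-bit memory interface.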

It's important to note here that the Xe HPC Slices sit in specialized dies called compute tiles that are fabricated on TSMC's 5 nm N5 node, while the rest of the hardware sits on a base die that's built on the Intel 7 node (10 nm Enhanced SuperFin). The two dies are Foveros-stacked with 36-micron bumps. The Xe Link tile is a separate piece of silicon dedicated to networking with neighboring packages. This die is built on the TSMC 7 nm node, and consists mainly of SerDes (serializer-deserializer) components.

Each "Ponte Vecchio" OAM with two Xe HPC stacks (one MCM) has a combined memory bandwidth of over 5 TB/s, and Xe Link connectivity bandwidth of over 2 TB/s. A "Ponte Vecchio" x4 Subsystem holds four such OAMs, and is designed for a 1U node with two Xeon "Sapphire Rapids" processors. The four "Ponte Vecchio" and two "Sapphire Rapids" packages are each liquid-cooled. Hardware is only part of the story: Intel is investing considerably in oneAPI, a collective programming environment for both the CPU and GPU.
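For a sense of scale at the node level, the sketch below multiplies the per-OAM figures out to an x4 Subsystem. The constant names are illustrative. Because the article only says "over 5 TB/s" per OAM, the bandwidth result is a lower bound from plain multiplication, not a measured or Intel-quoted aggregate.

```python
OAMS_PER_SUBSYSTEM = 4
STACKS_PER_OAM = 2
CORES_PER_STACK = 64

# The article states "over 5 TB/s" per OAM, so treat this as a floor.
MEM_BW_PER_OAM_TBPS_MIN = 5.0

subsystem_stacks = OAMS_PER_SUBSYSTEM * STACKS_PER_OAM
subsystem_cores = subsystem_stacks * CORES_PER_STACK
subsystem_mem_bw_min = OAMS_PER_SUBSYSTEM * MEM_BW_PER_OAM_TBPS_MIN

print(f"Stacks per x4 Subsystem:   {subsystem_stacks}")
print(f"Xe HPC cores per node:     {subsystem_cores}")
print(f"Memory bandwidth (TB/s):   more than {subsystem_mem_bw_min}")
```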

ZoneDymo · Aug 20, 2021

honestly if Intel does not deliver after spending this much time and effort and money on marketing (hype machine) then.....idk man could you ever believe them about anything anymore?

Vya Domus · Aug 20, 2021

These are going to be ludicrously expensive, even for Intel.

prtskg · Aug 20, 2021

Are we having an Intel spam festival celebrating slides?

Deleted member 24505 · Aug 20, 2021

ZoneDymo said:
honestly if Intel does not deliver after spending this much time and effort and money on marketing (hype machine) then.....idk man could you ever believe them about anything anymore?

And if they do deliver, boy will some people on here have to eat their words with relish

ZoneDymo · Aug 20, 2021

Gruffalo.Soldier said:
And if they do deliver, boy will some people on here have to eat their words with relish

Not sure which words you are specifically referring to. Most comments here are about being sick of "leaks" and "teases" all meant to fuel some hype machine; they just want to see some actual product and what it actually can do.
Nothing that I can see about expecting this to be good or not.

Deleted member 24505 · Aug 20, 2021

ZoneDymo said:
Not sure which words you are specifically refering to, most comments here are about being sick of "leaks" and "teases" all meant to fuel some hypemachine, they just want to see some actual product and what it actually can do.
Nothing that I can see about expecting this to be good or not.

The usual jibes at Intel. Mebbe we should wait to see what these new GPUs are like before judging.

Richards · Aug 20, 2021

It beats Nvidia's A100 massively. How is Nvidia gonna respond to this? This is high-level engineering from Intel, combining different process nodes.

Punkenjoy · Aug 20, 2021

I would be so hyped by a GPU made in Intel fabs that could compete with the current lineup, even if it's not up against the top-end GPUs and has bad power consumption. It would bring tons of fab capacity into the market, could maybe satisfy the demand, and could quickly and really challenge AMD and Nvidia.

But with GPU fabricated at TSMC, i am Meh. Well it's good to have a third vendor, but they will all end up competing to buy the same fab capacity and in the end, it's the customer that will pay the bill...

v12dock · Aug 20, 2021

This sure sounds a lot like the brainchild of Raja... Vega.

z1n0x · Aug 20, 2021

Pinktulips7 said:
AMD FANBOYS/GIRLS BETTER RUN!!! TIME IS UP....intel will destroy AMD and NVIDEA OUT OF THE WATER......

DUDE YOU ARE FULL SHI$

You are full of Cra$

QFP. This is just too good.

medi01 · Aug 20, 2021

On a serious note, I never understood what's so cool about a100.

When it's about very well planned massive number crunching, isn't it, cough, straightforward (by the respective industry's standards) to implement?

ratirt · Aug 24, 2021

I'm sure Intel will deliver something; it is just hard to guess what exactly that would be at this point. Another player in the gaming market is welcome.
