
Haswell to Use 4th-Level On-Package Cache to Boost Graphics Performance

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
34,335 (9.22/day)
Likes
17,427
Location
Hyderabad, India
System Name Long shelf-life potato
Processor Intel Core i7-4770K
Motherboard ASUS Z97-A
Cooling Xigmatek Aegir CPU Cooler
Memory 16GB Kingston HyperX Beast DDR3-1866
Video Card(s) 2x GeForce GTX 970 SLI
Storage ADATA SU800 512GB
Display(s) Samsung U28D590D 28-inch 4K
Case Cooler Master CM690 Window
Audio Device(s) Creative Sound Blaster Recon3D PCIe
Power Supply Corsair HX850W
Mouse Razer Abyssus 2014
Keyboard Microsoft Sidewinder X4
Software Windows 10 Pro Creators Update
#1
Intel is making serious efforts to boost CPU-integrated graphics performance using homegrown architectures, without borrowing or licensing technology from NVIDIA and AMD, the other two major players in the PC graphics business, whose technological edge over Intel lets them make high-performance discrete GPUs. Haswell, the architecture that succeeds Ivy Bridge, is set to receive a significant advancement in GPU performance.

 
Joined
Oct 30, 2008
Messages
1,530 (0.46/day)
Likes
377
System Name Lailalo / Edelweiss
Processor FX 8320 @ 4.5Ghz / i7 3610QM @2.3-3.2Ghz
Motherboard ASrock 990FX Extreme 4 / Lenovo Y580
Cooling Cooler Master Hyper 212 Plus / Big hunk of copper
Memory 16GB Samsung 30nm DDR3 1600+ / 8GB Hyundai DDR3 1600
Video Card(s) XFX R9 390 / GTX 660M 2GB
Storage Seagate 3TB/1TB + OCZ Synapse 64GB SSD Cache / Western Digital 1TB 7200RPM
Display(s) LG Ultrawide 29in @ 2560x1080 / Lenovo 15.6 @ 1920x1080
Case Coolermaster Storm Sniper / Lenovo Y580
Audio Device(s) Asus Xonar DG / Whatever Lenovo used
Power Supply Antec Truepower Blue 750W + Thermaltake 5.25in 250W / Big Power Brick
Software Windows 10 Pro / Windows 10 Home
#2
They could have solved this years ago by just simply... not using shared VRAM on everything. But it is one thing to use this technique, it's another to have the hardware to back it up. Not to mention drivers. I won't hold my breath for Intel to finally deliver on both fronts. If they can, great... more competition and better specs for hardware.
 
Joined
Mar 27, 2008
Messages
697 (0.20/day)
Likes
70
Location
Zagreb, Croatia
Processor C2D E8400@3.9GHz (488x8, 1.4v :( )
Motherboard Abit IP35-E
Cooling Thermaltake Sonic Tower+120mm fan
Memory 2GB kingmax ddr1066@976MHz 5-5-5-15
Video Card(s) Radeon X1800GTO @700/1400MHz with Accelero S1+Glacialtech fancard
Storage 2xSeagate Barracuda 7200.10 160GB
Display(s) Samsung SyncMaster 793s... just you laugh...
Case some Aplus case
Audio Device(s) Realtek ALC888
Power Supply Chieftec 450W
Software Win7 x64
#3
ahh, the tried and true intel method of solving cpu problems - if it sucks, slap on MOAR CACHE! :rockout:

jk :p
 

faramir

New Member
Joined
May 20, 2011
Messages
203 (0.08/day)
Likes
27
#4
Any speculation as to what the size of this L4 cache is going to be?

Is it actually going to be large enough to serve as dedicated video memory (= 256+ MB)? IMHO such a solution would make the most sense, provided that there is enough room inside the MCM for memory.
 
Joined
Oct 30, 2008
#5
Might not need that much; remember the Xbox 360 used this technique. I think I remember it listed only 10MB of this ultrafast cache, plus 512MB shared between system and VRAM. Still, they used that with a high-end GPU (for the time period), not low-end stuff. Till the tech is here to test, I wouldn't get too excited. AMD could easily counter this.
 
Joined
Mar 27, 2008
#6
doubt it, a 256MB SRAM chip would be huge and cost a shitload. i say 32MB tops, even less.
 

NHKS

New Member
Joined
Sep 28, 2011
Messages
596 (0.26/day)
Likes
375
#7
Intel is known to be moving towards an SoC design with Haswell, and MCM could mean just that.
I guess 3D stacking of modules could enable this, since it saves die area. Ivy Bridge already went 3D at the transistor level (tri-gate transistors), so 3D stacking at the die level (chip over chip) might just be the start with Haswell. So it is not impossible for Intel to have a considerably large L4 cache (I am guessing at least 128MB) with 3D die stacking. Well, just guessing, and I could be wrong as more information leaks or gets released.
 
Joined
Mar 27, 2008
#8
so, i found an article about some mad IBM processor with a 96MB L4 cache on a separate die. the die area was 487sq mm @ 45nm (1.5 billion transistors). so, if my maths aren't terribly wrong (sleep deprived and pretty stupid atm), shrinking that from 45nm to 22nm (area goes roughly with the square of the feature size) should let them pack something like this in some 120-ish sq mm.

this is much more than i expected, i completely forgot about the 22nm process for haswell, this thing might actually end up with some 128mb of L4 cache :twitch:
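The back-of-the-envelope shrink above can be checked with a few lines of Python. This is only a rough sketch: it assumes ideal scaling (area proportional to the square of the process node) and that area grows linearly with capacity, neither of which holds exactly on real silicon. The function name and the 128MB extrapolation are illustrative, and the 487 sq mm figure is simply the one quoted in the post.

```python
def scaled_area_mm2(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Ideal die shrink: area scales with the square of the feature size."""
    return area_mm2 * (new_nm / old_nm) ** 2


# Figures quoted in the post (not independently verified here).
ibm_l4_area_mm2 = 487.0
ibm_l4_size_mb = 96.0

# Same 96MB cache moved from 45nm to 22nm, assuming a perfect shrink.
area_96mb_22nm = scaled_area_mm2(ibm_l4_area_mm2, old_nm=45, new_nm=22)

# Assumption: area grows linearly with capacity.
area_128mb_22nm = area_96mb_22nm * 128 / ibm_l4_size_mb

print(f"96MB at 22nm:  ~{area_96mb_22nm:.0f} sq mm")
print(f"128MB at 22nm: ~{area_128mb_22nm:.0f} sq mm")
```

Under these assumptions the 96MB die lands a little under 120 sq mm, which matches the "120-ish" estimate, and 128MB would need roughly a third more area.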
 

NHKS

#9
... and Intel's Itanium (server) CPU already had an 'L4' cache back in 2004, codenamed Hondo; it was 32MB. And the 8-core 'Poulson', expected in 2012, is expected to have the world's biggest on-die cache, 54 MB in total. The Poulson chip is based on 32nm and its die size is about 544 mm².

so, expecting an L4 cache >100MB with Haswell might not be too high..
 

btarunr

#10
I think that L4 could be a GDDR5 die. But I agree, such a big chunk of SRAM could drive up costs immensely.
 
Joined
May 19, 2009
Messages
110 (0.04/day)
Likes
14
#11
Still only quad-core!

What this article also states is that the top end Haswell (within the mid-range) is still only going to be a quad core CPU!

By 2013/2014 that is going to be almost 8 years of mid-range CPUs having a maximum of 4 cores, come on people move things along!
 

NHKS

#12
What this article also states is that the top end Haswell (within the mid-range) is still only going to be a quad core CPU!
By 2013/2014 that is going to be almost 8 years of mid-range CPUs having a maximum of 4 cores, come on people move things along!
Somewhat agree, but for single-user desktops, software that makes use of all 4 cores is rare. Multi-threaded/multi-core apps exist but are not used by the average user, and even most games don't use more than 2 cores. Once developers start leveraging quad cores and threads, then I guess we can demand more cores.
 

XoR

New Member
Joined
Jul 11, 2011
Messages
27 (0.01/day)
Likes
1
#15
By 2013/2014 that is going to be almost 8 years of mid-range CPUs having a maximum of 4 cores, come on people move things along!
4 core is performance/price (manufacturing costs) sweet-spot and with HT it can take advantage of >4 thread support

if someone needs, or thinks he/she needs, more processing power then there are 6 and 8 core CPUs in the market... (and I'm not talking AMD here :shadedshu)
 

btarunr

#16
Yet another possibility is 32 MB of SRAM cache, which is big enough to be a frame-buffer, and fast enough to compensate for its size.
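As a rough check on the "big enough to be a frame-buffer" claim, here is a small sketch that computes the size of one uncompressed colour buffer. It assumes 32-bit colour (4 bytes per pixel) and a single buffer with no depth buffer or textures; the resolutions are just common examples, not anything taken from the article.

```python
def framebuffer_mib(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    """Size of one uncompressed colour buffer in MiB."""
    return width * height * bytes_per_pixel / (1024 ** 2)


CACHE_MIB = 32  # the cache size suggested in the post

# Example resolutions chosen for illustration only.
for width, height in [(1280, 720), (1920, 1080), (2560, 1600)]:
    size = framebuffer_mib(width, height)
    verdict = "fits" if size <= CACHE_MIB else "does not fit"
    print(f"{width}x{height}: {size:.2f} MiB ({verdict} in {CACHE_MIB} MiB)")
```

A single 1080p buffer is about 8 MiB, so 32 MB would hold a few render targets at that resolution, but not much texture data on top.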
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
10,404 (4.84/day)
Likes
5,486
Location
Concord, NH
System Name Kratos
Processor Intel Core i7 3930k @ 4.2Ghz
Motherboard ASUS P9X79 Deluxe
Cooling Zalman CPNS9900MAX 130mm
Memory G.Skill DDR3-2133, 16gb (4x4gb) @ 9-11-10-28-108-1T 1.65v
Video Card(s) MSI AMD Radeon R9 390 GAMING 8GB @ PCI-E 3.0
Storage 2x120Gb SATA3 Corsair Force GT Raid-0, 4x1Tb RAID-5, 1x500GB
Display(s) 1x LG 27UD69P (4k), 2x Dell S2340M (1080p)
Case Antec 1200
Audio Device(s) Onboard Realtek® ALC898 8-Channel High Definition Audio
Power Supply Seasonic 1000-watt 80 PLUS Platinum
Mouse Logitech G602
Keyboard Rosewill RK-9100
Software Ubuntu 17.10
Benchmark Scores Benchmarks aren't everything.
#17
Yet another possibility is 32 MB of SRAM cache, which is big enough to be a frame-buffer, and fast enough to compensate for its size.
That will still highly rely on system memory though and only benefits you if you can swap pages in and out of cache before they're needed. I can't imagine a whole lot of speed benefits by doing this. The latency going from L3 to system memory isn't a huge leap and an L4 cache placement should be slower than L3 but faster than system memory... but the real question is how much bandwidth is there going to be and what will the latencies look like?
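To make the latency question concrete, here is a minimal sketch of the usual average-access-time arithmetic for a request that has already missed L3. The latency figures are hypothetical placeholders picked only to show the shape of the trade-off; nothing here comes from Intel.

```python
def effective_latency_ns(l4_hit_rate: float,
                         l4_latency_ns: float,
                         dram_latency_ns: float) -> float:
    """Average latency of an L3 miss when an L4 sits in front of system memory."""
    # An L4 miss pays for the L4 lookup and then the trip to system memory.
    miss_cost_ns = l4_latency_ns + dram_latency_ns
    return l4_hit_rate * l4_latency_ns + (1.0 - l4_hit_rate) * miss_cost_ns


# Hypothetical latencies, for illustration only.
L4_NS = 30.0
DRAM_NS = 70.0

for hit_rate in (0.0, 0.5, 0.9):
    latency = effective_latency_ns(hit_rate, L4_NS, DRAM_NS)
    print(f"L4 hit rate {hit_rate:.0%}: {latency:.0f} ns (DRAM alone: {DRAM_NS:.0f} ns)")
```

With these made-up numbers the L4 only beats going straight to system memory once its hit rate exceeds L4_NS / DRAM_NS (about 43%), which is why the size and what gets kept in it matter as much as the raw latency.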

Looks like another reason why the BCLK on mainstream chips will have practically no wiggle room.
 
Joined
Dec 20, 2005
Messages
245 (0.06/day)
Likes
20
Processor 2500K
Motherboard Asus P8Z68-V
Cooling Stock
Memory Samsung MV-3V4G3D/US 4x4 1866@99927 1.41v
Video Card(s) Sapphire 280x
Display(s) crossover
Case junk
Audio Device(s) usbstick
Power Supply enermax 82+pro 5years+ still good
#18
May teh force be with them, the 12 shaders that is :laugh:
 
Joined
Mar 10, 2010
Messages
4,988 (1.76/day)
Likes
1,553
Location
Manchester uk
System Name Quad GT evo V
Processor FX8350 @ 4.8ghz1.525c NB2.64ghz Ht2.84ghz
Motherboard Gigabyte 990X Gaming
Cooling 360EK extreme 360Tt rad all push/pull, cpu,NB/Vrm blocks all EK
Memory Corsair vengeance 32Gb @1333 cas9
Video Card(s) Rx vega 64 waterblockedEK + Rx580 waterblockedEK
Storage samsung 840(250), WD 1Tb+2Tb +3Tbgrn 1tb hybrid
Display(s) Samsung uea28"850R 4k freesync, samsung 40" 1080p
Case Custom(modded) thermaltake Kandalf
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup
Power Supply corsair 1000Rmx
Mouse CM optane
Keyboard CM optane
Software Win 10 Pro
Benchmark Scores 15.69K best overall sandra so far
#19
So whilst AMD is trying to use system memory virtually for graphics in its APUs, and in the future on graphics cards, Intel's finally going old school. Tut.
 

faramir

#20
I think that L4 could be a GDDR5 die. But I agree, such a big chunk of SRAM could drive up costs immensely.
Note that GDDR5 isn't SRAM, it's DRAM, which means smaller die size and consequently cheaper production for a given capacity. Today's 1-2 GB video cards employ 8 chips, meaning one of those has capacity of 128-256 MB. Take away chip's package and the raw die has to be even smaller - perhaps just small enough to fit into an MCM, especially if produced on world's smallest lithography (where Intel has definite advantage over others).
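The per-chip figure is simple division, sketched below. It assumes the card's memory is split evenly across 8 chips, as the post does; the function name is just for illustration.

```python
def per_chip_mb(card_gb: float, chips: int = 8) -> float:
    """Capacity of each memory chip in MB, assuming an even split."""
    return card_gb * 1024 / chips


for card_gb in (1, 2):
    print(f"{card_gb} GB card, 8 chips: {per_chip_mb(card_gb):.0f} MB per chip")
```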

With dedicated VRAM the GPU can scale up much more easily with the addition of more functional units as it is no longer constrained by the crappy memory bandwidth.

L4 cache approach on the other hand permits rather uniform performance with vastly larger memory pool (borrowed system RAM) but requires far more complicated control logic, even for EDRAM. And if they indeed went with SRAM that would mean more transistors still.

It will be interesting to see which way Intel went with Haswell, the cache way or VRAM way :)
 

Aquinus

#21
Note that GDDR5 isn't SRAM, it's DRAM, which means smaller die size and consequently cheaper production for a given capacity. Today's 1-2 GB video cards employ 8 chips, meaning one of those has capacity of 128-256 MB. Take away chip's package and the raw die has to be even smaller - perhaps just small enough to fit into an MCM, especially if produced on world's smallest lithography (where Intel has definite advantage over others).

With dedicated VRAM the GPU can scale up much more easily with the addition of more functional units as it is no longer constrained by the crappy memory bandwidth.

L4 cache approach on the other hand permits rather uniform performance with vastly larger memory pool (borrowed system RAM) but requires far more complicated control logic, even for EDRAM. And if they indeed went with SRAM that would mean more transistors still.

It will be interesting to see which way Intel went with Haswell, the cache way or VRAM way :)
Static RAM is faster. This is another cache level, so I doubt there will be on-die DRAM. (That also adds temperature restrictions.)
 
Joined
Feb 17, 2007
Messages
1,238 (0.31/day)
Likes
168
Location
SoCal
Processor AMD Phenom II 1055T @ 3.6ghz 1.3V
Motherboard Asus M5A97 EVO
Cooling Xigmatek SD1284
Memory 2x4GB Patriot Sector 5 PC3-12800 @ 7-8-7-24-1T 1.7V
Video Card(s) XFX Radeon HD 7950 DD @ 1100/1350 1.185V
Storage OCZ Agility 3 120GB + 2x7200.12 500GB Raid1
Display(s) QNIX QX2710 27" LCD 1440p @ 120hz
Case Cooler Master 690M
Audio Device(s) Realtek ALC892
Power Supply Enermax Liberty 620W Eco Edition
Software Windows 7 Professional x64 / Ubuntu 12.04 x64
#22
I wonder why they don't do what AMD is doing and run the IMC at very fast speeds? Llano's IMC supports DDR3-1866, and I think Trinity's supports DDR3-2133. Now the AMD processor cores hardly benefit from that speed at all, but when using the integrated graphics, the memory bandwidth makes a huge difference in performance.

I'll admit it's annoying that laptop manufacturers like to put DDR3-1066/1333 in laptops where the processor supports much faster memory (so that'll have to be dealt with), but I could imagine an Intel Haswell Ultrabook with an HD 5000 GPU and DDR3-2133 speeds (and 2133 modules to go with it) being quite useful.
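For a sense of scale, theoretical peak bandwidth can be sketched as transfers per second times bus width times channel count. The sketch assumes a standard 64-bit (8-byte) DDR3 channel and a dual-channel setup, and uses the nominal speed grades mentioned above; real-world throughput is lower.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float,
                        channels: int = 2,
                        bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s for DDR3 at a given speed grade."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


# Nominal speed grades named in the post; dual channel is an assumption.
for grade in (1066, 1333, 1866):
    print(f"DDR3-{grade}, dual channel: {peak_bandwidth_gb_s(grade):.1f} GB/s")
```

Going from DDR3-1333 to DDR3-1866 is a 40% increase in peak bandwidth on paper, and since the integrated GPU shares that with the CPU cores, it is easy to see why it shows up in graphics performance.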