
Intel Isolates Root Cause of Raptor Lake Stability Issues to a Faulty eTVB Microcode Algorithm

It would take something radical from Intel for me to go with them for my next build. My current Ryzen build is almost 4 years old and due for an upgrade. It runs cool and quiet; I had a couple of issues in the first 6 months after the Windows 11 release, but who didn't at that time? After that was sorted, everything is back to running flawlessly. Considering a Ryzen 9700X for my next build.

Intel needs a 3-5x performance-per-watt increase for me to go back (for real, just look at the 7800X3D benchmarks right here at TPU; Intel is appalling in efficiency). A new architecture and moving away from their ancient lithography to Intel 20A might do it.


What that chart doesn't include, though, is E cores disabled. The 7800X3D's position is indicative of the stalled progress on core and thread scaling that we're stuck with because consoles are still limited to 8 cores and 16 threads. Disable the E cores and suddenly the 14700K is a better-binned 14600K with two more P cores. The 7800X3D and other 8c/16t CPUs are in the sweet spot of what developers are targeting right now with the current consoles on the market. Expecting that to simply remain the same indefinitely is fool's gold, though. It also has less need for better-quality DDR5 memory thanks to that slab of stacked cache, but it can still benefit from it, just not as greatly as Intel chips will with their smaller cache and stronger IMC.

I don't really get why w1zzard tested that at 1080p, though, while in other cases 720p is used to better represent a CPU bottleneck. I don't think it would change things dramatically, but it probably would push CPU core and thread usage higher in some scenarios. Anyway, we need to transition away from 8c/16t consoles before we see forward progress beyond that become standard. You can find examples where developers have targeted better hardware resources, but it won't become common until we see a shift in the largest audience developers target, which is the console market.

This really isn't about which is better for which use case under which testing scenario, however. This is about Intel making a bad decision or blunder, and "yes and/or maybe" is kind of what we've gathered on the matter to this point.
 
Intel was competing, though; AMD had to resort to slapping some cache on top to compete. Without the 3D V-Cache, it's AMD who would be behind. In a straight non-V-Cache contest, regardless of power, Intel is better in everything.

What is AMD better at? AVX-512, no E-cores or problems shifting loads, price on some parts, power consumption (which means less heat and a smaller electricity bill), and of course it actually does have a V-Cache part that is more performant in gaming. Zen 5 also has more PCIe lanes than Alder/Raptor Lake.

It's not really clear how Intel is "better in everything"; what were you referring to specifically?

Yeah, but at what cost? In a lot of cases, our electricity bills and potential future silicon degradation.
Yeah, it's a serious issue that cannot be discounted; power consumption and heat on these things are out of control.
 
How is it not their own tech? They are the ones who had the idea to add cache on top of the CPU and designed a working model of that idea, then used TSMC fabrication technology to put it into practice. Just like Intel is doing with Foveros and EMIB, except Intel is vertically integrated with their own fabs, so they have to design both parts of the solution. If AMD did nothing and just used someone else's tech, how come they're the only ones doing it?

If you want to use that stupid argument, well, neither of them does anything; they're all just using what ASML makes possible with their machines. It's a ridiculous idea.

Yeah, it was AMD's idea, sure.

https://www.techinsights.com/blog/amd-ships-3d-v-cache-processors

The company used two TSMC innovations to create it.


https://www.techpowerup.com/review/amd-ryzen-7-5800x3d/2.html

Without TSMC it would not exist.
 
Without TSMC it would not exist.
That's the same BS argument as "Zen would not be successful without TSMC." Then I need to remind people that Zen actually started on GlobalFoundries' 14nm process before transitioning to TSMC with Zen 2 (the 3000 series). Sure, the move made it better, first and foremost because it was 7nm vs 14nm, but the groundwork had already been laid.

Also, slapping a heap of cache on top of the die is not a guaranteed success. HUB has videos exploring various Intel CPUs with varying amounts of cache, and while bigger=better helps, it's not as universal a win for Intel's architecture as higher clock speeds are.

Also, 3D V-Cache is not an AMD-exclusive technology. Other TSMC customers can also use it, including Intel.
Die-thinning and TSVs are also not purely TSMC's innovation, as TSVs had been used in HBM memory before that by Korean memory makers.

Both AMD and Nvidia (I believe Intel too) are also using another TSMC technology that's in the news: CoWoS.
I don't see you downplaying them for some reason, just AMD.
 

But Zen would NOT be successful without TSMC. GlobalFoundries does not have a modern manufacturing process suitable to build these processors on, and more cache does not necessarily mean better; in fact, there are several scenarios where the Ryzen X3D chips regress in comparison to the standard models. This occurs because 3D V-Cache incurs a cycle penalty and data takes longer to be processed, which means the standard model is better if the data set fits within its capacity. Also, 3D V-Cache is an AMD technology; TSMC is just a foundry and builds chips to the specifications of its customers.
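To make that tradeoff concrete, here's a back-of-the-envelope average memory access time comparison; every hit rate and latency figure below is invented for illustration, not measured on any real CPU:

```python
# Illustrative AMAT (average memory access time) sketch. All numbers are
# assumptions, not measurements of real silicon.
def amat(hit_rate, cache_ns, dram_ns=80.0):
    """Hits pay the cache latency; misses pay the DRAM round trip."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * dram_ns

# Standard die: smaller but faster L3. X3D-style: bigger L3 with a small
# added cycle penalty (modeled here as 10 ns vs 12 ns).
print("data set fits either cache:   "
      f"{amat(0.98, 10.0):.1f} ns vs {amat(0.98, 12.0):.1f} ns")
print("data set spills the small L3: "
      f"{amat(0.70, 10.0):.1f} ns vs {amat(0.95, 12.0):.1f} ns")
```

With these made-up numbers, the faster cache wins when the working set fits (11.4 ns vs 13.4 ns) and loses badly when it doesn't (31.0 ns vs 15.4 ns), which is exactly the regress-or-win pattern described above.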

Intel's 3D technology is called Foveros, which was first seen in the Lakefield processor. It can be used to integrate every component in an SoC. Lakefield was very much a proof-of-concept that made it to market (released as a mobile Core i5 in very limited quantities for one particular Samsung laptop) and, as an example, featured one P-core and four E-cores (both of the first-generation kind, similar to those seen in Rocket Lake), plus GPU and DRAM fully integrated on-die. It was a sort of Alder Lake prototype, in a certain way.


CoWoS stands for Chip on Wafer on Substrate, and it's got nothing to do with 3D stacking technology; it's similar to Intel's EMIB, a 2.5D system.




The breakthrough will be combining this 2.5D packaging with 3D stacked dies to maximize density.

According to a report over at Techspot.com, Intel still doesn't know what's going on with the Core i9. My thoughts are that this is simply a symptom of Intel pushing a 15-year-old microarchitecture way past its breaking point.

At this point, I think Intel needs to recall every last Core i9 ever sold and issue refunds for selling what is a defective product.

Intel still doesn't know what is causing its i9 desktop chips to crash | TechSpot

Raptor Lake is Nehalem rehashed 15 times over, every year, in the same way Zen 4 is a direct descendant of the K5, yes. :kookoo:

I wasn't affected, but I can easily see where it's all going wrong: bad motherboards, bad real-world operating conditions, and underlying microcode bugs... no wonder it's the i9s that have a problem while the i7s, with more down-to-earth clocks and no fancy thermal boost, are largely immune.
 
But Zen would NOT be successful without TSMC.
Zen was successful already on 14nm. 7nm by TSMC just made it better.
GlobalFoundries does not have a modern manufacturing process suitable to build these processors on,
We don't know if GF would be competitive had they not axed their sub-10nm plans.
and more cache does not necessarily mean better, in fact, there are several scenarios where the Ryzen X3D chips regress in comparison to the standard models.
Mostly clock speeds.
This occurs because 3D V-Cache incurs a cycle penalty and data takes longer to be processed, which means the standard model is better if the data set fits within its capacity.
This penalty is very small. Standard models excel in tasks that benefit from raw clock speed.
CoWoS stands for Chip on Wafer on Substrate, and it's got nothing to do with 3D stacking technology, it's similar to Intel's EMIB, it's a 2.5D system.
I was not comparing the two. I was giving an example of another technology that all three companies use.
 
But Zen would NOT be successful without TSMC.
Does it matter, though?

Nvidia wouldn't be successful without TSMC and Samsung, either. So what?
 
Does it matter, though?

Nvidia wouldn't be successful without TSMC and Samsung, either. So what?

A is true because B is true; so that means B is true because A is true :kookoo:

I do not see the correlation between other customers' portfolios and the fact that... you couldn't build a modern Zen CPU on GlobalFoundries' latest node
 
A is true because B is true; so that means B is true because A is true :kookoo:

I do not see the correlation between other customers' portfolios and the fact that... you couldn't build a modern Zen CPU on GlobalFoundries' latest node
AMD relies on TSMC for their CPUs, which is bad. Nvidia relies on TSMC for their GPUs, which is good. Am I the only one seeing a massive gaping contradiction here? :kookoo:
 
This is the topic:

Intel Isolates Root Cause of Raptor Lake Stability Issues to a Faulty eTVB Microcode Algorithm​


Please stick to it and stop the pointless tribal bickering.
 
I would like to finally see an example of somebody getting instability issues after having everything set correctly from the start. Maybe not even something as hardcore as a 125 W PL1, but with all, or at least the majority, of the settings from Intel's blue table applied, and with motherboard inventions like Multi-Core Enhancement turned off. Boards have long been known for stupid "default" ideas, to the point that you can't trust them; checking CPU behaviour is one of the first things to do after building a computer.
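On the "checking CPU behaviour" point: if you want to verify what limits the board is actually enforcing rather than what the BIOS screen claims, here's a minimal sketch for Linux, assuming the msr driver is loaded and you have root; the register layout is from Intel's SDM, so treat the output as something to sanity-check against your BIOS:

```python
# Minimal sketch, Linux only: read the package power limits in force.
# Needs root and the msr driver (`modprobe msr`). Register addresses and
# bit fields per Intel's Software Developer's Manual.
import struct

MSR_RAPL_POWER_UNIT = 0x606   # bits 3:0 encode the power unit
MSR_PKG_POWER_LIMIT = 0x610   # bits 14:0 = PL1, bits 46:32 = PL2

def read_msr(reg, cpu=0):
    with open(f"/dev/cpu/{cpu}/msr", "rb") as f:
        f.seek(reg)
        return struct.unpack("<Q", f.read(8))[0]

watts_per_lsb = 1.0 / (1 << (read_msr(MSR_RAPL_POWER_UNIT) & 0xF))
raw = read_msr(MSR_PKG_POWER_LIMIT)
pl1 = (raw & 0x7FFF) * watts_per_lsb
pl2 = ((raw >> 32) & 0x7FFF) * watts_per_lsb
print(f"PL1 = {pl1:.0f} W, PL2 = {pl2:.0f} W")
```

On boards that "unlock" power, these fields often read back at their maximum (~4095 W), i.e., effectively unlimited, which is exactly the kind of default the post above is complaining about.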
 
It's not just the power or the temperature; no CPU should ever be allowed to boost at 1.45 volts. My comfort limit for 7 nm would be 1.35 V, and 1.25 V for 2 nm and onwards.
 
For once, my old-school overclocking of a fixed frequency at a fixed voltage is better :). I never have to worry about these boosting problems.
 
I'm getting the impression that ICCMAX defaults and/or recommendations are one of the bigger instability factors. Intel really should have put ICCMAX in an easy-to-find location on the product page for each chip SKU instead of burying it in an obscure PDF somewhere that you can maybe find in the dark-web region of its website if you're an internet archive archeologist. Intel should know better than that. It's a huge oversight on their behalf, and it will probably be argued against them in any class-action lawsuits that come out of this whole broken-chip fiasco.
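As a back-of-the-envelope illustration of why a permissive ICCMAX matters (the voltage and current figures here are assumptions for the arithmetic, not Intel's published limits):

```python
# P = V * I: a generous board-level current cap at boost voltage implies
# a huge transient package power. Both figures below are assumed, not spec.
vcore_volts = 1.45    # a boost voltage of the kind users have reported
iccmax_amps = 400     # a permissive motherboard ICCMAX setting
print(f"worst-case transient power: {vcore_volts * iccmax_amps:.0f} W")  # ~580 W
```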

If they can figure it out and come up with a real solution, without arbitrarily impacting performance in a meaningful way, that would be ideal, but I have my reservations about that actually happening. It seems a lot like another Spectre/Meltdown situation, though they got away with that mostly unscathed. I could still cope with it, honestly, but I got a great deal on my CPU; if I'd paid through the nose for a 14900K, I wouldn't be too thrilled, even if it's just a minor scaling back of performance that's already very abundant.
 
I'm starting to get the feeling that buyers of high-end CPUs or GPUs need to be prepared for disaster these days. RTX 3090s burning down with that Amazon game I can't remember, cooler issues with AMD-made 7900 XTXes, ASUS motherboards frying X3D CPUs, and now this malarkey with i9 stability... This is what you get in a world where every single soul and every company wants to be 1% ahead in everything all the time, I guess.
 
That whole new PSU connector fiasco as well. One of my M.2 drives also mysteriously cooked itself, and the label looked melted. Either that M.2 heat-spreader label was conductive and shorted itself, or something else went wrong, perhaps to do with the PCIe 5.0 slot, though my older Gen 3 M.2 in that slot has been just fine. I think the label was dodgy when I bought it and I installed it anyway; it worked fine initially, then fried after a month or two of heating and cooling cycles. I could've sworn it looked a bit funky and I almost returned it immediately, but I decided to just try it anyway. I certainly won't be taking that chance again. It could be worse, though; at least it wasn't a catastrophic PSU failure.
 
Question for you all: so the 14th gen Core i9 is not affected by this mess-up?
It's in the title:

Raptor Lake Stability Issues​

14th gen is Raptor Lake (as well as 13th gen).
 
Yes, but what I read on other sites is that it's the 13th and 14th Gen i5 and i7. Nowhere do they bring up or mention the i9. Sorry, but I am old.
 

Wendell has an interesting analysis using telemetry data from two game studios and feedback from data center companies and system integrators. We see an increased number of failures for 13900K and 14900K systems not only on the consumer side but also on the server side, where they're often used on W680 boards at stock settings for hosting game servers that rely on high single-core performance.

It reaches the point where game server hosting companies will charge you an extra $1,000 in support fees if you opt for Intel.
 
Yes, but what I read on other sites is that it's the 13th and 14th Gen i5 and i7. Nowhere do they bring up or mention the i9. Sorry, but I am old.
Other way round, TVB is on i9 chips only.
 
Other way round, TVB is on i9 chips only.

They may have meant that those have stability issues too, which TVB would probably just exacerbate further on the i9. Right now we haven't gotten a clear indication of what the root of the problem is. One thing I've speculated is that the socket bending issue may be part of it. That would be a much bigger factor for data center service providers buying pre-builts, since they wouldn't normally install anti-bending brackets. In fact, Wendell could probably run a cross-comparison between what data center service providers are seeing and what Steam or a large game studio sees.

In the gaming case, at least, I would think you'd see a stronger likelihood of some users running anti-bending brackets than in the data center. So if, digging further, the incidence of problems is actually higher on the data center side, it might be a good indicator that socket bending is an underlying culprit. Especially since gamers are more likely to overclock and push memory clocks higher, you'd expect their instability to be inherently worse by a decent amount on that fact alone.

On the other hand, if the data is the opposite, with much higher failure rates in the gaming telemetry, it might point more towards memory and/or the ring bus, perhaps even the cache and the IMC in general, pushed far beyond Intel's official memory support, which most gamers are pretty guilty of doing.
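If someone did get both data sets, the comparison itself is simple; here's a minimal sketch of a two-proportion z-test with entirely hypothetical failure counts:

```python
# Compare failure rates between two fleets (e.g., data center fleet vs.
# gamer telemetry). All counts below are invented placeholders.
from math import sqrt

def two_proportion_z(fail_a, total_a, fail_b, total_b):
    """Difference in failure rates between fleets A and B, plus z-score."""
    p_a, p_b = fail_a / total_a, fail_b / total_b
    pooled = (fail_a + fail_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return p_a - p_b, (p_a - p_b) / se

diff, z = two_proportion_z(120, 2000,   # fleet A: no bending brackets
                           45, 1500)    # fleet B: brackets more common
print(f"rate difference = {diff:.1%}, z = {z:.2f}")  # ~3.0%, z ≈ 4.1
```

A large z-score would at least say the two populations really do fail at different rates; it wouldn't by itself say whether brackets, overclocking, or something else is the cause.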

The fact that we still don't have a legitimate answer is crazy, though. This issue has impacted people since 13th gen; how have they not pinpointed a cause by now? It's understandable that some finger-pointing at motherboard makers over questionable BIOS decisions has happened, and they fully deserve that criticism in a situation like this; it's a wake-up call not to do stupid, questionable things with default settings. Anyway, it is what it is, but it's insane that we still have no answers, even though we've got some insight into the widespread severity of the problem.
 