Possible Listings of AMD Ryzen 9 3800X, Ryzen 7 3700X, Ryzen 5 3600X Surface in Online Stores

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,147 (2.94/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
Don't get me wrong, I understand the hardware needs to be here, but we've had hex and octo cores for 8/6 years already and we haven't really seen a momentum shift yet. I think it's a lot closer to reality, but still a generation or two away from really making a difference for the majority. The 'use it later' argument is something of a given, as we've been hearing that argument for years. It just depends on use models for the PC/user.
Sure, but you have to consider what that hardware has been in and what typical consumers are buying. The reality is that it hasn't been in laptops and the market is hungry for mobile devices. We're only now starting to see laptops with 6c/12t.
 
Joined
Oct 5, 2008
Messages
1,802 (0.32/day)
Location
ATL, GA
System Name My Rig
Processor AMD 3950X
Motherboard X570 TUFF GAMING PLUS
Cooling EKWB Custom Loop, Lian Li 011 G1 distroplate/DDC 3.1 combo
Memory 4x16GB Corsair DDR4-3466
Video Card(s) MSI Seahawk 2080 Ti EKWB block
Storage 2TB Auros NVMe Drive
Display(s) Asus P27UQ
Case Lian Li 011-Dynamic XL
Audio Device(s) JBL 30X
Power Supply Seasonic Titanium 1000W
Mouse Razer Lancehead
Keyboard Razer Widow Maker Keyboard
Software Window's 10 Pro
I am really hopeful AMD pulls out a winner(s) here. Competition is always good for consumers.
 
Joined
Apr 12, 2013
Messages
6,749 (1.68/day)
notb, specifically for the 2990WX an argument can be made that it's not the best TR chip out there; in fact I did say at launch that the 2970WX is better. The reason is clear: AMD disabled 4 memory channels, and the (dis)connected dies need an additional hop to access memory. Having said that, Windows should still be blamed for the awful performance we see on that platform with these high core count CPUs, especially compared to Linux. So coming from a 2950X or 2920X, the WX variants aren't great VFM. However, if your software isn't memory bound, chances are you'll make good use of the additional cores, on Linux!

You also seem to think that Zen suffers from high latencies, which is absolutely wrong. AFAIK the IF itself is (arguably) the biggest bottleneck in their memory subsystem, in fact Zen+ does beat Intel in L2/L3 cache latencies ~ https://www.anandtech.com/show/12625/amd-second-generation-ryzen-7-2700x-2700-ryzen-5-2600x-2600/3



Intel's much better with mem latency, that may change slightly with IF2 & zen2 however.
 
Joined
Jun 28, 2016
Messages
3,595 (1.26/day)
Enthusiast sometimes means people who actually use computers to do useful things, like software engineers, DBAs, or people who do things like genomics.
By all means, no.
Enthusiast is someone who is enthusiastic about PCs. He likes to talk about them, he likes to read reviews, he likes to spend more than needed.
It has absolutely nothing to do with how you use a PC.
It's like buying a 20 core Xeon or something and then whining about single threaded performance when you opted for more cores. It's laughable.
You don't know much about how servers are used, do you - mister "enthusiast"? ;-)
Because normally one buys more cores to... you know... get more cores?
Nope. Single-thread performance is improving slowly and will hit a wall soon. If one wants more processing power, he is forced to buy more cores.
But having more cores doesn't automatically mean software will run faster. Someone has to write it to do so (assuming it's possible in the first place).
That's the main advantage of increasing single-core performance.
Of course it's easier to write single-threaded code. You have fewer issues to deal with, but that doesn't mean it's the right decision given the workload. Also, I write multithreaded code all the time and I do it in the day job and let me tell you something, I don't write any data processing job that uses a single core. I use stream abstractions and pipelines all over the place because changing a single argument to a function call can change the amount of parallelism I get at any stage in the pipeline. It also helps when you use a language that's conducive to writing multi-threaded code. Take my main language of choice, Clojure, it's a Lisp-1 with immutability through and through with a bunch of mechanisms to have controlled behavior around mutable state. It's a very different animal than writing multi-threaded code in say, Java or C# and it's really not that difficult.
Not everyone is a programmer and not everyone has the comfort you do. Data processing is by definition the best case possible for multi-threaded computing.

I know it may be difficult, but you have to consider that coding today is also done by analysts and it has to happen fast. I also write code as a day job (and as a hobby). But I also train analysts to use stuff like R or VBA. They need high single-core performance. And yes, they work on Xeons.

And as usual, I have to repeat the fundamental fact: some programs are sequential - no matter how well you code and what language you use. It can't be helped.
Also, you're running on the assumption that time to write the application is the only cost. What about the time it takes for that application to run? Time is money. My ETL jobs would be practically useless if they took a full day to run, which is why they're set up in a way where concurrency is tunable in terms of both parallelism and batch size.
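For readers who haven't seen this style of code, here is a minimal sketch of a stage whose concurrency is tunable through plain arguments. It is written in Python rather than Clojure purely for illustration, and the names `run_stage`, `batched`, `parallelism` and `batch_size` are hypothetical, not taken from any real ETL job.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_stage(transform, items, parallelism=1, batch_size=100):
    """Apply transform to every item, one batch per worker task.

    parallelism and batch_size are the two tuning knobs. Output order
    matches input order because Executor.map preserves it.
    """
    def process(batch):
        return [transform(x) for x in batch]

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for result in pool.map(process, batched(items, batch_size)):
            yield from result


def extract():
    # Stand-in for a real data source (assumption for the example).
    return range(10)


# Same code, different tuning: only the arguments change.
serial = list(run_stage(lambda x: x * 2, extract(), parallelism=1, batch_size=3))
parallel = list(run_stage(lambda x: x * 2, extract(), parallelism=4, batch_size=3))
```

In CPython, threads mostly help when a stage is I/O bound (database or network calls); for CPU-bound transforms one would likely swap in `ProcessPoolExecutor`, which has the same `map` interface.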
Well exactly! The balance between writing and running a program is the key issue.
Some people write ETLs and spend hours optimizing code. It's their job.
But other people have other needs. You have to open a bit and try to understand them as well.
I was doing physics once, now I'm an analyst. A common fact: a lot of the programs are going to be used a few times at best, often just once.
There's really no point in spending a lot of time optimizing (and often you don't have time to do it).

Also Graphene is vaporware until we actually see it in production at a price that's not outlandish, otherwise it's just a pipe dream.
I don't know what you mean. Graphene exists and PoC transistors have been made a while ago.
Yes, it is a distant future for personal computers, but that's the whole point of technological advancement - we have to plan many years ahead. And that's what makes science and engineering interesting.
When graphene CPUs arrive in laptops, they'll be very well controlled and boring.
We can make CPUs out of a number of different materials, but that doesn't mean it's a viable option. Once again, all of this needs to be measured in reality.
And the reality is that some applications would benefit from very fast cores, not dozens of them. That's all I'm saying.

A fast single 50GHz core? You're so far away from possible you're into dream land. We are nowhere near buying optic-based transistors or any graphene version of a transistor, but I doubt they would be sold in single units/cores; that makes for a binning nightmare: it works or it's actually in the bin. Shows what you know.
To be honest, I don't really care when such CPUs will be available for PCs. I'm interested when they'll arrive in datacenters. I always hoped it'll happen before quantum processors, but who knows?
50GHz will produce a lot of heat. In fact the whole idea of GaN in processors is that it can sustain much higher temperatures than silicon.
Quantum computers need extreme cooling solutions just to actually work.

It's always good to look back at the progress computers made in the last decade. Just how much faster have cores become? And why stop now?

Moreover, can you imagine the discussions people had in the early 50s? Transistors already existed. PoC microprocessors as well. And still many didn't believe microprocessors would be stable enough to make the idea feasible. So it wasn't very different from the situation we have with GaN and graphene in 2019.
A few years later microprocessors were already mass produced. When I was born ~30 years later, computers were as normal and omnipresent as meat mincers.
Future nodes clearly dictate that it's too expensive to make the whole chip on the cutting-edge node, so you will see Intel follow suit with chiplets, and they have already stated they will. They're busy on that now; see Foveros and Intel's many statements towards a modular future with EMIB connects and 3D stacking.
I have absolutely nothing against MCM. It's a very good idea.

notb, specifically for the 2990WX an argument can be made that it's not the best TR chip out there
Well, I'm precisely mentioning 2990WX because it represents a similar scenario to how Zen2 will work.
And not just the 16-core variant. Every Zen2 processor will have to make an additional "hop" because the whole communication with memory will be done via the I/O die. No direct wires.
Some of this will be mitigated by huge cache, but in many loads the disadvantage will be obvious. You'll see soon enough.

Also, I understand many people are waiting for Zen2 APUs (8 cores + Navi or whatever). Graphics memory will also have to be accessed via the I/O die. Good luck with that.
Intel's much better with mem latency, that may change slightly with IF2 & zen2 however.
Just don't bet your life savings on that. ;-)
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,147 (2.94/day)
By all means, no.
Enthusiast is someone who is enthusiastic about PCs. He likes to talk about them, he likes to read reviews, he likes to spend more than needed.
It has absolutely nothing to do with how you use a PC.
If you read the two sentences that followed, you would realize that's what I was saying. :slap:
You don't know much about how servers are used, do you - mister "enthusiast"? ;-)
Yeah, I do. Servers have more cores and lower clocks instead of higher clocks and fewer cores for a reason... Mister "enthusiast". A person buying it for gaming will be thoroughly disappointed.
Nope. Single-thread performance is improving slowly and will hit a wall soon. If one wants more processing power, he is forced to buy more cores.
But having more cores doesn't automatically mean software will run faster. Someone has to write it to do so (assuming it's possible in the first place).
That's the main advantage of increasing single-core performance.
...and reality suggests that single threaded performance has already hit a wall which is why we're seeing more cores. None of that invalidates what I'm saying.
Not everyone is a programmer and not everyone has the comfort you do. Data processing is by definition the best case possible for multi-threaded computing.
Data processing is literally what 90% of programs do. You might have stateful portions of your application, but most of it is data processing most of the time. Existing code is hard to make multi-threaded because a lot of the time it was written in a language or technology with poor constructs for concurrent workloads, and it was designed to be single-threaded from the start. I can even use your own statement as an example:
I know it may be difficult, but you have to consider that coding today is also done by analysts and it has to happen fast. I also write code as a day job (and as a hobby). But I also train analysts to use stuff like R or VBA. They need high single-core performance. And yes, they work on Xeons.
They need high single core performance because the software is the limitation. Simple fact is more cores is easier than higher clocks. It also scales better. Another fun fact, R and VBA are archaic. Another excellent example of why we shouldn't make architectural decisions for hardware based on old, archaic designs, designed for older machines. When VBA and R were released, computers had 1 core, so you know what they were designed for? 1 core.
And as usual, I have to repeat the fundamental fact: some percentage of programs are sequential - no matter how well you code and what language you use. It can't be helped.
Fixed that for you. Most applications aren't purely sequential in nature. An entire workload doesn't need to be made to run in parallel so long as there are parts of it that you can.
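The disagreement here is essentially about what is usually called Amdahl's law: the sequential share of a job's runtime caps how much extra cores can help. A minimal sketch, assuming the parallel part scales perfectly (real workloads rarely do); the fractions used below are hypothetical illustrations, not measurements of any program mentioned in this thread.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when parallel_fraction of the runtime
    (0.0 to 1.0) scales perfectly across cores and the rest is serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads: fully sequential, half parallel, mostly parallel.
for fraction in (0.0, 0.5, 0.9):
    speedups = [round(amdahl_speedup(fraction, n), 2) for n in (1, 8, 16)]
    print(f"parallel fraction {fraction}: speedup on 1/8/16 cores = {speedups}")
```

Both posters have a point under this model: a purely sequential job gains nothing from 16 cores, while a job with even a partly parallel runtime does gain, just less than the core count suggests.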
Well exactly! The balance between writing and running a program is the key issue.
Some people write ETLs and spend hours optimizing code. It's their job.
Writing ETL jobs is hardly the entirety of my job and making it multithreaded wasn't a substantial cost for me to do. That's my point, but you seem to like making a lot of assumptions about what using different technologies that aren't garbage gets you in these situations.
I don't know what you mean. Graphene exists and PoC transistors have been made a while ago.
Yes, it is a distant future for personal computers, but that's the whole point of technological advancement - we have to plan many years ahead. And that's what makes science and engineering interesting.
When graphene CPUs arrive in laptops, they'll be very well controlled and boring.
Can I buy it and run my software on it? How about in 5 years? 10? Yeah, definitely a pipedream. Maybe one day, but that gets us nowhere right now or even in the foreseeable long term. That gets us nothing right now. What we have right now, are more cores. A real, tangible, thing that can be bought and used.
Well, I'm precisely mentioning 2990WX because it represents a similar scenario to how Zen2 will work.
Performance of the 2950X isn't too shabby for what it is and it's got the same design. Also, having I/O resources spread out instead of having them centralized is a big difference. We should be careful about equating the two because they definitely are apples and oranges with different benefits and shortcomings.
 

drayzen

New Member
Joined
Dec 7, 2018
Messages
8 (0.00/day)
The image of the 9 box is fake.
The perspective of the number is wrong.

YgRffHjzVos68Bcl.jpg
 
Joined
Oct 22, 2014
Messages
13,210 (3.81/day)
Location
Sunshine Coast
System Name Black Box
Processor Intel Xeon E3-1260L v5
Motherboard MSI E3 KRAIT Gaming v5
Cooling Tt tower + 120mm Tt fan
Memory G.Skill 16GB 3600 C18
Video Card(s) Asus GTX 970 Mini
Storage Kingston A2000 512Gb NVME
Display(s) AOC 24" Freesync 1m.s. 75Hz
Case Corsair 450D High Air Flow.
Audio Device(s) No need.
Power Supply FSP Aurum 650W
Mouse Yes
Keyboard Of course
Software W10 Pro 64 bit
The image of the 9 box is fake.
The perspective of the number is wrong.
Some people fail to realise that it hasn't been released yet, and a placeholder has been mocked up :banghead:
It has also been mentioned by a couple of others who fail to grasp the concept of a placeholder.
 

drayzen

New Member
Joined
Dec 7, 2018
Messages
8 (0.00/day)
Some people fail to realise that it hasn't been released yet, and a placeholder has been mocked up :banghead:
It has also been mentioned by a couple of others who fail to grasp the concept of a placeholder.
Sheesh, no need to get bitchy.
I've spent many years working in online retail/WS so am well aware of what placeholders are. It's simply another indicator that the entire thing could be fake. Given some of the name formatting it's even more likely.
I didn't see any other examples posted so I put it up. Relax huh...
 
Joined
Mar 21, 2016
Messages
2,197 (0.74/day)
IPC will narrow the single threaded gap and widen the multi threaded advantages, so it's in AMD's interest to have a mix of emphasis on IPC as well as additional cores. It's also important in terms of efficiency, so it'll help them better compete in SFF and mobile market segments as well. Ryzen is actually better suited to a stronger IPC emphasis than Intel's current chip designs. Better IPC also means precision boost should be able to work even better in turn. I don't have much doubt about the 15% IPC gains on the new 7nm Ryzen chips. I think it was Lisa Su that alluded to it in the first place in "certain" workloads compared to 14nm Ryzen. It's really hard to say what it'll be on average in terms of IPC gain, but I'd suspect around 10-12.5% gains to be had. I don't think Intel's IPC gains over the last decade reflect at all the potential for larger IPC gains to be had from AMD. For starters, AMD being further behind at present in terms of IPC just means they've got a wider gap to improve upon; it would be a lot harder in Intel's current position to have gains that large, comparatively speaking. It's also readily obvious based on Intel's own designs that there is little reason why AMD can't follow suit and improve some of the weaker points in its designs relative to Intel's.

People that are being hard headed about it since it's AMD rather than Intel are being foolish, is the bottom line. It shouldn't really be any harder for AMD to improve its IPC than it is for Intel to copy its chiplet approach, is how I see it. The one clear difference is Intel obviously has a larger budget to work from, but AMD today isn't the same cash-strapped, mismanaged company from a decade ago that also had to contend with Intel's anti-competitive behavior on top of all of that at the time. I've got a lot of faith 7nm Ryzen will be great overall and the closest thing to AMD64 performance and competitiveness out of AMD on the CPU side since that point in time. I'm sure Intel will bounce back and do so aggressively, but we could see a good boxing match between the two companies in the next 5-6 years if I had to guess.
 
Joined
Apr 12, 2013
Messages
6,749 (1.68/day)
Well, I'm precisely mentioning 2990WX because it represents a similar scenario to how Zen2 will work.
And not just the 16-core variant. Every Zen2 processor will have to make an additional "hop" because the whole communication with memory will be done via the I/O die. No direct wires.
Some of this will be mitigated by huge cache, but in many loads the disadvantage will be obvious. You'll see soon enough.

Also, I understand many people are waiting for Zen2 APUs (8 cores + Navi or whatever). Graphics memory will also have to be accessed via the I/O die. Good luck with that.
That's not true either & I suspect you know it. This is zen 2 ~


This is TR 2 ~



The IO die is strategically placed between the zen 2 dies & there is no additional hop, though admittedly we don't know how TR3 will look, but I'd be seriously disappointed if AMD repeated this approach of disabling entire memory channels for a couple of dies!
 
Joined
May 31, 2016
Messages
4,324 (1.50/day)
Location
Currently Norway
System Name Bro2
Processor Ryzen 5800X
Motherboard Gigabyte X570 Aorus Elite
Cooling Corsair h115i pro rgb
Memory 16GB G.Skill Flare X 3200 CL14 @3800Mhz CL16
Video Card(s) Powercolor 6900 XT Red Devil 1.1v@2400Mhz
Storage M.2 Samsung 970 Evo Plus 500MB/ Samsung 860 Evo 1TB
Display(s) LG 27UD69 UHD / LG 27GN950
Case Fractal Design G
Audio Device(s) Realtec 5.1
Power Supply Seasonic 750W GOLD
Mouse Logitech G402
Keyboard Logitech slim
Software Windows 10 64 bit
This zen2 looks amazing. If this is true then Intel is in trouble. Maybe Jim from AdoredTV was right. Reaching 5GHz for Ryzen is outstanding. I might be changing my CPU soon for that 12-core monster :)
 
Joined
Sep 17, 2014
Messages
20,917 (5.97/day)
Location
The Washing Machine
Processor i7 8700k 4.6Ghz @ 1.24V
Motherboard AsRock Fatal1ty K6 Z370
Cooling beQuiet! Dark Rock Pro 3
Memory 16GB Corsair Vengeance LPX 3200/C16
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Samsung 850 EVO 1TB + Samsung 830 256GB + Crucial BX100 250GB + Toshiba 1TB HDD
Display(s) Gigabyte G34QWC (3440x1440)
Case Fractal Design Define R5
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse XTRFY M42
Keyboard Lenovo Thinkpad Trackpoint II
Software W10 x64
Well, I'm precisely mentioning 2990WX because it represents a similar scenario to how Zen2 will work.


Also, I understand many people are waiting for Zen2 APUs (8 cores + Navi or whatever). Graphics memory will also have to be accessed via the I/O die. Good luck with that.

I'm astounded by your logic sometimes. 2990WX was the worst, most situational performing TR part of the whole line up, and you think they straight up copy paste that design to a whole CPU stack to make sure it sucks just as hard.

Yep. AMD offer you that engineering job yet?
 
Joined
Jun 28, 2016
Messages
3,595 (1.26/day)
I'm astounded by your logic sometimes. 2990WX was the worst, most situational performing TR part of the whole line up, and you think they straight up copy paste that design to a whole CPU stack to make sure it sucks just as hard.
Kind of. TR are limited by memory access. Too few channels for so many cores.
To use all cores, you have to configure it as NUMA, basically adding a layer (a "hop") that centralizes memory access.

To limit latency and get better performance in interactive software (like games), you could have run it in "game mode", which uses just 8 cores.

Zen2 Ryzen may be subject to similar treatment.

Another thing is the ratio of cores to memory channels, which could give a 16-core Ryzen problems similar to those the 32-core Threadripper had.

Yep. AMD offer you that engineering job yet?
I don't know why you keep writing this (and why AMD this time? I preferred Intel!). What's the point?
 
Joined
Sep 17, 2014
Messages
20,917 (5.97/day)
Kind of. TR are limited by memory access. Too few channels for so many cores.
To use all cores, you have to configure it as NUMA, basically adding a layer (a "hop") that centralizes memory access.

To limit latency and get better performance in interactive software (like games), you could have run it in "game mode", which uses just 8 cores.

Zen2 Ryzen may be subject to similar treatment.

Another thing is the ratio of cores to memory channels, which could give a 16-core Ryzen problems similar to those the 32-core Threadripper had.


I don't know why you keep writing this (and why AMD this time? I preferred Intel!). What's the point?

It's tongue-in-cheek really because of the things you write. Half of it is absolutely true and then the other half makes zero sense.
 
Joined
Mar 21, 2016
Messages
2,197 (0.74/day)
AMD should make a TR based APU setup with two APUs flanked by an I/O hub and HBM on opposite sides of them both, and twin NVMe M.2 hardwired to it on the reverse side of the CPU socket on the motherboard. That setup would probably be a screamer. The NVMe devices would be for HBCC, and along with the quad channel and HBM they could have tiered storage managed by the I/O die itself between the two APU dies. Hopefully for Zen3 AMD does something along that line among other things.

I'd be shocked if AMD doesn't make a post-process die for scaling/denoise and other stuff at some point. Essentially, RTX/tensor cores for Turing are basically those two things. It wouldn't be a bad idea for AMD to have a die that can do those things on the fly quickly and efficiently for its APUs, and even for TR/Epyc, which were actually pretty quick at ray tracing and could be quicker if they had specialized instruction sets or dies to do some of those things better.
 
Joined
Jun 10, 2014
Messages
2,900 (0.81/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
The IO die is strategically placed between the zen 2 dies & there is no additional hop, though admittedly we don't know how TR3 will look, but I'd be seriously disappointed if AMD repeated this approach of disabling entire memory channels for a couple of dies!
Technically there is an additional "hop" in Zen 2:
Zen(1): Die -> Memory (best case or single die)
Zen(1): Die -> Die -> Memory (worst case)
Zen 2: Die -> IO controller -> Memory

Zen 2 should at least be more consistent, and benchmarks will reveal the actual latencies and performance penalties. But thinking that Zen 2 will have no such issues is naive.
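To make the comparison concrete, the three paths listed above can be written down and their intermediate stops counted. This is only a sketch of the topology as described in this post; it says nothing about the latency of each link, which only benchmarks can show. The names `PATHS` and `intermediate_hops` are made up for the example.

```python
# Memory access paths as listed in the post; each arrow is one link.
PATHS = {
    "Zen 1, best case": ["Die", "Memory"],
    "Zen 1, worst case": ["Die", "Die", "Memory"],
    "Zen 2": ["Die", "IO controller", "Memory"],
}


def intermediate_hops(path):
    """Number of stops between the requesting die and memory."""
    return len(path) - 2


for name, path in PATHS.items():
    hops = intermediate_hops(path)
    print(f"{name}: {' -> '.join(path)} ({hops} intermediate hop(s))")
```

Seen this way, Zen 2 always pays for one intermediate stop, while Zen 1 pays for zero or one depending on which die owns the memory. That is the "more consistent" part.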
 

0x6A7232

New Member
Joined
May 3, 2019
Messages
11 (0.01/day)
What are your thoughts on this? (software / hardware taking single-threaded code and distributing it using AI)
At 8 minutes in:
 
Joined
Jun 10, 2014
Messages
2,900 (0.81/day)
What are your thoughts on this? (software / hardware taking single-threaded code and distributing it using AI)
At 8 minutes in:
Aaah, Amdahl's law, I always cringe when I hear people talking about it.
In most cases it doesn't matter how much of the code is parallel or not, but how much of the execution time is spent on which part of the code, i.e. in some cases 99% of the execution time is spent in 1% of the code.
A much better way of thinking of it (even for non-coders) is how many tasks/subtasks/work chunks can be done independently, because you can scale into hundreds if not thousands of threads as long as each thread doesn't need to be synchronized, and it's the synchronization between cores which kills your performance. A good example of a workload which scales this way is a web server which spawns a thread per request, or a software renderer which splits parts of the scene up into separate worker threads. Workloads like this can scale well with very high core counts, but they do so because each thread essentially works on its own subtask, which is also why Amdahl's law is irrelevant; if anything it has to be applied on this level, not the application level.
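A small sketch of the structural difference being described, assuming a toy task (summing squares) and hypothetical function names. The first version funnels every update through one shared lock; the second gives each worker its own chunk and its own partial result, with a single merge at the end.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_squares_synchronized(items, workers=4):
    """Every update goes through one shared lock, so threads wait on each other."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        for x in chunk:
            with lock:  # synchronization point on every single item
                total += x * x

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, split(items, workers)))
    return total


def sum_squares_independent(items, workers=4):
    """Each worker owns its chunk and partial result; one merge at the end."""
    def work(chunk):
        return sum(x * x for x in chunk)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, split(items, workers)))


data = list(range(1000))
```

Both functions return the same answer; only the amount of cross-thread coordination differs. Note that CPython's global interpreter lock means this example shows the structure rather than a realistic speed difference. In a language with truly parallel threads, the lock-per-item version is the one that stops scaling.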

Most real world applications are highly synchronized, and it has little to do with the skills or willingness of the developers, but the nature of the task the application solves, and as I will get back to, the overall structure of the codebase. The "heavy" part of most applications is usually an algorithm where the application is stuck in a loop before it proceeds, but most applications are written in overly complex and abstracted codebases, making it nearly impossible to separate out algorithms, and the CPU also spends most cycles stuck idling due to the bloat. The first step to optimize the code is always to make it more dense, remove all possible abstractions and make it cache optimized. Then it usually becomes obvious at which level the task can be split into subtasks and potentially even multiple threads. I usually refer to this tendency among developers and "software architects" to overcomplicate and abstract things as a "disease".
-
Back to the video you referred to;
Well, it will be "impossible" to take an existing thread and split it up across multiple cores on an OS scheduling level; by "impossible" I mean impossible in real time and without a slowdown of 10,000x or more.
What this video appears to be showing is just "smarter" OS scheduling. Even if you make your glorious single threaded application, it probably relies on system calls or libraries which will span multiple threads. If your application relies heavily on this kind of interaction, your slowdown may actually be OS scheduling overhead, not overhead within your application. Most desktops and even mobile devices these days run hundreds of tiny background threads which constantly "disturb" the scheduler with "unnecessary" scheduling overhead. If only the OS could better prioritize which threads are waiting for each other etc., you could get a huge improvement in performance for certain use cases. But this, as with anything, is just trying to remove a bottleneck, not actually making code more parallel, so the scaling here will also decline with core count.

But tweaking kernel scheduling is not new, it's well known in the industry. In Linux you can choose between various schedulers which have their pros/cons depending on workload. One example is the "low latency" kernel offered as an option in some Linux distributions, which schedules more frequently and is more aggressive in prioritizing threads, which has a huge impact on latencies in some thread-heavy workloads. There is probably more potential for smarter schedulers which use more statistics for the thread allocation, or "AI" as they call it these days.

As for optimizations in hardware, CPUs already do instruction level parallelism, and Intel CPUs since the first Pentium have been superscalar. The automatic optimizations today are however very limited due to branching in code. Even with branch prediction, CPUs are pretty much guaranteed a stall after just a few branching instructions, which is why most applications are actually stalled 95-99% of the time. If however the CPU was given more context and able to distinguish branching which only affects the "local scope(s)" (which is probably what they mean by "threadlets" in the video) and branching which affects the control flow of the program, then we could see huge improvements in performance, 2-3x is quite possible in the long term.
 
Joined
Jan 15, 2018
Messages
55 (0.02/day)
Ryzen 3800X will not only fuck up all Xeon E CPUs but also affect Xeon D & Xeon W market.
With Ryzen 3800X & ECC UDIMM we can easily build a 16C32T workstation superior to all I mentioned above.
 
Joined
Jun 28, 2016
Messages
3,595 (1.26/day)
Ryzen 3800X will not only fuck up all Xeon E CPUs but also affect Xeon D & Xeon W market.
With Ryzen 3800X & ECC UDIMM we can easily build a 16C32T workstation superior to all I mentioned above.
AFAIK no AM4 Ryzen to date had ECC certification. Grow up.

Not to mention these are very different CPUs and different platforms. No unintentional forced sex is going to happen.
 
Joined
Sep 22, 2012
Messages
1,010 (0.24/day)
Location
Belgrade, Serbia
System Name Intel® X99 Wellsburg
Processor Intel® Core™ i7-5820K - 4.5GHz
Motherboard ASUS Rampage V E10 (1801)
Cooling EK RGB Monoblock + EK XRES D5 Revo Glass PWM
Memory CMD16GX4M4A2666C15
Video Card(s) ASUS GTX1080Ti Poseidon
Storage Samsung 970 EVO PLUS 1TB /850 EVO 1TB / WD Black 2TB
Display(s) Samsung P2450H
Case Lian Li PC-O11 WXC
Audio Device(s) CREATIVE Sound Blaster ZxR
Power Supply EVGA 1200 P2 Platinum
Mouse Logitech G900 / SS QCK
Keyboard Deck 87 Francium Pro
Software Windows 10 Pro x64
Intel finds it harder and harder to compete with new versions of the AMD Zen core.
When you add price on top, it really becomes ugly to invest in, for example, an Intel 10-core, or even an i9-9900K.
 

TheLostSwede

News Editor
Joined
Nov 11, 2004
Messages
16,056 (2.26/day)
Location
Sweden
System Name Overlord Mk MLI
Processor AMD Ryzen 7 7800X3D
Motherboard Gigabyte X670E Aorus Master
Cooling Noctua NH-D15 SE with offsets
Memory 32GB Team T-Create Expert DDR5 6000 MHz @ CL30-34-34-68
Video Card(s) Gainward GeForce RTX 4080 Phantom GS
Storage 1TB Solidigm P44 Pro, 2 TB Corsair MP600 Pro, 2TB Kingston KC3000
Display(s) Acer XV272K LVbmiipruzx 4K@160Hz
Case Fractal Design Torrent Compact
Audio Device(s) Corsair Virtuoso SE
Power Supply be quiet! Pure Power 12 M 850 W
Mouse Logitech G502 Lightspeed
Keyboard Corsair K70 Max
Software Windows 10 Pro
Benchmark Scores https://valid.x86.fr/5za05v