Why not one really big core?

hat

I've seen many die shots detailing sections of a processor, and I'm sure many of us here have seen the same shots. All these chips with 4 or more cores... what if they just designed all that to be one big single core?
 

newtekie1

Eventually, no matter how big you make a single core, you reach the point where making it bigger and more complex doesn't really improve performance. So we need multi-core to increase processing power.

The reason all the cores aren't just clustered together into one big core is ease of reconfiguration. It makes it easy to disable parts of the CPU to make lower-end CPUs, or even to cut out part of the die to make a lower-end CPU. They design the big CPU first, knowing they are going to carve lower models out of that die. So not having all the cores merged into one makes this easier.
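A minimal sketch of why separate cores make salvaging easier, assuming (hypothetically) that each core's area is defect-free 90% of the time and that defects hit cores independently. `prob_at_least_good_cores` is an illustrative name, and the numbers are not from any real product.

```python
from math import comb

def prob_at_least_good_cores(n_cores: int, k_needed: int, core_yield: float) -> float:
    """Probability that at least k_needed of n_cores are defect-free.

    Assumes every core fails independently with the same probability (a binomial model).
    """
    return sum(
        comb(n_cores, k) * core_yield**k * (1 - core_yield) ** (n_cores - k)
        for k in range(k_needed, n_cores + 1)
    )

# Hypothetical 8-core die, each core region defect-free 90% of the time.
all_eight = prob_at_least_good_cores(8, 8, 0.9)    # sellable as the full 8-core part
six_or_more = prob_at_least_good_cores(8, 6, 0.9)  # sellable as 8-core or as a cut-down 6-core part
print(f"all 8 cores good: {all_eight:.3f}")
print(f"at least 6 cores good: {six_or_more:.3f}")
```

If the same area were one big merged core, any single defect would hit it, so it would behave like the `all_eight` case (about 0.43 here) with far less to salvage.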
 
Joined Oct 2, 2004
That's not how chips, or more specifically processors, work.

Imagine that one core is an 8-lane highway. You can tweak it a bit to make traffic on it go faster back and forth (frequency and compute stages, aka pipeline length), and maybe you can add 2 more lanes, but then it becomes too impractical because it's too big. So adding more lanes isn't possible anymore, and you can't make the cars on it go at the speed of sound either. So what do you do? You route another highway between the same places. And you build a second highway from some other direction. And a third one. And a fourth. Etc. That's basically a multicore processor: a stack of fast highways, because making one massive highway is impractical, while routing many smaller ones isn't.

It's the same reason workstation and compute-cluster processors run at relatively low frequencies but have shit tons of cores. Part of it is reliability, because high clocks are more difficult to keep stable, and part of it is that stupidly high clocks only give you so much gain, whereas every added core, even at slower clocks, gives huge gains. So they just stack more and more cores together to get huge amounts of traffic through without all the inconveniences and impracticalities. It's similar with GPUs. Technically, with GPUs, every shader is a core, and modern ones have them in four-digit figures. It just works better than having 8 shaders that would have to run at the speed of light to be as efficient as those 4096 shaders running at just 1.6 GHz.

What is limiting us is frequency and how electrons behave at such frequencies. The higher you go, the more problems you run into. It's why we aren't seeing any 20 GHz processors; instead we are somewhat capped at 5 GHz. Anything beyond that requires extreme cooling to ease off the electrical issues we start facing in those scenarios. Just like you can't have cars going at the speed of sound on an 8-lane highway, you can't make processors go at 20 GHz. The natural way to overcome that is to add more of the slower parts and stack them up.

That's about as far as I could dumb it down, so it should be easy to understand even for non-techy people. Hope it helps.
 

qubit

One big reason is the end of Moore's Law, which stopped clock speeds from climbing ever higher the way they did until around 2003.

We should have been running at something like 15-20GHz if it had continued to scale.
 
Joined Feb 8, 2012
CPUs are traditionally made for sequential algorithms that can be translated into an ordered stream of instructions.
Generally speaking, execution of any sequential algorithm always depends on a previous step because of the nature of a sequence, but there are also always many parts inside those steps that are mutually independent. More so, when compiled, depending on how complex the instruction set is, there is a lot of instruction-level parallelism to be extracted from the machine code.
Superscalar processors (every desktop CPU from this millennium) execute instructions in parallel on a single core in a single thread. Add to that thread-level parallelism via SMT/HT.
How much instruction/thread-level parallelism can be extracted from code on an architecture depends on the code and the architecture. It's always limited and not constant.
How many full cores can be added for true parallelism is much less limiting, and the scaling is constant.
Having one huge core would mean being able to exploit only instruction/thread-level parallelism and not being able to run multiple independent hardware threads concurrently. Single-thread performance might even be OK, but it would be one power-hungry CPU with lots of powered but unused die area, and it would never see 100% usage.
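One way to see why extractable instruction-level parallelism is "always limited": even an infinitely wide core cannot finish faster than the longest chain of dependent instructions. The sketch below assumes one-cycle latency for every instruction and unlimited execution units; `ilp_bound` and the instruction names are made up for illustration.

```python
def ilp_bound(deps: dict[str, list[str]]) -> tuple[int, float]:
    """Return (critical path length in cycles, upper bound on ILP) for a dependency graph.

    deps maps each instruction name to the instructions whose results it needs.
    Assumes one-cycle latency per instruction and unlimited execution units.
    """
    depth: dict[str, int] = {}

    def finish_cycle(instr: str) -> int:
        # Earliest cycle this instruction can complete: one after its latest input.
        if instr not in depth:
            depth[instr] = 1 + max((finish_cycle(d) for d in deps[instr]), default=0)
        return depth[instr]

    critical_path = max(finish_cycle(i) for i in deps)
    return critical_path, len(deps) / critical_path

# Hypothetical snippet: a and b are independent, c needs both, d needs c,
# while e -> f is a separate, independent chain.
example = {
    "a": [], "b": [], "c": ["a", "b"], "d": ["c"],
    "e": [], "f": ["e"],
}
path, ilp = ilp_bound(example)
print(f"critical path: {path} cycles, ILP bound: {ilp:.2f}")
```

Six instructions over a three-cycle critical path cap this snippet at an average of 2 instructions per cycle no matter how wide the core is; separate cores sidestep that ceiling by running independent threads with their own dependency chains.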
 

silentbogo

> what if they just designed all that to be one big single core?
One big drawback: context switching.
Each CPU core has registers and ALU(s)/FPU(s) that are available to a single program (thread) at a time. Every time you switch to another thread, you need to dump the contents of all registers and the current CPU state (status register) onto the stack (either memory, or cache if used as memory), load all of the above for thread #2, and keep on going with execution. If you've done any assembly programming, or are at least familiar with basic CPU inner workings, you should know that anything involving reading or writing memory is slo-o-o-ow compared to executing other instructions. This is a significant delay, considering that modern PCs run hundreds or even thousands of threads. Hyper-Threading just gives you the ability to kind of run two threads simultaneously, given that one thread won't use all of the core's resources at once, but it still doesn't eliminate the problem of context switching.
Having a quad-core CPU cuts the number of context switches each core has to handle by roughly a factor of 4, and you don't want to sacrifice that in favor of some mythical mega-single-core performance.
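A toy model of the context-switching point: the sketch below pins threads to cores round-robin (no migration), gives each thread a fixed number of equal time slices, and counts a switch whenever consecutive slices on a core belong to different threads. The function and its parameters are hypothetical; real schedulers (variable time slices, priorities, migration, idle and blocked threads) are far more complex.

```python
from collections import deque

def round_robin_switches(num_threads: int, slices_per_thread: int, num_cores: int) -> list[int]:
    """Count context switches per core in a toy round-robin scheduler.

    Thread i is pinned to core i % num_cores and never migrates. Each core runs its
    threads one time slice at a time; a switch is counted whenever two consecutive
    slices on the same core belong to different threads.
    """
    per_core = []
    for core in range(num_cores):
        run_queue = deque((tid, slices_per_thread) for tid in range(core, num_threads, num_cores))
        last_tid = None
        switches = 0
        while run_queue:
            tid, remaining = run_queue.popleft()
            if last_tid is not None and tid != last_tid:
                switches += 1  # save one thread's registers/state, restore another's
            last_tid = tid
            if remaining > 1:
                run_queue.append((tid, remaining - 1))  # not finished: back of the queue
        per_core.append(switches)
    return per_core

# 8 threads, each needing 10 time slices (hypothetical numbers).
print("1 core: ", round_robin_switches(8, 10, 1))
print("4 cores:", round_robin_switches(8, 10, 4))
```

In this toy model one core performs 79 switches, while each of four cores performs 19: roughly a factor-of-4 reduction per core, even though the total across all four cores (76) barely changes.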
 
Joined Jan 8, 2017
There are two ways to make a CPU core faster: make it execute instructions faster, or make it execute more instructions concurrently. The limitations of the first are pretty obvious: clock speed is limited, and at this point pretty much all instructions are as efficient as they can possibly get.

Making a CPU execute things concurrently is more complicated, because there is only so much instruction-level parallelism you can extract at any given moment.

Modern CPUs have something called an "instruction window", which basically says how many instructions the CPU can look at ahead of time to figure out which ones can be executed in parallel. The problem is that the larger the instruction window, the less effective it gets, because programs branch very often. There is something called a "branch predictor" to deal with that, but again, it has its limitations as well. Essentially, you can't realistically look ahead through all the programs you have to run, make the right decisions so that you can execute everything at once, and get it right often enough to make it work.

So, to wrap it up, making one big core would involve a lot of diminishing returns, so it's not really worth it.
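The branch predictor mentioned above can be illustrated with the classic textbook 2-bit saturating counter. This is a sketch of that textbook scheme only, not of any shipping CPU's predictor (modern ones are far more elaborate); the function name and branch patterns are made up for illustration.

```python
def two_bit_predictor_accuracy(outcomes: list[bool], start_state: int = 2) -> float:
    """Simulate one 2-bit saturating counter predicting a single branch.

    States 0-1 predict 'not taken', states 2-3 predict 'taken'. The counter moves
    one step toward each actual outcome and saturates at 0 and 3.
    """
    state = start_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Nudge the counter toward what actually happened.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

# A loop branch taken 9 times then falling through, repeated 10 times.
loop_pattern = ([True] * 9 + [False]) * 10
# A branch that strictly alternates, which this simple scheme handles badly.
alternating = [True, False] * 50

print(f"loop branch: {two_bit_predictor_accuracy(loop_pattern):.0%}")
print(f"alternating branch: {two_bit_predictor_accuracy(alternating):.0%}")
```

Every misprediction means the work speculatively fetched into the instruction window has to be thrown away, which is one reason a bigger window runs into diminishing returns.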

> We should have been running at something like 15-20GHz if it had continued to scale.

That's a common misconception. Clock rate did scale pretty much how everyone expected it to; the problem is that core width and core count did as well, and that's why you don't see 20 GHz processors. I am pretty sure you could have a 20 GHz 80186 if you wanted to.
 
Joined Feb 8, 2012
> I am pretty sure you could have a 20 GHz 80186 if you wanted to.
Nope, there's this thing called transistor switching speed, and even if silicon FinFETs were that fast, the pipeline would have to be unmanageably deep.
 
Joined Jan 8, 2017
> Nope, there's this thing called transistor switching speed, and even if silicon FinFETs were that fast, the pipeline would have to be unmanageably deep.

:) I know, but what I was trying to say is that clock speed would have increased by a lot more if there hadn't been other additions over time that introduced more concerns about power consumption, on-die latency, etc.
 
Joined Nov 13, 2007
token inaccurate and unrelated car analogy: It would be like having a one cylinder engine, bro.
 

Aquinus

> token inaccurate and unrelated car analogy: It would be like having a one cylinder engine, bro.
This. It's like saying a 1-cylinder engine that's 2.0L large is just as good as a 4-cylinder engine with the same displacement. In theory, they can both breathe the same amount of air, which means they can produce the same amount of power, but you're not considering things like how balanced the engine is and how smooth the power delivery is. The part that's actually similar is engine balance. A big single-cylinder engine would need huge balance shafts to counteract the movement of the single piston and connecting rod. This limits the RPM, so while it can breathe the same, you're limited by the speed at which the engine can run. CPUs work in a similar way, in the sense that a larger monolithic circuit is less likely to run at higher frequencies because of losses as circuit length increases, as well as latency, which has to be considered because electricity can't travel faster than the speed of light (and in practice signals travel considerably slower).

So yes, you could build a huge monolithic core. You would get something like what SPARC has (e.g. the UltraSPARC T1), with a crap ton of integer cores paired with a single FPU. The result is that some operations can be executed effectively in parallel, but it does absolutely nothing for clock speeds. In fact, I would definitely say it gets worse. Generally speaking, I believe smaller cores yield better results and have a much better opportunity to scale. Large dies don't, and they have the added issue of lower yields, since it's not like you can disable one bad core for lower models. All in all, it's not a cost-effective or good way to improve performance; it's just a bad idea.
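To put rough numbers on the speed-of-light point: the sketch below computes how far a signal could travel during one clock period. The 0.5c propagation speed is an arbitrary assumption for illustration; real on-chip wires are typically limited by resistance and capacitance and are effectively much slower, so treat these figures as generous upper bounds.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre

def distance_per_cycle_mm(clock_ghz: float, fraction_of_c: float = 1.0) -> float:
    """Distance (in mm) a signal can travel during one clock period.

    fraction_of_c is an assumed propagation speed relative to light in vacuum.
    """
    period_s = 1 / (clock_ghz * 1e9)
    return SPEED_OF_LIGHT_M_PER_S * fraction_of_c * period_s * 1000

for ghz in (1, 5, 20):
    print(f"{ghz:>2} GHz: {distance_per_cycle_mm(ghz):6.1f} mm at c, "
          f"{distance_per_cycle_mm(ghz, 0.5):6.1f} mm at 0.5c")
```

Even at light speed a 20 GHz cycle leaves only about 15 mm, so a physically bigger core spends a larger share of each cycle just moving signals between its units.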
 
Joined Sep 17, 2014
For the same reason two hands can do more than one, or two sets of eyes see more than one.

We don't need car analogies :)
 

Aquinus

> For the same reason two hands can do more than one, or two sets of eyes see more than one.
>
> We don't need car analogies :)
Well, I think that's even more inaccurate, because 1 core can do everything two cores can do. There are tasks that (typically) require 2 hands to be done effectively. Same thing with sight: two eyes help with things like depth perception. The car analogy works because, like engines, there are consequences to making things too big.
 
Joined Sep 17, 2014
> Well, I think that's even more inaccurate, because 1 core can do everything two cores can do. There are tasks that (typically) require 2 hands to be done effectively. Same thing with sight: two eyes help with things like depth perception. The car analogy works because, like engines, there are consequences to making things too big.

Nope, 1 core can't do everything simultaneously the way two cores can; it has limited resources/access capabilities that a single type of task can saturate. Also, there is the sequential nature of things, and with two cores you can do two sequential things side by side.
 

Aquinus

> Nope, 1 core can't do everything simultaneously the way two cores can; it has limited resources/access capabilities that a single type of task can saturate. Also, there is the sequential nature of things, and with two cores you can do two sequential things side by side.
...it can do the same tasks. It might take longer, but it's not incapable of doing them. When you lose an eye, you lose something: the ability to perceive a certain amount of depth that you had before. More cores just mean you can do more at once; you're not gaining the ability to do something you couldn't do before.

In a technical sense, a single core is still a Turing-complete computation engine. Adding cores doesn't change that.
 

FordGT90Concept

Some great responses above. To them, I'll add one point: processors are a race against time. The more work a processor can do in a given timeframe, the more powerful it is. Because of physics, it is impossible to make one logic circuit do the work of 16+ in parallel within the same power envelope.

To extend that: a clock cycle is literally a window in which a processor can perform an action. If you have a dual core running at 1 GHz, it would take a single core at approximately 1.8 GHz to equal it (assuming 90% efficiency to account for thread-management overhead). Quad core: 3.6 GHz. Octo core: 7.2 GHz. I'm already up to clock speeds that aren't practically achievable, and we already have octo-cores running close to 4 GHz, which would require a single core operating at a staggering 28.8 GHz to match. There simply isn't enough time in a second to cycle a processor 28.8 billion times without parallelism.
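The arithmetic above is a simple linear model; here it is as a small function, using the post's own 90% efficiency figure. `equivalent_single_core_ghz` is a hypothetical name, and the model ignores IPC differences, memory bottlenecks and work that cannot be split across threads.

```python
def equivalent_single_core_ghz(cores: int, clock_ghz: float, efficiency: float = 0.9) -> float:
    """Clock a single core would need to match `cores` cores running at `clock_ghz`.

    Linear model from the post: total core-GHz scaled by an assumed efficiency
    (default 90%) to account for thread-management overhead.
    """
    if cores == 1:
        return clock_ghz  # nothing to coordinate, so no overhead in this model
    return cores * clock_ghz * efficiency

for cores, ghz in [(2, 1.0), (4, 1.0), (8, 1.0), (8, 4.0)]:
    needed = equivalent_single_core_ghz(cores, ghz)
    print(f"{cores} cores @ {ghz} GHz ~ 1 core @ {needed:.1f} GHz")
```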
 

FreedomEclipse

> I've seen many die shots detailing sections of a processor, and I'm sure many of us here have seen the same shots. All these chips with 4 or more cores... what if they just designed all that to be one big single core?

I'm more in the AMD camp... They know how to make dual-, quad- and hexa-core CPUs, so why not apply that knowledge and make dual-, quad- or hexa-core GPUs?
 

FordGT90Concept

Overhead. SLI/Crossfire worked by sending all the things to all the cards and the driver instructing each card what to do with the data individually. This approach is somewhat efficient at doing work but wasteful on cache resources.

If they're going to make SLI/Crossfire work in silicon, they need something like Vega's HBCC to take care of the cache management. HBCC was a necessary step towards Navi which is a dual (or more) core GPU.
 
Joined Jan 8, 2017
Apple is a prime example of designing wider and wider cores compared to other ARM or semi-custom cores. Each year they extract more and more performance without much increase in power consumption and without hurting battery life.

But it's not free: they most likely pay more than any competitor for their dies, as they are bigger and more prone to defects. And it's not just that; at some point this tactic will come back to bite them the way NetBurst did Intel. They will reach a point where they can't increase their core size by much, and they will end up with an incredibly complicated monster of a core whose power consumption will skyrocket with even a minimal increase in clock speed.
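The "bigger dies are more prone to defects" point can be sketched with the simplest textbook yield model, in which defects land randomly and any single defect ruins the die. The defect density below is a made-up illustration value, and foundries use more elaborate models than this one.

```python
import math

def poisson_die_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to have zero defects: Y = exp(-D * A) (simple Poisson model)."""
    return math.exp(-defects_per_cm2 * die_area_cm2)

DEFECT_DENSITY = 0.1  # hypothetical defects per cm^2, for illustration only

for area in (1.0, 2.0, 4.0):
    print(f"{area:.0f} cm^2 die: {poisson_die_yield(area, DEFECT_DENSITY):.1%} defect-free")
```

In this model a bigger die always yields a smaller fraction of good parts, and every scrapped large die also wastes more silicon.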
 

qubit

> That's a common misconception. Clock rate did scale pretty much how everyone expected it to; the problem is that core width and core count did as well, and that's why you don't see 20 GHz processors. I am pretty sure you could have a 20 GHz 80186 if you wanted to.
Not really. The problem is the amount of power consumed by the CPU as the clock speed goes up, which rises massively. Yes, you can get a very simple chip to run at that speed, but I doubt even a 486 would run that quickly without consuming masses of power and putting out gobs of heat.

The original Pentium 4 was intended to run at 10GHz and above on later models, but that didn't happen due to this problem. If you Google Pentium 4, I'm sure you'll find articles that explain this issue. The only way to go after that was sideways, ie multicore.

From what I can see, a slower multicore is better than a faster single core, because it's harder to bottleneck it, so this situation is probably not altogether a bad thing.
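The point about power can be put in rough numbers with the standard dynamic-power relation, switching power proportional to capacitance × voltage² × frequency. This sketch assumes a made-up 1.5x voltage increase for a 2x clock and ignores leakage (static) power; both choices are illustrative, not measured.

```python
def relative_dynamic_power(freq_ratio: float, voltage_ratio: float) -> float:
    """Switching power relative to a baseline, from P ~ C * V^2 * f with C held constant."""
    return voltage_ratio**2 * freq_ratio

# Hypothetical: doubling the clock needs 1.5x the voltage to stay stable.
one_core_double_clock = relative_dynamic_power(freq_ratio=2.0, voltage_ratio=1.5)
# Two cores at the original clock and voltage: each draws baseline power.
two_cores_base_clock = 2 * relative_dynamic_power(freq_ratio=1.0, voltage_ratio=1.0)
print(f"1 core at 2x clock:  {one_core_double_clock:.2f}x power")
print(f"2 cores at 1x clock: {two_cores_base_clock:.2f}x power")
```

Both options aim at roughly 2x throughput (the two-core case only when the work splits across threads), but in this model the higher-clocked core draws more than twice as much power as the two-core option.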
 
Joined Jul 25, 2006
> All these chips with 4 or more cores... what if they just designed all that to be one big single core?
Because BY FAR most computing tasks (number-crunching tasks) are tiny. When you hand off 4 small tasks to 4 small cores, those tasks are completed in the same amount of time a big single core takes to complete just 1 of them. This is because if you have only one core, you have to hand those tasks off one at a time, sequentially, waiting for the first to be completed before even starting on the second.

And as qubit and others noted, Moore's Law applies. Processors can only work so fast (the laws of physics get in the way). Those 4 small cores are working at the same clock speed as one big core.

So because there are 4 small cores vs 1 big core, and because most tasks are tiny, even if the 4-core processor has a slower clock speed, for most computing jobs it will be faster than a faster-clocked 1-core processor.

Remember too while you, the user, may only be doing one thing at a time with your computer, the operating system is doing many things at once. It is multi-tasking big time. It is managing memory, updating graphics information, scanning every I/O port, waiting for mouse and keyboard input, checking the Internet for updates, scanning for malware and so much more.
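An idealized sketch of the 4-small-tasks argument: tasks are assigned greedily to whichever core frees up first, and each core retires `clock_ghz` units of work per second. The function name, task sizes and clock speeds are hypothetical, and the model ignores scheduling overhead, shared caches and memory bandwidth.

```python
import heapq

def makespan(task_work: list[float], cores: int, clock_ghz: float) -> float:
    """Seconds to finish all tasks with greedy list scheduling on identical cores.

    Each task needs task_work[i] units of work (think: billions of cycles);
    a core retires clock_ghz such units per second.
    """
    core_free_at = [0.0] * cores  # min-heap: when each core next becomes free
    for work in task_work:
        start = heapq.heappop(core_free_at)  # earliest available core
        heapq.heappush(core_free_at, start + work / clock_ghz)
    return max(core_free_at)

four_small_tasks = [1.0, 1.0, 1.0, 1.0]
print(f"1 core  @ 4 GHz: {makespan(four_small_tasks, 1, 4.0):.2f} s")
print(f"4 cores @ 3 GHz: {makespan(four_small_tasks, 4, 3.0):.2f} s")
```

With a single task, though, `makespan([1.0], 1, 4.0)` (0.25 s) beats `makespan([1.0], 4, 3.0)` (about 0.33 s), which is why single-thread speed still matters for work that can't be split.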
 

cakehunter

Contrary to popular belief, you can multitask on 1 core... be it an old Athlon/Duron or P3/P4.
Running a game, playing an MP3, a text editor in the background, a browser, voice comms (okay, Skype is a bad example), managing plug-and-play USB: all of this was possible on 1 core without major slowdowns, if your processor was adequate for the task (adequate being relative, because you don't run a datacenter off today's i7 or Ryzen).

Power saving was introduced with the Athlon, I believe, and the P4? Because of Spectre/Meltdown, now we all know what branch prediction is, and it was already present on Athlons and P4s.
But I think more cores aren't necessarily the solution. Imagine 1 million cores, each core running a simple trivial task, say 1+1 or 1+2. Now this needs some management: seeing which core is free, which is running what, and what the result of each calculation is. It will almost need a separate managing processor. IIRC there is some stuff in x86 that always runs at 1 MHz (or was it 100 MHz?) and governs some internal stuff. Some popular reposting target: [image not preserved]
 

FordGT90Concept

> Imagine 1 million cores, each core running a simple trivial task, say 1+1 or 1+2. Now this needs some management: seeing which core is free, which is running what, and what the result of each calculation is. It will almost need a separate managing processor.
That's basically a GPU.
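The concern about managing a million tiny tasks can be put in rough numbers. This is a hypothetical model, not how any real CPU or GPU schedules work: a single serial manager pays a fixed bookkeeping cost per task, so past a certain core count the manager, not the cores, sets the pace.

```python
import math

def dispatch_bound_time(n_tasks: int, cores: int, work_s: float, dispatch_s: float) -> float:
    """Approximate runtime when one serial "manager" hands tiny tasks to many cores.

    Hypothetical model: the manager spends dispatch_s per task and hands out one task
    at a time, while each core needs work_s per task. With dispatch and execution
    overlapping, runtime is roughly whichever total is larger.
    """
    dispatch_total = n_tasks * dispatch_s
    compute_total = math.ceil(n_tasks / cores) * work_s
    return max(dispatch_total, compute_total)

# One million trivial tasks: 10 ns of work each, 1 ns of bookkeeping each (made-up figures).
for cores in (1, 10, 1_000, 1_000_000):
    t = dispatch_bound_time(1_000_000, cores, work_s=10e-9, dispatch_s=1e-9)
    print(f"{cores:>9} cores: {t * 1e3:.3f} ms")
```

Past about 10 cores in this example, extra cores change nothing: the manager becomes the bottleneck.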
 
Joined Jul 25, 2006
> Contrary to popular belief, you can multitask on 1 core... be it an old Athlon/Duron or P3/P4.
But that is not true multitasking. It just seems like multitasking because it is happening so fast. It is still juggling with one hand. You can only have one ball in the hand at once.
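The juggling analogy in miniature: the sketch below runs several tasks on a single thread by letting each do one step and then go to the back of the queue, so only one step is ever executing at a time. It uses Python generators and is cooperative (each task yields voluntarily), whereas real OS schedulers preempt tasks on a timer; `task` and `run_on_one_core` are made-up names.

```python
from collections import deque
from collections.abc import Iterator

def task(name: str, steps: int) -> Iterator[str]:
    """A hypothetical task that hands control back after every step."""
    for i in range(steps):
        yield f"{name}:{i}"

def run_on_one_core(tasks: list[Iterator[str]]) -> list[str]:
    """Round-robin the tasks on one thread: exactly one step runs at any moment."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))  # run one "time slice"
            ready.append(current)        # then go to the back of the queue
        except StopIteration:
            pass                         # task finished; drop it
    return trace

print(run_on_one_core([task("game", 3), task("mp3", 2), task("browser", 2)]))
```

The trace shows the tasks taking turns (game, mp3, browser, game, ...), which looks simultaneous only because each turn is short.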
 
Joined Mar 4, 2006
> Not really. The problem is the amount of power consumed by the CPU as the clock speed goes up, which rises massively. Yes, you can get a very simple chip to run at that speed, but I doubt even a 486 would run that quickly without consuming masses of power and putting out gobs of heat.
>
> The original Pentium 4 was intended to run at 10GHz and above on later models, but that didn't happen due to this problem. If you Google Pentium 4, I'm sure you'll find articles that explain this issue. The only way to go after that was sideways, ie multicore.
>
> From what I can see, a slower multicore is better than a faster single core, because it's harder to bottleneck it, so this situation is probably not altogether a bad thing.
It's not even solely a heat or power issue. The fact of the matter is the substrate simply can't handle those speeds, and "leakage" pops up long before you ever get there. It would be like saying you can take a small one-cylinder engine and compensate for its lack of cylinders and displacement by making it run at 100,000 rpm. Not only will that not yield the performance you're looking for, the engine will simply blow up, because you are running into a whole range of issues that simply don't exist below 10,000 rpm.
 