
Core i7 940 Review Shows SMT and Tri-Channel Memory Let-down

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.65/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
I think what it really boils down to is that the IMC in Nehalem is over-engineered and slower because of it. They designed the IMC to handle not just four cores, but eight, and maybe even more (as many as 72). I doubt we will see any significant changes to the IMC as far off as Sandy Bridge chips.

But yeah... maybe there is a bug in it. When AMD shifted to an IMC, they got a huge memory performance boost. Intel appears to be taking a hit instead. CIS is a more complex scheme but you'd think that would show through in more than just gaming benchmarks.

I just wonder how Nehalem runs with FB-DIMMs.
 

DarkMatter

New Member
Joined
Oct 5, 2007
Messages
1,714 (0.28/day)
Processor Intel C2Q Q6600 @ Stock (for now)
Motherboard Asus P5Q-E
Cooling Proc: Scythe Mine, Graphics: Zalman VF900 Cu
Memory 4 GB (2x2GB) DDR2 Corsair Dominator 1066Mhz 5-5-5-15
Video Card(s) GigaByte 8800GT Stock Clocks: 700Mhz Core, 1700 Shader, 1940 Memory
Storage 74 GB WD Raptor 10000rpm, 2x250 GB Seagate Raid 0
Display(s) HP p1130, 21" Trinitron
Case Antec p180
Audio Device(s) Creative X-Fi PLatinum
Power Supply 700W FSP Group 85% Efficiency
Software Windows XP
I think what it really boils down to is that the IMC in Nehalem is over-engineered and slower because of it. They designed the IMC to handle not just four cores, but eight, and maybe even more (as many as 72). I doubt we will see any significant changes to the IMC as far off as Sandy Bridge chips.

But yeah... maybe there is a bug in it. When AMD shifted to an IMC, they got a huge memory performance boost. Intel appears to be taking a hit instead. CIS is a more complex scheme but you'd think that would show through in more than just gaming benchmarks.

I just wonder how Nehalem runs with FB-DIMMs.

What are you talking about? A hit? Core2 peaks at 8000 MB/s while Nehalem does 15000 MB/s, both in dual channel, where do you see a hit there?
 

Wile E

Power User
Joined
Oct 1, 2006
Messages
24,318 (3.81/day)
System Name The ClusterF**k
Processor 980X @ 4Ghz
Motherboard Gigabyte GA-EX58-UD5 BIOS F12
Cooling MCR-320, DDC-1 pump w/Bitspower res top (1/2" fittings), Koolance CPU-360
Memory 3x2GB Mushkin Redlines 1600Mhz 6-8-6-24 1T
Video Card(s) Evga GTX 580
Storage Corsair Neutron GTX 240GB, 2xSeagate 320GB RAID0; 2xSeagate 3TB; 2xSamsung 2TB; Samsung 1.5TB
Display(s) HP LP2475w 24" 1920x1200 IPS
Case Technofront Bench Station
Audio Device(s) Auzentech X-Fi Forte into Onkyo SR606 and Polk TSi200's + RM6750
Power Supply ENERMAX Galaxy EVO EGX1250EWT 1250W
Software Win7 Ultimate N x64, OSX 10.8.4
I knew you'd say that.
So take a look at the results from certified ones later.

They will be exactly the same. ;)

I also still fail to see how you could do quad-pumped SDRAM, that is, have each memory cell perform 4 ops per clock cycle. And I also don't understand what the benefit of that would be versus DDR RAM with double the clock speed. I.e., if your memory cells can perform 1600 MT/s, wouldn't DDR running at 800 MHz be better (simpler, easier to implement, cheaper...) than "QDR" at 400 MHz?
I don't know much about the technical side of RAM, but what about GDDR5? It's rated as QDR, and as far as I was aware, it gets its roots from SDRAM as well.

As a side note, do you think triple channel would come in handy on lower speed modules? Say, DDR3 1066 cas7?
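
For anyone wanting to put rough numbers on that question, here is a minimal sketch of the usual back-of-the-envelope formulas. It assumes a 64-bit (8-byte) channel and theoretical peak figures only; the function names are made up for illustration, and real-world throughput and latency will differ.

```python
def peak_bandwidth_mb_s(transfers_mt_s, channels, bus_bytes=8):
    """Theoretical peak in MB/s: transfers per second x bytes per transfer x channels.

    Assumes a 64-bit (8-byte) data bus per channel.
    """
    return transfers_mt_s * bus_bytes * channels


def cas_latency_ns(cas_cycles, transfers_mt_s):
    """CAS latency in nanoseconds.

    For DDR memory the I/O clock in MHz is half the transfer rate in MT/s.
    """
    clock_mhz = transfers_mt_s / 2
    return cas_cycles / clock_mhz * 1000


if __name__ == "__main__":
    # DDR3-1066 CL7, the example from the post, in dual vs. triple channel.
    for channels in (2, 3):
        print(channels, "channels:", peak_bandwidth_mb_s(1066, channels), "MB/s")
    print("CAS latency:", round(cas_latency_ns(7, 1066), 2), "ns")
```

Note that adding a channel only raises the bandwidth figure; the CAS latency in nanoseconds stays the same.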
 
Joined
Aug 15, 2008
Messages
5,941 (1.04/day)
Location
Watauga, Texas
System Name Univac SLI Edition
Processor Intel Xeon 1650 V3 @ 4.2GHz
Motherboard eVGA X99 FTW K
Cooling EK Supremacy EVO, Swiftech MCP50x, Alphacool NeXXos UT60 360, Black Ice GTX 360
Memory 2x16GB Corsair Vengeance LPX 3000MHz
Video Card(s) Nvidia Titan X Tri-SLI w/ EK Blocks
Storage HyperX Predator 240GB PCI-E, Samsung 850 Pro 512GB
Display(s) Dell UltraSharp 34" Ultra-Wide (U3415W) / (Samsung 48" Curved 4k)
Case Phanteks Enthoo Pro M Acrylic Edition
Audio Device(s) Sound Blaster Z
Power Supply Thermaltake 1350watt Toughpower Modular
Mouse Logitech G502
Keyboard CODE 10 keyless MX Clears
Software Windows 10 Pro
Well, I wouldn't say so. Just because Nehalem doesn't seem capable of using all the bandwidth it has unlocked by itself, that doesn't mean it's a fail. It's still way faster than Core2 clock for clock and has a lot more bandwidth even in single-channel mode! SMT also seems to work very well.

It's maybe not worth it for games, but for everything else it is fast enough to justify the expense for many people (not me TBH), just like any other new CPU. Don't forget it's supposed to be aimed at the server market. Of course an upgrade from a C2Q is not worth it either, but if you were to build a completely new PC and you care about more than gaming, Nehalem is worth a look or two.

I only game so it's not worth it to me to even have a quad really. 4+ghz dualy is ideal for me :rockout:
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.65/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
What are you talking about? A hit? Core2 peaks at 8000 MB/s while Nehalem does 15000 MB/s, both in dual channel, where do you see a hit there?
Game performance, where tighter timings yield better FPS than higher bandwidth. Games don't need a lot of bandwidth (only enough to satisfy the engaged cores), but they need very quick response times.

Maybe the IMC/memory has little to do with poor Nehalem game performance which reminds me of something else. Nehalem's architecture has strong ties to Pentium 4 w/ Hyperthreading more so than Core 2 architecture. We all remember how Athlon 64 was the better gamer but Pentium 4 w/ Hyperthreading took the cake in terms of multimedia. That pretty much explains everything.
 

e6600

New Member
Joined
Sep 28, 2008
Messages
43 (0.01/day)
right now, a Q9550 E0 should last us many years. very good price:performance chip
 

DarkMatter

New Member
Joined
Oct 5, 2007
Messages
1,714 (0.28/day)
Processor Intel C2Q Q6600 @ Stock (for now)
Motherboard Asus P5Q-E
Cooling Proc: Scythe Mine, Graphics: Zalman VF900 Cu
Memory 4 GB (2x2GB) DDR2 Corsair Dominator 1066Mhz 5-5-5-15
Video Card(s) GigaByte 8800GT Stock Clocks: 700Mhz Core, 1700 Shader, 1940 Memory
Storage 74 GB WD Raptor 10000rpm, 2x250 GB Seagate Raid 0
Display(s) HP p1130, 21" Trinitron
Case Antec p180
Audio Device(s) Creative X-Fi PLatinum
Power Supply 700W FSP Group 85% Efficiency
Software Windows XP
They will be exactly the same. ;)


I don't know much about the technical side of RAM, but what about GDDR5? It's rated as QDR, and as far as I was aware, it gets its roots from SDRAM as well.

As a side note, do you think triple channel would come in handy on lower speed modules? Say, DDR3 1066 cas7?

I have noticed there's an insane number of things that are called QDR but are not exactly that. In this regard, quad/octal pumping is a much better description for something that can do 4/8 operations per cycle. AFAIK GDDR5 uses two DDR streams that are combined (multiplexed) to form a 2x faster memory module. Again, it's not that each memory cell performs 4 ops per clock cycle, but that 2 cells perform 2 ops per cycle and send the data together. In practice it's almost the same as doubling the frequency per pin, but the difference from a true QDR signal is important: you don't have control over every operation being made, you can only control them in pairs. For GPUs that kind of parallelism is not a problem, but in CPUs it could create an insane and undesirable amount of latency (it already does in GPUs, BTW, but it's masked by the parallelism of the GPU). It's as if someone called a 2.4 GHz dual-core CPU a 4.8 GHz CPU. It's not the same. In the case of GDDR5 that's exactly what we do. Although, as I understand it, the memory controller must work at twice the speed of the memory cells, so from that point of view it is twice as fast. It's what I said: if your memory is twice as fast (by any method), make the external clock twice as fast.

This is how I think it works, just from the perspective of the clock being used:



Consider the input the external clock generator.

- In DDR a spike is created for every rising and falling edge.

- QDR would create 4 spikes. I don't know how; that's what I've been asking all along.

- Note how GDDR5's input clock is twice as fast. It would be a completely different thing if the input were 100 MHz and it were doubled inside the memory itself rather than externally. That is what I said would be pointless, IMO.
 
Joined
Apr 21, 2008
Messages
5,250 (0.90/day)
Location
IRAQ-Baghdad
System Name MASTER
Processor Core i7 3930k run at 4.4ghz
Motherboard Asus Rampage IV extreme
Cooling Corsair H100i
Memory 4x4G kingston hyperx beast 2400mhz
Video Card(s) 2X EVGA GTX680
Storage 2X Crusial M4 256g raid0, 1TbWD g, 2x500 WD B
Display(s) Samsung 27' 1080P LED 3D monitior 2ms
Case CoolerMaster Chosmos II
Audio Device(s) Creative sound blaster X-FI Titanum champion,Creative speakers 7.1 T7900
Power Supply Corsair 1200i, Logitch G500 Mouse, headset Corsair vengeance 1500
Software Win7 64bit Ultimate
Benchmark Scores 3d mark 2011: testing
Did anyone expect more performance in games with tri-channel RAM? Why does it go down?
 

Poisonsnak

New Member
Joined
Feb 13, 2005
Messages
362 (0.05/day)
Location
Saskatoon, SK
Processor FX-8150
Motherboard Gigabyte 990FXA-UD3
Cooling Thermalright Ultra-120 Extreme lapped & shimmed
Memory Patriot Viper Extreme 8GB DDR3-1866 9-11-9-27 1.65V
Video Card(s) MSI 6950 2GB Twin Frozr II @ 6970 875/1375
Storage Crucial M4 0309 64GB
Display(s) Dell U2410
Case Corsair 650D
Audio Device(s) onboard
Power Supply Corsair AX750
Software Windows [8] Developer Preview
... This is how I think it works, just from the perspective of the clock being used: ...

Yeah you're definitely on the right track there. I have a bit of background in digital signaling so I know how this stuff works. Interestingly enough if you read up on AGP (yes AGP) it's the same concept. AGP = SDR, AGP2x = DDR, AGP4x = QDR, AGP8x = ODR.

Some references here:
http://en.wikipedia.org/wiki/Agp
http://en.wikipedia.org/wiki/Quadruple_data_rate

I can summarize it here though:

SDR = transmits data on the rising edge of each clock
DDR = transmits data on the rising and falling edge of each clock
QDR = uses 2 clock generators, same frequency, one is 90° ahead (or behind) of the other (e.g. clock #1 has a rising edge, then halfway before its falling edge clock #2 has a rising edge). Transmits data on the rising and falling edge of both clocks.

In practice you wouldn't actually use 2 generators but instead delay the second signal by 90° somehow but you get the idea. ODR follows the same sort of pattern except you're using 4 clocks instead of 2.

The 2 clocks 90° apart thing can be hard to visualize so if I have time I'll draw a picture later today. Another way to think of it (maybe easier) is that the falling edge of the clock could be considered to be a clock signal that is 180° behind the original clock. In that case:

SDR = 1 clock 0°
DDR = 2 clocks, 0° and 180°
QDR = 4 clocks, 0°, 90°, 180°, and 270°
ODR = 8 clocks, 0°, 45°, 90°, 135°, etc.
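
The phase list above can be turned into a small sketch. This assumes an idealized model in which the N transfers are evenly spaced across one clock period (one per phase-shifted edge); the function names are hypothetical and the code only illustrates the arithmetic, not how any real memory or AGP interface is built.

```python
def phases_deg(pumps):
    """Phase offsets (degrees) of the edges used for `pumps` transfers per cycle."""
    return [k * 360 / pumps for k in range(pumps)]


def transfer_times_ns(clock_mhz, pumps, cycles=1):
    """Instants (ns) at which data is transferred, assuming evenly spaced edges."""
    period_ns = 1000.0 / clock_mhz
    return [(c + k / pumps) * period_ns
            for c in range(cycles)
            for k in range(pumps)]


def transfer_rate_mt_s(clock_mhz, pumps):
    """Effective transfer rate in MT/s: clock frequency x transfers per cycle."""
    return clock_mhz * pumps


if __name__ == "__main__":
    for name, pumps in (("SDR", 1), ("DDR", 2), ("QDR", 4), ("ODR", 8)):
        print(name, phases_deg(pumps), transfer_rate_mt_s(400, pumps), "MT/s")
    # The comparison raised earlier in the thread: both give the same rate.
    print(transfer_rate_mt_s(800, 2), transfer_rate_mt_s(400, 4))
```

In this model, DDR at 800 MHz and QDR at 400 MHz land on the same 1600 MT/s; the difference is only in how many phase-shifted edges are needed to get there.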
 

Swansen

New Member
Joined
Nov 18, 2007
Messages
182 (0.03/day)
We see this going from DDR, to DDR2, to DDR3. Ultimately, I think it's time to stop with DDR and move to something that accomplishes more per work cycle like QDR or ODR.
my vote goes for a completely different memory architecture, my choice being XDR.
 

DarkMatter

New Member
Joined
Oct 5, 2007
Messages
1,714 (0.28/day)
Processor Intel C2Q Q6600 @ Stock (for now)
Motherboard Asus P5Q-E
Cooling Proc: Scythe Mine, Graphics: Zalman VF900 Cu
Memory 4 GB (2x2GB) DDR2 Corsair Dominator 1066Mhz 5-5-5-15
Video Card(s) GigaByte 8800GT Stock Clocks: 700Mhz Core, 1700 Shader, 1940 Memory
Storage 74 GB WD Raptor 10000rpm, 2x250 GB Seagate Raid 0
Display(s) HP p1130, 21" Trinitron
Case Antec p180
Audio Device(s) Creative X-Fi PLatinum
Power Supply 700W FSP Group 85% Efficiency
Software Windows XP
Yeah you're definitely on the right track there. I have a bit of background in digital signaling so I know how this stuff works. Interestingly enough if you read up on AGP (yes AGP) it's the same concept. AGP = SDR, AGP2x = DDR, AGP4x = QDR, AGP8x = ODR.

Some references here:
http://en.wikipedia.org/wiki/Agp
http://en.wikipedia.org/wiki/Quadruple_data_rate

I can summarize it here though:

SDR = transmits data on the rising edge of each clock
DDR = transmits data on the rising and falling edge of each clock
QDR = uses 2 clock generators, same frequency, one is 90° ahead (or behind) of the other (e.g. clock #1 has a rising edge, then halfway before its falling edge clock #2 has a rising edge). Transmits data on the rising and falling edge of both clocks.

In practice you wouldn't actually use 2 generators but instead delay the second signal by 90° somehow but you get the idea. ODR follows the same sort of pattern except you're using 4 clocks instead of 2.

The 2 clocks 90° apart thing can be hard to visualize so if I have time I'll draw a picture later today. Another way to think of it (maybe easier) is that the falling edge of the clock could be considered to be a clock signal that is 180° behind the original clock. In that case:

SDR = 1 clock 0°
DDR = 2 clocks, 0° and 180°
QDR = 4 clocks, 0°, 90°, 180°, and 270°
ODR = 8 clocks, 0°, 45°, 90°, 135°, etc.

Yeah thanks, that's what I thought: phased signals and multiplexed input/output data (or something similar). Now I just need someone to explain why it would be better to use QDR instead of a faster bus, when AFAIK the only reason FSBs (or the like) are not faster is the slow memory. With DDR the benefit is clear, as you can use the rising and falling edges of the same single signal, but QDR and ODR require an additional signal (be it a different one or the same one phase-shifted), and I don't see much benefit there for main memory, where good latency is a must*. Maybe the answer is really simple and I'm just missing it, I dunno.

* One of the requirements to convince me is that the advantage of using QDR must not come from memory cell/bank parallelization (like in GDDR5), as that wouldn't be a good solution for main memory.
 

Morgoth

Fueled by Sapphire
Joined
Aug 4, 2007
Messages
4,226 (0.69/day)
Location
Netherlands
System Name Wopr "War Operation Plan Response"
Processor 5900x ryzen 9 12 cores 24 threads
Motherboard aorus x570 pro
Cooling air (GPU Liquid graphene) rad outside case mounted 120mm 68mm thick
Memory kingston 32gb ddr4 3200mhz ecc 2x16gb
Video Card(s) sapphire RX 6950 xt Nitro+ 16gb
Storage 300gb hdd OS backup. Crucial 500gb ssd OS. 6tb raid 1 hdd. 1.8tb pci-e nytro warp drive LSI
Display(s) AOC display 1080p
Case SilverStone SST-CS380 V2
Audio Device(s) Onboard
Power Supply Corsair 850MX watt
Mouse corsair gaming mouse
Keyboard Microsoft brand
Software Windows 10 pro 64bit, Luxion Keyshot 7, fusion 360, steam
Benchmark Scores timespy 19 104
psst, remember we're now using an IMC; memory no longer goes through the northbridge
 

DarkMatter

New Member
Joined
Oct 5, 2007
Messages
1,714 (0.28/day)
Processor Intel C2Q Q6600 @ Stock (for now)
Motherboard Asus P5Q-E
Cooling Proc: Scythe Mine, Graphics: Zalman VF900 Cu
Memory 4 GB (2x2GB) DDR2 Corsair Dominator 1066Mhz 5-5-5-15
Video Card(s) GigaByte 8800GT Stock Clocks: 700Mhz Core, 1700 Shader, 1940 Memory
Storage 74 GB WD Raptor 10000rpm, 2x250 GB Seagate Raid 0
Display(s) HP p1130, 21" Trinitron
Case Antec p180
Audio Device(s) Creative X-Fi PLatinum
Power Supply 700W FSP Group 85% Efficiency
Software Windows XP
psst, remember we're now using an IMC; memory no longer goes through the northbridge

FSBs (or the like)

I used FSB as a generic term. I should have said bus interconnect or something like that. QPI or HT are still buses and in some way it's still in the "up front" of the architecture. FSB is a very specific technology, but IMHO it also describes more or less what HT or QPI is in a generic way. Much like HD means 720/1080p, but the word itself could mean any high resolution.

Anyway, the IMC indeed helps my point. As I understand it, an IMC allows for much faster interconnects between the CPU and the memory, so whenever faster memory (by any method) is available, the bus should be made faster instead of using "multiple instances" of the same one. Am I right or not?
 

Poisonsnak

New Member
Joined
Feb 13, 2005
Messages
362 (0.05/day)
Location
Saskatoon, SK
Processor FX-8150
Motherboard Gigabyte 990FXA-UD3
Cooling Thermalright Ultra-120 Extreme lapped & shimmed
Memory Patriot Viper Extreme 8GB DDR3-1866 9-11-9-27 1.65V
Video Card(s) MSI 6950 2GB Twin Frozr II @ 6970 875/1375
Storage Crucial M4 0309 64GB
Display(s) Dell U2410
Case Corsair 650D
Audio Device(s) onboard
Power Supply Corsair AX750
Software Windows [8] Developer Preview
... I now just need someone to explain why it would be better to use QDR, instead of a faster bus ...

I think (?) the main reason to use a lower clock frequency is for signal synchronization over long distances (PCB traces). If a 1 GHz signal is travelling along 10 cm of PCB trace at the speed of light then it takes 3 ns to make the trip, or more importantly 3 clock cycles.
 

DarkMatter

New Member
Joined
Oct 5, 2007
Messages
1,714 (0.28/day)
Processor Intel C2Q Q6600 @ Stock (for now)
Motherboard Asus P5Q-E
Cooling Proc: Scythe Mine, Graphics: Zalman VF900 Cu
Memory 4 GB (2x2GB) DDR2 Corsair Dominator 1066Mhz 5-5-5-15
Video Card(s) GigaByte 8800GT Stock Clocks: 700Mhz Core, 1700 Shader, 1940 Memory
Storage 74 GB WD Raptor 10000rpm, 2x250 GB Seagate Raid 0
Display(s) HP p1130, 21" Trinitron
Case Antec p180
Audio Device(s) Creative X-Fi PLatinum
Power Supply 700W FSP Group 85% Efficiency
Software Windows XP
I think (?) the main reason to use a lower clock frequency is for signal synchronization over long distances (PCB traces). If a 1 GHz signal is travelling along 10 cm of PCB trace at the speed of light then it takes 3 ns to make the trip, or more importantly 3 clock cycles.

That doesn't make sense at all. Those numbers don't make sense to me. An electron or a hole travelling at light speed will cover 10 cm in 0.33 ns = 10 cm / (300,000 km/s * 1000 m/km * 100 cm/m). So it would have time to make 3 trips within one 1 GHz clock cycle.
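
A quick way to check that arithmetic, as a rough sketch: the helper names are made up for illustration, and `velocity_factor` is an added assumption that defaults to 1.0 to match the speed-of-light figure used in the thread (signals on a real PCB trace travel slower than that).

```python
SPEED_OF_LIGHT_M_S = 299_792_458  # metres per second, in vacuum


def flight_time_ns(length_cm, velocity_factor=1.0):
    """Time in ns for a signal to cross a trace of the given length.

    velocity_factor is the fraction of light speed at which the signal travels;
    1.0 reproduces the assumption made in the discussion.
    """
    speed_m_s = SPEED_OF_LIGHT_M_S * velocity_factor
    return (length_cm / 100) / speed_m_s * 1e9


def cycles_in_flight(length_cm, clock_ghz, velocity_factor=1.0):
    """How many clock periods elapse while the signal crosses the trace."""
    period_ns = 1.0 / clock_ghz
    return flight_time_ns(length_cm, velocity_factor) / period_ns


if __name__ == "__main__":
    print(round(flight_time_ns(10), 3), "ns for 10 cm")
    print(round(cycles_in_flight(10, 1.0), 3), "clock cycles at 1 GHz")
```

With the thread's assumptions this gives about a third of a nanosecond, i.e. about a third of a 1 GHz clock period, not 3 cycles.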

Anyway, why would that matter? As I see it, it doesn't. It would be like saying that on an assembly line you can't have a product every second because it takes each of them 4 hours to go from start to finish. It's the production rate that matters, and unless I'm missing something important, the same happens in electronics. Besides, clock speed limits are set by much slower elements, such as the state-change delay of the gates (NAND, NOR...) or of the transistors (one depends on the other, really). Following the analogy, we can compare that to the time it takes to fill the trailers that will carry the goods to another place.

I understand there are limitations on clock speed, but considering the speeds at which GDDR5 runs, doubling what we have in our mobos several times wouldn't be a problem yet.
 

Morgoth

Fueled by Sapphire
Joined
Aug 4, 2007
Messages
4,226 (0.69/day)
Location
Netherlands
System Name Wopr "War Operation Plan Response"
Processor 5900x ryzen 9 12 cores 24 threads
Motherboard aorus x570 pro
Cooling air (GPU Liquid graphene) rad outside case mounted 120mm 68mm thick
Memory kingston 32gb ddr4 3200mhz ecc 2x16gb
Video Card(s) sapphire RX 6950 xt Nitro+ 16gb
Storage 300gb hdd OS backup. Crucial 500gb ssd OS. 6tb raid 1 hdd. 1.8tb pci-e nytro warp drive LSI
Display(s) AOC display 1080p
Case SilverStone SST-CS380 V2
Audio Device(s) Onboard
Power Supply Corsair 850MX watt
Mouse corsair gaming mouse
Keyboard Microsoft brand
Software Windows 10 pro 64bit, Luxion Keyshot 7, fusion 360, steam
Benchmark Scores timespy 19 104

DarkMatter

New Member
Joined
Oct 5, 2007
Messages
1,714 (0.28/day)
Processor Intel C2Q Q6600 @ Stock (for now)
Motherboard Asus P5Q-E
Cooling Proc: Scythe Mine, Graphics: Zalman VF900 Cu
Memory 4 GB (2x2GB) DDR2 Corsair Dominator 1066Mhz 5-5-5-15
Video Card(s) GigaByte 8800GT Stock Clocks: 700Mhz Core, 1700 Shader, 1940 Memory
Storage 74 GB WD Raptor 10000rpm, 2x250 GB Seagate Raid 0
Display(s) HP p1130, 21" Trinitron
Case Antec p180
Audio Device(s) Creative X-Fi PLatinum
Power Supply 700W FSP Group 85% Efficiency
Software Windows XP