Monday, October 13th 2008

Core i7 940 Review Shows SMT and Tri-Channel Memory Let-down

As the computer enthusiast community gears up for Nehalem November, with reports suggesting a series of product launches for both Intel's Core i7 processors and compatible motherboards, industry observer PC Online.cn has already published an in-depth review of the Core i7 940 2.93 GHz processor. The processor is based on the Bloomfield core, and essentially on the Nehalem architecture that has been making news for over a year now. PC Online went right to the heart of the matter, evaluating the 192-bit wide (tri-channel) memory interface and the advantage of HyperThreading on four physical cores. In the tests, the 2.93 GHz Bloomfield chip was pitted against a Core 2 Extreme QX9770 operating both at its reference speed of 3.20 GHz and underclocked to 2.93 GHz, allowing a clock-for-clock comparison.

The evaluation found that the performance increments tri-channel memory offers over dual-channel, in real-world applications and games, are just about insignificant. Super Pi Mod 1.4 shows only a fractional lead for tri-channel over dual-channel, and the trend continued with the Everest Memory Benchmark. On the brighter side, the integrated memory controller does offer improvements over the previous-generation setup, in which the northbridge handled memory. Even in games such as Call of Duty 4 and Crysis, tri-channel memory did not shine.
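For context on what the third channel buys on paper, here is a back-of-envelope peak-bandwidth comparison. The DDR3-1066 module speed is an assumption for illustration, not a figure taken from the review.

```python
# Back-of-envelope peak-bandwidth comparison for dual- vs tri-channel DDR3.
# The DDR3-1066 data rate is an assumed example figure.
def peak_bandwidth_gbs(channels, data_rate_mts, bus_bits_per_channel=64):
    """Theoretical peak in GB/s: channels * channel width (bytes) * transfers/s."""
    return channels * (bus_bits_per_channel / 8) * data_rate_mts * 1e6 / 1e9

dual = peak_bandwidth_gbs(2, 1066)
tri = peak_bandwidth_gbs(3, 1066)
print(f"dual-channel: {dual:.1f} GB/s")         # 17.1 GB/s
print(f"tri-channel:  {tri:.1f} GB/s")          # 25.6 GB/s
print(f"extra headroom: {tri / dual - 1:.0%}")  # 50%
```

The 50% of extra theoretical headroom is exactly what the benchmarks show applications failing to use.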

As for the other architectural change, simultaneous multi-threading, which makes its comeback on the desktop with the Bloomfield processors offering as many as eight logical processors for the operating system to talk to, the results are a mixed bag in terms of performance. Across tests, enabling SMT brought performance increments of roughly 10~20% in general benchmarks, including Cinebench, WinRAR, TMPGEnc, and Fritz Chess. With 3DMark Vantage, SMT provided a very significant boost to the scores, with about 25% increments. It didn't do the same for current-generation games such as Call of Duty 4, World in Conflict, and Company of Heroes. What's more, the games didn't seem to benefit from Bloomfield in the first place: the QX9770 underclocked to 2.93 GHz outperformed the i7 940, both with and without SMT, in some games.
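The uneven SMT gains are roughly what Amdahl's law predicts: the benefit scales with how much of the workload is actually parallel. A minimal sketch, where the parallel fractions and the ~30% SMT throughput gain are hypothetical illustration figures, not numbers from the review:

```python
def amdahl_speedup(parallel_fraction, parallel_gain):
    """Amdahl's law: overall speedup when only part of a workload
    benefits from extra hardware threads."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / parallel_gain)

# Assume SMT adds ~30% throughput to the fully parallel portion
# (a rough illustrative figure, not a measurement from this review).
for name, frac in [("encoder (95% parallel)", 0.95), ("game (30% parallel)", 0.30)]:
    s = amdahl_speedup(frac, 1.30)
    print(f"{name}: {(s - 1) * 100:.0f}% faster with SMT")
```

A mostly serial game engine sees only a few percent, while a well-threaded encoder lands in the double digits, matching the pattern the benchmarks show.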

Source: PC Online

91 Comments on Core i7 940 Review Shows SMT and Tri-Channel Memory Let-down

#1
InnocentCriminal
Resident Grammar Amender
Hmm... interesting. So that's an insight into the performance. Now for the power consumption...
#2
Basard
is it just me, or is the FPS on the games going down with tri-channel?
#3
InnocentCriminal
Resident Grammar Amender
Once Far Cry 2, Left 4 Dead and other natively multi-threaded games come out, we'll have a better look at how well they compare to the Core 2 line.
#4
Mussels
Moderprator
this is kind of the same old argument - go a quad + HT if you use multithreaded apps, but if all you do is games, a faster-clocked quad (or an even faster dual) is the better choice.

Video encoders are going to cream themselves, at least.
#5
eidairaman1
more of a server upgrade is what this is primarily
#6
Mussels
Moderprator
after reading through the page, one thing caught my attention. The Core i7 CPU has 8MB cache, while the QX9770 has 12MB.

I'm not sure what the i7 CPU will cost, but isn't comparing it to the Extreme Core 2 model giving it a bit of a disadvantage? Wouldn't the performance difference be a lot smaller vs a lower-cached CPU?


edit: load and idle power graph.

#8
D4S4
by: Mussels
after reading through the page, one thing caught my attention. The Core i7 CPU has 8MB cache, while the QX9770 has 12MB.

I'm not sure what the i7 CPU will cost, but isn't comparing it to the Extreme Core 2 model giving it a bit of a disadvantage? Wouldn't the performance difference be a lot smaller vs a lower-cached CPU?


Not really, Core i7 doesn't even need 8MB since it has an integrated memory controller.


One thing that should go wild on that thing is Photoshop
#9
FordGT90Concept
"I go fast!1!11!1!"
by: Basard
is it just me, or is the FPS on the games going down with tri-channel?
Memory is a funny thing because the faster it goes in terms of bandwidth, the slower it goes in terms of clock cycles. We see this going from DDR, to DDR2, to DDR3. Ultimately, I think it's time to stop with DDR and move to something that accomplishes more per work cycle, like QDR or ODR. DDR2 and DDR3 have been able to keep pace with processor needs, but the approach fails to improve not only bandwidth, but also clock speeds and/or latency. Basically, the original DDR technology is being stretched to meet modern demands when it is long overdue to explore the possibilities of something else.

Ultimately, bandwidth doesn't matter so long as it doesn't run out. For instance, interstate highways are great--until they have a traffic jam. As such, memory never really weighs very heavily into benchmarking. It only has a major impact if it errors or if there is a traffic jam--both are for the worse. So in regards to memory, uneventful is a great thing.


Now, directly to your question: tri-channel is addressing the needs of the processor more than anything else. In order to add another two DIMMs, they increase the distance and therefore the latency. Specific metrics such as FPS go down slightly in order to prevent a disaster (traffic jam). I think they are being very proactive on this whole memory bandwidth issue, but I really don't like the way it is progressing (and has been progressing for almost a decade). This small decrease in performance with tri-channel is basically universal until they try to reinvent memory.
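For reference, the pumping schemes mentioned here (DDR/QDR/ODR) differ only in transfers per bus clock cycle; a minimal sketch, with an assumed 400 MHz I/O clock that is purely an example figure:

```python
# Effective transfer rate for SDR/DDR/QDR/ODR signalling: the I/O bus clock
# stays the same, only the transfers per clock cycle change.
def effective_mts(bus_clock_mhz, transfers_per_cycle):
    """Mega-transfers per second for a given bus clock and pumping scheme."""
    return bus_clock_mhz * transfers_per_cycle

bus = 400  # MHz I/O clock, an assumed example
for scheme, n in [("SDR", 1), ("DDR", 2), ("QDR", 4), ("ODR", 8)]:
    print(f"{scheme}: {effective_mts(bus, n)} MT/s")
```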
#10
Mussels
Moderprator
as a very good analogy to explain Ford's post, i shall use video cards.

Look at a video card such as the 8600GT and its 128 bit memory bus. You can slap more ram on it (256/512/1024MB), and that will prevent running out of ram (texture swapping) _without_ improving performance.

You could also add more channels with the same amount of ram, which gives you better performance, but not if the game/application was designed for a 128 bit bus/256MB of ram.

Two examples:
128 bit bus with 1GB of ram, or a 512 bit bus with 256MB of ram.
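The two example configurations can be compared on raw bandwidth alone; a quick sketch, where the 1400 MT/s effective data rate is an assumed figure for illustration:

```python
# Peak video-memory bandwidth: bus width (bits) / 8 * effective data rate.
# The 1400 MT/s GDDR3-style figure is an assumed example, not from the post.
def vram_bandwidth_gbs(bus_bits, data_rate_mts):
    """Theoretical peak bandwidth in GB/s."""
    return (bus_bits / 8) * data_rate_mts * 1e6 / 1e9

narrow = vram_bandwidth_gbs(128, 1400)  # 128-bit bus, 1GB of RAM
wide = vram_bandwidth_gbs(512, 1400)    # 512-bit bus, 256MB of RAM
print(f"128-bit: {narrow:.1f} GB/s")
print(f"512-bit: {wide:.1f} GB/s")      # 4x the bandwidth, same chips
```

Same memory chips, four times the bandwidth on the wider bus; capacity and bandwidth are independent knobs, which is the point of the analogy.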

Tri channel will help prevent bottlenecks as we all go 8GB+ in our systems, and i think we need benchmarks with MORE than a measly 2GB of ram, before calling it a failure.
#11
FordGT90Concept
"I go fast!1!11!1!"
by: Mussels
Tri channel will help prevent bottlenecks as we all go 8GB+ in our systems, and i think we need benchmarks with MORE than a measly 2GB of ram, before calling it a failure.
To add to that, Nehalem seems to be designed for tomorrow--not today. This is a processor essentially built for when most software is demanding and multithreaded, a 64-bit environment is commonplace, and memory flows like a beer tap at Oktoberfest. When it is launched, it will only seem like it belongs in high-bandwidth server environments and not in your desktop computer. It will be a few years from now when Core 2 seems like Pentium 4 and Core i7 is a must-have chip.

I believe Core i7 (or at least the concepts that it is pushing for) won't go mainstream until Microsoft releases a 64-bit only operating system. I just hope AMD takes the cues from Intel and moves in the same direction. AMD will get kicked in the balls again if they don't.
#12
Mussels
Moderprator
in summary, tri channel is for users of massive bandwidth, with multithreaded applications.

Today's games do not fit that category; DirectX 11 games will (native multithreading), and any form of media encoding definitely will.

I bet VMware will run uber fast on these systems.
#13
_jM
this is like one of those "I told ya so" posts. I knew with the new chips there would be some kinda flaw, and it happens to be in the tri-channel RAM.
#14
FordGT90Concept
"I go fast!1!11!1!"
by: Mussels
Todays games do not fit that category, directX 11 games will (native multithreading), and any form of media encoding definately will.
Which reminds me of yet another thing: I believe the future of GPUs will be much like CPUs. That is, less complicated but more of them, like Intel is doing with Larrabee. The more GPUs you have, the more requests the CPU receives, and therefore more leads to more.

It is awkward how manufacturers are pushing for server technology in home computers. I mean, ten years ago, it was all about the clock speed. Today, they realized that clock speed isn't virtually unlimited and have looked to mainframe servers for how to fix it: more processors. Because more processors mean more of everything (sockets, DIMMs, power, etc.), they had to find a way to make them more affordable and marketable. The answer was in the form of cores on the same CPU die. The GPU crowd is now realizing the same thing, but there is a great deal of latency involved. Once the GPU crowd jumps on the same multi-core bandwagon the CPU crowd has been on for several years, games will start to benefit from CPUs with lots of cores.
#15
tkpenalty
Well, a triple memory channel setup is somewhat ahead of its time for general consumers; I mean, the games aren't even designed to make use of this advantage.
#16
Mussels
Moderprator
by: tkpenalty
Well, a triple memory channel setup is somewhat ahead of its time for general consumers; I mean, the games aren't even designed to make use of this advantage.
Indeed. If a game was, it'd perform terribly on modern hardware. Kinda like running 1920x1200 on a card with a 64-bit memory bus.
#17
DarkMatter
by: FordGT90Concept
Ultimately, I think it's time to stop with DDR and move to something that accomplishes more per work cycle like QDR or ODR.
How in hell do you make QDR or ODR? (I understand it as Quad/Octo Data Rate)

DDR is such because it works on both the clock's low and high states. That is Dual Data Rate, so again, how do you do Quad/Octo Data Rate if your clock signal only has two states?
As I see it, multiplexing the clock signal in time is not an option, because isn't that the same as just running the memory at twice the speed?

EDIT: :banghead: Forget about QDR, I'm stupid, I forgot you still have rising and falling edges like in Intel's Quad pumped FSB. I still fail to see how you could do 8 ops per cycle though.
#18
DanTheBanjoman
Señor Moderator
by: DarkMatter
How in hell do you make QDR or ODR? (I understand it as Quad/Octo Data Rate)

DDR is such because it works on both the clock's low and high states. That is Dual Data Rate, so again, how do you do Quad/Octo Data Rate if your clock signal only has two states?
As I see it, multiplexing the clock signal in time is not an option, because isn't that the same as just running the memory at twice the speed?

EDIT: :banghead: Forget about QDR, I'm stupid, I forgot you still have rising and falling edges like in Intel's Quad pumped FSB. I still fail to see how you could do 8 ops per cycle though.
XDR sends 8 bits per clock. Not sure about the theory behind it, though. Either way, it already exists, so it is possible, apparently :)
#20
btarunr
Editor & Senior Moderator
Yes, it's claimed to be faster than GDDR5 in its applications: http://www.rambus.com/us/products/xdr2/xdr2_vs_gddr5.html That comes from Rambus itself, though. And a broad memory interface isn't something software needs "optimizations" for; it's just a physical thing. You have a beefy 192-bit wide memory interface, and an accordingly stepped-up memory bandwidth.
#21
SimFreak47
by: btarunr
Comes from Rambus itself though.
Gotta be pretty expensive.
#22
FordGT90Concept
"I go fast!1!11!1!"
by: DarkMatter
I still fail to see how you could do 8 ops per cycle though.
At eight points along the sine wave. A wave is like a string where the entire length of the string represents a single wavelength. Instead of reading/writing at two points like we do with DDR, we read/write at eight points along it (the rising edge, peak, falling edge, intersection, falling edge, summit, rising edge, intersection).

History tells us anything made by Rambus is doomed to failure in the PC world; just look at RDRAM and the brief stint Intel had with it. Rambus specializes in high-performance, soldered-in situations. They're kind of like Apple, come to think of it. They make a product, don't care what others say about it, and just expect people to come crawling to them for licensing.

It's JEDEC (a forum including all major processor and memory manufacturers) that has to decide when it's time to move to a new memory technology. I'm afraid we're probably going to be stuck with DDR derivatives until photon processors come out. :shadedshu
#23
DarkMatter
by: DanTheBanjoman
XDR sends 8 bits per clock. Not sure about the theory behind it though. Either way it already exists so it is possible apparently :)
First of all, 3 things:

1- I need more sleep.
2- I was right in the first place. Falling and rising edges are still ONLY 2. The left edge of the low state is the same as the right edge of the high state. :roll: Quad pumping is done using 2 clocks with a 90º phase difference.
3- XDR AFAIK uses a ring bus to access the different memory banks, so it is effectively multiplexing the data signals, and it's a completely different approach from SDRAM. It's also different from what I said about multiplexing the clock signal, which would be pointless IMO: if memory can run faster, just run it faster; IMO the FSB could easily keep up. In fact, I have always considered that the FSB was so "slow" compared to the CPU clock because the memory was even slower. And if you are doubling the memory bits/banks per clock, why multiplex the external clock (or use two differently phased signals) to be able to use them, and not just double the lanes?
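The quad-pumping point (two clocks, 90º apart) can be illustrated with a toy edge-counting sketch; the clock period is an arbitrary unit, purely for illustration:

```python
# Toy illustration of quad pumping: one square clock gives two edges per
# cycle (DDR); two clocks 90 degrees out of phase give four distinct
# edge instants per cycle (quad pumping).
def edge_times(period, phase, cycles):
    """Times of the rising and falling edges of a square clock."""
    times = []
    for c in range(cycles):
        times.append(c * period + phase)               # rising edge
        times.append(c * period + phase + period / 2)  # falling edge
    return times

period = 1.0
clk0 = edge_times(period, 0.0, 4)
clk90 = edge_times(period, period / 4, 4)  # 90-degree phase shift
all_edges = sorted(set(clk0 + clk90))
print(len(clk0) / 4, "transfers per cycle with one clock (DDR)")
print(len(all_edges) / 4, "transfers per cycle with two phased clocks (QDR)")
```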

That being said, I think I have to elaborate more on my question. Using XDR as main memory is out of the question; we could do that (with its pros and cons), but that wouldn't be using QDR/ODR SDRAM. My question is how and why you would use a quad-pumped synchronous RAM when, to do that, you have to double the accessible bits per clock of your memory chips without obtaining the benefits of a fully parallel design, if you could just run the memory twice as fast. I'm going to make a diagram of what I mean, because I don't know how to explain it better now and I'm sure no one will understand this mess. :o
#24
DarkMatter
by: FordGT90Concept
At eight points along the sine wave. A wave is like a string where the entire length of the string represents a single wavelength. Instead of reading/writing at two points like we do with DDR, we read/write at eight points along it (the rising edge, peak, falling edge, intersection, falling edge, summit, rising edge, intersection).

History tells us anything made by Rambus is doomed to failure in the PC world; just look at RDRAM and the brief stint Intel had with it. Rambus specializes in high-performance, soldered-in situations. They're kind of like Apple, come to think of it. They make a product, don't care what others say about it, and just expect people to come crawling to them for licensing.

It's JEDEC (a forum including all major processor and memory manufacturers) that has to decide when it's time to move to a new memory technology. I'm afraid we're probably going to be stuck with DDR derivatives until photon processors come out. :shadedshu
I was asking exactly for that. You CAN'T ask any digital circuit to understand such things as a peak, an intersection, etc. You only have two states (low/high) at your disposal. What you are saying would be like using an 8-state machine, which of course would be ideal, but impossible with current technology. Otherwise the whole digital world would be based on more-than-2-state machines!!

A circuit can know when it is in a high/low state OR when it is changing from low to high and vice versa, as is the case with DDR. But once it is in one state, how does it know it has to perform another task? It can't until another edge comes.
#25
FordGT90Concept
"I go fast!1!11!1!"
by: DarkMatter
I was asking exactly for that. You CAN'T ask any digital circuit to understand such things as peak, intersection, etc.
Which is to suggest that QDR technology and beyond would work primarily via analog signals.