
GDDR5 Memory - Under the Hood

jumping in late to the "argument" but would like to offer some thoughts


yes, GDDR5 will probably call for extremely high latencies, as did GDDR4 when compared to GDDR3, and GDDR3 when compared to GDDR2

but what you forget to take into account is that GDDR5 should be able to move more information per clock cycle, just like GDDR4 compared to GDDR3, and GDDR3 compared to GDDR2

what that translates to is higher latencies, but more information being transferred, which nullifies any drawbacks from having to run higher latencies.

and coupled with the fact that newer memory designs allow for higher-clocked MEM and require less voltage, that makes them more efficient than their predecessors.


Just like with SYS MEM: DDR3 can move more information than DDR2 can, and runs faster as well. Sure, it's possible that DDR2 clocked at 1200 will run 51ns latencies, but so can DDR3 at 1600MHz . . . and which standard transfers more information? The more information that can be moved into and out of the DRAM matrix per clock cycle, the less time you spend waiting for things to load up.
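
Here's a rough back-of-the-envelope sketch of that trade-off. The CAS latencies and the 64-bit bus below are made-up illustration numbers, not specs from this thread: the faster part carries more cycles of latency but only a few ns more in absolute terms, while peak bandwidth goes up by a third.

Code:
# Rough sketch: compare absolute CAS latency (ns) and peak bandwidth for two
# hypothetical memory configs. All numbers are illustrative, not vendor specs.

def cas_latency_ns(data_rate_mt_s, cas_cycles):
    # DDR: the I/O clock is half the MT/s transfer rate
    io_clock_mhz = data_rate_mt_s / 2.0
    return cas_cycles * 1000.0 / io_clock_mhz   # cycles / MHz * 1000 -> ns

def peak_bandwidth_gb_s(data_rate_mt_s, bus_width_bits):
    # transfers per second * bits per transfer / 8 -> bytes per second
    return data_rate_mt_s * 1e6 * bus_width_bits / 8 / 1e9

for name, rate, cl in [("DDR2-1200 CL5", 1200, 5), ("DDR3-1600 CL9", 1600, 9)]:
    print(f"{name}: {cas_latency_ns(rate, cl):.1f} ns CAS, "
          f"{peak_bandwidth_gb_s(rate, 64):.1f} GB/s peak on a 64-bit bus")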
 
Comments such as this cause just as many problems on this forum as "Fanboys." If you have a problem with another user please use the report post button rather than inflaming situations with negative comments and attitude.

I didn't mean exclusively in this forum.
 
Wow, a lot of useful information on here that I didn't know. And no, I'm not referring to the article; I'm actually talking about the memory, and how latencies, speeds and such translate to bandwidth. I'll take a further look at this article, it seems like an awesome read.

Oh and ATI :nutkick: Nvidia
 
This is just BS. Or, more likely an unintentional lapse from the author. It will soon be changed to something like:
You don't understand how GDDR5 works.....

extremetech then twisted by largon said:
Bandwidth first: A system using GDDR3 memory on a 256-bit memory bus running at 1800MHz (effective DDR speed) would deliver 57.6 GB per second. Think of a GeForce 9600GT, for example. A double speed GDDR5 on a bus half as wide would deliver an equal amount.

GDDR3 memory on a 256-bit memory bus running at 1800MHz (effective DDR speed) would deliver 57.6 GB per second:

1800 effective-----256-bit--------2 bits per cycle

(900 MHz) * (256 bits/interface) * (2 bits per clock) = 460,800 Mbit/s or 57.6 GB/s


A double speed GDDR5 on a bus half as wide:

---Doubled-------half of 256-bit------4 bits per cycle------STILL TWICE THE BANDWIDTH

(900*2 MHz) * (128 bits/interface) * (4 bits per clock) = 921,600 Mbit/s or 115.2 GB/s

extremetech said:
Take any GDDR3 bandwidth on a given clock rate and bus width and double it, and you get GDDR5's bandwidth.

This is because GDDR3 can output 2 bits per clock cycle and GDDR5 can output 4 bits per clock cycle.
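
A quick sanity check of the arithmetic above, treating peak bandwidth as clock * bus width * bits per pin per clock (the 1800 MHz "double speed" case is only included because the post above uses it):

Code:
# Peak bandwidth = command clock (MHz) * bus width (bits) * bits per pin per clock / 8
def bandwidth_gb_s(clock_mhz, bus_bits, bits_per_clock):
    return clock_mhz * 1e6 * bus_bits * bits_per_clock / 8 / 1e9

print(bandwidth_gb_s(900, 256, 2))    # GDDR3, 256-bit, 2 bits/clock  ->  57.6 GB/s
print(bandwidth_gb_s(900, 128, 4))    # GDDR5, 128-bit, 4 bits/clock  ->  57.6 GB/s
print(bandwidth_gb_s(1800, 128, 4))   # the post's "double speed" case -> 115.2 GB/s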

Qimonda GDDR5 Whitepaper said:
GDDR5 operates with two different clock types. A differential command clock (CK) to where address and command inputs are referenced, and a forwarded differential write clock (WCK) where read and write data are referenced to. Being more precise, the GDDR5 SGRAM uses two write clocks, each of them assigned to two bytes. The WCK runs at twice the CK frequency. Taking a GDDR5 with 5 Gbps data rate per pin as an example, the CK clock runs with 1.25 GHz and WCK with 2.5 GHz. The CK and WCK clocks will be aligned during the initialization and training sequence. This alignment allows read and write access with minimum latency.
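
The clock relationship in that example is easy to reproduce. A small sketch, assuming only what the whitepaper states (data is double data rate on WCK, and WCK runs at twice CK):

Code:
# Derive GDDR5 clock domains from the per-pin data rate (Qimonda's 5 Gbps example).
def gddr5_clocks(data_rate_gbps):
    wck_ghz = data_rate_gbps / 2.0   # data is double data rate on WCK
    ck_ghz = wck_ghz / 2.0           # WCK runs at twice the command clock CK
    return ck_ghz, wck_ghz

ck, wck = gddr5_clocks(5.0)
print(f"CK = {ck} GHz, WCK = {wck} GHz")   # -> CK = 1.25 GHz, WCK = 2.5 GHz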

EDIT:

Thanx for the good read HTC ;)
 
On paper the 2900XT should have crushed all comers; instead it barely put up a fight against the 8800GTS 640. AMD can look great on paper, but give me some proof they can compete.

Your wish is my command!

Samsung K4U52324QE-07 GDDR4 0.714ns at work on a Sapphire Atlantis 3870 Silent OC (card is not modified at all) in two of the most demanding games ever:

Rivatuner - Dirt - 4xAA, 8xAF.gif Rivatuner - Crysis - 4xAA, 8xAF.gif

To the left: CMR DiRT at maxed settings in 1280*1024 with 4xAA, 8xAF, AAA, VSync on, Mipmap quality max, image shows a complete run of Magneti Marelli Crossover.

To the right: Crysis (V1) at tweaked very high DX9 settings in 1280*1024 with 4xAA, 8xAF, AAA, VSync, Mipmap quality max, image shows combat at the end of the road in first level.

Is there any Nvidia-card on the market that can come close to this?

PS: You can see up to 20% CPU limitation in my screens, so this is not the top speed of this card.
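
For anyone wondering where the 0.714 ns figure on that Samsung part fits in, here's a quick conversion sketch. It assumes 0.714 ns is the I/O clock period (my assumption, not something stated in the post):

Code:
# Convert a DRAM speed grade given as a cycle time into a clock frequency.
# Assumes 0.714 ns is the I/O clock period (an assumption, not from the post).
def cycle_time_to_mhz(cycle_ns):
    return 1000.0 / cycle_ns          # period in ns -> frequency in MHz

clock_mhz = cycle_time_to_mhz(0.714)
print(f"~{clock_mhz:.0f} MHz clock, ~{2 * clock_mhz:.0f} MT/s if double data rate")
# -> roughly 1400 MHz / 2800 MT/s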
 
A double speed GDDR5 on a bus half as wide:

---Doubled-------half of 256-bit------4 bits per cycle------STILL TWICE THE BANDWIDTH

(900*2 MHz) * (128 bits/interface) * (4 bits per clock) = 921,600 Mbit/s or 115.2 GB/s
That's not correct either.
There is no such thing as 1800MHz GDDR5. In fact, you misunderstood what I actually got wrong.
So the double-speed GDDR5 ("900*2 MHz" as you said) makes no sense, and the real comparison is:

900MHz GDDR5 * 128bit * 4bits/clk = same bandwidth as 900MHz GDDR3 (same frequency, de facto) * 256bit bus (doubled bus) * 2bits/clk.

GDDR5 is in fact "fake-QDR"; I thought it was just ~double in frequency, like 2GHz real (DDR-4000). Damn those people that mix up DDR ratings with MHz. There are tons of places on the net where you can see things like 4.0GHz GDDR5, as if it were DDR-4000. Like wikipedia (duh), until I edited the wiki article. Maybe JEDEC should've simply called it QDR, not DDR; does it really matter how many datalinks (DQs) you use if it's actually 4 bits/clock = quad data rate by definition? For some reason it's kinda disappointing to know it's just wider, not faster in frequency. GDDR5 is sort of like a "dualcore RAM".

Hmm... Or maybe GDDR5 should be called QDDR (quasi double data rate)...
:P
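
To illustrate that point about "effective" ratings, here's a tiny sketch converting a marketing rate in Gbps per pin back to the real command clock (GDDR3 moves 2 bits per CK cycle, GDDR5 moves 4):

Code:
# Translate an "effective" per-pin rate back to the command clock CK.
def command_clock_ghz(effective_gbps_per_pin, bits_per_ck):
    return effective_gbps_per_pin / bits_per_ck

print(command_clock_ghz(1.8, 2))   # "1800 MHz effective" GDDR3 -> 0.9 GHz CK
print(command_clock_ghz(4.0, 4))   # "4.0 GHz" GDDR5 -> 1.0 GHz CK, not a 2 GHz DDR clock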

Spirou said:
To the right [link]: Crysis (V1) at tweaked very high DX9 settings in 1280*1024 with 4xAA, 8xAF, AAA, VSync, Mipmap quality max, image shows combat at the end of the road in first level.

Is there any Nvidia-card on the market that can come close to this?
Your video memory usage graph gives away that your results are with CCC-forced AA (= 0xAA). That would mean only the built-in edge AA is applied. If you want AA actually applied, choose 4xAA from the in-game options.
4xAA at 1280x1024 takes ~600MB.

Anyways, I'm running at stable 50FPS at the same settings as you.
 

Attachments

  • Crysis_edgeAA_8xAF_1280x1024_etc.jpg
Your video memory usage graph gives away that your results are with CCC-forced AA (= 0xAA). That would mean only the built-in edge AA is applied. If you want AA actually applied, choose 4xAA from the in-game options.
4xAA at 1280x1024 takes ~600MB.

Actually I ran MSAA (Wide Tent Samples 8X), but there is no difference in memory usage at all. Anti-aliasing (like anisotropic filtering) is based on math functions that don't need memory at all when rendered properly. It can be emulated through large tables to reduce shader usage, lowering texturing bandwidth (using more TMUs) to address filtered data directly from memory, but that does not look like true AA and AF.

However: Crysis doesn't use more than 420 MB on HD 38x0 no matter which setting is chosen. Memory usage on Nvidia cards is much higher due to their specific rendering strategy and chip-design.

Anyways, I'm running at stable 50FPS at the same settings as you.

You must be joking. Even extreme overclocking won't get you higher than 40 million tris per sec, which is the average amount for high(!) settings at 40 fps. Usually only SLI setups can go that far. With a single GPU you won't get much higher than 30 million tris per sec. I've seen a lot of Crysis benches, and as I write this no one has ever reached 55 million tris per sec. So your screenie simply does not show the same settings.

At 1280*1024 and very high settings plus 4xAA and 8xAF, an 8800 Ultra OC reaches 15 fps*, and I am very proud to get 18-22 fps from my card. With tweaked settings between high and very high, I doubt that any single-GPU setup can beat 30 fps with less than 600 GFlops and 85 GB/s of memory bandwidth (fully available, not affected by addressed filtering).

* http://www.tomshardware.com/de/fotostrecken/grafik_cpu_leistung2,0101-58000-0-14-15-1-jpg-.html#
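
The triangle-rate reasoning above boils down to simple arithmetic; a quick sketch using the figures quoted in this post:

Code:
# triangles per frame = triangle throughput per second / frames per second
def tris_per_frame(tris_per_sec_millions, fps):
    return tris_per_sec_millions * 1e6 / fps

print(tris_per_frame(40, 40))   # 40 Mtris/s at 40 fps -> ~1.0 million tris per frame
# Holding that per-frame load, 50 fps would need roughly 50 Mtris/s of throughput,
# which is why the claimed 50 FPS run looks implausible to this poster.
print(1.0e6 * 50 / 1e6)         # -> 50.0 (million tris per second needed)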
 
Spirou,
I ran it again with the same DX9 very high tweak (all knobs @ max) + in-game selected 4xAA + driver forced 8xAF + trilinear filtering + vsync:
-> 30-35FPS

Too bad FSAA is so much heavier to run than those full screen & texture blurring tent AAs on Radeons.
 