
The nVidia memory bandwidth myth explained.

c is the speed of light. As mentioned above, signals in a wire don't actually travel at the speed of light, but it's a usable approximation.
 
so, basically, all Radeon HD 5000 and GeForce GTX 400 cards have the IMC embedded on the GPU, just like today's CPUs?
If so, I see ATI doing the Intel thing and NVIDIA doing the AMD thing with their IMCs. Intel's IMC usually allows higher memory overclocks; more than 2000 MHz is common with Intel, whereas AMD can't compete on memory clock speed.

That is completely beyond me. NVIDIA had a long time to prepare GF100, and they still only came out with a low-speed IMC. Correct me if I'm wrong.

Even though AMD was the first of the two to implement a memory controller directly on the CPU, they seem to struggle doing so, always having problems getting a proper implementation.

First it was the DDR controller on Socket 754 not allowing dual channel. Then it was the DDR controller on Socket 939 not being able to handle 4 sticks at 1T and 400+ MHz. Then their DDR2 controller on AM2 couldn't handle 4 sticks at 1066. And then there were the problems getting a stable DDR3 controller for AM3, which led them to disable it in the first batch of AM3 processors and release them as AM2-only.

With nVidia, the issue is probably more down to having a 384-bit memory controller. It is a little different, but to give you an idea: on motherboards, single channel is 64-bit, so dual channel is 128-bit and triple channel is 192-bit. Graphics cards have been using 256-bit as a standard for a good long while now, and I foresee it being the standard for a long while still, simply because of what we are seeing in the nVidia cards, with the memory controller not being able to handle the higher clock speeds.

They did this before with the G80 cards, and eventually went back to a 256-bit bus with G92. ATi tried it too with their HD 2900 series and its 512-bit bus, and they obviously learned from their mistake. In fact, IIRC, the HD 2900 XT used GDDR3 rated for 1000 MHz but clocked it at only 825 MHz over the 512-bit bus, and it only overclocked into the 970 MHz range. nVidia at the time was using GDDR3 rated for only 900 MHz, clocking it at 900 MHz over the 384-bit bus, and it overclocked to well over 1000 MHz.
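Running those clocks through a quick Python back-of-envelope shows the bus-width trade-off in numbers (the 8800 GTX attribution is my inference from the 384-bit / 900 MHz figures; treat this as a sketch):

```python
# Peak bandwidth = (bus width in bytes) * effective data rate.
# GDDR3 is double data rate, so effective rate = 2 * memory clock.
def peak_gb_per_s(bus_bits: int, mem_clock_mhz: float) -> float:
    return bus_bits / 8 * (2 * mem_clock_mhz) * 1e6 / 1e9

print(peak_gb_per_s(512, 825))  # HD 2900 XT: 105.6 GB/s
print(peak_gb_per_s(384, 900))  # 8800 GTX-class G80: 86.4 GB/s
```

So the wide bus still won on paper bandwidth despite the lower clock, which is the whole appeal of going wide.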
 
c = 3*10^5 km/s = 3*10^11 mm/s = 300 mm/nanosecond.

So to travel the 100 mm to the memory, the signal needs about 0.33 ns. Where do you think the rest of the time is spent, if not in the memory controller?
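A throwaway script for the same arithmetic (the 100 mm trace length is from the post above; the 1 ns clock period is my assumption, just for scale):

```python
# Signal flight time over the ~100 mm to the memory chips, using the
# speed-of-light figure above (real traces are a bit slower, ~0.95c).
c_mm_per_ns = 300.0            # 3*10^11 mm/s = 300 mm/ns
trace_mm = 100.0
flight_ns = trace_mm / c_mm_per_ns
print(f"one-way flight time: {flight_ns:.2f} ns")              # ~0.33 ns

# Assumed 1000 MHz memory clock, i.e. a 1 ns period, for comparison:
period_ns = 1.0
print(f"fraction of one clock period: {flight_ns/period_ns:.0%}")  # 33%
```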

You still don't want to get what I mean: any function added will introduce latencies, because it's going to be required to wait for that function to finish. Even if it's not enabled, you need the function that tells whether ECC is enabled or not. I highly doubt ECC is disabled in hardware. If it is and you know it 100% for sure, then I retract my opinion, but otherwise it is very possible that ECC introduces some latencies that make the MC slower.

I'm not talking about how long the request takes, but about the fact that added functions/silicon always limit the maximum stable clock that anything can achieve.
 
You can implement any function to be executed in parallel, without latency, if you are willing to spend the transistors for it. From there on you can reduce your transistor count by using several clock cycles to do it.
 
OK, not to poke holes, but electricity in a wire does not travel at the speed of light; 95% of c would be closer.


Edit: never mind, I see you noted it :P
 
> because it's going to be required to wait for that function to finish.

just put a logic 1 on the "is ECC data good" gate and you are done, no computation needed

if ECC is enabled, don't send the logic 1, but connect the gate to the output of the magic ECC black box

don't think in terms of sequential programming for logic design; you can do everything at the same time there

ECC consumes storage in memory, so if it were always on you'd have less memory usable (this is the case for Tesla cards, not for GeForce)
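For anyone who thinks in code rather than gates, here's a toy Python model of that bypass. The names and the single-parity-bit check are made up for illustration; real SECDED ECC is much more involved, and this is not NVIDIA's actual design:

```python
# Toy combinational model of the ECC bypass described above.
def compute_parity(data: int) -> int:
    """Single even-parity bit over a data word (a stand-in for the
    'magic ECC black box')."""
    return bin(data).count("1") & 1

def data_good(data: int, parity: int, ecc_enabled: bool) -> bool:
    """The 'is ECC data good' gate: tied to logic 1 when ECC is off,
    driven by the checker's output when it is on."""
    if not ecc_enabled:
        return True             # hard-wired logic 1, nothing computed
    return compute_parity(data) == parity

print(data_good(0b1011, 1, ecc_enabled=False))  # True: ECC bypassed
print(data_good(0b1011, 1, ecc_enabled=True))   # True: parity matches
print(data_good(0b1010, 1, ecc_enabled=True))   # False: flipped bit caught

# Storage cost: a typical SECDED code stores 8 check bits per 64 data
# bits, so always-on ECC would reserve 8/64 = 12.5% of capacity --
# hence Tesla boards report less usable memory than GeForce.
print(8 / 64)   # 0.125
```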
 
@ wizz

yep, that's why we see the actual memory size on GeForce, whereas Tesla cards show less than the actual memory
 
> You can implement any function to be executed in parallel, without latency, if you are willing to spend the transistors for it. From there on you can reduce your transistor count by using several clock cycles to do it.
>
> > because it's going to be required to wait for that function to finish.
>
> just put a logic 1 on the "is ECC data good" gate and you are done, no computation needed
>
> if ECC is enabled, don't send the logic 1, but connect the gate to the output of the magic ECC black box
>
> don't think in terms of sequential programming for logic design; you can do everything at the same time there
>
> ECC consumes storage in memory, so if it were always on you'd have less memory usable (this is the case for Tesla cards, not for GeForce)

You can do many things in parallel, but even then that's going to add some internal latencies. You are adding stages, and that adds complexity which can impact speed. You are not exchanging a few complex stages for many simpler stages, which would result in higher clock speeds; you are adding stages that didn't exist before. So you have to ensure interoperability, you have to ensure that all stages take the same time to execute without adding unnecessary traces or waiting times, etc. Adding things in parallel can make the chip run faster, but it will also make it bigger* and hotter, and that will also limit the clocks. Sounds familiar...

* And that will make the overall travelling time longer. You can't make two things occupy the same space as one. Things are done in parallel because two slightly slower things are faster than a single faster one, but that doesn't change the fact that you are adding travelling time (and making the thing slower): you are space constrained, and having two (many more, actually) things going from A to B at the same time makes each trip longer, and that affects the maximum attainable clock.

The fact of the matter is that the MC in Fermi is much slower than the ones on previous generations of cards, and one of the most significant changes is ECC. Can you completely write off the possibility that adding ECC support had a slowing effect? That's all I'm saying. Many things can make a circuit slower, and I don't think any of us is in a position to deny with certainty that something isn't slowing it down.
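To put toy numbers on the stages-limit-the-clock argument: the maximum stable clock is the reciprocal of the critical path delay, so any extra delay on that path costs frequency. Both delay figures below are invented purely for illustration:

```python
# Max clock is set by the longest register-to-register path:
# f_max = 1 / t_critical. Even a small extra mux on that path
# costs clock speed.
base_path_ns = 0.80      # hypothetical critical path of the old MC
ecc_mux_ns = 0.10        # hypothetical added mux/routing delay for ECC

for extra in (0.0, ecc_mux_ns):
    t = base_path_ns + extra
    print(f"critical path {t:.2f} ns -> f_max {1.0 / t:.2f} GHz")
# critical path 0.80 ns -> f_max 1.25 GHz
# critical path 0.90 ns -> f_max 1.11 GHz
```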
 
> With nVidia, the issue is probably more down to having a 384-bit memory controller. It is a little different, but to give you an idea: on motherboards, single channel is 64-bit, so dual channel is 128-bit and triple channel is 192-bit. Graphics cards have been using 256-bit as a standard for a good long while now, and I foresee it being the standard for a long while still, simply because of what we are seeing in the nVidia cards, with the memory controller not being able to handle the higher clock speeds.

Although graphics cards list 256-bit, 384-bit, and even 512-bit buses, it's still many 64-bit controllers. So, really, 256-bit is akin to 4-channel 64-bit, just like CPUs.

I stole this pic from OC3D, but it illustrates this very clearly:

[Image: block diagram of the Fermi memory subsystem, via OC3D]


As you can see, the 384-bit bus of Fermi GPUs is actually 6 separate 64-bit controllers.

EDIT: here is the HD 5870 (stolen from Bit-Tech), same thing, 256-bit as 4x64-bit:
[Image: block diagram of the HD 5870 memory subsystem, via Bit-Tech]
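The arithmetic behind both diagrams, for completeness (the controller counts are read off the diagrams and the bus widths named earlier in this thread):

```python
# Total bus width is just (number of 64-bit controllers) * 64,
# exactly as the block diagrams show.
for name, n in [("HD 5870", 4), ("GF100 / GTX 400", 6), ("HD 2900 XT", 8)]:
    print(f"{name}: {n} x 64-bit = {n * 64}-bit bus")
```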
 

You are correct, they are 64-bit controllers strung together, but they all still must work together, and it is that working together that limits the speed they can run at while maintaining stability.

On a different note, I wonder if upping the GPU voltage, and hence giving the memory controllers more voltage, would actually improve memory overclocks in some cases.
 
I don't think so, newtekie, because Intel's IMC can clock memory a little higher than AMD's despite having a larger bus.

I think what really limits Fermi is its heat, just like the HD 2900 XT.
 
> You are correct, they are 64-bit controllers strung together, but they all still must work together, and it is that working together that limits the speed they can run at while maintaining stability.
>
> On a different note, I wonder if upping the GPU voltage, and hence giving the memory controllers more voltage, would actually improve memory overclocks in some cases.

The HD 5870 has a memory controller voltage supply separate from the GPU voltage supply... so in that instance, no. I'm not sure about Fermi, but given the separate operating speeds, I assume most modern GPUs use a supply different from vGPU, as separating them can only help stability.
 