You can't make that assumption once you realize that a bus at double the speed can handle twice as many requests, with twice the granularity. Think of it like this: a bus is nothing but a road. A 512-bit-wide bus has 512 lanes, while a 256-bit-wide bus has 256 lanes. The narrower road has traffic that moves at twice the speed. There are always some gaps in the traffic, and the narrower road will open gaps for traffic twice as often as the wider one.
It's a trade-off between that and what I said. A wider bus does have finer granularity, depending on what you understand by granularity in this case.
Following your analogy, one of the things I said can be pictured as needing 192 lanes free at the same time to fit a big vehicle. The 512-lane road will have those lanes available far more often. On the 256-lane road you will probably have to stall the traffic to fit your 192-lane-wide vehicle. The final performance depends on how often that happens. Twice the requests only matters for isolated or random SMALL chunks of data, which are scarce in graphics, and much scarcer on the R6xx/7xx/8xx series of cards.
Additionally, you get free gaps more often, but each one is smaller. Over time the total amount of free space is the same, which is what matters in the end. The narrower road does have an advantage for isolated accesses, but for that to be really advantageous your acceleration has to be faster (lower latency). Without fast acceleration to get you onto the road sooner, the gaps have to be bigger for you to be able to enter the lanes (bad analogy, probably). Latency is almost always comparatively higher on faster memories.
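To put rough numbers on the "same amount of gaps over time" point, here is a quick sketch in Python. The 25% idle fraction and the transfer rates are made-up values for illustration only, and the function name is just something I picked; the only thing taken from the discussion is that the narrow bus runs at twice the speed of the wide one.

```python
# Illustrative only: idle_fraction and the transfer rates are assumed values.

def idle_capacity(width_bits, transfers_per_ns, idle_fraction):
    """Return (gaps per ns, idle bits per ns) for a bus.

    A "gap" is one transfer slot in which the bus carries nothing.
    """
    gaps_per_ns = transfers_per_ns * idle_fraction
    idle_bits_per_ns = gaps_per_ns * width_bits
    return gaps_per_ns, idle_bits_per_ns

# Wide bus at base speed vs. narrow bus at double speed, same idle fraction.
wide_gaps, wide_idle_bits = idle_capacity(512, 1.0, 0.25)
narrow_gaps, narrow_idle_bits = idle_capacity(256, 2.0, 0.25)

print(wide_gaps, wide_idle_bits)      # 0.25 128.0
print(narrow_gaps, narrow_idle_bits)  # 0.5 128.0
```

The narrow bus offers gaps twice as often, but each gap is half as wide, so the free capacity per unit of time comes out the same under these assumptions.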
If you want another key factor that favors wider buses (and that was also behind my claim), power consumption is one. When circuits run close to their clock limits, power consumption (and heat, current leakage, electromigration and probably many other things I can't think of right now) grows exponentially. Increasing bus width, on the other hand, increases it almost linearly.
Anyway, my claims aren't mere assumptions, because they are based on a study I read some years ago that favored a 256-bit bus over a faster 128-bit one on the graphics cards of the time, with actual empirical testing.
Of course my claim is not true for ALL buses and all implementations, but it is true in the case of graphics cards. I say this because maybe your problem was that my claim sounded as if I were saying a wider bus is always better, which is not what I meant.
DarkMatter,
You ignored the fact that the narrow bus (GDDR5) is much, much faster than the wide bus (GDDR3).
Absolute latency (= time) is what matters, numerical amount of latency cycles is irrelevant.
Actually no. Imagine you have to access 2048 bits of data over both a 256-bit and a 512-bit bus. It will take 4 cycles on the 512-bit one and 8 cycles on the 256-bit one. Now imagine a typical situation where the faster memory has a higher latency in cycles but runs at twice the speed. Say that translates to latencies of 4 ns (256-bit) and 5 ns (512-bit); this is typical, like CAS 5 DDR2 versus CAS 8 DDR3 for example. That gives the 256-bit bus 8*4 = 32 ns of accumulated latency versus 4*5 = 20 ns on the wider bus. Of course this is a worst-case scenario for the 256-bit one, because it assumes both have to go to memory every cycle and both find all the data they need every cycle. In buffered situations this issue loses importance, but it is still present to some extent, making the wider bus inherently better in that respect.
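The arithmetic above can be checked with a few lines of Python. This is only a sketch of the worst case I described, assuming every bus transfer pays the full access latency (no buffering, no bursts); the function name is my own.

```python
# Worst-case model: each transfer pays the full access latency.

def accumulated_latency_ns(payload_bits, bus_width_bits, latency_ns):
    """Return (number of transfers, total latency in ns)."""
    transfers = -(-payload_bits // bus_width_bits)  # ceiling division
    return transfers, transfers * latency_ns

# 2048 bits over the fast 256-bit bus (4 ns) and the slower 512-bit bus (5 ns).
narrow = accumulated_latency_ns(2048, 256, 4)
wide = accumulated_latency_ns(2048, 512, 5)

print(narrow)  # (8, 32)
print(wide)    # (4, 20)
```

Even though each individual access is faster on the narrow bus, it needs twice as many of them, so in this worst case it ends up with more total latency.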
There's another situation, the one relevant to what Scyphe said: when the bus has to access tons of small chunks of data. In this situation the slower, wider bus is at a disadvantage, because the availability of the bus matters much more than the amount of data it can carry. But this situation is extremely rare with buffered memories, and even rarer on graphics cards, where data is usually big and coherent with its surroundings. For example, a vertex will have X, Y, Z components and a pixel will have R, G, B, A.
The end result is a mix of those two extremes, a balance between the need for more space and the need for more availability. Statistics says that it is easier to fit big things into big containers (you can fit more of them). So when the data chunks are big enough, and in graphics they are, a wider bus is better.
@ both: I never said it is much better anyway. I would say the difference is within 5%, but it IS essentially and statistically better.