HD 5870 Discussion thread.

Status
Not open for further replies.
Joined
Nov 21, 2007
Messages
3,688 (0.61/day)
Location
Ohio
System Name Felix777
Processor Core i5-3570k@stock
Motherboard Biostar H61
Memory 8gb
Video Card(s) XFX RX 470
Storage WD 500GB BLK
Display(s) Acer p236h bd
Case Haf 912
Audio Device(s) onboard
Power Supply Rosewill CAPSTONE 450watt
Software Win 10 x64
Not to bring back the hot "memory bottleneck" debate, but I was reading some architecture documents and got to thinking about something. Many of you, and reviewers, noticed a 5-10% increase in FPS when the memory was OC'ed.

5870 = 256bits (1200Mhz for 153.60GB/s)
5870 = 256bits (1350Mhz for 172.80GB/s) <--- = 12.5% more GB/s (+- 5-10% perf)

Now what if it had a 512-bit wide bus?

5870 = 512bits? (1200Mhz for 307.20GB/s) <--- = 100% more GB/s (2x)
5870 = 512bits? (1350Mhz for 345.60GB/s) :eek:
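
For anyone who wants to check the figures above, here is a minimal sketch of the arithmetic. It assumes GDDR5 moves 4 transfers per pin per memory clock (which is what makes 1200MHz on a 256-bit bus come out at 153.6GB/s); the function names are made up for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits, memory_clock_mhz, transfers_per_clock=4):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=4 is an assumption that matches GDDR5 as quoted
    in this thread (1200MHz on 256-bit -> 153.6GB/s).
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


def percent_more(new, old):
    """How much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    stock = peak_bandwidth_gbs(256, 1200)
    for width in (256, 512):
        for clock in (1200, 1350):
            bw = peak_bandwidth_gbs(width, clock)
            print(f"{width}-bit @ {clock}MHz: {bw:.2f} GB/s "
                  f"({percent_more(bw, stock):+.1f}% vs stock)")
```

Note that this is peak bandwidth only; it says nothing about how much of it a game can actually use.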

Wouldn't it make sense to expect more than a 5-10% increase in FPS? I think yes.
I think the issue has more to do with "memory addressable" than the possible speed it operates at.

Anyhow another good page to read.. http://www.beyond3d.com/content/reviews/53/7

Um, we've had 2 members post on this page showing that the benefits of memory overclocking aren't significant enough to call it a bottleneck; it scaled performance up as any memory OC would. If it were bottlenecked, performance would increase more than it has. I too used to believe that the HD 5870 had to be bottlenecked, but after seeing wolf and bobzilla post their results, where in both cases memory didn't increase performance significantly, I've come to the conclusion that memory isn't really the issue. Though if you mean something else by memory accessible, could you explain in more detail? :toast:
 
Joined
Nov 4, 2005
Messages
11,690 (1.73/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs and over 10TB spinning
Display(s) 56" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
The real world latency drops as the cycle time becomes faster.


5 wait states at 1300MHz take less time than 5 wait states at 1000MHz. If the core is running at 1,000MHz (1GHz), it loses five cycles every time it has to wait for data from vmem @ 1,000MHz, but if those memory cycles are 30% faster (1300MHz), it effectively gains one or two cycles back per fetch. So a horribly optimized game will benefit more from faster data delivery during cache misses than from the supposed higher bandwidth being generated.


So a 30% faster recovery from a branch prediction failure, thanks to faster memory, results in a real-world performance gain of X, or XX.


Again, not bandwidth driven.
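
A quick sketch of the wait-state arithmetic above, assuming the wait states are counted in memory clocks and the core sits at 1000MHz the whole time. The helper names are hypothetical and the model ignores everything else that happens during a fetch.

```python
def wait_time_ns(wait_states, memory_clock_mhz):
    """Wall-clock time spent in `wait_states` memory cycles, in ns."""
    return wait_states * 1000.0 / memory_clock_mhz


def core_cycles_lost(wait_states, memory_clock_mhz, core_clock_mhz):
    """Core cycles that tick by while the memory wait states elapse."""
    return wait_time_ns(wait_states, memory_clock_mhz) * core_clock_mhz / 1000.0


if __name__ == "__main__":
    core_mhz = 1000  # assumption taken from the example in the post
    slow = core_cycles_lost(5, 1000, core_mhz)
    fast = core_cycles_lost(5, 1300, core_mhz)
    print(f"5 wait states @ 1000MHz: {slow:.2f} core cycles lost")
    print(f"5 wait states @ 1300MHz: {fast:.2f} core cycles lost")
    print(f"saved per fetch: {slow - fast:.2f} core cycles")
```

Under these assumptions the saving is a bit over one core cycle per fetch, in line with the "one or two cycles" estimate.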
 

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
It's not just bandwidth that a game needs. Many games also benefit from lower latency cache/memory.

@ a_ump, it could be a hidden "upping" of latency that keeps performance gains from showing up when the memory is overclocked. Many review sites are not aware of the error-correcting algorithm: if the memory is overclocked to the point where errors are produced, performance actually slows down. It's such a new feature for a video card.

There's no doubting that a 5870 is absolutely starved for more bandwidth.

Upping the latency does hurt performance if it's increased way too much. 500MHz memory with CAS 2 latency is practically as good as 1000MHz with CAS 4 latency in most applications. The world is not one-dimensional in that everything needs sheer bandwidth or buffer size or sheer speed with as low latency as possible. Some applications benefit more from increasing bandwidth. Some from increasing core processor clock. Some from increasing buffer size (for memory hungry apps).

1000MHz with CAS6 latency is usually slower than 500MHz with CAS2 latency for most applications. What we do not know for sure is whether the driver automatically increases the latency more than we'd like when we overclock the memory. When some people overclock their system memory just to see how high their clocks can get, they loosen the latency by a couple of notches, which really hurts performance.
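
To put numbers on the CAS comparisons above, here is a rough sketch. It assumes the quoted MHz is the clock that the CAS cycles are counted against and that bandwidth scales with clock at a fixed bus width; `MemoryConfig` is a made-up name for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryConfig:
    clock_mhz: float   # clock the CAS cycles are counted against (assumption)
    cas_cycles: int

    @property
    def cas_latency_ns(self) -> float:
        """CAS latency converted from cycles to nanoseconds."""
        return self.cas_cycles * 1000.0 / self.clock_mhz

    def relative_bandwidth(self, baseline: "MemoryConfig") -> float:
        """Bandwidth relative to `baseline`, same bus width assumed."""
        return self.clock_mhz / baseline.clock_mhz


if __name__ == "__main__":
    base = MemoryConfig(500, 2)
    for cfg in (base, MemoryConfig(1000, 4), MemoryConfig(1000, 6)):
        print(f"{cfg.clock_mhz:.0f}MHz CAS{cfg.cas_cycles}: "
              f"{cfg.cas_latency_ns:.1f} ns, "
              f"{cfg.relative_bandwidth(base):.1f}x bandwidth")
```

So 500MHz CAS2 and 1000MHz CAS4 have the same latency in ns but differ in bandwidth, while 1000MHz CAS6 trades worse latency for that same bandwidth gain.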

There's just too much logical stuff pointing to how a 5870 just needs more bandwidth without losing too much latency.

I'm not saying this to "pwn" you guys. It's human to try to explore as much as possible from every angle of thought.
 
Joined
Sep 1, 2009
Messages
1,183 (0.22/day)
Location
CO
System Name 4k
Processor AMD 5800x3D
Motherboard MSI MAG b550m Mortar Wifi
Cooling Corsair H100i
Memory 4x8Gb Crucial Ballistix 3600 CL16 bl8g36c16u4b.m8fe1
Video Card(s) Nvidia Reference 3080Ti
Storage ADATA XPG SX8200 Pro 1TB
Display(s) LG 48" C1
Case CORSAIR Carbide AIR 240 Micro-ATX
Audio Device(s) Asus Xonar STX
Power Supply EVGA SuperNOVA 650W
Software Microsoft Windows10 Pro x64
Steevo, I get what you are trying to say. When I read what you typed about the latency, it reminded me of a Phenom 2 NB. The NB doesn't need the extra bandwidth of 1600, it's fine at 1333, but it likes tighter timings and a fast NB clock, like 2.8GHz to 3GHz. So the memory bandwidth of the 5870 is fine, but it probably likes lower latency, which we can't control. We can control the core speed, though, which helps it far more than increasing the bandwidth.
 

newfellow

New Member
Joined
Aug 28, 2009
Messages
314 (0.06/day)
System Name ID
Processor Q9450 ~3.74Ghz
Motherboard ASUS-P5E
Cooling Air
Memory G.Skill CL4-8GB
Video Card(s) ATI/Geforce 5850/9800
Storage A-lot
Display(s) BenQ G2400WT
Case 900
Audio Device(s) Shitty ASUS FX;P
Power Supply OCZ GXS 850W
Software -
Benchmark Scores too many machines to spec
That doesn't make any sense, absolutely none. Neither does the original message. Which is probably why Steevo posted the nice picture.

The higher the latencies, the higher the speed the memory can reach, so that is a win-win; the lower the latencies, the smaller the OC headroom. Look at standard DDR2, for example: 800MHz CL4 is roughly the same as 1066MHz CL6 in actual latency (about 10 ns vs 11.25 ns), because the cycle time also goes down at the higher speed, from 2.50 ns to about 1.88 ns, while the RAM's internal copy speed is actually about 2000MB/s faster.
 
Joined
Nov 4, 2005
Messages
11,690 (1.73/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs and over 10TB spinning
Display(s) 56" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
Wow.

Just wow.


Maybe I could draw a picture.


a------------------------>B

a needs to get to B data location to be processed.
a doesn't give a fuck if the road is a 12 lane, 2 lane or 1 lane.
B doesn't either.

If each dash represents 1 ns and a is forced to drive at one dash per ns, it will take X ns to get there. The dashes represent an unchanging timing value, not an error-checking value, not a changed value. Thus the reason we can't/shouldn't mess with timings on GPUs since DDR.


So a is driving along happily at one dash per ns, until we give him a NOS bottle. Then it becomes one dash every .70 ns. Same number of dashes, and he still doesn't give a flying fuck about the width of the highway. B is still waiting.

B runs at one cycle per ns; every ns a is late results in a performance hit. There could be a 12-lane freeway coming to B's house and it still wouldn't change the fact that a has to drive at the same speed. Not one fucking bit.


The sooner a arrives, the sooner the epic plans of their consort can continue.

So today we have learned that the faster a drives (faster memory speed) the faster a branch prediction fault can be recovered, and normal operation can resume. Again.


a------------------>B
--------------------
--------------------
--------------------
--------------------
--------------------
--------------------
--------------------
--------------------
--------------------
--------------------
--------------------



All wasted, just waiting on little a.
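
The road picture can be sketched as a toy model where a fetch costs a fixed latency plus the time to move the bytes. The 50 ns latency and the request sizes below are made-up numbers for illustration only, not measurements from any card.

```python
def fetch_time_ns(size_bytes, latency_ns, bandwidth_gbs):
    """Toy model: fixed latency plus transfer time.

    bytes / (GB/s) comes out directly in ns, since 1 GB/s = 1 byte/ns.
    """
    return latency_ns + size_bytes / bandwidth_gbs


if __name__ == "__main__":
    small = 64                  # one small fetch after a miss (hypothetical)
    large = 64 * 1024 * 1024    # a big streaming transfer (hypothetical)
    cases = [
        ("256-bit, 50 ns", 50.0, 153.6),
        ("512-bit, 50 ns", 50.0, 307.2),   # wider road, same drive time
        ("256-bit, 35 ns", 35.0, 153.6),   # same road, 30% faster drive
    ]
    for label, latency, bandwidth in cases:
        print(f"{label}: small {fetch_time_ns(small, latency, bandwidth):.2f} ns, "
              f"large {fetch_time_ns(large, latency, bandwidth) / 1e3:.1f} us")
```

In this model the small fetch barely notices the wider bus but gains almost the full latency cut, while the large transfer is the opposite. Which case dominates on a real 5870 is exactly what the thread is arguing about.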
 

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
Sadly, nobody really tested for this issue concerning a 5870 yet.

Only the engineers at ATI really know how much 512-bit bandwidth would benefit a 5870 without increasing latencies.

Many thought that an X1900XTX did not need more bandwidth (since we saw useless gains with overclocking the memory and greater gains with overclocking the core). However, ATI went ahead and did an X1950XTX with the only difference being ~30% greater bandwidth that yielded about 6-8% performance increase overall. It sold at a $100 price premium over an X1900XTX for several months, which helped ATI to rake in some money.

I do not think this is dumb at all. I do not read somewhat "dumb" posts thoroughly myself, but I definitely can see how some might classify me as "dumb" if my posts are not being read.


Logic 101

Prerequisite lesson:

Check the specifications of a 4890, 5770 and 5870. See how both a 4890 and a 5770 carry half the shader units, TMU's, and ROP's of a 5870--at the same clock speed.

Lesson 1:

A 4890 benefits with memory overclocked to 5870's bandwidth alone. I posted the evidence here in this thread.

Lesson 2:

A 5770 performs overall 20% worse than a 4890 despite slight architecture improvements (which still enabled it to beat a 4890 in 2 games out of like 20 games tested).

Lesson 3:

A 5770, with 5870's bandwidth, is 99.9% guaranteed to have at least a 26% performance increase.

Lesson 4:

A 5870 (2 times a 5770 core) is predicted to perform xx% better overall with 512-bit memory at the same latency.

I could post in theoretical benchmark numbers for a 5870 with 512-bit memory here if you want!
 

newfellow

New Member
Joined
Aug 28, 2009
Messages
314 (0.06/day)
System Name ID
Processor Q9450 ~3.74Ghz
Motherboard ASUS-P5E
Cooling Air
Memory G.Skill CL4-8GB
Video Card(s) ATI/Geforce 5850/9800
Storage A-lot
Display(s) BenQ G2400WT
Case 900
Audio Device(s) Shitty ASUS FX;P
Power Supply OCZ GXS 850W
Software -
Benchmark Scores too many machines to spec
Wow.

Just wow.


Maybe I could draw a picture.

Seems we need a better picture. :)

Plus a couple more roads and no lanes, jumping from 2.50 ns (DDR2-800 reference) to 0.70 ns and back while storing to a buffer. Oh, and thanks to Vista and 7, we actually do have to care whether "the road is a 12 lane, 2 lane or 1 lane."
 
Joined
Nov 4, 2005
Messages
11,690 (1.73/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs and over 10TB spinning
Display(s) 56" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
Your logic goes out the door when memory latency, GPU core steppings, and minor revisions like the 4890 had over the 4870 come into play.


We have done tests that show increased bandwidth delivery, itself a factor of memory speed and timings, has done nothing notable for the 5XXX series. Why do we all strive for faster RAM? More bandwidth? Only for a few select applications, like media encoding, file compression, etc. But more so to resolve a "fault" in less time. Look up hard faults, page faults, and cache, and read up for the next year or so to understand.
 

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
Your logic goes out the door when memory latency, GPU core steppings, and minor revisions like the 4890 had over the 4870 come into play.


We have done tests that show increased bandwidth delivery, itself a factor of memory speed and timings, has done nothing notable for the 5XXX series. Why do we all strive for faster RAM? More bandwidth? Only for a few select applications, like media encoding, file compression, etc. But more so to resolve a "fault" in less time. Look up hard faults, page faults, and cache, and read up for the next year or so to understand.

What does a 4870 have to do with it? Honestly? I dare you to explain!
 

newfellow

New Member
Joined
Aug 28, 2009
Messages
314 (0.06/day)
System Name ID
Processor Q9450 ~3.74Ghz
Motherboard ASUS-P5E
Cooling Air
Memory G.Skill CL4-8GB
Video Card(s) ATI/Geforce 5850/9800
Storage A-lot
Display(s) BenQ G2400WT
Case 900
Audio Device(s) Shitty ASUS FX;P
Power Supply OCZ GXS 850W
Software -
Benchmark Scores too many machines to spec
Your logic goes out the door when memory latency, GPU core steppings, and minor revisions like the 4890 had over the 4870 come into play.


We have done tests that show increased bandwidth delivery, itself a factor of memory speed and timings, has done nothing notable for the 5XXX series. Why do we all strive for faster RAM? More bandwidth? Only for a few select applications, like media encoding, file compression, etc. But more so to resolve a "fault" in less time. Look up hard faults, page faults, and cache, and read up for the next year or so to understand.

Ok, I was about to add thanks on this one, but hell, I just had to read it again. This makes absolutely no sense either.

We do not correct errors with higher bandwidth. We need bandwidth more than access time, and a lot of applications, including games, do memory caching, not to mention 'shared memory': these days dedicated video memory is increasingly backed by system RAM, since there's a lot more memory in the system than fast GDDRx memory.
 
Joined
Nov 4, 2005
Messages
11,690 (1.73/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs and over 10TB spinning
Display(s) 56" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
The 5870 is not the same chip as the 4890, in the same manner that the 4870 is not the 4890.

Just like luck-of-the-draw CPU steppings, one GPU with a different revision code will clock differently than another. They make minor revisions to the process all the time and test those as they go out to determine their exact effect on performance, yield, power consumption, and other aspects.


The 5870 is not a 4890 doubled. Not.

The addition of thousands of extra surfaces rendered by the tessellator in tests shows a performance hit, but not nearly as large as the hit we would see from trying to render those surfaces the same way DX10, 9 or any other does.


We have evolved beyond the point of needing to feed the GPU that much data.


Dare away. If I don't, are you going to punch your monitor, or yell at me?


If the memory was the bottleneck, then the performance increase would rise with the memory speed directly. It doesn't.

So the only other plausible explanation is that the faster data delivery helps offset soft fault delays. Something a larger on-die cache might help with, but only if the branch prediction is working at 100% accuracy, and that never happens.
 

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
Ok, I was about to add thanks on this one, but hell, I just had to read it again. This makes absolutely no sense either.

We do not correct errors with higher bandwidth. We need bandwidth more than access time, and a lot of applications, including games, do memory caching, not to mention 'shared memory': these days dedicated video memory is increasingly backed by system RAM, since there's a lot more memory in the system than fast GDDRx memory.

Let's keep it up like this! The tune's starting to sound nice.. just kidding! :cool:
 

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
The 5870 is not the same chip as the 4890, in the same manner that the 4870 is not the 4890.

Just like luck-of-the-draw CPU steppings, one GPU with a different revision code will clock differently than another. They make minor revisions to the process all the time and test those as they go out to determine their exact effect on performance, yield, power consumption, and other aspects.


The 5870 is not a 4890 doubled. Not.

The addition of thousands of extra surfaces rendered by the tessellator in tests shows a performance hit, but not nearly as large as the hit we would see from trying to render those surfaces the same way DX10, 9 or any other does.


We have evolved beyond the point of needing to feed the GPU that much data.


Dare away. If I don't, are you going to punch your monitor, or yell at me?


If the memory was the bottleneck, then the performance increase would rise with the memory speed directly. It doesn't.

So the only other plausible explanation is that the faster data delivery helps offset soft fault delays. Something a larger on-die cache might help with, but only if the branch prediction is working at 100% accuracy, and that never happens.

Already mentioned "slight performance optimizations" with the 5770 over a 4890 numerous times in this thread.

slight performance optimizations: stuff that you regurgitate above

Once again, the performance never scales linearly with the bandwidth. It has never been the case for the past 10 years, but we always needed bandwidth. I'd gladly trade my 128-bit 5770 for your 256-bit 4890, even if I lose DX11 features. Would you trade it with me? Yes or no?

That's the million-dollar answer. It defies all things illogical.
 

newfellow

New Member
Joined
Aug 28, 2009
Messages
314 (0.06/day)
System Name ID
Processor Q9450 ~3.74Ghz
Motherboard ASUS-P5E
Cooling Air
Memory G.Skill CL4-8GB
Video Card(s) ATI/Geforce 5850/9800
Storage A-lot
Display(s) BenQ G2400WT
Case 900
Audio Device(s) Shitty ASUS FX;P
Power Supply OCZ GXS 850W
Software -
Benchmark Scores too many machines to spec
I'd gladly trade my 128-bit 5770 for your 256-bit 4890, even if I lose DX11 features.

Generally this is a very interesting point.

I mean, of course it would be a good trade in every sense if new GPU features like DirectX 11 were meaningless (as explained above, tessellation makes no difference whatsoever to memory). Also consider that the 4890 comes with, to borrow the analogy, '8 lanes' whereas the 5770 comes with '4 lanes'. A better comparison, using DDR2 as a reference, would be 'dual channel' (128-bit) versus 'single channel' (64-bit), and how much better latency you could get from single channel, lol, which ain't much, but it matters a lot for access.

The 4890 would "thread" twice as fast and gain on memory speed, while the 5770's bandwidth would depend entirely on lower memory latency to get faster access times at the same internal memory speed with lower external read/write speeds. We can't alter those at the moment, since there's no software like a computer BIOS for changing GPU memory latencies the way we change ordinary RAM latencies.

But in theory the 5770's memory interface could be as fast as the 4890's with lower latencies.
 

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
Yes, very true. But 1000MHz with CAS4 is still better than 500Mhz with CAS2, although both might perform very similarly in most applications. There are a few scenarios that still benefit from increased bandwidth also.

Also, I doubt that a 5770's GDDR5 memory has lower latencies than a 4890. It's actually clocked higher than 4890's, but with 128-bit bus. If anything, the latency is probably identical (CAS 11 or so).
 

newfellow

New Member
Joined
Aug 28, 2009
Messages
314 (0.06/day)
System Name ID
Processor Q9450 ~3.74Ghz
Motherboard ASUS-P5E
Cooling Air
Memory G.Skill CL4-8GB
Video Card(s) ATI/Geforce 5850/9800
Storage A-lot
Display(s) BenQ G2400WT
Case 900
Audio Device(s) Shitty ASUS FX;P
Power Supply OCZ GXS 850W
Software -
Benchmark Scores too many machines to spec
Yes, very true. But 1000MHz with CAS4 is still better than 500Mhz with CAS2, although both might perform very similarly in most applications. There are a few scenarios that still benefit from increased bandwidth also.

Actually, if you are using this as a reference for how GDDRx graphics memory would behave: it would benefit hugely from so-called 'Internal Memory Copy Speed', which matters hundreds of times more in the GPU world than it does for RAM. Write and read speeds in this sense could basically be whatever, as long as the internal speed is insane, since most applications that need graphics do everything they can to keep data in GPU memory instead of moving it to system RAM. (Edit: with the exception of the 'Microsoft Shared Video Memory' scheme, but 128-bit is still as fast as a dual-channel interface.)

Also, I doubt that a 5770's GDDR5 memory has lower latencies than a 4890. It's actually clocked higher than 4890's, but with 128-bit bus. If anything, the latency is probably identical (CAS 11 or so).

No need to doubt it, there's no reason to. Why would they change the latency when they can sell more cards, heh.
 
Joined
May 4, 2009
Messages
1,970 (0.36/day)
Location
Bulgaria
System Name penguin
Processor R7 5700G
Motherboard Asrock B450M Pro4
Cooling Some CM tower cooler that will fit my case
Memory 4 x 8GB Kingston HyperX Fury 2666MHz
Video Card(s) IGP
Storage ADATA SU800 512GB
Display(s) 27' LG
Case Zalman
Audio Device(s) stock
Power Supply Seasonic SS-620GM
Software win10
Yes, very true. But 1000MHz with CAS4 is still better than 500Mhz with CAS2, although both might perform very similarly in most applications. There are a few scenarios that still benefit from increased bandwidth also.

Also, I doubt that a 5770's GDDR5 memory has lower latencies than a 4890. It's actually clocked higher than 4890's, but with 128-bit bus. If anything, the latency is probably identical (CAS 11 or so).

If video ram is the same as normal ddr ram, then yes
 

Attachment: 2009-11-10_150854.jpg
Joined
Sep 8, 2009
Messages
1,056 (0.20/day)
Location
Porto
Processor Ryzen 9 5900X
Motherboard Gigabyte X570 Aorus Pro
Cooling AiO 240mm
Memory 2x 32GB Kingston Fury Beast 3600MHz CL18
Video Card(s) Radeon RX 6900XT Reference (amd.com)
Storage O.S.: 256GB SATA | 2x 1TB SanDisk SSD SATA Data | Games: 1TB Samsung 970 Evo
Display(s) LG 34" UWQHD
Audio Device(s) X-Fi XtremeMusic + Gigaworks SB750 7.1 THX
Power Supply XFX 850W
Mouse Logitech G502 Wireless
VR HMD Lenovo Explorer
Software Windows 10 64bit
If the memory was the bottleneck, then the performance increase would rise with the memory speed directly. It doesn't.


Back this up with something real or your whole point is flawed.
 

grimeleven

New Member
Joined
Oct 10, 2009
Messages
19 (0.00/day)
Processor Intel Core i7@3.5Ghz
Motherboard eVGA X58SLI
Cooling TRUE 120 Xtreme
Memory 6GB Aeneon 1866Mhz
Video Card(s) 4870X2 2GB /w AC Xtreme cooler
Storage Vertex 120g
Display(s) Samsung 32 inch LCD 1080p
Case HAF932
Audio Device(s) SB X-Fi
Power Supply Antec TP3 650W
Um, we've had 2 members post on this page showing that the benefits of memory overclocking aren't significant enough to call it a bottleneck; it scaled performance up as any memory OC would. If it were bottlenecked, performance would increase more than it has. I too used to believe that the HD 5870 had to be bottlenecked, but after seeing wolf and bobzilla post their results, where in both cases memory didn't increase performance significantly, I've come to the conclusion that memory isn't really the issue. Though if you mean something else by memory accessible, could you explain in more detail? :toast:

What i meant is the memory channels, forgot to include it.

Look at the design, 4x64bit MC and 4xL2 caches compared to 8x64bit MC on the 512bit controller. Again i agree the difference in Mhz vs performance benefits doesn't justify any overclock, my point is about the data that can be fetched in and out, (like the way Intel has built their X-25 SSD, more memory channels than any other vendors).
 

newfellow

New Member
Joined
Aug 28, 2009
Messages
314 (0.06/day)
System Name ID
Processor Q9450 ~3.74Ghz
Motherboard ASUS-P5E
Cooling Air
Memory G.Skill CL4-8GB
Video Card(s) ATI/Geforce 5850/9800
Storage A-lot
Display(s) BenQ G2400WT
Case 900
Audio Device(s) Shitty ASUS FX;P
Power Supply OCZ GXS 850W
Software -
Benchmark Scores too many machines to spec
What i meant is the memory channels, forgot to include it. http://www.beyond3d.com/images/reviews/cypress-arch/cypress-arch-big.png
Look at the design, 4x64bit MC and 4xL2 caches compared to 8x64bit MC on the 512bit controller. Again i agree the difference in Mhz vs performance benefits doesn't justify any overclock, my point is about the data that can be fetched in and out, (like the way Intel has built their X-25 SSD, more memory channels than any other vendors).

This has some flaws in general when it comes to how data goes in and out when there's so much of it that it doesn't all fit (like 3D games streaming). The internal side is insanely fast, but even with all the instructions, all the core-specific speed-ups, and on top of that incredible memory speed, none of this makes it 'a-ok' when we consider streaming a 25GB Blu-ray (for example) worth of textures, as 'Rage' is presented as doing in the future. That data will still be buffered in RAM that is X times slower, and that is the point where MS shared memory fails and where every single one of us will have the bottleneck ahead. The only way to prevent this is a graphics design that treats the entire shared video memory equally, plus very low-latency, fast DDR to closely match the continuous stream.

Of course, level-based pre-loading would kill this idea, but that technology is way too slow for today's data sizes, and compression techniques for HDR/textures and graphical data in general are far more important than we think, and stressful to process "in action". This is where fast latencies and speed get traded away for very, very fast access time to the card. Sadly, our mainboards, RAM and CPUs ain't exactly there yet.
 
Joined
Nov 4, 2005
Messages
11,690 (1.73/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs and over 10TB spinning
Display(s) 56" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
For an increase of 20% you would need to increase your memory speed by 70% of stock.

Notice how the lines have a nice trend, only about a 3% increase in performance for every 10% increase in memory speed.

Now that we all have hard factual data based on actual figures from this thread, let's discuss what it means. Obviously faster memory does increase the card's performance. But performance does not scale in proportion to the percentage increase in memory speed. So then we must look elsewhere as to WHY the trend follows WHAT rule. The next one is branch prediction fault, and the effect of latency on recovery time. Since the bandwidth is not directly proportional to performance, we infer that the next thing related to memory speed and data delivery that can explain the results is highly probable.
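
As a rough check on those two numbers, here is a minimal sketch that assumes the trend stays a straight line (about 3% performance per 10% memory clock), which the graph only shows over a small range. The function name is made up for illustration.

```python
def memory_increase_needed(target_perf_gain_pct, perf_gain_per_step_pct=3.0, mem_step_pct=10.0):
    """Memory clock increase (%) needed for a target performance gain (%),
    assuming performance keeps scaling linearly with memory clock.
    The linear trend is an assumption, not a measured law."""
    slope = perf_gain_per_step_pct / mem_step_pct   # % performance per % memory clock
    return target_perf_gain_pct / slope


needed = memory_increase_needed(20.0)
print(f"~{needed:.0f}% more memory clock for a 20% gain")
```

Under that assumption the answer lands at roughly two thirds more memory clock, which is in line with the 70% figure.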
 

Attachments

  • untitled.JPG (133.8 KB)

newfellow

New Member
Joined
Aug 28, 2009
Messages
314 (0.06/day)
System Name ID
Processor Q9450 ~3.74Ghz
Motherboard ASUS-P5E
Cooling Air
Memory G.Skill CL4-8GB
Video Card(s) ATI/Geforce 5850/9800
Storage A-lot
Display(s) BenQ G2400WT
Case 900
Audio Device(s) Shitty ASUS FX;P
Power Supply OCZ GXS 850W
Software -
Benchmark Scores too many machines to spec
For an increase of 20% you would need to increase your memory speed by 70% of stock.

At the system, not the GPU. Games stream data no matter what the 'decided' footprint in GPU memory is; it still goes from low-latency system memory to high-latency, high-speed GPU memory, which we cannot match no matter what we do. Buffers do solve it, sure, but what about when a single scene's streamed data is twice the size of your GPU memory? (Sure, you can tell me something like Crysis only uses 900MB at most, but there's a lot more than 900MB behind there waiting to be written to the card.)

Notice how the lines have a nice trend, only about a 3% increase in performance for every 10% increase in memory speed.

Look at your own picture: 900/1250 and 950/1100 speak against this. Raising the GPU clock gave the same amount of speed in your test with 150MHz less on the memory, while the memory speed increase did nothing except act as a faster command rate. This is exactly the same as lower latencies, which would actually affect things even more, but since tighter latencies and lower speed decrease the internal speed, the only place you would actually see a difference would be while streaming enough data through the system.

Now that we all have hard factual data based on actual figures from this thread, let's discuss what it means. Obviously faster memory does increase the card's performance. But performance does not scale in proportion to the percentage increase in memory speed. So then we must look elsewhere as to WHY the trend follows WHAT rule. The next one is branch prediction fault, and the effect of latency on recovery time. Since the bandwidth is not directly proportional to performance, we infer that the next thing related to memory speed and data delivery that can explain the results is highly probable.

We do have hard 'factual' data; it simply doesn't make any sense as long as we do not have a baseline benchmark that does individual streaming, to actually compare what the new high-end compression algorithms do when streaming data through GDDRx and a system-DDRx memory buffer. Faster memory did increase the performance because of lowered latency = faster access time; that is exactly the 'directly proportional to performance' part. It is not because there was more speed behind it, and with very tight latencies at low speeds, access time could be tweaked to levels very similar to the current higher clocks, with a very small penalty in the absolute top clock speed of the memory or its overclockability.
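
To put the latency-versus-clock argument into numbers, here is a minimal sketch. Access time in nanoseconds is just the latency in clock cycles divided by the clock frequency. The cycle counts below are hypothetical, picked only to show the effect; they are not the real timings of any HD 5870 memory chip.

```python
def access_time_ns(latency_cycles, clock_mhz):
    """Convert a latency in clock cycles to nanoseconds at a given clock."""
    return latency_cycles / clock_mhz * 1000.0


# Hypothetical timing sets: tight timings at a lower clock vs. looser
# timings at a higher clock (not actual HD 5870 values).
tight_low = access_time_ns(latency_cycles=15, clock_mhz=1200)
loose_high = access_time_ns(latency_cycles=17, clock_mhz=1350)

print(f"15 cycles @ 1200MHz: {tight_low:.2f} ns")
print(f"17 cycles @ 1350MHz: {loose_high:.2f} ns")
```

With these made-up timings the two setups end up within a fraction of a nanosecond of each other, so a higher clock with looser timings does not automatically mean a faster access time.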

I've been tweaking so damn many computers with so damn many manufacturers' memory chips, and done tech support for so damn long, that I think I can argue latency against speed any time. We're still looking at exactly the same bottom line no matter what DRAM we speak of, and especially on a card/GPU system where the threading cannot be performed at similar speeds as on even older cards. :) :) :)
 
Joined
Nov 21, 2007
Messages
3,688 (0.61/day)
Location
Ohio
System Name Felix777
Processor Core i5-3570k@stock
Motherboard Biostar H61
Memory 8gb
Video Card(s) XFX RX 470
Storage WD 500GB BLK
Display(s) Acer p236h bd
Case Haf 912
Audio Device(s) onboard
Power Supply Rosewill CAPSTONE 450watt
Software Win 10 x64
ah i see grimeleven.

as for bo_fox, i have a lot of respect for you and your posts but your theoretical performance increases just aren't real world...as was proven by wolf and bobzilla on the previous page. did you look at those posts? they kept the same core speed and increased the memory speed from 1200 to 1350; each 50MHz only improved performance by 0.6-0.9%, for a grand total of ~3% performance increase after increasing the memory by 12.5%...that's not a bottleneck.

grimeleven's statement does make better sense though and i think there is some merit to it.
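
A quick sanity check on those numbers, as a minimal sketch: going from 1200 to 1350 is a 12.5% memory clock increase, and the ~3% performance figure is the approximate total from the posted results. Reading a ratio near 1.0 as "purely bandwidth-limited" is an assumption for illustration, not something measured here.

```python
def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


mem_gain = pct_increase(1200, 1350)     # memory clock increase in %
perf_gain = 3.0                         # approximate total from the posted results
scaling = perf_gain / mem_gain          # performance gained per 1% of memory clock

print(f"memory +{mem_gain:.1f}%, performance +{perf_gain:.1f}%, scaling {scaling:.2f}")
```

The ratio comes out far below 1.0, which fits the point that the memory clock alone isn't what is holding the card back.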
 