Does anyone know if games generally favor high frequency over low latency VRAM?
I'm asking because I modified the BIOS on my RX 580 to use the 1900MHz memory strap above 2000MHz. Before the BIOS mod I clocked the VRAM at 2400MHz; after the mod the max is about 2150MHz, but my 3DMark score increased. So I'm wondering if games also prefer low-latency VRAM, but I don't have any games with reliable benchmarks. Searching Google only turns up results about mining or regular system RAM. I read somewhere that graphics is all about bandwidth (MHz), but that was about integrated graphics, so I don't know if dedicated graphics also prefer high frequency.
I'm guessing it's probably a balance between frequency and timings, but I'd like to know.
It's useful to have both.
They simply come into play at different times (and the actual gain from either depends on the GPU architecture).
Short version:
Bandwidth is always needed, because creating pixels is VERY data-manipulation intensive and that data has to come from (and go to) somewhere.
If the GPU has the compute power, bandwidth is essential to keep it "fed" and operating at peak values.
Latency, on the other hand, helps the GPU use its power efficiently: the less time it takes to deliver new data, the sooner work can start on it.
But getting large amounts of data in the shortest time isn't easy.
That's why most newer GPUs "hide" latency behind complex instructions (for data already operated on) and large caches.
That gives the memory/IMC time to deliver the next batch for processing.
The GTX 1070 Ti vs. GTX 1080 is a great example of this.
The GDDR5 VRAM on the former has tighter latency, while the GDDR5X VRAM on the latter has higher bandwidth throughput.
Still, that's a Pascal example, while you own GCN.
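To put rough numbers on the bandwidth side: peak theoretical bandwidth is just the effective data rate times the bus width. A minimal sketch, assuming the usual RX 580 reference figures (2000MHz memory clock, 256-bit bus) and GDDR5's quad-pumped signaling:

```python
def gddr5_bandwidth_gb_s(mem_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak theoretical GDDR5 bandwidth: data rate per pin times bus width in bytes."""
    effective_rate_gbps = mem_clock_mhz * 4 / 1000  # GDDR5 transfers 4x per clock
    return effective_rate_gbps * bus_width_bits / 8  # bits -> bytes

# RX 580 reference: 2000 MHz memory clock, 256-bit bus
print(gddr5_bandwidth_gb_s(2000, 256))  # 256.0 GB/s
# A 2100 MHz overclock:
print(gddr5_bandwidth_gb_s(2100, 256))  # ~268.8 GB/s
```

This is the ceiling the timings then eat into; real sustained bandwidth is always lower.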
Forgot to mention my RX 580 is the 4GB version, so the default RAM frequency is only 1750MHz.
Kastriot: Why do you think 2000MHz is faster than 2150MHz? Do you mean I should find the lowest memory strap that's stable at 2GHz?
A high memory clock is always good; we want as much bandwidth as possible. But because of the architecture + IMC + messy/bad timings, the real bandwidth is lower. Keeping the memory at a high-but-not-too-high frequency, plus reduced communication/round-trip latencies, can be more effective in the real world and produce more performance (try OCLMemBench or AIDA64).

Polaris memory more or less always clocks around 2.1-2.2GHz. The GPU is designed for 2GHz; even the new RX 590 uses 2GHz (8Gbps) GDDR5, or maybe that's just cost saving, since the IMC at 12nm should support 9Gbps VRAM. Give the memory good timings and it will work great.

I don't like the "low latency vs. high frequency" framing; a high clock plus decent timings gives the best results most of the time. The real latency in nanoseconds for any memory operation depends on the individual timing (in clock cycles) but also on the clock cycle speed (frequency). You need to find the best combination for your GPU/memory chip. Just my two cents here.
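That last point, that real latency in nanoseconds depends on both the cycle count and the clock, is easy to sketch. The 20-cycle figure below is just an illustrative timing, not a real strap value:

```python
def timing_latency_ns(cycles: int, mem_clock_mhz: float) -> float:
    """Convert a memory timing in clock cycles to real latency in nanoseconds."""
    return cycles / mem_clock_mhz * 1000

# The same timing in cycles costs less real time at a higher memory clock:
print(timing_latency_ns(20, 2000))  # 10.0 ns
print(timing_latency_ns(20, 2150))  # ~9.3 ns
```

So raising the clock with timings held constant lowers real latency and raises bandwidth at the same time; it only gets worse if the higher strap loosens the timings by more than the clock gain.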
Overall, between the two, latency wins. To give you an idea, try this link for DDR memory (fill in the yellow boxes only) and pay particular attention to the eighth word (far-right column).
I just edited my last post. Latency is still important, as it is still an unwanted delay. Can someone please post the BIOS memory timings of a graphics card?
I think such a screenshot would be useful for this thread.
We don't have dividers/multipliers for memory clock on GPUs; we have the freedom to use any latency/clock combination we want. With CPUs you need to choose a frequency strap and then use certain timings that work, and some combinations are better than others.
What if we take DDR4-3600 @ CL16 and overclock the memory to, say, 3700MHz (if the IMC isn't holding us back)?
Also (testing the spreadsheet), a "fake" DDR4-8000 @ CL24 is faster (in both latency and bandwidth) than DDR4-7000 @ CL22, which is faster than DDR4-6500 @ CL21. I'm pretty sure GDDR5 behaves a little differently from DDR4 at high clocks, but the idea and results are probably the same, or at least comparable.
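A quick sketch of the math behind that spreadsheet comparison, using the standard first-word latency formula (CL cycles at the memory clock, which for DDR is half the transfer rate):

```python
def first_word_latency_ns(cl: int, transfer_rate_mts: int) -> float:
    """True CAS latency in ns: CL cycles at the memory clock (half the transfer rate)."""
    memory_clock_mhz = transfer_rate_mts / 2  # DDR: two transfers per clock
    return cl / memory_clock_mhz * 1000

for rate, cl in [(8000, 24), (7000, 22), (6500, 21)]:
    print(f"DDR4-{rate} CL{cl}: {first_word_latency_ns(cl, rate):.2f} ns")
# DDR4-8000 CL24: 6.00 ns  (lowest real latency AND highest bandwidth)
# DDR4-7000 CL22: 6.29 ns
# DDR4-6500 CL21: 6.46 ns
```

The higher-clocked kit wins both ways because the clock gain outpaces the looser CL, which is the same trade-off being debated for GDDR5 straps here.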
Thanks. It's not often you see this. My read is that to raise the frequency they loosened the timings. That's something I don't like as you get near the top-end frequency; i.e., I would have liked to see the timings at 1500MHz stay the same all the way to 2000MHz.
What it does show is that the controller is capable of tight timings. That's what I would adjust, because at higher resolutions you should get smoother gameplay (less stuttering).
EDIT: You need to start a new thread for standard DDRx comments.
https://www.overclock.net/forum/67-amd/1604567-polaris-bios-editing-rx5xx-rx4xx.html
Check the "Memory Overclock - Scaling - Errors monitoring" section.
The user -Loladinas- ran a test from 2000MHz to 2250MHz, stock vs. (my) UberMix v3.1 timings.
UberMix v3.1 is a mix of the 1500, 1625, and 2000 straps.
On Polaris the last strap is 2000MHz, so above that everything stays the same; there are no further steps.
This is what I found months ago searching online: https://docs.google.com/document/d/1CB8AtN0LhfR-kH0hi4pm6eMJfE3CNLLHB2bYt-nGpHI/edit
TRCDW = "Number of cycles from active to write"
TRCDWA = "Number of cycles from active to write with auto-precharge. Same as TRCDW"
TRCDR = "Number of cycles from active to read"
TRCDRA = "Number of cycles from active to read with auto-precharge. Same as TRCDR"
I have seen some games do better with low latency and some do better with more speed... so like many things, there's no 100% answer. One rule of thumb, and again by no means a 100% predictor, is this:
CAS × 1000 / DDR speed... lowest number wins
CAS 15 × 1000 / DDR-3000 = 5.0
CAS 16 × 1000 / DDR-3200 = 5.0
To determine it more accurately... you have to test every pairing.
The timings posted in this thread need proper investigation.
This is what's grabbing my eye:
at the lower end (slowest speed) the TRCDxx values are almost on the same clock cycle, but at the highest speed they're very far apart. Why? This needs to be solved first.
In my testing on normal DDR, I found that TRCD is partly responsible for micro-stutter. I have no idea if that applies to VRAM, but I'll take a wild guess and say yes, it does.
The timing spread is going to have an effect on performance, that's 100% certain; that's why it's there. But it's only generic, to cover all cards.
CAS on its own does not determine the overall performance of DDRx. The link I posted in this thread does not take into account TRCD and TRP.
When all three timings are on the "same clock cycle" (important), it has an effect on other timings, which will also be lower (faster). This is where the lower-latency DDRx pulls ahead.
Keep in mind that the v4/2150MHz timings are not "super optimized", while the v3/2100MHz timings are very (very) optimized for my card.
I'm probably not proving anything here, but what I can notice is that with 8.6Gbps memory and an "OK/kinda decent" set of timings I can score the same or a little higher than with a super-tight set of timings on slower-clocked memory.
I used the Superposition benchmark because I find it very consistent, and at 5K with high texture/low shader settings I hope to push the ROPs/memory very hard, using as much bandwidth as possible.
I can't go higher than 2150MHz/8.6Gbps because of the IMC... I see EDC errors at 8.7Gbps.
You have now posted an extended list of memory timings. I just read again what the OP is saying. He's stating he gets higher performance in 3DMark with a lower VRAM clock speed. If that's the case, then it clearly shows that lower-latency VRAM timings are the way to go.
AMD graphics cards incorporate a version of error detection and correction, and at the highest just-stable end you lose performance because of repeated memory requests, so sometimes faster is slower.
Different games definitely favour one or the other too; most games love lower latency, though, IMHO.
I had 480s with custom straps for mining that gamed really well with 268.5GB/s of bandwidth.
In the end you can increase that figure either way, or both, and that's what matters.
We still don't know if his 2400MHz memory clock is "stable" and whether there are EDC errors. We don't know anything about the specific memory brand he's using or the timings, and we have no 3DMark scores from him... I really still can't see any "proof of concept" here.
I'm really curious to do more tests and gather more data on this subject; we can't decide based on personal opinions.
Any idea how I can do some more in-depth tests?
Are we talking about latency in nanoseconds or in clock cycles?