
PCI-e 3.0 x8 may not have enough bandwidth for RX 5500 XT

Yes, AMD has a lot of them that ship with PCIe x8:
 
A couple of those should be tested in "an X570 environment" to see if they too behave the same way as the 5500 XT.

Depending on which slot on the board it's connected to, doesn't the board turn x16 into x8 when a certain number of devices are connected? IIRC, there are even instances where boards turn x16 into x4.

There are scenarios where x16-capable GPUs are running @ x8 not because of a BIOS selection but because of the number of devices hooked up to the system. With these in mind, such PCIe scaling tests ought to be performed, no? Isn't that the whole point of PCIe scaling tests to begin with?

Coming back to this, is there no way to convert, say, a PCIe 4.0 x1 link into PCIe 3.0 x2 on the fly? That would solve most if not all of these PCIe lane issues on high-end boards & cards.
 
A couple of those should be tested in "an X570 environment" to see if they too behave the same way as the 5500 XT.

Depending on which slot on the board it's connected to, doesn't the board turn x16 into x8 when a certain number of devices are connected? IIRC, there are even instances where boards turn x16 into x4.

There are scenarios where x16-capable GPUs are running @ x8 not because of a BIOS selection but because of the number of devices hooked up to the system. With these in mind, such PCIe scaling tests ought to be performed, no? Isn't that the whole point of PCIe scaling tests to begin with?
Some boards drop to x8 in the GPU slot, yes. But why the reindeer games when simply changing it in the BIOS for testing is possible... just like the testing did.

I don't think saturating the PCIe bus by playing a game and, say, transferring a massive file between M.2 NVMe drives at the same time is really common or worth testing. Did I misunderstand?
Coming back to this, is there no way to convert, say, a PCIe 4.0 x1 link into PCIe 3.0 x2 on the fly? That would solve most if not all of these PCIe lane issues on high-end boards & cards.
To what end? Are you talking about a 4.0 x1 slot??? What does that have to do with a GPU?
 
Some boards drop to x8 in the GPU slot, yes. But why the reindeer games when simply changing it in the BIOS for testing is possible... just like the testing did.

To what end? Are you talking about a 4.0 x1 slot??? What does that have to do with a GPU?
Except it didn't: check the methodology in the 1st and 3rd links in post #95.

It was done with the 2080 Ti but not the same way with the 5700 XT.

Question: would a regular x16-capable card work @ x8 (in bandwidth) in a PCIe 4.0 x4 slot (due to the number of connected devices) or @ x4?
 
Except it didn't: check the methodology in the 1st and 3rd links in post #95.

It was done with the 2080 Ti but not the same way with the 5700 XT.

Question: would a regular x16-capable card work @ x8 (in bandwidth) in a PCIe 4.0 x4 slot (due to the number of connected devices) or @ x4?
I was talking about the original testing in the thread. My fault...

As for your question: if the x16 physical slot is running at x4 electrical, it should work if it's an AMD card. IIRC, NVIDIA cards don't work under x8... at least with SLI.
 
To what end? Are you talking about a 4.0 x1 slot??? What does that have to do with a GPU?
No, I'm talking about (electrically) converting PCIe 4.0 lanes into twice as many equivalent PCIe 3.0 lanes on the fly, for instance on motherboards where the second (GPU) slot is limited to x8 or x4 when the first slot is occupied; this could be useful there. It would also be mightily handy for M.2 slots & other (expansion) cards, assuming it can be done in the first place.
 
There is no difference.
The board reducing the speed because you have another two devices connected is no different from switching the BIOS to a lower speed and/or gen.
But even if you switch to x4 on purpose (or because of multiple devices), it won't matter much except for people owning a 2080 Ti or a faster chip (Quadro etc.).
Those are the only times where I can saturate 3.0 @ x8 / 4.0 @ x4.

Slower cards will most of the time perform better (PCIe 3.0 x8 vs x16) as you have less overhead; it should be the same with 4.0 @ x4.

After using 3DMark 11 (X preset) as a benchmark to get some numbers, it seems that with PCIe 4.0 it won't matter much anymore compared to 3.0 (switching from x16 to x8), as I get virtually identical numbers no matter what I used (3.0 @ x8/x16 or 4.0 @ x8/x16), i.e. below 0.1% variance (14171 vs 14183).
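For reference, that difference works out to

$$\frac{14183 - 14171}{14171} \approx 0.00085 \approx 0.085\%,$$

which is well inside run-to-run noise for 3DMark 11.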
 
OK, so it's a 560/560D.
Baffin/Polaris 11/Polaris 21 have 8 PCIe lanes, just like Navi 14.

A couple of those should be tested in "an X570 environment" to see if they too behave the same way as the 5500 XT.

Depending on which slot on the board it's connected to, doesn't the board turn x16 into x8 when a certain number of devices are connected? IIRC, there are even instances where boards turn x16 into x4.

There are scenarios where x16-capable GPUs are running @ x8 not because of a BIOS selection but because of the number of devices hooked up to the system. With these in mind, such PCIe scaling tests ought to be performed, no? Isn't that the whole point of PCIe scaling tests to begin with?
Well, those cards will be shader-limited before they're memory-limited.
 
What does this mean? If you run out of memory, bad things will happen, period. It makes low FPS even lower.
I mean those cards (except the 5500 XT) will not have the power to process textures that require more than 4 GB of memory.
 
That would also depend on the game.
I can play something from 10 years ago (it won't cost much FPS-wise), but still use a higher resolution like 1080p/1440p or even 4K, which requires more VRAM.
 
Question: would a regular x16-capable card work @ x8 (in bandwidth) in a PCIe 4.0 x4 slot (due to the number of connected devices) or @ x4?
I totally don't get the "(in bandwidth)" part of the question.

The slot has 4 lanes -> the resulting link cannot have more than 4 lanes.

No, I'm talking about (electrically) converting PCIe 4.0 lanes into twice as many equivalent PCIe 3.0 lanes on the fly (...)
I don't recall any hardware (at least in the consumer segment) capable of changing lane widths after POST is done.
Doing such a thing may require the PCIe controller to reinitialize - that would mean all the devices connected to it would have to be disabled and temporarily soft-disconnected.
Let's not forget that reallocating lanes somewhere else would require physically re-routing the traces - and that is a hardware cost.
 
PCIe bandwidth IS important in VRAM-limited scenarios.

Funny how literally no one had a clue about any of this up until a week ago, but now it has become very important? No, it hasn't really; it's a monumental waste of time to worry about PCIe bandwidth, the VRAM is what matters.

What do you use after VRAM is full? RAM.

This is incorrect: GPUs never use system RAM, it's simply too slow. We aren't talking a few FPS lost here and there; no, frames would take seconds to be processed.

The games simply swap memory in and out of the card's memory; it never gets accessed directly across the PCIe lanes.
 
Funny how literally no one had a clue about any of this up until a week ago, but now it has become very important? No, it hasn't really; it's a monumental waste of time to worry about PCIe bandwidth, the VRAM is what matters.

This is incorrect: GPUs never use system RAM, it's simply too slow. We aren't talking a few FPS lost here and there; no, frames would take seconds to be processed.

The games simply swap memory in and out of the card's memory; it never gets accessed directly across the PCIe lanes.
Of course VRAM is the primary issue. But we've seen testing that shows the bus helps in VRAM-limited situations. If the data were transferring internally and not using PCIe and RAM, we wouldn't see these performance increases in VRAM-limited situations.

If it isn't using system RAM and the PCIe bus in some capacity, why does the testing show notable increases when using the bus with more bandwidth (where it didn't before even with a 5700 XT)?

It's also worth noting that integrated GPUs use system RAM, and AMD's does a decent job at 1080p gaming with it.
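As a rough, hypothetical illustration of why the bus would show up in VRAM-limited cases: assume a game overspills a 4 GB card by 1 GB and that overspill has to be re-streamed from RAM. With roughly 7.9 GB/s for PCIe 3.0 x8 and roughly 15.8 GB/s for 3.0 x16 / 4.0 x8:

$$t_{3.0\,\times 8} \approx \frac{1\ \text{GB}}{7.9\ \text{GB/s}} \approx 127\ \text{ms}, \qquad t_{4.0\,\times 8} \approx \frac{1\ \text{GB}}{15.8\ \text{GB/s}} \approx 63\ \text{ms}$$

If transfers like that recur every few frames, halving them shows up directly in the frame times.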
 
Inb4 people forget the GT 1030 suffers the same problem.
The 1030 is a turd for gaming.
It's a good multi-monitor card though, and for that the PCIe lanes don't matter.
 
Funny how literally no one had a clue about any of this up until a week ago, but now it has become very important? No, it hasn't really; it's a monumental waste of time to worry about PCIe bandwidth, the VRAM is what matters.

This is incorrect: GPUs never use system RAM, it's simply too slow. We aren't talking a few FPS lost here and there; no, frames would take seconds to be processed.

The games simply swap memory in and out of the card's memory; it never gets accessed directly across the PCIe lanes.
Check what the shared GPU RAM is; you can look at it in Task Manager or GPU-Z. Any GPU uses system RAM as if it were an IGP, as swap. Always.
Users didn't know it; that doesn't make it a lie, especially with the evidence provided.
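For what it's worth, a discrete GPU reading system RAM across the bus is a documented, explicit mechanism in at least one API: CUDA's zero-copy mapped pinned memory. A minimal sketch (illustrative only; the automatic "shared GPU memory" swapping done by WDDM drivers is a separate mechanism):

Code:
#include <cstdio>
#include <cuda_runtime.h>

// The kernel dereferences a device pointer that actually maps to
// pinned system RAM, so every read crosses the PCIe bus.
__global__ void touch(const int* p, int n, int* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, p[i]);
}

int main() {
    cudaSetDeviceFlags(cudaDeviceMapHost);  // allow zero-copy mapping

    const int n = 1024;
    int *h_buf, *d_view, *d_out;

    // Pinned host (system RAM) allocation that the GPU may map.
    cudaHostAlloc((void**)&h_buf, n * sizeof(int), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_buf[i] = 1;

    // Device-side alias of the same system-RAM pages.
    cudaHostGetDevicePointer((void**)&d_view, h_buf, 0);

    cudaMalloc((void**)&d_out, sizeof(int));
    cudaMemset(d_out, 0, sizeof(int));

    touch<<<(n + 255) / 256, 256>>>(d_view, n, d_out);

    int sum = 0;
    cudaMemcpy(&sum, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d\n", sum);  // 1024, read straight from system RAM

    cudaFreeHost(h_buf); cudaFree(d_out);
    return 0;
}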
 
I would not like to get a $170 card and find out it's massively limited at 1080p while a $160 1650 Super does just fine.
Not only is this card super late and badly priced, it's going to give you an unplayable experience at 1080p now, let alone in the future.
Like I was saying, there should be one 5500 XT with 192-bit 6 GB memory at $180 and it'd be a very strong contender to the lower Super lineup, but AMD proved themselves incompetent again. And that's not even touching their driver issues. :fear:

[chart: Assassin's Creed Odyssey, 1920x1080 benchmark]
 
Any GPU uses system RAM as if it were an IGP, as swap. Always.

That memory is for swapping the contents of the GPU. There is no lie; you cannot access the memory contents of a GPU without transferring the data first, it's simply not possible. No API allows for this; you can't get something like a pointer to VRAM from the host side of things (CPU and RAM). Be it a DirectX shader or a CUDA compute kernel, they all operate on buffers that have been transferred into VRAM first and are never accessed across the system bus. That would be unimaginably slow.

Say you need to go through 1 GB of data 1000 times. You have two options: get that data into VRAM first and access it 1000 times at VRAM latency, or access it 1000 times across the PCIe connection from system memory at many times the VRAM latency. Which option do you think is faster?
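A minimal CUDA sketch of that staging pattern (buffer size, kernel, and names are mine, purely illustrative): the one copy crosses PCIe, the 1000 passes never do.

Code:
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: reduces a buffer that already resides in VRAM.
__global__ void sum_pass(const float* data, size_t n, float* out) {
    float acc = 0.0f;
    // Grid-stride loop so one launch covers the whole buffer.
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += (size_t)gridDim.x * blockDim.x)
        acc += data[i];
    atomicAdd(out, acc);
}

int main() {
    const size_t n = 1ull << 28;            // 2^28 floats = 1 GiB
    const size_t bytes = n * sizeof(float);

    float* h_data = (float*)malloc(bytes);  // buffer in system RAM
    for (size_t i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data, *d_out;
    cudaMalloc((void**)&d_data, bytes);     // allocation in VRAM
    cudaMalloc((void**)&d_out, sizeof(float));
    cudaMemset(d_out, 0, sizeof(float));

    // The ONE copy that crosses PCIe; its duration scales with
    // link width/generation (x8 vs x16, 3.0 vs 4.0).
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    // All 1000 passes then run at VRAM speed, never touching the bus.
    for (int pass = 0; pass < 1000; ++pass)
        sum_pass<<<256, 256>>>(d_data, n, d_out);

    float result;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", result);

    cudaFree(d_data); cudaFree(d_out); free(h_data);
    return 0;
}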

Any GPU uses system RAM as if it were an IGP, as swap. Always.

Definitely not always, only when it's needed.

If it isn't using system RAM and the PCIe bus in some capacity

It's using RAM in the sense that that's where all the buffers are kept; the buffers, however, are never accessed directly by the GPU, they have to be in its VRAM before that's possible.

(where it didn't before even with a 5700 XT)?

The 5700 XT has more memory and it also has more PCIe lanes. Everything else works in the same way.
 
I totally don't get the "(in bandwidth)" part of the question.

The slot has 4 lanes -> the resulting link cannot have more than 4 lanes.

I don't recall any hardware (at least in the consumer segment) capable of changing lane widths after POST is done.
Doing such a thing may require the PCIe controller to reinitialize - that would mean all the devices connected to it would have to be disabled and temporarily soft-disconnected.
Let's not forget that reallocating lanes somewhere else would require physically re-routing the traces - and that is a hardware cost.
This was true up to PCIe 3.0, but what about PCIe 4.0 slots? Does an x4 card (electrically and/or functioning @ x4 for whatever other reason) work @ x4 or @ x8 in a PCIe 4.0 slot?

This whole issue stems from the fact that PCIe 4.0 x8 has the bandwidth of PCIe 3.0 x16: does the same apply to x4?
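In raw bandwidth terms the answer is yes; per lane, PCIe 3.0 runs at 8 GT/s and 4.0 at 16 GT/s, both with 128b/130b encoding:

$$\text{3.0/lane} = \frac{8\ \text{GT/s} \times \frac{128}{130}}{8} \approx 0.985\ \text{GB/s}, \qquad \text{4.0/lane} \approx 1.97\ \text{GB/s}$$

$$\Rightarrow\ \text{4.0 x4} \approx 4 \times 1.97 \approx 7.9\ \text{GB/s} \approx \text{3.0 x8}$$

(The link still trains at the slot's physical/electrical width, though: a 4.0 x4 link is still four lanes, just twice as fast per lane.)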
 
I would not like to get a $170 card and find out it's massively limited at 1080p while a $160 1650 Super does just fine.
Not only is this card super late and badly priced, it's going to give you an unplayable experience at 1080p now, let alone in the future.
Like I was saying, there should be one 5500 XT with 192-bit 6 GB memory at $180 and it'd be a very strong contender to the lower Super lineup, but AMD proved themselves incompetent again. And that's not even touching their driver issues. :fear:

*cough* 8 GiB
[chart: Assassin's Creed Odyssey, 1920x1080 benchmark]


The question is thus: Why do GeForce cards perform so much better with 4 GiB compared to AMD cards? Maybe it is because of x16 versus x8 (faster RAM access).
 
Because AMD is letting AIBs clear inventory of older stock (especially RX 590, RX 580, and RX 570). The 5500 cards (all of them) will come down in price after they're done.
 
That memory is for swapping the contents of the GPU. There is no lie; you cannot access the memory contents of a GPU without transferring the data first, it's simply not possible. No API allows for this; you can't get something like a pointer to VRAM from the host side of things (CPU and RAM). Be it a DirectX shader or a CUDA compute kernel, they all operate on buffers that have been transferred into VRAM first and are never accessed across the system bus. That would be unimaginably slow.

Say you need to go through 1 GB of data 1000 times. You have two options: get that data into VRAM first and access it 1000 times at VRAM latency, or access it 1000 times across the PCIe connection from system memory at many times the VRAM latency. Which option do you think is faster?

Definitely not always, only when it's needed.

It's using RAM in the sense that that's where all the buffers are kept; the buffers, however, are never accessed directly by the GPU, they have to be in its VRAM before that's possible.

The 5700 XT has more memory and it also has more PCIe lanes. Everything else works in the same way.
You just confirmed it yourself: after VRAM is full, PCIe is the next bottleneck.
 