
How... a single PCIe slot running multiple GPUs

While browsing through Instagram, I came across a company offering a computer built on an AMD X670E motherboard. This particular board has only one PCIe 4.0 x16 slot, which limits the number of GPUs that can be installed. However, the vendor has found a way around this limitation by running up to 5 Nvidia A-series GPUs from that single slot.

The GPUs are connected to the motherboard via individual riser cables, which plug into a specialized type of PCIe card that accepts all of them. This setup lets the vendor run all the cards from a single x16 slot as if each had its own slot.

This is a significant advantage, because everyday AMD chips cannot accommodate many GPUs with only a single PCIe 4.0 x16 slot to work with. If a second GPU slot is populated, the sixteen lanes are split and both slots drop to a dual 4.0 x8 configuration. To run multiple GPUs at a full 4.0 x16 each, Xeon/Threadripper/Epyc platforms are normally required.

I am intrigued by this setup and curious about the type of "PCIe card" used to connect multiple GPUs to a single slot while still getting full access to, and full results from, each GPU. As an end user, this is an opportunity to get the most out of a PC without having to invest in processors costing over $2,000. With AMD's best desktop processor available at around $800, multiple GPUs could be used to achieve good results, even on an ITX motherboard with only a single PCIe slot.

How does this work?

I really appreciate any help you can provide.
 
Some motherboards used to have PCIe switches to enable crossfire/SLI when the CPU didn't have enough native PCIe lanes to support two x16 lane cards. Could this be a similar idea?
 
PCIe switches do that. One (quite extreme) example is a card that you can fit 21 SSDs on; each SSD uses 4 lanes. On that card there's a switch that can handle 100 lanes in total, 84 go to the SSDs and 16 are connected to the PCIe slot. The same thing can be done with graphics cards.
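
To sanity-check the lane math on a card like that, here's a rough sketch in Python (the lane counts are the ones from the example above; everything else is just illustration):

Code:
# Lane-budget check for a hypothetical PCIe switch card: a 100-lane switch,
# an x16 uplink to the motherboard slot, and 21 SSDs at x4 each downstream.
SWITCH_LANES = 100     # total lanes the switch silicon provides
UPLINK_LANES = 16      # lanes wired to the PCIe slot
SSD_COUNT = 21
LANES_PER_SSD = 4

downstream = SSD_COUNT * LANES_PER_SSD           # 84 lanes to the SSDs
used = downstream + UPLINK_LANES                 # 100 lanes in total
assert used <= SWITCH_LANES, "lane budget exceeded"
print(f"{downstream} downstream + {UPLINK_LANES} uplink = {used} of {SWITCH_LANES} lanes")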

 
Some motherboards used to have PCIe switches to enable crossfire/SLI when the CPU didn't have enough native PCIe lanes to support two x16 lane cards. Could this be a similar idea?
Thank you for the response. I am not sure.
PCIe switches do that. One (quite extreme) example is a card that you can fit 21 SSDs on; each SSD uses 4 lanes. On that card there's a switch that can handle 100 lanes in total, 84 go to the SSDs and 16 are connected to the PCIe slot. The same thing can be done with graphics cards.

I understand your point, and I agree with the sentiment. It is regrettable that the card in question carries such a steep price despite shipping with no drives included. I intend to add two Asus Hyper M.2 X16 Gen 4 cards to my system, each housing SSDs. This configuration will give me the benefits of RAID and personal storage space for the foreseeable future.
 
Thank you for the response. I am not sure.

I understand your point, and I agree with the sentiment. It is regrettable that the card in question carries such a steep price despite shipping with no drives included. I intend to add two Asus Hyper M.2 X16 Gen 4 cards to my system, each housing SSDs. This configuration will give me the benefits of RAID and personal storage space for the foreseeable future.

Cards such as this do not include a PCIe switch; they are basically nothing but wires. Their operation depends entirely on the bifurcation (lane-splitting) ability of the processor, and on BIOS support for it.

If you have a Ryzen 7000 CPU, it can split its PCIe x16 bus into x8 + x8. This means it can connect to two graphics cards, or two SSDs on an adapter card, but not more. Further splitting into x8 + x4 + x4 or even x4 + x4 + x4 + x4 is questionable and poorly documented, but other people here may be able to tell you about specific cases when it works.

Two adapter cards can't possibly make 8 SSDs work in a Core or Ryzen system. There are simply too few lanes available in the system.

Here's an example of an adapter card that does have a PCIe switch (an ASMedia chip) on it. But its slot connector is only PCIe 3.0 x8, and the M.2 connectors are PCIe 3.0 x4 each. Yet it's costly; the price I see in Germany is 111 EUR.
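
To make the bifurcation idea a bit more concrete, here's a small illustrative Python sketch; the splits listed are just the ones discussed in this thread, and actual support depends on the specific CPU and BIOS:

Code:
# Possible ways a CPU might bifurcate its x16 slot for a passive (switchless)
# adapter. A passive adapter can never provide more than the slot's 16 lanes.
SPLITS = {
    "x8+x8":       [8, 8],         # two GPUs, or two x8 devices
    "x8+x4+x4":    [8, 4, 4],      # support is spotty and poorly documented
    "x4+x4+x4+x4": [4, 4, 4, 4],   # four M.2 SSDs on a quad adapter card
}

for name, lanes in SPLITS.items():
    assert sum(lanes) == 16, "cannot exceed the slot's 16 lanes"
    print(f"{name}: {len(lanes)} devices, lanes per device: {lanes}")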
 
Cards such as this do not include a PCIe switch; they are basically nothing but wires. Their operation depends entirely on the bifurcation (lane-splitting) ability of the processor, and on BIOS support for it.

If you have a Ryzen 7000 CPU, it can split its PCIe x16 bus into x8 + x8. This means it can connect to two graphics cards, or two SSDs on an adapter card, but not more. Further splitting into x8 + x4 + x4 or even x4 + x4 + x4 + x4 is questionable and poorly documented, but other people here may be able to tell you about specific cases when it works.

Two adapter cards can't possibly make 8 SSDs work in a Core or Ryzen system. There are simply too few lanes available in the system.

Here's an example of an adapter card that does have a PCIe switch (an ASMedia chip) on it. But its slot connector is only PCIe 3.0 x8, and the M.2 connectors are PCIe 3.0 x4 each. Yet it's costly; the price I see in Germany is 111 EUR.
Thank you. And I agree with the information you provided.

Here is a picture of the actual setup. I do understand how the PCIe slot is chopped up, but something mathematically is not adding up. How do you technically pull 4.0 x80 worth of connectivity out of a single 4.0 x16 slot and chop the 80 across 5 cards? I will admit, I like what this person is doing and how he uses this smaller processor to his advantage. It keeps you from needing a big workstation CPU and brings bang for the buck into the serious playing field.

[Images: photos of the actual 5-GPU setup]
 
Thank you. And I agree with the information you provided.

Here is a picture of the actual setup. I do understand how the PCIe slot is chopped up, but something mathematically is not adding up. How do you technically pull 4.0 x80 worth of connectivity out of a single 4.0 x16 slot and chop the 80 across 5 cards? I will admit, I like what this person is doing and how he uses this smaller processor to his advantage. It keeps you from needing a big workstation CPU and brings bang for the buck into the serious playing field.
The CPU can only communicate with one GPU at any given time. Well, maybe it can communicate with two GPUs at half speed because it can split its 16 lanes into 8 + 8, as I've mentioned before, but this doesn't make a difference. The total bandwidth is 32 GB/s (PCIe 4.0 x16), so every GPU gets 1/5 of that on average. A PCIe switch can't magically improve the bandwidth. But this "low" bandwidth may not be a limiting factor at all; it depends on the application, of course.
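
As a rough back-of-the-envelope version of that (using an approximate ~2 GB/s per PCIe 4.0 lane and assuming perfectly even sharing, which real traffic never is):

Code:
# Average bandwidth per GPU when 5 GPUs sit behind one PCIe 4.0 x16 uplink.
PCIE4_GBPS_PER_LANE = 2.0    # approx. GB/s per lane, one direction
UPLINK_LANES = 16
GPU_COUNT = 5

uplink_bw = PCIE4_GBPS_PER_LANE * UPLINK_LANES    # ~32 GB/s
per_gpu_avg = uplink_bw / GPU_COUNT               # ~6.4 GB/s on average
print(f"Uplink: {uplink_bw:.0f} GB/s, average per GPU: {per_gpu_avg:.1f} GB/s")
print(f"Roughly equivalent to PCIe 4.0 x{per_gpu_avg / PCIE4_GBPS_PER_LANE:.0f} per GPU")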
 
The CPU can only communicate with one GPU at any given time. Well, maybe it can communicate with two GPUs at half speed because it can split its 16 lanes into 8 + 8, as I've mentioned before, but this doesn't make a difference. The total bandwidth is 32 GB/s (PCIe 4.0 x16), so every GPU gets 1/5 of that on average. A PCIe switch can't magically improve the bandwidth. But this "low" bandwidth may not be a limiting factor at all; it depends on the application, of course.
Thank you - I agree with you. If this person is using this kind of power for rendering/workstation loads, maybe the software is somehow able to tap into all that power. Right now, this has my mind blown. I reached out to the engineers at PNY just to see how this works, and I have not heard back yet.

In my mind, you take an ITX board, place it into an M-ATX case, and insert 4 GPUs to get access to some serious power!
 
There are adapters that connect four GPUs to one x16 slot via USB 3 cables. I've had 6 to 12 GPUs in PCs; the most I ran in games was quadfire Polaris plus two 460s for PhysX, though that PC nearly caught fire.
 
There are adapters that connect four GPUs to one x16 slot via USB 3 cables. I've had 6 to 12 GPUs in PCs; the most I ran in games was quadfire Polaris plus two 460s for PhysX, though that PC nearly caught fire.

Thanks

How does this work? You're saying the cards connect from the PCIe connector on the GPU (4.0 x16), the cable converts over to USB 3, and then the PCIe riser card on the motherboard accepts the converted USB 3 connection?

Do you know what kind of card that is? You have had 6 to 12 GPUs connected in this manner?
 
Thanks

How does this work? You're saying the cards connect from the PCIe connector on the GPU (4.0 x16), the cable converts over to USB 3, and then the PCIe riser card on the motherboard accepts the converted USB 3 connection?

Do you know what kind of card that is? You have had 6 to 12 GPUs connected in this manner?
That version was only for mining, due to the low bandwidth. The quadfire-plus setup was done with Bulldozer on a Crosshair V, I think; it's dead now. I have also had a mining board that had twelve PCIe x1 slots.

Look on Alibaba, eBay and Amazon for such adapters.
 
That version was only for mining, due to the low bandwidth. The quadfire-plus setup was done with Bulldozer on a Crosshair V, I think; it's dead now. I have also had a mining board that had twelve PCIe x1 slots.

Look on Alibaba, eBay and Amazon for such adapters.
Thank you so much!
 
Several generations of both AMD and Intel processors support bifurcating the x16 slot into x4/x4/x4/x4 if you use something like that Asus M.2 adapter card. Some generations are limited to a x8/x4/x4 split.

What you need to get beyond that is a pcie switch. This will allow multiple x16 devices to share one x16 connection (for example). But they are only able to communicate one at a time, meaning this is useful in situations where throughput is not the highest priority.

With the same 100-lane switch Sabrent uses in that m.2 card you can make a 5-to-1 x16 slot adapter to have 5 GPUs connected to one x16 slot on the motherboard. With 4 lanes left unused. Which is probably what those 5 GPU images above are referring to. Apparently that 100-lane switch is rather expensive though, so you won't find any such adapters online for $50.

No idea what the limitations on number of devices on one pcie bus are. But in theory you could go nuts and make a x16 to 84 x1 slots with that switch.
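
Here's a toy model of that sharing, just to illustrate the idea (it assumes the switch splits its one uplink proportionally among whoever is asking for bandwidth; the numbers are illustrative):

Code:
# Toy model: N devices behind a PCIe switch share one uplink. If total demand
# exceeds the uplink, each device only gets a proportional share.
def effective_per_device(uplink_gbps: float, demands_gbps: list[float]) -> list[float]:
    total_demand = sum(demands_gbps)
    if total_demand <= uplink_gbps:
        return demands_gbps                    # uplink is not the bottleneck
    scale = uplink_gbps / total_demand         # proportional fair share
    return [d * scale for d in demands_gbps]

# 5 GPUs each wanting a full PCIe 4.0 x16 (~32 GB/s) behind one x16 uplink:
print(effective_per_device(32.0, [32.0] * 5))  # -> roughly 6.4 GB/s each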
 
Several generations of both AMD and Intel processors support bifurcating the x16 slot into x4/x4/x4/x4 if you use something like that Asus M.2 adapter card. Some generations are limited to a x8/x4/x4 split.

What you need to get beyond that is a pcie switch. This will allow multiple x16 devices to share one x16 connection (for example). But they are only able to communicate one at a time, meaning this is useful in situations where throughput is not the highest priority.

With the same 100-lane switch Sabrent uses in that m.2 card you can make a 5-to-1 x16 slot adapter to have 5 GPUs connected to one x16 slot on the motherboard. With 4 lanes left unused. Which is probably what those 5 GPU images above are referring to. Apparently that 100-lane switch is rather expensive though, so you won't find any such adapters online for $50.

No idea what the limitations on number of devices on one pcie bus are. But in theory you could go nuts and make a x16 to 84 x1 slots with that switch.

So, are you saying there is no technical way to step on the gas of all 5 GPUs simultaneously and push it all through the adapter?
Also, I will have to check on the adapters. These days, it's hard to tell who is making quality products. I would look for brands like Supermicro, because that is all I know.

Thanks
 
So, are you saying there is no technical way to step on the gas of all 5 GPUs simultaneously and push it all through the adapter?
Also, I will have to check on the adapters. These days, it's hard to tell who is making quality products. I would look for brands like Supermicro, because that is all I know.

Thanks
You are trying to drive 5 cars side by side at 100 kph* through a one-lane tunnel while still going 100 kph. You will have 4 bad crashes. These 5 cars have to stop and wait to go through one at a time to get through safely (technically, only 4 of them have to stop).

There is no physical way to run 5 x16 slots at full speed through one x16 slot. But a switch will let you do the second version above.

*Just a random number picked for simplicity
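
If it helps, here's the same analogy as a toy round-robin schedule in Python (purely illustrative; a real switch arbitrates per packet, not in fixed turns):

Code:
# Five GPUs take turns on the single uplink. Each burst runs at full speed,
# but over time each GPU only gets 1/5 of the slices, hence 1/5 the bandwidth.
from itertools import cycle

gpus = ["GPU0", "GPU1", "GPU2", "GPU3", "GPU4"]
uplink_gbps = 32.0                 # ~PCIe 4.0 x16, one direction

turns = cycle(gpus)
slices_used = {g: 0 for g in gpus}
for _ in range(100):               # 100 equal time slices
    slices_used[next(turns)] += 1

for g in gpus:
    share = slices_used[g] / 100
    print(f"{g}: {share:.0%} of the time -> {share * uplink_gbps:.1f} GB/s average")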
 
You are trying to drive 5 cars side by side at 100 kph* through a one-lane tunnel while still going 100 kph. You will have 4 bad crashes. These 5 cars have to stop and wait to go through one at a time to get through safely (technically, only 4 of them have to stop).

There is no physical way to run 5 x16 slots at full speed through one x16 slot. But a switch will let you do the second version above.

*Just a random number picked for simplicity
I do understand. I will need to research the 100-lane switch.
 
Several generations of both AMD and Intel processors support bifurcating the x16 slot into x4/x4/x4/x4 if you use something like that Asus M.2 adapter card. Some generations are limited to a x8/x4/x4 split.
It seems that motherboard makers were able to unlock these abilities, and Intel and AMD didn't care enough to stop them.

Officially, according to Ark, Intel CPUs could do 8+4+4 before Alder Lake and 8+8 afterwards. AMD CPUs can do 8+8.

@HBSound
This is a separate issue that doesn't really belong in this thread, but: why do you need RAID? Capacity? Speed? Resilience against failure? You don't automatically get any of these, and as general advice I'd say don't mess with it unless you know for certain you need it.
 
@HBSound
This is a separate issue that doesn't really belong in this thread, but: why do you need RAID? Capacity? Speed? Resilience against failure? You don't automatically get any of these, and as general advice I'd say don't mess with it unless you know for certain you need it.

Someone else brought up RAID when speaking about the multi-SSD card above. They used it as an example.
My whole purpose is to discover how someone can use 5 GPUs in a SINGLE PCI slot.
 
I do understand. I will need to research the 100-lane switch.
It is probably a custom ASIC or FPGA with a lot of SerDes, used in bridge mode (packet switch) with custom software to control it and address the data to the cards. Their design uses a riser cable that connects to the motherboard and then to their backplane, which carries the ASIC bridge IC and the 5 PCIe slots. I will also note that the motherboard has PCIe Gen 5, which is double the rate of the Gen 4 that the video cards use, so it gives them more bandwidth to allocate (time-sliced, like Ethernet) to the 5 cards on the backplane.

Here is a link to a TI 4-channel PCIe switch chip that I used in the past; it does the same thing, but at Gen 1.1 with x1 links. It should give you a better idea of how they are doing it.

TI XIO3130 4 Channel Bridge Chip (Packet Switch)
TI XIO3130 Users Guide
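
If the uplink really is PCIe 5.0 x16 while the five cards run Gen 4 x16 each, the oversubscription works out roughly like this (per-lane rates are approximate, and this is only a sketch of the idea):

Code:
# Rough oversubscription estimate: a Gen 5 x16 uplink feeding five Gen 4 x16
# GPUs through a packet switch.
GEN4_GBPS_PER_LANE = 2.0    # approx. GB/s per lane, one direction
GEN5_GBPS_PER_LANE = 4.0    # approx. GB/s per lane, one direction

uplink = GEN5_GBPS_PER_LANE * 16            # ~64 GB/s up to the CPU
downstream = 5 * GEN4_GBPS_PER_LANE * 16    # ~160 GB/s if all 5 GPUs burst at once

print(f"Uplink: {uplink:.0f} GB/s, aggregate downstream demand: {downstream:.0f} GB/s")
print(f"Oversubscription: {downstream / uplink:.1f}:1")           # ~2.5:1
print(f"Average per GPU if all are busy: {uplink / 5:.1f} GB/s")  # ~12.8 GB/s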
 
It is probably a custom ASIC or FPGA with a lot of SerDes, used in bridge mode (packet switch) with custom software to control it and address the data to the cards.
Microchip and Broadcom make these switches. The chips, that is, not retail products. Those by Microchip have up to 100 PCIe 4.0 lanes; they make 5.0 switches too, but I haven't checked the details.
 
Microchip and Broadcom make these switches. The chips, that is, not retail products. Those by Microchip have up to 100 PCIe 4.0 lanes; they make 5.0 switches too, but I haven't checked the details.
I just looked at Microchip's site, and they do have a PCIe Gen 5 packet switch that is large enough to handle what they are doing. Broadcom bought PLX a few years back and has not done much with them, which is a shame, since they were one of the more advanced PCI/PCIe bridge and switch manufacturers.
 
I just looked at Microchip's site, and they do have a PCIe Gen 5 packet switch that is large enough to handle what they are doing. Broadcom bought PLX a few years back and has not done much with them, which is a shame, since they were one of the more advanced PCI/PCIe bridge and switch manufacturers.
Is this pointing in the right direction? https://www.microchip.com/en-us/product/PM50100

Thank you so much!
 
Is this pointing in the right direction? https://www.microchip.com/en-us/product/PM50100

Thank you so much!
Yes. Wow. I'm wondering if there's any use for this at all at the moment. One application I can think of is large enterprise network switches with many 112-gigabit SFP ports. But those probably have specialised chips inside; SFP lanes are not the same as PCIe lanes and would need some kind of conversion. Apart from that, there's the Nvidia H100 accelerator with a PCIe 5.0 x16 interface. But not much more.
 
Yes. Wow. I'm wondering if there's any use for this at all at the moment. One application I can think of is large enterprise network switches with many 112-gigabit SFP ports. But those probably have specialised chips inside; SFP lanes are not the same as PCIe lanes and would need some kind of conversion. Apart from that, there's the Nvidia H100 accelerator with a PCIe 5.0 x16 interface. But not much more.
I got you.

I will go to Microchip and get some direction. Maybe a PCIe card is a little on the simpler side? And how do you take advantage of all that kind of technology?

The image above leads with an A6000 and 4 A4500 PCIe cards. I know the A6000 fills up a complete 4.0 x16 slot when it comes to speed and performance, but I cannot say the same for the A4500. I know the A4500 needs an x16 slot, but when it comes to actual performance, I do not know if the card brings that kind of power.
 
Someone else brought up RAID when speaking about the multi-SSD card above.
Ah, that guy @Wirko again.

But no, that monster card is not a RAID controller. At least it's not advertised as one; the product page only mentions "Software or OS RAID Supported DataProtection", which is not hardware RAID. But that was not the point here. The point is that the card allows many 4-lane M.2 PCIe devices to be connected to one 16-lane PCIe slot. Devices include these:

[Images: examples of M.2 PCIe devices; the topmost is a GPU]
These examples have an M.2 connector, but it is equivalent to a PCIe x4 slot connector (or a longer x8 or x16 connector with only 4 lanes used).
My whole purpose is to discover how someone can use 5 GPUs in a SINGLE PCI slot.
That would be feasible (hey, the topmost example is a GPU!) but it would also be terribly expensive. Total bandwidth would be limited to where the bottleneck lies, which means the PCIe x16 slot.
 