This will not work the way you're expecting it to. You can't just expect the VRAM to add up, and here is why. Currently GPUs use alternate frame rendering: each card draws a complete frame of the same scene, and they take turns. Because every frame covers the whole scene, each GPU needs access to all the same assets. SLI and Crossfire use their bridges (or XDMA in the case of AMD's bridgeless Crossfire) to transfer framebuffers back and forth, either for final output or to supply extra pixel info for temporal shading effects. There is no feasible way, from an engineering standpoint, for GPUs to cross-access each other's memory. Essentially you would have to build a bridge that merges both memory buses and handles the handshaking between them, and the cost/benefit ratio and diminishing returns from the added overhead make it impractical.
For this reason, data is duplicated instead of cross-accessed.
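Just to illustrate the point (this is my own conceptual sketch, not a real driver API; renderWholeScene and presentInOrder are made-up placeholders), alternate frame rendering basically amounts to this, which is why neither GPU can get away with holding only half the assets:

```cpp
#include <cstdint>

// Conceptual AFR loop: whole frames alternate between GPUs.
for (uint64_t frame = 0; ; ++frame) {
    const int gpu = static_cast<int>(frame % 2); // even frames -> GPU 0, odd frames -> GPU 1
    renderWholeScene(gpu, frame);  // hypothetical: draws the *entire* scene for this frame,
                                   // so this GPU needs every texture, mesh and shader resident
    presentInOrder(frame);         // hypothetical: frames are paced back into display order
}
```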
Now, Mantle (and DX12) lets the developer explicitly control what gets loaded into each GPU's memory independently; however, each GPU is still limited to rendering what it has the assets for. From an alternate frame rendering standpoint that isn't very useful: both GPUs are still rendering the same scene, so they still need all the same assets.
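To make "explicit control" concrete, here's a rough sketch of what it looks like in D3D12's linked-adapter model: resources carry node masks that say which GPU the memory physically lives on and which GPUs can see it. The node-mask fields and the CreateCommittedResource call are real D3D12 API, but the surrounding setup (device creation, the texture description, error handling) is omitted and the function itself is just my example, not anything from a shipping engine.

```cpp
#include <windows.h>
#include <d3d12.h>

// Allocate a texture in the VRAM of one specific GPU node and make it
// visible only to that node.
void CreateTextureOnNode(ID3D12Device* device, const D3D12_RESOURCE_DESC& texDesc,
                         UINT nodeIndex, ID3D12Resource** outTex)
{
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type             = D3D12_HEAP_TYPE_DEFAULT;  // GPU-local VRAM
    heapProps.CreationNodeMask = 1u << nodeIndex;          // physically allocate on this GPU
    heapProps.VisibleNodeMask  = 1u << nodeIndex;          // only this GPU can access it

    device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &texDesc,
                                    D3D12_RESOURCE_STATE_COPY_DEST, nullptr,
                                    IID_PPV_ARGS(outTex));
}
```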
Now, if you could somehow isolate certain objects from the scene, have each GPU render its set independently, and then superimpose their pixel data into a final frame, it could maybe work that way. The developer would have to make sure the overhead of such a procedure doesn't exceed the gains in rendering throughput. Does that make sense? For example, have one GPU draw the environment and have the second GPU draw the objects in it: characters, buildings, etc. There is overhead whenever one part of the image relies on pixel data from the other. Say a developer is running a sharpening filter: do they run it once after the final frame is composited, or do they transfer framebuffers on the fly? There is some room for optimization there and I think it would certainly be cool, but as far as I know, no game engines have this kind of functionality yet, and it's definitely not something that will start working with the flick of a switch.
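As a hand-wavy illustration of what that per-frame flow might look like (every type and function below is a made-up placeholder, not a real engine or D3D12/Mantle API):

```cpp
// Hedged, engine-agnostic sketch of the "split the scene, then composite" idea.
struct Frame {
    Texture2D colorEnv, depthEnv;   // GPU 0: terrain, sky, static world
    Texture2D colorObj, depthObj;   // GPU 1: characters, vehicles, props
};

void renderSplitFrame(Frame& f, const Camera& cam)
{
    // Both GPUs work in parallel on disjoint object sets,
    // so each only needs the assets for its half of the scene.
    renderEnvironment(/*gpu=*/0, cam, f.colorEnv, f.depthEnv);
    renderCharacters (/*gpu=*/1, cam, f.colorObj, f.depthObj);

    // Here is the overhead: GPU 1's results have to cross over
    // (PCIe or bridge) before they can be merged on GPU 0.
    copyColorAndDepth(/*from=*/1, /*to=*/0, f.colorObj, f.depthObj);

    // Merge using depth so the objects correctly occlude the environment.
    Texture2D finalColor = compositeWithDepthTest(f.colorEnv, f.depthEnv,
                                                  f.colorObj, f.depthObj);

    // Post-processing like a sharpening filter is simplest to run once,
    // after composition, so it never needs pixel data from the other GPU.
    runSharpenFilter(finalColor);
    present(finalColor);
}
```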
Now, where I think this is a really cool idea: imagine you're playing an open world game split screen with a friend. Traditionally, in that scenario the two characters aren't allowed to stray far from each other, limited by how many assets can be stored and managed. However, if you could allocate an independent GPU to each person, you'd both be able to roam the world as far away from each other as you want, as long as your CPU and RAM allow it.
This has some super cool applications! Imagine a computer with 4 GPUs in it: each one could drive a separate monitor, and four people could play the same game on the same machine! FPS games with no network latency, open world games with no proximity limits. I have two monitors and an SLI setup; it would be freakin' rad if I could play GTA V with a friend and we each had our own monitor. Just plug in a second controller and go!
I really hope developers are thinking of these kinds of possibilities.
Why are they going to need more bandwidth than Crossfire or SLI currently uses? Right now RAM is duplicated between them, which is a lot of unnecessary traffic. Cut that out, and things should get easier, not harder.
If you can run a single card in a slot without performance issues, then running it as part of Crossfire or SLI will be exactly the same.
VRAM is duplicated between cards, but that duplication doesn't happen over the SLI/Crossfire bridge; it's all loaded over PCIe by just sending both cards the same data. Framebuffers are sent across the SLI bridge for output or for image operations that depend on previous frames. The bandwidth of an SLI/Crossfire bridge would be wholly inadequate for high-speed memory access on the scale at which VRAM operates.
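To put rough numbers on that (back-of-envelope ballpark, not official specs): a standard SLI bridge moves data on the order of 1 GB/s, which is plenty for shuttling finished frames, but a single modern card's VRAM bandwidth is in the hundreds of GB/s, two to three orders of magnitude more than the bridge.

```cpp
#include <cstdio>

int main()
{
    // One 1080p framebuffer at 32 bits per pixel:
    const double frameMB   = 1920.0 * 1080.0 * 4.0 / (1024.0 * 1024.0); // ~7.9 MB
    const double outputGBs = frameMB * 60.0 / 1024.0;                   // ~0.46 GB/s at 60 fps

    // ~0.46 GB/s of frame traffic fits on a ~1 GB/s bridge; live cross-card
    // access to VRAM running at hundreds of GB/s clearly does not.
    std::printf("1080p frame: %.1f MB, 60 fps output: %.2f GB/s\n", frameMB, outputGBs);
}
```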