Drawing poop emoji with a broken goose feather held in your left foot is also interesting and novel to some people; that doesn't make it any more useful or practical.
This was just an old amateur concept that hasn't been updated in several years (for a good reason). The main issue is that it still uses RAM for data exchange, so it basically works like a conventional RAM disk that needs slightly less system memory but uses the GPU as temporary storage. Adding several more steps to the read/write path only makes it drastically slower than a RAM disk (to the point where GDDR5 ends up slower than NVMe). I looked through that code, and even though I haven't touched CUDA or even C++ in years, I can already see some issues.
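A toy sketch of why those extra steps hurt (plain Python, not the project's actual code; the buffers and copy counting are just a stand-in for the real driver path): a RAM disk write is a single memory copy, while a VRAM "disk" write has to go through a staging buffer in system RAM before the PCIe DMA to the card, so every I/O pays at least one extra full-size copy on top of the slower link.

```python
# Toy model of the copy chain -- not the real driver path.
copies = 0

def copy(dst, src):
    """Count every full-buffer copy the 'storage' path performs."""
    global copies
    dst[:] = src
    copies += 1

data    = bytearray(b"x" * 4096)
ramdisk = bytearray(4096)
staging = bytearray(4096)   # bounce buffer in system RAM
vram    = bytearray(4096)   # stand-in for the PCIe DMA target on the card

copy(ramdisk, data)          # RAM disk: one copy, done
ram_copies = copies

copies = 0
copy(staging, data)          # extra hop into the bounce buffer
copy(vram, staging)          # then the DMA over PCIe
vram_copies = copies

print(ram_copies, vram_copies)   # the VRAM path does twice the copies
```

And that's before counting the filesystem and driver layers sitting on top of each hop.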
I'm sure there are much better and more efficient ways to make this work, but I still don't see any reason to do so... Heck, NVMe has already saturated PCIe 3.0 bandwidth, and PCIe 4.0 isn't even in full swing yet. Regardless of how fast GDDR5/6/7 or HBM is on paper, it's only going to be that fast from the perspective of the GPU. For the rest of the system it's only going to be as fast as PCIe and a shitton of abstraction layers allow it to be. Basically, what I'm trying to say is that you can't make it faster than an NVMe RAID, much less a RAM disk. That's why AMD stuck to their guns with hybrid solutions like the Radeon Pro SSG. At least for now that approach makes a bit more sense, when you actually need to have "storage" on the GPU.
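Rough back-of-envelope numbers for the "only as fast as PCIe" point (ballpark figures I'm assuming, not benchmarks):

```python
# Ballpark bandwidths in GB/s -- assumptions, not measurements.
GDDR5_ON_CARD   = 256.0   # what the GPU itself sees
PCIE3_X16       = 15.75   # theoretical one-direction host<->GPU link
DDR4_DUAL_CH    = 25.6    # dual-channel DDR4-3200, roughly
NVME_GEN3_DRIVE = 3.5     # a single Gen3 x4 drive near link saturation

# From the host's point of view, a VRAM disk is capped by the PCIe link,
# before driver and filesystem overhead eat into it further:
vram_disk_ceiling = min(GDDR5_ON_CARD, PCIE3_X16)

print(vram_disk_ceiling)                  # nowhere near the on-card number
print(vram_disk_ceiling < DDR4_DUAL_CH)   # a plain RAM disk already wins
print(4 * NVME_GEN3_DRIVE)                # a 4-drive NVMe RAID gets close too
```

So even in the best case the VRAM disk tops out around the PCIe link, under a RAM disk's ceiling and within reach of a modest NVMe RAID; in practice the abstraction layers pull it down further.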