> Are you talking about just single user inference with larger-ish models solely in VRAM across GPUs in different nodes?

No. Single user inference with two workers doing partial offloading so I don't have to buy more RAM for R1.

> If so, llama.cpp with RPC should do the job already. Not much to look into IMO.

Yes, I know. I'm looking into it.
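
For anyone following along, the RPC setup is roughly this. It's a sketch, not a recipe: the hostnames, port, and -ngl split are placeholders, the rpc-server tool needs a llama.cpp build with GGML_RPC=ON, and exact binary names/flags can vary between versions:

```
# on each worker node (build with -DGGML_RPC=ON), expose the RPC backend:
./rpc-server -H 0.0.0.0 -p 50052

# on the main node: list the workers with --rpc and control how many
# layers get offloaded with -ngl; layers not offloaded stay in local RAM
./llama-cli -m model.gguf \
    --rpc 192.168.1.10:50052,192.168.1.11:50052 \
    -ngl 40 -p "Hello"
```

Partial offload is just the usual -ngl knob; the RPC backends show up as extra devices, so you split the model between local RAM and the workers' VRAM instead of buying more memory.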