Friday, April 5th 2024

X-Silicon Startup Wants to Combine RISC-V CPU, GPU, and NPU in a Single Processor

While we are all used to systems with a separate CPU, GPU, and, more recently, NPU, X-Silicon Inc. (XSi), a startup founded by Silicon Valley veterans, has unveiled a RISC-V processor that can handle CPU, GPU, and NPU workloads simultaneously on a single chip. The chip architecture, which will be open-source, aims to provide a flexible and efficient solution for a wide range of applications, including artificial intelligence, virtual reality, automotive systems, and IoT devices. The new microprocessor combines a RISC-V CPU core with vector capabilities and GPU acceleration, creating a versatile all-in-one processor. By integrating the functionality of a CPU and GPU into a single core, X-Silicon's design offers several advantages over traditional architectures. The chip uses the open-source RISC-V instruction set architecture (ISA) for both CPU and GPU operations, running a single instruction stream. This approach promises a lower memory footprint and improved efficiency, as there is no need to copy data between separate CPU and GPU memory spaces.

The design, called the C-GPU architecture, uses a RISC-V Vector Core with 16 32-bit FPUs and a scalar ALU for processing integer as well as floating-point instructions. A unified instruction decoder feeds the cores, which are connected to a thread scheduler, texture unit, rasterizer, clipping engine, neural engine, and pixel processors. Everything is fed into a frame buffer, which feeds the video engine for video output. The setup of the cores allows users to program each core individually for HPC, AI, video, or graphics workloads. Because a chip is unusable without software, X-Silicon is working on support for the OpenGL ES, Vulkan, Mesa, and OpenCL APIs. Additionally, the company plans to release a hardware abstraction layer (HAL) for direct chip programming. According to Jon Peddie Research (JPR), the industry has been seeking an open-standard GPU that is flexible and scalable enough to support various markets. X-Silicon's CPU/GPU hybrid chip aims to address this need by providing manufacturers with a single, open-chip design that can handle any desired workload. XSi gave no timeline, but it plans to distribute the IP to OEMs and hyperscalers, so the first silicon is still some way off.
Sources: Jon Peddie Research, X-Silicon, via Tom's Hardware

31 Comments on X-Silicon Startup Wants to Combine RISC-V CPU, GPU, and NPU in a Single Processor

#26
Vya Domus
GuiltySpark: It is because of the internal "pipelining" of a single architecture
It's got nothing to do with pipelining. No matter how deep the pipeline is, instructions still take the same amount of time to complete.
GuiltySpark: No one nowadays makes multiplications by recursively adding a single result in SW-like routines or microcode or whatever you are proposing, it would be too time consuming!
I don't know if that's how it's literally done, probably not, but you cannot multiply two integers without carrying out multiple additions; that much is clear. Every integer op will block a processor's integer execution port until it's finished, whether it's a multiplication or an addition, and that's because they're all using the same addition circuitry. If there were a separate block for multiplication, like you're all convinced there is, then addition and multiplication could be done in parallel on the same execution port, but it doesn't work like that in any architecture.
#27
GuiltySpark
Vya Domus: It's got nothing to do with pipelining
I'm not talking about the processor pipeline used to complete instructions; I'm referring to the practice of adding a register in the middle of a combinational circuit to break the critical path into two parts. It goes by the same name, so confusion can happen.
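A minimal sketch of that circuit-level idea, using made-up delay numbers (the 10 ns and 5 ns figures and the function name `pipeline_timing` are assumptions for illustration only, not taken from any real design): splitting a combinational path with a register shortens the clock period, while the total time for one result stays about the same.

```python
def pipeline_timing(stage_delays_ns, register_overhead_ns=0.0):
    """Return (clock_period_ns, latency_ns) for a path split into stages.

    stage_delays_ns: hypothetical combinational delay of each stage.
    register_overhead_ns: hypothetical setup/clock-to-output cost per register.
    """
    # The clock can run no faster than the slowest stage allows.
    period = max(stage_delays_ns) + register_overhead_ns
    # One result needs one clock cycle per stage to come out.
    latency = period * len(stage_delays_ns)
    return period, latency


# One long 10 ns path versus the same logic cut into two 5 ns halves.
unsplit = pipeline_timing([10.0])
split = pipeline_timing([5.0, 5.0])
print("unsplit (period, latency):", unsplit)
print("split   (period, latency):", split)
```

With these assumed numbers the split version halves the clock period, so a new operation can start twice as often, even though each individual result still takes the same 10 ns.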
Vya Domus: Every integer op will block a processor's integer execution port until it's finished, whether it's a multiplication or addition and that's because they're all using the same addition circuitry
This is also not precise. The reason behind what you are describing is that internally either you can compute only one operation at a time, so you have only one port, or, on a "Tomasulo-like" architecture, you have to dispatch and you share queues/entries/ports (whatever you want to call them), but the operations still do not go to the same unit. There is no point in that; as I said, it would be too slow.

I'll give a practical reference: just consider this, which is one of the simplest open-source architectures available.
Here is where the adder unit is defined, while this points to this, which is where the multiplier is defined.
And this architecture targets very, very small devices, yet there are still 2 separate units for these two operations.
Moreover, that multiplier takes more than 1 cycle (3 cycles, actually) to perform the computation. Not because you need 3 cycles to do that (a 32-bit multiplication done by repeated additions would need 32 cycles) but because that is the speed tradeoff they chose.
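To make the "32 cycles of additions" figure concrete, here is a rough software model of a sequential shift-and-add multiplier. It is only a sketch: the function name `shift_add_multiply` and the one-step-per-bit counting are assumptions for illustration, not how the linked core implements its multiplier.

```python
def shift_add_multiply(a, b, width=32):
    """Multiply two unsigned `width`-bit integers one multiplier bit at a time.

    Returns (product, steps), where steps counts one iteration per bit,
    mimicking a sequential circuit that reuses a single adder every cycle.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    steps = 0
    for bit in range(width):
        # Add the shifted multiplicand only when this multiplier bit is set.
        if (b >> bit) & 1:
            product += a << bit
        steps += 1
    return product, steps


product, steps = shift_add_multiply(6, 7)
print(product, steps)
```

A dedicated multiplier unit avoids this bit-by-bit loop by forming and summing the partial products in parallel hardware, which is why it can finish in a few cycles instead of one cycle per bit.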
#28
xBruce88x
I have a feeling they're gonna be snapped up for crypto since they can do it all, or AI.
#29
mechtech
Needs more Processing Units.................
#30
user556
I suspect it's 16 threads, not cores. It's not for high-end applications.
#31
theouto
_Flare: Such a claim is disrespecting everyone who ever was or is in that business, as if they were plain stupid.
Investor scam warning!
Build it, show it running a 3D-game with at least 60 FPS, then trigger a press release again.
Big investors have no idea what a CPU even is, so I don't think they care. They just see big tech words, think of AI, and put money in.