
Flow Computing Parallel Processing Unit (PPU) Architecture Achieves End-to-End CPU Operations in Alpha Testing

Nomad76

News Editor
Staff member
Flow Computing, which licenses on-die, ultra-high-performance parallel computing solutions to CPU vendors across all architectures, today announced that it has reached a critical milestone on its development roadmap toward commercializing its parallel processing ecosystem. The company, which claims its Parallel Processing Unit (PPU) can boost the performance of any CPU architecture by up to 100X, has been developing a compiler that lets existing source code take advantage of the PPU, and that compiler has now entered Alpha testing.

The first target compilations show that simple parallel workloads execute a massive number of loop iterations on RISC-V CPU models without PPU assistance, whereas on RISC-V models incorporating the PPU, simply recompiling the existing code reduces those iterations significantly. The company presents this as demonstrable proof that a PPU-enhanced CPU design can deliver a significant performance boost, up to its claimed 100X.



At its most elemental usage level, the compiler analyzes existing source code to identify the parts that can be effectively sped up by the PPU. It then assigns that parallelizable functionality directly to the PPU, bypassing CPU bottlenecks to achieve improved performance over the baseline CPU architecture.
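At this level the idea resembles classic automatic loop parallelization: the compiler looks for loops whose iterations are independent of one another. A minimal sketch of the distinction, in Python purely for illustration (Flow's compiler targets RISC-V binaries, not Python):

```python
# Illustrative only: the kind of loop a parallelizing compiler can offload
# versus one it cannot.

def scale_parallelizable(xs, k):
    # Independent iterations: out[i] depends only on xs[i], so iterations
    # may run in any order, or all at once on a parallel unit.
    return [k * x for x in xs]

def prefix_sum_sequential(xs):
    # A loop-carried dependence: each iteration reads the previous result,
    # so a naive per-iteration offload cannot speed this up.
    out, acc = [], 0
    for x in xs:
        acc += x
        out.append(acc)
    return out

print(scale_parallelizable([1, 2, 3], 10))  # [10, 20, 30]
print(prefix_sum_sequential([1, 2, 3]))     # [1, 3, 6]
```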

In summary, Flow has now achieved full end-to-end operation: it can compile high-level programs into extended RISC-V binaries and execute them in its gem5-based simulator, which models a PPU integrated into the RISC-V CPU system.

With this milestone reached, full commercialization of the Parallel Processing Unit (PPU) architecture remains on schedule. The next milestone is the completion of the next set of PPU performance modeling over the coming months.

 
This sounds an awful lot like Intel Itanium and its magical compiler, which never managed to deliver despite years of effort. Or like trying to apply a GPU-ish design to classic general-purpose code.
Claiming a 100x increase from parallelism seems a bit much, especially with Amdahl's law to consider.
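To put a number on the Amdahl's law objection: overall speedup is 1 / ((1 - p) + p/s), where p is the parallelizable fraction of runtime and s the speedup applied to it. Even an infinitely fast PPU needs 99% of the workload to be parallelizable just to reach 100x overall. A quick check:

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    # Overall speedup when only a fraction of runtime is accelerated.
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / parallel_speedup)

# 99% parallel code with an infinitely fast parallel unit caps at 100x:
print(amdahl_speedup(0.99, float("inf")))       # 100.0
# A more typical 90%-parallel workload caps at 10x no matter what:
print(amdahl_speedup(0.90, float("inf")))       # 10.0
# And a 100x-faster PPU on that 90% yields only about 9.17x overall:
print(round(amdahl_speedup(0.90, 100), 2))      # 9.17
```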
FLOW PPU: Latency of memory references is hidden by executing other threads while accessing the memory. No coherency problems since no caches are placed in the front of the network. Scalability is provided via a high-bandwidth network-on-chip.
How is that different from the superscalar execution and pipelining used by almost every current CPU design? I suppose without caches there's no coherency to worry about, but then what about memory latency?
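For what it's worth, the quoted mechanism reads less like superscalar/out-of-order and more like a barrel processor or GPU-style multithreading: keep enough hardware threads that whenever one stalls on memory, another is ready to issue. A toy cycle-count model (my own assumption about how the scheme works, not Flow's documented microarchitecture) shows why that hides latency without caches:

```python
def total_cycles(n_threads, ops_per_thread, mem_latency):
    # Toy round-robin model: each memory op issues in 1 cycle, then waits
    # mem_latency cycles. With at least mem_latency threads to rotate
    # through, every wait is overlapped and the pipeline never idles.
    total_ops = n_threads * ops_per_thread
    if n_threads >= mem_latency:
        return total_ops  # one issue per cycle, latency fully hidden
    # Too few threads: each issue round is followed by idle cycles.
    idle_per_round = mem_latency - n_threads
    return total_ops + ops_per_thread * idle_per_round

# 1 thread, 10 memory ops, 20-cycle latency: mostly stalled.
print(total_cycles(1, 10, 20))    # 200 cycles for 10 ops
# 20 threads cover the 20-cycle latency: 20x the work, same cycle count.
print(total_cycles(20, 10, 20))   # 200 cycles for 200 ops
```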
FLOW PPU: Synchronizations are needed only once per step since the threads are independent of each other within a step
I see, so the golden path applies only to independent data. That narrows the use cases that can be accelerated.

An interesting idea, but a more complete whitepaper would be useful.
 
Sounds like another startup project that will promise the world and ultimately go nowhere.
 