
South Korean Company Morumi is Developing a CPU with Infinite Parallel Processing Scaling

TheLostSwede

News Editor
One of the biggest drawbacks of modern CPUs is that adding more cores doesn't translate into more performance in a linear fashion. Parallelism in CPUs offers limited scaling for most applications and none at all for some. A South Korean company called Morumi is now taking a stab at solving this problem and wants to develop a CPU that offers more or less infinite processing scaling as more cores are added. The company has been around since 2018 and has focused on various telecommunications chips, but has now started development of what it calls Every One Period Parallel Processor (EOPPP) technology.

EOPPP is said to distribute data to each of the cores in a CPU before the data is processed, which is said to be done over a type of mesh network inside the CPU. This supposedly allows an almost unlimited number of instructions to be handled at once, provided the CPU has enough cores. Morumi already has an early 32-core prototype running on an FPGA, and in certain tasks the company has seen a tenfold performance increase. It should be noted that this requires software specifically compiled for EOPPP, and Morumi is set to release version 1.0 of its compiler later this year. It's still early days, but it'll be interesting to see how this technology develops. If it's successfully developed, there's also a high chance of Morumi being acquired by someone much bigger that wants to integrate the technology into their own products.
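To put the claim in conventional terms, here's a minimal scatter/process/gather sketch in Python. To be clear, this is a hedged illustration of ordinary data-parallel chunking, not Morumi's actual scheme (which hasn't been published in detail); the pool, the chunking, and the work function are all invented for the example.

Code:
from multiprocessing import Pool

def work(chunk):
    # Stand-in for whatever each core would do with its slice of the data.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    cores = 32
    # "Distribute data to each of the cores before processing":
    # split the input into one chunk per core up front...
    size = len(data) // cores
    chunks = [data[i * size:(i + 1) * size] for i in range(cores)]
    # ...let every core process its chunk at once, then gather the results.
    with Pool(cores) as pool:
        partials = pool.map(work, chunks)
    print(sum(partials))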



View at TechPowerUp Main Site | Source
 
Around since 2018! And in 2022 they're aiming for infinite scaling!

Sighhh

I quote " is said to distribute data to each of the cores in a CPU before the data is being processed, which is said to be done over a type of mesh network inside the CPU. This is said to allow for an almost unlimited amount of instructions to be handled at once, "

Hmm, is anyone else thinking WTAF, or is it just me?

I thought that's how CPUs work: distribute data, work on data. Does an EOPPP use magical stuff where Intel uses silicon?

What gives.

And have I mentioned that I'm inventing a raycasting chip that can do infinite rays? It takes the work in first, then does the work, and through this simple change I WILL BEAT Nvidia. Wait, what.
 
I assume that the difference here is that the data is divided up into smaller chunks, so each processor core works on a chunk of data and the chunks are put back together at the end somewhere. To be honest, it's not entirely clear how it works and only so much info is available.

From the source link. Maybe I misunderstood something.
The pre-saved data are processed at once and the processed data are moved and saved in parallel on a mesh network. Using this saved result in the next period allows the sequential processing of this parallel data.
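If I read that right, it sounds like a bulk-synchronous pattern: every core computes on its pre-saved data "at once", the results move across the mesh, and the next "period" consumes them. A toy sketch of that loop, assuming that reading is correct (it may well not be); every name here is invented for the example:

Code:
# Toy "period" loop: compute everywhere, exchange, repeat.
def run_periods(local_data, periods):
    for _ in range(periods):
        # Every core processes its pre-saved data "at once".
        results = [process(d) for d in local_data]   # conceptually parallel
        # Processed data is moved and saved across the mesh...
        local_data = exchange_over_mesh(results)
        # ...and the saved result feeds the next period.
    return local_data

def process(d):
    return d + 1  # stand-in per-core computation

def exchange_over_mesh(results):
    # Stand-in for the mesh: rotate results one hop to a neighbor.
    return results[-1:] + results[:-1]

print(run_periods([0, 10, 20, 30], periods=4))  # [4, 14, 24, 34]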
 
Basically, computers are "stupid": they constantly and unnecessarily repeat the same calculations for different parts of a task, when it would be easier to apply the result of a single calculation everywhere the formula is the same. Instead of calculating 2+1=3 a trillion times in different queues, a single calculation is enough, and the resulting value is embedded wherever it is needed.
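In code terms that's just caching/memoization; a throwaway example (nothing to do with EOPPP specifically):

Code:
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(x, y):
    # Imagine something costly here; it only ever runs once per input.
    return x + y

# A million call sites (say, a trillion in the post's example)
# can all reuse the one computed result:
for _ in range(1_000_000):
    expensive(2, 1)   # computed once, then served from the cache
print(expensive.cache_info().hits)  # 999999 hits, 1 miss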
 
Oh right, sounds a bit mental even if vague. Again, that's what, two changes? The work is split into chunks at the start,
worked on, and
put back together at the end.

We have two versions of this in modern PCs already. This is exactly what a GPU does: unified processing across cores, with memory constraints since SRAM has stopped scaling and in general eats die space. And obviously a CPU does the same on a limited, small scale?!?

But EOPPP needs to be specifically written for, or compiled for, and by the sound of it conceptually written for. Ohh kk, I mean academia and enterprise might have a use, but I think it's limited, especially since we have massively parallel symptoms we already struggle to make work on general tasks, and not enough tasks to warrant the financial input.

We'll see, but Cerebras would also be saying "yo, what now".

PS: "massively parallel symptoms" :p made me laugh, it's staying. Now where are those glasses. :):D
 

A technology doesn't need to have consumer-oriented use cases to be interesting, IMO.
 
What you misunderstand is this:
in certain tasks the company has seen a tenfold performance increase

Amdahl's law doesn't prevent a 10x performance increase, or really an increase by any arbitrary number, if the sequential part is correspondingly small enough.
It's the "no performance limit" claim that is BS if there previously was one, as that would require the sequential part to be nonexistent, i.e., the program code being redesigned, not just run on another CPU.
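For reference, the formula in question, with an assumed parallel fraction p picked purely for illustration:

Code:
# Amdahl's law: speedup on n cores when a fraction p of the work
# is parallelizable: S(n) = 1 / ((1 - p) + p / n)
def amdahl_speedup(p, n):
    return 1 / ((1 - p) + p / n)

print(amdahl_speedup(0.93, 32))     # ~10.1x -- a 10x gain is fine
print(amdahl_speedup(0.93, 10**9))  # ~14.3x -- the hard ceiling, 1/(1-p)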
 
A technology doesn't need to have consumer-oriented use cases to be interesting, IMO.
I agree, I am intrigued, but the vague in this one is STRONG. And what is said is so... AND?!?
 
So it is not a law, since exceptions can exist; it is a theorem with a limited range of conditions under which it is valid.

No. It's valid for ANY code and any processor; you just seem to misunderstand where it applies.

Let's say you have a piece of code.

1) You run it on CPU A, say a 128-core Xeon, but with 127 of those cores disabled. You get some performance numbers.

2) Now you enable all cores and run it again. You get a 10x speedup.

3) Now run the same code on a different CPU B, again with only 1 core enabled and 127 disabled. You get some other performance number.

4) Rerun that code on CPU B with all 128 cores too. What would the speedup vs scenario 3 be? 10x too.

What about the difference between CPU A and CPU B when compared single-to-single or multi-to-multi? That has nothing to do with Amdahl's law, but with how the A and B architectures are optimized for this kind of task.

So what Amdahl's law states is that the speedup between scenarios 1 vs 2 and 3 vs 4 is the same, because you keep the same architecture but add more cores. This is the actual scope of Amdahl's law.

Scenarios 1 vs 3 and 2 vs 4 are not in the scope of Amdahl's law.

Changing the architecture is a different matter. It can make CPU B 10/100/1000x faster than CPU A core-to-core, but it cannot change the fact that the speedup from adding more cores will plateau all the same. That maximum speedup is an inherent property of the specific code, not something to work around in CPU architecture.

The only way to make it scale without limit is to rewrite the code so that there is no sequential part and all threads run independently of each other.

Then you get infinite scaling with more cores on CPU A, but also on CPU B, and on any other CPU that can run this code.

IOW, there is nothing magical about the described CPU that would make the same code scale infinitely if it didn't already.
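To put numbers on scenarios 1-4 (all figures assumed, just to make the point concrete):

Code:
# Toy model: CPU B is 10x faster per core than CPU A, but both run
# the same code with an assumed parallel fraction p = 0.75.
def runtime(t_single, p, cores):
    return t_single * ((1 - p) + p / cores)  # the serial part never shrinks

p = 0.75
for name, t1 in (("CPU A", 100.0), ("CPU B", 10.0)):
    speedup = runtime(t1, p, 1) / runtime(t1, p, 128)
    print(name, "128-core speedup:", round(speedup, 2))  # ~3.91x for both
# Both plateau at 1 / (1 - p) = 4x; the ceiling follows the code,
# not the architecture.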
 
Mathematical logic is not always correct. At one time we described geocentrism mathematically correctly, with the "correct" formulas; then we looked and saw that the Earth is not the center around which everything else revolves.
 
Then please pinpoint exactly what observation contradicts Amdahl's law here.

A 10x speedup from more cores is not it, as that is predicted by the law to be entirely possible.

Making the same code scale infinitely with more cores, when it didn't on other processors? That's not an observation. That's a claim, and not one validated by anything provided. Until it actually gets validated, Amdahl's law stands. And I have the temerity to strongly doubt it will ever get validated. Somewhere in the range of doubting that a perpetuum mobile exists.

For the record, the "10x speedup" is a claim too for all we know now, but an easily believable one, since:

1) It does not violate said law
2) Processor design companies have been optimizing architectures for specific tasks for decades
 
"Infinite" is used to attract attention and investment. It is more than obvious that this is a PR word order. I don't know why you're even trying to rub in that part.
 
That's justifying clickbait headlines that are greatly exaggerated or, in this case, outright false.

I guess, to be consistent, you won't be complaining about clickbait headlines then.
 
I wouldn't object to a title correction, but only if the article was produced by its OP here, in which case he has the right to change the title. If it is only a translation and the article is owned by another author, hardly anything can be done about it without their consent.
 
It is too early to express such an opinion. What if they succeed? Will you turn your opinion 180°?
 
Succeed in what exactly?

Creating a high-performing/efficient architecture for specific tasks? I hope they do lol.

Overturning Amdahl's law by making programs suddenly scale perfectly with more cores when they didn't on other processors? I have a bridge to sell you, if you honestly believe that.
 
Oh no. I'm on the principle of touching something first to make sure of it, which at such an early stage cannot possibly happen. But I'm not in a hurry to dismiss things as impossible. Whoever does should try harder; arguing by citing a limit theorem does not work.
 
And guess what? People have been more than just "touching" this for DECADES at this point, and yet the law holds. See the wiki article posted above to find out when this law was first presented.

Same as, say, evolution. Oh, it's "just" a theory, right? Yet we have so much evidence for it that we basically accept it at this point. Unless one's tinfoil hat is slipping, that is ;)
 
I'm guessing not on Windows OS ;)
 
How would this help single-threaded apps/games?

What you described just sounds like enhanced hyperthreading?
 