# Multi Core PI @ LINPACK

#### Arctucas

It's going to take a very, very long time to complete with 360,000 decimals, but it will finish eventually; just leave the benchmark running. It's exponential complexity: 10k decimals takes 0 s 800 ms, 20k decimals takes 2 s 900 ms, and 80k decimals takes 54 s. CPU: i5-3330 @ 3 GHz, 4 cores.
I got 28 seconds, 30 ms for 80K on the previous version.

What would you estimate my time should be for 360K on the new version?

I let it run for approximately 10 minutes with no result.

EDIT:

I feel rather sheepish; I should have let it run a few more seconds rather than being impatient:


#### newtekie1

##### Semi-Retired Folder
I got 28 seconds, 30 ms for 80K on the previous version.

What would you estimate my time should be for 360K on the new version?

I let it run for approximately 10 minutes with no result.
Just by my quick math based on the times I'm getting as I increase, you're looking at over an hour to complete 360,000 decimal places.
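The sample times quoted earlier in the thread (0.8 s at 10k, 2.9 s at 20k, 54 s at 80k) actually fit a power law of roughly n² rather than a true exponential; a quick extrapolation sketch (assuming those measurements, on my own initiative, not the benchmark author's math):

```python
import math

# Measured (decimals, seconds) pairs reported earlier in the thread (i5-3330).
samples = [(10_000, 0.8), (20_000, 2.9), (80_000, 54.0)]

# Fit t = c * n^p from the first and last samples.
(n0, t0), _, (n2, t2) = samples
p = math.log(t2 / t0) / math.log(n2 / n0)   # exponent comes out near 2, i.e. roughly quadratic
c = t0 / n0 ** p

estimate = c * 360_000 ** p                 # projected seconds for 360k decimals
print(f"exponent ~ {p:.2f}, 360k estimate ~ {estimate / 60:.0f} minutes")
```

This lands near 19 minutes, which matches the roughly 20-minute X6 run reported further down better than an hour-plus estimate.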

#### Mydog

ok for a 24/7 summer OC

#### ovidiutabla

Just by my quick math based on the times I'm getting as I increase, you're looking at over an hour to complete 360,000 decimal places.
Something like that. Just leave the benchmark running...


#### Arctucas

So, is something wrong with result above?

#### Mydog

Tested 360,000 decimals with HT

#### newtekie1

##### Semi-Retired Folder
So, is something wrong with result above?
Apparently not, because it took my X6 about 20 minutes to finish.

I guess it doesn't scale exactly exponentially like I thought.

#### Aquinus

##### Resident Wat-man
I did a couple tests with my 3820 and threw the results into an OpenOffice spreadsheet to make some graphs out of it. Enjoy if anyone cares.

It almost looks to me as if the average decimals-per-second rate follows O(n log n) for any given decimal length, but the increasing number of elements adds a linear component to the times, so my guess is something like O(n + n log n) or O((n + n) log n). I'm not really up for gathering more data and doing the math to confirm the hunch. That's also just my 3820 with 4c/8t; I'm sure it scales differently on other hardware.


#### Aquinus

##### Resident Wat-man
I feel that I should also note that crunching will get my CPU up to 72-74°C, but even for 360,000 decimals my CPU barely broke 62°C fully loaded with this. Just an observation: crunching for the same amount of time makes that much more heat, despite both applications loading the CPU to 100%.

#### newtekie1

##### Semi-Retired Folder
Crunching likely uses more areas of the CPU: different instruction sets, better use of the cache, etc., because crunching is designed to be as efficient as possible. This benchmark, on the other hand, seems to be purposely inefficient, making the calculation take much longer than it needs to so the results are better suited to a benchmark (several seconds instead of several milliseconds).

Also, for the LOLs:


#### ovidiutabla

UI Update [logo with alpha channel]

#### ovidiutabla

Crunching likely uses more areas of the CPU: different instruction sets, better use of the cache, etc., because crunching is designed to be as efficient as possible. This benchmark, on the other hand, seems to be purposely inefficient, making the calculation take much longer than it needs to so the results are better suited to a benchmark (several seconds instead of several milliseconds).

Also, for the LOLs:

http://www.techpowerup.com/forums/attachment.php?attachmentid=51011&stc=1&d=1367677366
The benchmark is using a very complex formula to calculate decimals of PI.

Bailey–Borwein–Plouffe formula

The Bailey–Borwein–Plouffe formula (BBP formula) provides a spigot algorithm for computing the nth binary digit of pi (symbol: π) using base-16 (hexadecimal) arithmetic.

The formula can directly calculate the value of any given digit of π without the need to calculate the preceding digits.

The BBP is a summation-style formula that was discovered in 1995 by Simon Plouffe and named after the authors of the paper in which it was published: David H. Bailey, Peter Borwein, and Simon Plouffe. Before that paper, it had been published by Plouffe on his own site.[1]
The formula is:

$$\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)$$

The algorithm is complex and slow, but I chose it because it's best suited for parallelization.

The whole idea was to develop a perfectly multithreaded benchmark that can make use of all the available cores, not to implement the fastest algorithm for calculating PI.
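For reference, the digit-extraction property the BBP formula is known for can be sketched in Python. This is not the benchmark's actual code (which is C#/C++); the function name and cutoff tolerance are my own:

```python
def pi_hex_digit(n):
    """Return the hexadecimal digit of pi at position n after the point (0-indexed),
    computed directly via the BBP formula without the preceding digits."""
    def series(j, n):
        # Left part of the series: use 3-argument pow for modular exponentiation,
        # keeping only the fractional part at each step.
        s = 0.0
        for k in range(n + 1):
            s = (s + pow(16, n - k, 8 * k + j) / (8 * k + j)) % 1.0
        # Right tail: terms shrink geometrically, so stop once they are negligible.
        k = n + 1
        while True:
            term = 16.0 ** (n - k) / (8 * k + j)
            if term < 1e-17:
                break
            s += term
            k += 1
        return s % 1.0

    x = (4 * series(1, n) - 2 * series(4, n) - series(5, n) - series(6, n)) % 1.0
    return int(x * 16)

# Pi in hex is 3.243F6A88..., so the first fractional digits are 2, 4, 3, F.
print([pi_hex_digit(i) for i in range(4)])
```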

The BBP formula for π

The original BBP π summation formula was found in 1995 by Plouffe using PSLQ. It is also representable using the general P function, $P(s, b, m, A) = \sum_{k=0}^{\infty} \frac{1}{b^k} \sum_{j=1}^{m} \frac{a_j}{(mk+j)^s}$:

$$\pi = P\left(1, 16, 8, (4, 0, 0, -2, -1, -1, 0, 0)\right)$$

which also reduces to this equivalent ratio of two polynomials:

$$\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \cdot \frac{120k^2 + 151k + 47}{512k^4 + 1024k^3 + 712k^2 + 194k + 15}$$
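The reduction of the four BBP fractions to a single ratio of polynomials can be checked exactly with rational arithmetic; a quick verification sketch (the helper names are mine):

```python
from fractions import Fraction as F

def bbp_sum_form(k):
    # The four-fraction BBP term for index k.
    return F(4, 8 * k + 1) - F(2, 8 * k + 4) - F(1, 8 * k + 5) - F(1, 8 * k + 6)

def bbp_poly_form(k):
    # The same term written as a single ratio of two polynomials in k.
    return F(120 * k**2 + 151 * k + 47,
             512 * k**4 + 1024 * k**3 + 712 * k**2 + 194 * k + 15)

# Exact equality for every index, no floating-point involved.
print(all(bbp_sum_form(k) == bbp_poly_form(k) for k in range(50)))
```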

y-cruncher is the first efficient and publicly available Pi-calculator that can sustain a near 100% cpu load on multi-core computers.

There are other multi-threaded Pi-programs that can achieve high cpu usage, but few of them can sustain it through an entire Pi computation.

Below is a typical CPU utilization graph of y-cruncher when computing 1 billion digits of Pi across 8 cores.

As of 2010, I am not aware of any Pi-program that achieves perfect parallelism for small computations and is at least half the speed of y-cruncher.
In 2013, meet Multi Core PI, sire. Perfect parallelism for any number of decimals.

(It's easy to get perfect parallelism if you artificially make the task really slow.)
I did NOT artificially make the task really slow; in fact, I didn't do anything that slows down the algorithm.

Sure, the Multi Core PI algorithm was not optimized for speed, but it provides perfect parallelism, and that was the whole idea:
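Each term of the BBP series is independent of the others, which is why the work splits so cleanly across cores. A minimal sketch of that embarrassingly parallel structure (Python threads for illustration only; the benchmark's actual core is C++, and Python's GIL means this shows the shape of the parallelism rather than real speedup):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def bbp_term(k):
    # One BBP series term; successive terms shrink by a factor of 16.
    return (1 / 16**k) * (4 / (8*k + 1) - 2 / (8*k + 4)
                          - 1 / (8*k + 5) - 1 / (8*k + 6))

def partial_sum(bounds):
    # Each worker sums its own chunk; no shared state at all.
    start, stop = bounds
    return sum(bbp_term(k) for k in range(start, stop))

def pi_bbp(terms=32, workers=4):
    step = terms // workers
    chunks = [(i, min(i + step, terms)) for i in range(0, terms, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

print(pi_bbp())  # matches math.pi to double precision after a few dozen terms
```

Because the chunks never touch shared data, the only sequential work is the final sum of the partial results.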


#### newtekie1

##### Semi-Retired Folder
Thanks for the explanation.

I wasn't knocking you, you achieved exactly what you set out to do and it makes a great benchmark.

#### ovidiutabla

Multi Core LINPACK Ultimate

Meet Multi Core LINPACK Ultimate!

A multithreaded CPU benchmark that performs numerical linear algebra. It makes use of the BLAS (Basic Linear Algebra Subprograms) libraries for performing basic vector and matrix operations.

The benchmark is written in C# / WPF [the user interface] and C++ [the core algorithm], and provides excellent parallelism.

How it works

Default setting for benchmark is a Matrix size of 4000. Just hit <Run benchmark> button to start benching your CPU.
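What a LINPACK-style run measures can be sketched with NumPy, which calls the same kind of BLAS/LAPACK routines: time a dense solve and convert the operation count to GFLOPS. The matrix size, seed, and operation-count formula below are illustrative assumptions, not the benchmark's actual code:

```python
import time
import numpy as np

n = 1000                                   # the benchmark's default matrix size is 4000
rng = np.random.default_rng(1)
A = rng.random((n, n))
b = rng.random(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)                  # LU factorization with partial pivoting (LAPACK)
elapsed = time.perf_counter() - t0

# Classic LINPACK operation count for solving one dense n x n system.
flops = (2 / 3) * n**3 + 2 * n**2
print(f"{flops / elapsed / 1e9:.2f} GFLOPS")

# Sanity check: the normalized residual should sit near machine precision.
residual = np.linalg.norm(A @ x - b) / (np.linalg.norm(A, 1) * np.linalg.norm(x))
```

Because the BLAS routines underneath are multithreaded, a run like this loads all cores much the way the benchmark does.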

Submit to HWBOT

First, press the <Submit to HWBOT> button. A screenshot of the entire screen and an encrypted XML data file will be created. Attention: CPU-Z must be running!

HWBOT

http://hwbot.org/benchmark/multi_core_linpack_ultimate/

Supported operating systems

Microsoft Windows XP / Server 2003
Microsoft Windows Vista / 7
Microsoft Windows 8 / Server 2012

Website

http://www.pcgamingxtreme.ro/multi-core-linpack-ultimate/

#### Feänor

##### New Member
Poor little g540...


#### cheesy999

Poor little g540...
I suspect this benchmark might favor Intel processors a little over AMD, unless I'm reading it wrong.

#### agent00skid

Seems to run fine on my AMD processor.

#### TRWOV

I suspect this benchmark might like Intel processors a little bit more than AMD. Unless I'm reading it wrong.

http://img.techpowerup.org/130519/benchmark.png
It does give inconsistent results, I'll give you that. agent00skid's A6-3500 APU gets better times than your unlocked X4, and it's a triple-core. I thought it might be related to instruction sets, but the Phenom II and Llano support the same instructions.

Maybe memory bandwidth plays a role too?

edit: Maybe your X4 is throttling? Watch the CPU-Z readout while the benchmark is running.

BTW OP, can we have a logo? Seeing the dull standard EXE icon on the desktop isn't cool.


#### agent00skid

My N830 at 1.5 GHz in my laptop took twice as long, so on my end it seems to scale appropriately.

#### cheesy999

Maybe memory bandwidth plays a role too?

BTW OP, can we have a logo? Seeing the dull standard EXE icon on the desktop isn't cool.
I'm on single-channel memory; we should explore this.