
Multi Core PI @ LINPACK

It's going to take a very, very long time to complete with 360,000 decimals, but it will finish eventually; just leave the benchmark running. The complexity is exponential: 10K decimals take 0 s 800 ms, 20K decimals take 2 s 900 ms... and 80K decimals take 54 s. CPU: i5 3330 @ 3 GHz, 4 cores.

I got 28 seconds, 30 ms for 80K on the previous version.

What would you estimate my time should be for 360K on the new version?

I let it run for approximately 10 minutes with no result.


EDIT:

I feel rather sheepish; I should have let it run a few more seconds instead of being impatient:

 

Just by my quick math based on the times I'm getting as I increase, you're looking at over an hour to complete 360,000 decimal places.
 
MultiCorePIScreenShot.jpg


Had to try this one :)

ok for a 24/7 summer OC :toast:
 

Something like that. Just leave the benchmark running...
 
So, is something wrong with the result above?
 
Tested 360,000 decimals with HT

mcpi.jpg
 

Apparently not, because it took my X6 about 20 minutes to finish.

I guess it doesn't scale exactly exponentially like I thought.
 
I did a couple tests with my 3820 and threw the results into an OpenOffice spreadsheet to make some graphs out of it. Enjoy if anyone cares. :)

It almost looks to me as if it completes in O(n log n) time as far as how many decimals per second get calculated on average for any given decimal length but the increasing number of elements is creating a linear increase in times, so it almost feels like something O(n + n log n) or O((n + n) log n) time if I were to take a guess. I'm not really up for getting more data and doing the math to confirm my hunch. That's also for just my 3820 with 4c/8t, I'm sure it scales differently on different hardware.
pi_per_second_avg.PNG

pi_time_to_calc.PNG
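A log-log least-squares fit is one quick way to put a number on the scaling guesses in this thread. The sketch below uses only the three timings from the first post (10K, 20K and 80K decimals on the i5 3330), so it says nothing about the 3820 graphs above, and with just three points both the fitted exponent and the 360K extrapolation are rough. For these numbers the exponent lands close to 2, i.e. roughly quadratic growth rather than exponential. The function names are illustrative, not part of the benchmark.

```python
import math

# Timings quoted in the first post (i5 3330 @ 3 GHz, 4 cores): decimals -> seconds
TIMINGS = {10_000: 0.8, 20_000: 2.9, 80_000: 54.0}


def fit_power_law(samples):
    """Least-squares fit of log(t) = a + b * log(n); returns (a, b)."""
    xs = [math.log(n) for n in samples]
    ys = [math.log(t) for t in samples.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx  # growth exponent: t ~ n**b
    a = mean_y - b * mean_x
    return a, b


def predict_seconds(a, b, n):
    """Extrapolate the fitted power law to n decimals (same machine only)."""
    return math.exp(a + b * math.log(n))


if __name__ == "__main__":
    a, b = fit_power_law(TIMINGS)
    print(f"fitted exponent: {b:.2f}")
    print(f"360K estimate: {predict_seconds(a, b, 360_000) / 60:.1f} min")
```

An exponential law would show up as a straight line on a linear-time vs. log-time plot instead; the log-log fit was chosen because it directly yields the polynomial degree.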
 
I should also note that crunching gets my CPU up to 72-74 °C, but even at 360,000 decimals my CPU barely broke 62 °C under full load with this benchmark. Just an observation: crunching for the same amount of time produces that much more heat, even though both applications load the CPU to 100%.
 
Crunching likely uses more areas of the CPU (different instruction sets, better use of the cache, etc.) because crunching software is designed to be as efficient as possible, whereas this benchmark seems to be purposely inefficient, making the calculation take much longer than it needs to so that the results are better suited to a benchmark (several seconds instead of several milliseconds).

Also, for the LOLs:

MultiCorePIScreenShot.jpg
 

The benchmark uses a very complex formula to calculate the decimals of Pi.

Bailey–Borwein–Plouffe formula

The Bailey–Borwein–Plouffe formula (BBP formula) provides a spigot algorithm for the computation of the nth binary digit of pi (symbol: π) using base 16 math.

The formula can directly calculate the value of any given digit of π without the need to calculate the preceding digits.
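To make the "any digit without the preceding digits" property concrete, here is a minimal Python sketch of BBP digit extraction. It follows the standard textbook approach (modular exponentiation for the terms with k ≤ n, a short floating-point tail for k > n); the function names are hypothetical and this is not Multi Core PI's code. Double-precision floats limit each call to a handful of reliable hex digits and to moderate positions. Note also that BBP yields hexadecimal digits; the thread does not say how the benchmark turns its work into the decimals it reports.

```python
HEX = "0123456789ABCDEF"


def _frac_series(j, n):
    """Fractional part of sum_k 16^(n-k) / (8k + j), the building block of BBP."""
    s = 0.0
    # Left part (k <= n): modular exponentiation keeps the numbers small.
    for k in range(n + 1):
        denom = 8 * k + j
        s = (s + pow(16, n - k, denom) / denom) % 1.0
    # Right part (k > n): each term is 16x smaller, so a few terms suffice.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s = (s + term) % 1.0
        k += 1
    return s


def pi_hex_digits(n, count=6):
    """Return `count` hex digits of pi starting at position n + 1 after the point."""
    x = (4 * _frac_series(1, n) - 2 * _frac_series(4, n)
         - _frac_series(5, n) - _frac_series(6, n)) % 1.0
    out = []
    for _ in range(count):
        x *= 16
        digit = int(x)
        out.append(HEX[digit])
        x -= digit
    return "".join(out)


if __name__ == "__main__":
    # pi = 3.243F6A8885A308D3... in base 16
    print(pi_hex_digits(0))   # hex digits 1-6
    print(pi_hex_digits(10))  # hex digits 11-16, without computing digits 1-10
```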

The BBP is a summation-style formula that was discovered in 1995 by Simon Plouffe and was named after the authors of the paper in which the formula was published, David H. Bailey, Peter Borwein, and Simon Plouffe. Before that paper, it had been published by Plouffe on his own site.[1]

The formula is:

$$\pi = \sum_{k=0}^{\infty} \frac{1}{16^k}\left(\frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6}\right)$$


The algorithm is complex and slow, but I chose it because it is well suited to parallelization.

The whole idea was to develop a perfectly multithreaded benchmark that can use all the available cores, not to implement the fastest algorithm for calculating Pi.
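Why BBP parallelizes so easily can be sketched in a few lines: every digit position is an independent job, so positions can simply be dealt out to workers. The sketch below is an assumption about the general approach, not a description of how Multi Core PI actually splits its work (the thread doesn't say). It includes a compact BBP hex-digit extractor so it runs on its own, and the names `hex_digit` and `pi_hex_parallel` are hypothetical. Threads keep the example self-contained, but CPython's GIL serializes this pure-Python arithmetic; real speedup would need a `ProcessPoolExecutor` or native threads like the benchmark's C++ core.

```python
from concurrent.futures import ThreadPoolExecutor

HEX = "0123456789ABCDEF"


def _frac_series(j, n):
    """Fractional part of sum_k 16^(n-k) / (8k + j)."""
    s = 0.0
    for k in range(n + 1):
        denom = 8 * k + j
        s = (s + pow(16, n - k, denom) / denom) % 1.0
    k = n + 1
    while (term := 16.0 ** (n - k) / (8 * k + j)) > 1e-17:
        s = (s + term) % 1.0
        k += 1
    return s


def hex_digit(n):
    """Hex digit of pi at position n + 1 after the point, independent of all others."""
    x = (4 * _frac_series(1, n) - 2 * _frac_series(4, n)
         - _frac_series(5, n) - _frac_series(6, n)) % 1.0
    return HEX[int(x * 16)]


def pi_hex_parallel(count, workers=4):
    """Compute hex digits 1..count, dealing positions out to `workers` threads."""
    # Interleaving (worker w takes w, w + workers, ...) balances the load,
    # because later positions cost more than earlier ones.
    chunks = [range(w, count, workers) for w in range(workers)]

    def run(chunk):
        return [(n, hex_digit(n)) for n in chunk]

    digits = [""] * count
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for results in pool.map(run, chunks):
            for n, d in results:
                digits[n] = d
    return "".join(digits)


if __name__ == "__main__":
    print(pi_hex_parallel(16))
```

No worker ever waits on another's result, which is the property that makes near-perfect core utilization possible regardless of how fast each digit is.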

The BBP formula for π

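The original BBP π summation formula was found in 1995 by Plouffe using PSLQ. It can also be written with the generalized function $P(s, b, m, A) = \sum_{k=0}^{\infty} \frac{1}{b^k} \sum_{j=1}^{m} \frac{a_j}{(mk+j)^s}$, where $A = (a_1, \ldots, a_m)$: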

$$\pi = P\bigl(1, 16, 8, (4, 0, 0, -2, -1, -1, 0, 0)\bigr)$$


which also reduces to this equivalent ratio of two polynomials:

$$\pi = \sum_{k=0}^{\infty} \frac{1}{16^k}\,\frac{120k^2 + 151k + 47}{512k^4 + 1024k^3 + 712k^2 + 194k + 15}$$
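The two forms of the BBP series can be checked against each other with exact rational arithmetic. This short sketch (function names are illustrative) compares the four-fraction term with the single polynomial-ratio term for many k, then confirms that a dozen terms already agree with π to double precision, since each term is about 16 times smaller than the previous one.

```python
import math
from fractions import Fraction


def bbp_term(k):
    """k-th BBP term in the four-fraction form (exact rational)."""
    return Fraction(1, 16**k) * (Fraction(4, 8*k + 1) - Fraction(2, 8*k + 4)
                                 - Fraction(1, 8*k + 5) - Fraction(1, 8*k + 6))


def bbp_term_poly(k):
    """k-th BBP term in the polynomial-ratio form (exact rational)."""
    num = 120*k**2 + 151*k + 47
    den = 512*k**4 + 1024*k**3 + 712*k**2 + 194*k + 15
    return Fraction(num, 16**k * den)


if __name__ == "__main__":
    assert all(bbp_term(k) == bbp_term_poly(k) for k in range(50))
    partial = sum(bbp_term(k) for k in range(12))
    print(float(partial), math.pi)
```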

y-cruncher is the first efficient and publicly available Pi-calculator that can sustain a near 100% cpu load on multi-core computers.

There are other multi-threaded Pi-programs that can achieve high cpu usage, but few of them can sustain it through an entire Pi computation.

Below is a typical CPU utilization graph of y-cruncher when computing 1 billion digits of Pi across 8 cores.

cpu_usage.jpg

As of 2010, I am not aware of any Pi-program that achieves perfect parallelism for small computations and is at least half the speed of y-cruncher.

In 2013, meet Multi Core PI, sire: perfect parallelism for any number of decimals.

(It's easy to get perfect parallelism if you artificially make the task really slow.)

I did NOT artificially make the task really slow; in fact, I didn't add anything that slows down the algorithm.

Sure, the Multi Core PI algorithm was not optimized for speed, but it provides perfect parallelism, and that was the whole idea:

cpu_usage.png
 
Thanks for the explanation.

I wasn't knocking you, you achieved exactly what you set out to do and it makes a great benchmark.
 
Multi Core LINPACK Ultimate

Meet Multi Core LINPACK Ultimate!

A multithreaded CPU benchmark that performs numerical linear algebra. It makes use of the BLAS (Basic Linear Algebra Subprograms) libraries for performing basic vector and matrix operations.
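LINPACK-style benchmarks time the solution of a dense linear system Ax = b by Gaussian elimination with partial pivoting (an LU factorization applied to the augmented system). The pure-Python sketch below shows that computation on small matrices; the inner row update is the "axpy" operation (y += a·x) that BLAS provides in optimized form. The function name is hypothetical, and the thread doesn't describe which BLAS routines the benchmark's C++ core actually calls.

```python
import random


def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination (LU) with partial pivoting.

    `a` is a list of row lists and `b` a list; the inputs are not modified.
    """
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Partial pivoting: move the row with the largest |entry| onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Row update m[r] -= factor * m[col]: the BLAS "axpy" kernel.
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


if __name__ == "__main__":
    n = 50
    rng = random.Random(1)
    a = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # chosen so the exact solution is all ones
    x = lu_solve(a, b)
    print(max(abs(xi - 1.0) for xi in x))
```

The triple loop is where nearly all the time goes (about n³ multiply-adds), which is why real implementations hand it to a tuned, multithreaded BLAS.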

The benchmark is written in C# / WPF [the user interface] and C++ [the core algorithm], and it provides excellent parallelism.

Multi_Core_LINPACKUltimate.png


How it works

Default setting for benchmark is a Matrix size of 4000. Just hit <Run benchmark> button to start benching your CPU.
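The thread doesn't say how Multi Core LINPACK Ultimate converts a run into a score. As a point of reference, the classic LINPACK benchmark credits a dense n×n solve with about 2/3·n³ + 2·n² floating-point operations; this sketch applies that convention to the default matrix size of 4000. Whether this benchmark uses the same count is an assumption, and the run times below are made up for illustration.

```python
def linpack_flops(n):
    """Nominal flop count for an n x n dense solve (classic LINPACK convention)."""
    return 2 * n**3 / 3 + 2 * n**2


def gflops(n, seconds):
    """Convert a measured solve time into GFLOPS using the nominal flop count."""
    return linpack_flops(n) / seconds / 1e9


if __name__ == "__main__":
    n = 4000  # the benchmark's default matrix size
    print(f"{linpack_flops(n):.3e} flops")  # 4.270e+10
    for t in (5.0, 10.0, 20.0):  # hypothetical run times in seconds
        print(f"{t:5.1f} s -> {gflops(n, t):6.2f} GFLOPS")
```

Because the n³ term dominates, doubling the matrix size multiplies the work (and roughly the run time) by about eight.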

Submit to HWBOT

First, press the <Submit to HWBOT> button. A screenshot of the entire screen and an encrypted XML data file will be created. Attention! CPU-Z must be running!
Second, follow the link provided on the dialog and submit your datafile to HWBOT.

HWBOT

http://hwbot.org/benchmark/multi_core_linpack_ultimate/

Supported operating systems

Microsoft Windows XP / Server 2003
Microsoft Windows Vista / 7
Microsoft Windows 8 / Server 2012

Website

http://www.pcgamingxtreme.ro/multi-core-linpack-ultimate/

Download Link

http://www.pcgamingxtreme.ro/forum/download/file.php?id=690
 
Poor little g540...
 

Screenshot.png

I suspect this benchmark might like Intel processors a little bit more than AMD. Unless I'm reading it wrong.

benchmark.png
 
Seems to run fine on my AMD processor.

Capture042.jpg
 

It does give inconsistent results, I'll give you that. agent00skid's A6-3500 APU gets better times than your unlocked X4, and it's a triple-core. I thought it might be related to instruction sets, but the Phenom II and Llano support the same instructions.

Maybe memory bandwidth plays a role too?


edit: Maybe your X4 is throttling? Watch the CPU-Z readout while the benchmark is running.

BTW OP, can we have a logo? Seeing the dull standard EXE icon on the desktop isn't cool.
 
My N830 at 1.5 GHz in my laptop took twice as long, so on my end it seems to scale appropriately.
 

I'm on single channel, we should explore this.
 