Multi Core PI @ LINPACK

Arctucas · May 2, 2013

t.phase said:
It's going to take a very, very long time to complete with 360,000 decimals. It will complete eventually; just leave the benchmark running. It's exponential complexity. For 10K decimals it takes 0 sec 800 ms, for 20K decimals 2 sec 900 ms... and for 80K decimals 54 sec. CPU: i5 3330 @ 3 GHz, 4 cores.

I got 28 seconds, 30 ms for 80K on the previous version.

What would you estimate my time should be for 360K on the new version?

I let it run for approximately 10 minutes with no result.

EDIT:

I feel rather sheepish; I should have let it run a few more seconds rather than being impatient:

newtekie1 · May 2, 2013

Arctucas said:
I got 28 seconds, 30 ms for 80K on the previous version.

What would you estimate my time should be for 360K on the new version?

I let it run for approximately 10 minutes with no result.

Just by my quick math based on the times I'm getting as I increase, you're looking at over an hour to complete 360,000 decimal places.

Mydog · May 2, 2013

Had to try this one

ok for a 24/7 summer OC :toast:

ovidiutabla · May 3, 2013

newtekie1 said:
Just by my quick math based on the times I'm getting as I increase, you're looking at over an hour to complete 360,000 decimal places.

Something like that. Just leave the benchmark running...

Arctucas · May 3, 2013

So, is something wrong with the result above?

Mydog · May 3, 2013

Tested 360,000 decimals with HT

newtekie1 · May 3, 2013

Arctucas said:
So, is something wrong with the result above?

Apparently not, because it took my x6 about 20 minutes to finish.

I guess it doesn't scale exactly exponentially like I thought.
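For what it's worth, the three timings quoted earlier in the thread (10K, 20K and 80K decimals on the i5 3330) can be fitted with a simple power law to get a rough estimate for 360K. This is only a sketch: the power-law model and the helper name `fit_power_law` are assumptions made for illustration, and the numbers come from that one CPU, not from the x6.

```python
import math

# Timings quoted earlier in the thread for the i5 3330: (decimals, seconds).
timings = [(10_000, 0.8), (20_000, 2.9), (80_000, 54.0)]

def fit_power_law(points):
    """Least-squares fit of log(t) = log(a) + b * log(n); returns (a, b)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope /= sum((x - mean_x) ** 2 for x in xs)
    scale = math.exp(mean_y - slope * mean_x)
    return scale, slope

a, b = fit_power_law(timings)
estimate_seconds = a * 360_000 ** b  # extrapolation, so treat it as a ballpark
print(f"fitted exponent: {b:.2f}")
print(f"estimated time for 360K decimals: {estimate_seconds / 60:.0f} min")
```

On those three data points the fitted exponent comes out close to 2, which looks like roughly quadratic growth rather than exponential growth.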

unclewebb · May 4, 2013

Thanks for the multi-threaded benchmark. :toast:

Aquinus · May 4, 2013

I did a couple tests with my 3820 and threw the results into an OpenOffice spreadsheet to make some graphs out of it. Enjoy if anyone cares.

It almost looks to me as if it completes in O(n log n) time, going by how many decimals per second get calculated on average for any given decimal length, but the increasing number of elements is creating a linear increase in times, so it almost feels like something like O(n + n log n) ~~or O((n + n) log n)~~ time if I were to take a guess. I'm not really up for getting more data and doing the math to confirm my hunch. That's also just for my 3820 with 4c/8t; I'm sure it scales differently on different hardware.

ovidiutabla · May 4, 2013

Very nice work sire.

Metro UI style:

Download link:

http://www.pcgamingxtreme.ro/forum/download/file.php?id=666

Aquinus · May 4, 2013

I feel that I should also note that crunching will get my CPU up to 72-74°C, but even for 360K decimals my CPU barely broke 62°C fully loaded with this. Just an observation, because crunching for the same amount of time makes that much more heat despite both applications loading the CPU to 100%.

newtekie1 · May 4, 2013

Crunching likely uses more areas of the CPU, different instruction sets, better use of the cache, etc., because crunching is designed to be as efficient as possible, while this benchmark seems to be purposely inefficient to make the calculation take a lot longer than it should, in order to get results that are more suited to a benchmark (several seconds instead of several ms).

Also, for the LOLs:

ovidiutabla · May 6, 2013

UI Update [logo with alpha channel]

ovidiutabla · May 10, 2013

newtekie1 said:
Crunching likely uses more areas of the CPU, different instruction sets, better use of the cache, etc., because crunching is designed to be as efficient as possible, while this benchmark seems to be purposely inefficient to make the calculation take a lot longer than it should, in order to get results that are more suited to a benchmark (several seconds instead of several ms).

Also, for the LOLs:

http://www.techpowerup.com/forums/attachment.php?attachmentid=51011&stc=1&d=1367677366

The benchmark is using a very complex formula to calculate decimals of PI.

Bailey–Borwein–Plouffe formula

The Bailey–Borwein–Plouffe formula (BBP formula) provides a spigot algorithm for the computation of the nth binary digit of pi (symbol: π) using base 16 math.

The formula can directly calculate the value of any given digit of π without the need to calculate the preceding digits.
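To illustrate the digit-extraction property, here is a minimal sketch of the usual BBP approach to pulling out a single hexadecimal digit of π without computing the ones before it. It is not the benchmark's code; the function names `_series` and `pi_hex_digit` are made up for this example, and plain floating point limits it to fairly small positions.

```python
def _series(j, n):
    """Fractional part of sum over k of 16^(n-k) / (8k + j)."""
    s = 0.0
    # Terms with k <= n: modular exponentiation keeps only the fractional part.
    for k in range(n + 1):
        denom = 8 * k + j
        s = (s + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n shrink quickly; stop once they no longer matter.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0

def pi_hex_digit(n):
    """Hex digit of pi at position n + 1 after the point (n = 0 is the first)."""
    x = (4 * _series(1, n) - 2 * _series(4, n)
         - _series(5, n) - _series(6, n)) % 1.0
    return "0123456789abcdef"[int(x * 16)]

# Each position is computed independently of the others.
print("".join(pi_hex_digit(n) for n in range(10)))  # 243f6a8885
```

Because every call is independent, different positions can be handed to different cores, which is what makes the formula attractive for a multithreaded benchmark.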

The BBP is a summation-style formula that was discovered in 1995 by Simon Plouffe and was named after the authors of the paper in which the formula was published, David H. Bailey, Peter Borwein, and Simon Plouffe. Before that paper, it had been published by Plouffe on his own site.[1]

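The formula is:

$$\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)$$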

The algorithm is very complex and slow, but I chose it because it's best suited for parallelization.

The whole idea was to develop a perfect multithreaded benchmark that can make use of all the cores available, not to implement the fastest algorithm to calculate PI.
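As a rough illustration of why a summation formula like BBP splits well across cores, the sketch below hands each worker its own slice of the series terms and adds the partial sums at the end. This is an assumption about how such a split could look, not the actual Multi Core PI implementation; `partial_sum` and `pi_decimals` are hypothetical names. Threads are used only to keep the example short: in CPython they will not speed up CPU-bound work, and `concurrent.futures.ProcessPoolExecutor` would be the usual choice for real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction

def partial_sum(k_values):
    """Exact sum of the BBP series terms for the given k values."""
    total = Fraction(0)
    for k in k_values:
        total += Fraction(1, 16 ** k) * (
            Fraction(4, 8 * k + 1) - Fraction(2, 8 * k + 4)
            - Fraction(1, 8 * k + 5) - Fraction(1, 8 * k + 6)
        )
    return total

def pi_decimals(digits, terms, workers=4):
    """Sum `terms` BBP terms across `workers` and return pi truncated to `digits` decimals."""
    # Interleave the terms so every worker gets a similar mix of cheap and costly ones.
    chunks = [range(w, terms, workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(partial_sum, chunks), Fraction(0))
    scaled = total.numerator * 10 ** digits // total.denominator
    text = str(scaled)
    return text[0] + "." + text[1:]

print(pi_decimals(30, terms=40))  # 3.141592653589793238462643383279
```

No worker needs anything from another worker until the final addition, so the load can stay even on all cores for the whole run.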

The BBP formula for π

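The original BBP π summation formula was found in 1995 by Plouffe using PSLQ. It is also representable using the P function, $P(s, b, m, A) = \sum_{k=0}^{\infty} \frac{1}{b^k} \sum_{j=1}^{m} \frac{a_j}{(mk+j)^s}$ with $A = (a_1, \dots, a_m)$:

$$\pi = P\bigl(1, 16, 8, (4, 0, 0, -2, -1, -1, 0, 0)\bigr)$$

which also reduces to this equivalent ratio of two polynomials:

$$\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \cdot \frac{120k^2 + 151k + 47}{512k^4 + 1024k^3 + 712k^2 + 194k + 15}$$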
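The equivalence between the four-fraction form of each BBP term and the ratio of two polynomials, (120k² + 151k + 47) / (512k⁴ + 1024k³ + 712k² + 194k + 15), is easy to sanity-check with exact rational arithmetic. The helper names below are made up for this check.

```python
from fractions import Fraction

def bbp_term(k):
    """Bracketed part of the BBP term, written as four separate fractions."""
    return (Fraction(4, 8 * k + 1) - Fraction(2, 8 * k + 4)
            - Fraction(1, 8 * k + 5) - Fraction(1, 8 * k + 6))

def poly_term(k):
    """The same quantity written as a single ratio of two polynomials in k."""
    return Fraction(120 * k ** 2 + 151 * k + 47,
                    512 * k ** 4 + 1024 * k ** 3 + 712 * k ** 2 + 194 * k + 15)

# Exact comparison, so no floating-point tolerance is needed.
print(all(bbp_term(k) == poly_term(k) for k in range(100)))  # True
```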

y-cruncher is the first efficient and publicly available Pi-calculator that can sustain a near 100% cpu load on multi-core computers.

There are other multi-threaded Pi-programs that can achieve high cpu usage, but few of them can sustain it through an entire Pi computation.

Below is a typical CPU utilization graph of y-cruncher when computing 1 billion digits of Pi across 8 cores.

As of 2010, I am not aware of any Pi-program that achieves perfect parallelism for small computations and is at least half the speed of y-cruncher.

In 2013, meet Multi Core PI sire. Perfect parallelism for any number of decimals.

(It's easy to get perfect parallelism if you artificially make the task really slow.)

I did NOT artificially make the task really slow; in fact, I didn't do anything that slows down the algorithm.

Sure, the Multi Core PI algorithm was not optimized for speed, but it provides perfect parallelism, and that was the whole idea:

newtekie1 · May 10, 2013

Thanks for the explanation.

I wasn't knocking you, you achieved exactly what you set out to do and it makes a great benchmark.

ovidiutabla · May 16, 2013

Multi Core LINPACK Ultimate

Meet Multi Core LINPACK Ultimate!

A multithreaded CPU benchmark that performs numerical linear algebra. It makes use of the BLAS (Basic Linear Algebra Subprograms) libraries for performing basic vector and matrix operations.

The benchmark is written in C# / WPF [The User Interface] and C++ [The Core Algorithm], and provides excellent parallelism.

How it works

The default setting for the benchmark is a matrix size of 4000. Just hit the <Run benchmark> button to start benching your CPU.
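For readers wondering what a LINPACK-style run measures: it solves a dense n × n linear system and converts the elapsed time into a floating-point rate using the conventional operation count 2/3·n³ + 2·n². The sketch below is a tiny pure-Python stand-in, not the benchmark's BLAS-based C++ core; the names `solve` and `linpack_like` are hypothetical, and the matrix is kept far smaller than 4000 so it finishes quickly.

```python
import random
import time

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (works on copies)."""
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry of the column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - s) / a[r][r]
    return x

def linpack_like(n, seed=0):
    """Time one solve and return (GFLOPS, max error against the known solution)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # chosen so the solution is (nearly) all ones
    start = time.perf_counter()
    x = solve(a, b)
    elapsed = time.perf_counter() - start
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    return flops / elapsed / 1e9, max(abs(v - 1.0) for v in x)

gflops, err = linpack_like(100)
print(f"{gflops:.4f} GFLOPS, max error {err:.2e}")
```

Since the work grows with n³, doubling the matrix size means roughly eight times the arithmetic, which is worth keeping in mind when comparing runs at different sizes.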

Submit to HWBOT

First, press the <Submit to HWBOT> button. A screenshot of the entire screen and an encrypted XML data file will be created. Attention! CPU-Z must be running!
Second, follow the link provided in the dialog and submit your data file to HWBOT.

HWBOT

http://hwbot.org/benchmark/multi_core_linpack_ultimate/

Supported operating systems

Microsoft Windows XP / Server 2003
Microsoft Windows Vista / 7
Microsoft Windows 8 / Server 2012

Website

http://www.pcgamingxtreme.ro/multi-core-linpack-ultimate/

Download Link

http://www.pcgamingxtreme.ro/forum/download/file.php?id=690

ovidiutabla · May 16, 2013

UI Update

Download link

http://www.pcgamingxtreme.ro/forum/download/file.php?id=690

Arctucas · May 16, 2013

Feänor · May 16, 2013

Poor little g540...

TRWOV · May 19, 2013

cheesy999 · May 19, 2013

Feanor said:
Poor little g540...

I suspect this benchmark might like Intel processors a little bit more than AMD. Unless I'm reading it wrong.

agent00skid · May 19, 2013

Seems to run fine on my AMD processor.

TRWOV · May 19, 2013

cheesy999 said:
I suspect this benchmark might like Intel processors a little bit more than AMD. Unless I'm reading it wrong.

http://img.techpowerup.org/130519/benchmark.png

It does give inconsistent results, I'll give you that. agent00skid's A6-3500 APU gets better times than your unlocked X4, and it's a triple-core. I thought it might be related to instruction sets, but the Phenom II and Llano support the same instructions.

Maybe memory bandwidth plays a role too?

edit: Maybe your X4 is throttling? Watch the CPU-Z readout while the benchmark is running.

BTW OP, can we have a logo? Seeing the dull standard EXE icon on the desktop isn't cool.

agent00skid · May 19, 2013

My N830 at 1.5 GHz in my laptop took twice as long. So on my end, it seems to scale appropriately.

cheesy999 · May 20, 2013

TRWOV said:
Maybe memory bandwidth plays a role too?

BTW OP, can we have a logo? Seeing the dull standard EXE icon on the desktop isn't cool.

I'm on single channel, we should explore this.

