
Multi Core PI @ LINPACK

Joined
Feb 21, 2008
Messages
40 (0.01/day)
I developed a multithreaded CPU benchmark that calculates decimals of PI using the Bailey–Borwein–Plouffe (BBP) formula. The benchmark uses a multithreaded algorithm written in C++ and provides excellent parallelism. Multi Core PI is written in Visual C++ using MFC and the Win32 API.
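For readers curious how a BBP-based benchmark can spread work across threads, here is a minimal C++ sketch. It is not Multi Core PI's source. It assumes hexadecimal output (the BBP formula extracts base-16 digits of pi directly), double-precision sums, and a simple interleaved split of digit positions across std::thread workers. The names pow16_mod, bbp_series, pi_hex_digit and pi_hex_digits are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// 16^exp mod m by binary exponentiation with 64-bit integers
// (exact while m stays below 2^32, far beyond the positions used here).
std::uint64_t pow16_mod(std::uint64_t exp, std::uint64_t m) {
    if (m == 1) return 0;
    std::uint64_t result = 1, base = 16 % m;
    while (exp > 0) {
        if (exp & 1) result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

// Fractional part of the sum over k >= 0 of 16^(d-k) / (8k + j).
double bbp_series(std::uint64_t d, std::uint64_t j) {
    double s = 0.0;
    // Terms with k <= d: modular exponentiation keeps only the fractional part.
    for (std::uint64_t k = 0; k <= d; ++k) {
        const std::uint64_t m = 8 * k + j;
        s += static_cast<double>(pow16_mod(d - k, m)) / static_cast<double>(m);
        s -= std::floor(s);
    }
    // Tail with k > d: each term is roughly 16 times smaller than the last.
    for (std::uint64_t k = d + 1;; ++k) {
        const double term = std::pow(16.0, static_cast<double>(d) - static_cast<double>(k)) /
                            static_cast<double>(8 * k + j);
        if (term < 1e-17) break;
        s += term;
    }
    return s - std::floor(s);
}

// Hex digit of pi at offset d after the point (d = 0 is the first fractional digit).
char pi_hex_digit(std::uint64_t d) {
    double x = 4.0 * bbp_series(d, 1) - 2.0 * bbp_series(d, 4) - bbp_series(d, 5) - bbp_series(d, 6);
    x -= std::floor(x);                  // bring the combination into [0, 1)
    int idx = static_cast<int>(16.0 * x);
    if (idx > 15) idx = 15;              // guard against x rounding up to 1.0
    return "0123456789ABCDEF"[idx];
}

// Computes hex digits [0, count) of pi. Positions are interleaved across
// threads so each gets a similar mix of cheap (early) and costly (late) digits.
std::string pi_hex_digits(std::uint64_t count, unsigned threads) {
    if (threads == 0) threads = 1;
    std::string out(count, '0');
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&out, count, threads, t] {
            for (std::uint64_t d = t; d < count; d += threads)
                out[d] = pi_hex_digit(d);  // each thread writes distinct characters
        });
    }
    for (auto& th : pool) th.join();
    return out;
}
```

Calling pi_hex_digits(16, 4) should return 243F6A8885A308D3, the first 16 hex digits of pi after the point. Because every position is computed independently, the work splits cleanly across cores, which is what makes BBP attractive for a multithreaded benchmark.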

How it works

A slider lets you set the number of PI decimals, from 10,000 to 100,000. The default is 80,000. Just hit the Run benchmark button to start benching your CPU.

Submit to HWBOT

First, press the Take Screenshot button. A screenshot and an XML datafile will be created. Attention: CPU-Z must be running!
Second, follow the link provided in the dialog and submit your datafile to HWBOT.

Supported operating systems

Microsoft Windows XP / Server 2003
Microsoft Windows Vista / 7
Microsoft Windows 8 / Server 2012

Download link

http://www.pcgamingxtreme.ro/
 

HammerON

The Watchful Moderator
Staff member
Joined
Mar 2, 2009
Messages
8,397 (1.53/day)
Location
Up North
System Name Threadripper
Processor 3960X
Motherboard ASUS ROG Strix TRX40-XE
Cooling XSPC Raystorm Neo (sTR4) Water Block
Memory G. Skill Trident Z Neo 64 GB 3600
Video Card(s) PNY RTX 4090
Storage Samsung 960 Pro 512 GB + WD Black SN850 1TB
Display(s) Dell 32" Curved Gaming Monitor (S3220DGF)
Case Corsair 5000D Airflow
Audio Device(s) On-board
Power Supply EVGA SuperNOVA 1000 G5
Mouse Roccat Kone Pure
Keyboard Corsair K70
Software Win 10 Pro
Benchmark Scores Always changing~
My results:


100% 12 thread utilization:)
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,147 (2.97/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
100% 12 thread utilization

:wtf: I've explained this already multiple times and people seem too ignorant to listen and you're the last person I should need to explain this to.

Disable hyper-threading and run it again, please. :)
 

HammerON



Wow - that was amazing:(
My time was increased by almost 100%.... What else would I expect when disabling HT???
 

Aquinus

http://img.techpowerup.org/130210/Capture113.jpg

Wow - that was amazing:(
My time was increased by almost 100%.... What else would I expect when disabling HT???

Well that confuses me even more. I disable HT on mine and my score goes from 18.5 to 19. :|

HT should never result in a 100% improvement. Both threads share one core's execution resources, so it can't scale like that. Disabling it should be more like a 15-30% drop in performance on average.
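The percentages in this exchange are easy to mix up, so here is the arithmetic as a small C++ sketch, assuming the scores are run times in seconds (lower is better). A run that takes twice as long without HT is a 100% time increase, which means HT delivered a 2.0x throughput speedup. Both function names are hypothetical.

```cpp
// Percent change in run time when HT is disabled (positive = slower without HT).
double pct_time_increase(double secs_with_ht, double secs_without_ht) {
    return (secs_without_ht - secs_with_ht) / secs_with_ht * 100.0;
}

// Throughput gain attributable to HT as a ratio (1.0 = no gain, 2.0 = doubled).
double ht_speedup(double secs_with_ht, double secs_without_ht) {
    return secs_without_ht / secs_with_ht;
}
```

For example, 18.5 s with HT versus 19 s without works out to about a 2.7% time increase, or a speedup of roughly 1.03x from HT.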

Edit: I was wrong, that was Multi Core PRIME, not MC PI. They look exactly the same apart from the formula, so I didn't notice it right off the bat, and my skepticism from PRIME worked its way over here. Either way, I disabled HT and now it runs about 60% slower. That's a bit more normal. I'm less skeptical about this benchmark and more about the PRIME one (unless you're storing the output in a float or a double rather than a fixed-point number, in which case the computer is chugging for nothing). Floating-point numbers are not exact, and the further out you go in decimals, the more precision you lose.
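To illustrate the precision point, here is a small C++ sketch showing how many decimals of pi survive in an IEEE 754 double. It only demonstrates the data-type limit and says nothing about how either benchmark actually stores its digits; the helper name pi_as_double is hypothetical.

```cpp
#include <cstdio>
#include <limits>
#include <string>

// Decimal digits a double is guaranteed to round-trip (15 for IEEE 754 binary64).
constexpr int kDoubleDigits = std::numeric_limits<double>::digits10;

// Formats the double nearest to pi with `decimals` places after the point.
// Digits past roughly the 15th decimal come from binary rounding, not from pi.
std::string pi_as_double(int decimals) {
    const double pi = 3.14159265358979323846264338327950288;  // rounded to 53 bits
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*f", decimals, pi);
    return buf;
}
```

pi_as_double(20) starts with 3.141592653589793 and then drifts away from the true 3.14159265358979323846, so a single double can't hold tens of thousands of decimals; producing that many digits takes integer or multi-precision arithmetic.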

4c w/ HT:
[Image: p8t.jpg]


4c w/o HT:
[Image: p4t.jpg]


Once again, is the output being verified? Can you do multiple runs per benchmark to make sure every run's results are consistent? And once again, I would like the output so I can verify the benchmark's results and put my skepticism at ease. As it stands, something is happening on my rig and I don't know what it is or whether it is right.
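One way the requested check could look, as a minimal C++ sketch: compare a run's decimal output against known digits of pi, and confirm that repeated runs produce identical strings. It assumes the benchmark could write its digits out as plain text, which is what is being asked for here. The reference constant holds only the first 50 well-known decimals, so checking all 80,000 would need a longer trusted reference file. The names kPiDecimals50, matching_prefix and runs_consistent are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// First 50 decimals of pi after "3." (well-known reference value).
const std::string kPiDecimals50 = "14159265358979323846264338327950288419716939937510";

// Number of leading digits in `output` that agree with `reference`.
std::size_t matching_prefix(const std::string& output, const std::string& reference) {
    std::size_t n = 0;
    while (n < output.size() && n < reference.size() && output[n] == reference[n]) ++n;
    return n;
}

// True if every run produced byte-identical output.
bool runs_consistent(const std::vector<std::string>& runs) {
    for (const auto& r : runs)
        if (r != runs.front()) return false;
    return true;
}
```

A run passes if matching_prefix covers the whole reference and runs_consistent holds across several runs at the same setting.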
 
Joined
Sep 2, 2011
Messages
1,019 (0.22/day)
Location
Porto
System Name No name / Purple Haze
Processor Phenom II 1100T @ 3.8Ghz / Pentium 4 3.4 EE Gallatin @ 3.825Ghz
Motherboard MSI 970 Gaming/ Abit IC7-MAX3
Cooling CM Hyper 212X / Scythe Andy Samurai Master (CPU) - Modded Ati Silencer 5 rev. 2 (GPU)
Memory 8GB GEIL GB38GB2133C10ADC + 8GB G.Skill F3-14900CL9-4GBXL / 2x1GB Crucial Ballistix Tracer PC4000
Video Card(s) Asus R9 Fury X Strix (4096 SP's/1050 Mhz)/ PowerColor X850XT PE @ (600/1230) AGP + (HD3850 AGP)
Storage Samsung 250 GB / WD Caviar 160GB
Display(s) Benq XL2411T
Audio Device(s) motherboard / Creative Sound Blaster X-Fi XtremeGamer Fatal1ty Pro + Front panel
Power Supply Tagan BZ 900W / Corsair HX620w
Mouse Zowie AM
Keyboard Qpad MK-50
Software Windows 7 Pro 64Bit / Windows XP
Benchmark Scores 64CU Fury: http://www.3dmark.com/fs/11269229 / X850XT PE http://www.3dmark.com/3dm05/5532432
My Phenom II x6 is slow :wtf:
 

[Attachment: pHiI.png]
Joined
Jun 17, 2007
Messages
7,335 (1.20/day)
Location
C:\Program Files (x86)\Aphexdreamer\
System Name Unknown
Processor AMD Bulldozer FX8320 @ 4.4Ghz
Motherboard Asus Crosshair V
Cooling XSPC Raystorm 750 EX240 for CPU
Memory 8 GB CORSAIR Vengeance Red DDR3 RAM 1922mhz (10-11-9-27)
Video Card(s) XFX R9 290
Storage Samsung SSD 254GB and Western Digital Caviar Black 1TB 64MB Cache SATA 6.0Gb/s
Display(s) AOC 23" @ 1920x1080 + Asus 27" 1440p
Case HAF X
Audio Device(s) X Fi Titanium 5.1 Surround Sound
Power Supply 750 Watt PP&C Silencer Black
Software Windows 8.1 Pro 64-bit

Aquinus


It's because the FPU is getting used for this benchmark. Keep in mind that each module only has one FPU, so without FMA3 optimizations you're only going to see 3 cores' worth of performance out of it. However, if this used fixed point instead of floating point, it could use the integer cores, which are faster in general and perform significantly better on AMD's newer processors. Fixed point also offers a higher level of precision; floating point is inaccurate because of how it converts decimals to and from base-2 numbers.
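The base-2 conversion point can be shown in a few lines of C++. This is a minimal sketch, not anything taken from the benchmark: it uses a hypothetical Fixed9 type (an int64_t holding the value times 10^9, added with plain integer instructions) to show that decimal fractions stay exact in fixed point, while 0.1 + 0.2 does not equal 0.3 in binary double precision. Fixed point is only exact down to its chosen scale, so programs that need thousands of digits typically scale the same idea up to multi-precision integers.

```cpp
#include <cstdint>

// Fixed-point decimal with 9 fractional digits: value = raw / 10^9.
// Addition is ordinary integer addition, so it runs on the integer units.
struct Fixed9 {
    std::int64_t raw;
    static constexpr std::int64_t kScale = 1'000'000'000;
    static Fixed9 from_parts(std::int64_t whole, std::int64_t frac9) {
        return {whole * kScale + frac9};
    }
    Fixed9 operator+(Fixed9 o) const { return {raw + o.raw}; }
    bool operator==(Fixed9 o) const { return raw == o.raw; }
};

// 0.1, 0.2 and 0.3 have no exact base-2 representation, so this is false.
bool double_sum_exact() { return 0.1 + 0.2 == 0.3; }

// The same sum in fixed point is exact.
bool fixed_sum_exact() {
    const Fixed9 a = Fixed9::from_parts(0, 100'000'000);  // 0.1
    const Fixed9 b = Fixed9::from_parts(0, 200'000'000);  // 0.2
    const Fixed9 c = Fixed9::from_parts(0, 300'000'000);  // 0.3
    return a + b == c;
}
```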
 
Joined
Jun 17, 2007
Messages
7,335 (1.20/day)
Location
C:\Program Files (x86)\Aphexdreamer\
System Name Unknown
Processor AMD Bulldozer FX8320 @ 4.4Ghz
Motherboard Asus Crosshair V
Cooling XSPC Raystorm 750 EX240 for CPU
Memory 8 GB CORSAIR Vengeance Red DDR3 RAM 1922mhz (10-11-9-27)
Video Card(s) XFX R9 290
Storage Samsung SSD 254GB and Western Digital Caviar Black 1TB 64MB Cache SATA 6.0Gb/s
Display(s) AOC 23" @ 1920x1080 + Asus 27" 1440p
Case HAF X
Audio Device(s) X Fi Titanium 5.1 Surround Sound
Power Supply 750 Watt PP&C Silencer Black
Software Windows 8.1 Pro 64-bit
It's because the FPU is getting used for this benchmark. Keep in mind that each module only has one FPU, so without FMA3 optimizations you're only going to see 3 cores' worth of performance out of it. However, if this used fixed point instead of floating point, it could use the integer cores, which are faster in general and perform significantly better on AMD's newer processors. Fixed point also offers a higher level of precision; floating point is inaccurate because of how it converts decimals to and from base-2 numbers.

Which is why I asked him whether he could make a more FX-optimized benchmark, but he said it already is FX-optimized since it was coded on an FX processor. http://www.techpowerup.com/forums/showpost.php?p=2842045&postcount=68
 
Joined
Sep 2, 2011
Messages
1,019 (0.22/day)
Location
Porto
System Name No name / Purple Haze
Processor Phenom II 1100T @ 3.8Ghz / Pentium 4 3.4 EE Gallatin @ 3.825Ghz
Motherboard MSI 970 Gaming/ Abit IC7-MAX3
Cooling CM Hyper 212X / Scythe Andy Samurai Master (CPU) - Modded Ati Silencer 5 rev. 2 (GPU)
Memory 8GB GEIL GB38GB2133C10ADC + 8GB G.Skill F3-14900CL9-4GBXL / 2x1GB Crucial Ballistix Tracer PC4000
Video Card(s) Asus R9 Fury X Strix (4096 SP's/1050 Mhz)/ PowerColor X850XT PE @ (600/1230) AGP + (HD3850 AGP)
Storage Samsung 250 GB / WD Caviar 160GB
Display(s) Benq XL2411T
Audio Device(s) motherboard / Creative Sound Blaster X-Fi XtremeGamer Fatal1ty Pro + Front panel
Power Supply Tagan BZ 900W / Corsair HX620w
Mouse Zowie AM
Keyboard Qpad MK-50
Software Windows 7 Pro 64Bit / Windows XP
Benchmark Scores 64CU Fury: http://www.3dmark.com/fs/11269229 / X850XT PE http://www.3dmark.com/3dm05/5532432

Bo$$

Lab Extraordinaire
Joined
May 7, 2009
Messages
5,656 (1.04/day)
Location
London, UK
System Name Desktop | Server
Processor Intel i7 2700k @ 4.6GHZ | AMD 5350 @ 2500MHZ
Motherboard Asus P7Z77-V Pro | Asus AM1I-A
Cooling Corsair H60v2 | Stock Air
Memory Crucial Ballistix 2x8GB CL8 1600MHZ | Corsair Vengence 2x4GB CL9 1600MHZ
Video Card(s) EVGA GTX 1060 6GB | PNY GTX 750Ti
Storage Samsung 840 EVO 250GB + 4TB WD Red | 2x Seagate Barracuda 2TB
Display(s) Samsung S27D390H + Asus VE276Q | Headless
Case Fractal Design R5 | CM Elite 110
Audio Device(s) Asus Xonar D1 w/Otone Stilo 5.1 and Creative Fatal1ty headset
Power Supply EVGA Supernova 850 G2| Corsair CX430M
Mouse Razer Imperator 2012
Keyboard Corsair K90
Software Windows 7 SP1 X64 | Ubuntu 16.04LTS


Maybe looks a little low here
 
Joined
Mar 18, 2008
Messages
5,395 (0.92/day)
Location
Australia
System Name Night Rider | Mini LAN PC | Workhorse
Processor AMD R7 5800X3D | Ryzen 1600X | i7 970
Motherboard MSi AM4 Pro Carbon | GA- | Gigabyte EX58-UD5
Cooling Noctua U9S Twin Fan| Stock Cooler, Copper Core)| Big shairkan B
Memory 2x8GB DDR4 G.Skill Ripjaws 3600MHz| 2x8GB Corsair 3000 | 6x2GB DDR3 1300 Corsair
Video Card(s) MSI AMD 6750XT | 6500XT | MSI RX 580 8GB
Storage 1TB WD Black NVME / 250GB SSD /2TB WD Black | 500GB SSD WD, 2x1TB, 1x750 | WD 500 SSD/Seagate 320
Display(s) LG 27" 1440P| Samsung 20" S20C300L/DELL 15" | 22" DELL/19"DELL
Case LIAN LI PC-18 | Mini ATX Case (custom) | Atrix C4 9001
Audio Device(s) Onboard | Onbaord | Onboard
Power Supply Silverstone 850 | Silverstone Mini 450W | Corsair CX-750
Mouse Coolermaster Pro | Rapoo V900 | Gigabyte 6850X
Keyboard MAX Keyboard Nighthawk X8 | Creative Fatal1ty eluminx | Some POS Logitech
Software Windows 10 Pro 64 | Windows 10 Pro 64 | Windows 7 Pro 64/Windows 10 Home
:rolleyes:
 

[Attachment: Multi core Pi.jpg]
Joined
Jul 14, 2006
Messages
2,405 (0.37/day)
Location
People's Republic of America
System Name It's just a computer
Processor i9-9900K Direct Die
Motherboard eVGA Z390 Dark
Cooling Dual D5T Vario, XSPC BayRes, Nemesis GTR560, NF-A14-iPPC3000PWM, NF-A14-iPPC2000, HK IV Pro Nickel
Memory G.Skill F4-4500C19D-16GTZKKE or G.Skill F4-3600C16D-16GTZ or G.Skill F4-4000C19D-32GTZSW
Video Card(s) eVGA RTX2080 FTW3 Ultra
Storage Samsung 960 EVO M.2
Display(s) LG 32GK650F
Case Thermaltake Xaser VI
Audio Device(s) Auzentech X-Meridian 7.1 2G/Z-5500
Power Supply Seasonic Prime PX-1300
Mouse Logitech
Keyboard Logitech
Software Win7 Ultimate x64 SP1
Joined
Apr 4, 2008
Messages
4,686 (0.80/day)
System Name Obelisc
Processor i7 3770k @ 4.8 GHz
Motherboard Asus P8Z77-V
Cooling H110
Memory 16GB(4x4) @ 2400 MHz 9-11-11-31
Video Card(s) GTX 780 Ti
Storage 850 EVO 1TB, 2x 5TB Toshiba
Case T81
Audio Device(s) X-Fi Titanium HD
Power Supply EVGA 850 T2 80+ TITANIUM
Software Win10 64bit
Joined
Aug 30, 2006
Messages
7,192 (1.12/day)
System Name ICE-QUAD // ICE-CRUNCH
Processor Q6600 // 2x Xeon 5472
Memory 2GB DDR // 8GB FB-DIMM
Video Card(s) HD3850-AGP // FireGL 3400
Display(s) 2 x Samsung 204Ts = 3200x1200
Audio Device(s) Audigy 2
Software Windows Server 2003 R2 as a Workstation now migrated to W10 with regrets.
Great x86 kernel 5.x compatible!

I think a REALLY USEFUL statistic would be the time / cores / GHz so that we can see the "efficiency" of the FP core!
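The suggestion can be made concrete with a small C++ helper. As literally written, time / cores / GHz would shrink as cores are added even though more cores already shorten the time, so this sketch assumes the more usual normalization of multiplying instead: seconds * cores * GHz, a "core-GHz-seconds" figure where lower means each core gets more done per clock on this workload. It also assumes physical cores and the sustained all-core clock, since HT and turbo make both inputs fuzzy. The function name is hypothetical.

```cpp
// Normalizes a run time by core count and clock speed (core-GHz-seconds).
// Lower is better: less total core-clock budget spent on the same workload.
double core_ghz_seconds(double seconds, int physical_cores, double ghz) {
    return seconds * physical_cores * ghz;
}
```

For example, a hypothetical 10-second run on 4 cores at 3.5 GHz scores 140 core-GHz-seconds.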

 

cadaveca

My name is Dave
Joined
Apr 10, 2006
Messages
17,232 (2.63/day)
It's because the FPU is getting used for this benchmark. Keep in mind that each module only has one FPU, so without FMA3 optimizations you're only going to see 3 cores' worth of performance out of it. However, if this used fixed point instead of floating point, it could use the integer cores, which are faster in general and perform significantly better on AMD's newer processors. Fixed point also offers a higher level of precision; floating point is inaccurate because of how it converts decimals to and from base-2 numbers.

PD emulates x87 entirely, hence the slowdown, IMHO. FPU doesn't matter when you aren't capable of running the instruction in the first place.
 
Joined
Feb 21, 2008
Messages
40 (0.01/day)
I removed the slider.

The default setting for the benchmark is 80,000 decimals. The goal is to submit to HWBOT, so we have to make sure all users are benching at the same setting (80k decimals).

Download Link:

www.pcgamingxtreme.ro

 

Aquinus

PD emulates x87 entirely, hence the slowdown, IMHO.
Pardon me, I know what x87 is but I don't know what you mean when you say "PD", could you clarify?
FPU doesn't matter when you aren't capable of running the instruction in the first place.
I agree but do we know that the benchmark isn't executing x87 instructions in the first place?

Also floating point emulation is worse than just using floating point numbers to begin with. You really need the exact value if you want your result of pi to be at all accurate. As that decimal place goes further out you're going to start losing precision.
 
Joined
Feb 21, 2008
Messages
40 (0.01/day)
Pardon me, I know what x87 is but I don't know what you mean when you say "PD", could you clarify?

I agree but do we know that the benchmark isn't executing x87 instructions in the first place?

Also floating point emulation is worse than just using floating point numbers to begin with. You really need the exact value if you want your result of pi to be at all accurate. As that decimal place goes further out you're going to start losing precision.

The application is compiled with the Streaming SIMD Extensions 2 (/arch:SSE2) setting in order to replace x87 FPU instructions with SSE code.
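A quick way to confirm which floating-point code path a build actually uses is to check the compiler's predefined macros. This is a minimal C++ sketch, assuming MSVC's documented _M_IX86_FP and _M_X64 macros and the __SSE2__ macro of GCC/Clang; the function name fp_codegen is hypothetical. On 32-bit x86 GCC, __SSE2__ only says SSE2 is available, and -mfpmath=sse is also needed before scalar math leaves x87.

```cpp
#include <string>

// Reports which floating-point code path this translation unit was built for.
std::string fp_codegen() {
#if defined(_M_X64) || defined(_M_AMD64)
    return "SSE2 (MSVC x64 always uses SSE2 for scalar floating point)";
#elif defined(_M_IX86_FP)
    // MSVC 32-bit x86 only: 0 = /arch:IA32 (x87), 1 = /arch:SSE, 2 = /arch:SSE2 or higher.
#  if _M_IX86_FP >= 2
    return "SSE2 (/arch:SSE2 or higher)";
#  elif _M_IX86_FP == 1
    return "SSE (/arch:SSE)";
#  else
    return "x87 (/arch:IA32)";
#  endif
#elif defined(__SSE2__)
    return "SSE2 available (GCC/Clang __SSE2__)";
#else
    return "unknown or non-SSE2 target";
#endif
}
```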
 

Aquinus

The application is compiled with the Streaming SIMD Extensions 2 (/arch:SSE2) setting in order to replace x87 FPU instructions with SSE code.

SSE still utilizes the FPU, but that answers part of my question. I'm still curious what Cadaveca meant by "PD" though.
 