New multithreaded CPU benchmark: "Eight queens puzzle"

Joined
Jan 29, 2012
Messages
6,402 (1.44/day)
Location
Florida
System Name natr0n-PC
Processor Ryzen 5950x/5600x
Motherboard B450 AORUS M
Cooling EK AIO - 6 fan action
Memory Patriot - Viper Steel DDR4 (B-Die)(4x8GB)
Video Card(s) EVGA 3070ti FTW
Storage Various
Display(s) PIXIO IPS 240Hz 1080P
Case Thermaltake Level 20 VT
Audio Device(s) LOXJIE D10 + Kinter Amp + 6 Bookshelf Speakers Sony+JVC+Sony
Power Supply Super Flower Leadex III ARGB 80+ Gold 650W
Software XP/7/8.1/10
Benchmark Scores http://valid.x86.fr/79kuh6
The program does not work for me. It says it's testing 12 threads, then the window closes after a few seconds. ;*(


Open a command prompt as admin, then drag and drop q.exe into it and hit Enter.
 
Joined
Aug 12, 2012
Messages
616 (0.15/day)
Location
Nebulas
System Name X99
Processor 5930K @ 4.7GHz @ 1.323v
Motherboard Rampage V Edition 10
Cooling EK
Memory Dominator Platinum 32GB
Video Card(s) 2x Gigabyte xtreme gaming 980ti
Storage Samsung 950 Pro M.2, 850 Pro & WD320
Display(s) Tempest X270OC @100Hz
Case Thermaltake Core P5
Audio Device(s) On-board
Power Supply 120-G2-1600-X1
Mouse Mamba 2012
Keyboard K70
Software Win10
Benchmark Scores http://www.3dmark.com/fs/6823139
I think I tried that. I will try again when I get back to my office.
 
Joined
Mar 23, 2016
Messages
4,839 (1.65/day)
Processor Ryzen 9 5900X
Motherboard MSI B450 Tomahawk ATX
Cooling Cooler Master Hyper 212 Black Edition
Memory VENGEANCE LPX 2 x 16GB DDR4-3600 C18 OCed 3800
Video Card(s) XFX Speedster SWFT309 AMD Radeon RX 6700 XT CORE Gaming
Storage 970 EVO NVMe M.2 500 GB, 870 QVO 1 TB
Display(s) Samsung 28” 4K monitor
Case Phantek Eclipse P400S (PH-EC416PS)
Audio Device(s) EVGA NU Audio
Power Supply EVGA 850 BQ
Mouse SteelSeries Rival 310
Keyboard Logitech G G413 Silver
Software Windows 10 Professional 64-bit v22H2
When you unzip the file, hold down the Shift key while right-clicking on the folder, and choose "Open command window here."
 

silentbogo

Moderator
Staff member
Joined
Nov 20, 2013
Messages
5,470 (1.45/day)
Location
Kyiv, Ukraine
System Name WS#1337
Processor Ryzen 7 3800X
Motherboard ASUS X570-PLUS TUF Gaming
Cooling Xigmatek Scylla 240mm AIO
Memory 4x8GB Samsung DDR4 ECC UDIMM
Video Card(s) Inno3D RTX 3070 Ti iChill
Storage ADATA Legend 2TB + ADATA SX8200 Pro 1TB
Display(s) Samsung U24E590D (4K/UHD)
Case ghetto CM Cosmos RC-1000
Audio Device(s) ALC1220
Power Supply SeaSonic SSR-550FX (80+ GOLD)
Mouse Logitech G603
Keyboard Modecom Volcano Blade (Kailh choc LP)
VR HMD Google dreamview headset(aka fancy cardboard)
Software Windows 11, Ubuntu 20.04 LTS
I forgot to take a screenshot, but the results are a bit off:
Xeon X5650 running @3.3GHz
~24 sec for 1 thread
~6.5 sec for 12 threads
and everything in between for 2, 4, 6, 8 threads

The CPU is barely breaking a sweat: 9-12% load regardless of the number of threads, and it does not even go to turbo (the workload is not intense enough).
I have not looked at the code yet, but I suspect something is holding back the multi-threaded part. I messed with OpenMP and OpenMPI (some simple image processing stuff) a few years ago and have never seen such abnormal scaling...
 
Joined
Oct 17, 2012
Messages
9,781 (2.34/day)
Location
Massachusetts
System Name Americas cure is the death of Social Justice & Political Correctness
Processor i7-11700K
Motherboard Asrock Z590 Extreme wifi 6E
Cooling Noctua NH-U12A
Memory 32GB Corsair RGB fancy boi 5000
Video Card(s) RTX 3090 Reference
Storage Samsung 970 Evo 1Tb + Samsung 970 Evo 500Gb
Display(s) Dell - 27" LED QHD G-SYNC x2
Case Fractal Design Meshify-C
Audio Device(s) on board
Power Supply Seasonic Focus+ Gold 1000 Watt
Mouse Logitech G502 spectrum
Keyboard AZIO MGK-1 RGB (Kaith Blue)
Software Win 10 Professional 64 bit
Benchmark Scores the MLGeesiest
The only way I can get it to work is to hold Shift, right-click, open a command window, then drag and drop, and it only runs a single pass at 8 threads.
I'm sure I could modify the command line at the top of the console, but I have no time right now.
Intel Xeon E3 1231V3 @ 3.4-3.8GHz
I picked up the Xeon, which was supposed to come via Newegg/FedEx but wasn't delivered, so I got it locally. Loving the threads. Sadly I'll need to give it away to its owner though :(
 
Joined
Jul 1, 2005
Messages
5,197 (0.76/day)
Location
Kansas City, KS
System Name Dell XPS 15 9560
Processor I7-7700HQ
Memory 32GB DDR4
Video Card(s) GTX 1050/1080 Ti
Storage 1TB SSD
Display(s) 2x Dell P2715Q/4k Internal
Case Razer Core
Audio Device(s) Creative E5/Objective 2 Amp/Senn HD650
Mouse Logitech Proteus Core
Keyboard Logitech G910
I forgot to take a screenshot, but the results are a bit off:
Xeon X5650 running @3.3GHz
~24 sec for 1 thread
~6.5 sec for 12 threads
and everything in between for 2, 4, 6, 8 threads

The CPU is barely breaking a sweat: 9-12% load regardless of the number of threads, and it does not even go to turbo (the workload is not intense enough).
I have not looked at the code yet, but I suspect something is holding back the multi-threaded part. I messed with OpenMP and OpenMPI (some simple image processing stuff) a few years ago and have never seen such abnormal scaling...

Yeah, for something that resolves so fast, I would expect the 12 threads to have micromanagement issues before being able to really squeak out any times below 3 s. Continuous runs seem to have a cache reliance as well.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.65/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Yeah, for something that resolves so fast, I would expect the 12 threads to have micromanagement issues before being able to really squeak out any times below 3 s. Continuous runs seem to have a cache reliance as well.
I've run it several times and it peaked at 61% CPU. It climbs up, hits a peak, then climbs down.

For comparison, I ran Chess repeatedly, and it took about 20 tries to find a run that took longer than four seconds to execute (so Task Manager can register it); the peak was 87%. Chess is async multithreaded, so as long as the stack of work is sufficiently large, it will load the CPU to 100%.

Q often doesn't even exceed 50% over 7 seconds.

Comparing the two programs in Task Manager, I think it is possible that a 12-core CPU, finishing in ~3 seconds, may never be able to reach 100% load.

First thing I'd try is making the board larger. 12 threads should take at least 5-10 seconds to finish.
 
Joined
Nov 18, 2010
Messages
7,106 (1.46/day)
Location
Rīga, Latvia
System Name HELLSTAR
Processor AMD RYZEN 9 5950X
Motherboard ASUS Strix X570-E
Cooling 2x 360 + 280 rads. 3x Gentle Typhoons, 3x Phanteks T30, 2x TT T140 . EK-Quantum Momentum Monoblock.
Memory 4x8GB G.SKILL Trident Z RGB F4-4133C19D-16GTZR 14-16-12-30-44
Video Card(s) Sapphire Pulse RX 7900XTX + under waterblock.
Storage Optane 900P[W11] + WD BLACK SN850X 4TB + 750 EVO 500GB + 1TB 980PRO[FEDORA]
Display(s) Philips PHL BDM3270 + Acer XV242Y
Case Lian Li O11 Dynamic EVO
Audio Device(s) Sound Blaster ZxR
Power Supply Fractal Design Newton R3 1000W
Mouse Razer Basilisk
Keyboard Razer BlackWidow V3 - Yellow Switch
Software FEDORA 39 / Windows 11 insider
Mine are

Using 1 thread(s).
Elapsed time (hh:mm:ss:cs): 13.69
Using 4 thread(s).
Elapsed time (hh:mm:ss:cs): 6.10
Using 8 thread(s).
Elapsed time (hh:mm:ss:cs): 4.94
Using 12 thread(s).
Elapsed time (hh:mm:ss:cs): 3.48
 
Joined
Jul 14, 2006
Messages
2,405 (0.37/day)
Location
People's Republic of America
System Name It's just a computer
Processor i9-9900K Direct Die
Motherboard eVGA Z390 Dark
Cooling Dual D5T Vario, XSPC BayRes, Nemesis GTR560, NF-A14-iPPC3000PWM, NF-A14-iPPC2000, HK IV Pro Nickel
Memory G.Skill F4-4500C19D-16GTZKKE or G.Skill F4-3600C16D-16GTZ or G.Skill F4-4000C19D-32GTZSW
Video Card(s) eVGA RTX2080 FTW3 Ultra
Storage Samsung 960 EVO M.2
Display(s) LG 32GK650F
Case Thermaltake Xaser VI
Audio Device(s) Auzentech X-Meridian 7.1 2G/Z-5500
Power Supply Seasonic Prime PX-1300
Mouse Logitech
Keyboard Logitech
Software Win7 Ultimate x64 SP1
6700K @ 4.7

8 threads = 5.04
 

BAGZZlash

RBE Author
Joined
Mar 9, 2008
Messages
587 (0.10/day)
Okay, here are a few things.

1.) I put the results we have so far into a table and sorted it by the one-thread computing times.

2.) I made a few changes to the program:
2a) If you enter no number of threads (or an invalid one), the program now iterates through all available thread settings. That is, if you have, say, four cores and just run the program, it will compute the results based on four threads, then three, then two, then one.
2b) For those of you with rather fast CPUs, I added a command line option "large". This switches n from 18 to 19. That means a lot more computation, which will take a minute even on the fastest CPUs.
2c) In either case, the program will now wait for the user to press the enter key after it's done. This will prevent the window from closing.

For the larger chessboard, I figured that the launched parallel threads may have very different computation times. With the "large" chessboard, I see clear behavior on my quad-core: first, the CPU load hits 100%. After a while, it declines to 75%, showing that one of the four threads is done while the other three are still working. After a few seconds, the load drops to 50%, then 25%, then the program is done. Can you confirm this behavior?
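The stepwise drop described above can be modeled with a few lines of Python. This is only a sketch: the function name `load_steps` and the finish times are hypothetical, chosen to mimic a quad-core where each thread finishes at a different moment. It is not taken from Q.c.

```python
def load_steps(finish_times, cores):
    """Return (start, end, load_percent) segments for threads that all start at t=0.

    finish_times: when each thread finishes (hypothetical numbers, in seconds).
    cores: number of logical cores in the machine.
    """
    segments = []
    start = 0.0
    remaining = len(finish_times)
    for end in sorted(finish_times):
        if end > start:
            # Until the next thread finishes, 'remaining' threads keep cores busy.
            segments.append((start, end, 100.0 * remaining / cores))
            start = end
        remaining -= 1
    return segments


if __name__ == "__main__":
    # Hypothetical finish times for four threads on a quad-core.
    for start, end, load in load_steps([10.0, 14.0, 17.0, 20.0], cores=4):
        print(f"{start:5.1f}s to {end:5.1f}s: {load:.1f}% load")
```

With these made-up finish times the load goes 100%, 75%, 50%, 25%, which is the pattern a Task Manager graph would show if each thread simply stops when its own share of the work is done.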
 

Attachments

  • Results.png (18.7 KB)
  • Q.zip (64.7 KB)
Joined
Nov 18, 2010
Messages
7,106 (1.46/day)
Location
Rīga, Latvia
System Name HELLSTAR
Processor AMD RYZEN 9 5950X
Motherboard ASUS Strix X570-E
Cooling 2x 360 + 280 rads. 3x Gentle Typhoons, 3x Phanteks T30, 2x TT T140 . EK-Quantum Momentum Monoblock.
Memory 4x8GB G.SKILL Trident Z RGB F4-4133C19D-16GTZR 14-16-12-30-44
Video Card(s) Sapphire Pulse RX 7900XTX + under waterblock.
Storage Optane 900P[W11] + WD BLACK SN850X 4TB + 750 EVO 500GB + 1TB 980PRO[FEDORA]
Display(s) Philips PHL BDM3270 + Acer XV242Y
Case Lian Li O11 Dynamic EVO
Audio Device(s) Sound Blaster ZxR
Power Supply Fractal Design Newton R3 1000W
Mouse Razer Basilisk
Keyboard Razer BlackWidow V3 - Yellow Switch
Software FEDORA 39 / Windows 11 insider
It works funny with many cores indeed.

Using 12 thread(s).
Elapsed time (hh:mm:ss:cs): 3.40
Using 11 thread(s).
Elapsed time (hh:mm:ss:cs): 3.35
Using 10 thread(s).
Elapsed time (hh:mm:ss:cs): 3.52
Using 9 thread(s).
Elapsed time (hh:mm:ss:cs): 3.54
Using 8 thread(s).
Elapsed time (hh:mm:ss:cs): 4.95
Using 7 thread(s).
Elapsed time (hh:mm:ss:cs): 4.91
Using 6 thread(s).
Elapsed time (hh:mm:ss:cs): 4.86
Using 5 thread(s).
Elapsed time (hh:mm:ss:cs): 5.72
Using 4 thread(s).
Elapsed time (hh:mm:ss:cs): 6.01
Using 3 thread(s).
Elapsed time (hh:mm:ss:cs): 6.39
Using 2 thread(s).
Elapsed time (hh:mm:ss:cs): 10.00
Using 1 thread(s).

And with large

Using 12 thread(s).
Elapsed time (hh:mm:ss:cs): 22.61
Using 11 thread(s).
Elapsed time (hh:mm:ss:cs): 23.65
Using 10 thread(s).
Elapsed time (hh:mm:ss:cs): 23.36
Using 9 thread(s).
Elapsed time (hh:mm:ss:cs): 22.07
Using 8 thread(s).
Elapsed time (hh:mm:ss:cs): 32.80
Using 7 thread(s).
Elapsed time (hh:mm:ss:cs): 32.79
Using 6 thread(s).
Elapsed time (hh:mm:ss:cs): 31.13
Using 5 thread(s).
Elapsed time (hh:mm:ss:cs): 40.04
Using 4 thread(s).
Elapsed time (hh:mm:ss:cs): 43.58
Using 3 thread(s).
Elapsed time (hh:mm:ss:cs): 52.90
Using 2 thread(s).
Elapsed time (hh:mm:ss:cs): 1:16.15
Using 1 thread(s).
Elapsed time (hh:mm:ss:cs): 1:39.01
 
Joined
May 1, 2008
Messages
1,039 (0.18/day)
Location
Frankfurt/Main - Germany
System Name Shaman of Sexy
Processor AMD Phenom II X4 955 BE@4Ghz EK Supreme Block
Motherboard M3A79-T Deluxe Anfi-Tech Waterblocks
Cooling Magicool 360 + 120 + 120 Slim scythe slipped Laing DDC-1/T
Memory 4GB Corsair Dominator CM2X2048-8500C5D
Video Card(s) Sapphire ATI Radeon HD 4870 X2 EK 4870 X2 Block
Storage RAID 0 Seagate
Display(s) Samsung 226BW 22"
Case CoolerMaster Cosmos RC-1000 in mod progress
Audio Device(s) onboard
Power Supply Coba Nitrox 750W
Software Windows 7 Ultimate
Benchmark Scores http://service.futuremark.com/compare?3dmv=1056967
C:\Users\n0tiert\Downloads\Q>q 8
Using 8 thread(s).
Elapsed time (hh:mm:ss:cs): 8.33

FX-8150@4GHZ

If I run "q.exe N", it only runs once. Is that correct?
 

FordGT90Concept

"I go fast!1!11!1!"
For the larger chessboard, I figured that the launched parallel threads may have very different computation times. With the "large" chessboard, I see clear behavior on my quad-core: first, the CPU load hits 100%. After a while, it declines to 75%, showing that one of the four threads is done while the other three are still working. After a few seconds, the load drops to 50%, then 25%, then the program is done. Can you confirm this behavior?
It went to 100% for 2-3 seconds, then plummeted to 25% within a second or two and stayed there for a while. It presumably then went down to 12.5% and finished. It does not fully utilize the CPU for long.

In the picture, where you see it shoot up from <50% back up to 100%, that's WCG taking back the idle clocks so disregard that...

I believe the steps are:
8 cores -> 6 cores -> 5 cores -> 4 cores -> 3 cores -> 2 cores/BOINC 8 cores
100% -> 75% -> 62.5% -> 50% -> 37.5% -> 25% (doesn't reach it before BOINC takes over)

Over half of the time the program runs, it's using 25% or less of the resources available to it.

Might I suggest using a Queue on the main thread, with each core pulling a job off it? That's how I usually do it.


Also, why run on fewer than the maximum number of cores by default?
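As a rough illustration of the queue idea, here is a minimal Python sketch. Everything in it is an assumption for demonstration purposes: the names `count_from` and `solve_with_queue` are made up, one job is "the first queen sits in column c", and the board is kept small so it finishes quickly. It does not reflect how Q.c is written. Also note that CPython threads share one interpreter lock, so this shows the pattern only and will not speed up the search.

```python
import queue
import threading


def count_from(n, row, cols, diag1, diag2):
    """Count completions of a partial placement with bitmask backtracking."""
    if row == n:
        return 1
    total = 0
    free = ~(cols | diag1 | diag2) & ((1 << n) - 1)
    while free:
        bit = free & -free          # lowest free column in this row
        free -= bit
        total += count_from(n, row + 1, cols | bit,
                            (diag1 | bit) << 1, (diag2 | bit) >> 1)
    return total


def solve_with_queue(n, workers):
    """Count n-queens solutions; each job fixes the first queen's column."""
    jobs = queue.Queue()
    for col in range(n):
        jobs.put(col)               # all jobs are queued before workers start
    totals = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                col = jobs.get_nowait()
            except queue.Empty:
                return              # queue drained, this worker exits
            bit = 1 << col
            found = count_from(n, 1, bit, bit << 1, bit >> 1)
            with lock:
                totals.append(found)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals)


if __name__ == "__main__":
    print(solve_with_queue(8, workers=4))
```

The point of the pattern is that a worker that finishes a cheap job immediately pulls the next one, instead of idling while others work through a fixed share.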
 
Joined
Mar 23, 2016
Messages
4,839 (1.65/day)
Processor Ryzen 9 5900X
Motherboard MSI B450 Tomahawk ATX
Cooling Cooler Master Hyper 212 Black Edition
Memory VENGEANCE LPX 2 x 16GB DDR4-3600 C18 OCed 3800
Video Card(s) XFX Speedster SWFT309 AMD Radeon RX 6700 XT CORE Gaming
Storage 970 EVO NVMe M.2 500 GB, 870 QVO 1 TB
Display(s) Samsung 28” 4K monitor
Case Phantek Eclipse P400S (PH-EC416PS)
Audio Device(s) EVGA NU Audio
Power Supply EVGA 850 BQ
Mouse SteelSeries Rival 310
Keyboard Logitech G G413 Silver
Software Windows 10 Professional 64-bit v22H2
The screenshot of Task Manager was with the command line option "large." It's a capture of the 8 thread then 7 thread.


 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,147 (2.96/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
@FordGT90Concept and @biffzinker, depending on how the OP designed the benchmark, you may not see 100% usage even though up to 8 threads are being used.

As Ford said:
Might I suggest using a Queue on the main thread, with each core pulling a job off it? That's how I usually do it.
This is useful if and only if the algorithm is brute-forcing the solutions. Using a queue to divvy out tasks is the most basic way to accelerate purely parallel workloads (such as dispatching at the job level), but it's not the most efficient way to solve the problem. A divide-and-conquer algorithm will eventually exhibit the behavior that you two are describing. That is, as the task is broken apart, it can utilize more cores, and a lot of applications benefit from this up to a point. However, it requires threads to join up with each other when their "slice" of the calculation is complete, and it doesn't allow the computer to put a finished thread to work elsewhere, since there is still a significant amount of serial work that needs to be completed.

Depending on how @BAGZZlash implemented it, a little bit of heuristics, queueing, or a deeper level of concurrency beyond the brute-force approach might improve performance significantly. I did notice that the OP used OpenMP, which means the application is probably written in C or C++, which in turn means that a big hurdle is actually creating the multi-threaded part. I would argue a more dynamic language with richer data structures would probably help at the expense of some computational power.

I did take a peek at the link the OP provided, and it appears that the algorithm is most definitely a brute-force method with minimal heuristics. I'm tempted to create my own version, but it probably won't be in C/C++; rather, a language like Clojure.

Would anyone be interested? If so, it might be an incentive for me to do it.
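One well-known example of "a little bit of heuristics" for this puzzle is mirror symmetry: every solution with the first queen in the left half of the board has a mirrored twin in the right half, so only half of the first-row columns need to be searched. The sketch below is an illustration under that assumption, with hypothetical function names, and says nothing about what the OP's program or a Clojure version would actually do.

```python
def count_from(n, row, cols, diag1, diag2):
    """Count completions of a partial placement with bitmask backtracking."""
    if row == n:
        return 1
    total = 0
    free = ~(cols | diag1 | diag2) & ((1 << n) - 1)
    while free:
        bit = free & -free          # lowest free column in this row
        free -= bit
        total += count_from(n, row + 1, cols | bit,
                            (diag1 | bit) << 1, (diag2 | bit) >> 1)
    return total


def count_all(n):
    """Plain brute-force count over every first-row column."""
    return count_from(n, 0, 0, 0, 0)


def count_with_mirror(n):
    """Search only the left half of the first row and double the result."""
    total = 0
    for col in range(n // 2):
        bit = 1 << col
        total += 2 * count_from(n, 1, bit, bit << 1, bit >> 1)
    if n % 2:
        # The middle column mirrors onto itself, so it is counted once.
        bit = 1 << (n // 2)
        total += count_from(n, 1, bit, bit << 1, bit >> 1)
    return total


if __name__ == "__main__":
    print(count_all(8), count_with_mirror(8))
```

The trade-off for a benchmark is that halving the work also halves the number of independent top-level tasks, so it would make the coarse-granularity problem worse unless the work is split at a deeper level.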
 
Joined
Nov 4, 2005
Messages
11,655 (1.73/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs and over 10TB spinning
Display(s) 56" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
My 1100T

6 Threads 6:38
1 Thread 18:11
 

cdawall

where the hell are my stars
Joined
Jul 23, 2006
Messages
27,680 (4.29/day)
Location
Houston
System Name All the cores
Processor 2990WX
Motherboard Asrock X399M
Cooling CPU-XSPC RayStorm Neo, 2x240mm+360mm, D5PWM+140mL, GPU-2x360mm, 2xbyski, D4+D5+100mL
Memory 4x16GB G.Skill 3600
Video Card(s) (2) EVGA SC BLACK 1080Ti's
Storage 2x Samsung SM951 512GB, Samsung PM961 512GB
Display(s) Dell UP2414Q 3840X2160@60hz
Case Caselabs Mercury S5+pedestal
Audio Device(s) Fischer HA-02->Fischer FA-002W High edition/FA-003/Jubilate/FA-011 depending on my mood
Power Supply Seasonic Prime 1200w
Mouse Thermaltake Theron, Steam controller
Keyboard Keychron K8
Software W10P
Joined
Mar 23, 2016
Messages
4,839 (1.65/day)
Processor Ryzen 9 5900X
Motherboard MSI B450 Tomahawk ATX
Cooling Cooler Master Hyper 212 Black Edition
Memory VENGEANCE LPX 2 x 16GB DDR4-3600 C18 OCed 3800
Video Card(s) XFX Speedster SWFT309 AMD Radeon RX 6700 XT CORE Gaming
Storage 970 EVO NVMe M.2 500 GB, 870 QVO 1 TB
Display(s) Samsung 28” 4K monitor
Case Phantek Eclipse P400S (PH-EC416PS)
Audio Device(s) EVGA NU Audio
Power Supply EVGA 850 BQ
Mouse SteelSeries Rival 310
Keyboard Logitech G G413 Silver
Software Windows 10 Professional 64-bit v22H2
Joined
Sep 9, 2013
Messages
526 (0.14/day)
System Name Can I run it
Processor delidded i9-10900KF @ AI OC 3x5.4 10x5.3+Supercool direct die waterblock
Motherboard ASUS Maximus XII Apex 2701 BIOS
Cooling Main = GTS 360 GTX 240, EK PE 360,XSPC EX 360,2x EK-XRES 100 Revo D5 PWM, 12x T30, AC High Flow Next
Memory 2x16GB TridentZ 3600@4600 16-16-16-36@1.59V+EK Monarch, Separate loop with GTS 120&Freezemod DDC
Video Card(s) Gigabyte RTX 3080 Ti Gaming OC @ 0.8V 1830Mhz core + Barrow full cover waterblock
Storage Transcend PCIE 220S 1TB for (main), WD Blue 3D NAND 250GB for OC testing, Seagate Barracuda 4TB
Display(s) Samsung Odyssey OLED G9 5120x1440 240Hz calibrated by X-Rite i1 Display Pro Plus
Case Thermaltake View 71
Audio Device(s) Q Acoustics M20 HD
Power Supply Silverstone ST-1200 PTS 1200W 80+ Platinum
Mouse Logitech G Pro Wireless
Keyboard Ducky Shine 7 (Cherry MX red)
Software Windows 11
i5-6500 @ 5Ghz

 
Joined
Sep 9, 2013
Messages
526 (0.14/day)
System Name Can I run it
Processor delidded i9-10900KF @ AI OC 3x5.4 10x5.3+Supercool direct die waterblock
Motherboard ASUS Maximus XII Apex 2701 BIOS
Cooling Main = GTS 360 GTX 240, EK PE 360,XSPC EX 360,2x EK-XRES 100 Revo D5 PWM, 12x T30, AC High Flow Next
Memory 2x16GB TridentZ 3600@4600 16-16-16-36@1.59V+EK Monarch, Separate loop with GTS 120&Freezemod DDC
Video Card(s) Gigabyte RTX 3080 Ti Gaming OC @ 0.8V 1830Mhz core + Barrow full cover waterblock
Storage Transcend PCIE 220S 1TB for (main), WD Blue 3D NAND 250GB for OC testing, Seagate Barracuda 4TB
Display(s) Samsung Odyssey OLED G9 5120x1440 240Hz calibrated by X-Rite i1 Display Pro Plus
Case Thermaltake View 71
Audio Device(s) Q Acoustics M20 HD
Power Supply Silverstone ST-1200 PTS 1200W 80+ Platinum
Mouse Logitech G Pro Wireless
Keyboard Ducky Shine 7 (Cherry MX red)
Software Windows 11
still pass...

 

FordGT90Concept

"I go fast!1!11!1!"
I did notice that the OP used OpenMP, which means the application is probably written in C or C++, which in turn means that a big hurdle is actually creating the multi-threaded part. I would argue a more dynamic language with richer data structures would probably help at the expense of some computational power.
It is C and the source is in the ZIP ("Q.c"). I don't know enough C to sort through it and see if queuing is possible.
 
Joined
Jan 14, 2016
Messages
140 (0.05/day)
Location
Canada
System Name I overclock AMD setups
Processor AMD 8320+ @ 4.95GHZ / AMD 6300 @ 4.8 Ghz / AMD 8350 @ In RMA
Motherboard Gigabyte 990FXA-UD3 / Gigabyte 970-D3P
Cooling Corsair H100
Memory 16GB DDR3 Corsair Vengace
Video Card(s) MSI GAMING GTX 980 @ 1545Mhz core
Storage Samsung SSD 850 EVO 250GB
Display(s) Acer 144hz
Case Coolermaster CM 690 III (White Version)
Audio Device(s) Creative Titanium Fatality Pro
Power Supply Corsair Hx750i
Mouse Logitech G300s
Keyboard Microsoft Digital Media
Software Windows 10 64
Benchmark Scores 23.0k on Skydiver, 8.1k on Firestrike. 1.5k single, 9.7k multi CPU-Z
FX 8320 @ 4.95Ghz

 
Joined
Feb 8, 2012
Messages
3,012 (0.68/day)
Location
Zagreb, Croatia
System Name Windows 10 64-bit Core i7 6700
Processor Intel Core i7 6700
Motherboard Asus Z170M-PLUS
Cooling Corsair AIO
Memory 2 x 8 GB Kingston DDR4 2666
Video Card(s) Gigabyte NVIDIA GeForce GTX 1060 6GB
Storage Western Digital Caviar Blue 1 TB, Seagate Baracuda 1 TB
Display(s) Dell P2414H
Case Corsair Carbide Air 540
Audio Device(s) Realtek HD Audio
Power Supply Corsair TX v2 650W
Mouse Steelseries Sensei
Keyboard CM Storm Quickfire Pro, Cherry MX Reds
Software MS Windows 10 Pro 64-bit
It is C and the source is in the ZIP ("Q.c"). I don't know enough C to sort through it and see if queuing is possible.
Not really possible, because OpenMP handles thread scheduling by itself: you just put the #pragma omp parallel for construct before your for loop, and OpenMP distributes the iterations to different threads.
What you can do is choose a scheduling mode: static, dynamic, guided or runtime. Guided is a variant of dynamic with shrinking chunk sizes, and runtime defers the choice to the OMP_SCHEDULE environment variable.
Basically, static has the least locking: it does a simple round robin and expects that the iteration count of the for loop never changes, so the chunks can be assigned up front.
Dynamic hands out chunks at runtime as threads become free, and requires more locking.
Here the iteration count of the parallelized for loop is 18, and static scheduling could be used, but each iteration is heavy and long-running, so the granularity is too coarse to gain more efficiency by modifying thread scheduling. This is why scaling is off, and true scaling would only be seen on Xeons with 18+ cores.
Additionally, this code could not be parallelized with finer granularity as written, because only the calculations for each first-queen position (and the subsequent brute-force search down the hierarchy) are independent of each other.
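To make the difference between static and dynamic scheduling concrete, here is a small Python simulation. It is a sketch with made-up task costs, not a measurement of Q.c: `makespan_static` assigns task i to thread i % threads up front (roughly what `schedule(static, 1)` does in OpenMP), while `makespan_dynamic` gives each task to whichever thread becomes free first (roughly `schedule(dynamic, 1)`). Both function names are hypothetical.

```python
import heapq


def makespan_static(costs, threads):
    """Round-robin assignment decided up front: task i goes to thread i % threads."""
    busy = [0] * threads
    for i, cost in enumerate(costs):
        busy[i % threads] += cost
    return max(busy)


def makespan_dynamic(costs, threads):
    """Each task, in order, goes to whichever thread becomes free first."""
    free_at = [0] * threads
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)       # earliest available thread
        heapq.heappush(free_at, start + cost)
    return max(free_at)


if __name__ == "__main__":
    uneven = [4, 1, 1, 1, 4, 1, 1, 1]        # hypothetical task costs
    print("static :", makespan_static(uneven, threads=4))
    print("dynamic:", makespan_dynamic(uneven, threads=4))
    equal = [1] * 18                         # 18 equally heavy iterations
    print("equal  :", makespan_static(equal, 8), makespan_dynamic(equal, 8))
```

With uneven costs the dynamic variant finishes sooner because no thread gets stuck with both expensive tasks. With 18 equally heavy tasks on 8 threads, both variants need the same three rounds, which is consistent with the point that changing the schedule alone cannot fix granularity this coarse.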
 

FordGT90Concept

"I go fast!1!11!1!"
That explains why it falls to 2-3 threads (n=18, or 19 in the case of large) on my system. It knocks out the first 8 (100% CPU), then the second 8 (100% falling off), leaving the remaining 2-3 (32.5% falling off). Without major reworking of the algorithm, it does not make a good benchmark because of that bias.
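The "first 8, second 8, remaining 2" pattern can be put into a simple model: if the 18 top-level tasks were equally heavy, the run time would be proportional to the number of rounds, ceil(18 / threads). The Python sketch below uses a hypothetical helper `rounds` and deliberately ignores that real task costs differ and that hyperthreads are not full cores.

```python
def rounds(tasks, threads):
    """Number of passes needed when each thread takes one task per pass."""
    return -(-tasks // threads)      # integer ceiling division


if __name__ == "__main__":
    for threads in range(12, 0, -1):
        print(f"{threads:2d} threads: {rounds(18, threads)} rounds")
```

Under this model, 9 to 12 threads all need two rounds and 6 to 8 threads all need three, which lines up with the plateaus in the 12-thread results posted earlier in the thread (about 3.4-3.5 s for 9-12 threads and about 4.9 s for 6-8 threads). Perfect scaling would only appear once the thread count divides the task count evenly or the tasks become much smaller.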
 