Latest: March 27th, 2017 - What's New
Size | OS | File | Checksum
5.7 MB | Win 10, 8, 7 (64-bit) | p95v291.win64.zip | 090845939ADCAAD03F953109E6D4E386
4.5 MB | Win 10, 8, 7 (32-bit) | p95v291.win32.zip | BB740E716500ADBA427F2A65070F375F
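The page lists a 32-hex-digit checksum for each archive but does not name the algorithm. Assuming it is MD5 (the most common 128-bit digest), a downloaded archive could be checked with Python's standard hashlib module. This is a minimal sketch; the helper names `md5_hex` and `verify` are hypothetical.

```python
import hashlib
import sys
from pathlib import Path

# Checksums as listed on this page.
# ASSUMPTION: these are MD5 digests (32 hex digits = 128 bits);
# the page does not say which algorithm was used.
EXPECTED = {
    "p95v291.win64.zip": "090845939ADCAAD03F953109E6D4E386",
    "p95v291.win32.zip": "BB740E716500ADBA427F2A65070F375F",
}


def md5_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the uppercase hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()


def verify(path: Path) -> bool:
    """Compare a downloaded archive against its listed checksum."""
    expected = EXPECTED.get(path.name)
    if expected is None:
        raise KeyError(f"no listed checksum for {path.name}")
    return md5_hex(path) == expected.upper()


if __name__ == "__main__":
    # Usage: python check.py p95v291.win64.zip
    for name in sys.argv[1:]:
        p = Path(name)
        print(p.name, "OK" if verify(p) else "MISMATCH")
```

If the digest does not match under the MD5 assumption, the file may be corrupt, or the listed value may use a different algorithm.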
Downloaded: 36,200 times (43.4 GB)
Popular system stability test program.
29.1 (March 27th, 2017)
- Faster trial factoring on machines that support FMA (Haswell and later). Multi-threaded trial factoring now allows more than one thread to sieve for small primes. Several tuning parameters were added; see undoc.txt.
- Prime95 now uses hwloc, a portable library for analyzing a machine's topology. This replaces the buggy code prime95 previously used to detect hyperthreading and eliminates the need for AffinityScramble2. Running a benchmark writes this topology information to results.txt.
- AVX-512 trial factoring support added.
- Dialog box for benchmarking added.
- In the Test/Worker Windows dialog box, you no longer choose how many threads each worker uses; instead, you choose how many CPU cores each worker uses. The affinity options have been removed. Two new options decide whether each worker also uses hyperthreading.
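For readers unfamiliar with trial factoring of Mersenne numbers 2^p - 1 (p prime), the textbook method rests on two facts: every prime factor q has the form q = 2kp + 1, and q ≡ ±1 (mod 8). Candidates are first sieved to discard those divisible by small primes, and each survivor is tested with a single modular exponentiation (2^p mod q == 1). The plain Python sketch below illustrates only that idea; it is not prime95's code, which is heavily optimized (FMA, AVX2, AVX-512, multi-threaded sieving). The function names and the `sieve_limit` parameter are hypothetical.

```python
def small_primes(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes below `limit`."""
    if limit < 2:
        return []
    is_p = bytearray([1]) * limit
    is_p[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if is_p[i]:
            # Mark every multiple of i from i*i upward as composite.
            is_p[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i in range(limit) if is_p[i]]


def trial_factor(p: int, k_max: int, sieve_limit: int = 1000) -> list[int]:
    """Return candidates q = 2*k*p + 1 (1 <= k <= k_max) that divide 2**p - 1.

    Assumes p is an odd prime exponent.
    """
    primes = small_primes(sieve_limit)
    found = []
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        # Prime factors of 2**p - 1 are congruent to 1 or 7 mod 8.
        if q % 8 not in (1, 7):
            continue
        # Sieve step: a q with a smaller prime divisor is not prime, so skip it.
        if any(q % s == 0 and q != s for s in primes):
            continue
        # q divides 2**p - 1 exactly when 2**p == 1 (mod q).
        if pow(2, p, q) == 1:
            found.append(q)
    return found


if __name__ == "__main__":
    print(trial_factor(11, 4))  # [23, 89], since 2**11 - 1 = 2047 = 23 * 89
```

Because the sieve removes most candidates cheaply, the expensive modular exponentiation runs only on the survivors; this is why sieving throughput matters enough to be spread across threads.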
28.10 (January 30th, 2017)
- Since GPUs are so much better at trial factoring than CPUs, benchmarking no longer times prime95's trial factoring by default. Two new benchmarking options are available: OnlyBenchThroughput and OnlyBenchMaxCPUs. See undoc.txt for details.
- Slightly reduced the memory bandwidth requirements of several large FFTs. This may give a very small speed increase for users testing 100-million-digit numbers.
- When running more than one worker, prime95 looks for sin/cos data that it can share among the workers. Depending on the FFT sizes you are running, this could slightly reduce the memory bandwidth needed.
- Changed the method for choosing the best FFT implementation. Previous versions used the implementation with the fastest single-worker timing; this version selects the implementation with the best throughput. For FMA3 FFTs I used a 4-core Skylake to measure throughput; for AVX FFTs, a 4-core Sandy Bridge. Not many FFTs were affected, but you may see a few percent variation in throughput with this version.
- Improved AVX2 trial factoring in the 64-bit executable. Trial factoring should still be done on a GPU: a GPU is on the order of 100 times more efficient at trial factoring than a CPU!!!
- Trial factoring now defines one "iteration" as processing 128KB of sieve, or 1M possible factors. In previous versions an iteration was defined as 16KB of sieve in 32-bit executables and 48KB in 64-bit executables. The trial factoring benchmark still times processing 16KB of sieve.
- Trial factoring in 64-bit executables is now multi-threaded.
- On initial install, the default number of worker windows is set to the number of cores divided by 4, with multithreading turned on.
- The worker windows dialog box now enforces a minimum number of multi-threaded cores for some work types to ensure timely completion of assignments. Also, the worker windows menu choice no longer allows assigning work to hyperthreads (they are rarely beneficial in prime95). This behavior can be overridden with the ConfigureHyperthreads undoc.txt feature.
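The old and new definitions of a trial-factoring "iteration" in the 28.10 entry can be compared with simple arithmetic, assuming the sieve stores one bit per candidate factor (the assumption that makes 128KB of sieve equal 1M candidates). The function name below is hypothetical.

```python
KB = 1024
BITS_PER_BYTE = 8


def candidates_per_iteration(sieve_kb: int) -> int:
    """Candidate factors covered by one iteration.

    ASSUMPTION: one sieve bit represents one candidate factor.
    """
    return sieve_kb * KB * BITS_PER_BYTE


for label, kb in [
    ("28.10 and later (all builds)", 128),
    ("before 28.10, 32-bit", 16),
    ("before 28.10, 64-bit", 48),
]:
    print(f"{label}: {kb}KB -> {candidates_per_iteration(kb):,} candidates")
```

Under that assumption, a 28.10 iteration covers 1,048,576 candidates: 8 times as many as an old 32-bit iteration (131,072) and about 2.7 times as many as an old 64-bit iteration (393,216). Per-iteration timings from older versions are therefore not directly comparable.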