
Why include FP64

Hi guys. I have a question, well, it's a rant actually. GK104 can do
FP64 at 1/24 the FP32 rate. GM107 can do FP64 at 1/32 the FP32 rate.
My question is, why doesn't NVIDIA just drop FP64 support and save the die space? What is the point of supporting FP64 at such abysmal speeds?
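To put rough numbers on "abysmal" (theoretical peaks at reference clocks, so ballpark only): a full GK104 (GTX 680) is good for roughly 3.1 TFLOPS FP32, so 1/24 of that is only about 130 GFLOPS FP64; GM107 (GTX 750 Ti) is good for roughly 1.3 TFLOPS FP32, so 1/32 of that is only about 41 GFLOPS FP64.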
 
GK104 and GM107 are way more capable at Double-Precision Floating Point than nVidia allows in the desktop parts. They include it because the ability is there, they use the same dies for their workstation cards. It is cheaper to produce a single die and use it in multiple cards than it is to produce a purpose built die for each different product. So the ability is in the die no matter what, but they purposely limit the performance of the desktop cards so people that actually need the higher performance will pay the outrageous prices for workstation class cards.
 
Because scientific applications especially require 64-bit floating point operations. Some CAD programs probably do as well. If they eliminate it, they basically give AMD (maybe Intel too with Xeon Phi) the market for high performance computing.
 
GK104 and GM107 are way more capable at Double-Precision Floating Point than nVidia allows in the desktop parts. They include it because the ability is there, they use the same dies for their workstation cards. It is cheaper to produce a single die and use it in multiple cards than it is to produce a purpose built die for each different product. So the ability is in the die no matter what, but they purposely limit the performance of the desktop cards so people that actually need the higher performance will pay the outrageous prices for workstation class cards.

By GK104 I meant the fully unlocked GK104. It's a fact that a completely unlocked GK104 processes FP64 at 1/24 the FP32 rate.

Because scientific applications especially require 64-bit floating point operations. Some CAD programs probably do as well. If they eliminate it, they basically give AMD (maybe Intel too with Xeon Phi) the market for high performance computing.

They have the TITAN series for that.
 
By GK104 I meant the fully unlocked GK104. It's a fact that a completely unlocked GK104 processes FP64 at 1/24 the FP32 rate.

Again, it is a by-product of trying to use the same GPU die in two different markets and balancing things. GK104 was designed primarily for the desktop market, and its poor FP64 performance is a result of that, but they couldn't completely remove FP64 support because they wanted to use the die in the workstation market as well. GK110, on the other hand, was designed for the workstation market, and its great FP64 performance is a result of that; it was only brought to the desktop market when it was needed.
 
They couldn't completely remove FP64 support because they wanted to use the die in the workstation market as well.

What kind of workstation market would want to use 190 GFLOPS of peak FP64 performance? A $150 CPU can do more than that.
 
You're right in that it doesn't make sense to use a mid-range GPU over a good CPU for those tasks. A Core i7-4770K can do ~224 GFLOPS DP, while a GTX 680 can only churn out ~128 GFLOPS DP.
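For reference, the back-of-the-envelope math behind those two figures (theoretical peaks, with an FMA counted as two operations, so treat them as ballpark):

Core i7-4770K: 4 cores x 3.5 GHz x 16 DP FLOPs per cycle (two 256-bit AVX2 FMA units per core) ≈ 224 GFLOPS
GTX 680: 8 SMXes x 8 dedicated FP64 cores x 2 FLOPs x ~1.0 GHz ≈ 128 GFLOPS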

The real reason why FP64 blocks are not eliminated from GPUs is to ensure that all code works on all GPUs in a series, even if it runs more slowly on some. This also allows developers to create and test their programs on any GPU and then deploy them on GPUs, like Tesla cards, that are much faster at DP.
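To illustrate, here's a minimal double-precision CUDA sketch (a toy example of my own, using only standard CUDA runtime calls, not anything from NVIDIA's docs): the exact same source builds and runs on a GTX 680 or a Tesla K20; the Tesla just gets through the FP64 math far faster.

Code:
// minimal FP64 sketch: the same source runs on any CUDA GPU with double
// support, from a rate-limited GeForce to a Tesla; only the throughput differs
#include <cstdio>
#include <cuda_runtime.h>

__global__ void axpy64(int n, double a, const double* x, double* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];          // double-precision multiply-add
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(double);
    double* hx = new double[n];
    double* hy = new double[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0; hy[i] = 2.0; }

    double *dx, *dy;
    cudaMalloc((void**)&dx, bytes);
    cudaMalloc((void**)&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    axpy64<<<(n + 255) / 256, 256>>>(n, 3.0, dx, dy);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f (expect 5.0)\n", hy[0]);   // identical answer on any card
    cudaFree(dx); cudaFree(dy);
    delete[] hx; delete[] hy;
    return 0;
}

Compile it with nvcc (e.g. -arch=sm_30 for GK104, sm_35 for GK110) and the result is the same everywhere; only the time it takes changes.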
 
Are we sure GK104 actually has hardware FP64 and it isn't just emulated FP64 support? If GK104 were emulating FP64, that would explain why its performance is so bad, and it also wouldn't be wasting any silicon.
 
Are we sure GK104 actually has hardware FP64 and it isn't just emulated FP64 support?

Yes, there is a special block (module) that is not shown in the diagrams which has 8 fat CUDA cores that can do FP64 and only FP64 (that's why it's not in the diagrams). I believe Titan has those cores mixed among the regular ones in every SMX.

The CUDA FP64 block contains 8 special CUDA cores that are not part of the general CUDA core count and are not in any of NVIDIA’s diagrams.
from http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/2
 
Do you even use it???
 
GK104 and GM107 are way more capable at Double-Precision Floating Point than nVidia allows in the desktop parts. They include it because the ability is there, they use the same dies for their workstation cards. It is cheaper to produce a single die and use it in multiple cards than it is to produce a purpose built die for each different product. So the ability is in the die no matter what, but they purposely limit the performance of the desktop cards so people that actually need the higher performance will pay the outrageous prices for workstation class cards.

That's like creating an awesome V12 bi-turbo engine, purposely limiting it to 4 cylinders with the turbos disabled, and stuffing it into a Mercedes A-Class, just so they can sell more of those SL65 Mercs with the same full-blown engine. That sucks a bit...
 
Every chip manufacturer out there, bar memory manufacturers, does this.
They also frequently put circuitry on the die just to see how it works out in yields, circuitry the customer who buys the end chip will never see or use because it will be fused off.
It's just the way it is.
As for dropping all FP64 on the lower cards, that would not be wise: I doubt it would play Crysis at all then, some games would really struggle emulating FP64, and I'd wager PhysX uses it quite a bit
 
That appears to be the real reason. But they could leave GK104 alone and create a cheapo TITAN card.

A lot of DirectCompute performance comes from the cache on the GPU, not just FP64. NVIDIA started stripping down the cache on their GeForce cards after first-gen Fermi (GTX 470/480).

Cache can make a GPU die very large and hot.
 
As for dropping all FP64 on the lower cards, that would not be wise: I doubt it would play Crysis at all then, some games would really struggle emulating FP64, and I'd wager PhysX uses it quite a bit

FP64 is used for more precise effects, not for more effects. There isn't a single consumer application that uses GPU FP64, and that includes PhysX.

A lot of DirectCompute performance comes from the cache on the GPU, not just FP64. NVIDIA started stripping down the cache on their GeForce cards after first-gen Fermi (GTX 470/480).

Cache can make a GPU die very large and hot.

Yet Maxwell went with a 2 MB cache for GM107.
 
Can't nVidia's higher-end cards (like Titan) and workstation cards run in a DP mode where DP performance isn't crap, at the expense of SP performance? I can't seem to find where I read that, but I distinctly remember nVidia giving that option to particular GPUs like Titan and their workstation cards. I think the point was that most games use single precision, so it makes sense for SP to be faster than DP on consumer graphics cards.

Edit: Yes! There is a driver switch that changes how the GPU handles DP and SP. Apparently there are side effects like Boost getting disabled, but it lifts DP numbers that would otherwise be mediocre.

[Image: sandra-gp-processing.png – Sandra GP Processing benchmark results]

source
 
The problem with IEEE 754 binary64 (double precision) is that it is not bitwise backwards compatible with binary32 (single precision):
http://kipirvine.com/asm/workbook/floating_tut.htm

If you throw a binary32 at a binary64 processor, it has to convert it (or have separate hardware) before it can process it. There's no reason why GPUs couldn't be made entirely binary64, but doing so means either no backwards compatibility or severely limited performance when handling binary32. This is why binary64 GPUs are marketed separately to a different audience. As demand for binary64 increases, GPUs will grow increasingly biased towards binary64, but likely at the cost of binary32 performance.


Pi in binary32: 3.1415927410125732421875
Pi in binary64: 3.141592653589793115997963468544185161590576171875

Pi is a small number. Imagine if you were dealing with a number like 1 billion and some change. The bigger the whole number, the less precise the fraction.
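If anyone wants to see that loss of precision first-hand, here's a tiny host-side snippet I threw together (plain C-style code; the constants are just pi and an arbitrary value near one billion):

Code:
// how much of pi survives in each format, and what happens
// once the whole-number part gets big
#include <cstdio>

int main()
{
    float  pi32 = 3.14159265358979323846f;   // rounded to a 24-bit significand
    double pi64 = 3.14159265358979323846;    // rounded to a 53-bit significand
    printf("binary32 pi: %.25f\n", (double)pi32);
    printf("binary64 pi: %.25f\n", pi64);

    // near 1e9 a binary32 value can only step in increments of 64,
    // so the fractional part vanishes entirely; binary64 keeps it
    float  big32 = 1.0e9f + 0.1415927f;
    double big64 = 1.0e9  + 0.1415927;
    printf("binary32 1e9 + fraction: %.7f\n", (double)big32);
    printf("binary64 1e9 + fraction: %.7f\n", big64);
    return 0;
}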
 
The problem with IEEE 754 binary64 (double precision) is that it is not bitwise backwards compatible with binary32 (single precision):
http://kipirvine.com/asm/workbook/floating_tut.htm

If you throw a binary32 at a binary64 processor, it has to convert it (or have separate hardware) before it can process it. There's no reason why GPUs couldn't be made entirely binary64, but doing so means either no backwards compatibility or severely limited performance when handling binary32. This is why binary64 GPUs are marketed separately to a different audience. As demand for binary64 increases, GPUs will grow increasingly biased towards binary64, but likely at the cost of binary32 performance.

This may explain why NVIDIA had entirely separate FP64 CUDA cores in Kepler. But why did Fermi have FP32-capable FP64 cores?
 
FP64 is used for more precise effects, not for more effects. There isn't a single consumer application that uses GPU FP64, and that includes PhysX.



Yet Maxwell went with a 2 MB cache for GM107.

There are plenty of uses for FP64, and I did not imply more effects; I know what double precision means. Regardless, it is what it is, and you are bickering about BS. Not everyone uses a GeForce card's sound controller, and that uses die space too; should they chop that out for one more shader?
No.
And how is something you don't use, appreciate, or like worth a rant?
 
This may explain why NVIDIA had entirely separate FP64 CUDA cores in Kepler. But why did Fermi have FP32-capable FP64 cores?
Every FPU in each CUDA core in Fermi has hardware to handle both binary32 and binary64, not unlike the FPU in a CPU.

CUDA cores dedicated to binary64 could be used directly by programmers to perform high precision calculations.
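To make "used directly by programmers" concrete, here's a small sketch of my own (plain CUDA C, nothing vendor-specific beyond the runtime API): which units end up doing the work simply falls out of the types in the kernel, and the classic gotcha is that an unsuffixed literal like 0.1 is a double, which silently drags a float expression onto the scarce FP64 hardware of a GeForce card.

Code:
#include <cstdio>
#include <cuda_runtime.h>

// the unsuffixed 0.1 is a double, so float * double is computed in double
// precision; on a GeForce that means the slow, dedicated FP64 path
__global__ void scale_slow(int n, float* x)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 0.1;     // promoted to FP64 math
}

// suffixing the literal keeps everything on the plentiful FP32 cores
__global__ void scale_fast(int n, float* x)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 0.1f;    // pure FP32
}

int main()
{
    const int n = 1 << 20;
    float* x;
    cudaMalloc((void**)&x, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));

    scale_slow<<<(n + 255) / 256, 256>>>(n, x);
    scale_fast<<<(n + 255) / 256, 256>>>(n, x);
    cudaDeviceSynchronize();

    printf("both kernels run anywhere; the FP32 one is just far faster on GeForce\n");
    cudaFree(x);
    return 0;
}

The flip side is the same mechanism in reverse: declare your data and literals as double and the compiler targets the dedicated binary64 FPUs for you.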
 
The architecture has to evolve with our uses. Are you just working on post count?
 
I know that. But why didn't they do the same for Kepler?
They wanted to boost binary64 by having cores dedicated to it, most likely for scientific applications. It sounds to me like, if Fermi had 32 FPUs capable of binary32 and binary64, Kepler would have 64 plus some more binary64-only FPUs. This way, they're getting equal binary32 capability and equal or much more binary64 performance (for software that uses the dedicated binary64 FPUs).

Nope. What's the benefit? I made this account way back in 2011. It won't affect my posts/day anyway. But you didn't answer my question. What is the advantage of separating FP32 and FP64 cores?
The advantage is that you can vastly simplify the FPU by removing the backwards compatibility for binary32. This means, in turn, they can pack more binary64 performance into less die space.
 
The advantage is that you can vastly simplify the FPU by removing the backwards compatibility for binary32. This means, in turn, they can pack more binary64 performance into less die space.

That appears to be the reason.

It sounds to me like, if Fermi had 32 FPUs capable of binary32 and binary64, Kepler would have 64 plus some more binary64-only FPUs. This way, they're getting equal binary32 capability and equal or much more binary64 performance (for software that uses the dedicated binary64 FPUs).

Nope.

Anandtech said:
In GK104 none of the regular CUDA core blocks are FP64 capable; in its place we have what we’re calling the CUDA FP64 block. The CUDA FP64 block contains 8 special CUDA cores that are not part of the general CUDA core count and are not in any of NVIDIA’s diagrams. These CUDA cores can only do and are only used for FP64 math.
 