
Future AMD GPU Architecture to Implement BFloat16 Hardware

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
46,349 (7.68/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
A future AMD graphics architecture could implement BFloat16 floating-point capability in silicon. Updates to AMD's ROCm libraries on GitHub dropped a big hint that the company is implementing the compute standard, which has significant advantages over the FP16 implemented by current-gen AMD GPUs. BFloat16 offers a significantly higher range than FP16, which caps out at just 6.55 x 10^4, forcing certain AI researchers to fall back to the relatively inefficient FP32 math hardware. BFloat16 uses three fewer significand bits than FP16 (8 bits versus 11 bits, counting the implicit leading bit), offering 8 exponent bits, while FP16 only offers 5 bits. BFloat16 is more resilient to overflow and underflow in conversions from FP32 than FP16 is, since BFloat16 is essentially a truncated FP32. The addition of BFloat16 is more of a "future-proofing" measure by AMD. Atomic operations in modern 3D game rendering are unlikely to benefit from BFloat16 in comparison to FP16. BFloat16, however, will pay huge dividends to the AI machine-learning community.
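To make the range figures above concrete, here is a minimal Python sketch that derives the largest finite value of each format from its bit widths. The helper name `max_finite` is made up for illustration, and the sketch assumes the usual IEEE-style layout (biased exponent, top exponent code reserved for infinity/NaN, one implicit leading significand bit).

```python
def max_finite(exp_bits: int, frac_bits: int) -> float:
    """Largest finite value of an IEEE-style binary format (hypothetical helper).

    exp_bits  -- width of the exponent field
    frac_bits -- stored fraction bits (significand width minus the implicit bit)
    """
    bias = 2 ** (exp_bits - 1) - 1
    # The all-ones exponent code is reserved for inf/NaN, so the largest
    # usable code is 2**exp_bits - 2.
    max_exp = (2 ** exp_bits - 2) - bias
    # Largest significand is 1.111...1 in binary, i.e. 2 - 2**-frac_bits.
    return (2 - 2 ** -frac_bits) * 2.0 ** max_exp


FP16_MAX = max_finite(exp_bits=5, frac_bits=10)  # 11-bit significand
BF16_MAX = max_finite(exp_bits=8, frac_bits=7)   # 8-bit significand

print(FP16_MAX)            # 65504.0, i.e. about 6.55 x 10^4
print(f"{BF16_MAX:.4e}")   # on the order of 10^38, same ballpark as FP32
```

The three extra exponent bits are what buy the much larger range; the price is the three significand bits given up.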



 
Joined
Feb 3, 2017
Messages
3,481 (1.32/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
Joined
Aug 12, 2019
Messages
1,716 (1.00/day)
Location
LV-426
System Name Custom
Processor i9 9900k
Motherboard Gigabyte Z390 arous master
Cooling corsair h150i
Memory 4x8 3200mhz corsair
Video Card(s) Galax RTX 3090 EX Gamer White OC
Storage 500gb Samsung 970 Evo PLus
Display(s) MSi MAG341CQ
Case Lian Li Pc-011 Dynamic
Audio Device(s) Arctis Pro Wireless
Power Supply 850w Seasonic Focus Platinum
Mouse Logitech G403
Keyboard Logitech G110
i read the title future amd gpu architecture to implement bloatware ...
 

Ahhzz

Moderator
Staff member
Joined
Feb 27, 2008
Messages
8,737 (1.48/day)
System Name OrangeHaze / Silence
Processor i7-13700KF / i5-10400 /
Motherboard ROG STRIX Z690-E / MSI Z490 A-Pro Motherboard
Cooling Corsair H75 / TT ToughAir 510
Memory 64Gb GSkill Trident Z5 / 32GB Team Dark Za 3600
Video Card(s) Palit GeForce RTX 2070 / Sapphire R9 290 Vapor-X 4Gb
Storage Hynix Plat P41 2Tb\Samsung MZVL21 1Tb / Samsung 980 Pro 1Tb
Display(s) 22" Dell Wide/24" Asus
Case Lian Li PC-101 ATX custom mod / Antec Lanboy Air Black & Blue
Audio Device(s) SB Audigy 7.1
Power Supply Corsair Enthusiast TX750
Mouse Logitech G502 Lightspeed Wireless / Logitech G502 Proteus Spectrum
Keyboard K68 RGB — CHERRY® MX Red
Software Win10 Pro \ RIP:Win 7 Ult 64 bit
Guys. Topic. Find it or take a break.
 
Joined
Jun 12, 2017
Messages
184 (0.07/day)
System Name Linotosh
Processor Dual 800mhz G4
Cooling Air
Memory 1.5 GB
I don't think Nvidia has announced support for bfloat16 yet, but considering both Arm and Intel have announced they will be including it, it's just a matter of time before Nvidia announces support as well.
The real question is when will it be implemented?
 
Joined
Apr 19, 2011
Messages
2,198 (0.46/day)
Location
So. Cal.
Nice to know the only antagonists... hold a place of honor :mad:
 
Joined
Sep 15, 2011
Messages
6,465 (1.41/day)
Processor Intel® Core™ i7-13700K
Motherboard Gigabyte Z790 Aorus Elite AX
Cooling Noctua NH-D15
Memory 32GB(2x16) DDR5@6600MHz G-Skill Trident Z5
Video Card(s) ZOTAC GAMING GeForce RTX 3080 AMP Holo
Storage 2TB SK Platinum P41 SSD + 4TB SanDisk Ultra SSD + 500GB Samsung 840 EVO SSD
Display(s) Acer Predator X34 3440x1440@100Hz G-Sync
Case NZXT PHANTOM410-BK
Audio Device(s) Creative X-Fi Titanium PCIe
Power Supply Corsair 850W
Mouse Logitech Hero G502 SE
Software Windows 11 Pro - 64bit
Benchmark Scores 30FPS in NFS:Rivals
Is there a plain English translation and explanation of this article please?
 
Joined
Mar 11, 2019
Messages
61 (0.03/day)
Location
Germany
System Name New Horyzen
Processor Ryzen 7 1700X
Motherboard ASRock Fatal1ty X370 Gaming K4
Cooling Noctua U-12S
Memory 2x 8GB Corsair Vengance LPX 3200MHz
Video Card(s) Sapphire 5700XT
Storage A lot
Case Modded Corsair Carbide 300R
Audio Device(s) ESI Maya44 eX, Focusrite Scarlett 2i4 2nd, Samson Meteor, Mixer and Headphone Amp
Power Supply SeaSonic M12II 750W
Mouse Logitech G502
Keyboard HyperX Alloy & Logitech G13
Software All of it
I don't think Nvidia has announced support for bfloat16 yet, but considering both Arm and Intel have announced they will be including it, it's just a matter of time before Nvidia announces support as well.
The real question is when will it be implemented?
Nvidia is big in GPUs with CUDA and is making its entry with Tensor cores. So maybe they feel safe with what they've got?
 
Joined
Feb 3, 2017
Messages
3,481 (1.32/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
Is there a plain English translation and explanation of this article please?
The topic does not lend itself well to easy explanations.


tl;dr
Same number of exponent bits as float32 but fewer mantissa bits. This means the same range but reduced resolution/precision.
Benefits are easier conversion between bfloat16 and float32 due to the same exponent length, lower area cost in hardware compared to float32 (and supposedly compared to float16), etc.
All this is mainly useful for machine learning at this point, basically because that does not have high precision requirements.

Edit:
And reading it again, I am mostly just rephrasing the article :)
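The "easier conversion" point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, and the conversion simply truncates the low 16 bits of the float32 pattern, whereas real hardware and libraries typically round to nearest.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x by truncation (no rounding)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    # bfloat16 is the top half of float32: sign, 8 exponent bits, 7 fraction bits.
    return bits32 >> 16


def bfloat16_bits_to_float32(bits16: int) -> float:
    """Widen a bfloat16 pattern to float32 by appending 16 zero bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


b = float32_to_bfloat16_bits(3.14159)
print(hex(b), bfloat16_bits_to_float32(b))  # 0x4049 3.140625

# A value far beyond float16's range still survives, just with less precision.
big = bfloat16_bits_to_float32(float32_to_bfloat16_bits(1e38))
print(big)
```

Both directions are plain bit shifts because the exponent field is identical; converting float32 to float16 would instead need exponent re-biasing and overflow handling.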
 

JohnWal

New Member
Joined
Oct 7, 2019
Messages
9 (0.01/day)
The topic does not lend itself well to easy explanations.


tl;dr
Same number of exponent bits as float32 but fewer mantissa bits. This means the same range but reduced resolution/precision.
Benefits are easier conversion between bfloat16 and float32 due to the same exponent length, lower area cost in hardware compared to float32 (and supposedly compared to float16), etc.
All this is mainly useful for machine learning at this point, basically because that does not have high precision requirements.

Edit:
And reading it again, I am mostly just rephrasing the article :)

AMD is supporting, in GPU hardware, a new numerical datatype that trades precision for increased performance. A number of common machine learning algorithms do not actually require high-precision math to be effective, so use of bfloat16 enables increased performance by essentially allowing twice the numerical calculation throughput compared to float32. Intel already has support for bfloat16 as part of the upcoming Cooper Lake generation of processors, and Google uses it as part of its Tensor Processing Units. At some point bfloat16 will appear in AMD CPUs as well. A nice article is available at: https://www.nextplatform.com/2019/07/15/intel-prepares-to-graft-googles-bfloat16-onto-processors/
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.64/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Is there a plain English translation and explanation of this article please?
This:
BFloat16 uses three fewer significand bits than FP16 (8 bits versus 11 bits), offering 8 exponent bits, while FP16 only offers 5 bits.
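The significand is like the basic digits of the number, such as 123456.
The exponent is the power those digits are scaled by, as in 10^5 (the actual formats use powers of 2 rather than 10).

float16 gives you more precision (11-bit significand) in your value but less range (5-bit exponent) than bfloat16 (8-bit significand and 8-bit exponent). Said differently: float16 resolves values more finely, while bfloat16 covers a far wider span of magnitudes, both very large and very close to zero.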
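For a feel of what the significand width means in practice, here is a rough Python sketch of the gap between neighbouring representable values. The `spacing` helper is hypothetical, it only covers normal (non-denormal) values, and it takes the stored fraction width as input: 10 bits for float16 and 7 bits for bfloat16.

```python
import math


def spacing(x: float, frac_bits: int) -> float:
    """Gap between adjacent representable values near x (hypothetical helper).

    Valid for normal values of a binary format with frac_bits stored
    fraction bits; denormals and range limits are ignored here.
    """
    _, e = math.frexp(abs(x))   # abs(x) = m * 2**e with 0.5 <= m < 1
    exponent = e - 1            # so abs(x) lies in [2**exponent, 2**(exponent + 1))
    return 2.0 ** (exponent - frac_bits)


print(spacing(1.0, 10))     # float16 step near 1.0:  0.0009765625
print(spacing(1.0, 7))      # bfloat16 step near 1.0: 0.0078125
print(spacing(1000.0, 10))  # float16 step near 1000:  0.5
print(spacing(1000.0, 7))   # bfloat16 step near 1000: 4.0
```

At any given magnitude the float16 steps are eight times finer (three more fraction bits), which is the precision side of the trade; the range side comes from the exponent width.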
 