
Future AMD GPU Architecture to Implement BFloat16 Hardware

btarunr

Editor & Senior Moderator
A future AMD graphics architecture could implement BFloat16 floating-point capability in silicon. Updates to AMD's ROCm libraries on GitHub dropped a big hint that the company is implementing the format, which has significant advantages over the FP16 implemented by current-gen AMD GPUs. BFloat16 offers a significantly higher range than FP16, which caps out at just 6.55 x 10^4, forcing certain AI researchers to fall back to the relatively inefficient FP32 math hardware. BFloat16 uses three fewer significand bits than FP16 (8 bits versus 11 bits), while offering 8 exponent bits to FP16's 5. BFloat16 is more resilient to overflow and underflow in conversions from FP32 than FP16 is, since BFloat16 is essentially a truncated FP32. The addition of BFloat16 is more of a "future-proofing" measure by AMD. Atomic operations in modern 3D game rendering are unlikely to benefit from BFloat16 in comparison to FP16. BFloat16, however, will pay huge dividends to the AI machine-learning community.
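
The "essentially a truncated FP32" relationship is easy to see in code. Below is a minimal sketch in plain C, nothing AMD-specific; the helper names fp32_to_bf16 and bf16_to_fp32 are illustrative, and real hardware typically rounds to nearest-even rather than truncating:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* BFloat16 keeps the sign bit, all 8 exponent bits and the top 7 mantissa
   bits of an FP32 value, i.e. simply its upper 16 bits. */
static uint16_t fp32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);    /* reinterpret the float as raw bits */
    return (uint16_t)(bits >> 16);     /* drop the low 16 mantissa bits */
}

static float bf16_to_fp32(uint16_t b) {
    uint32_t bits = (uint32_t)b << 16; /* zero-fill the dropped bits */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

int main(void) {
    float x = 3.14159265f;
    /* The round trip loses mantissa precision but never the FP32 exponent. */
    printf("%f -> %f\n", x, bf16_to_fp32(fp32_to_bf16(x)));
    return 0;
}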



 
i read the title future amd gpu architecture to implement bloatware ...
 
Guys. Topic. Find it or take a break.
 
I don't think Nvidia has announced support for bfloat16 yet, but considering both Arm and Intel have announced they will be including it, it's just a matter of time before Nvidia announces support as well.
The real question is when will it be implemented?
 
Is there a plain English translation and explanation of this article please?
 
I don't think Nvidia has announced support for bfloat16 yet, but considering both Arm and Intel have announced they will be including it, it's just a matter of time before Nvidia announces support as well.
The real question is when will it be implemented?
Nvidia is big in GPU compute with CUDA and is making their entry with Tensor cores. So maybe they feel safe with what they've got?
 
Is there a plain English translation and explanation of this article please?
The topic does not lend itself well to easy explanations.


tl;dr
Same number of exponent bits as float32, but fewer mantissa bits. This means the same range but reduced resolution/precision.
Benefits are easier conversion from bfloat16 to float32 due to the same exponent length, lower area cost to implement in hardware compared to float32 (and supposedly compared to float16), etc. There is a quick demo of the range/precision trade-off after this post.
All this is mainly useful for machine learning at this point, because that work does not have high precision requirements.

Edit:
And reading it again, I am mostly just rephrasing the article :)
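
A quick way to see that trade-off is to round-trip two values through a truncating FP32-to-bfloat16 conversion (a sketch only; actual hardware rounds rather than truncates). 1.0e38 survives because bfloat16 shares float32's 8 exponent bits, whereas float16 tops out around 6.55 x 10^4; 1 + 2^-9 collapses to 1.0 because only 7 mantissa bits remain, while float16's 10 mantissa bits would hold it exactly.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Truncating FP32 -> bfloat16 -> FP32 round trip, for illustration only. */
static float via_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits &= 0xFFFF0000u;   /* keep sign + 8 exponent bits + top 7 mantissa bits */
    memcpy(&f, &bits, sizeof f);
    return f;
}

int main(void) {
    printf("1.0e38      -> %g\n",   via_bf16(1.0e38f));       /* range kept */
    printf("1.001953125 -> %.9f\n", via_bf16(1.001953125f));  /* precision lost */
    return 0;
}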
 
AMD is adding GPU hardware support for a new numerical data type that trades precision for increased performance. A number of common machine-learning algorithms do not actually require high-precision math to be effective, so using bfloat16 increases performance by essentially allowing twice the numerical throughput compared to float32. Intel already has support for bfloat16 as part of the upcoming Cooper Lake generation of processors, and Google uses it in their Tensor Processing Units. At some point bfloat16 will appear in AMD CPUs as well. A nice article is available at: https://www.nextplatform.com/2019/07/15/intel-prepares-to-graft-googles-bfloat16-onto-processors/
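
As an illustration of that usage pattern, here is a hedged sketch in plain C (not AMD's, Intel's, or Google's actual API): inputs are stored as bfloat16 to halve memory traffic, and the products are accumulated in float32, which is the common mixed-precision recipe in machine learning. The conversion helpers use the same illustrative truncation as in the example earlier in the thread.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint16_t bf16;                   /* storage-only type in this sketch */

static bf16 fp32_to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (bf16)(u >> 16);              /* truncate (hardware usually rounds) */
}

static float bf16_to_fp32(bf16 b) {
    uint32_t u = (uint32_t)b << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Dot product with bf16 inputs and an FP32 accumulator - the typical
   mixed-precision pattern for machine-learning training and inference. */
static float dot_bf16(const bf16 *a, const bf16 *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += bf16_to_fp32(a[i]) * bf16_to_fp32(b[i]);
    return acc;
}

int main(void) {
    enum { N = 4 };
    const float wa[N] = { 0.5f, -1.25f,  2.0f, 0.125f };
    const float wb[N] = { 1.0f,  0.75f, -0.5f, 8.0f   };
    bf16 a[N], b[N];
    for (int i = 0; i < N; ++i) {
        a[i] = fp32_to_bf16(wa[i]);
        b[i] = fp32_to_bf16(wb[i]);
    }
    printf("bf16 dot = %f (fp32 reference = %f)\n",
           dot_bf16(a, b, N),
           wa[0]*wb[0] + wa[1]*wb[1] + wa[2]*wb[2] + wa[3]*wb[3]);
    return 0;
}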
 
Is there a plain English translation and explanation of this article please?
This:
BFloat16 uses three fewer significand bits than FP16 (8 bits versus 11 bits), offering 8 exponent bits, while FP16 only offers 5 bits.
The significand holds the digits of the number, e.g. the 1.23456 in 1.23456 x 10^5.
The exponent is the scale factor, the 5 in 10^5 (binary formats use powers of two, but the idea is the same).

Float16 gives you more precision (11-bit significand) but less range (5-bit exponent) than bfloat16 (8-bit significand and 8-bit exponent). Said differently: float16 resolves values more finely within its narrower range, while bfloat16 trades that precision for the full float32 range.
 