Tuesday, October 22nd 2019

Future AMD GPU Architecture to Implement BFloat16 Hardware

A future AMD graphics architecture could implement BFloat16 floating-point capability in silicon. Updates to AMD's ROCm libraries on GitHub dropped a big hint that the company is implementing the compute standard, which has significant advantages over the FP16 format implemented by current-gen AMD GPUs. BFloat16 offers a significantly higher dynamic range than FP16, which caps out at just 6.55 x 10^4, forcing certain AI researchers to "fall back" to the relatively inefficient FP32 math hardware. BFloat16 uses three fewer significand bits than FP16 (8 bits versus 11 bits, counting the implicit leading bit), but offers 8 exponent bits where FP16 offers only 5. BFloat16 is also more resilient to overflow and underflow in conversions to and from FP32, since BFloat16 is essentially a truncated FP32. The addition of BFloat16 is more of a "future-proofing" measure by AMD. Arithmetic operations in modern 3D game rendering are unlikely to benefit from BFloat16 in comparison to FP16. BFloat16, however, will pay huge dividends to the AI machine-learning community.
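The "essentially a truncated FP32" point can be sketched in a few lines of Python. This is only an illustration, not AMD's implementation: hardware converters typically round to nearest rather than truncate, and the helper names here are made up for the example.

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Convert float32 to bfloat16 by keeping only its top 16 bits
    (sign bit, the full 8-bit exponent, and 7 mantissa bits)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bf16_bits_to_f32(b: int) -> float:
    """Widen bfloat16 back to float32: just zero-pad the low 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

# FP16 overflows past 6.55 x 10^4, but bfloat16 keeps float32's 8-bit
# exponent, so a value like 3.0e38 survives the round trip un-overflowed.
x = 3.0e38
y = bf16_bits_to_f32(f32_to_bf16_bits(x))
print(y != float("inf"))  # True
```

Because the widening direction is just a zero-pad, bfloat16-to-FP32 conversion is effectively free, which is part of the format's appeal.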
Sources: ROCm (Github), dylan522p (Reddit), Dr Nick Higham
Add your own comment

12 Comments on Future AMD GPU Architecture to Implement BFloat16 Hardware

#2
Hyderz
i read the title future amd gpu architecture to implement bloatware ...
Posted on Reply
#3
Mysteoa
Hyderz
i read the title future amd gpu architecture to implement bloatware ...
I liked this.
Posted on Reply
#4
ChosenName
Makes sense if range is more important than resolution.
Posted on Reply
#5
Ahhzz
Guys. Topic. Find it or take a break.
Posted on Reply
#6
Hardware Geek
I don't think Nvidia has announced support for bfloat16 yet, but considering both Arm and Intel have announced they will be including it, it's just a matter of time before Nvidia announces support as well.
The real question is when will it be implemented?
Posted on Reply
#7
Casecutter
Nice to know the only antagonists... hold a place of honor :mad:
Posted on Reply
#8
Prima.Vera
Is there a plain English translation and explanation of this article please?
Posted on Reply
#9
MazeFrame
Hardware Geek
I don't think Nvidia has announced support for bfloat16 yet, but considering both Arm and Intel have announced they will be including it, it's just a matter of time before Nvidia announces support as well.
The real question is when will it be implemented?
Nvidia is big in GPUs with CUDA and made their entry with Tensor cores. So maybe they feel safe with what they've got?
Posted on Reply
#10
londiste
Prima.Vera
Is there a plain English translation and explanation of this article please?
The topic does not lend itself well to easy explanations.

https://en.wikipedia.org/wiki/Bfloat16_floating-point_format

tl;dr
Same number of exponent bits as float32 but a reduced number of mantissa bits. This means the same range but reduced resolution/precision.
The benefits are easier conversion from bfloat16 to float32 due to the same exponent length, lower area cost of implementing it in hardware compared to float32 (and supposedly compared to float16), etc.
All this is mainly useful for machine-learning stuff at this point, basically because that does not have high precision requirements.
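The "same range but reduced resolution" trade-off is easy to demonstrate. A minimal sketch (the `to_bf16` helper is hypothetical and truncates rather than rounds, for simplicity):

```python
import struct

def to_bf16(x: float) -> int:
    """bfloat16 bit pattern: top 16 bits of the float32 encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

# Only 7 stored mantissa bits survive, so float32 values closer together
# than roughly 1 part in 256 collapse to the same bfloat16 bit pattern:
print(to_bf16(1.0) == to_bf16(1.0 + 2**-9))  # True  (indistinguishable)
print(to_bf16(1.0) == to_bf16(1.0 + 2**-7))  # False (far enough apart)
```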

Edit:
And reading it again, I am mostly just rephrasing the article :)
Posted on Reply
#11
JohnWal
londiste
The topic does not lend itself well to easy explanations.

https://en.wikipedia.org/wiki/Bfloat16_floating-point_format

tl;dr
Same number of exponent bits as float32 but a reduced number of mantissa bits. This means the same range but reduced resolution/precision.
The benefits are easier conversion from bfloat16 to float32 due to the same exponent length, lower area cost of implementing it in hardware compared to float32 (and supposedly compared to float16), etc.
All this is mainly useful for machine-learning stuff at this point, basically because that does not have high precision requirements.

Edit:
And reading it again, I am mostly just rephrasing the article :)
AMD are supporting in GPU hardware a new numerical datatype that trades precision for increased performance. A number of common Machine Learning algorithms do not actually require high-precision math to be effective, so use of bfloat16 enables increased performance by essentially allowing twice the numerical calculation throughput as compared to float32. Intel already have support for bfloat16 as part of the upcoming Cooper Lake generation of processors, and Google use it as part of their Tensor Processing Units. At some point bfloat16 will appear in AMD CPUs as well. A nice article is available at: https://www.nextplatform.com/2019/07/15/intel-prepares-to-graft-googles-bfloat16-onto-processors/
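The "twice the throughput" claim follows from bfloat16 being half the width of float32: two values fit where one did, so a fixed-width vector unit can process twice as many per cycle. A hypothetical sketch of that packing (illustrative names, truncating conversion):

```python
import struct

def to_bf16(x: float) -> int:
    """bfloat16 bit pattern: top 16 bits of the float32 encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def pack_pair(a: float, b: float) -> int:
    """Two bfloat16 values packed into one 32-bit word, the way a SIMD
    register holds twice as many bfloat16 lanes as float32 lanes."""
    return (to_bf16(a) << 16) | to_bf16(b)

word = pack_pair(1.5, -2.0)
print(hex(word))  # 0x3fc0c000 -- two half-width floats in one 32-bit slot
```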
Posted on Reply
#12
FordGT90Concept
"I go fast!1!11!1!"
Prima.Vera
Is there a plain English translation and explanation of this article please?
This:
btarunr
BFloat16 uses three fewer significand bits than FP16 (8 bits versus 11 bits), offering 8 exponent bits, while FP16 only offers 5 bits.
The significand is the base number, e.g. the 1.23456 in 1.23456 x 10^5.
The exponent is the power, the 5 in 10^5.

Float16 gives you more precision (an 11-bit significand) but less range (a 5-bit exponent) than bfloat16 (8-bit significand and 8-bit exponent). Said differently: float16 resolves values within its range more finely, while bfloat16 trades that precision for the full range of float32.
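Computing each format's largest finite value from its bit widths (float16: 5 exponent bits and 10 stored mantissa bits; bfloat16: 8 and 7) shows the range gap directly. A small sketch using the standard IEEE-style layout:

```python
# Largest finite value: (2 - 2**-stored_mantissa_bits) * 2**max_exponent
fp16_max = (2 - 2**-10) * 2.0**15   # 5 exponent bits -> max exponent 15
bf16_max = (2 - 2**-7) * 2.0**127   # 8 exponent bits -> max exponent 127

print(fp16_max)  # 65504.0 -- the 6.55 x 10^4 cap from the article
print(bf16_max)  # ~3.39e38 -- same order of magnitude as float32's maximum
```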
Posted on Reply
Add your own comment