NVidia A100 (the $10,000 server card) is only 19.5 FP32 TFlops: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet.pdf
And only 9.7 FP64 TFlops.
The Tensor-flops are an inflated number that only deep-learning folk care about (and apparently not all deep-learning folk are using those tensor cores). Achieving ~20 FP32 TFlops in general-purpose code is basically the best you can get today (MI100 is a little bit faster, but without as much of that NVLink thing going on).
So 45 TFlops of FP32 is pretty huge by today's standards. However, Intel is going to be competing against the next-generation products, not the A100. I'm sure NVidia's numbers are going to grow, but 45 TFlops per card is probably going to be competitive.
Nope, the RTX 3090 has ~36 TFLOPS of FP32. The Tensor TFLOPS figures are for something like INT4 or INT8. Obviously the A100 is designed for a different type of workload that doesn't depend on FP32 or FP64 so much. The workstation Ampere A6000 has about 40 TFLOPS of FP32. I guess Nvidia doesn't care about FP64 performance anymore after the Titan X Maxwell.
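
For anyone wondering where these headline numbers come from, here is a rough sketch of the usual back-of-the-envelope formula: peak FP32 FLOPS = FP32 cores x 2 (one fused multiply-add per clock) x boost clock. The core counts and boost clocks below are not from this thread. They are assumptions taken from the public spec sheets as I remember them, and `peak_fp32_tflops` is just a name made up for illustration.

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes every FP32 core retires one fused multiply-add (2 FLOPs)
    per clock at the rated boost clock. Real code rarely reaches this.
    """
    return cuda_cores * 2 * boost_clock_ghz / 1000.0


# (FP32 cores, boost clock in GHz): assumed values from public spec
# sheets, not figures quoted in the thread.
CARDS = {
    "A100": (6912, 1.410),
    "RTX 3090": (10496, 1.695),
    "RTX A6000": (10752, 1.800),
}


def peak_table() -> dict:
    """Return the peak FP32 TFLOPS for each card, rounded to one decimal."""
    return {
        name: round(peak_fp32_tflops(cores, clock), 1)
        for name, (cores, clock) in CARDS.items()
    }


if __name__ == "__main__":
    for name, tflops in peak_table().items():
        print(f"{name}: ~{tflops} FP32 TFLOPS")
```

With those assumed specs the formula lands on roughly 19.5, 35.6 and 38.7 TFLOPS, which lines up with the figures quoted above. Keep in mind that on the consumer and workstation Ampere chips half of the counted FP32 units share their datapath with INT32, so the peak only applies when the workload is pure FP32.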