
AMD Announces CDNA Architecture. Radeon MI100 is the World's Fastest HPC Accelerator

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
46,277 (7.69/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
AMD today announced the new AMD Instinct MI100 accelerator - the world's fastest HPC GPU and the first x86 server GPU to surpass the 10 teraflops (FP64) performance barrier. Supported by new accelerated compute platforms from Dell, Gigabyte, HPE, and Supermicro, the MI100, combined with AMD EPYC CPUs and the ROCm 4.0 open software platform, is designed to propel new discoveries ahead of the exascale era.

Built on the new AMD CDNA architecture, the AMD Instinct MI100 GPU enables a new class of accelerated systems for HPC and AI when paired with 2nd Gen AMD EPYC processors. The MI100 offers up to 11.5 TFLOPS of peak FP64 performance for HPC and up to 46.1 TFLOPS peak FP32 Matrix performance for AI and machine learning workloads. With new AMD Matrix Core technology, the MI100 also delivers a nearly 7x boost in FP16 theoretical peak floating point performance for AI training workloads compared to AMD's prior generation accelerators.
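The peak figures quoted above are consistent with simple lane-count arithmetic. The sketch below is a back-of-the-envelope check, not AMD's published methodology: the 120 compute units, 64 lanes per compute unit, and 1.502 GHz peak clock are assumptions that do not appear in the announcement, as are the half-rate FP64 and double-rate FP32 matrix multipliers.

```python
# Assumed MI100 configuration (NOT stated in the announcement).
COMPUTE_UNITS = 120        # assumption
LANES_PER_CU = 64          # assumption: 4 x SIMD16 per compute unit
PEAK_CLOCK_GHZ = 1.502     # assumption
FLOPS_PER_FMA = 2          # one fused multiply-add counts as two operations


def peak_tflops(rate_multiplier: float = 1.0) -> float:
    """Theoretical peak in TFLOPS for a given rate relative to FP32 vector."""
    lanes = COMPUTE_UNITS * LANES_PER_CU
    gflops = lanes * FLOPS_PER_FMA * PEAK_CLOCK_GHZ * rate_multiplier
    return gflops / 1000.0


fp32_vector = peak_tflops()      # full rate
fp64_vector = peak_tflops(0.5)   # assumed half rate
fp32_matrix = peak_tflops(2.0)   # assumed double rate via Matrix Cores

print(f"FP32 vector: {fp32_vector:.1f} TFLOPS")
print(f"FP64 vector: {fp64_vector:.1f} TFLOPS")
print(f"FP32 matrix: {fp32_matrix:.1f} TFLOPS")
```

Under those assumptions the three results round to the 23.1, 11.5 and 46.1 TFLOPS quoted in the announcement.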



"Today AMD takes a major step forward in the journey toward exascale computing as we unveil the AMD Instinct MI100 - the world's fastest HPC GPU," said Brad McCredie, corporate vice president, Data Center GPU and Accelerated Processing, AMD. "Squarely targeted toward the workloads that matter in scientific computing, our latest accelerator, when combined with the AMD ROCm open software platform, is designed to provide scientists and researchers a superior foundation for their work in HPC."



Open Software Platform for the Exascale Era
The AMD ROCm developer software provides the foundation for exascale computing. As an open source toolset consisting of compilers, programming APIs and libraries, ROCm is used by exascale software developers to create high performance applications. ROCm 4.0 has been optimized to deliver performance at scale for MI100-based systems. ROCm 4.0 has upgraded the compiler to be open source and unified to support both OpenMP 5.0 and HIP. PyTorch and Tensorflow frameworks, which have been optimized with ROCm 4.0, can now achieve higher performance with MI100. ROCm 4.0 is the latest offering for HPC, ML and AI application developers which allows them to create performance portable software.

"We've received early access to the MI100 accelerator, and the preliminary results are very encouraging. We've typically seen significant performance boosts, up to 2-3x compared to other GPUs," said Bronson Messer, director of science, Oak Ridge Leadership Computing Facility. "What's also important to recognize is the impact software has on performance. The fact that the ROCm open software platform and HIP developer tool are open source and work on a variety of platforms, it is something that we have been absolutely almost obsessed with since we fielded the very first hybrid CPU/GPU system."

Key capabilities and features of the AMD Instinct MI100 accelerator include:
  • All-New AMD CDNA Architecture - Engineered to power AMD GPUs for the exascale era and at the heart of the MI100 accelerator, the AMD CDNA architecture offers exceptional performance and power efficiency.
  • Leading FP64 and FP32 Performance for HPC Workloads - Delivers industry leading 11.5 TFLOPS peak FP64 performance and 23.1 TFLOPS peak FP32 performance, enabling scientists and researchers across the globe to accelerate discoveries in industries including life sciences, energy, finance, academics, government, defense and more.
  • All-New Matrix Core Technology for HPC and AI - Supercharged performance for a full range of single and mixed precision matrix operations, such as FP32, FP16, bFloat16, Int8 and Int4, engineered to boost the convergence of HPC and AI.
  • 2nd Gen AMD Infinity Fabric Technology - Instinct MI100 provides ~2x the peer-to-peer (P2P) peak I/O bandwidth over PCIe 4.0 with up to 340 GB/s of aggregate bandwidth per card with three AMD Infinity Fabric Links. In a server, MI100 GPUs can be configured with up to two fully-connected quad GPU hives, each providing up to 552 GB/s of P2P I/O bandwidth for fast data sharing.
  • Ultra-Fast HBM2 Memory - Features 32 GB High-bandwidth HBM2 memory at a clock rate of 1.2 GHz and delivers an ultra-high 1.23 TB/s of memory bandwidth to support large data sets and help eliminate bottlenecks in moving data in and out of memory.
  • Support for Industry's Latest PCIe Gen 4.0 - Designed with the latest PCIe Gen 4.0 technology support providing up to 64 GB/s peak theoretical transport data bandwidth from CPU to GPU.
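The bandwidth numbers in the list above can be reproduced with a few lines of arithmetic. This is a rough sketch under stated assumptions, not AMD's own derivation: the 4096-bit HBM2 bus, the double-data-rate transfer, and the 92 GB/s per Infinity Fabric link are assumptions that are not in the announcement, and the PCIe figure is the raw signalling rate in both directions, ignoring encoding overhead.

```python
IF_LINK_GBS = 92  # assumption: peak GB/s per Infinity Fabric link


def hbm2_bandwidth_gbs(clock_ghz: float, bus_width_bits: int = 4096,
                       transfers_per_clock: int = 2) -> float:
    """Memory bandwidth in GB/s: clock x transfers per clock x bus width in bytes."""
    return clock_ghz * transfers_per_clock * bus_width_bits / 8


def pcie_bandwidth_gbs(gt_per_s: int = 16, lanes: int = 16,
                       directions: int = 2) -> float:
    """Raw PCIe bandwidth in GB/s (PCIe 4.0 x16 by default, both directions)."""
    return gt_per_s * lanes * directions / 8


def card_aggregate_gbs(links: int = 3) -> float:
    """PCIe plus all Infinity Fabric links on one card."""
    return pcie_bandwidth_gbs() + links * IF_LINK_GBS


def hive_p2p_gbs(gpus: int = 4) -> float:
    """Total link bandwidth in a fully connected hive: one link per GPU pair."""
    pairs = gpus * (gpus - 1) // 2
    return pairs * IF_LINK_GBS


print(f"HBM2:        {hbm2_bandwidth_gbs(1.2) / 1000:.2f} TB/s")
print(f"PCIe 4.0:    {pcie_bandwidth_gbs():.0f} GB/s")
print(f"Per card:    {card_aggregate_gbs():.0f} GB/s")
print(f"Quad hive:   {hive_p2p_gbs():.0f} GB/s")
```

With those inputs the results line up with the 1.23 TB/s, 64 GB/s, 340 GB/s and 552 GB/s figures in the feature list.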
Available Server Solutions
The AMD Instinct MI100 accelerators are expected by end of the year in systems from major OEM and ODM partners in the enterprise markets, including:

Dell
"Dell EMC PowerEdge servers will support the new AMD Instinct MI100, which will enable faster insights from data. This would help our customers achieve more robust and efficient HPC and AI results rapidly," said Ravi Pendekanti, Senior Vice President, PowerEdge Servers, Dell Technologies. "AMD has been a valued partner in our support for advancing innovation in the data center. The high-performance capabilities of AMD Instinct accelerators are a natural fit for our PowerEdge server AI & HPC portfolio."

Gigabyte
"We're pleased to again work with AMD as a strategic partner offering customers server hardware for high performance computing," said Alan Chen, Assistant Vice President in NCBU, GIGABYTE. "AMD Instinct MI100 accelerators represent the next level of high-performance computing in the data center, bringing greater connectivity and data bandwidth for energy research, molecular dynamics, and deep learning training. As a new accelerator in the GIGABYTE portfolio, our customers can look to benefit from improved performance across a range of scientific and industrial HPC workloads."

Hewlett Packard Enterprise (HPE)
"Customers use HPE Apollo systems for purpose-built capabilities and performance to tackle a range of complex, data-intensive workloads across high-performance computing (HPC), deep learning and analytics," said Bill Mannel, vice president and general manager, HPC at HPE. "With the introduction of the new HPE Apollo 6500 Gen10 Plus system, we are further advancing our portfolio to improve workload performance by supporting the new AMD Instinct MI100 accelerator, which enables greater connectivity and data processing, alongside the 2nd Gen AMD EPYC processor. We look forward to continuing our collaboration with AMD to expand our offerings with its latest CPUs and accelerators."

Supermicro
"We're excited that AMD has produced the world's fastest HPC GPU accelerator. The combination of the compute power gained with the new AMD CDNA architecture, along with the high memory and GPU peer-to-peer bandwidth the MI100 brings, our customers will get access to great solutions that will meet their accelerated compute requirements. Add the open AMD ROCm software stack, and they will get an open, flexible and portable environment to meet their demand for exceptional application support for critical enterprise workloads," said Vik Malyala, senior vice president, field application engineering and business development, Supermicro. "The AMD Instinct MI100 will be a great addition for our multi-GPU servers and our suite of high-performance systems."



The complete slide-deck follows.



For more information, visit the product page.

 
Joined
Jan 8, 2017
Messages
8,862 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
This is going to be insanely fast for general purpose computing, and 300W is also surprisingly low. I don't really understand what they are comparing in the context of the A100: they say "mixed precision" but the footnotes quote the FP16/32/64 throughput.
 
Joined
Mar 18, 2008
Messages
5,717 (0.98/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
Good they put lots of emphasis on software support. I wonder how good ROCm is nowadays. The bad effect of open source everything is fragmentation.
 
Joined
Apr 24, 2020
Messages
2,518 (1.75/day)
1. GCN-style SIMD16 across 4-clock cycles (64-sized wavefronts). "Based on Vega instead of RDNA" seems to be true.

2. Matrix operations, catching up (and beating) NVidia A100 in raw FP16 matrix-operations. Support for bfloat16 (an alternative FP16 format, more bits to the exponent, more similar to FP32 exponents)

3. FP32 matrix operation support (!!!!).
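To illustrate the bfloat16 point in item 2: bfloat16 keeps the sign bit and all 8 exponent bits of FP32 and only 7 mantissa bits, so it is effectively the top half of an FP32 value. The sketch below converts by plain truncation, which is a simplification (real converters typically round to nearest even), and the helper names are made up for this example.

```python
import struct

# (sign, exponent, mantissa) bit widths of each format.
FORMATS = {"fp32": (1, 8, 23), "fp16": (1, 5, 10), "bfloat16": (1, 8, 7)}


def float32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Keep the top 16 bits of the float32 pattern (truncation, no rounding)."""
    return float32_bits(x) >> 16


def from_bfloat16_bits(bits: int) -> float:
    """Widen bfloat16 back to float32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def roundtrip(x: float) -> float:
    """Value of x after a trip through bfloat16."""
    return from_bfloat16_bits(to_bfloat16_bits(x))


print(hex(to_bfloat16_bits(1.0)))   # bit pattern of 1.0 in bfloat16
print(roundtrip(3.14159))           # precision drops to roughly 2-3 decimal digits
print(roundtrip(3.0e38))            # still finite: same exponent range as FP32
```

The last line is the practical difference from FP16, whose 5-bit exponent tops out at 65504: bfloat16 trades mantissa precision for FP32-like range.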

Already tested to a significant degree.

The AMD Instinct™ MI100 GPU is built to accelerate today’s most demanding HPC and AI workloads. Oak Ridge National Laboratory tested their exascale science codes on the MI100 as they ramp users to take advantage of the upcoming exascale Frontier system. Some of the performance results ranged from 1.4x faster to 3x faster performance compared to a node with V100. In the case of CHOLLA, an astrophysics application, the code was ported from CUDA to AMD ROCm™ in just an afternoon while enjoying 1.4x performance boost over V100.

This thing is a compute beast. It's sad though that no consumer equivalent seems to exist: this MI100 is probably a $5000 or maybe even a $10,000 part.
 
Joined
Jul 3, 2019
Messages
300 (0.17/day)
Location
Bulgaria
Processor 6700K
Motherboard M8G
Cooling D15S
Memory 16GB 3k15
Video Card(s) 2070S
Storage 850 Pro
Display(s) U2410
Case Core X2
Audio Device(s) ALC1150
Power Supply Seasonic
Mouse Razer
Keyboard Logitech
Software 21H2
They dropped Radeon from the naming it seems.
It's interesting to me, how small they are relative to their competitors, yet spitting products left and right.
 
Joined
Apr 24, 2020
Messages
2,518 (1.75/day)
good for miner, bad for us

Miners don't need matrix-multiplication or double-precision floating point support.

This is for supercomputers, scientific compute, and finally "Deep Learning" applications.

Good they put lots of emphasis on software support. I wonder how good ROCm is nowadays. The bad effect of open source everything is fragmentation.

ROCm / HIP is pretty decent in my experience. The only problem with HIP is that AMD supports its MI-line of cards with HIP, not really their gaming cards. So RDNA / RDNA2 probably won't see much support, if at all.

Vega and Radeon VII get really good support, because they're similar to MI25 or MI50. But without an RDNA card in the MI line, it means that Navi / Navi 2x will probably not get very good support at all from ROCm/HIP.
 
Joined
Dec 16, 2017
Messages
2,721 (1.19/day)
Location
Buenos Aires, Argentina
System Name System V
Processor AMD Ryzen 5 3600
Motherboard Asus Prime X570-P
Cooling Cooler Master Hyper 212 // a bunch of 120 mm Xigmatek 1500 RPM fans (2 ins, 3 outs)
Memory 2x8GB Ballistix Sport LT 3200 MHz (BLS8G4D32AESCK.M8FE) (CL16-18-18-36)
Video Card(s) Gigabyte AORUS Radeon RX 580 8 GB
Storage SHFS37A240G / DT01ACA200 / WD20EZRX / MKNSSDTR256GB-3DL / LG BH16NS40 / ST10000VN0008
Display(s) LG 22MP55 IPS Display
Case NZXT Source 210
Audio Device(s) Logitech G430 Headset
Power Supply Corsair CX650M
Mouse Microsoft Trackball Optical 1.0
Keyboard HP Vectra VE keyboard (Part # D4950-63004)
Software Whatever build of Windows 11 is being served in Dev channel at the time.
Benchmark Scores Corona 1.3: 3120620 r/s Cinebench R20: 3355 FireStrike: 12490 TimeSpy: 4624
They dropped Radeon from the naming it seems.

Indeed. According to some marketing docs that came out a little while ago, these new products are considered AMD Instinct processors/GPUs, no Radeon brand in sight. Seems like a good move, if you ask me.
 
Joined
Mar 18, 2008
Messages
5,717 (0.98/day)
Miners don't need matrix-multiplication or double-precision floating point support.

This is for supercomputers, scientific compute, and finally "Deep Learning" applications.



ROCm / HIP is pretty decent in my experience. The only problem with HIP is that AMD supports its MI-line of cards with HIP, not really their gaming cards. So RDNA / RDNA2 probably won't see much support, if at all.

Vega and Radeon VII get really good support, because they're similar to MI25 or MI50. But without an RDNA card in the MI line, it means that Navi / Navi 2x will probably not get very good support at all from ROCm/HIP.


One of my colleagues bought a pair of MI50s to put in her buy-in nodes. It was a PITA to get them working: first, the University's cluster work scheduler wouldn't recognize the 2 newly installed GPUs, and even basic testing scripts would fail. After some adjustments and online searching she was able to sort it out, along with over 100 hrs paid to the University's HPC support team for customized support. She tried contacting AMD directly and only got an email response about once a week. That was just the beginning. Trying to compile anything written for CUDA (which is the vast majority of bioinformatics) was a PITA. She started off with a postdoc working on it; in the end the project was given to a group of computer science undergrads to figure out. It only took 9 months, and meanwhile very little could be done with the cards. In the end they were able to get some work done on those MI50s, but man, it was a shit show from the software point of view. SO MANY F*CKING BUGS! That experience, versus the super clean and polished CUDA experience on Nvidia HPC hardware AND software, eventually drove her to purchase some V100s.

For research teams who basically thrive on troubleshooting, or on writing up a custom platform on AMD ROCm and GPU computing, it would make sense. For research teams that just need 0 drama and consistent output, unfortunately Nvidia with CUDA is the only player able to deliver.

With AMD getting more profit, I would hope they would put more and more emphasis on the software side. Hardware is less than 50% of the problems.
 
Joined
Dec 16, 2017
Messages
2,721 (1.19/day)
Location
Buenos Aires, Argentina
With that kind of experience, AMD definitely better invest a lot of money into this... Still, damn, that's a terrible experience.
 
Joined
Jan 4, 2013
Messages
1,151 (0.28/day)
Location
Denmark
System Name R9 5950x/Skylake 6400
Processor R9 5950x/i5 6400
Motherboard Gigabyte Aorus Master X570/Asus Z170 Pro Gaming
Cooling Arctic Liquid Freezer II 360/Stock
Memory 4x8GB Patriot PVS416G4440 CL14/G.S Ripjaws 32 GB F4-3200C16D-32GV
Video Card(s) 7900XTX/6900XT
Storage RIP Seagate 530 4TB (died after 7 months), WD SN850 2TB, Aorus 2TB, Corsair MP600 1TB / 960 Evo 1TB
Display(s) 3x LG 27gl850 1440p
Case Custom builds
Audio Device(s) -
Power Supply Silverstone 1000watt modular Gold/1000Watt Antec
Software Win11pro/win10pro / Win10 Home / win7 / wista 64 bit and XPpro
Wonder if we will see a Vega bad binning spin off again?
 
Joined
Apr 24, 2020
Messages
2,518 (1.75/day)
One of my colleagues bought a pair of MI50s to put in her buy-in nodes. It was a PITA to get them working: first, the University's cluster work scheduler wouldn't recognize the 2 newly installed GPUs, and even basic testing scripts would fail. After some adjustments and online searching she was able to sort it out, along with over 100 hrs paid to the University's HPC support team for customized support. She tried contacting AMD directly and only got an email response about once a week. That was just the beginning. Trying to compile anything written for CUDA (which is the vast majority of bioinformatics) was a PITA. She started off with a postdoc working on it; in the end the project was given to a group of computer science undergrads to figure out. It only took 9 months, and meanwhile very little could be done with the cards. In the end they were able to get some work done on those MI50s, but man, it was a shit show from the software point of view. SO MANY F*CKING BUGS! That experience, versus the super clean and polished CUDA experience on Nvidia HPC hardware AND software, eventually drove her to purchase some V100s.

For research teams who basically thrive on troubleshooting, or on writing up a custom platform on AMD ROCm and GPU computing, it would make sense. For research teams that just need 0 drama and consistent output, unfortunately Nvidia with CUDA is the only player able to deliver.

With AMD getting more profit, I would hope they would put more and more emphasis on the software side. Hardware is less than 50% of the problems.

Some CUDA code can be ported, but not all. You'll really need a programmer who knows how the CUDA was written to know whether or not it'd be an easy job to port over to AMD's HIP.

HIP is missing a large set of CUDA features, such as thread-groups (EDIT: I mean cooperative groups), or some other various synchronization primitives. Something like hipThrust or hipPrim seems to be compile-time equivalent to cudaThrust or Cub, but it's clearly a different implementation. The most important CUDA feature missing is device-side kernel launching.

If you have a CUDA system, I think you should just stay on CUDA. But if you're writing new software, I think that HIP is perfectly acceptable as a development environment.
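For the easy cases, a CUDA-to-HIP port is largely a renaming exercise, which is why some codes move over quickly while others need a programmer who knows the original. The toy below sketches that idea in Python; it is not AMD's hipify tooling, the function name `port_to_hip` is hypothetical, and the mapping covers only a handful of runtime calls. The "needs review" list simply reflects the gap named in the post above (cooperative groups) and has not been checked against any particular HIP release.

```python
import re

# A few CUDA runtime names and their HIP counterparts (tiny, illustrative subset).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Features the post above reports as missing; flagged for a human to inspect.
NEEDS_REVIEW = ("cooperative_groups",)

# Match whole identifiers only, trying longer names first.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def port_to_hip(source: str):
    """Rename known CUDA identifiers and list constructs needing manual work."""
    translated = _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)
    warnings = [name for name in NEEDS_REVIEW if name in translated]
    return translated, warnings


cuda_source = """#include <cuda_runtime.h>
cudaMalloc(&d_buf, n_bytes);
cudaMemcpy(d_buf, h_buf, n_bytes, cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_buf);
"""

hip_source, warnings = port_to_hip(cuda_source)
print(hip_source)
print("needs manual review:", warnings)
```

Code that sticks to this kind of basic runtime API translates mechanically; anything that trips the review list is where the porting effort actually goes.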
 
Joined
Nov 6, 2016
Messages
1,559 (0.58/day)
Location
NH, USA
System Name Lightbringer
Processor Ryzen 7 2700X
Motherboard Asus ROG Strix X470-F Gaming
Cooling Enermax Liqmax Iii 360mm AIO
Memory G.Skill Trident Z RGB 32GB (8GBx4) 3200Mhz CL 14
Video Card(s) Sapphire RX 5700XT Nitro+
Storage Hp EX950 2TB NVMe M.2, HP EX950 1TB NVMe M.2, Samsung 860 EVO 2TB
Display(s) LG 34BK95U-W 34" 5120 x 2160
Case Lian Li PC-O11 Dynamic (White)
Power Supply BeQuiet Straight Power 11 850w Gold Rated PSU
Mouse Glorious Model O (Matte White)
Keyboard Royal Kludge RK71
Software Windows 10
I think this is the one going in the El Capitan supercomputer with the new Zen3 Epyc CPUs

One of my colleagues bought a pair of MI50s to put in her buy-in nodes. It was a PITA to get them working: first, the University's cluster work scheduler wouldn't recognize the 2 newly installed GPUs, and even basic testing scripts would fail. After some adjustments and online searching she was able to sort it out, along with over 100 hrs paid to the University's HPC support team for customized support. She tried contacting AMD directly and only got an email response about once a week. That was just the beginning. Trying to compile anything written for CUDA (which is the vast majority of bioinformatics) was a PITA. She started off with a postdoc working on it; in the end the project was given to a group of computer science undergrads to figure out. It only took 9 months, and meanwhile very little could be done with the cards. In the end they were able to get some work done on those MI50s, but man, it was a shit show from the software point of view. SO MANY F*CKING BUGS! That experience, versus the super clean and polished CUDA experience on Nvidia HPC hardware AND software, eventually drove her to purchase some V100s.

For research teams who basically thrive on troubleshooting, or on writing up a custom platform on AMD ROCm and GPU computing, it would make sense. For research teams that just need 0 drama and consistent output, unfortunately Nvidia with CUDA is the only player able to deliver.

With AMD getting more profit, I would hope they would put more and more emphasis on the software side. Hardware is less than 50% of the problems.

AMD literally has a tenth of the financial resources of Nvidia...what do you expect?
 
Joined
Mar 18, 2008
Messages
5,717 (0.98/day)
I think this is the one going in the El Capitan supercomputer with the new Zen3 Epyc CPUs



AMD literally has a tenth of the financial resources of Nvidia...what do you expect?

If they want a piece of that sweet HPC pie, they better have the good support.

Some CUDA code can be ported, but not all. You'll really need a programmer who knows how the CUDA was written to know whether or not it'd be an easy job to port over to AMD's HIP.

HIP is missing a large set of CUDA features, such as thread-groups, or some other various synchronization primitives. Something like hipThrust or hipPrim seems to be compile-time equivalent to cudaThrust or Cub, but it's clearly a different implementation. The most important CUDA feature missing is device-side kernel launching.

If you have a CUDA system, I think you should just stay on CUDA. But if you're writing new software, I think that HIP is perfectly acceptable as a development environment.


Yeah, as I said, she bought into it initially for the low upfront cost and promises of "amazing raw power"

In the end, what matters is reliability and consistency for researchers. The development part is better left for pure CS people.

Then again it circles back to the problem of open source everything: fragmentation. There will be tons of badly supported implementations by small academic research groups. Most would stop supporting and updating them within a year or two, once the funding period ends. Then nobody in the research groups that use it for actual work would understand or be able to use it any more. Rinse and repeat. The consistency is bad.
 
Joined
Apr 24, 2020
Messages
2,518 (1.75/day)
If they want a piece of that sweet HPC pie, they better have the good support.

I always thought that HIP vs CUDA was a strategic disadvantage. There's no way AMD can provide feature parity with CUDA, especially as NVidia just updates CUDA every few months with a new feature. At best, HIP will always lag behind by months, or years in terms of capability. But apparently there are enough people wanting CUDA-portability (not necessarily compatibility... but just an "easier port") that HIP is worthwhile.

What's more promising is technologies like OpenMP 4.5 or 5.0, providing easier to program device offload. This C / C++ / Fortran code can be "just recompiled" between NVidia or AMD systems, and "just work". Furthermore, scientific compute programmers are already very familiar with OpenMP / Fortran.

Then again it circles back to the problem of open source everything: fragmentation. There will be tons of badly supported implementations by small academic research groups. Most would stop supporting and updating them within a year or two, once the funding period ends. Then nobody in the research groups that use it for actual work would understand or be able to use it any more. Rinse and repeat. The consistency is bad.

AMD has put a lot of money into hipThrust / TensorFlow, and other stuff for that PyCUDA community. AMD's matrix multiplication libraries are also pretty decent on HIP. If you want to just BLAS some matrixes, the software is more than capable to do that sort of thing.

What AMD needs is to support higher-level languages, like Julia, better. Julia's community clearly wants to work on HIP better, but there's been a lot of confusion about which cards support ROCm. AMD needs to be more clear that NAVI / NAVI 2.x do not support ROCm.
 
Joined
Dec 22, 2011
Messages
3,890 (0.87/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
Interestingly Nvidia also announced a new version of the A100 today which comes loaded with a whopping 80GB of 3.2Gbps HBM2e pushing 2TB/s in bandwidth. Eek!
 

Exyvia

New Member
Joined
Oct 13, 2020
Messages
19 (0.02/day)
While performance is great, many data centres do not treat performance as the determining factor; it'll most likely be support, both in software and customer care.

Since there will always be a faster card coming out.
 
Joined
Dec 30, 2010
Messages
2,082 (0.43/day)
300W is also surprisingly low

In the above image, they display one server with up to 8 Instinct cards and 2x EPYC CPUs. When running at full load, you're looking at 2400 W for the GPUs alone, and an estimated 600 W for the CPUs as well. A very expensive setup, especially when you're running this in a datacenter where you pay per kWh or per amp. That's 3 kW or more to have that thing at full blast.

Amazing tech tho.
 
Joined
May 30, 2015
Messages
1,865 (0.58/day)
Location
Seattle, WA
Wonder if we will see a Vega bad binning spin off again?

No. Just like nVidia's A100 these have no render pipeline. Anything that fails out of top bin can't be sold as a dGPU, it'll just be a lower tier HPC GPGPU. There is a chance these could be sold as individual accelerators for headless compute boxes.

In the above image, they display one server with up to 8 Instinct cards and 2x EPYC CPUs. When running at full load, you're looking at 2400 W for the GPUs alone, and an estimated 600 W for the CPUs as well. A very expensive setup, especially when you're running this in a datacenter where you pay per kWh or per amp. That's 3 kW or more to have that thing at full blast.

It's not so much about how much power it uses. It's about how much can get done with the power. If your 3 kW 4U does the customer's work in half the time of the competitor's 2 kW 4U, you're still the cheaper option overall.
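The energy-to-solution argument is easy to check with numbers. A minimal sketch, using the 8 x 300 W GPU and 600 W CPU estimates from the posts above and assuming, purely for illustration, that the competing 2 kW box needs one hour for a job the 3 kW box finishes in half an hour:

```python
def node_power_watts(gpus: int = 8, gpu_watts: int = 300,
                     cpu_watts_total: int = 600) -> int:
    """Full-load power draw of one server, GPUs plus CPUs only."""
    return gpus * gpu_watts + cpu_watts_total


def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy consumed: power (kW) multiplied by run time (hours)."""
    return power_watts / 1000 * hours


fast_node = energy_kwh(node_power_watts(), hours=0.5)  # 3 kW box, half the time
slow_node = energy_kwh(2000, hours=1.0)                # 2 kW box, full hour

print(f"3 kW node: {fast_node} kWh per job")
print(f"2 kW node: {slow_node} kWh per job")
```

The higher-power node uses less energy per job under that assumption, and it frees up the rack slot sooner as well.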
 