News Posts matching #Blackwell

Return to Keyword Browsing

NVIDIA Cancels Dual-Rack NVL36x2 in Favor of Single-Rack NVL72 Compute Monster

NVIDIA has reportedly discontinued its dual-rack GB200 NVL36x2 GPU model, opting to focus on the single-rack GB200 NVL72 and NVL36 models. This shift, revealed by industry analyst Ming-Chi Kuo, aims to simplify NVIDIA's offerings in the AI and HPC markets. The decision was influenced by major clients like Microsoft, who prefer the NVL72's improved space efficiency and potential for enhanced inference performance. While both models perform similarly in AI large language model (LLM) training, the NVL72 is expected to excel in non-parallelizable inference tasks. As a reminder, the NVL72 features 36 Grace CPUs, delivering 2,592 Arm Neoverse V2 cores with 17 TB LPDDR5X memory with 18.4 TB/s aggregate bandwidth. Additionally, it includes 72 Blackwell GB200 SXM GPUs that have a massive 13.5 TB of HBM3e combined, running at 576 TB/s aggregate bandwidth.

However, this shift presents significant challenges. The NVL72's power consumption of around 120kW far exceeds typical data center capabilities, potentially limiting its immediate widespread adoption. The discontinuation of the NVL36x2 has also sparked concerns about NVIDIA's execution capabilities and may disrupt the supply chain for assembly and cooling solutions. Despite these hurdles, industry experts view this as a pragmatic approach to product planning in the dynamic AI landscape. While some customers may be disappointed by the dual-rack model's cancellation, NVIDIA's long-term outlook in the AI technology market remains strong. The company continues to work with clients and listen to their needs, to position itself as a leader in high-performance computing solutions.

NVIDIA RTX 5090 "Blackwell" Could Feature Two 16-pin Power Connectors

NVIDIA CEO Jensen Huang never misses an opportunity to remind us that Moore's Law is cooked, and that future generations of logic hardware will only get larger and hotter, or hungrier for power. NVIDIA's next generation "Blackwell" graphics architecture promises to bring certain architecture-level performance/Watt improvements, coupled with the node-level performance/Watt improvements from the switch to the TSMC 4NP (4 nm-class) node. Even so, the GeForce RTX 5090, or the part that succeeds the current RTX 4090, will be a power hungry GPU, with rumors suggesting the need for two 16-pin power inputs.

TweakTown reports that the RTX 5090 could come with two 16-pin power connectors, which should give the card the theoretical ability to pull 1200 W (continuous). This doesn't mean that the GPU's total graphics power (TGP) is 1200 W, but a number close to or greater than 600 W, which calls for two of these connectors. Even if the TGP is exactly 600 W, NVIDIA would want to deploy two inputs, to spread the load among two connectors, and improve physical resilience of the connector. It's likely that both connectors will have 600 W input capability, so end-users don't mix up connectors should one of them be 600 W and the other keyed to 150 W or 300 W.

Oracle Offers First Zettascale Cloud Computing Cluster

Oracle today announced the first zettascale cloud computing clusters accelerated by the NVIDIA Blackwell platform. Oracle Cloud Infrastructure (OCI) is now taking orders for the largest AI supercomputer in the cloud—available with up to 131,072 NVIDIA Blackwell GPUs.

"We have one of the broadest AI infrastructure offerings and are supporting customers that are running some of the most demanding AI workloads in the cloud," said Mahesh Thiagarajan, executive vice president, Oracle Cloud Infrastructure. "With Oracle's distributed cloud, customers have the flexibility to deploy cloud and AI services wherever they choose while preserving the highest levels of data and AI sovereignty."

NVIDIA GeForce RTX 5090 and RTX 5080 Reach Final Stages This Month, Chinese "D" Variant Arrives for Both SKUs

NVIDIA is on the brink of finalizing its next-generation "Blackwell" graphics cards, the GeForce RTX 5090 and RTX 5080. Sources close to BenchLife indicate that NVIDIA is targeting September for the official design specification finalization of both models. This timeline hints at a possible unveiling at CES 2025, with a market release shortly after. The RTX 5090 is rumored to boast a staggering 550 W TGP, a significant 22% increase from its predecessor, the RTX 4090. Meanwhile, the RTX 5080 is expected to draw 350 W, a more modest 9.3% bump from the current RTX 4080. Interestingly, NVIDIA appears to be developing "D" variants for both cards, which are likely tailored for the Chinese market to comply with export regulations.

Regarding raw power, the RTX 5090 is speculated to feature 24,576 CUDA cores paired with 512-bit GDDR7 memory. The RTX 5080, while less mighty, is still expected to pack a punch with 10,752 CUDA cores and 256-bit GDDR7 memory. As NVIDIA prepares to launch these powerhouses, rumors suggest the RTX 4090D may be discontinued by December 2024, paving the way for its successor. We are curious to see how the power consumption is handled and if these cards are packed efficiently within the higher power envelope. Some rumors indicate that the RTX 5090 could reach 600 watts at its peak, while RTX 5080 reaches 400 watts. However, that is just a rumor for now. As always, until NVIDIA makes an official announcement, these details should be taken with a grain of salt.

NVIDIA Resolves "Blackwell" Yield Issues with New Photomask

During its Q2 2024 earnings call, NVIDIA confirmed that its upcoming Blackwell-based products are facing low-yield challenges. However, the company announced that it has implemented design changes to improve the production yields of its B100 and B200 processors. Despite these setbacks, NVIDIA remains optimistic about its production timeline. The tech giant plans to commence the production ramp of Blackwell GPUs in Q4 2024, with expected shipments worth several billion dollars by the end of the year. In an official statement, NVIDIA explained, "We executed a change to the Blackwell GPU mask to improve production yield." The company also reaffirmed that it had successfully sampled Blackwell GPUs with customers in the second quarter.

However, NVIDIA acknowledged that meeting demand required producing "low-yielding Blackwell material," which impacted its gross margins. During an earnings call, NVIDIA's CEO Jensen Huang assured investors that the supply of B100 and B200 GPUs will be there. He expressed confidence in the company's ability to mass-produce these chips starting in the fourth quarter. The Blackwell B100 and B200 GPUs use TSMC's CoWoS-L packaging technology and a complex design, which prompted rumors about the company facing yield issues with its designs. Reports suggest that initial challenges arose from mismatched thermal expansion coefficients among various components, leading to warping and system failures. However, now the company claims that the fix that solved these problems was a new GPU photomask, which bumped yields back to normal levels.

NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Benchmark

As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another. In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf's biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.

The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category - including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token. MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they're capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They're also more efficient since they only activate a few experts per inference - meaning they deliver results much faster than dense models of a similar size.

ASUS Presents Comprehensive AI Server Lineup

ASUS today announced its ambitious All in AI initiative, marking a significant leap into the server market with a complete AI infrastructure solution, designed to meet the evolving demands of AI-driven applications from edge, inference and generative AI the new, unparalleled wave of AI supercomputing. ASUS has proven its expertise lies in striking the perfect balance between hardware and software, including infrastructure and cluster architecture design, server installation, testing, onboarding, remote management and cloud services - positioning the ASUS brand and AI server solutions to lead the way in driving innovation and enabling the widespread adoption of AI across industries.

Meeting diverse AI needs
In partnership with NVIDIA, Intel and AMD, ASUS offer comprehensive AI-infrastructure solutions with robust software platforms and services, from entry-level AI servers and machine-learning solutions to full racks and data centers for large-scale supercomputing. At the forefront is the ESC AI POD with NVIDIA GB200 NVL72, a cutting-edge rack designed to accelerate trillion-token LLM training and real-time inference operations. Complemented by the latest NVIDIA Blackwell GPUs, NVIDIA Grace CPUs and 5th Gen NVIDIA NVLink technology, ASUS servers ensure unparalleled computing power and efficiency.

NVIDIA's New B200A Targets OEM Customers; High-End GPU Shipments Expected to Grow 55% in 2025

Despite recent rumors speculating on NVIDIA's supposed cancellation of the B100 in favor of the B200A, TrendForce reports that NVIDIA is still on track to launch both the B100 and B200 in the 2H24 as it aims to target CSP customers. Additionally, a scaled-down B200A is planned for other enterprise clients, focusing on edge AI applications.

TrendForce reports that NVIDIA will prioritize the B100 and B200 for CSP customers with higher demand due to the tight production capacity of CoWoS-L. Shipments are expected to commence after 3Q24. In light of yield and mass production challenges with CoWoS-L, NVIDIA is also planning the B200A for other enterprise clients, utilizing CoWoS-S packaging technology.

Design Issues May Postpone Launch of NVIDIA's Advanced Blackwell AI Chips

NVIDIA may face delays in releasing its newest artificial intelligence chips due to design issues, according to anonymous sources involved in chip and server hardware production cited by The Information. The delay could extend to three months or more, potentially affecting major customers such as Meta, Google, and Microsoft. An unnamed Microsoft employee and another source claim that NVIDIA has already informed Microsoft about delays affecting the most advanced models in the Blackwell AI chip series. As a result, significant shipments are not expected until the first quarter of 2025.

When approached for comment, an NVIDIA spokesperson did not address communications with customers regarding the delay but stated that "production is on track to ramp" later this year. The Information reports that Microsoft, Google, Amazon Web Services, and Meta declined to comment on the matter, while Taiwan Semiconductor Manufacturing Company (TSMC) did not respond to inquiries.

NVIDIA Hit with DOJ Antitrust Probe over AI GPUs, Unfair Sales Tactics and Pricing Alleged

NVIDIA has reportedly been hit with a US Department of Justice (DOJ) antitrust probe over the tactics the company allegedly employs to sell or lease its AI GPUs and data-center networking equipment, "The Information" reported. Shares of the NVIDIA stock fell 3.6% in the pre-market trading on Friday (08/02). The main complainants behind the probe appear to be a special interest group among the customers of AI GPUs, and not NVIDIA's competitors in the AI GPU industry per se. US Senator Elizabeth Warren and US progressives have been most vocal about calling upon the DOJ to investigate antitrust allegations against NVIDIA.

Meanwhile, US officials are reportedly reaching out to NVIDIA's competitors, including AMD and Intel, to gather information about the complaints. NVIDIA holds 80% of the AI GPU market, while AMD, and to a much lesser extent, Intel, have received spillover demand for AI GPUs. "The Information" report says that the complaint alleges NVIDIA pressured cloud customers to buy "multiple products". We don't know what this means, one theory holds that NVIDIA is getting them to commit to buying multiple generations of products (eg: Ampere, Hopper, and over to Blackwell); while another holds that it's getting them to buy multiple kinds of products, which include not just the AI GPUs, but also NVIDIA's first-party server systems and networking equipment. Yet another theory holds that it is bundle first-party software and services to go with the hardware, far beyond the basic software needed to get the hardware to work.

NVIDIA Blackwell's High Power Consumption Drives Cooling Demands; Liquid Cooling Penetration Expected to Reach 10% by Late 2024

With the growing demand for high-speed computing, more effective cooling solutions for AI servers are gaining significant attention. TrendForce's latest report on AI servers reveals that NVIDIA is set to launch its next-generation Blackwell platform by the end of 2024. Major CSPs are expected to start building AI server data centers based on this new platform, potentially driving the penetration rate of liquid cooling solutions to 10%.

Air and liquid cooling systems to meet higher cooling demands
TrendForce reports that the NVIDIA Blackwell platform will officially launch in 2025, replacing the current Hopper platform and becoming the dominant solution for NVIDIA's high-end GPUs, accounting for nearly 83% of all high-end products. High-performance AI server models like the B200 and GB200 are designed for maximum efficiency, with individual GPUs consuming over 1,000 W. HGX models will house 8 GPUs each, while NVL models will support 36 or 72 GPUs per rack, significantly boosting the growth of the liquid cooling supply chain for AI servers.

NVIDIA GeForce "Blackwell" Won't Arrive Before January 2025?

It appears like 2024 will go down as the second consecutive year without any new GPU generation launch from either NVIDIA or AMD. Kopite7kimi, a reliable source with NVIDIA leaks, says that the GeForce RTX 50-series "Blackwell" generation won't see a debut before the 2025 International CES (January 2025). It was earlier expected that the company would launch at least its top two SKUs—the RTX 5090 and RTX 5080—toward the end of 2024, and ramp the series up from 2025. There is no explanation behind this "delay." Like everyone else, NVIDIA could be rationing its foundry allocation of the 3 nm wafers from TSMC for its high-margin "Blackwell" AI GPUs. The company now makes over five times the revenue from selling AI GPUs than it does from gaming GPUs, so this development should come as little surprise.

Things aren't any different with NVIDIA's rivals in this space, AMD and Intel. AMD's RDNA 4 graphics architecture and the Radeon RX series GPUs based on it, aren't expected to arrive before 2025. AMD is making several architectural upgrades with RDNA 4, particularly to its ray tracing hardware; and the company is expected to build these GPUs on a new foundry node. Meanwhile, Intel's Arc B-series gaming GPUs based on the Xe2 "Battlemage" graphics architecture are expected to arrive in 2025, too, although these chips are rumored to be based on a more mature 4 nm-class foundry node.

NVIDIA Shifts Gears: Open-Source Linux GPU Drivers Take Center Stage

Just a few months after hiring Ben Skeggs, a lead maintainer of the open-source NVIDIA GPU driver for Linux kernel, NVIDIA has announced a complete transition to open-source GPU kernel modules in its upcoming R560 driver release for Linux. This decision comes two years after the company's initial foray into open-source territory with the R515 driver in May 2022. The tech giant began focusing on data center compute GPUs, while GeForce and Workstation GPU support remained in the alpha stages. Now, after extensive development and optimization, NVIDIA reports that its open-source modules have achieved performance parity with, and in some cases surpassed, their closed-source counterparts. This transition brings a host of new capabilities, including heterogeneous memory management support, confidential computing features, and compatibility with NVIDIA's Grace platform's coherent memory architectures.

The move to open-source is expected to foster greater collaboration within the Linux ecosystem and potentially lead to faster bug fixes and feature improvements. However, not all GPUs will be compatible with the new open-source modules. While cutting-edge platforms like NVIDIA Grace Hopper and Blackwell will require open-source drivers, older GPUs from the Maxwell, Pascal, or Volta architectures must stick with proprietary drivers. NVIDIA has developed a detection helper script to guide driver selection for users who are unsure about compatibility. The shift also brings changes to NVIDIA's installation processes. The default driver version for most installation methods will now be the open-source variant. This affects package managers with the CUDA meta package, run file installations and even Windows Subsystem for Linux.

Global AI Server Demand Surge Expected to Drive 2024 Market Value to US$187 Billion; Represents 65% of Server Market

TrendForce's latest industry report on AI servers reveals that high demand for advanced AI servers from major CSPs and brand clients is expected to continue in 2024. Meanwhile, TSMC, SK hynix, Samsung, and Micron's gradual production expansion has significantly eased shortages in 2Q24. Consequently, the lead time for NVIDIA's flagship H100 solution has decreased from the previous 40-50 weeks to less than 16 weeks.

TrendForce estimates that AI server shipments in the second quarter will increase by nearly 20% QoQ, and has revised the annual shipment forecast up to 1.67 million units—marking a 41.5% YoY growth.

NVIDIA GeForce RTX 50 Series "Blackwell" TDPs Leaked, All Powered by 16-Pin Connector

In the preparation season for NVIDIA's upcoming GeForce RTX 50 Series of GPUs, codenamed "Blackwell," one power supply manufacturer accidentally leaked the power configurations of all SKUs. Seasonic operates its power supply wattage calculator, allowing users to configure their systems online and get power supply recommendations. This means that the system often gets filled with CPU/GPU SKUs to accommodate the massive variety of components. This time we have the upcoming GeForce RTX 50 series, with RTX 5050 all the way up to the top RTX 5090 GPU. Starting with the GeForce RTX 5050, this SKU is expected to carry a 100 W TDP. Its bigger brother, the RTX 5060, bumps the TDP to 170 W, 55 W higher than the previous generation "Ada Lovelace" RTX 4060.

The GeForce RTX 5070, with a 220 W TDP, is in the middle of the stack, featuring a 20 W increase over the Ada generation. For higher-end SKUs, NVIDIA prepared the GeForce RTX 5080 and RTX 5090, with 350 W and 500 W TDP, respectively. This also represents a jump in TDP from Ada generation with an increase of 30 W for RTX 5080 and 50 W for RTX 5090. Interestingly, this time NVIDIA wants to unify the power connection system of the entire family with a 16-pin 12V-2x6 connector but with an updated PCIe 6.0 CEM specification. The increase in power requirements for the "Blackwell" generation across the SKUs is interesting, and we are eager to see if the performance gains are enough to balance efficiency.

AI Startup Etched Unveils Transformer ASIC Claiming 20x Speed-up Over NVIDIA H100

A new startup emerged out of stealth mode today to power the next generation of generative AI. Etched is a company that makes an application-specific integrated circuit (ASIC) to process "Transformers." The transformer is an architecture for designing deep learning models developed by Google and is now the powerhouse behind models like OpenAI's GPT-4o in ChatGPT, Antrophic Claude, Google Gemini, and Meta's Llama family. Etched wanted to create an ASIC for processing only the transformer models, making a chip called Sohu. The claim is Sohu outperforms NVIDIA's latest and greatest by an entire order of magnitude. Where a server configuration with eight NVIDIA H100 GPU clusters pushes Llama-3 70B models at 25,000 tokens per second, and the latest eight B200 "Blackwell" GPU cluster pushes 43,000 tokens/s, the eight Sohu clusters manage to output 500,000 tokens per second.

Why is this important? Not only does the ASIC outperform Hopper by 20x and Blackwell by 10x, but it also serves so many tokens per second that it enables an entirely new fleet of AI applications requiring real-time output. The Sohu architecture is so efficient that 90% of the FLOPS can be used, while traditional GPUs boast a 30-40% FLOP utilization rate. This translates into inefficiency and waste of power, which Etched hopes to solve by building an accelerator dedicated to power transformers (the "T" in GPT) at massive scales. Given that the frontier model development costs more than one billion US dollars, and hardware costs are measured in tens of billions of US Dollars, having an accelerator dedicated to powering a specific application can help advance AI faster. AI researchers often say that "scale is all you need" (resembling the legendary "attention is all you need" paper), and Etched wants to build on that.

NVIDIA MLPerf Training Results Showcase Unprecedented Performance and Elasticity

The full-stack NVIDIA accelerated computing platform has once again demonstrated exceptional performance in the latest MLPerf Training v4.0 benchmarks. NVIDIA more than tripled the performance on the large language model (LLM) benchmark, based on GPT-3 175B, compared to the record-setting NVIDIA submission made last year. Using an AI supercomputer featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, NVIDIA achieved this remarkable feat through larger scale - more than triple that of the 3,584 H100 GPU submission a year ago - and extensive full-stack engineering.

Thanks to the scalability of the NVIDIA AI platform, Eos can now train massive AI models like GPT-3 175B even faster, and this great AI performance translates into significant business opportunities. For example, in NVIDIA's recent earnings call, we described how LLM service providers can turn a single dollar invested into seven dollars in just four years running the Llama 3 70B model on NVIDIA HGX H200 servers. This return assumes an LLM service provider serving Llama 3 70B at $0.60/M tokens, with an HGX H200 server throughput of 24,000 tokens/second.

Possible Specs of NVIDIA GeForce "Blackwell" GPU Lineup Leaked

Possible specifications of the various NVIDIA GeForce "Blackwell" gaming GPUs were leaked to the web by Kopite7kimi, a reliable source with NVIDIA leaks. These are specs of the maxed out silicon, NVIDIA will carve out several GeForce RTX 50-series SKUs based on these chips, which could end up with lower shader counts than those shown here. We've known from older reports that there will be five chips in all, the GB202 being the largest, followed by the GB203, the GB205, the GB206, and the GB207. There is a notable absence of a successor to the AD104, GA104, and TU104, because NVIDIA is trying a slightly different way to approach the performance segment with this generation.

The GB202 is the halo segment chip that will drive the possible RTX 5090 (RTX 4090 successor). This chip is endowed with 192 streaming multiprocessors (SM), or 96 texture processing clusters (TPCs). These 96 TPCs are spread across 12 graphics processing clusters (GPCs), which each have 8 of them. Assuming that "Blackwell" has the same 256 CUDA cores per TPC that the past several generations of NVIDIA gaming GPUs have had, we end up with a total CUDA core count of 24,576. Another interesting aspect about this mega-chip is memory. The GPU implements the next-generation GDDR7 memory, and uses a mammoth 512-bit memory bus. Assuming the 28 Gbps memory speed that was being rumored for NVIDIA's "Blackwell" generation, this chip has 1,792 GB/s of memory bandwidth on tap!

ASRock Rack Unveils GPU Servers, Offers AI GPU Choices from All Three Brands

ASRock Rack sells the entire stack of servers a data-center could possibly want, and at Computex 2024, the company showed us their servers meant for AI GPUs. The 6U8M-GENOA2, as its name suggests, is a 6U server based on 2P AMD EPYC 9004 series "Genoa" processors in the SP5 package. You can configure it with even the variants of "Genoa" that come with 3D V-cache, for superior compute performance from the large cache. Each of the two SP5 sockets is wired to 12 DDR5 RDIMM slots, for a total of 24 memory channels. The server supports eight AMD Instinct MI300X or MI325X AI GPUs, which it wires out using Infinity Fabric links and PCIe Gen 5 x16 individually. A 3 kW 80 Plus Titanium PSU keeps the server fed. There are vacant Gen 5 x16 slots left even after connecting the GPUs, so you could give it a DPU-based 40 GbE NIC.

The 6U8X-EGS2 B100 is a 6U AI GPU server modeled along the 6U8M-GENOA2, with a couple of big changes. To begin with, the EPYC "Genoa" chips make way for a 2P Intel Xeon Socket E (LGA4677) CPU setup, for 2P Xeon 5 "Emerald Rapids" processors. Each socket is wired to 16 DDR5 DIMM slots (the processor itself has 8-channel DDR5, but this is a 2 DIMM-per-channel setup). The server integrates an NVIDIA NVSwitch that wires out NVLinks to eight NVIDIA B100 "Blackwell" AI GPUs. The server features eight HHHL PCIe Gen 5 x16, and five FHHL PCIe Gen 5 x16 connectors. There are vacant x16 slots for your DPU/NIC, you can even use an AIC NVIDIA BlueField card. The same 3 kW PSU as the "Genoa" system is also featured here.

GIGABYTE Joins COMPUTEX to Unveil Energy Efficiency and AI Acceleration Solutions

Giga Computing, a subsidiary of GIGABYTE and an industry leader in AI servers and green computing, today announced its participation in COMPUTEX and unveiling of solutions tackling complex AI workloads at scale, as well as advanced cooling infrastructure that will lead to greater energy efficiency. Additionally, to support innovations in accelerated computing and generative AI, GIGABYTE will have NVIDIA GB200 NVL72 systems available in Q1 2025. Discussions around GIGABYTE products will be held in booth #K0116 in Hall 1 at the Taipei Nangang Exhibition Center. As an NVIDIA-Certified System provider, GIGABYTE servers also support NVIDIA NIM inference microservices, part of the NVIDIA AI Enterprise software platform.

Redefining AI Servers and Future Data Centers
All new and upcoming CPU and accelerated computing technologies are being showcased at the GIGABYTE booth alongside GIGA POD, a rack-scale AI solution by GIGABYTE. The flexibility of GIGA POD is demonstrated with the latest solutions such as the NVIDIA HGX B100, NVIDIA HGX H200, NVIDIA GH200 Grace Hopper Superchip, and other OAM baseboard GPU systems. As a turnkey solution, GIGA POD is designed to support baseboard accelerators at scale with switches, networking, compute nodes, and more, including support for NVIDIA Spectrum -X to deliver powerful networking capabilities for generative AI infrastructures.

Blackwell Shipments Imminent, Total CoWoS Capacity Expected to Surge by Over 70% in 2025

TrendForce reports that NVIDIA's Hopper H100 began to see a reduction in shortages in 1Q24. The new H200 from the same platform is expected to gradually ramp in Q2, with the Blackwell platform entering the market in Q3 and expanding to data center customers in Q4. However, this year will still primarily focus on the Hopper platform, which includes the H100 and H200 product lines. The Blackwell platform—based on how far supply chain integration has progressed—is expected to start ramping up in Q4, accounting for less than 10% of the total high-end GPU market.

The die size of Blackwell platform chips like the B100 is twice that of the H100. As Blackwell becomes mainstream in 2025, the total capacity of TSMC's CoWoS is projected to grow by 150% in 2024 and by over 70% in 2025, with NVIDIA's demand occupying nearly half of this capacity. For HBM, the NVIDIA GPU platform's evolution sees the H100 primarily using 80 GB of HBM3, while the 2025 B200 will feature 288 GB of HBM3e—a 3-4 fold increase in capacity per chip. The three major manufacturers' expansion plans indicate that HBM production volume will likely double by 2025.

NVIDIA to Stick to Monolithic GPU Dies for its GeForce "Blackwell" Generation

NVIDIA's GeForce "Blackwell" generation of gaming GPUs will stick to being traditional monolithic die chips. The company will not build its next generation of chips as either disaggregated devices, or multi-chip modules. Kopite7kimi, a reliable source with NVIDIA leaks, says that the largest GPU in the generation, the "GB202," is based on a physically monolithic design. The GB202 is expected to power the flagship GeForce RTX 5090 (or RTX 4090 successor), and if NVIDIA sticking to traditional chip design for this, then it's unlikely that smaller GPUs will be any different.

In contrast, AMD started building disaggregated devices with its current RDNA 3 generation, with its top two chips, the "Navi 31" and "Navi 32," being disaggregated chips. An interesting rumor suggests that team red's RDNA 4 generation will see a transition from disaggregated chips to multi-chip modules—packages that contain multiple fully-integrated GPU dies. Back to the green camp, and NVIDIA is expected to use an advanced 4 nm-class node for its GeForce "Blackwell" GPUs.

NVIDIA's Arm-based AI PC Processor Could Leverage Arm Cortex X5 CPU Cores and Blackwell Graphics

Last week, we got confirmation from the highest levels of Dell and NVIDIA that the latter is making a client PC processor for the Windows on Arm (WoA) AI PC ecosystem that only has one player in it currently, Qualcomm. Michael Dell hinted that this NVIDIA AI PC processor would be ready in 2025. Since then, speculation has been rife about the various IP blocks NVIDIA could use in the development of this chip, the two key areas of debate have been the CPU cores and the process node.

Given that NVIDIA is gunning toward a 2025 launch of its AI PC processor, the company could implement reference Arm IP CPU cores, such as the Arm Cortex X5 "Blackhawk," and not venture out toward developing its own CPU cores on the Arm machine architecture, unlike Apple. Depending on how the market recieves its chips, NVIDIA could eventually develop its own cores. Next up, the company could use the most advanced 3 nm-class foundry node available in 2025 for its chip, such as the TSMC N3P. Given that even Apple and Qualcomm will build their contemporary notebook chips on this node, it would be a logical choice of node for NVIDIA. Then there's graphics and AI acceleration hardware.

NVIDIA RTX 5090 "Blackwell" Founders Edition to Implement the "RTX 4090 Ti" Cinderblock Design

NVIDIA's upcoming GeForce RTX 5090 Founders Edition graphics card may implement a design closely resembling the cinder block product design the company readied for its RTX 4090 Ti graphics card that never materialized into a marketable product. This sees a 4-slot thick board design, with a slender main logic PCB arranged along the plane of the motherboard, on top of which the cooling solution is mounted perpendicular to the plane, as shown in the images below. This main logic board contains the GPU, memory, and VRM. There two additional PCBs—one has the display I/O, and the other has the PCIe interface. There is a fourth disaggregated component, the 12V-2x6 receptacle, located somewhere along the top of the cooling solution.

Confirmation of NVIDIA using the RTX 4090 Ti "cinder block" board design for the RTX 5090 comes from kopite7kimi, a reliable source with NVIDIA leaks. Kopite7kimi mentions a card that has a "Main Board, IO Rigid Board and a separate PCIE slot component (perhaps it should not be considered as the third PCB)," which perfectly describes with the RTX 4090 Ti. NVIDIA had completed the design phase of this card, which made it to its cooling solution OEM (which is likely where the images leaked out from). The company probably decided against launching this product because the AMD Radeon RX 7900 XTX fell significantly short of the performance proposition of the RTX 4090.

NVIDIA Announces Financial Results for 1Q Fiscal 2025, Data Center Revenue Now 10x Gaming Revenue

NVIDIA (NASDAQ: NVDA) today reported revenue for the first quarter ended April 28, 2024, of $26.0 billion, up 18% from the previous quarter and up 262% from a year ago. For the quarter, GAAP earnings per diluted share was $5.98, up 21% from the previous quarter and up 629% from a year ago. Non-GAAP earnings per diluted share was $6.12, up 19% from the previous quarter and up 461% from a year ago. "The next industrial revolution has begun—companies and countries are partnering with NVIDIA to shift the trillion-dollar traditional data centers to accelerated computing and build a new type of data center—AI factories—to produce a new commodity: artificial intelligence," said Jensen Huang, founder and CEO of NVIDIA. "AI will bring significant productivity gains to nearly every industry and help companies be more cost- and energy-efficient, while expanding revenue opportunities.

"Our data center growth was fueled by strong and accelerating demand for generative AI training and inference on the Hopper platform. Beyond cloud service providers, generative AI has expanded to consumer internet companies, and enterprise, sovereign AI, automotive and healthcare customers, creating multiple multibillion-dollar vertical markets. "We are poised for our next wave of growth. The Blackwell platform is in full production and forms the foundation for trillion-parameter-scale generative AI. Spectrum-X opens a brand-new market for us to bring large-scale AI to Ethernet-only data centers. And NVIDIA NIM is our new software offering that delivers enterprise-grade, optimized generative AI to run on CUDA everywhere—from the cloud to on-prem data centers and RTX AI PCs—through our expansive network of ecosystem partners."
Return to Keyword Browsing
Oct 5th, 2024 05:43 EDT change timezone

New Forum Posts

Popular Reviews

Controversial News Posts