News Posts matching #hyperscaler

AMD EPYC "Turin" with 192 Cores and 384 Threads Delivers Almost 40% Higher Performance Than Intel Xeon 6

AMD has unveiled its latest EPYC processors, codenamed "Turin," featuring Zen 5 and Zen 5C dense cores. Phoronix's thorough testing reveals remarkable advancements in performance, efficiency, and value. The new lineup includes the EPYC 9575F (64-core), EPYC 9755 (128-core), and EPYC 9965 (192-core) models, all showing impressive capabilities across various server and HPC workloads. In benchmarks, a dual-socket configuration of the 128-core EPYC 9755 Turin outperformed Intel's dual Xeon "Granite Rapids" 6980P setup with MRDIMM-8800 by 40% in the geometric mean of all tests. Surprisingly, even a single EPYC 9755 or EPYC 9965 matched the dual Xeon 6980P in expanded tests with regular DDR5-6400. Within AMD's lineup, the EPYC 9755 showed a 1.55x performance increase over its predecessor, the 96-core EPYC 9654 "Genoa". The EPYC 9965 surpassed the dual EPYC 9754 "Bergamo" by 45%.

These gains come with improved efficiency. Power consumption increased moderately, but the larger performance uplift means better performance per watt overall. For example, the EPYC 9965 used 32% more power than the EPYC 9654 but delivered 1.55x the performance. Power consumption remains competitive: the EPYC 9965 averaged 275 Watts (peak 461 Watts), the EPYC 9755 averaged 324 Watts (peak 500 Watts), while Intel's Xeon 6980P averaged 322 Watts (peak 547 Watts). AMD's pricing strategy adds to the appeal: the 192-core model is priced at $14,813, compared to Intel's 128-core CPU at $17,800. This competitive pricing, combined with superior performance per dollar and per watt, has resonated with hyperscalers, and estimates suggest 50-60% of hyperscale deployments now use AMD processors.
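To put the value and efficiency claims in perspective, the short Python sketch below computes performance per watt and per dollar from the averages quoted above. The relative performance figures are normalized to the dual Xeon 6980P result and are illustrative assumptions (Phoronix quotes the roughly 40% advantage for the dual EPYC 9755 configuration), not raw benchmark data.

```python
# Rough perf-per-watt and perf-per-dollar arithmetic from the figures quoted
# above. Relative performance is normalized to the Xeon 6980P geomean; the
# EPYC ratio is an illustrative assumption, not Phoronix's raw data.

systems = {
    # name: (relative_performance, average_power_watts, list_price_usd)
    "EPYC 9965 (192c)":  (1.40, 275, 14_813),
    "Xeon 6980P (128c)": (1.00, 322, 17_800),
}

for name, (perf, watts, price) in systems.items():
    print(f"{name:18s}  perf/W: {perf / watts:.4f}   perf per $1k: {perf / (price / 1000):.3f}")
```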

NVIDIA "Blackwell" GB200 Server Dedicates Two-Thirds of Space to Cooling at Microsoft Azure

Late Tuesday, Microsoft Azure shared an interesting picture on the social media platform X, showcasing the pinnacle of GPU-accelerated servers: NVIDIA "Blackwell" GB200-powered AI systems. Microsoft is one of NVIDIA's largest customers, and the company often receives products first to integrate into its cloud and internal infrastructure. NVIDIA also takes feedback from companies like Microsoft when designing future products, such as the now-canceled NVL36x2 system. The picture below shows a massive cluster in which compute occupies roughly one-third of the space, while the remaining two-thirds is dedicated to closed-loop liquid cooling.

The entire system is connected using InfiniBand networking, a staple of GPU-accelerated systems thanks to its low-latency packet transfer. While details of the system are scarce, we can see that the integrated closed-loop liquid cooling allows the GPU servers to fit in a 1U form factor for increased density. Because these systems will go into the wider Microsoft Azure data centers, they need to be easy to maintain and cool. Microsoft's data centers can only handle so much power draw and heat output, so these systems are built to internal specifications that Microsoft defines. There are more compute-dense systems, of course, like NVIDIA's NVL72, but hyperscalers usually opt for custom solutions that fit their data center specifications. Finally, Microsoft noted that more details about its GB200-powered AI systems will be shared at the Microsoft Ignite conference in November.

Oracle Offers First Zettascale Cloud Computing Cluster

Oracle today announced the first zettascale cloud computing clusters accelerated by the NVIDIA Blackwell platform. Oracle Cloud Infrastructure (OCI) is now taking orders for the largest AI supercomputer in the cloud—available with up to 131,072 NVIDIA Blackwell GPUs.
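For a sense of what "zettascale" means at that GPU count, here is a back-of-the-envelope Python sketch. The per-GPU rate used is an assumed low-precision (FP4, sparse) Blackwell peak figure, not a number from Oracle's announcement.

```python
# Back-of-the-envelope check of the "zettascale" label for a 131,072-GPU
# cluster. The per-GPU throughput is an assumed peak FP4 (sparse) figure
# for Blackwell-class GPUs, not a value from Oracle's announcement.

gpus = 131_072
pflops_per_gpu = 20                           # assumed peak PFLOPS per GPU (FP4, sparse)

total_pflops = gpus * pflops_per_gpu
total_zettaflops = total_pflops / 1_000_000   # 1 zettaFLOPS = 1,000,000 petaFLOPS

print(f"{total_pflops:,} PFLOPS = {total_zettaflops:.2f} zettaFLOPS peak (low precision)")
```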

"We have one of the broadest AI infrastructure offerings and are supporting customers that are running some of the most demanding AI workloads in the cloud," said Mahesh Thiagarajan, executive vice president, Oracle Cloud Infrastructure. "With Oracle's distributed cloud, customers have the flexibility to deploy cloud and AI services wherever they choose while preserving the highest levels of data and AI sovereignty."

Linux Patch Boosts Intel 5th Generation Xeon "Emerald Rapids" Performance by up to 38% with up to 18% Less Power

Intel's 5th generation Xeon Scalable processors, codenamed Emerald Rapids, have been shipping since late 2023 and are installed in numerous servers today. However, Emerald Rapids appears to have more performance and efficiency headroom than was apparent at launch. According to Phoronix, which reported on a Linux kernel patch sent to the Linux Kernel Mailing List (LKML), 5th generation Xeon machines can gain up to 38% more performance while using up to 18% less power. Canonical (maker of Ubuntu Linux) engineer Pedro Henrique Kopper, who explained the patch on the LKML, showed that changing a single line of code yields this massive improvement.

Ubuntu Linux, like many other distributions, ships with the Energy Performance Preference (EPP) "balance_performance" profile as the out-of-the-box default, which on Emerald Rapids maps to an EPP value of 128. Changing that value to 32 yields a large performance improvement while also reducing power draw. Users who manually set the "performance" EPP mode should not expect any gains from this patch; the change only affects "balance_performance", which previously struggled to balance performance and power. The new default benefits machines running at stock settings, which matters most in data centers, where the demand for more performance at lower power keeps growing. Hyperscalers like Amazon, Google, and Meta, which may run tens of thousands of these CPUs at default settings to keep them stable and well-cooled, can now enjoy a sizeable performance increase with less power consumed.
Below, you can see the patch quote as well as more performance/power measurements.
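The patch itself retunes the kernel-internal mapping of the "balance_performance" profile for Emerald Rapids; from user space, the same knob is exposed through the standard cpufreq/intel_pstate sysfs interface. The Python sketch below, which assumes a Linux system using the intel_pstate driver, simply reads the current EPP preference for each CPU policy and can optionally write a named profile back; it is an illustration, not the kernel patch.

```python
# Inspect, and optionally override, the Energy Performance Preference (EPP)
# via the standard cpufreq sysfs interface. This is a user-space illustration
# of the knob the kernel patch retunes; it is not the patch itself.
# Assumes Linux with the intel_pstate driver; writing requires root.

import glob
import sys

policies = sorted(glob.glob("/sys/devices/system/cpu/cpufreq/policy*"))

for policy in policies:
    with open(f"{policy}/energy_performance_preference") as f:
        print(f"{policy.rsplit('/', 1)[-1]}: EPP = {f.read().strip()}")

# Optional: pass a profile name (e.g. "balance_performance" or "performance")
# on the command line to apply it to every policy (root only).
if len(sys.argv) > 1:
    for policy in policies:
        with open(f"{policy}/energy_performance_preference", "w") as f:
            f.write(sys.argv[1])
```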

Alphawave Semi Launches Industry's First 3nm UCIe IP with TSMC CoWoS Packaging

Alphawave Semi, a global leader in high-speed connectivity and compute silicon for the world's technology infrastructure, has achieved the industry's first successful 3 nm silicon bring-up of Universal Chiplet Interconnect Express (UCIe) Die-to-Die (D2D) IP with TSMC's Chip-on-Wafer-on-Substrate (CoWoS) advanced packaging technology.

The complete PHY and Controller subsystem was developed in collaboration with TSMC and targets applications such as hyperscaler data centers, high-performance computing (HPC), and artificial intelligence (AI).

X-Silicon Startup Wants to Combine RISC-V CPU, GPU, and NPU in a Single Processor

While we are all used to systems with a CPU, a GPU, and, more recently, an NPU, X-Silicon Inc. (XSi), a startup founded by Silicon Valley veterans, has unveiled an interesting RISC-V processor that can handle CPU, GPU, and NPU workloads on a single chip. This innovative chip architecture, which will be open source, aims to provide a flexible and efficient solution for a wide range of applications, including artificial intelligence, virtual reality, automotive systems, and IoT devices. The new microprocessor combines a RISC-V CPU core with vector capabilities and GPU acceleration into a single chip, creating a versatile all-in-one processor. By integrating the functionality of a CPU and GPU into a single core, X-Silicon's design offers several advantages over traditional architectures. The chip utilizes the open-source RISC-V instruction set architecture (ISA) for both CPU and GPU operations, running a single instruction stream. This approach promises a lower memory footprint and improved efficiency, as there is no need to copy data between separate CPU and GPU memory spaces.

Called the C-GPU architecture, X-Silicon's design uses a RISC-V vector core with 16 32-bit FPUs and a scalar ALU for processing integer as well as floating-point instructions. A unified instruction decoder feeds the cores, which are connected to a thread scheduler, texture unit, rasterizer, clipping engine, neural engine, and pixel processors. Output is written to a frame buffer, which feeds the video engine for display output. Each core can be programmed individually for HPC, AI, video, or graphics workloads. Since hardware is unusable without software, X-Silicon is working on OpenGL ES, Vulkan, Mesa, and OpenCL support. Additionally, the company plans to release a hardware abstraction layer (HAL) for direct chip programming. According to Jon Peddie Research (JPR), the industry has been seeking an open-standard GPU that is flexible and scalable enough to support various markets. X-Silicon's CPU/GPU hybrid chip aims to address this need by providing manufacturers with a single, open chip design that can handle any desired workload. XSi gave no timeline, but it plans to distribute the IP to OEMs and hyperscalers, so first silicon is still some way off.

US Government Wants Nuclear Plants to Offload AI Data Center Expansion

The expansion of AI technology affects not only the production and demand for graphics cards but also the electricity grid that powers them. Data centers hosting thousands of GPUs are becoming more common, and the industry has been building new facilities for GPU-enhanced servers to serve the need for more AI. These powerful GPUs often consume over 500 Watts per card, and NVIDIA's latest Blackwell B200 GPU has a TGP of 1000 Watts, a full kilowatt. Such kilowatt GPUs will be deployed in data centers with tens of thousands of cards, resulting in multi-megawatt facilities. To reduce the load on the national electricity grid, US President Joe Biden's administration has been in discussions with big tech companies about re-evaluating their power sources, possibly using smaller nuclear plants. In an Axios interview, Energy Secretary Jennifer Granholm noted that "AI itself isn't a problem because AI could help to solve the problem." The problem is the load on the national electricity grid, which can't sustain the rapid expansion of AI data centers.
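As a rough illustration of why grid capacity becomes the bottleneck, the sketch below estimates facility power from the GPU count and the roughly 1 kW per-card figure mentioned above. The host overhead and PUE (power usage effectiveness) values are assumptions chosen for illustration.

```python
# Rough facility-power estimate for a large GPU deployment. The 1 kW per-GPU
# figure follows the B200 TGP mentioned above; host overhead and PUE are
# assumed values for illustration only.

gpu_count = 20_000        # "tens of thousands of cards"
gpu_power_kw = 1.0        # Blackwell B200 TGP of about 1 kW
host_overhead_kw = 0.5    # assumed per-GPU share of CPUs, NICs, storage, fans
pue = 1.3                 # assumed power usage effectiveness of the facility

it_load_mw = gpu_count * (gpu_power_kw + host_overhead_kw) / 1000
facility_mw = it_load_mw * pue

print(f"IT load: {it_load_mw:.0f} MW, total facility draw: {facility_mw:.0f} MW")
```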

The Department of Energy (DOE) has reportedly been talking with firms, most notably hyperscalers like Microsoft, Google, and Amazon, urging them to consider nuclear fission and fusion power plants to satisfy the power needs of AI expansion. We have already discussed Microsoft's plan to place a nuclear reactor near one of its data center facilities to help manage the load of thousands of GPUs running AI training and inference. This time, however, it is not just Microsoft: other tech giants are reportedly considering nuclear as well, since they all need to offload their AI expansion from the US national power grid. Nuclear power supplies only about 20% of US electricity, and the DOE is currently financing the restoration and return to service of Holtec's 800-MW Palisades nuclear generating station with $1.52 billion in funding. Microsoft is investing in a small modular reactor (SMR) and microreactor energy strategy, which could serve as an example for other big tech companies to follow.

Google: CPUs are Leading AI Inference Workloads, Not GPUs

Today's AI infrastructure expansion is mostly fueled by GPU-accelerated servers. Yet Google, one of the world's largest hyperscalers, has noted that CPUs still handle a leading share of AI/ML workloads, based on internal analysis of its Google Cloud services. At a Tech Field Day event, Brandon Royal, product manager at Google Cloud, explained where CPUs fit in today's AI landscape. The AI lifecycle is divided into two parts: training and inference. Training requires massive compute capacity, along with enormous memory capacity, to fit ever-expanding AI models into memory. The latest models, like GPT-4 and Gemini, contain billions of parameters and require thousands of GPUs or other accelerators working in parallel to train efficiently.

Inference, on the other hand, is less compute-intensive but still benefits from acceleration. During inference, the pre-trained model is optimized and deployed to make predictions on new data. While less compute is needed than for training, latency and throughput are essential for real-time inference. Google found that, while GPUs are ideal for the training phase, models are often optimized to run inference on CPUs, meaning many customers choose CPUs for AI inference for a wide variety of reasons.
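To make the latency versus throughput distinction concrete, here is a toy Python sketch that measures both for a single dense layer running on the CPU with NumPy. It is only an illustration of the two metrics, not a representation of Google's models or serving stack.

```python
# Toy illustration of the two inference metrics discussed above: per-request
# latency and overall throughput, measured for a single dense layer on CPU.
# Real deployments would use an optimized runtime; this is illustrative only.

import time
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((4096, 4096)).astype(np.float32)

def infer(batch):
    # Stand-in for a model forward pass: one dense layer with ReLU.
    return np.maximum(batch @ weights, 0.0)

for batch_size in (1, 8, 64):
    batch = rng.standard_normal((batch_size, 4096)).astype(np.float32)
    runs = 20
    start = time.perf_counter()
    for _ in range(runs):
        infer(batch)
    elapsed = time.perf_counter() - start
    print(f"batch={batch_size:3d}  latency={elapsed / runs * 1000:7.2f} ms  "
          f"throughput={batch_size * runs / elapsed:8.1f} samples/s")
```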

Arm Launches Next-Generation Neoverse CSS V3 and N3 Designs for Cloud, HPC, and AI Acceleration

Last year, Arm introduced its Neoverse Compute Subsystem (CSS) for the N2 and V2 series of data center processors, providing a reference platform for the development of efficient Arm-based chips. Major cloud service providers, such as AWS with Graviton 4 and Trainium 2, Microsoft with Cobalt 100 and Maia 100, and even NVIDIA with the Grace CPU and BlueField DPUs, are already utilizing custom Arm server CPU and accelerator designs based on the CSS foundation in their data centers. The CSS allows hyperscalers to optimize Arm processor designs specifically for their workloads, focusing on efficiency rather than outright performance. Today, Arm has unveiled the next-generation CSS N3 and V3 for even greater efficiency and AI inferencing capabilities. The N3 design provides up to 32 high-efficiency cores per die with improved branch prediction and larger caches to boost AI performance by 196%, while the V3 design scales up to 64 cores and is 50% faster overall than previous generations.

Both the N3 and V3 leverage advanced features like DDR5, PCIe 5.0, CXL 3.0, and chiplet architecture, continuing Arm's push to make chiplets the standard for data center and cloud architectures. The chiplet approach enables customers to connect their own accelerators and other chiplets to the Arm cores via UCIe interfaces, reducing costs and time-to-market. Looking ahead, Arm has a clear roadmap for its Neoverse platform. The upcoming CSS V4 "Adonis" and N4 "Dionysus" designs will build on the improvements in the N3 and V3, advancing Arm's goal of greater efficiency and performance using optimized chiplet architectures. As more major data center operators introduce custom Arm-based designs, the Neoverse CSS aims to provide a flexible, efficient foundation to power the next generation of cloud computing.

Samsung Lands Significant 2 nm AI Chip Order from Unnamed Hyperscaler

This week in its earnings call, Samsung announced that its foundry business has received a significant order for 2 nm AI chips, marking a major win for its advanced fabrication technology. The unnamed customer has contracted Samsung to produce AI accelerators using its upcoming 2 nm process node, which promises significant gains in performance and efficiency over today's leading-edge chips. Along with the AI chips, the deal includes supporting HBM and advanced packaging, indicating a large-scale and complex project. Industry sources speculate the order may be from a major hyperscaler such as Google, Microsoft, or Alibaba, all of which are aggressively expanding their AI capabilities. Competition for AI chip contracts has heated up as the field becomes crucial for data centers, autonomous vehicles, and other emerging applications. Samsung said demand recovery in 2023 across smartphones, PCs, and enterprise hardware will fuel growth for its broader foundry business. It is forging ahead with 3 nm production while eyeing 2 nm for launch around 2025.

Compared to its 3 nm process, Samsung's 2 nm node aims to increase power efficiency by 25% and boost performance by 12% while reducing chip area by 5%. The new order validates Samsung's billion-dollar investments in next-generation manufacturing. It also bolsters Samsung's position against Taiwan-based TSMC, which holds the bulk of the foundry market share. TSMC landed Apple as its first 2 nm customer, while Intel announced 5G infrastructure chip orders from Ericsson and Faraday Technology using its "Intel 18A" node. With rivals securing major customers, Samsung is pricing 2 nm aggressively, and reports indicate Qualcomm may shift some flagship mobile chips to Samsung's foundry at the 2 nm node; if yields are good, the node has great potential to win further business.