News Posts matching #HBM2E


NVIDIA DGX-A100 Systems Feature AMD EPYC "Rome" Processors

NVIDIA is leveraging the 128-lane PCI-Express gen 4.0 root complex of AMD 2nd-generation EPYC "Rome" enterprise processors in building its DGX-A100 compute systems, which are built around the new A100 "Ampere" compute processors. Each DGX-A100 block is endowed with two AMD EPYC 7742 64-core/128-thread processors in a 2P setup, totaling 128 cores/256 threads clocked at up to 3.40 GHz boost.

This 2P EPYC "Rome" setup is configured to feed PCIe gen 4.0 connectivity to eight NVIDIA A100 GPUs and an 8-port Mellanox ConnectX 200 Gbps InfiniBand NIC setup. Six NVSwitches provide NVLink connectivity, complementing the PCI-Express gen 4.0 lanes from the AMD sIODs. The memory and storage subsystem is equally jaw-dropping: 1 TB of hexadeca-channel (16-channel) DDR4 memory, two 1.92 TB NVMe gen 4.0 SSDs, and 15 TB of U.2 NVMe storage (4x 3.84 TB drives). The GPU memory of the eight A100 units adds up to 320 GB (that's 8x 40 GB, 6144-bit HBM2E). When you power it up, you're greeted with the Ubuntu Linux splash screen. All this can be yours for USD 199,000.
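
For a quick sanity check of those totals, here is a back-of-envelope tally in Python (a minimal sketch; the variable names are ours) built from the component list above:

```python
# Back-of-envelope tally of the DGX-A100 spec list above.
cpus = 2             # AMD EPYC 7742
cores_per_cpu = 64
gpus = 8             # NVIDIA A100
hbm_per_gpu_gb = 40
u2_drives = 4        # U.2 NVMe
u2_capacity_tb = 3.84

print(f"CPU cores:   {cpus * cores_per_cpu} ({cpus * cores_per_cpu * 2} threads)")
print(f"GPU memory:  {gpus * hbm_per_gpu_gb} GB HBM2E")      # 320 GB
print(f"U.2 storage: {u2_drives * u2_capacity_tb:.2f} TB")   # 15.36 TB, the quoted ~15 TB
```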

NVIDIA GA100 Scalar Processor Specs Sheet Released

NVIDIA today kicked off its GTC 2020 online event, and the centerpiece of it all is the GA100 scalar processor GPU, which debuts the "Ampere" graphics architecture. Sifting through a mountain of content, we finally found the slide that matters the most: the specifications sheet of the GA100. The GA100 is a multi-chip module with the 7 nm GPU die at its center and three HBM2E memory stacks on either side. The GPU die is built on the TSMC N7P 7 nm silicon fabrication process, measures 826 mm², and packs an unfathomable 54 billion transistors; that's not even counting the transistors on the HBM2E stacks sitting on the interposer.

The GA100 packs 6,912 FP32 CUDA cores and 3,456 independent FP64 (double-precision) CUDA cores. It has 432 third-generation tensor cores with FP64 capability. The three core types are spread across a gargantuan 108 streaming multiprocessors. The GPU has 40 GB of total memory across a 6144-bit-wide HBM2E memory interface, with 1.6 TB/s of total memory bandwidth. It has two interconnects: PCI-Express 4.0 x16 (64 GB/s) and NVLink (600 GB/s). Compute throughput values are mind-blowing: 19.5 TFLOPs classic FP32, 9.7 TFLOPs classic FP64, and 19.5 TFLOPs FP64 on the tensor cores; 156 TFLOPs TF32 (312 TFLOPs with neural-net sparsity enabled); 312 TFLOPs BFLOAT16 throughput (doubled with sparsity enabled); 312 TFLOPs FP16; 624 TOPs INT8; and 1,248 TOPs INT4. The GPU has a typical power draw of 400 W in the SXM form factor. We also found the architecture diagram, which reveals the GA100 to be two almost-independent GPUs placed on a single slab of silicon. We also have our first view of the "Ampere" streaming multiprocessor, with its FP32 and FP64 CUDA cores and 3rd-gen tensor cores. The GeForce version of this SM could feature 2nd-gen RT cores.
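
The quoted throughputs are internally consistent with a boost clock of roughly 1.41 GHz, which the sheet does not state; a small sketch (assuming each CUDA core retires one 2-FLOP fused multiply-add per clock) backs it out of the FP32 figure:

```python
# Back the implied boost clock out of the quoted FP32 throughput, then
# check the other figures against it (core counts from the spec sheet).
fp32_cores = 6912
fp64_cores = 3456
fp32_tflops = 19.5

# Assumption: each CUDA core retires one fused multiply-add (2 FLOPs) per clock.
clock_ghz = fp32_tflops * 1e3 / (fp32_cores * 2)
print(f"implied boost clock: {clock_ghz:.2f} GHz")               # ~1.41 GHz

print(f"FP64: {fp64_cores * 2 * clock_ghz / 1e3:.2f} TFLOPs")    # 9.75, matching the quoted 9.7
print(f"TF32 tensor: {fp32_tflops * 8:.0f} TFLOPs")              # 156 TFLOPs (8x the FP32 rate)
```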

NVIDIA Ampere A100 Has 54 Billion Transistors, World's Largest 7nm Chip

Not long ago, Intel's Raja Koduri claimed that the Xe HP "Ponte Vecchio" silicon was the "big daddy" of Xe GPUs and the "largest chip co-developed in India," larger than the 35 billion-transistor Xilinx VU19P FPGA co-developed in the country. It turns out that NVIDIA is in the mood for setting records: the "Ampere" A100 silicon has 54 billion transistors crammed into a single 7 nm die (not counting the transistors of the HBM2E memory stacks).

NVIDIA claims a 20-times boost in both AI inference and single-precision (FP32) performance over its "Volta"-based predecessor, the Tesla V100. The chip also offers a 2.5x gain in FP64 performance over "Volta." NVIDIA has also invented a new number format for AI compute, called TF32 (tensor float 32). TF32 uses the 10-bit mantissa of FP16 and the 8-bit exponent of FP32, resulting in a new, efficient format. NVIDIA attributes its 20x performance gains over "Volta" to this. The 3rd-generation tensor cores introduced with "Ampere" support FP64 natively. Another key design focus for NVIDIA is leveraging the "sparsity" phenomenon in neural nets to reduce their size and improve performance.
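
Counting the sign bit, TF32 is effectively a 19-bit format (1 sign + 8 exponent + 10 mantissa bits) stored in FP32 containers. A minimal Python sketch of the idea (our own emulation, using simple truncation rather than the hardware's rounding) shows the effect on precision and range:

```python
import struct

def to_tf32(x: float) -> float:
    """Emulate TF32: keep FP32's sign and 8-bit exponent, but only the
    top 10 of its 23 mantissa bits (simple truncation, not rounding)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_tf32(3.14159265))  # 3.140625 -> roughly 3 decimal digits of precision
print(to_tf32(1e38))        # ~1e38 still representable; FP16 overflows past 65504
```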

NVIDIA Tesla A100 GPU Pictured

Thanks to the sources of VideoCardz, we now have the first picture of the next-generation NVIDIA Tesla A100 graphics card. Designed for compute-oriented applications, the Tesla A100 is a socketed GPU built for NVIDIA's proprietary SXM socket. In a post a few days ago, we suspected that you might be able to fit the Tesla A100 in the socket of the previous Volta V100 GPUs, as it uses a similar SXM socket. However, the mounting holes have been rearranged, so this one requires a new socket/motherboard. The Tesla A100 is based on the GA100 GPU die, whose specifications we don't yet know. From the picture, we can only see that there is one very big die attached to six HBM modules, most likely HBM2E. Beyond that, everything else is unknown. More details are expected to be announced today at the GTC 2020 digital keynote.

NVIDIA DGX A100 is its "Ampere" Based Deep-learning Powerhouse

NVIDIA will give its DGX line of pre-built deep-learning research workstations its next major update in the form of the DGX A100. This system will likely pack a number of the company's upcoming Tesla A100 scalar compute accelerators based on its next-generation "Ampere" architecture and "GA100" silicon. The A100 came to light through fresh trademark applications by the company. As for specs and numbers, we don't know yet. The "Volta"-based DGX-2 packs up to sixteen "GV100"-based Tesla boards, adding up to 81,920 CUDA cores and 512 GB of HBM2 memory. One can expect NVIDIA to beat this count. The leading "Ampere" part could be HPC-focused, featuring large CUDA and tensor core counts, besides exotic memory such as HBM2E. We should learn more about it at the upcoming GTC 2020 online event.

SK hynix Inc. Reports First Quarter 2020 Results

SK hynix Inc. today announced financial results for its first quarter of 2020, ended March 31, 2020. The consolidated revenue for the first quarter of 2020 was 7.20 trillion won, while the operating profit amounted to 800 billion won and the net income to 649 billion won. Operating margin for the quarter was 11% and net margin was 9%.

Despite abrupt changes in external business conditions due to COVID-19, our first-quarter revenue and operating income increased by 4% and 239% quarter-over-quarter (QoQ), respectively, driven by increased sales of server products, yield-rate improvements, and cost reductions. For DRAM, strong demand from server clients offset weak mobile demand, which declined due to both the seasonal slowdown and the COVID-19 impact. As a result, the Company's DRAM bit shipments declined by only 4% QoQ, while the DRAM average selling price increased by 3% QoQ.

Rambus Designs HBM2E Controller and PHY

Rambus, a maker of various interface IP solutions, today announced the latest addition to its high-speed memory interface IP product portfolio in the form of a High Bandwidth Memory 2E (HBM2E) controller and physical layer (PHY) IP solution. The two IPs enable customers to fully integrate HBM2E memory into their products, given that Rambus provides a complete solution for controlling and interfacing the memory. The design Rambus offers supports 12-high DRAM stacks of up to 24 Gb devices, making for up to 36 GB of memory per 3D stack. A single stack runs at 3.2 Gbps over a 1024-bit-wide interface, delivering 410 GB/s of bandwidth.
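
Those headline numbers are straightforward to reproduce from the interface parameters; a minimal sketch (the function names are just illustrative):

```python
def stack_bandwidth_gbs(pin_rate_gbps: float, bus_width_bits: int = 1024) -> float:
    """Peak bandwidth of one HBM stack: pins x per-pin rate, converted to bytes."""
    return pin_rate_gbps * bus_width_bits / 8

def stack_capacity_gb(stack_height: int, device_gbit: int) -> float:
    """Capacity of one stack: DRAM dies per stack x density per die, in gigabytes."""
    return stack_height * device_gbit / 8

print(stack_bandwidth_gbs(3.2))    # 409.6 GB/s, the quoted ~410 GB/s
print(stack_capacity_gb(12, 24))   # 36.0 GB per 12-high stack of 24 Gb devices
```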

The HBM2E controller core is DFI 3.1-compatible and supports logic interfaces such as AXI, OCP, or a custom one, so customers can choose how to integrate the core into their designs. With a purchase of the HBM2E IP, Rambus provides source code written in a hardware description language (HDL) and a GDSII file containing the layout of the interface.

Samsung Launches 3rd-Generation "Flashbolt" HBM2E Memory

Samsung Electronics, the world leader in advanced memory technology, today announced the market launch of 'Flashbolt', its third-generation High Bandwidth Memory 2E (HBM2E). The new 16-gigabyte (GB) HBM2E is uniquely suited to maximizing high-performance computing (HPC) systems, helping system manufacturers advance their supercomputers, AI-driven data analytics, and state-of-the-art graphics systems in a timely manner.

"With the introduction of the highest performing DRAM available today, we are taking a critical step to enhance our role as the leading innovator in the fast-growing premium memory market," said Cheol Choi, executive vice president of Memory Sales & Marketing at Samsung Electronics. "Samsung will continue to deliver on its commitment to bring truly differentiated solutions as we reinforce our edge in the global memory marketplace."

SK hynix Displays its Semiconductor Technologies Leading the 4th Industrial Revolution

SK hynix Inc. presents its innovative semiconductor technologies leading the 4th Industrial Revolution at CES 2020, the world's largest trade show for IT and consumer electronics, held in Las Vegas, USA, from January 7-10, 2020. In line with its "Memory Centric World" theme, SK hynix depicts a futuristic city that effectively utilizes enormous amounts of data. The Company also showcases its semiconductor solutions across six crucial business fields: artificial intelligence (AI), augmented reality (AR)/virtual reality (VR), automotive, Internet of Things (IoT), big data, and 5G.

Headlining at CES 2020 are SK hynix's memory solutions, including HBM2E, DDR5 for servers, and SSDs, which are already highly regarded and widely used in 4th Industrial Revolution fields such as 5G and AI for their excellent stability, speed, power consumption, and density. Other cutting-edge products set to make headlines in January are the Company's highly durable LPDDR4X and eMMC 5.1, which are optimized for automobiles. What's more, SK hynix is displaying its LPDDR5 and UFS, which enhance the performance of 5G smartphones, as well as its CIS (CMOS image sensor), essential in establishing effective environments for AR/VR and IoT.

GLOBALFOUNDRIES and SiFive to Deliver Next Level of High Bandwidth Memory on 12LP

GLOBALFOUNDRIES (GF) and SiFive, Inc. announced today at the GLOBALFOUNDRIES Technology Conference (GTC) in Taiwan that they are working to extend high DRAM performance levels with High Bandwidth Memory (HBM2E) on GF's recently announced 12LP+ FinFET solution, with 2.5D packaging design services to enable fast time-to-market for artificial intelligence (AI) applications.

In order to achieve the capacity and bandwidth needed for data-intensive AI training applications, system designers are challenged with squeezing more bandwidth into a smaller area while maintaining a reasonable power profile. SiFive's customizable high-bandwidth memory interface on GF's 12LP platform and 12LP+ solution will enable easy integration of high-bandwidth memory into a single system-on-chip (SoC) solution, delivering fast, power-efficient data processing for AI applications in the computing and wired infrastructure markets.

SK hynix Inc. Reports Third Quarter 2019 Results

SK hynix Inc. today announced financial results for its third quarter of 2019, ended September 30, 2019. The consolidated third-quarter revenue was 6.84 trillion won, while the operating profit amounted to 473 billion won and the net income to 495 billion won. Operating margin and net margin for the quarter were both 7%.

Revenue in the third quarter increased by 6% quarter-over-quarter (QoQ) as demand began to pick up. However, operating profit fell by 26% QoQ, as DRAM unit-cost reductions were not enough to offset the price drop. DRAM bit shipments increased by 23% QoQ, as the Company actively responded to new products in the mobile market while purchases from some data center customers also increased. DRAM prices remained weak during the quarter, leading to a 16% drop in the average selling price, though the decline was smaller than in the previous quarter.

SK Hynix Announces its HBM2E Memory Products, 460 GB/s and 16GB per Stack

SK Hynix Inc. announced today that it has developed an HBM2E DRAM product with the industry's highest bandwidth. The new HBM2E boasts approximately 50% higher bandwidth and 100% more capacity compared to the previous HBM2. SK Hynix's HBM2E supports over 460 GB (gigabytes) per second of bandwidth, based on a 3.6 Gbps (gigabits per second) per-pin speed across 1,024 data I/Os (inputs/outputs). Through utilization of TSV (through-silicon via) technology, a maximum of eight 16-gigabit chips are vertically stacked, forming a single, dense package with 16 GB of capacity.
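
Both headline figures follow directly from the per-pin speed and I/O count quoted above; as a worked check:

\[
\text{Bandwidth per stack} = \frac{1024 \times 3.6\ \text{Gbps}}{8\ \text{bits/byte}} = 460.8\ \text{GB/s}
\]
\[
\text{Capacity per stack} = 8 \times 16\ \text{Gb} = 128\ \text{Gb} = 16\ \text{GB}
\]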

SK Hynix's HBM2E is an optimal memory solution for the Fourth Industrial Era, supporting high-end GPUs, supercomputers, machine learning, and artificial intelligence systems that require the maximum level of memory performance. Unlike commodity DRAM products, which take on module package forms and are mounted on system boards, an HBM chip is interconnected closely with processors such as GPUs and logic chips, distanced only a few µm apart, which allows even faster data transfer.

Samsung Electronics Introduces New Flashbolt HBM2E High Bandwidth Memory

Samsung Electronics Co., Ltd., the world leader in advanced semiconductor technology, today announced its new High Bandwidth Memory 2E (HBM2E) product at NVIDIA's GPU Technology Conference (GTC), delivering the highest DRAM performance levels for use in next-generation supercomputers, graphics systems, and artificial intelligence (AI).

The new solution, Flashbolt, is the industry's first HBM2E to deliver a 3.2 gigabits-per-second (Gbps) data transfer speed per pin, 33 percent faster than the previous-generation HBM2. Flashbolt has a density of 16 Gb per die, double the capacity of the previous generation. With these improvements, a single Samsung HBM2E package offers 410 gigabytes per second (GB/s) of data bandwidth and 16 GB of memory.
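
As a quick consistency check, and assuming the 2.4 Gbps per-pin speed of the fastest previous-generation HBM2 ("Aquabolt") as the baseline:

\[
\frac{3.2\ \text{Gbps}}{2.4\ \text{Gbps}} \approx 1.33, \qquad \frac{1024 \times 3.2\ \text{Gbps}}{8\ \text{bits/byte}} = 409.6\ \text{GB/s} \approx 410\ \text{GB/s}
\]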