News Posts matching #HPC


Synopsys Accelerates Multi-Die Designs with Industry's First Complete HBM3 IP and Verification Solutions

Synopsys, Inc. today announced the industry's first complete HBM3 IP solution, including controller, PHY, and verification IP for 2.5D multi-die package systems. HBM3 technology helps designers meet essential high-bandwidth and low-power memory requirements for system-on-chip (SoC) designs targeting high-performance computing, AI and graphics applications. Synopsys' DesignWare HBM3 Controller and PHY IP, built on silicon-proven HBM2E IP, leverage Synopsys' interposer expertise to provide a low-risk solution that enables high memory bandwidth at up to 921 GB/s.

The Synopsys verification solution, including Verification IP with built-in coverage and verification plans, off-the-shelf HBM3 memory models for ZeBu emulation, and HAPS prototyping system, accelerates verification from HBM3 IP to SoCs. To accelerate development of HBM3 system designs, Synopsys' 3DIC Compiler multi-die design platform provides a fully integrated architectural exploration, implementation and system-level analysis solution.

European Processor Initiative EPAC 1.0 RISC-V Test Chip Samples Delivered

The European Processor Initiative (EPI) https://www.european-processor-initiative.eu/, a project with 28 partners from 10 European countries that aims to help the EU achieve independence in HPC chip technologies and HPC infrastructure, is proud to announce that EPAC 1.0 RISC-V Test Chip samples were delivered to EPI and initial tests of their operation were successful.

One key segment of EPI activities is to develop and demonstrate fully European-grown processor IPs based on the RISC-V Instruction Set Architecture, providing power-efficient and high-throughput accelerator cores named EPAC (European Processor Accelerators).

Revenue of Top 10 IC Design (Fabless) Companies Reaches US$29.8 Billion for 2Q21, Though Growth May Potentially Slow in 2H21, Says TrendForce

In view of the ongoing production capacity shortage in the semiconductor industry and the resultant price hike of chips, revenue of the top 10 IC design companies for 2Q21 reached US$29.8 billion, a 60.8% YoY increase, according to TrendForce's latest investigations. In particular, Taiwanese companies put up remarkable performances during this period, with both MediaTek and Novatek posting YoY growths of more than 95%. AMD, on the other hand, experienced a nearly 100% YoY revenue growth, the highest among the top 10.

TrendForce indicates that the ranking of the top five companies for 2Q21 remained unchanged from the previous quarter, although there were major changes in the 6th to 10th spots. More specifically, after finalizing its acquisition of Inphi, Marvell experienced a major revenue growth and leapfrogged Xilinx and Realtek in the rankings from 9th place in 1Q21 to 7th place in 2Q21.

Revenue of Top 10 OSAT Companies for 2Q21 Reaches US$7.88 Billion Due to Strong Demand and Increased Package/Test Prices, Says TrendForce

Despite the intensifying COVID-19 pandemic that swept Taiwan in 2Q21, the domestic OSAT (outsourced semiconductor assembly and test) industry remained largely intact, according to TrendForce's latest investigations. Global sales of large-sized TVs were brisk thanks to major sporting events such as the Tokyo Olympics and UEFA Euro 2020. Likewise, the proliferation of WFH and distance learning applications propelled the demand for IT products, while the automotive semiconductor and data center markets also showed upward trajectories. Taking into account the above factors, OSAT companies raised their quotes in response, resulting in a 26.4% YoY increase in the top 10 OSAT companies' revenue to US$7.88 billion for 2Q21.

TrendForce indicates that, in light of the ongoing global chip shortage and the growing production capacities of foundries/IDMs in the upstream semiconductor supply chain, OSAT companies gradually increased their CAPEX and expanded their fabs and equipment in order to meet the persistently growing client demand. However, the OSAT industry still faces an uncertain future in 2H21 due to the Delta variant's global surge and the health crisis taking place in Southeast Asia, home to a significant number of OSAT facilities.

Tachyum Boots Linux on Prodigy FPGA

Tachyum Inc. today announced that it has successfully executed the Linux boot process on the field-programmable gate array (FPGA) prototype of its Prodigy Universal Processor, within two months of taking delivery of the IO motherboard from manufacturing. This achievement proves the stability of the Prodigy emulation system and allows the company to move forward with additional testing before advancing to tape out.

Tachyum engineers were able to perform the Linux boot, execute a short user-mode program and shut down the system on the fully functional FPGA emulation system. This successful test proves not only that the basic processor is stable, but also that interrupts, exceptions, timing, and system-mode transitions are stable. This is a key milestone that dramatically reduces risk: booting and reliably running large and complex software like Linux on the Tachyum FPGA processor prototype shows that verification and hardware stability are past the most difficult turning point, and it is now obvious that verification and testing should successfully complete in the coming months. Designers are now shifting their attention to debug and verification processes, running hundreds of trillions of test cycles over the next few months, and running large-scale user-mode applications with compatibility testing to get the processor to production quality.

AMD EPYC Processors Picked by Argonne National Laboratory to Prepare for Exascale Future

AMD announced that the U.S. Department of Energy's (DOE) Argonne National Laboratory (Argonne) has chosen AMD EPYC processors to power a new supercomputer, called Polaris, which will prepare researchers for the forthcoming exascale supercomputer at Argonne called Aurora. Polaris is built by Hewlett Packard Enterprise (HPE), will use 2nd Gen EPYC processors and then upgrade to 3rd Gen AMD EPYC processors, and will allow scientists and developers to test and optimize software codes and applications to tackle a range of AI, engineering, and scientific projects.

"AMD EPYC server processors continue to be the leading choice for modern HPC research, delivering the performance and capabilities needed to help solve the complex problems that pre-exascale and exascale computing will address," said Forrest Norrod, senior vice president and general manager, Datacenter and Embedded Solutions Business Group, AMD. "We are extremely proud to support Argonne National Laboratory and their critical research into areas including low carbon technologies, medical research, astronomy, solar power and more as we draw closer to the exascale era."

Intel Ponte Vecchio Early Silicon Puts Out 45 TFLOPs FP32 at 1.37 GHz, Already Beats NVIDIA A100 and AMD MI100

Intel in its 2021 Architecture Day presentation put out fine technical details of its Xe HPC Ponte Vecchio accelerator, including some [very] preliminary performance claims for its current A0-silicon-based prototype. The prototype operates at 1.37 GHz, yet puts out at least 45 TFLOPs of FP32 throughput. We calculated the clock speed based on simple math. Intel obtained the 45 TFLOPs number on a machine running a single Ponte Vecchio OAM (single MCM with two stacks), and a Xeon "Sapphire Rapids" CPU. At 45 TFLOPs, the processor already beats the advertised 19.5 TFLOPs of the NVIDIA "Ampere" A100 Tensor Core 40 GB processor. AMD isn't faring any better, with its production Instinct MI100 processor only offering 23.1 TFLOPs FP32.
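The post does not spell out the "simple math" behind the 1.37 GHz figure. One plausible reconstruction is sketched below. It assumes a two-stack Ponte Vecchio with 128 Xe-cores, each delivering 256 FP32 operations per clock; neither figure appears in the text above, so treat both as assumptions rather than confirmed inputs.

```python
# Assumed figures (not stated in the post): a two-stack Ponte Vecchio
# with 128 Xe-cores, each delivering 256 FP32 operations per clock.
XE_CORES = 128
FP32_OPS_PER_CLOCK_PER_CORE = 256


def implied_clock_ghz(tflops: float) -> float:
    """Clock speed (GHz) needed to reach `tflops` with the assumed core layout."""
    ops_per_clock = XE_CORES * FP32_OPS_PER_CLOCK_PER_CORE
    return tflops * 1e12 / ops_per_clock / 1e9


def ratio(a_tflops: float, b_tflops: float) -> float:
    """How many times faster a is than b in peak FP32 throughput."""
    return a_tflops / b_tflops


if __name__ == "__main__":
    print(f"Implied clock: {implied_clock_ghz(45.0):.2f} GHz")
    print(f"vs. A100 (19.5 TFLOPs): {ratio(45.0, 19.5):.2f}x")
    print(f"vs. MI100 (23.1 TFLOPs): {ratio(45.0, 23.1):.2f}x")
```

Under these assumptions the implied clock lands at roughly 1.37 GHz, which matches the figure quoted in the post.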

NVIDIA Announces Financial Results for Second Quarter Fiscal 2022

NVIDIA (NASDAQ: NVDA) today reported record revenue for the second quarter ended August 1, 2021, of $6.51 billion, up 68 percent from a year earlier and up 15 percent from the previous quarter, with record revenue from the company's Gaming, Data Center and Professional Visualization platforms. GAAP earnings per diluted share for the quarter were $0.94, up 276 percent from a year ago and up 24 percent from the previous quarter. Non-GAAP earnings per diluted share were $1.04, up 89 percent from a year ago and up 14 percent from the previous quarter.

"NVIDIA's pioneering work in accelerated computing continues to advance graphics, scientific computing and AI," said Jensen Huang, founder and CEO of NVIDIA. "Enabled by the NVIDIA platform, developers are creating the most impactful technologies of our time - from natural language understanding and recommender systems, to autonomous vehicles and logistic centers, to digital biology and climate science, to metaverse worlds that obey the laws of physics."

Rambus Innovates 8.4 Gbps HBM3-ready Memory Subsystem

Rambus Inc., a premier chip and silicon IP provider making data faster and safer, today announced the Rambus HBM3-ready memory interface subsystem consisting of a fully-integrated PHY and digital controller. Supporting breakthrough data rates of up to 8.4 Gbps, the solution can deliver over a terabyte per second of bandwidth, more than double that of high-end HBM2E memory subsystems. With a market-leading position in HBM2/2E memory interface deployments, Rambus is ideally suited to enable customers' implementations of accelerators using next-generation HBM3 memory.
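The "over a terabyte per second" claim follows from the per-pin data rate multiplied by the interface width. The sketch below assumes the usual 1024-bit interface per HBM stack and, for comparison, an HBM2E part running at 3.6 Gbps per pin; both values are assumptions, not figures from the announcement.

```python
# Assumption: one HBM stack exposes a 1024-bit data interface.
HBM_INTERFACE_BITS = 1024


def hbm_bandwidth_gb_per_s(pin_rate_gbps: float,
                           interface_bits: int = HBM_INTERFACE_BITS) -> float:
    """Peak bandwidth in GB/s: per-pin rate (Gbps) x pins / 8 bits per byte."""
    return pin_rate_gbps * interface_bits / 8


if __name__ == "__main__":
    hbm3 = hbm_bandwidth_gb_per_s(8.4)   # rate quoted in the announcement
    hbm2e = hbm_bandwidth_gb_per_s(3.6)  # assumed HBM2E rate for comparison
    print(f"HBM3-ready at 8.4 Gbps: {hbm3:.1f} GB/s")
    print(f"HBM2E at 3.6 Gbps:      {hbm2e:.1f} GB/s")
    print(f"Ratio: {hbm3 / hbm2e:.2f}x")
```

With these inputs the 8.4 Gbps subsystem works out to about 1,075 GB/s, a little over 1 TB/s, and more than double the assumed HBM2E figure.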

"The memory bandwidth requirements of AI/ML training are insatiable with leading-edge training models now surpassing billions of parameters," said Soo Kyoum Kim, associate vice president, Memory Semiconductors at IDC. "The Rambus HBM3-ready memory subsystem raises the bar for performance enabling state-of-the-art AI/ML and HPC applications."

Infortrend Launches U.2 NVMe Scale-out NAS Solution

Infortrend Technology, Inc., the industry-leading enterprise storage provider, has launched a U.2 SSD solution for EonStor CS scale-out NAS. The new all-flash CS 4014U meets the demands of high-throughput, low-latency workloads such as media & entertainment (M&E), HPC, and Big Data.

EonStor CS is a scale-out NAS storage system that can expand capacity and linearly increase performance by adding more nodes. CS provides complete data protection and high availability to avoid data loss and system downtime caused by disk damage or system failures. Each node of the CS 4014U model can be installed with 14 U.2 SSDs, and a 5-node cluster can reach 20 GB/s throughput.

AMD MI200 "Aldebaran" Memory Size of 128GB Per Package Confirmed

The 128 GB per-package memory size of AMD's upcoming Instinct MI200 HPC accelerator was confirmed in a document released by Pawsey SuperComputing Centre, a Perth, Australia-based supercomputing firm that's popular with mineral prospecting companies located there. The company is currently working on Setonix, a 50-petaFLOP supercomputer being put together by Hewlett Packard Enterprise, which combines over 750 next-generation "Aldebaran" GPUs (referenced only as "AMD MI-Next GPUs") and over 200,000 AMD EPYC "Milan" processor cores (the actual processor package count would be lower, and depend on the various core configs the builder is using).

The Pawsey document mentions 128 GB as the per-GPU memory. This corresponds with the rumored per-package memory of "Aldebaran." Recently imagined by Locuza_, an enthusiast who specializes in annotation of logic silicon dies, "Aldebaran" is a multi-chip module of two logic dies and eight HBM2E stacks. Each of the two logic dies, or chiplets, has 8,192 CDNA2 stream processors that add up to 16,384 on the package, and each of the two dies is wired to four HBM2E stacks over a 4096-bit memory bus. These are 128 Gbit (16 GB) stacks, so we have 64 GB memory per logic die, and 128 GB on the package.
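The memory arithmetic in the paragraph above can be checked with a few lines. This is only a sketch that restates the figures quoted in the post (two dies, four stacks per die, 128 Gbit per stack, a 4096-bit bus per die); the variable names are illustrative.

```python
# Figures as quoted in the post for the rumored "Aldebaran" package.
LOGIC_DIES = 2
STACKS_PER_DIE = 4
STACK_CAPACITY_GBIT = 128
BUS_BITS_PER_DIE = 4096

stack_capacity_gb = STACK_CAPACITY_GBIT // 8           # Gbit -> GB per stack
memory_per_die_gb = stack_capacity_gb * STACKS_PER_DIE
memory_per_package_gb = memory_per_die_gb * LOGIC_DIES
bus_bits_per_stack = BUS_BITS_PER_DIE // STACKS_PER_DIE

if __name__ == "__main__":
    print(f"Per stack:   {stack_capacity_gb} GB over {bus_bits_per_stack} bits")
    print(f"Per die:     {memory_per_die_gb} GB")
    print(f"Per package: {memory_per_package_gb} GB")
```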

AMD CDNA2 "Aldebaran" MI200 HPC Accelerator with 256 CU (16,384 cores) Imagined

AMD Instinct MI200 will be an important product for the company in the HPC and AI supercomputing market. It debuts the CDNA2 compute architecture, and is based on a multi-chip module (MCM) codenamed "Aldebaran." PC enthusiast Locuza, who conjures highly detailed architecture based on public information, imagined what "Aldebaran" could look like. The MCM contains two logic dies, and eight HBM2E stacks. Each of the two dies has a 4096-bit HBM2E interface, which talks to 64 GB of memory (128 GB per package). A silicon interposer provides microscopic wiring among the ten dies.

Each of the two logic dies, or chiplets, has eight shader engines with 16 compute units (CU) each. The CDNA2 compute unit is capable of full-rate FP64, packed FP32 math, and Matrix Engines V2 (fixed function hardware for matrix multiplication, accelerating DNN building, training, and AI inference). With 128 CUs per chiplet, assuming the CDNA2 CU has 64 stream processors, one arrives at 8,192 SP. Two such dies add up to a whopping 16,384, more than three times that of the "Navi 21" RDNA2 silicon. Each die further features its own independent PCIe interface, and XGMI (AMD's rival to CXL), an interconnect designed for high-density HPC scenarios. A rudimentary VCN (Video CoreNext) component is also present. It's important to note here that the CDNA2 CU, as well as the "Aldebaran" MCM itself, doesn't have a dual use as a GPU, since it lacks much of the hardware needed for graphics processing. The MI200 is expected to launch later this year.
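The stream-processor count above is a straightforward product, sketched here using the post's own figures of 128 CUs per chiplet and an assumed 64 stream processors per CU. The "Navi 21" count of 5,120 stream processors is not given in the post and is included as an assumption for the comparison.

```python
# From the post: 128 CUs per chiplet, 2 chiplets, and an assumed 64 SPs per CU.
CUS_PER_CHIPLET = 128
SP_PER_CU = 64
CHIPLETS = 2

# Assumption (not in the post): "Navi 21" has 5,120 stream processors.
NAVI21_SP = 5120

sp_per_chiplet = CUS_PER_CHIPLET * SP_PER_CU
sp_per_package = sp_per_chiplet * CHIPLETS
total_cus = CUS_PER_CHIPLET * CHIPLETS

if __name__ == "__main__":
    print(f"CUs on package:  {total_cus}")
    print(f"SPs per chiplet: {sp_per_chiplet}")
    print(f"SPs on package:  {sp_per_package}")
    print(f"vs. Navi 21:     {sp_per_package / NAVI21_SP:.1f}x")
```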

AMD Leads High Performance Computing Towards Exascale and Beyond

At this year's International Supercomputing 2021 digital event, AMD (NASDAQ: AMD) is showcasing momentum for its AMD EPYC processors and AMD Instinct accelerators across the High Performance Computing (HPC) industry. The company also outlined updates to the ROCm open software platform and introduced the AMD Instinct Education and Research (AIER) initiative. The latest Top500 list showcased the continued growth of AMD EPYC processors for HPC systems. AMD EPYC processors power nearly 5x more systems compared to the June 2020 list, and more than double the number of systems compared to November 2020. In addition, AMD EPYC processors power half of the 58 new entries on the June 2021 list.

"High performance computing is critical to addressing the world's biggest and most important challenges," said Forrest Norrod, senior vice president and general manager, data center and embedded systems group, AMD. "With our AMD EPYC processor family and Instinct accelerators, AMD continues to be the partner of choice for HPC. We are committed to enabling the performance and capabilities needed to advance scientific discoveries, break the exascale barrier, and continue driving innovation."

Certain Intel Xeon "Sapphire Rapids" SKUs Come with On-Package HBM

Intel today, in its 2021 International Supercomputing Conference presentation, revealed that certain next-generation Xeon "Sapphire Rapids" SKUs come with on-package high-bandwidth memory (HBM). Given the context of its presentation, these could be special SKUs designed for high-density HPC setups, in which the processor package includes a certain amount of "PMEM" (package memory), besides the processor's 8-channel DDR5 memory interface.

The size of the HBM PMEM, and its position in the memory hierarchy, were detailed, too. Given its high-density applications, PMEM may not serve as a victim cache for the processor, but rather be capable of serving as main memory, with none of the DDR5 DRAM channels populated with DIMMs. On machines with DIMMs, the PMEM will serve as a victim cache for the processor's on-die last-level cache, accelerating the memory I/O. "The next-generation of Intel Xeon Scalable processors (code-named "Sapphire Rapids") will offer integrated High Bandwidth Memory (HBM), providing a dramatic boost in memory bandwidth and a significant performance improvement for HPC applications that operate memory bandwidth-sensitive workloads. Users can power through workloads using just High Bandwidth Memory or in combination with DDR5," says Intel.

New Intel XPU Innovations Target HPC and AI

At the 2021 International Supercomputing Conference (ISC) Intel is showcasing how the company is extending its lead in high performance computing (HPC) with a range of technology disclosures, partnerships and customer adoptions. Intel processors are the most widely deployed compute architecture in the world's supercomputers, enabling global medical discoveries and scientific breakthroughs. Intel is announcing advances in its Xeon processor for HPC and AI as well as innovations in memory, software, exascale-class storage, and networking technologies for a range of HPC use cases.

"To maximize HPC performance we must leverage all the computer resources and technology advancements available to us," said Trish Damkroger, vice president and general manager of High Performance Computing at Intel. "Intel is the driving force behind the industry's move toward exascale computing, and the advancements we're delivering with our CPUs, XPUs, oneAPI Toolkits, exascale-class DAOS storage, and high-speed networking are pushing us closer toward that realization."

NVIDIA and Global Partners Launch New HGX A100 Systems to Accelerate Industrial AI and HPC

NVIDIA today announced it is turbocharging the NVIDIA HGX AI supercomputing platform with new technologies that fuse AI with high performance computing, making supercomputing more useful to a growing number of industries.

To accelerate the new era of industrial AI and HPC, NVIDIA has added three key technologies to its HGX platform: the NVIDIA A100 80 GB PCIe GPU, NVIDIA NDR 400G InfiniBand networking, and NVIDIA Magnum IO GPUDirect Storage software. Together, they provide the extreme performance to enable industrial HPC innovation.

Intel Makes Changes to Executive Team, Raja got Promoted

Intel CEO Pat Gelsinger announced the addition of two new technology leaders to its executive leadership team, as well as several changes to Intel business units. Current Intel executives Sandra Rivera and Raja Koduri will each take on new senior leadership roles, and technology industry veterans Nick McKeown and Greg Lavender will join the company.

"Since re-joining Intel, I have been impressed with the depth of talent and incredible innovation throughout the company, but we must move faster to fulfill our ambitions," said Gelsinger. "By putting Sandra, Raja, Nick and Greg - with their decades of technology expertise - at the forefront of some of our most essential work, we will sharpen our focus and execution, accelerate innovation, and unleash the deep well of talent across the company."

Tachyum Receives Prodigy FPGA DDR-IO Motherboard to Create Full System Emulation

Tachyum Inc. today announced that it has taken delivery of an IO motherboard for its Prodigy Universal Processor hardware emulator from manufacturing. This provides the company with a complete system prototype integrating CPU, memory, PCI Express, networking and BMC management subsystems when connected to the previously announced field-programmable gate array (FPGA) emulation system board.

The Tachyum Prodigy FPGA DDR-IO Board connects to the Prodigy FPGA CPU Board to provide memory and IO connectivity for the FPGA-based CPU tiles. The fully functional Prodigy emulation system is now ready for further build out, including Linux boot and incorporation of additional test chips. It is available to customers to perform early testing and software development prior to a full four-socket reference design motherboard, which is expected to be available Q4 2021.

AMD Instinct MI200 "Aldebaran" to Launch Later This Year

AMD's next-generation HPC accelerator card, the Instinct MI200, is expected to launch later this year. CEO Dr Lisa Su, speaking at a financial event hosted by JPMorgan, stated that the company would launch the next generation of its CDNA architecture this year. The card debuts the company's new CDNA2 compute architecture, and is on its way to supercomputers already announced. The Instinct MI200 HPC accelerator card is based on the new "Aldebaran" compute accelerator package, a multi-chip module that combines not just compute silicon and memory dies, but multiple compute dies.

Intel "Sapphire Rapids" Xeon Processors Use "Golden Cove" CPU Cores, Company Clarifies in Linux Kernel Dev E-Mail Chain

Intel's upcoming Xeon "Sapphire Rapids" processors which debut in the second half of 2021, will feature up to 80 "Golden Cove" CPU cores, and not the previously rumored "Willow Cove." This was clarified by an Intel developer in a Linux Kernel code e-mail chain. "Golden Cove" CPU cores are more advanced than the "Willow Cove" cores found in current-generation Intel products, such as the client "Tiger Lake" processors. Intel stated that "Golden Cove" introduces an IPC gain over "Willow Cove" (expressed as "ST perf"), increased AI inference performance from an updated GNI component, "network and 5G perf," which is possibly some form of network stack acceleration, and additional security features.

Over in the client segment, the 12th Gen Core "Alder Lake" processor debuts a client variant of "Golden Cove." The "Alder Lake-S" silicon features eight "Golden Cove" cores serving as the "big" performance cores, next to eight "little" low-power "Gracemont" cores. The client and server implementations of "Golden Cove" could differ mainly in the ISA, with the client chip receiving slightly trimmed AVX-512 and DLBoost instruction sets, with only client-relevant instructions. The server variant, in addition to being optimized for a high core-count multi-core topology, could feature a more substantial AVX-512 and DLBoost implementation relevant for HPC use-cases.

AMD EPYC 7003 Processors to Power Singapore's Fastest Supercomputer

AMD announced that AMD EPYC 7003 Series processors will be used to power a new supercomputer for the National Supercomputing Centre (NSCC) Singapore, the national high-performance computing (HPC) resource center dedicated to supporting science and engineering computing needs.

The system will be based on the HPE Cray EX supercomputer and will use a combination of the EPYC 7763 and EPYC 75F3 processors. The supercomputer is planned to be fully operational by 2022 and is expected to have a peak theoretical performance of 10 petaFLOPS, 8x faster than NSCC's existing pool of HPC resources. Researchers will use the system to advance scientific research across biomedicine, genomics, diseases, climate, and more.

UK Competition Regulator Probes AMD's Buyout of Xilinx

British competition regulator Competition and Markets Authority (CMA) on Monday launched an enquiry into the ramifications of AMD's buy-out of FPGA maker Xilinx. The agency is soliciting opinions from the public on whether the $35 billion all-stock purchase will make goods and services less competitive for the UK. Unlike NVIDIA's Arm buyout, the Xilinx acquisition is seeing no opposition from tech giants. The Register notes that AMD could combine Xilinx's FPGAs with its x86 CPU and RDNA SIMD to create highly customizable HPC accelerators. AMD president Dr Lisa Su said, "By combining our world-class engineering team and deep domain expertise, we will create an industry leader with the vision, talent and scale to define the future of high performance computing."

Samsung Unveils Industry-First Memory Module Incorporating New CXL Interconnect

Samsung Electronics Co., Ltd., the world leader in advanced memory technology, today unveiled the industry's first memory module supporting the new Compute Express Link (CXL) interconnect standard. Integrated with Samsung's Double Data Rate 5 (DDR5) technology, this CXL-based module will enable server systems to significantly scale memory capacity and bandwidth, accelerating artificial intelligence (AI) and high-performance computing (HPC) workloads in data centers.

The rise of AI and big data has been fueling the trend toward heterogeneous computing, where multiple processors work in parallel to process massive volumes of data. CXL—an open, industry-supported interconnect based on the PCI Express (PCIe) 5.0 interface—enables high-speed, low latency communication between the host processor and devices such as accelerators, memory buffers and smart I/O devices, while expanding memory capacity and bandwidth well beyond what is possible today. Samsung has been collaborating with several data center, server and chipset manufacturers to develop next-generation interface technology since the CXL consortium was formed in 2019.

Intel Ponte Vecchio GPU Scores Another Win in Leibniz Supercomputing Centre

Today, Lenovo in partnership with Intel has announced that Leibniz Supercomputing Centre (LRZ) is building a supercomputer powered by Intel's next-generation technologies. Specifically, the supercomputer will use Intel's Sapphire Rapids CPUs in combination with the highly-teased Ponte Vecchio GPUs to power the applications running at Leibniz Supercomputing Centre. Along with the various processors, the LRZ will also deploy Intel Optane persistent memory to process the huge amount of data the LRZ holds and is producing. The integration of HPC and AI processing will be enabled by an expansion of LRZ's current supercomputer, SuperMUC-NG, which will receive an upgrade in 2022 featuring both Sapphire Rapids and Ponte Vecchio.

Mr. Raja Koduri, Intel graphics guru, has teased on Twitter that this supercomputer installation will combine Sapphire Rapids, Ponte Vecchio, Optane, and oneAPI all in one machine. The system will use over one petabyte of Distributed Asynchronous Object Storage (DAOS) based on the Optane technologies. Mr. Koduri has also teased some Ponte Vecchio eye candy: a GIF of tiles combining to form a GPU.

Samsung Announces Availability of Its Next Generation 2.5D Integration Solution I-Cube4 for High-Performance Applications

Samsung Electronics Co., Ltd., a world leader in advanced semiconductor technology, today announced the immediate availability of its next-generation 2.5D packaging technology Interposer-Cube4 (I-Cube4), leading the evolution of chip packaging technology once again. Samsung's I-Cube™ is a heterogeneous integration technology that horizontally places one or more logic dies (CPU, GPU, etc.) and several High Bandwidth Memory (HBM) dies on top of a silicon interposer, making multiple dies operate as a single chip in one package.

Samsung's new I-Cube4, which incorporates four HBMs and one logic die, was developed in March as the successor of I-Cube2. From high-performance computing (HPC) to AI, 5G, cloud and large data center applications, I-Cube4 is expected to bring another level of fast communication and power efficiency between logic and memory through heterogeneous integration.