Tuesday, August 24th 2021

Samsung Brings In-memory Processing Power to Wider Range of Applications

Samsung Electronics, the world leader in advanced memory technology, today showcased its latest advancements in processing-in-memory (PIM) technology at Hot Chips 33, a leading semiconductor conference where the most notable microprocessor and IC innovations are unveiled each year. Samsung's revelations include the first successful integration of its PIM-enabled High Bandwidth Memory (HBM-PIM) into a commercialized accelerator system, as well as the extension of PIM to DRAM modules and mobile memory, accelerating the move toward the convergence of memory and logic.

In February, Samsung introduced the industry's first HBM-PIM (Aquabolt-XL), which integrates AI processing into Samsung's HBM2 Aquabolt to enhance high-speed data processing in supercomputers and AI applications. The HBM-PIM has since been tested in the Xilinx Virtex UltraScale+ (Alveo) AI accelerator, where it delivered nearly 2.5X the system performance while cutting energy consumption by more than 60%.

"HBM-PIM is the industry's first AI-tailored memory solution being tested in customer AI-accelerator systems, demonstrating tremendous commercial potential," said Nam Sung Kim, senior vice president of DRAM Product & Technology at Samsung Electronics. "Through standardization of the technology, applications will become numerous, expanding into HBM3 for next-generation supercomputers and AI applications, and even into mobile memory for on-device AI as well as for memory modules used in data centers."

"Xilinx has been collaborating with Samsung Electronics to enable high-performance solutions for data center, networking and real-time signal processing applications starting with the Virtex UltraScale+ HBM family, and recently introduced our new and exciting Versal HBM series products," said Arun Varadarajan Rajagopal, senior director, Product Planning at Xilinx, Inc. "We are delighted to continue this collaboration with Samsung as we help to evaluate HBM-PIM systems for their potential to achieve major performance and energy-efficiency gains in AI applications."

DRAM modules powered by PIM
The Acceleration DIMM (AXDIMM) brings processing to the DRAM module itself, minimizing large-scale data movement between the CPU and DRAM to boost the energy efficiency of AI accelerator systems. With an AI engine built into the buffer chip, the AXDIMM can process multiple memory ranks (sets of DRAM chips) in parallel instead of accessing just one rank at a time, greatly enhancing system performance and efficiency. Because the module retains the traditional DIMM form factor, the AXDIMM can serve as a drop-in replacement without requiring system modifications. Currently being tested on customer servers, the AXDIMM can offer approximately twice the performance in AI-based recommendation applications and a 40% decrease in system-wide energy usage.
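To make the data-movement argument concrete, here is a toy Python model. It is not Samsung's implementation or any real AXDIMM interface: it assumes a simple sum-reduction workload, represents each rank as a plain list, and the function names `host_side_sum` and `near_memory_sum` are hypothetical. The host path pulls every value across the memory channel one rank at a time, while the PIM-style path lets a per-rank engine compute a partial result locally, so only one value per rank crosses the channel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical toy model, not Samsung's AXDIMM API:
# each "rank" is just a list of integers stored in DRAM.
Rank = List[int]


def host_side_sum(ranks: List[Rank]) -> Tuple[int, int]:
    """Conventional path: the CPU reads every value over the memory
    channel, one rank at a time, and does the summation on the host.
    Returns (result, values_moved_over_channel)."""
    total = 0
    moved = 0
    for rank in ranks:          # ranks are accessed sequentially
        for value in rank:
            total += value
            moved += 1          # every single value crosses the channel
    return total, moved


def rank_partial_sum(rank: Rank) -> int:
    """Work done by a hypothetical engine in the buffer chip for one rank."""
    return sum(rank)


def near_memory_sum(ranks: List[Rank]) -> Tuple[int, int]:
    """PIM-style path: a partial sum is computed next to each rank,
    and only one partial result per rank crosses the channel.
    Threads stand in for independent per-rank engines; this models
    the data flow, not real hardware speedup."""
    with ThreadPoolExecutor(max_workers=max(1, len(ranks))) as pool:
        partials = list(pool.map(rank_partial_sum, ranks))
    return sum(partials), len(partials)


if __name__ == "__main__":
    # Four ranks holding 1,000 values each (0..3999 in total).
    ranks = [list(range(r * 1000, (r + 1) * 1000)) for r in range(4)]
    print(host_side_sum(ranks))    # → (7998000, 4000)
    print(near_memory_sum(ranks))  # → (7998000, 4)
```

Both paths return the same sum, but the host path moves 4,000 values over the channel versus 4 for the near-memory path. Python threads do not give true parallel speedup for this kind of work, so the sketch only illustrates why keeping computation next to the ranks reduces traffic.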

"SAP has been continuously collaborating with Samsung on their new and emerging memory technologies to deliver optimal performance on SAP HANA and help database acceleration," said Oliver Rebholz, head of HANA core research & innovation at SAP. "Based on performance projections and potential integration scenarios, we expect significant performance improvements for in-memory database management system (IMDBMS) and higher energy efficiency via disaggregated computing on AXDIMM. SAP is looking to continue its collaboration with Samsung in this area."

Mobile memory that brings AI from data center to device
Samsung's LPDDR5-PIM mobile memory technology can provide independent AI capabilities without data center connectivity. Simulation tests have shown that LPDDR5-PIM can more than double performance while reducing energy usage by over 60% in applications such as voice recognition, translation and chatbots.

Energizing the ecosystem
Samsung plans to expand its AI memory portfolio by working with other industry leaders to complete standardization of the PIM platform in the first half of 2022. The company will also continue to foster a robust PIM ecosystem to ensure wide applicability across the memory market.

9 Comments on Samsung Brings In-memory Processing Power to Wider Range of Applications

#1
bogami
Wow. Nice blah blah, wish to see some disc RAM test scores! :cool:
#3
defaultluser
The closest thing to memristors that has been released is Intel's 3D XPoint

en.wikipedia.org/wiki/3D_XPoint

It has much lower latency than flash, but it is still worse than DRAM in access time and cell endurance. Its density also sits between flash and DRAM.
#4
mtcn77
HP has been trying to make them into a PIM computer, which differs from a CPU-centric computer in that instructions are sent to the memory rather than to the CPU.
#5
defaultluser
mtcn77 said: HP has been trying to make them into a PIM computer which is different than a CPU computer in that the instruction is sent to the memory, not to the cpu.
It's kind of impossible to do without the memristor: it's the only way you can get distributed compute along with permanent storage attached.

But whether 3D XPoint or the memristor is the future, you're going to have to keep separate cache RAM (Optane performance is too slow).
#6
mtcn77
defaultluser said: It's kind of impossible to do without the memristor: it's the only way you can get distributed compute along with permanent storage attached.

But whether 3D XPoint or the memristor is the future, you're going to have to keep separate cache RAM (Optane performance is too slow).
It is a whole new ballgame. The data has no coherency issues. The programs race towards the data. However, there might be integrity issues, since programs inherently modify their operands unless those are backed up first. Crazy architecture...
#7
Wirko
The second illustration is immensely informative. Here's another one that reveals a little bit more:
#8
defaultluser
mtcn77 said: It is a whole new ballgame. The data has no coherency issues. The programs race towards the data. However, there might be integrity issues, since programs inherently modify their operands unless those are backed up first. Crazy architecture...
You're still going to have timing issues (there will always be a delay between accessing different parts of your distributed compute), so you're stuck going with asynchronous compute. But you will still need to figure out how to interconnect all those data lines between DSP blocks.

Coherency is a relatively tame beast by comparison.
#9
mtcn77
defaultluser said: You're still going to have timing issues (there will always be a delay between accessing different parts of your distributed compute).

Coherency is a relatively tame beast by comparison.
I meant it as a security problem, same as Meltdown.