Wednesday, March 4th 2020

Ampere Computing Unveils 80 Core "Cloud-Native" Arm Processor

Ampere Computing, a startup focused on making processors for HPC and cloud applications based on the Arm Instruction Set Architecture, today announced its first 80 core "cloud-native" processor built on the Arm ISA. The new Ampere Altra CPU is the company's first 80 core CPU aimed at hyperscalers like Amazon AWS, Microsoft Azure, and Google Cloud. Built on TSMC's 7 nm semiconductor manufacturing process, the Altra uses a monolithic die to achieve maximum performance. The CPU implements Arm's v8.2+ instruction set and is built around the Neoverse N1 platform, making it ready for any data center workload. It also borrows a few security features from v8.3 and v8.5, namely hardware mitigations against speculative-execution attacks.

When it comes to the core itself, the CPU runs at 3.0 GHz and has some very interesting specifications. Each core is a 4-wide superscalar out-of-order execution (OoOE) design, which Ampere describes as "aggressive", meaning it can sustain a high instruction throughput. The cache hierarchy provides 64 KB of L1D and 64 KB of L1I cache per core, along with 1 MB of L2 cache per core. For system-level cache, there is 32 MB of L3 available to the SoC. All of the caches have error-correcting code (ECC) built in, an important feature for data center deployments. There are also two 128-bit wide Single Instruction Multiple Data (SIMD) units for parallel processing. There is no mention of whether they implement Arm's Scalable Vector Extensions (SVE).
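As a quick sanity check of those figures, the sketch below tallies the aggregate on-die cache, assuming all 80 cores carry the per-core amounts quoted above (Ampere has not published the totals broken down this way):

```python
# Rough cache tally for an 80-core Altra, using the per-core figures
# quoted in the article. The aggregation itself is our own arithmetic.
CORES = 80
L1I_KB = 64   # per-core L1 instruction cache
L1D_KB = 64   # per-core L1 data cache
L2_MB = 1     # per-core L2 cache
L3_MB = 32    # shared system-level cache

total_l1_mb = CORES * (L1I_KB + L1D_KB) / 1024
total_l2_mb = CORES * L2_MB

print(f"Total L1 across the die: {total_l1_mb:.0f} MB")  # 10 MB
print(f"Total L2 across the die: {total_l2_mb} MB")      # 80 MB
print(f"Shared system-level L3:  {L3_MB} MB")
```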
The SoC supports 8-channel DDR4 memory running at 3200 MHz, with up to 4 TB of memory per socket. Given that the CPU is also available in dual-socket configurations, a system can hold up to 8 TB of RAM. Each CPU provides 128 PCIe 4.0 lanes; however, in a dual-socket configuration, 32 lanes per CPU are dedicated to CPU-to-CPU communication. That leaves a total of 192 usable PCIe 4.0 lanes in a dual-socket system, which is a decent amount. Of course, for a system like this to be a solid choice for hyperscalers, there needs to be a cache-coherency protocol in place. Ampere implements the CCIX protocol, which runs over the PCIe lanes and provides speeds of 25 GB/s per x16 slot. The whole SoC runs anywhere from 45 W to 210 W of TDP, depending on the core count. Exact details of the available SKUs are not yet known.
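The lane and memory figures lend themselves to some back-of-the-envelope arithmetic, worked out in the sketch below; the per-channel DDR4-3200 peak assumes the standard 64-bit channel width and is our own derivation, not an Ampere-published number:

```python
# PCIe lane budget in a dual-socket Altra system, per the article's figures.
LANES_PER_SOCKET = 128
INTERCONNECT_LANES = 32   # per socket, reserved for the CPU-to-CPU link
SOCKETS = 2

usable_lanes = SOCKETS * (LANES_PER_SOCKET - INTERCONNECT_LANES)
print(f"Usable PCIe 4.0 lanes, dual socket: {usable_lanes}")  # 192

# Theoretical peak DRAM bandwidth per socket, assuming 64-bit DDR4 channels.
CHANNELS = 8
TRANSFER_RATE_MTS = 3200  # DDR4-3200
BYTES_PER_TRANSFER = 8    # 64-bit channel width
peak_bw_gb_s = CHANNELS * TRANSFER_RATE_MTS * BYTES_PER_TRANSFER / 1000
print(f"Peak theoretical memory bandwidth per socket: {peak_bw_gb_s:.1f} GB/s")  # 204.8

# Maximum memory capacity in a dual-socket configuration.
MAX_MEM_PER_SOCKET_TB = 4
print(f"Maximum memory, dual socket: {SOCKETS * MAX_MEM_PER_SOCKET_TB} TB")  # 8
```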