Monday, October 18th 2021

Apple Introduces M1 Pro and M1 Max: the Most Powerful Chips Apple Has Ever Built

Apple today announced M1 Pro and M1 Max, the next breakthrough chips for the Mac. Scaling up M1's transformational architecture, M1 Pro offers amazing performance with industry-leading power efficiency, while M1 Max takes these capabilities to new heights. The CPU in M1 Pro and M1 Max delivers up to 70 percent faster CPU performance than M1, so tasks like compiling projects in Xcode are faster than ever. The GPU in M1 Pro is up to 2x faster than M1, while M1 Max is up to an astonishing 4x faster than M1, allowing pro users to fly through the most demanding graphics workflows.

M1 Pro and M1 Max introduce a system-on-a-chip (SoC) architecture to pro systems for the first time. The chips feature fast unified memory, industry-leading performance per watt, and incredible power efficiency, along with increased memory bandwidth and capacity. M1 Pro offers up to 200 GB/s of memory bandwidth with support for up to 32 GB of unified memory. M1 Max delivers up to 400 GB/s of memory bandwidth—2x that of M1 Pro and nearly 6x that of M1—and support for up to 64 GB of unified memory. And while the latest PC laptops top out at 16 GB of graphics memory, having this huge amount of memory enables graphics-intensive workflows previously unimaginable on a notebook. The efficient architecture of M1 Pro and M1 Max means they deliver the same level of performance whether MacBook Pro is plugged in or using the battery. M1 Pro and M1 Max also feature enhanced media engines with dedicated ProRes accelerators specifically for pro video processing. M1 Pro and M1 Max are by far the most powerful chips Apple has ever built.
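The bandwidth figures above follow from a simple formula: peak bandwidth equals bus width in bytes times transfers per second. The sketch below assumes memory configurations taken from public spec sheets rather than from this release: 128-bit LPDDR4X-4266 for M1, and 256-bit and 512-bit LPDDR5-6400 for M1 Pro and M1 Max. Under those assumptions the theoretical peaks are about 68, 205 and 410 GB/s, which Apple rounds to 200 and 400 GB/s. Dividing the rounded 400 GB/s by M1's roughly 68 GB/s gives the "nearly 6x" figure.

```python
# Theoretical peak DRAM bandwidth = bus width (bytes) x transfer rate.
# Assumed configurations (public spec sheets, not stated in this press release):
#   M1: 128-bit LPDDR4X-4266; M1 Pro: 256-bit LPDDR5-6400; M1 Max: 512-bit LPDDR5-6400

def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Peak bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8
    # bytes/transfer x million transfers/s = MB/s; divide by 1000 for GB/s
    return bytes_per_transfer * mega_transfers_per_s / 1000


CONFIGS = {
    "M1": (128, 4266),
    "M1 Pro": (256, 6400),
    "M1 Max": (512, 6400),
}

bandwidth = {name: peak_bandwidth_gbs(*cfg) for name, cfg in CONFIGS.items()}

for name, gbs in bandwidth.items():
    print(f"{name:7s} {gbs:6.1f} GB/s")

# Apple's rounded 400 GB/s against M1's ~68 GB/s: the "nearly 6x" claim
print(f"400 / M1: {400 / bandwidth['M1']:.2f}x")
```

Doubling the bus width at the same transfer rate is what gives M1 Max exactly twice M1 Pro's bandwidth.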
"M1 has transformed our most popular systems with incredible performance, custom technologies, and industry-leading power efficiency. No one has ever applied a system-on-a-chip design to a pro system until today with M1 Pro and M1 Max," said Johny Srouji, Apple's senior vice president of Hardware Technologies. "With massive gains in CPU and GPU performance, up to six times the memory bandwidth, a new media engine with ProRes accelerators, and other advanced technologies, M1 Pro and M1 Max take Apple silicon even further, and are unlike anything else in a pro notebook."

M1 Pro: A Whole New Level of Performance and Capability
Utilizing the industry-leading 5-nanometer process technology, M1 Pro packs in 33.7 billion transistors, more than 2x the amount in M1. A new 10-core CPU, including eight high-performance cores and two high-efficiency cores, is up to 70 percent faster than M1, resulting in unbelievable pro CPU performance. Compared with the latest 8-core PC laptop chip, M1 Pro delivers up to 1.7x more CPU performance at the same power level and achieves the PC chip's peak performance using up to 70 percent less power. Even the most demanding tasks, like high-resolution photo editing, are handled with ease by M1 Pro.
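The two CPU comparisons in this paragraph describe different points on the power curve. Matching a chip's peak performance with 70 percent less power means drawing 30 percent of its power, which is about 3.3x the performance per watt at that point. Delivering 1.7x the performance at equal power is a 1.7x performance-per-watt gain at that point. A minimal sketch of the arithmetic, using Apple's "up to" figures as given rather than any independent measurement:

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt of chip A vs chip B.

    perf_ratio  = perf_A / perf_B
    power_ratio = power_A / power_B
    """
    return perf_ratio / power_ratio


# "achieves the PC chip's peak performance using up to 70 percent less power"
same_perf = perf_per_watt_ratio(perf_ratio=1.0, power_ratio=1.0 - 0.70)
# "up to 1.7x more CPU performance at the same power level"
same_power = perf_per_watt_ratio(perf_ratio=1.7, power_ratio=1.0)

print(f"at equal performance: {same_perf:.2f}x perf/W")
print(f"at equal power:       {same_power:.2f}x perf/W")
```

The larger ratio at equal performance is what one would expect if the PC chip has to run far up its voltage/frequency curve to reach its peak. That reading is an inference; the release does not say so.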
M1 Pro has an up-to-16-core GPU that is up to 2x faster than M1 and up to 7x faster than the integrated graphics on the latest 8-core PC laptop chip. Compared to a powerful discrete GPU for PC notebooks, M1 Pro delivers more performance while using up to 70 percent less power. And M1 Pro can be configured with up to 32 GB of fast unified memory, with up to 200 GB/s of memory bandwidth, enabling creatives like 3D artists and game developers to do more on the go than ever before.

M1 Max: The World's Most Powerful Chip for a Pro Notebook
M1 Max features the same powerful 10-core CPU as M1 Pro and adds a massive 32-core GPU for up to 4x faster graphics performance than M1. With 57 billion transistors—70 percent more than M1 Pro and 3.5x more than M1—M1 Max is the largest chip Apple has ever built. In addition, the GPU delivers performance comparable to a high-end GPU in a compact pro PC laptop while consuming up to 40 percent less power, and performance similar to that of the highest-end GPU in the largest PC laptops while using up to 100 watts less power. This means less heat is generated, fans run quietly and less often, and battery life is amazing in the new MacBook Pro. M1 Max transforms graphics-intensive workflows, including up to 13x faster complex timeline rendering in Final Cut Pro compared to the previous-generation 13-inch MacBook Pro.
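The transistor comparisons can be checked directly. The sketch assumes M1's 16 billion transistors, a figure from Apple's M1 launch rather than from this release:

```python
# Transistor counts in billions. M1 (16 bn) is from Apple's M1 launch, not this article.
TRANSISTORS_BN = {"M1": 16.0, "M1 Pro": 33.7, "M1 Max": 57.0}


def ratio(a: str, b: str) -> float:
    """How many times as many transistors chip `a` has as chip `b`."""
    return TRANSISTORS_BN[a] / TRANSISTORS_BN[b]


for a, b in [("M1 Pro", "M1"), ("M1 Max", "M1 Pro"), ("M1 Max", "M1")]:
    print(f"{a} vs {b}: {ratio(a, b):.2f}x")
```

This gives roughly 2.1x for M1 Pro over M1, 1.69x (about 70 percent more) for M1 Max over M1 Pro, and about 3.56x for M1 Max over M1. These results are consistent with the rounded figures in the text.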
M1 Max also offers a higher-bandwidth on-chip fabric, and doubles the memory interface compared with M1 Pro for up to 400 GB/s, or nearly 6x the memory bandwidth of M1. This allows M1 Max to be configured with up to 64 GB of fast unified memory. With its unparalleled performance, M1 Max is the most powerful chip ever built for a pro notebook.

Fast, Efficient Media Engine, Now with ProRes
M1 Pro and M1 Max include an Apple-designed media engine that accelerates video processing while maximizing battery life. M1 Pro also includes dedicated acceleration for the ProRes professional video codec, allowing playback of multiple streams of high-quality 4K and 8K ProRes video while using very little power. M1 Max goes even further, delivering up to 2x faster video encoding than M1 Pro, and features two ProRes accelerators. With M1 Max, the new MacBook Pro can transcode ProRes video in Compressor up to a remarkable 10x faster compared with the previous-generation 16-inch MacBook Pro.

Advanced Technologies for a Complete Pro System
Both M1 Pro and M1 Max are loaded with advanced custom technologies that help push pro workflows to the next level:
  • A 16-core Neural Engine for on-device machine learning acceleration and improved camera performance.
  • A new display engine drives multiple external displays.
  • Additional integrated Thunderbolt 4 controllers provide even more I/O bandwidth.
  • Apple's custom image signal processor, along with the Neural Engine, uses computational video to enhance image quality for sharper video and more natural-looking skin tones on the built-in camera.
  • Best-in-class security, including Apple's latest Secure Enclave, hardware-verified secure boot, and runtime anti-exploitation technologies.

A Huge Step in the Transition to Apple Silicon
The Mac is now one year into its two-year transition to Apple silicon, and M1 Pro and M1 Max represent another huge step forward. These are the most powerful and capable chips Apple has ever created, and together with M1, they form a family of chips that lead the industry in performance, custom technologies, and power efficiency.

macOS and Apps Unleash the Capabilities of M1 Pro and M1 Max
macOS Monterey is engineered to unleash the power of M1 Pro and M1 Max, delivering breakthrough performance, phenomenal pro capabilities, and incredible battery life. Because Monterey is designed for Apple silicon, the Mac wakes instantly from sleep, and the entire system is fast and incredibly responsive. Developer technologies like Metal let apps take full advantage of the new chips, and optimizations in Core ML utilize the powerful Neural Engine so machine learning models can run even faster. Pro app workload data is used to help optimize how macOS assigns multi-threaded tasks to the CPU cores for maximum performance, and advanced power management features intelligently allocate tasks between the performance and efficiency cores for both incredible speed and battery life.
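Apple does not publish the scheduling policy described above. The toy below illustrates how a heterogeneous scheduler might route work. It is loosely modelled on the quality-of-service (QoS) class names in Apple's developer APIs. The policy, the `Task` type and `assign_core_type` are hypothetical and do not describe how macOS actually decides:

```python
from dataclasses import dataclass
from enum import IntEnum


class QoS(IntEnum):
    # Ordered lowest to highest; names mirror Apple's QoS classes, values are arbitrary.
    BACKGROUND = 0
    UTILITY = 1
    DEFAULT = 2
    USER_INITIATED = 3
    USER_INTERACTIVE = 4


@dataclass
class Task:
    name: str
    qos: QoS


def assign_core_type(task: Task, p_cores_busy: bool) -> str:
    """Hypothetical toy policy, not the macOS scheduler.

    Returns "P" (performance core) or "E" (efficiency core).
    """
    if task.qos == QoS.BACKGROUND:
        return "E"  # background work never needs the fast cores
    if p_cores_busy and task.qos < QoS.USER_INITIATED:
        return "E"  # lower-priority work spills to E-cores when P-cores are taken
    return "P"  # everything else prefers the fast cores


tasks = [
    Task("index files", QoS.BACKGROUND),
    Task("compile project", QoS.USER_INITIATED),
    Task("upload logs", QoS.UTILITY),
    Task("UI animation", QoS.USER_INTERACTIVE),
]
for t in tasks:
    print(f"{t.name:16s} -> {assign_core_type(t, p_cores_busy=True)}")
```

The point of the toy is the trade-off: latency-sensitive work goes to fast cores, and deferrable work goes to the cores that cost the least energy.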

The combination of macOS with M1, M1 Pro, or M1 Max also delivers industry-leading security protections, including hardware-verified secure boot, runtime anti-exploitation technologies, and fast, in-line encryption for files. All of Apple's Mac apps are optimized for—and run natively on—Apple silicon, and there are over 10,000 Universal apps and plug-ins available. Existing Mac apps that have not yet been updated to Universal will run seamlessly with Apple's Rosetta 2 technology, and users can also run iPhone and iPad apps directly on the Mac, opening a huge new universe of possibilities.

Apple's Commitment to the Environment
Today, Apple is carbon neutral for global corporate operations, and by 2030, plans to have net-zero climate impact across the entire business, which includes manufacturing supply chains and all product life cycles. This also means that every chip Apple creates, from design to manufacturing, will be 100 percent carbon neutral.

156 Comments on Apple Introduces M1 Pro and M1 Max: the Most Powerful Chips Apple Has Ever Built

#51
dragontamer5788
Arc1t3ct wrote: The M1 Max, at least on paper, makes every other CPU seem like a decade out of date... How can this be?
The only crazy thing I'm seeing so far is the high LPDDR5 bandwidth of 400 GB/s.

I'm not really seeing anything else super-special about this actually. EDIT: 5nm is also cool, but that's largely TSMC + a function of Apple's money. TSMC is very advanced, and Apple can afford the best.
#52
ValenOne
dragontamer5788 wrote: That's pretty big. I'm curious how this memory system works.

It's big enough that I'm instinctively thinking that's a typo there. 400GB/s is huge for a CPU / iGPU. The only systems close to that are XBox / PS5 game consoles with GDDR graphics RAM.
The AMD 4700S has 256-bit GDDR6-14000, i.e. the PS5's APU recycled for the PC market.
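The comparison with the consoles can be made concrete with a simple formula: bus width in bits times per-pin data rate, divided by 8. GDDR6-14000 means 14 Gbit/s per pin. The Xbox Series X rows assume its publicly documented layout. That layout uses ten GDDR6 chips on a 320-bit bus, with the extra 6 GB on only six of those chips, a 192-bit slice. This is where its split 10 GB + 6 GB pools come from. The M1 Max row assumes 512-bit LPDDR5-6400:

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbit_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * gbit_per_pin / 8


systems = {
    "AMD 4700S / PS5, 256-bit GDDR6-14000": (256, 14.0),
    "Xbox Series X, 10 GB pool, 320-bit GDDR6-14000": (320, 14.0),
    "Xbox Series X, 6 GB pool, 192-bit GDDR6-14000": (192, 14.0),
    "M1 Max, 512-bit LPDDR5-6400 (assumed config)": (512, 6.4),
}

for name, (bits, rate) in systems.items():
    print(f"{name:48s} {peak_bandwidth_gbs(bits, rate):6.1f} GB/s")
```

On these numbers M1 Max's ~410 GB/s sits between the Series X's 336 GB/s pool and the PS5-class 448 GB/s. It reaches console-GDDR6 territory through a much wider, lower-clocked LPDDR5 interface.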
#53
r9
Oh shiiiiiiiiiiit ....
The M1 was running Witcher 3 x86 at 30fps, I really want to see what this monster can do with 4x gpu power on games that run natively.
#54
Richards
Valantar wrote: We know how the M1 performs - in terms of IPC it trounces both Intel and AMD, matching or beating their peak single core performance at 2/3-3/5 the clock speed (3.1GHz vs 4.8/5.3-ish). These more than double the core counts, and double/quadruple the memory bandwidth to feed the cores. Also, Apple has absolutely insane amounts of cache (at equally insane latencies) with their recent chips. This will be a beast, it just needs software to make use of the power. Which it likely will have (Adobe CS etc. are already native).

The interfaces are 256-bit (M1 Pro) and 512-bit (M1 Max). Probably a bit power hungry, sure, but they are mounted extremely close to the SoC, on the same package, so they've likely optimized for that. Plus, these are 40-60W SoCs. The memory power isn't going to be an issue.
Raphael-H and Alder Lake-S will destroy the overrated M1.
#55
billEST
r9 wrote: Oh shiiiiiiiiiiit ....
The M1 was running Witcher 3 x86 at 30fps, I really want to see what this monster can do with 4x gpu power on games that run natively.
Easy: look at the Xbox Series S or PS5; the slide says 10.4 TFLOPS.
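Peak FP32 throughput is ALU count x 2 FLOPs per clock (one fused multiply-add) x clock speed. The sketch plugs in public console specs: the PS5 has 36 CUs x 64 shaders at up to 2.23 GHz, and the Xbox Series S has 20 CUs x 64 shaders at 1.565 GHz. For M1 Max it assumes 128 ALUs per GPU core, as on M1. It then back-solves the clock implied by Apple's 10.4 TFLOPS slide, since Apple does not appear to publish a GPU clock. On paper that puts M1 Max roughly level with the PS5 (about 10.3 TFLOPS) and well above the Series S (about 4 TFLOPS). Paper TFLOPS say little about actual game frame rates.

```python
def fp32_tflops(alus: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS, counting one FMA (2 FLOPs) per ALU per clock."""
    return alus * 2 * clock_ghz / 1000


ps5 = fp32_tflops(36 * 64, 2.23)        # 36 CUs x 64 shaders, variable clock up to 2.23 GHz
series_s = fp32_tflops(20 * 64, 1.565)  # 20 CUs x 64 shaders at 1.565 GHz

# M1 Max: 32 cores x 128 ALUs (assumed, as on M1). The clock is back-solved, not official.
m1_max_alus = 32 * 128
implied_clock_ghz = 10.4 * 1000 / (m1_max_alus * 2)

print(f"PS5:           {ps5:.2f} TFLOPS")
print(f"Xbox Series S: {series_s:.2f} TFLOPS")
print(f"M1 Max implied GPU clock: {implied_clock_ghz:.2f} GHz")
```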
#56
Ravenas
r9 wrote: Oh shiiiiiiiiiiit ....
The M1 was running Witcher 3 x86 at 30fps, I really want to see what this monster can do with 4x gpu power on games that run natively.
30 fps, 1080p, and lowest settings. Maybe you will get 30 fps, 1080p, and max settings.
#57
ValenOne
dragontamer5788 wrote: CPUs have to transfer data to the GPUs all the time (and, more rarely, GPU->CPU transfers). One of the key advantages of a SOC is that this "data transfer" takes place in L3 cache instead of over system memory.

I find it hard to believe that Microsoft would design a SOC like the XBox Series X and ignore this simple and useful optimization. I see that Microsoft is playing cute games with its 10+6 GB layout, but I'm pretty sure they're just saying that CPUs use less memory bandwidth, so 10GB of fast-RAM + 6GB of slow-RAM is intended for the CPU to use slow-RAM and GPU to use fast-RAM. But both CPU+GPU should have access to both halves.

If for no other reason than to optimize the "no copy" methodology between CPU -> GPU data transfers. (Why ever copy data when GPUs can simply just read the RAM themselves?). In dGPU world, you need to transfer the data over PCIe because the VRAM is physically a different chip. But in XBox Series X land, VRAM and RAM are literally the same chips, no copying needed.
1. For games, the shared memory usage is relatively minor. PC has Resizable BAR (ReBAR), which enables the PC CPU to directly access the entire GPU's VRAM. The CPU wouldn't be able to keep up with the dGPU's large-scale scatter-gather capability.

2. Shared memory has its downsides with context-switch overheads. CPU I/O access can gimp the GPU's burst-mode I/O access, e.g. frame buffer burst I/O access shouldn't be disturbed.

The late-1980s Amiga's Chip RAM was shared memory between the CPU and the iGPU (custom chips).
#58
Aquinus
Resident Wat-man
TheoneandonlyMrK wrote: 6K, with what sounds like a 3.5K minimum spend for 32 GB RAM, and in some cases it's said (look around, I'm not posting links to other tech sites) to be beaten by the outgoing Intel 9th-gen chips, sooo, there's that.
Go on their site, I just priced it out. $4,200 for the Max, 64GB of memory, and a 2TB drive, which is about what I paid (sans discounts I can get) for mine in my specs. That's really not bad considering what you're getting if you're comparing it to the previous 16". In that respect, Apple has kept pricing consistent, but has theoretically given it an absolutely massive performance uplift within the same power constraints.

Edit: Mind you that these are US prices in USD.
#59
ValenOne
TheoneandonlyMrK wrote: Unified is exactly like the PS5 and Xbox.
One pool of memory for any use.
So Apple clearly weren't first and are doing something similar.
The GPU or CPU can make memory calls in those.
Though inevitably the MMU is going to be on the edge of the SoC on a bus.
The 1985-era Amiga 1000 had a shared memory design.
#60
Bomby569
Aquinus wrote: Go on their site, I just priced it out. $4,200 for the Max, 64GB of memory, and a 2TB drive, which is about what I paid (sans discounts I can get) for mine in my specs. That's really not bad considering what you're getting if you're comparing it to the previous 16". In that respect, Apple has kept pricing consistent, but has theoretically given it an absolutely massive performance uplift within the same power constraints.

Edit: Mind you that these are US prices in USD.
I wonder how people can seriously consider buying that crap when they know, or at least should know, about their anti-consumer BS?
#61
Aquinus
Resident Wat-man
Bomby569 wrote: I wonder how people can seriously consider buying that crap when they know, or at least should know, about their anti-consumer BS?
For being so anti-consumer, they sure do make a good machine for work and play if you can afford it.
#62
xkm1948
That M1 Max GPU might be amazing for mining cryptos
#63
MxPhenom 216
ASIC Engineer
Tigger wrote: These TSMC? or?

Very interesting Apple upping the ante
Pretty certain it's TSMC's 5nm node.
#64
apoklyps3
man does apple make me laugh with their closed limited OSes and their potato mobile processors.
#65
BorisDG
Interesting they are still based on the A14 platform and not A15.
#66
windwhirl
apoklyps3 wrote: man does apple make me laugh with their closed limited OSes and their potato mobile processors.
Oh yeah, it's just a mobile SoC built using the most advanced node available in the world that can probably beat every other mobile SoC around single-handedly. Nothing worthy of a second of actual interest. /s

:rolleyes:
#67
Flanker
BorisDG wrote: Interesting they are still based on the A14 platform and not A15.
A15-based ones will probably be called M2 or something. I think Apple is still trying to find out what happens when these chips are scaled up.
#68
apoklyps3
windwhirl wrote: Oh yeah, it's just a mobile SoC built using the most advanced node available in the world that can probably beat every other mobile SoC around single-handedly. Nothing worthy of a second of actual interest. /s

:rolleyes:
proof or didn't happen
#69
Fourstaff
BorisDG wrote: Interesting they are still based on the A14 platform and not A15.
They probably designed this based on the A14 while another team was working on the A15.
#70
R0H1T
Valantar wrote: It's a bit strange for you to bring up the Epyc/TR comparison just to then say it's not a valid comparison once people get into why this is likely to be more efficient.
That's because offhand I can't think of any other chip(s) that move such vast amounts of data between massive cores (in Apple's case, GPU cores now too) and pay a heavy (energy) price for that. Moving (lots of) data quickly is the next big hurdle in computing, and the SoC approach for now seems to be more efficient. The reason it isn't directly comparable is that even now the top-end server chips should beat Apple in most tasks they're actually designed for, but they're also generally less efficient. The SoC approach isn't really scalable beyond low-double-digit CPU cores, especially if you're putting such a massive GPU in there!
#71
apoklyps3
seems everybody should drop epyc processors for their servers. they are obsolete :roll:
long time since I had such a good laugh
no wonder apple makes millions. you guys would believe about anything they say
#72
R0H1T
Right, no one's saying that unless you meant some other poster?

The EPYC/TR way is meant for massive amounts of CPU cores, which Apple doesn't seem to need right now. That's also in part due to the dedicated accelerators they're using for a lot of tasks. IIRC Zen 4 (5?) will introduce similar on-die accelerators, probably courtesy of their Xilinx acquisition. My biggest curiosity then would be how efficient their monolithic (APU) dies would be wrt the M1 & now M1 Pro & Max.
#73
Richards
windwhirl wrote: Oh yeah, it's just a mobile SoC built using the most advanced node available in the world that can probably beat every other mobile SoC around single-handedly. Nothing worthy of a second of actual interest. /s

:rolleyes:
They wouldn't beat AMD on the same node, though. Zen 4 on 5nm will crush this expensive chip.
#74
Valantar
Minus Infinity wrote: Or AMD is going to be doing that with Zen 4/RDNA3. The consoles' APUs are custom designs, not straight-up Zen 2 designs. They have features not in the desktop APUs.
And? Unified memory isn't just a hardware feature, it's a hardware+OS feature. And there's no indication that either XSX or PS5 have truly unified memory.
Wirko wrote: What's the difference? Is the memory "truly unified" only if memory access is governed by a single MMU for both CPU and GPU?
No, it must also be accessible to the entire system without the need for copying.
dragontamer5788 wrote: I mean... it's called the PS5 / XBox Series X.

I'm pretty sure they have unified memory. Hell, CUDA + CPU / OpenCL + CPU has unified memory. It's just emulated over PCIe. PS5 / XBox Series X actually use the same, literal RAM for the iGPU side and CPU side.
It's still walled off, and needs copying, thus it isn't actually unified.
TheoneandonlyMrK wrote: Unified is exactly like the PS5 and Xbox.
One pool of memory for any use.
So Apple clearly weren't first and are doing something similar.
The GPU or CPU can make memory calls in those.
Though inevitably the MMU is going to be on the edge of the SoC on a bus.
See above. It is only truly unified if every component has full access to RAM, which is what Apple is claiming here. No PC or current x86-based platform has that.
dragontamer5788 wrote: CPUs have to transfer data to the GPUs all the time (and, more rarely, GPU->CPU transfers). One of the key advantages of a SOC is that this "data transfer" takes place in L3 cache instead of over system memory.

I find it hard to believe that Microsoft would design a SOC like the XBox Series X and ignore this simple and useful optimization. I see that Microsoft is playing cute games with its 10+6 GB layout, but I'm pretty sure they're just saying that CPUs use less memory bandwidth, so 10GB of fast-RAM + 6GB of slow-RAM is intended for the CPU to use slow-RAM and GPU to use fast-RAM. But both CPU+GPU should have access to both halves.

If for no other reason than to optimize the "no copy" methodology between CPU -> GPU data transfers. (Why ever copy data when GPUs can simply just read the RAM themselves?). In dGPU world, you need to transfer the data over PCIe because the VRAM is physically a different chip. But in XBox Series X land, VRAM and RAM are literally the same chips, no copying needed.
But copying is needed for those, as the CPU and GPU have discrete areas of memory set aside for them.
Wirko wrote: Isn't that the case with every Intel and AMD processor with integrated graphics? At least since Haswell for Intel (AnandTech) and since Kaveri for AMD (Wikipedia).
No, iGPUs have system memory set aside for them - some static, some dynamic. This memory is not accessible to the CPU, and regular system memory is not accessible to the iGPU, necessitating copying data between the two.
Darmok N Jalad wrote: AnandTech is speculating it’s probably 64MB on the Max, 32MB on the Pro. They are looking at the actual die shots (provided in the presentation, interestingly), not the illustrative diagram Apple used in the presentation.
www.anandtech.com/show/17019/apple-announced-m1-pro-m1-max-giant-new-socs-with-allout-performance
That's lower than I would have expected, but then diagrams are always misleading. I wonder if that judgement is correct though, as the new SLC blocks look much bigger than on the M1, which had 16MB. On the M1 the SLC block is slightly larger than two GPU "cores", on the M1P/M it's larger than four. Of course, not all of this is actually cache, and a lot of it is likely interconnects and other stuff, but 2x16MB still seems low to me.
dragontamer5788 wrote: Yeah, it's not a new feature at all.

But as Wirko has pointed out: this isn't new at all. Intel / AMD chips have been doing zero-copy transfers on Windows for nearly a decade now on their iGPUs.

Yes, that is even on Windows 10, which is Hyper-V virtualized for security purposes. (The most secure parts of Windows start up in a separate VM these days, so that not even a kernel-level hack can reach those secrets... unless it also includes a VM-break of some kind)

Now don't get me wrong: XBox Series X has a weird / complicated memory scheme going on. But I'd still expect that this extremely strange memory scheme was unified, much akin to AMD's Kaveri or Intel iGPU stuff that you'd find on any typical iGPU for the past decade.
It clearly isn't, when they wall off sections of RAM for the OS, CPU software and GPU software. Discrete memory regions implies that copying is needed between them, which means it isn't unified.
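The disagreement in this exchange comes down to two models. In one, the CPU and GPU work on the same bytes in place ("zero-copy"). In the other, they work on separate copies that have to be refreshed. Python can show the two semantics as a loose analogy only. It does not model any real driver or the consoles' memory controllers. A `memoryview` shares the underlying buffer, while `bytes(...)` takes a snapshot:

```python
# Loose analogy only: Python buffers stand in for "CPU" and "GPU" views of memory.

cpu_buffer = bytearray(b"frame-0")

# Zero-copy: a memoryview exposes the same underlying bytes, no duplication.
gpu_view = memoryview(cpu_buffer)

# Copy-based: bytes() duplicates the data into a separate, immutable object.
gpu_copy = bytes(cpu_buffer)

# The "CPU" updates the buffer in place (same length, so no reallocation).
cpu_buffer[6] = ord("1")

print(bytes(gpu_view))  # the shared view sees the update
print(gpu_copy)         # the copy is stale until it is copied again
```

The sketch does not settle which behaviour a given console or iGPU actually has; it only shows what each side of the argument means.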
Arc1t3ct wrote: The M1 Max, at least on paper, makes every other CPU seem like a decade out of date... How can this be?
Money, mainly. Apple can afford to outspend everyone on R&D, by a huge margin.
rvalencia wrote: 1. For games, the shared memory usage is relatively minor. PC has Resizable BAR (ReBAR), which enables the PC CPU to directly access the entire GPU's VRAM. The CPU wouldn't be able to keep up with the dGPU's large-scale scatter-gather capability.

2. Shared memory has its downsides with context-switch overheads. CPU I/O access can gimp the GPU's burst-mode I/O access, e.g. frame buffer burst I/O access shouldn't be disturbed.

The late-1980s Amiga's Chip RAM was shared memory between the CPU and the iGPU (custom chips).
ReBAR doesn't have anything to do with this - it allows the CPU to write to the entire VRAM rather than smaller chunks, but the CPU still can't work off of VRAM - it needs copying to system RAM for the CPU to work on it. You're right that shared memory has its downsides, but with many times the bandwidth of any x86 CPU (and equal to many dGPUs) I doubt that will be a problem, especially considering Apple's penchant for massive caches.
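To put a rough number on the copying cost for a discrete GPU: PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding. An x16 link therefore tops out near 31.5 GB/s in each direction before protocol overhead. The sketch estimates how long moving a buffer across that link takes. It ignores protocol overhead and any overlap of transfers with compute, so real figures will be somewhat worse. On a unified-memory SoC that particular step does not exist, though the data still has to be read from DRAM.

```python
def pcie_link_gbs(gt_per_s: float, encoding_efficiency: float, lanes: int) -> float:
    """Raw one-direction link bandwidth in GB/s, ignoring packet/protocol overhead."""
    return gt_per_s * lanes * encoding_efficiency / 8


def copy_time_ms(size_gb: float, link_gbs: float) -> float:
    """Time to move size_gb across a link of link_gbs, in milliseconds."""
    return size_gb / link_gbs * 1000


# PCIe 4.0: 16 GT/s per lane, 128b/130b line encoding, x16 slot
pcie4_x16 = pcie_link_gbs(16.0, 128 / 130, lanes=16)
print(f"PCIe 4.0 x16: {pcie4_x16:.1f} GB/s")

for size_gb in (0.25, 1.0, 4.0):
    print(f"copy {size_gb:4.2f} GB: {copy_time_ms(size_gb, pcie4_x16):6.1f} ms")
```

For scale, a single 1 GB transfer at this rate takes roughly two frame times at 60 fps, which is why engines try hard to avoid moving data across the bus every frame.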
R0H1T wrote: That's because offhand I can't think of any other chip(s) that move such vast amounts of data between massive cores (in Apple's case, GPU cores now too) and pay a heavy (energy) price for that. Moving (lots of) data quickly is the next big hurdle in computing, and the SoC approach for now seems to be more efficient. The reason it isn't directly comparable is that even now the top-end server chips should beat Apple in most tasks they're actually designed for, but they're also generally less efficient. The SoC approach isn't really scalable beyond low-double-digit CPU cores, especially if you're putting such a massive GPU in there!
Yes, but that's precisely why it's worth pointing out that the M1P/M are monolithic: it allows for huge power savings, as they don't need off-die interfaces for most of this. Keeping data on silicon is a massive power saving. Of course they're working with 10 (8+2) CPU cores and an 8-"core" GPU, not a 32-64-core CPU, so the interfaces can also be much, much simpler.
Richards wrote: They wouldn't beat AMD on the same node, though. Zen 4 on 5nm will crush this expensive chip.
That's debatable. Apple's architecture team has been doing some incredible work in recent years. Their cache architecture (which is something that doesn't gain that much from node changes) is far superior to anything else (look at the cache access benchmarks in the AnandTech article I linked above), and their huge CPU cores have a >50% IPC lead over both Intel and AMD, matching their performance at much lower clocks (in part thanks to those huge, low-latency caches, but not only that). A higher-core-count chip from AMD will still likely win in a 100% MT workload, but the power difference is likely to be significant.
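The IPC argument here is a ratio of performance to clock speed. If two cores score the same on a single-threaded test, the one running at the lower clock does more work per cycle by the ratio of the clocks. The sketch uses the clock figures quoted in post #54: about 3.1 GHz for M1 against x86 peaks of roughly 4.8 and 5.3 GHz. It assumes equal single-thread scores:

```python
def per_clock_ratio(clock_a_ghz: float, clock_b_ghz: float, perf_ratio: float = 1.0) -> float:
    """Per-clock performance of core A relative to core B.

    perf_ratio is perf_A / perf_B. Per-clock perf is perf / clock, so
    (perf_A / clock_A) / (perf_B / clock_B) = perf_ratio * clock_B / clock_A.
    """
    return perf_ratio * clock_b_ghz / clock_a_ghz


M1_CLOCK_GHZ = 3.1  # figure quoted in the thread
for x86_clock in (4.8, 5.3):
    r = per_clock_ratio(M1_CLOCK_GHZ, x86_clock)  # assumes equal single-thread scores
    print(f"vs {x86_clock} GHz: {r:.2f}x per clock ({(r - 1) * 100:.0f}% lead)")
```

That works out to roughly a 55 to 71 percent per-clock advantage, in line with the ">50%" claim. Real IPC varies by workload, so treat this as a rough consistency check rather than a measurement.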
apoklyps3 wrote: proof or didn't happen
Here's Anandtech's SPEC2006 and SPEC2017 testing of the M1. Those are industry standard benchmarks for ST performance, and the M1 rivals the 5950X at a fraction of the power, and much lower clocks. These chips use the same architecture but with more cache, more RAM, and a much higher power budget.
Posted on Reply
#75
R0H1T
Valantar wrote: Yes, but that's precisely why it's worth pointing out that the M1P/M are monolithic: it allows for huge power savings, as they don't need off-die interfaces for most of this. Keeping data on silicon is a massive power saving. Of course they're working with 10 (8+2) CPU cores and an 8-"core" GPU, not a 32-64-core CPU, so the interfaces can also be much, much simpler.
My point is/was that Apple has at least 3-4 things that make the Mxx chips so much more efficient (I won't count ARM in there):
  • 5nm node
  • LPDDR5
  • monolithic die
  • UMA for much more efficient use of memory
AMD or Intel need at least 3 of them to come close to some of the claims Apple is making today. IIRC AMD is planning at least 2, if not 3, of these for their Zen 4/5 APUs, so I'm honestly expecting much higher efficiency numbers, especially if they can bring true unified memory access quickly.