Monday, October 18th 2021

Apple Introduces M1 Pro and M1 Max: the Most Powerful Chips Apple Has Ever Built

Apple today announced M1 Pro and M1 Max, the next breakthrough chips for the Mac. Scaling up M1's transformational architecture, M1 Pro offers amazing performance with industry-leading power efficiency, while M1 Max takes these capabilities to new heights. The CPU in M1 Pro and M1 Max delivers up to 70 percent faster CPU performance than M1, so tasks like compiling projects in Xcode are faster than ever. The GPU in M1 Pro is up to 2x faster than M1, while M1 Max is up to an astonishing 4x faster than M1, allowing pro users to fly through the most demanding graphics workflows.

M1 Pro and M1 Max introduce a system-on-a-chip (SoC) architecture to pro systems for the first time. The chips feature fast unified memory, industry-leading performance per watt, and incredible power efficiency, along with increased memory bandwidth and capacity. M1 Pro offers up to 200 GB/s of memory bandwidth with support for up to 32 GB of unified memory. M1 Max delivers up to 400 GB/s of memory bandwidth—2x that of M1 Pro and nearly 6x that of M1—and support for up to 64 GB of unified memory. And while the latest PC laptops top out at 16 GB of graphics memory, having this huge amount of memory enables graphics-intensive workflows previously unimaginable on a notebook. The efficient architecture of M1 Pro and M1 Max means they deliver the same level of performance whether MacBook Pro is plugged in or using the battery. M1 Pro and M1 Max also feature enhanced media engines with dedicated ProRes accelerators specifically for pro video processing. M1 Pro and M1 Max are by far the most powerful chips Apple has ever built.
"M1 has transformed our most popular systems with incredible performance, custom technologies, and industry-leading power efficiency. No one has ever applied a system-on-a-chip design to a pro system until today with M1 Pro and M1 Max," said Johny Srouji, Apple's senior vice president of Hardware Technologies. "With massive gains in CPU and GPU performance, up to six times the memory bandwidth, a new media engine with ProRes accelerators, and other advanced technologies, M1 Pro and M1 Max take Apple silicon even further, and are unlike anything else in a pro notebook."

M1 Pro: A Whole New Level of Performance and Capability
Utilizing the industry-leading 5-nanometer process technology, M1 Pro packs in 33.7 billion transistors, more than 2x the amount in M1. A new 10-core CPU, including eight high-performance cores and two high-efficiency cores, is up to 70 percent faster than M1, resulting in unbelievable pro CPU performance. Compared with the latest 8-core PC laptop chip, M1 Pro delivers up to 1.7x more CPU performance at the same power level and achieves the PC chip's peak performance using up to 70 percent less power. Even the most demanding tasks, like high-resolution photo editing, are handled with ease by M1 Pro.
M1 Pro has an up-to-16-core GPU that is up to 2x faster than M1 and up to 7x faster than the integrated graphics on the latest 8-core PC laptop chip. Compared to a powerful discrete GPU for PC notebooks, M1 Pro delivers more performance while using up to 70 percent less power. And M1 Pro can be configured with up to 32 GB of fast unified memory, with up to 200 GB/s of memory bandwidth, enabling creatives like 3D artists and game developers to do more on the go than ever before.
M1 Max: The World's Most Powerful Chip for a Pro Notebook
M1 Max features the same powerful 10-core CPU as M1 Pro and adds a massive 32-core GPU for up to 4x faster graphics performance than M1. With 57 billion transistors—70 percent more than M1 Pro and 3.5x more than M1—M1 Max is the largest chip Apple has ever built. In addition, the GPU delivers performance comparable to a high-end GPU in a compact pro PC laptop while consuming up to 40 percent less power, and performance similar to that of the highest-end GPU in the largest PC laptops while using up to 100 watts less power. This means less heat is generated, fans run quietly and less often, and battery life is amazing in the new MacBook Pro. M1 Max transforms graphics-intensive workflows, including up to 13x faster complex timeline rendering in Final Cut Pro compared to the previous-generation 13-inch MacBook Pro.
M1 Max also offers a higher-bandwidth on-chip fabric, and doubles the memory interface compared with M1 Pro for up to 400 GB/s, or nearly 6x the memory bandwidth of M1. This allows M1 Max to be configured with up to 64 GB of fast unified memory. With its unparalleled performance, M1 Max is the most powerful chip ever built for a pro notebook.
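As a quick sanity check on the ratios quoted above, the short Python sketch below recomputes them from the stated figures. The 16-billion transistor count for the original M1 is not given in this announcement; it comes from Apple's earlier M1 announcement and should be treated as an assumption here. The variable names are illustrative only.

```python
# Figures quoted in this announcement.
M1_PRO_TRANSISTORS_B = 33.7   # billions of transistors
M1_MAX_TRANSISTORS_B = 57.0   # billions of transistors
M1_PRO_BANDWIDTH_GBS = 200.0  # "up to" GB/s
M1_MAX_BANDWIDTH_GBS = 400.0  # "up to" GB/s

# Assumption: not stated above; taken from Apple's earlier M1 announcement.
M1_TRANSISTORS_B = 16.0

# Transistor-count ratios.
max_vs_pro = M1_MAX_TRANSISTORS_B / M1_PRO_TRANSISTORS_B
max_vs_m1 = M1_MAX_TRANSISTORS_B / M1_TRANSISTORS_B
pro_vs_m1 = M1_PRO_TRANSISTORS_B / M1_TRANSISTORS_B

# Memory-bandwidth ratio between the two new chips.
bandwidth_max_vs_pro = M1_MAX_BANDWIDTH_GBS / M1_PRO_BANDWIDTH_GBS

# "Nearly 6x" the bandwidth of M1 implies M1 sits slightly above this value.
implied_m1_bandwidth_floor = M1_MAX_BANDWIDTH_GBS / 6

print(f"M1 Max vs M1 Pro transistors: {max_vs_pro:.2f}x")
print(f"M1 Max vs M1 transistors:     {max_vs_m1:.2f}x")
print(f"M1 Pro vs M1 transistors:     {pro_vs_m1:.2f}x")
print(f"M1 Max vs M1 Pro bandwidth:   {bandwidth_max_vs_pro:.1f}x")
print(f"Implied M1 bandwidth floor:   {implied_m1_bandwidth_floor:.1f} GB/s")
```

Under these assumptions the ratios come out at about 1.69x (the "70 percent more"), about 3.56x (the "3.5x"), and about 2.11x (the "more than 2x"), so the quoted figures are mutually consistent.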

Fast, Efficient Media Engine, Now with ProRes
M1 Pro and M1 Max include an Apple-designed media engine that accelerates video processing while maximizing battery life. M1 Pro also includes dedicated acceleration for the ProRes professional video codec, allowing playback of multiple streams of high-quality 4K and 8K ProRes video while using very little power. M1 Max goes even further, delivering up to 2x faster video encoding than M1 Pro, and features two ProRes accelerators. With M1 Max, the new MacBook Pro can transcode ProRes video in Compressor up to a remarkable 10x faster compared with the previous-generation 16-inch MacBook Pro.
Advanced Technologies for a Complete Pro System
Both M1 Pro and M1 Max are loaded with advanced custom technologies that help push pro workflows to the next level:
  • A 16-core Neural Engine for on-device machine learning acceleration and improved camera performance.
  • A new display engine drives multiple external displays.
  • Additional integrated Thunderbolt 4 controllers provide even more I/O bandwidth.
  • Apple's custom image signal processor, along with the Neural Engine, uses computational video to enhance image quality for sharper video and more natural-looking skin tones on the built-in camera.
  • Best-in-class security, including Apple's latest Secure Enclave, hardware-verified secure boot, and runtime anti-exploitation technologies.
A Huge Step in the Transition to Apple Silicon
The Mac is now one year into its two-year transition to Apple silicon, and M1 Pro and M1 Max represent another huge step forward. These are the most powerful and capable chips Apple has ever created, and together with M1, they form a family of chips that lead the industry in performance, custom technologies, and power efficiency.
macOS and Apps Unleash the Capabilities of M1 Pro and M1 Max
macOS Monterey is engineered to unleash the power of M1 Pro and M1 Max, delivering breakthrough performance, phenomenal pro capabilities, and incredible battery life. By designing Monterey for Apple silicon, the Mac wakes instantly from sleep, and the entire system is fast and incredibly responsive. Developer technologies like Metal let apps take full advantage of the new chips, and optimizations in Core ML utilize the powerful Neural Engine so machine learning models can run even faster. Pro app workload data is used to help optimize how macOS assigns multi-threaded tasks to the CPU cores for maximum performance, and advanced power management features intelligently allocate tasks between the performance and efficiency cores for both incredible speed and battery life.

The combination of macOS with M1, M1 Pro, or M1 Max also delivers industry-leading security protections, including hardware-verified secure boot, runtime anti-exploitation technologies, and fast, in-line encryption for files. All of Apple's Mac apps are optimized for—and run natively on—Apple silicon, and there are over 10,000 Universal apps and plug-ins available. Existing Mac apps that have not yet been updated to Universal will run seamlessly with Apple's Rosetta 2 technology, and users can also run iPhone and iPad apps directly on the Mac, opening a huge new universe of possibilities.
Apple's Commitment to the Environment
Today, Apple is carbon neutral for global corporate operations, and by 2030, plans to have net-zero climate impact across the entire business, which includes manufacturing supply chains and all product life cycles. This also means that every chip Apple creates, from design to manufacturing, will be 100 percent carbon neutral.

156 Comments on Apple Introduces M1 Pro and M1 Max: the Most Powerful Chips Apple Has Ever Built

#1
Lucas_
This is getting interesting!!
#2
Tigger
These TSMC? or?

Very interesting Apple upping the ante
#3
Vya Domus
The SoCs themselves are meh, in the sense that they're just scaled up variants of M1, nothing particularly interesting there. Those memory bandwidth claims are however rather curious, I can't see how they'd achieve that other than by using GDDR6 modules. Besides the performance implications (some of which are negative, actually) the other thing is that they're pretty power hungry and dissipate a lot of heat (probably more than the SoC itself). To put up to 64GB of that in a laptop is a questionable choice.

Also, these have gotten so big with such a large transistor budget that it now defeats the purpose of having an SoC in the first place.

Edit: Apparently it's LPDDR5, which is still kind of stupid because that means a very wide interface, which is also power hungry.
#4
heinztvoert
I saw the event. So, they took away the ports - forcing us to buy a bunch of peripherals and adapters. They took away HDMI, SD card, etc. Now they bring them all back as if it were a big deal. I was never a fan of the Touch Bar, but now that I am used to it I kind of like it - so now they take it away. They say they introduce cutting-edge innovative products - Yes sorry I don't see where the innovation or the cutting-edge is.
#5
Initialised
I need to see some cross platform benchmarks on the full fat M1 Max.
#6
dragontamer5788
AleksandarK said: "M1 Max also offers a higher-bandwidth on-chip fabric, and doubles the memory interface compared with M1 Pro for up to 400 GB/s"
That's pretty big. I'm curious how this memory system works.

It's big enough that I'm instinctively thinking that's a typo there. 400GB/s is huge for a CPU / iGPU. The only systems close to that are Xbox / PS5 game consoles with GDDR graphics RAM.
#7
bonehead123
I'm all for progress & innovation & new designs etc, but the REAL question is:

How many arms, legs, kidneys, 1st born children, and banker's gold cards do we need to sacrifice in order to buy a machine with these chips in them hahahahaha ???
#8
P4-630

bonehead123 said: "How many arms, legs, kidneys, 1st born children, and banker's gold cards do we need to sacrifice in order to buy a machine with these chips in them hahahahaha ???"
Here in Europe:
The laptops can be ordered immediately and will be delivered from October 26.
The 14-inch MacBook Pro has a starting price of 2249 euros and the 16-inch model starting at 2749 euros.
#9
seth1911
Yeah, most powerful for a closed OS.
The PS3 can even run GTA 5 too, while with hardware like a PS3's you can't even run GTA 4 at 30 FPS on PC.

PS3
IBM Cell with 7 cores and 256MB RAM (the system uses about 64MB, so really 192MB), and a GeForce with 256MB GPU RAM
#10
windwhirl
heinztvoert said: "Yes sorry I don't see where the innovation or the cutting-edge is."
The cutting edge is for cutting away everything you like /joke
Tigger said: "These TSMC? or?"
TSMC. No one else has a better node. Dare I say no one else has an equal node?
Vya Domus said: "Apparently it's LPDDR5,"
Uh, I thought that was low power?
Initialised said: "I need to see some cross platform benchmarks on the full fat M1 Max."
Best you might get is Linux.
bonehead123 said: "How many arms, legs, kidneys, 1st born children, and banker's gold cards do we need to sacrifice in order to buy a machine with these chips in them hahahahaha ???"
Yes.
#11
JalleR
bonehead123 said: "I'm all for progress & innovation & new designs etc, but the REAL question is:

How many arms, legs, kidneys, 1st born children, and banker's gold cards do we need to sacrifice in order to buy a machine with these chips in them hahahahaha ???"
My guess is 10 ARMs :D
#12
Patriot
Initialised said: "I need to see some cross platform benchmarks on the full fat M1 Max."
Yeah, lots of claims, no data to back it.
#13
Vya Domus
windwhirl said: "Uh, I thought that was low power?"
Not when paired with a huge interface. A single module can give you 50GB/s, in order to reach 400GB/s you need 8 channels, that's insane.
#14
dragontamer5788
Vya Domus said: "Not when paired with a huge interface. A single module can give you 50GB/s, in order to reach 400GB/s you need 8 channels, that's insane."
That being said: replicating a "low power design" 8x makes sense as a strategy. Each individual channel is low power, but with 8x replication, it gets you the performance you need.

Of course, having 8x the channels means that you're using 8x the power. But LPDDR5 uses less power per channel, so maybe it only uses the same amount of power as 4x channel DDR4 (aka EPYC) or 6x channel DDR4 (aka Skylake-X / Server) ?
#15
Valantar
Patriot said: "Yeah, lots of claims, no data to back it."
We know how the M1 performs - in terms of IPC it trounces both Intel and AMD, matching or beating their peak single core performance at 2/3-3/5 the clock speed (3.1GHz vs 4.8/5.3-ish). These more than double the core counts, and double/quadruple the memory bandwidth to feed the cores. Also, Apple has absolutely insane amounts of cache (at equally insane latencies) with their recent chips. This will be a beast, it just needs software to make use of the power. Which it likely will have (Adobe CS etc. are already native).
Vya Domus said: "Not when paired with a huge interface. A single module can give you 50GB/s, in order to reach 400GB/s you need 8 channels, that's insane."
The interfaces are 256-bit (M1 Pro) and 512-bit (M1 Max). Probably a bit power hungry, sure, but they are mounted extremely close to the SoC, on the same package, so they've likely optimized for that. Plus, these are 40-60W SoCs. The memory power isn't going to be an issue.
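To put rough numbers on the IPC point: if single-core performance is modelled as IPC times clock speed, then matching a rival's performance at a lower clock implies an IPC advantage equal to the clock ratio. A back-of-the-envelope sketch in Python, using only the clock speeds quoted above and assuming exactly equal performance (a simplification, since real results vary by workload); the function name is illustrative:

```python
def implied_ipc_advantage(own_clock_ghz: float, rival_clock_ghz: float) -> float:
    """IPC ratio needed to match a rival's performance, given perf = IPC * clock."""
    if own_clock_ghz <= 0 or rival_clock_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return rival_clock_ghz / own_clock_ghz


M1_CLOCK_GHZ = 3.1               # as quoted in the comment above
RIVAL_CLOCKS_GHZ = (4.8, 5.3)    # as quoted in the comment above

for rival in RIVAL_CLOCKS_GHZ:
    clock_fraction = M1_CLOCK_GHZ / rival
    advantage = implied_ipc_advantage(M1_CLOCK_GHZ, rival)
    print(f"vs {rival} GHz: clock fraction {clock_fraction:.2f}, "
          f"implied IPC advantage {advantage:.2f}x")
```

With these inputs the clock fractions land near 2/3 and 3/5, and the implied IPC advantage is somewhere between about 1.55x and 1.71x.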
#16
Fouquin
Vya Domus said: "The SoCs themselves are meh, in the sense that they're just scaled up variants of M1, nothing particularly interesting there. Those memory bandwidth claims are however rather curious, I can't see how they'd achieve that other than by using GDDR6 modules. Besides the performance implications (some of which are negative, actually) the other thing is that they're pretty power hungry and dissipate a lot of heat (probably more than the SoC itself). To put up to 64GB of that in a laptop is a questionable choice.

Also, these have gotten so big with such a large transistor budget that it now defeats the purpose of having an SoC in the first place.

Edit: Apparently it's LPDDR5, which is still kind of stupid because that means a very wide interface, which is also power hungry."
dragontamer5788 said: "That's pretty big. I'm curious how this memory system works.

It's big enough that I'm instinctively thinking that's a typo there. 400GB/s is huge for a CPU / iGPU. The only systems close to that are Xbox / PS5 game consoles with GDDR graphics RAM."
256-bit and 512-bit LPDDR5 6,400Mbps for Pro and Max respectively. Package power is 50W.
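The bandwidth figures follow from simple arithmetic on the bus width and transfer rate quoted here. A minimal sketch in Python, assuming peak bandwidth is bus width in bytes times transfers per second, and assuming a 64-bit slice of the interface as the unit behind the roughly 50GB/s per module mentioned earlier in the thread; the function name is illustrative:

```python
def peak_bandwidth_gbs(bus_width_bits: int, rate_mtps: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * rate_mtps * 1e6 / 1e9


LPDDR5_RATE_MTPS = 6400  # 6,400Mbps per pin, as quoted above

m1_pro = peak_bandwidth_gbs(256, LPDDR5_RATE_MTPS)
m1_max = peak_bandwidth_gbs(512, LPDDR5_RATE_MTPS)

# Assumption: a 64-bit slice of the bus as the "module" unit.
per_64bit_slice = peak_bandwidth_gbs(64, LPDDR5_RATE_MTPS)
slices_for_max = 512 // 64

print(f"M1 Pro (256-bit): {m1_pro:.1f} GB/s")
print(f"M1 Max (512-bit): {m1_max:.1f} GB/s")
print(f"64-bit slice:     {per_64bit_slice:.1f} GB/s, "
      f"{slices_for_max} slices for the 512-bit bus")
```

This gives 204.8 GB/s and 409.6 GB/s, which line up with the rounded "up to 200 GB/s" and "up to 400 GB/s" claims, and a 64-bit slice works out to 51.2 GB/s, so eight of them make up the 512-bit interface.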
#17
R0H1T
dragontamer5788 said: "But LPDDR5 uses less power per channel, so maybe it only uses the same amount of power as 4x channel DDR4 (aka EPYC) or 6x channel DDR4 (aka Skylake-X / Server) ?"
The LPDDR5 isn't the problem, it's moving data ~ that's what takes up the vast majority of EPYC/TR's power. Either Apple is only using part of the 8 (or 16) channels for regular work or they've come up with something that's even more efficient than say IF o_O


Looks like 4x 128bit bus for the top end M1 Max :twitch:
#18
Vya Domus
Valantar said: "We know how the M1 performs - in terms of IPC it trounces both Intel and AMD"
"Trounce" is a bit extreme, plus, whatever advantages it has they most likely come from the huge caches and not because of the core architecture itself.
Valantar said: "Plus, these are 40-60W SoCs. The memory power isn't going to be an issue."
That's just for the SoC ? Then the memory and package power can easily reach a good chunk of that in addition to the 40-60W. That doesn't really matter, I just wonder what's the point in having an SoC at this stage.
#19
Valantar
R0H1T said: "The LPDDR5 isn't the problem, it's moving data ~ that's what takes up the vast majority of EPYC/TR's power. Either Apple is only using part of the 8 (or 16) channels for regular work or they've come up with something that's even more efficient than say IF o_O

Looks like 4x 128bit bus for the top end M1 Max :twitch:"
Yes, those numbers are public, 256/512-bit. I would also expect at least 32MB of last level cache for the Pro, though likely more. Also, Epyc/TR uses interconnects for core-to-core and cache-to-cache transfers, while this is monolithic, so it'll be far more efficient. Also, these chips are likely gargantuan. I would guess at least 500mm² for the Max. On 5nm, that isn't going to be cheap.
#20
defaultluser
it looks exactly like I thought it would: 4x the GPU cores requires 4x the memory interface - any GPU maker will tell you that!

But yeah, that must be a mess, stacking that many chips vertically!
#21
Fouquin
Valantar said: "Yes, those numbers are public, 256/512-bit. I would also expect at least 32MB of last level cache for the Pro, though likely more. Also, Epyc/TR uses interconnects for core-to-core and cache-to-cache transfers, while this is monolithic, so it'll be far more efficient. Also, these chips are likely gargantuan. I would guess at least 500mm² for the Max. On 5nm, that isn't going to be cheap."
24MB L2 is the P-core LLC. LLC is not shared by P and E cores, they each have dedicated L2.

#22
R0H1T
Valantar said: "Epyc/TR uses interconnects for core-to-core and cache-to-cache transfers"
They also have massive caches, which negate some of the disadvantages of having IF; besides, they're designed for workstations & servers so they're kinda made for different loads.
Valantar said: "so it'll be far more efficient."
Maybe maybe not, we can't do a truly apples to apples comparison here without a monolithic AMD APU having unified memory subsystem ~ that IMO is the biggest gamechanger!

AMD & Intel have long talked about unified memory (CPU+GPU) for nearly half a decade now, even more for AMD, & yet Apple is the one that stole the show.
#23
dragontamer5788
R0H1T said: "The LPDDR5 isn't the problem, it's moving data ~ that's what takes up the vast majority of EPYC/TR's power. Either Apple is only using part of the 8 (or 16) channels for regular work or they've come up with something that's even more efficient than say IF o_O"
Infinity Fabric can't possibly be that efficient. Infinity Fabric is an external-die interface designed for absurdly high numbers of cores.

AMD probably made IF as efficient as possible given the restrictions. But Apple only has an 8-core CPU + 16/32-core GPU to feed here. GPUs can be absurdly high-latency no problem, and that CPU count is small enough that a classic ring-bus would be fine.

Apple is benefiting from the Amdahl's law of low-core counts being easier to feed here. It will simply be more efficient to feed 8 cores on a singular die rather than 64-cores spread out across 8 different dies.
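For reference, the classical form of Amdahl's law gives the speedup on n cores as 1 / ((1 - p) + p / n), where p is the parallel fraction of the work. The comment above uses the term more loosely, for how hard it is to keep many cores fed, but the formula shows the same kind of diminishing returns. A sketch in Python; p = 0.95 is a made-up value for illustration, not a measurement of any real workload:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Classical Amdahl's law: speedup over one core for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


HYPOTHETICAL_PARALLEL_FRACTION = 0.95  # illustrative assumption only

for cores in (8, 64):
    speedup = amdahl_speedup(HYPOTHETICAL_PARALLEL_FRACTION, cores)
    efficiency = speedup / cores  # fraction of ideal linear scaling achieved
    print(f"{cores:2d} cores: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With that assumed p, 8 cores reach about 5.9x (roughly 74% of ideal) while 64 cores reach only about 15.4x (roughly 24% of ideal), which is the sense in which a low core count is easier to use well.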
#24
R0H1T
dragontamer5788 said: "Apple is benefiting from the Amdahl's law of low-core counts being easier to feed here. It will simply be more efficient to feed 8 cores on a singular die rather than 64-cores spread out across 8 different dies."
That is true & that's why I said that a truly Apples to Apples comparison would need a monolithic AMD APU with unified memory, but even now Apple should be using some on die fabric like Ringbus/UPI for Intel or IF for AMD ~ that also has to be mightily efficient if they can fit all of this within 60W(?) without thermal throttling most of the time.
#25
Wirko
Valantar said: "The interfaces are 256-bit (M1 Pro) and 512-bit (M1 Max). Probably a bit power hungry, sure, but they are mounted extremely close to the SoC, on the same package, so they've likely optimized for that. Plus, these are 40-60W SoCs. The memory power isn't going to be an issue."
The interfaces might be power hungry but energy per bit transferred should be mighty low.