Meta has prepared a leap-forward update for its Llama model series with the v4 release, ushering in native multimodality across the company's AI models. At the forefront is Llama 4 Scout, a model boasting 17 billion active parameters distributed across 16 experts in a mixture-of-experts (MoE) configuration. With Int4 quantization, the model is engineered to run entirely on a single NVIDIA H100 GPU. Scout supports an industry-leading input context window of up to 10 million tokens, a substantial leap over previous limits such as Google's earlier Gemini 1.5 Pro, which topped out at a 2 million token input context. Llama 4 Scout is built on a hybrid dense-and-MoE architecture that activates only a subset of the model's total parameters for each token, improving training and inference efficiency. This not only accelerates computation but also reduces the associated costs.
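For a sense of how an MoE layer keeps only a fraction of its parameters active per token, here is a minimal, illustrative PyTorch sketch; the class name, layer sizes, and expert count below are made up for the example and do not reflect Llama 4 Scout's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router scores the experts for each
    token and only the top-k experts run, so most parameters stay idle."""
    def __init__(self, d_model=512, n_experts=16, top_k=1):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)     # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1) # chosen experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(8, 512)                           # 8 token embeddings
print(SimpleMoELayer()(tokens).shape)                  # torch.Size([8, 512])
```

Even though all 16 experts exist in memory, each token only pays the compute cost of its routed expert, which is the efficiency argument behind the design.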
Meanwhile, Llama 4 Maverick, another model in the series, also features 17 billion active parameters but incorporates 128 experts, scaling to 400 billion total parameters. Maverick has demonstrated superior performance in coding, image understanding, multilingual processing, and logical reasoning, outperforming several leading models in its class. Both models embrace native multimodality by integrating text and image data early in the processing pipeline. Using a custom MetaCLIP-based vision encoder, they can process multiple images alongside text simultaneously, feeding the combined token stream into a single model backbone. This enables robust visual comprehension and precise object grounding, powering applications such as detailed image description, visual question answering, and analysis of temporal image sequences.
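Early fusion of this kind essentially means projecting image patch embeddings into the same space as text token embeddings and handing the backbone one combined sequence. Below is a rough sketch with placeholder module names and dimensions, not Meta's actual MetaCLIP-based encoder interface.

```python
import torch
import torch.nn as nn

class EarlyFusionBackbone(nn.Module):
    """Toy early-fusion model: image patch features are projected into the
    text embedding space and concatenated with text tokens before the
    shared transformer backbone processes them."""
    def __init__(self, vocab=32000, d_model=512, patch_dim=768):
        super().__init__()
        self.text_embed = nn.Embedding(vocab, d_model)
        self.image_proj = nn.Linear(patch_dim, d_model)  # vision features -> d_model
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, text_ids, image_patches):
        # text_ids: (batch, text_len); image_patches: (batch, n_patches, patch_dim)
        text_tok = self.text_embed(text_ids)
        img_tok = self.image_proj(image_patches)
        fused = torch.cat([img_tok, text_tok], dim=1)    # one combined token sequence
        return self.backbone(fused)

model = EarlyFusionBackbone()
out = model(torch.randint(0, 32000, (1, 16)),            # 16 text tokens
            torch.randn(1, 64, 768))                      # 64 image patches
print(out.shape)                                          # torch.Size([1, 80, 512])
```

Because the image tokens sit in the same sequence as the text tokens from the very first layer, the backbone can attend across modalities instead of bolting vision on after the fact.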
Central to the Llama 4 ecosystem is the teacher model, Llama 4 Behemoth, which scales to 288 billion active parameters and nearly two trillion total parameters. It serves as a critical co-distillation source, enhancing both Scout and Maverick through advanced reinforcement learning techniques. While Llama 4 Behemoth is still in training, Meta expects it to rank among the top performers in its class. Interestingly, Meta's Llama 4 models are trained using FP8 precision, which is notable given that its Llama 3 models used FP16 and FP8. By using lower precision more effectively, Meta achieves higher GPU FLOPS utilization while preserving model quality. Benchmarks comparing Meta's models with those from competing labs such as Google, Anthropic, and OpenAI are available in the source article linked below.
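Distillation from a teacher model of this sort generally trains the student on a blend of the usual hard-label loss and the teacher's softened output distribution; the following is a generic sketch of that objective, not Meta's published co-distillation recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of hard-label cross-entropy and KL divergence to the teacher's
    temperature-softened distribution; a generic recipe, not Meta's exact one."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft

# Toy usage: 4 samples, vocabulary of 10 classes.
s = torch.randn(4, 10, requires_grad=True)   # student outputs
t = torch.randn(4, 10)                        # teacher outputs (no gradient needed)
y = torch.randint(0, 10, (4,))                # ground-truth labels
print(distillation_loss(s, t, y))
```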
View at TechPowerUp Main Site | Source