Meta's Llama 4 Can Process 10 Million Tokens as Input, Is Natively Multimodal

AleksandarK

News Editor
Staff member
Meta has prepared a leap-forward update for its Llama model series with the v4 release, entering an era of native multimodality within the company's AI models. At the forefront is Llama 4 Scout, a model boasting 17 billion active parameters distributed across 16 experts in a mixture-of-experts (MoE) configuration. With Int4 quantization, the model is engineered to run entirely on a single NVIDIA H100 GPU. Scout supports an industry-leading input context window of up to 10 million tokens, a substantial leap from previous limits such as the 2 million token context of Google's Gemini 1.5 Pro. Llama 4 Scout is built on a hybrid dense and MoE architecture that activates only a subset of the total parameters for each token, which accelerates both training and inference and reduces the associated costs.
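To make the MoE routing idea concrete, here is a minimal sketch of a mixture-of-experts layer in PyTorch, assuming top-1 routing across 16 experts; the layer sizes and module names are illustrative assumptions, not Meta's actual implementation:

```python
# Minimal sketch of per-token expert routing in a mixture-of-experts layer.
# Top-1 routing and all dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each token against every expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        gate = self.router(x).softmax(dim=-1)  # routing probabilities per token
        top_p, top_i = gate.max(dim=-1)        # pick one expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_i == e                  # tokens routed to expert e
            if mask.any():
                # only this expert's parameters are activated for these tokens
                out[mask] = top_p[mask, None] * expert(x[mask])
        return out

tokens = torch.randn(8, 512)     # a batch of 8 token embeddings
print(MoELayer()(tokens).shape)  # torch.Size([8, 512])
```

Because each token passes through only one small expert instead of a single giant feed-forward block, per-token compute stays flat even as the total parameter count grows, which is the efficiency argument above.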

Meanwhile, Llama 4 Maverick, another model in the series, also features 17 billion active parameters but incorporates 128 experts, scaling to 400 billion total parameters. Maverick has demonstrated superior performance in coding, image understanding, multilingual processing, and logical reasoning, outperforming several leading models in its class. Both models embrace native multimodality by integrating text and image data early in the processing pipeline. Using a custom MetaCLIP-based vision encoder, they can process multiple images alongside text, fusing vision and text tokens into a single model backbone. This enables robust visual comprehension and precise image grounding, powering applications such as detailed image description, visual question answering, and analysis of temporal image sequences.
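As a rough illustration of what "early fusion" means in practice, the sketch below encodes image patches into the same embedding space as text tokens and concatenates the two sequences before a shared transformer backbone; the stand-in linear patch encoder, vocabulary size, and shapes are assumptions for demonstration, not the MetaCLIP encoder itself:

```python
# Minimal sketch of early-fusion multimodality: vision and text tokens are
# merged into one sequence before entering a single transformer backbone.
# All modules and shapes here are illustrative stand-ins.
import torch
import torch.nn as nn

d_model = 512
vision_encoder = nn.Linear(3 * 16 * 16, d_model)  # stand-in patch encoder (not MetaCLIP)
text_embed = nn.Embedding(32000, d_model)         # stand-in text vocabulary
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2
)

patches = torch.randn(1, 64, 3 * 16 * 16)               # one image as 64 flattened 16x16 patches
text_ids = torch.randint(0, 32000, (1, 32))             # 32 text tokens

vision_tokens = vision_encoder(patches)                 # (1, 64, d_model)
text_tokens = text_embed(text_ids)                      # (1, 32, d_model)
fused = torch.cat([vision_tokens, text_tokens], dim=1)  # one joint sequence
print(backbone(fused).shape)                            # torch.Size([1, 96, 512])
```

The point is that the attention layers see image and text tokens as one sequence from the first layer onward, rather than bolting a vision model onto a finished language model.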




Central to the Llama 4 ecosystem is the teacher model, Llama 4 Behemoth, which scales to 288 billion active parameters and nearly two trillion total parameters. It serves as a critical co-distillation source, enhancing both Scout and Maverick through advanced reinforcement learning techniques. While Llama 4 Behemoth is still in training, Meta expects it to place among the top performers in its class. Interestingly, Meta's Llama 4 models are trained using FP8 precision, a notable shift given that its Llama 3 models used FP16 and FP8. By using lower precision more effectively, Meta achieves higher GPU FLOPS utilization while maintaining model quality. Below are some benchmarks comparing Meta's models with those of competing labs like Google, Anthropic, and OpenAI.
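For readers unfamiliar with the distillation step mentioned above, here is a minimal sketch of the standard teacher-student recipe, where a student is trained against a blend of ground-truth labels and the teacher's softened output distribution; the temperature and blend weight are generic textbook values, not Meta's published co-distillation loss:

```python
# Minimal sketch of teacher-student knowledge distillation. The temperature
# T and blend weight alpha are illustrative, not Meta's actual recipe.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # soft-target term: match the teacher's temperature-smoothed distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # hard-target term: ordinary cross-entropy on the ground-truth tokens
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(4, 32000)         # student (e.g. Scout/Maverick) logits
teacher = torch.randn(4, 32000)         # teacher (e.g. Behemoth) logits
labels = torch.randint(0, 32000, (4,))  # ground-truth next tokens
print(distillation_loss(student, teacher, labels))
```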


View at TechPowerUp Main Site | Source
 
So far the actual model performance (quality wise) seems really meh, looks like Meta went hard on focusing on benchmarks without caring for anything else.
Let's see how it ends up once the bigger model finishes training and they distill that onto the smaller ones.
 

AleksandarK
Quote:
So far the actual model performance (quality wise) seems really meh, looks like Meta went hard on focusing on benchmarks without caring for anything else.
Let's see how it ends up once the bigger model finishes training and they distill that onto the smaller ones.
I am not blown away either. Seems like a rushed release. Something big is probably coming soon, and I think it's DeepSeek R2!
 
"Who are these so called experts?"
 