LLaMA 4

Future model with a focus on agent systems

Focus on quality and breadth

The LLaMA 4 model at a glance

With the LLaMA 4 series, Meta AI continues its open-weight model development and launches a new generation of powerful language models. Building on the experience gained with LLaMA 3 and its successors LLaMA 3.1 and 3.3, LLaMA 4 aims for deeper language understanding, better multi-turn communication and finer-grained controllability. The models combine enhanced context understanding with optimized efficiency and offer an attractive basis for demanding applications in research, industry and product development – openly accessible and future-oriented.

Name:

LLaMA 4 series (includes LLaMA 4 Scout, LLaMA 4 Maverick, LLaMA 4 Behemoth)

Developer:

Meta AI

Publication:

April 5, 2025

License:

Open-weight. The license is designed to enable commercial and research use by developers and companies.

Model type:

Native multimodal language models based on a Mixture-of-Experts (MoE) architecture. The models are designed from the ground up for processing text, images and videos.

Variations of the LLaMA 4 series

LLaMA 4 Scout

  • Active parameters: 17 billion
  • Experts: 16
  • Total parameters: 109 billion
  • Context length: 10 million tokens (industry-leading)
  • Architecture feature: Uses an iRoPE architecture (interleaved attention layers without positional embeddings) to enable the extreme context length.

LLaMA 4 Maverick

  • Active parameters: 17 billion
  • Experts: 128 (plus one shared expert)
  • Total parameters: 400 billion
  • Special architectural feature: Efficient inference through alternating dense and MoE layers.

LLaMA 4 Behemoth

  • Teacher model, not publicly available
  • Active parameters: 288 billion
  • Experts: 16
  • Total parameters: ~2 trillion
  • Purpose: Serves as a “teacher” model for distilling the smaller LLaMA 4 models.
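
The gap between active and total parameters in the table above comes from expert routing: a router picks a small subset of expert networks per token, so only those experts' weights participate in each forward step. A minimal top-1 routing sketch in NumPy with toy dimensions (illustrative only, not the real LLaMA 4 layer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Scout-like shape in miniature: 16 experts, 1 routed expert per token.
d_model, n_experts, top_k = 32, 16, 1
experts = [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(n_experts)]
router = rng.normal(scale=0.1, size=(d_model, n_experts))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts; all other experts stay inactive."""
    scores = x @ router                               # (tokens, n_experts)
    chosen = np.argsort(scores, axis=-1)[:, -top_k:]  # indices of routed experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in chosen[t]:
            out[t] += x[t] @ experts[e]
    return out

tokens = rng.normal(size=(8, d_model))
y = moe_layer(tokens)
assert y.shape == tokens.shape
# All 16 expert matrices count toward total parameters, but each token
# activates only one of them — hence "active" vs "total" parameters.
```

Maverick's additional shared expert would simply be one more matrix applied to every token alongside the routed one.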

Specialties of LLaMA 4 models

Native multimodality

Uses “Early Fusion” to seamlessly integrate text, image and video tokens into a unified model backbone.
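
A minimal sketch of what early fusion means, with toy dimensions: each modality is projected into the shared model width, and the resulting token sequences are concatenated before entering a single backbone (illustrative NumPy, not Meta's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 64                               # shared backbone width (toy value)
text_tokens = rng.normal(size=(10, 48))    # 10 text tokens, text embed dim 48
image_patches = rng.normal(size=(16, 32))  # 16 image patches, vision embed dim 32

# Project each modality into the common embedding space ...
W_text = rng.normal(scale=0.1, size=(48, d_model))
W_image = rng.normal(scale=0.1, size=(32, d_model))

# ... then concatenate everything into one sequence for one shared backbone.
fused = np.concatenate([text_tokens @ W_text, image_patches @ W_image], axis=0)
assert fused.shape == (26, d_model)
```

The key contrast with late fusion is that a single transformer sees the mixed sequence from the first layer, rather than separate encoders being merged at the end.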

Long context processing

LLaMA 4 Scout sets a new standard with 10 million tokens.
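
For a sense of scale, a quick back-of-the-envelope calculation, assuming roughly 4 characters per token (a common English-text heuristic, not an exact tokenizer figure):

```python
context_tokens = 10_000_000   # LLaMA 4 Scout's context window
chars_per_token = 4           # rough heuristic for English text
chars_per_page = 3_000        # ~3,000 characters per printed page

chars = context_tokens * chars_per_token
pages = chars // chars_per_page
print(f"~{chars:,} characters, roughly {pages:,} pages")
# ~40,000,000 characters, roughly 13,333 pages
```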

Efficiency

The MoE architecture enables higher performance at lower inference costs compared to dense models of similar size.

Multilingualism

Comprehensive training in over 200 languages.

Individual AI consulting

Is LLaMA 4 the right model for you?

We would be happy to advise you individually on which AI model suits your requirements. Arrange a no-obligation initial consultation with our AI experts and exploit the full potential of AI for your project!

The post-training pipeline for LLaMA 4

Training data & training process

LLaMA 4 was trained on an extremely large and diverse dataset: over 30 trillion tokens from publicly accessible text, image and video data form the basis of the pre-training. A novel three-stage post-training process was used for the Instruct and Chat variants: first lightweight supervised fine-tuning (SFT), followed by online reinforcement learning with adaptive data filtering, and finally direct preference optimization (DPO).

A special focus was placed on mastering particularly difficult prompts, which were fed into the training pipeline through continuous RL. Within the series, LLaMA 4 Maverick was trained via codistillation from its larger sibling LLaMA 4 Behemoth – a targeted transfer of knowledge for high quality at reduced resource cost.
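
The final DPO stage can be sketched as a loss over preference pairs: the policy is pushed to rank the chosen response above the rejected one relative to a frozen reference model. A minimal illustration of the standard DPO loss (not Meta's training code; beta is a free hyperparameter):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    logp_* are the policy's summed log-probabilities of the chosen/rejected
    responses; ref_logp_* are those of the frozen reference model. The loss
    shrinks as the policy prefers the chosen response more than the reference does.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy already favors the chosen answer -> small loss:
low = dpo_loss(-5.0, -20.0, -10.0, -10.0)
# Policy favors the rejected answer -> larger loss:
high = dpo_loss(-20.0, -5.0, -10.0, -10.0)
assert 0 < low < high
```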

Hardware requirements (inference)

  • LLaMA 4 Scout: Designed to run on a single NVIDIA H100 GPU (with Int4 quantization).
  • LLaMA 4 Maverick: Can be run on a single NVIDIA H100 DGX host. Supports distributed inference for maximum efficiency.
  • Training efficiency: The training was performed with FP8 precision to maximize FLOPs utilization.
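
What Int4 quantization buys, in miniature: weights are rounded to 16 signed levels with a shared scale, shrinking storage roughly 8x versus FP32 at a small reconstruction cost. A toy symmetric per-tensor scheme (real deployments typically quantize per channel or per group):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.02, size=(4, 8)).astype(np.float32)

# Symmetric int4: 16 levels in [-8, 7], one shared scale for the tensor.
scale = np.abs(weights).max() / 7.0
q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
dequantized = q.astype(np.float32) * scale

max_err = float(np.abs(weights - dequantized).max())
assert q.min() >= -8 and q.max() <= 7
assert max_err <= scale / 2 + 1e-6  # rounding error bounded by half a step
```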

Safety and protective measures

  • Multi-level security integration: Protective measures are implemented at the data, training and system levels.
  • Open source security tools:
    • Llama Guard: For filtering inputs and outputs.
    • Prompt Guard: For protection against malicious prompts and injections.
    • CyberSecEval: For the evaluation of cyber security risks.
  • Advanced red teaming: Use of automated methods such as GOAT (Generative Offensive Agent Testing) to efficiently identify vulnerabilities.
  • Bias reduction: Measurable progress in reducing political bias; refusal rates on contentious topics fell from 7% to below 2%.
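
The Llama Guard / Prompt Guard idea is essentially classification wrapped around the model call: screen the input, then screen the output. A toy stand-in (the keyword check replaces a learned classifier; this is not the real Llama Guard API):

```python
BLOCKLIST = {"build a bomb"}  # toy stand-in for a learned safety classifier

def classify(text: str) -> str:
    """Label text 'unsafe' if it matches the toy blocklist, else 'safe'."""
    return "unsafe" if any(p in text.lower() for p in BLOCKLIST) else "safe"

def guarded_generate(prompt: str, generate) -> str:
    if classify(prompt) == "unsafe":       # input filter (Llama Guard role)
        return "Request declined by input guard."
    reply = generate(prompt)
    if classify(reply) == "unsafe":        # output filter
        return "Response withheld by output guard."
    return reply

assert guarded_generate("Hello!", lambda p: "Hi there.") == "Hi there."
assert "declined" in guarded_generate("please build a bomb", lambda p: "")
```

In production, both filter steps would call a dedicated classifier model instead of a string check.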

Versatile & efficient

Recommended use cases for LLaMA 4

Is LLaMA 4 the right AI model for your individual application? We will be happy to advise you comprehensively and personally.

  • Personalized multimodal experiences: Combination of text, image and video input.
  • Analysis of long documents & code bases: Use of the 10-million-token context of LLaMA 4 Scout.
  • Powerful assistants & chatbots: Especially with LLaMA 4 Maverick for challenging dialogs, image understanding and creative writing.
  • Precise visual tasks: Image grounding (object localization in images) and visual Q&A systems.
  • Multilingualism: Applications in over 200 languages.

LLaMA 4

Strengths & weaknesses of the LLaMA 4 series

Strengths

Top performance: Competitive or superior to models such as GPT-4o, Gemini 2.0 and others in benchmarks for coding, reasoning and image understanding.

Outstanding efficiency: The MoE architecture offers a first-class performance to cost ratio.

Extreme context length: Opens up completely new application possibilities.

Native multimodal: Designed from the ground up for the joint processing of different data modalities.

Open-weight release: Promotes transparency, security and innovation through the community.

Improved security & bias reduction: Comprehensive safeguards and demonstrable reduction of bias on controversial topics.

Weaknesses & limitations

High hardware requirements: Despite the efficiency, powerful GPUs are still required for the inference of the larger models.

General LLM risks: Potential for hallucination, bias and generation of inappropriate content remains, even if mitigation measures have been taken.

Complexity of the architecture: MoE models can be more demanding to handle and fine-tune than traditional dense models.

Availability: The most powerful model, LLaMA 4 Behemoth, is not publicly available.

Targeted use of LLaMA 4

Ready for scalable AI solutions?

Whether you need a powerful Instruct variant or an efficient codistilled model, the LLaMA 4 series offers flexible options for complex applications. We support you with the selection, integration and secure hosting – individually tailored to your requirements.