DeepSeek-R1 Series

Deep thinking through reinforcement learning

Optimized for deep reasoning

The DeepSeek-R1 series at a glance

The DeepSeek-R1 series was developed to establish exceptional reasoning capabilities in large language models. Using a multi-stage reinforcement-learning process and specialized chain-of-thought data, R1 is designed to solve even complex math, logic and programming tasks with high precision. The model combines methodological depth with practical performance and sets new standards for explainable, traceable AI results that are openly accessible and commercially usable.

Name:

DeepSeek-R1 series (e.g. DeepSeek-R1-0528)

Developer:

DeepSeek-AI

Publication:

January 2025 (paper), May 2025 (model update 0528)

License:

MIT license. The DeepSeek-R1 models may be used commercially and as teachers for distillation.

Model type:

Large language model (LLM) whose post-training has been optimized for strong reasoning capabilities through intensive reinforcement learning (RL).

Base model:

DeepSeek-V3 Base

Variations of the DeepSeek-R1 series

DeepSeek-R1-0528

The primary, most powerful version with a focus on deep reasoning, math and coding. It was trained through a multi-stage process, including a “cold start” with SFT data and intensive RL.

DeepSeek-R1-Zero

A version published for research purposes, trained exclusively through reinforcement learning (without initial supervised fine-tuning) to demonstrate the self-evolution of reasoning skills.

Distilled Models

(e.g. DeepSeek-R1-0528-Qwen3-8B)

Smaller, efficient open-source models (based on Qwen or Llama) that acquired the reasoning capabilities of the large DeepSeek-R1 model through distillation.

Specialties of DeepSeek-R1 models

Deep reasoning ("DeepThink")

The model is optimized to spend a longer "thinking time" (chain-of-thought) on complex queries, which leads to significantly higher accuracy. In the AIME benchmark, for example, an average of 23K tokens per question was used to find a solution.
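The split between "thinking time" and the final answer is visible in the model output itself: R1-style models wrap their chain-of-thought in `<think>` tags before the user-facing answer. A minimal sketch for separating the two parts (the tag convention follows DeepSeek's published chat format; the helper name is illustrative):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate the chain-of-thought from the final answer.

    DeepSeek-R1 models emit their reasoning between <think> tags
    before the user-facing answer; everything after the closing
    tag is the answer itself.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        answer = response[match.end():].strip()
        return reasoning, answer
    # No think block found: treat the whole response as the answer.
    return "", response.strip()

raw = "<think>2 + 2 = 4, so the answer is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
```

Keeping the reasoning trace separate is useful for logging and auditing; typically only the answer part is shown to end users.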

Reinforcement Learning (RL)

The core innovation is the use of large-scale RL (using the GRPO algorithm) to teach the model to develop complex solution strategies on its own.
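The distinctive piece of GRPO is that it needs no learned critic: for each prompt, a group of answers is sampled and every answer's reward is normalized against the group's own statistics. A sketch of that group-relative advantage computation (the normalization follows the published formula; the function name and epsilon are illustrative):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantages as used in GRPO.

    Each sampled answer's reward is normalized by the group's mean
    and standard deviation, so no separate value network is needed.
    If all rewards in the group are equal, all advantages are ~0.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to the same prompt; only the first is correct.
# The correct answer gets a positive advantage, the rest negative.
advs = grpo_advantages([1.0, 0.0, 0.0, 0.0])
```

These advantages then weight the policy-gradient update; answers better than their group average are reinforced, the rest suppressed.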

Distillation

The complex reasoning patterns learned by R1 can be successfully transferred to much smaller models.
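In DeepSeek's setup this transfer is plain supervised fine-tuning on teacher-generated samples rather than logit matching: the large R1 model generates reasoning traces, and the small model is fine-tuned on them. A hedged sketch of how one such training record could be assembled (the field names follow the common chat-SFT convention and are illustrative):

```python
def build_distillation_record(question: str, teacher_response: str) -> dict:
    """Turn one teacher (DeepSeek-R1) generation into a chat-style
    SFT example for a smaller student model. The teacher response
    typically includes the full chain-of-thought, so the student
    learns the reasoning pattern, not just the final answer."""
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": teacher_response},
        ]
    }

record = build_distillation_record(
    "What is 17 * 23?",
    "<think>17 * 23 = 17 * 20 + 17 * 3 = 340 + 51 = 391.</think>391",
)
```

Collecting hundreds of thousands of such records and fine-tuning a Qwen- or Llama-based model on them is, in outline, how the distilled R1 variants were produced.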

Tool usage

Improved support for function calling.
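Function calling uses the OpenAI-compatible tool schema that the DeepSeek API exposes. A minimal request sketch (the `get_weather` tool, its parameters and the model id are invented for illustration; consult the official API reference for the exact contract):

```python
# Hypothetical tool definition in the OpenAI-compatible format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Request body as it would be sent to an OpenAI-compatible endpoint.
request_body = {
    "model": "deepseek-reasoner",  # illustrative model id
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "tools": tools,
}
```

If the model decides a tool is needed, the response contains a tool call with JSON arguments instead of (or before) a plain-text answer; the caller executes the function and feeds the result back as a tool message.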

Individual AI consulting

Is DeepSeek-R1 the right model for you?

We would be happy to advise you individually on which AI model suits your requirements. Arrange a no-obligation initial consultation with our AI experts and exploit the full potential of AI for your project!

The post-training pipeline for DeepSeek-R1 Series

Training data & training process

Post-training used an SFT dataset of approximately 800,000 samples.

1

Cold Start

Initial fine-tuning of the basic model with a small number (several thousand) of high-quality, long chain-of-thought examples.

2

Reasoning-oriented RL

Intensive training on math, code and logic tasks with a rule-based reward system to maximize accuracy.

3

Rejection Sampling & SFT

Generation of a new, high-quality SFT dataset (~600k reasoning and ~200k non-reasoning samples) with the RL model, followed by another round of fine-tuning.

4

All-Scenario RL

A final RL phase to improve general helpfulness and harmlessness, taking into account human preferences.
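The rule-based reward of step 2 can be pictured as a purely mechanical check of the final answer, with no learned reward model involved. A hedged sketch (the `\boxed{}` extraction convention and the 0/1 scoring are illustrative simplifications; the actual pipeline also rewards correct output format):

```python
import re

def rule_based_reward(model_output: str, reference_answer: str) -> float:
    """Minimal sketch of a rule-based accuracy reward.

    The final answer is extracted from a \\boxed{...} marker and
    compared to the reference by string match; no neural reward
    model is involved, which makes the signal hard to game.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0  # no parseable final answer
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0
```

For code tasks the analogous mechanical check is running the generated program against test cases; either way, the reward is verifiable rather than learned.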

Hardware requirements (inference)

  • The documentation does not specify exact VRAM requirements. As a very large and powerful reasoning model, it can be assumed that considerable GPU resources are required for inference (especially for long “DeepThink” processes), comparable to other models in this performance class.
  • Distilled, smaller versions are designed to run on more accessible hardware.
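For a rough feel of what "more accessible hardware" means, weight memory can be estimated as parameter count times bytes per parameter, plus some headroom for KV cache and activations. A back-of-the-envelope sketch (the overhead factor is an assumption, not a measured value):

```python
def estimated_vram_gb(n_params_billion: float,
                      bytes_per_param: float = 2.0,
                      overhead: float = 1.2) -> float:
    """Rough VRAM estimate for inference, in GB.

    bytes_per_param: 2.0 for FP16/BF16 weights, ~0.5 for 4-bit
    quantization. The 20% overhead for KV cache and activations
    is an assumed ballpark, not a benchmarked figure.
    """
    return n_params_billion * bytes_per_param * overhead

# An 8B distilled model in FP16 lands around ~19 GB by this estimate,
# i.e. within reach of a single 24 GB consumer GPU.
fp16_8b = estimated_vram_gb(8)
```

Long "DeepThink" generations grow the KV cache with context length, so real requirements rise above this estimate for very long reasoning traces.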

Versatile & efficient

Recommended use cases for DeepSeek-R1

Is DeepSeek-R1 the right AI model for your individual application? We will be happy to advise you comprehensively and personally.

Solving complex problems

Mathematics, logic puzzles and scientific questions at a high level.

Advanced code generation

Competitive-programming problems (Codeforces) and real-world software-engineering tasks (SWE-bench).

Academic research

Investigation of reasoning in LLMs and of the effectiveness of reinforcement learning.

Development of small but powerful models

Use of DeepSeek-R1 as a "teacher" model for distillation.

Function calling

Applications with function calling and integration of external tools.
DeepSeek-R1 Series

Strengths & weaknesses of the DeepSeek-R1 series

Strengths

State-of-the-Art Reasoning: Achieves performance in benchmarks such as AIME, MATH-500 and GPQA that rivals or exceeds the best closed models (e.g. OpenAI o1 series, Gemini 2.5 Pro).

Effective scaling at runtime: The ability to invest more computing time (tokens) for a problem if required leads to better results.

Transparent development: The approach of publishing a pure RL model (R1-Zero) provides insights into the learning processes.

Outstanding distillation capability: Makes high-end reasoning accessible for smaller, more efficient models.

Commercially usable: The MIT license allows broad application in products and services.

Weaknesses & limitations

Language mixing: The model is optimized for English and Chinese and tends to perform the reasoning steps in English for queries in other languages.

Prompt sensitivity: The performance is sensitive to the prompt format. Zero-shot prompts are recommended over few-shot prompts.

Deficits in general capabilities: In areas such as complex role-playing, multi-turn dialogue and JSON output, it partially falls short of the base model (DeepSeek-V3).

Less focus on software engineering: RL was applied less intensively in this area, so the performance gains are smaller than in mathematics.
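To illustrate the prompt-sensitivity point above: the published usage recommendations favor a single zero-shot user message without few-shot exemplars or a system prompt. A hedged request sketch (the model id and temperature reflect commonly cited recommendations and may change; verify against the current documentation):

```python
def build_zero_shot_request(task: str) -> dict:
    """Build a chat request following the zero-shot recommendation
    for R1-style models: all instructions go into one user message,
    with no system prompt and no few-shot examples."""
    return {
        "model": "deepseek-reasoner",  # illustrative model id
        "messages": [{"role": "user", "content": task}],
        "temperature": 0.6,  # commonly recommended ballpark, not a fixed rule
    }

req = build_zero_shot_request(
    "Prove that the sum of two even numbers is even. "
    "Reason step by step and state the conclusion clearly."
)
```

Packing few-shot examples into the prompt tends to hurt rather than help here, so task descriptions should be explicit instead of demonstrated.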

Maximize results with the right model

Ready for powerful reasoning applications?

Use DeepSeek-R1 for complex logic tasks, advanced code generation or precise scientific work – perfectly tailored to your requirements. Our experts will help you with the selection, optimization and secure hosting on our GPU Cloud in Germany.