The DeepSeek-R1 series was developed to equip large language models with exceptional reasoning capabilities. Using a multi-stage reinforcement-learning process and specialized chain-of-thought data, R1 is designed to solve even complex math, logic, and programming tasks with high precision. The model combines methodological depth with practical performance and sets new standards for explainable, comprehensible AI results – openly accessible and commercially usable.
Model: DeepSeek-R1 series (e.g. DeepSeek-R1-0528)
Developer: DeepSeek-AI
Release: January 2025 (paper), May 2025 (model update 0528)
License: MIT. The DeepSeek-R1 models may be used commercially and for distillation.
Model type: Large language model (LLM) trained with intensive reinforcement learning (RL) to maximize its reasoning capabilities.
Base model: DeepSeek-V3 Base
DeepSeek-R1: The primary, most capable version, focused on deep reasoning, mathematics, and coding. It was trained through a multi-stage process, including a "cold start" with SFT data followed by intensive RL.
DeepSeek-R1-Zero: A version published for research purposes, trained exclusively through reinforcement learning (without initial supervised fine-tuning) to demonstrate the self-evolution of reasoning skills.
Distilled models (e.g. DeepSeek-R1-0528-Qwen3-8B): Smaller, efficient open-source models (based on Qwen or Llama) that acquired the reasoning capabilities of the large DeepSeek-R1 model through distillation; a minimal loading sketch follows below.
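To illustrate how one of the distilled checkpoints can be run, here is a minimal sketch using the Hugging Face transformers library. The model ID matches the published checkpoint; the generation settings (no system prompt, temperature 0.6, generous token budget) follow the usage recommendations from the DeepSeek-R1 repository and are assumptions you may want to tune.

```python
# Minimal sketch: running a distilled R1 checkpoint with Hugging Face transformers.
# Assumes a GPU with enough memory for the 8B model; quantization is omitted for brevity.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Zero-shot prompt: all instructions go into the user message (no system prompt).
messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Leave headroom for the chain-of-thought tokens emitted before the final answer.
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```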
We would be happy to advise you individually on which AI model suits your requirements. Arrange a no-obligation initial consultation with our AI experts and exploit the full potential of AI for your project!
In total, post-training used a data set of approximately 800,000 samples for SFT. The pipeline comprises four stages:
Stage 1 (cold start): Initial fine-tuning of the base model with a small number (several thousand) of high-quality, long chain-of-thought examples.
Stage 2 (reasoning-oriented RL): Intensive training on math, code, and logic tasks with a rule-based reward system to maximize accuracy; a simplified reward sketch follows after this list.
Stage 3 (rejection sampling and SFT): Generation of a new, high-quality SFT data set (~600k reasoning, ~200k non-reasoning samples) with the RL model, followed by renewed fine-tuning.
Stage 4 (RL for all scenarios): A final RL phase to improve general helpfulness and harmlessness, taking human preferences into account.
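The paper describes the stage-2 rewards as rule-based: an accuracy check on the final answer plus a format reward that keeps the reasoning inside the expected tags. The following is a simplified sketch of that idea; the helper names, reward weights, and tag conventions are illustrative, not DeepSeek's actual training code.

```python
# Simplified sketch of a rule-based reward in the spirit of the R1 paper:
# accuracy reward (does the boxed final answer match the reference?) plus a
# format reward (is the chain of thought wrapped in <think>...</think> tags?).
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion contains a well-formed <think>...</think> block."""
    return 1.0 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the \\boxed{...} final answer matches the reference exactly."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    return 1.0 if match and match.group(1).strip() == reference.strip() else 0.0

def rule_based_reward(completion: str, reference: str) -> float:
    # Accuracy dominates; the small format term nudges the model to keep
    # its reasoning inside the expected tags. The 0.1 weight is illustrative.
    return accuracy_reward(completion, reference) + 0.1 * format_reward(completion)

# A correct, well-formatted completion scores 1.1.
sample = "<think>2 + 2 = 4</think> The answer is \\boxed{4}."
print(rule_based_reward(sample, "4"))  # 1.1
```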
Is DeepSeek-R1 the right AI model for your individual application? We will be happy to advise you comprehensively and personally.
State-of-the-Art Reasoning: Achieves performance in benchmarks such as AIME, MATH-500 and GPQA that rivals or exceeds the best closed models (e.g. OpenAI o1 series, Gemini 2.5 Pro).
Effective test-time scaling: Investing more computing time (i.e. more reasoning tokens) in a hard problem leads to better results; see the API sketch after this list.
Transparent development: The approach of publishing a pure RL model (R1-Zero) provides insights into the learning processes.
Outstanding distillation capability: Makes high-end reasoning accessible for smaller, more efficient models.
Commercially usable: The MIT license allows broad application in products and services.
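To illustrate the test-time scaling point above: with DeepSeek's OpenAI-compatible API, the token budget of a request can simply be raised so the model has room for a longer chain of thought. The endpoint, model name, and response fields below follow DeepSeek's public API documentation at the time of writing; treat them as assumptions and verify against the current docs.

```python
# Sketch: granting the model a larger token budget for a hard problem via the
# OpenAI-compatible DeepSeek API. Endpoint, model name, and the
# reasoning_content field follow DeepSeek's public API docs; verify before use.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are there below 1000?"}],
    # A generous budget lets the model spend more reasoning tokens on the problem.
    max_tokens=8192,
)

print(response.choices[0].message.reasoning_content)  # chain of thought
print(response.choices[0].message.content)            # final answer
```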
Language mixing: The model is optimized for English and Chinese and tends to perform the reasoning steps in English for queries in other languages.
Prompt sensitivity: Performance is sensitive to the prompt format; zero-shot prompts are recommended over few-shot prompts (an example follows after this list).
Deficits in general capabilities: In areas such as complex role-playing, multi-turn dialog, and JSON output, it partially falls short of DeepSeek-V3.
Software engineering: RL was applied less intensively in this area, so the performance gains are smaller than in mathematics.
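Following the prompt-sensitivity note above, DeepSeek's usage recommendations suggest stating the task directly in a single zero-shot instruction rather than providing worked examples; for math, the final answer is requested in \boxed{}. The exact wording below is illustrative:

```python
# Recommended zero-shot style: one direct instruction, no few-shot examples,
# plus an explicit output convention. The wording is illustrative; the
# \boxed{} convention comes from the DeepSeek-R1 usage recommendations.
zero_shot_prompt = (
    "Solve the following problem. Please reason step by step, "
    "and put your final answer within \\boxed{}.\n\n"
    "Problem: What is the smallest prime greater than 100?"
)

# Discouraged: few-shot prompting with worked examples, which tends to
# degrade DeepSeek-R1's performance compared with the zero-shot prompt above.
few_shot_prompt = (
    "Q: What is 2 + 2? A: \\boxed{4}\n"
    "Q: What is the smallest prime greater than 100? A:"
)
```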
Use DeepSeek-R1 for complex logic tasks, advanced code generation or precise scientific work – perfectly tailored to your requirements. Our experts will help you with the selection, optimization and secure hosting on our GPU Cloud in Germany.