The field of artificial intelligence regularly produces breakthroughs that shake up the technology industry. One such milestone is DeepSeek-R1, the recently introduced language model from the Chinese startup DeepSeek. The model has not only generated enthusiasm in expert circles but has also had a significant impact on stock markets and on the strategies of established tech giants.
An Emerging Company with Strong Backing
DeepSeek, founded in April 2023 by Liang Wenfeng in Hangzhou, Zhejiang, is no ordinary startup. The company is backed by the hedge fund High-Flyer, also based in Hangzhou. High-Flyer was founded in 2015 by three engineers from Zhejiang University and managed assets of over 7 billion US dollars as of 2020. Its team includes Olympiad medalists in mathematics, physics, and computer science. Contrary to what many reports suggest, DeepSeek also commands an impressive infrastructure of around 50,000 GPUs and can therefore compete in the top league of AI research.
Technological Breakthrough with DeepSeek-R1
On January 20, 2025, DeepSeek released the Large Language Model (LLM) DeepSeek-R1. The model uses a Mixture-of-Experts (MoE) architecture with 671 billion parameters in total, of which only a fraction (around 37 billion) is activated per token. Serving the full model requires more than 16 GPUs with 80 GB of memory each, which underscores its enormous size and complexity. Contrary to frequent reports, only the smaller distilled models can be hosted locally, not the flagship model.
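A rough back-of-the-envelope calculation shows why the full model cannot run on a single standard 8-GPU node. The figures below are assumptions for illustration (16-bit weights, 80 GB accelerators), not official deployment numbers; KV cache and activations add further overhead on top of the weights:

```python
import math

# Rough sizing estimate for the full DeepSeek-R1 model.
# MoE: all experts must be resident in memory, even though only
# a fraction of parameters is active per token.
PARAMS = 671e9          # total parameters
BYTES_PER_PARAM = 2     # assumption: FP16/BF16 weights
GPU_MEMORY_GB = 80      # assumption: 80 GB per accelerator

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9       # ~1342 GB for weights alone
min_gpus = math.ceil(weights_gb / GPU_MEMORY_GB)  # GPUs needed just for weights

print(f"Weights: {weights_gb:.0f} GB -> at least {min_gpus} GPUs")  # → Weights: 1342 GB -> at least 17 GPUs
```

Even before accounting for the KV cache, the weights alone exceed the capacity of sixteen 80 GB GPUs, which matches the "more than 16 GPUs" requirement above.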
A Look into the Research Document
The accompanying paper “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning” offers a detailed insight into the development process:
- Reinforcement Learning without Supervised Fine-Tuning: In an unconventional move, DeepSeek forwent extensive pre-labeled datasets and trained the precursor model DeepSeek-R1-Zero directly with Reinforcement Learning (RL). The model learned through reward signals alone and independently developed impressive capabilities in logical reasoning and problem-solving.
- Emergent Capabilities: During training, DeepSeek-R1 showed remarkable properties such as self-verification, reflection, and the ability to develop longer and more complex thought processes. These capabilities emerged without direct human guidance, underscoring the effectiveness of the RL approach.
- Group Relative Policy Optimization (GRPO): For efficiency gains, DeepSeek used the GRPO method. This dispenses with a separate critic model and instead estimates the baseline based on group scores, reducing computational overhead.
- Open Licensing and Distillation: DeepSeek released the model under the MIT license, enabling unrestricted commercial and academic use. Additionally, six “distilled” versions were provided. These smaller models, based on Qwen and Llama and trained on 800,000 examples without further RL processing, are freely available. However, they do not achieve the performance of the large R1 model.
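The group-relative baseline behind GRPO can be illustrated in a few lines of Python. This is a deliberately simplified sketch of the advantage computation, not DeepSeek's implementation: each sampled answer's reward is standardized against the other answers drawn for the same prompt, so no separate critic model is needed to estimate a baseline.

```python
import statistics

def grpo_advantages(group_rewards):
    """Per-sample advantages from a group of rewards for one prompt.

    GRPO replaces a learned critic with a group-relative baseline:
    each reward is standardized against the mean and standard
    deviation of the rewards sampled for the same prompt.
    """
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in group_rewards]

# Four sampled answers to the same prompt, scored by a rule-based reward
# (e.g. 1.0 for a correct final answer, 0.0 otherwise):
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Answers scoring above the group mean receive a positive advantage and are reinforced; the rest are suppressed, without any value network being trained.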
Controversy About Training Data and Transparency
A central point not discussed in detail in the paper is the type of training data used. It remains unclear which specific datasets or sources were used for training DeepSeek-R1. However, this mystery is not unique to DeepSeek. Even with many US models, including OpenAI’s GPT-4, the exact training data is often not disclosed. This raises questions about transparency, data quality, and ethical standards. Without knowledge of data sources, it’s difficult to assess whether the model contains unintended biases or is influenced in certain ways.
Misconceptions and Clarifications
With the rise of DeepSeek-R1, numerous pieces of misinformation circulated:
- Training Costs: Contrary to some claims, training was not accomplished for a mere 6 million US dollars. That figure corresponds roughly to the GPU hours for the final pre-training run of the base model alone, without RL (about 5.5 million US dollars). Costs for experiments, smaller runs, data generation, and the actual RL training of DeepSeek-R1 come on top of that.
- Resources and Infrastructure: DeepSeek has an impressive infrastructure of around 50,000 GPUs, significantly more than some sources suggested.
- Project Significance: DeepSeek-R1 is not a side project. With significant financial support from High-Flyer and a talented team, it’s a central undertaking with strategic importance.
- Distilled Model Performance: The smaller model versions, although freely available and locally executable, do not achieve the performance of the large DeepSeek-R1.
- Data Usage: Users should be aware that the hosted version at chat.deepseek.com may use data to improve future models according to the terms of use.
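The often-cited base-model cost figure can be reproduced with simple arithmetic. The GPU-hour count and the rental rate below are the figures DeepSeek published for the final pre-training run of its V3 base model; they cover only that single run, which is exactly why the total bill is higher:

```python
# Reproducing the widely cited base-model training cost.
gpu_hours = 2.788e6        # reported H800 GPU hours for the final pre-training run
price_per_gpu_hour = 2.0   # assumed rental price in US dollars per GPU hour

cost_musd = gpu_hours * price_per_gpu_hour / 1e6
print(f"{cost_musd:.3f} million USD")  # → 5.576 million USD
```

Everything outside that run, from ablation experiments to the RL phase that produced R1 itself, is excluded from this number.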
Impact on the Tech Industry and Stock Markets
The release of DeepSeek-R1 had noticeable effects on the global technology industry:
- Challenge for Established Companies: DeepSeek-R1 achieves a performance level that competes with market-leading models, but with presumably lower investments. This challenges the business model of Western AI companies and could lead to increased competition and price pressure. Still, the cost of operating a fleet of around 50,000 GPUs should not be underestimated.
- Stock Market Reactions: After the announcement, some Western technology stocks recorded declines of 5 to 10%. Investors react sensitively to signs of a possible shift in technology leadership.
- Growing Open-Source Movement: The open licensing of DeepSeek-R1 could strengthen the trend toward collaboration and transparency in AI research. Platforms like Hugging Face are already working on fully open reproduction pipelines, which could benefit all players in the long term.
Europe in the Digital Race
DeepSeek-R1 marks a turning point in the AI industry. A Chinese startup demonstrates that with innovative approaches and substantial support, powerful models can be developed that compete with market leaders. This underscores the need for all players, including Europe, to invest in research and development, promote transparency, and maintain ethical standards.
However, the developments around DeepSeek-R1 also raise the question of where the European Union actually stands in this technological competition. Despite outstanding research institutes and a rich talent pool, Europe often fails to turn its innovations into marketable products. While China and the USA invest massively in AI and provide state support, Europe appears caught in between, watching from the sidelines. If the EU does nothing, it risks a repeat of the cloud-computing story: merely using technologies from the USA and China instead of building its own.
