Mon Apr 07 2025

DeepSeek is preparing for the upcoming revolution in artificial intelligence with self-improving models.

A team at the renowned Chinese artificial intelligence lab DeepSeek is developing a new series of AI models called DeepSeek-GRM, which utilizes an innovative self-improvement approach.

Recently, Wall Street reevaluated its heavy bet on generative artificial intelligence after the arrival of DeepSeek-R1, an AI model that, despite being heavily censored, demonstrated that a cutting-edge reasoning model can be built with modest resources rather than billions of dollars. The model has since been adopted commercially by major companies like Huawei, Oppo, and Vivo, and has found a place on the platforms of tech giants such as Microsoft, Alibaba, and Tencent.

DeepSeek's next goal is to develop self-improving AI models that use a loop of reward and judgment to optimize their own performance. In a preprint study, DeepSeek researchers, in collaboration with Tsinghua University in China, present a novel method known as Self-Principled Critique Tuning (SPCT), which falls under generative reward modeling (GRM). This methodology establishes a real-time feedback loop to enhance AI models.
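To make the idea concrete, here is a minimal sketch of what a generative reward model might look like in code. It is an illustration only: the `generate` callable stands in for any instruction-tuned language model, and the prompts and scoring format are assumptions for this sketch, not DeepSeek's actual implementation.

```python
import re
from typing import Callable

def grm_score(generate: Callable[[str], str], query: str, response: str) -> float:
    """Score a response with a generative reward model (illustrative only)."""
    # Step 1: the model proposes its own evaluation principles
    # (the "self-principled" part of SPCT).
    principles = generate(
        f"List principles for judging an answer to this query:\n{query}"
    )
    # Step 2: the model writes a critique of the response against
    # those principles, ending with an explicit numeric score.
    critique = generate(
        f"Principles:\n{principles}\n\nQuery: {query}\nResponse: {response}\n\n"
        "Critique the response and finish with 'Score: <1-10>'."
    )
    # Step 3: parse the scalar reward out of the generated critique.
    match = re.search(r"Score:\s*(\d+)", critique)
    return float(match.group(1)) if match else 0.0
```

The key point is that the reward is produced as ordinary text (principles plus a critique) rather than as a bare number from a fixed scoring head, which is what distinguishes generative reward modeling from classical reward models.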

The proposed system has an internal "judge" critique and evaluate the responses the AI model generates as it interacts with users. These critiques are compared against the model's guiding principles and the desired outcome; when they align closely, a reward signal is produced that steers the model toward better performance in the next training cycle. DeepSeek calls this new generation of self-improving models DeepSeek-GRM, and the reported test results suggest they outperform competitors like Google's Gemini, Meta's Llama, and OpenAI's GPT-4. DeepSeek plans to release these next-generation AI models through open-source channels.
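Below is a hedged sketch of how such a judge-in-the-loop cycle might work, reusing the hypothetical `grm_score` function from the previous snippet. Rejection sampling over candidate responses is one common way to turn judge scores into training data; DeepSeek's actual update rule may differ, and in practice the judge and the policy model need not be the same model.

```python
def improvement_cycle(generate, queries, n_candidates=4, threshold=7.0):
    """One self-improvement cycle: sample, judge, keep high-reward pairs."""
    training_pairs = []
    for query in queries:
        # Sample several candidate responses from the current model.
        candidates = [generate(query) for _ in range(n_candidates)]
        # The internal "judge" critiques each candidate and emits a reward.
        scored = [(grm_score(generate, query, c), c) for c in candidates]
        reward, best = max(scored)
        # Only responses the judge rates highly feed the next training cycle.
        if reward >= threshold:
            training_pairs.append((query, best))
    return training_pairs  # fine-tune on these pairs, then repeat the loop
```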

The idea of self-improving AI has sparked ambitious and controversial commentary. Former Google CEO Eric Schmidt has suggested that it may be necessary to implement a "kill switch" mechanism for such systems. The notion of recursively improving AI is not new; it dates back to 1965, when mathematician I.J. Good proposed the idea of ultra-intelligent machines. In 2007, Eliezer Yudkowsky addressed the concept of Seed AI, designed to understand itself, modify itself, and improve recursively.

In 2024, Japan's Sakana AI introduced the concept of an "AI Scientist," a system capable of handling the entire process of drafting a research paper. Also in 2024, researchers at Meta shared results on self-rewarding language models, in which the AI acts as its own judge during training; Meta's tests of the technique on its Llama 2 model showed performance superior to competitors like Anthropic's Claude 2 and OpenAI's GPT-4.

As research in this field advances, warnings about potential risks are mounting. It has been documented that when AI models train on synthetic data they generated themselves, they can suffer what is known as "model collapse," a gradual loss of quality and diversity over successive generations. It will be interesting to see how DeepSeek implements its approach, and whether it can do so more economically than its Western competitors.
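A toy numerical example illustrates the collapse dynamic. Here the "model" is simply a Gaussian distribution repeatedly refit to samples drawn from its own previous fit; the estimated spread tends to drift toward zero, mirroring how diversity erodes when a model trains on its own output. This is a didactic sketch, not a simulation of any particular language model.

```python
# Toy illustration of "model collapse": refit a model to synthetic data
# sampled from its own previous fit. The "model" here is just a Gaussian.
import numpy as np

rng = np.random.default_rng(42)
mean, std = 0.0, 1.0  # generation 0: fitted to the "real" data

for generation in range(1, 31):
    synthetic = rng.normal(mean, std, size=50)     # model generates data
    mean, std = synthetic.mean(), synthetic.std()  # retrain on own output
    if generation % 5 == 0:
        print(f"generation {generation:2d}: std = {std:.3f}")
# The estimated std follows a multiplicative random walk and tends to
# shrink across generations: the model gradually forgets the data's tails.
```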