Thu Feb 06 2025

Researchers develop an open alternative to OpenAI's o1 reasoning model for under $50.

AI researchers at Stanford and the University of Washington trained a "reasoning" model for less than $50 in cloud computing credits.

Artificial intelligence researchers at Stanford and the University of Washington have trained a "reasoning" AI model for less than $50 in cloud computing credits, according to a newly published research paper. The model, named s1, performs comparably to advanced reasoning models such as OpenAI's o1 and DeepSeek's R1 on benchmarks that measure math and coding ability. The s1 model is available on GitHub, along with the data and code used to train it.

The team behind s1 started from an off-the-shelf base model and fine-tuned it through distillation, a process that extracts reasoning capabilities from another AI model by training on its responses. s1 is distilled from one of Google's reasoning models, Gemini 2.0 Flash Thinking Experimental. Researchers at Berkeley used the same distillation approach last month to create a reasoning model for roughly $450.
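
To make the recipe concrete, here is a minimal sketch of that kind of distillation pipeline in Python. The `query_teacher` helper is a hypothetical stand-in for whatever API serves the teacher model, and the prompts are invented; nothing here reproduces the paper's actual code.

```python
import json

def query_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a call to the teacher model's API."""
    # In a real pipeline this would return the teacher's full response,
    # including its step-by-step reasoning trace.
    return "<teacher reasoning and final answer would appear here>"

# A couple of illustrative prompts; s1 used 1,000 curated questions.
prompts = [
    "How many positive divisors does 360 have?",
    "Write a function that checks whether a string is a palindrome.",
]

# Save (prompt, response) pairs to use as the student's fine-tuning data.
with open("distill_data.jsonl", "w") as f:
    for prompt in prompts:
        pair = {"prompt": prompt, "response": query_teacher(prompt)}
        f.write(json.dumps(pair) + "\n")
```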

The prospect that researchers without large budgets can still innovate in AI is promising. At the same time, s1 raises questions about the commoditization of AI models: if a multi-million-dollar model can be approximated with a relatively small investment, the protection of intellectual property in this field is called into question.

Major AI labs are not pleased: OpenAI, for one, has accused DeepSeek of improperly harvesting data from its API in order to distill its own model.

The group that developed s1 was looking for the simplest approach to achieve strong reasoning performance and "test-time scaling," which lets an AI model think longer before answering a question. Both were among the innovations of OpenAI's o1 model, which other AI labs have tried to replicate with various techniques.

The s1 paper suggests that reasoning models can be distilled with relatively small datasets through a process called supervised fine-tuning (SFT), in which an AI model is trained to imitate certain behaviors in a dataset. SFT is typically cheaper than the large-scale reinforcement learning used to train DeepSeek's R1, a competitor to o1.
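
As a rough illustration of what SFT looks like in code, the sketch below runs one fine-tuning step on a single distilled example with Hugging Face Transformers. The model name, data format, and single-example loop are illustrative assumptions, not the paper's configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any small causal language model stands in for the real base.
model_name = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

example = {
    "prompt": "Question: What is 17 * 24?\n",
    "response": "Thinking: 17 * 20 + 17 * 4 = 340 + 68 = 408.\nAnswer: 408",
}

# Concatenate prompt and response, then mask the prompt positions so the
# loss is computed only on the teacher-derived response tokens.
prompt_ids = tokenizer(example["prompt"], return_tensors="pt").input_ids
full_ids = tokenizer(
    example["prompt"] + example["response"], return_tensors="pt"
).input_ids
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100  # -100 is ignored by the loss

loss = model(input_ids=full_ids, labels=labels).loss
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```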

Google offers free access to Gemini 2.0 Flash Thinking Experimental, though with daily rate limits. Its terms, however, prohibit reverse engineering its models to build services that compete with Google's own AI offerings. Google has been asked for comment.

The s1 model is based on a small off-the-shelf model from Qwen, the Chinese AI lab owned by Alibaba, which can be downloaded for free. To train s1, the researchers built a dataset of 1,000 carefully curated questions, paired with answers and the "thinking" process behind each response, drawn from Google's Gemini 2.0 Flash Thinking Experimental. Training took less than 30 minutes on 16 Nvidia H100 GPUs, after which s1 achieved notable performance on certain AI benchmarks.
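
For a sense of what one training record in that three-part format (question, reasoning trace, answer) might look like, here is a made-up example; the field names and the `<think>` delimiter are assumptions for illustration, not the actual schema of the s1 dataset.

```python
# A made-up record; field names and the <think> delimiter are assumptions.
record = {
    "question": "How many primes are there below 30?",
    "thinking": "List them: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29. That is 10.",
    "answer": "10",
}

# Flatten the record into a single string for supervised fine-tuning.
text = (
    f"Question: {record['question']}\n"
    f"<think>{record['thinking']}</think>\n"
    f"Answer: {record['answer']}"
)
print(text)
```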

Niklas Muennighoff, a Stanford researcher on the project, said the necessary compute could be rented today for around $20. The team also used a trick to make s1 double-check its work and extend its deliberation time: they simply told it to wait. Appending the word "wait" during s1's reasoning helped the model reach somewhat more accurate answers, according to the paper.
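
A rough sketch of that trick: generate a reasoning trace, then append "Wait" and let the model keep going before it settles on an answer. The function below uses standard Hugging Face generation calls; the paper's exact delimiters and stopping rules differ, so treat this as an assumption-laden illustration of the idea rather than the authors' implementation.

```python
import torch

def generate_with_wait(model, tokenizer, prompt: str, num_waits: int = 1) -> str:
    """Extend a model's deliberation by appending "Wait" to its reasoning."""
    text = prompt
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(input_ids, max_new_tokens=256)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    for _ in range(num_waits):
        # Nudge the model to re-examine its reasoning instead of stopping.
        text += "\nWait"
        input_ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model.generate(input_ids, max_new_tokens=256)
        text = tokenizer.decode(out[0], skip_special_tokens=True)
    return text
```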

In 2025, companies such as Meta, Google, and Microsoft plan to invest billions of dollars in AI infrastructure, part of which will go toward training next-generation AI models. That level of investment may still be necessary to push the frontier of AI innovation: while distillation has proven a cheap way to recreate an existing model's capabilities, it does not produce new models that are meaningfully better than those already available.