Artificial intelligence consumes a lot of energy, but DeepSeek could change that.
"There is an alternative."
Last month, DeepSeek surprised the industry by claiming that its artificial intelligence model uses roughly one-tenth of the computing power required by Meta's Llama 3.1 model. The claim could reshape how we think about the energy that AI development and deployment consume. Major tech companies are racing to build enormous AI data centers, some planned to draw as much electricity as a small city. That appetite for power generates pollution, raising concerns that the infrastructure behind new generative AI tools could exacerbate climate change and degrade air quality.
Reducing the energy needed to train and run generative AI models would relieve much of that pressure, though it is still too early to tell whether DeepSeek will radically shrink AI's environmental footprint. Much depends on how other leading companies respond to the Chinese startup's advances, especially as they plan new data centers.
According to Madalsa Singh, a postdoctoral researcher at the University of California, Santa Barbara, DeepSeek's claims show that "AI does not have to be an excessive energy consumer" and that alternatives exist. The debate began with the December launch of DeepSeek's V3 model, which reportedly cost $5.6 million to train and consumed 2.78 million GPU hours on Nvidia's H800 chips. By comparison, Meta's Llama 3.1 required around 30.8 million GPU hours, and training costs for comparable models are estimated at anywhere from $60 million to $1 billion.
Recently, DeepSeek launched its R1 model, which venture capitalist Marc Andreessen praised as "a profound gift to the world." The AI assistant quickly reached the top of the Apple and Google app stores, and competitors' stock prices dropped amid speculation that DeepSeek had created an affordable alternative to models like Llama, Gemini, and ChatGPT. Nvidia, whose chips power these systems, saw its stock value plummet after reports that DeepSeek used only 2,000 chips for training, compared with the 16,000 or more its competitors require.
According to DeepSeek, the drop in electricity consumption comes from more efficient training methods, including a strategy known as auxiliary-loss-free load balancing. Singh explains that this amounts to training specific segments of the model rather than the whole thing at once, much like a customer service company deciding which experts to consult for a given question. The model also saves energy during inference through key-value caching and compression techniques, which let it work from condensed summaries of information instead of re-reading complete documents.
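Singh's customer-service analogy describes mixture-of-experts routing in general terms: a router picks a handful of "experts" (sub-networks) per input, and only those run. A minimal Python sketch of that idea — with made-up experts and scores, not DeepSeek's actual method — might look like this:

```python
import random

# Toy illustration of mixture-of-experts routing: only TOP_K of the
# NUM_EXPERTS sub-networks compute for a given input, so the work done
# scales with TOP_K rather than the full model size. Hypothetical code,
# not DeepSeek's implementation.

random.seed(0)

NUM_EXPERTS = 8   # total sub-networks in the layer
TOP_K = 2         # how many actually run per input

# Each "expert" stands in for a sub-network; here, a simple multiplier.
experts = [lambda x, w=w: x * w for w in range(1, NUM_EXPERTS + 1)]

def route(x):
    """Score every expert for this input and keep the TOP_K highest."""
    scores = [(i, random.random()) for i in range(NUM_EXPERTS)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scores[:TOP_K]]

def forward(x):
    chosen = route(x)
    # Only the chosen experts do any work; the other six stay idle,
    # which is where the compute (and energy) savings come from.
    output = sum(experts[i](x) for i in chosen) / len(chosen)
    active_fraction = len(chosen) / NUM_EXPERTS
    return output, active_fraction

_, fraction = forward(3.0)
print(fraction)  # 0.25 -- only a quarter of the expert compute ran
```

The point of the sketch is the ratio on the last line: with two of eight experts active, roughly a quarter of the layer's compute runs per input, and adding more idle experts grows the model's capacity without growing its per-input energy cost.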
Singh is hopeful that DeepSeek's models, mostly open-source (excluding training data), will facilitate collaboration among researchers and allow smaller players to enter the industry. This would also promote greater transparency and accountability regarding the investments and resources required to develop these models.
However, there is skepticism about DeepSeek's efficiency claims. Carlos Torres Diaz, director of energy research at Rystad Energy, cautions that no concrete data on the model's energy consumption has been released. If DeepSeek's claims hold up, they could significantly reduce a data center's energy draw. Even so, although major tech companies have signed numerous agreements for renewable energy, data centers' growing electricity demand could still strain the available solar and wind supply.
Using less electricity could free up renewable resources for other sectors and help accelerate the transition to cleaner energy. But more efficient AI models raise the dilemma of increased usage. Microsoft CEO Satya Nadella invoked the "Jevons paradox" on X: as a technology becomes more efficient, its use tends to expand, which can end up increasing total consumption and environmental harm.
Lower energy consumption from AI could attract more providers, expanding infrastructure much further than anticipated. Even so, Torres Diaz argues it is premature to make significant adjustments to energy consumption projections. The source of the electricity remains essential to how much pollution is generated: in both China and the United States, a large share of electricity still comes from fossil fuels, and growing data center demand is postponing the closure of conventional power plants.
The AI industry also faces other challenges, including heavy water use for cooling hardware, which can worsen shortages in drought-prone regions. While traditional data centers kept their energy consumption relatively flat in the past, demand has grown considerably with the rise of AI: in 2023, data centers consumed more than 4% of US electricity, a share expected to nearly triple by 2028. As DeepSeek's development plays out, considerable uncertainty remains about the real future impact of these advances.