Wed Apr 02 2025

DeepMind's extensive 145-page report on AGI safety may not persuade skeptics

DeepMind has released an extensive document outlining its strategy for ensuring the safety of artificial general intelligence, but some experts remain unconvinced by the premises on which it rests.

On Wednesday, Google DeepMind published an extensive paper outlining its safety approach to artificial general intelligence (AGI), defined roughly as an AI capable of performing any task a human can. AGI is a contentious topic in the AI field: some critics argue it is more an unattainable dream than a realistic prospect, while others, including AI labs such as Anthropic, warn that its arrival is imminent and could cause catastrophic harm if proper safeguards are not in place.

The 145-page paper, co-authored by Shane Legg, co-founder of DeepMind, anticipates that AGI could materialize by 2030, potentially leading to what the authors describe as "severe harms." While the document does not provide a clear definition of these harms, it mentions alarming examples such as "existential risks" that could "permanently destroy humanity." The authors state, "We anticipate the development of an Exceptional AGI before the end of this decade," defining such a system as one that achieves at least the 99th percentile of abilities in non-physical tasks, including metacognitive tasks such as learning new skills.

From the outset, the paper contrasts DeepMind's approach to mitigating AGI risks with those of Anthropic and OpenAI. According to DeepMind, Anthropic places less emphasis on "robust training, monitoring, and safety," while OpenAI is seen as overly optimistic about "automating" AI safety research, known as alignment research. The document also expresses doubt about the near-term emergence of superintelligent AI, meaning an AI that can perform tasks better than any human. Despite recent claims from OpenAI that its focus is shifting from AGI to superintelligence, the DeepMind authors are unconvinced that such superintelligent systems will emerge soon in the absence of "significant architectural innovation."

However, the paper considers it plausible that current paradigms allow for "recursive AI improvement": a positive feedback loop in which AI conducts its own research to create more sophisticated AI systems, which the authors say could be extremely dangerous. Broadly, the document proposes developing techniques to restrict malicious actors' access to a hypothetical AGI, to improve understanding of AI systems' actions, and to "harden" the environments in which AI can operate. It acknowledges that many of these techniques are nascent and present "open research problems," but warns against ignoring the safety challenges that may be on the horizon.

The authors state, "The transformative nature of AGI has the potential to generate both incredible benefits and severe harms. Therefore, to develop AGI responsibly, it is critical that frontier AI developers proactively plan to mitigate severe harms." However, some experts criticize the paper's premises. Heidy Khlaaf, a senior AI scientist at the AI Now Institute, argues that the concept of AGI is so unclear that it cannot be rigorously evaluated scientifically. Meanwhile, Matthew Guzdial, an assistant professor at the University of Alberta, doubts that recursive improvement of AI is realistic at present.

Guzdial notes, "Recursive improvement is the basis of intelligence singularity arguments, but we have never seen evidence that it works." Sandra Wachter, a researcher in technology and regulation at Oxford, argues that a more realistic concern lies in AI reinforcing itself with "inaccurate outputs." As generative AI results proliferate on the internet and gradually displace authentic data, she says, models learn from their own outputs, which may be rife with misinformation. "Chatbots are primarily used to search for and find the truth. This means we are at constant risk of receiving incorrect information and believing it because it is presented convincingly," she added.

Despite its thoroughness, the DeepMind paper seems unlikely to settle the debates over the viability of AGI or over which areas of AI safety most urgently need attention.