OpenAI Prepares for the Risks of Self-Improving AI

OpenAI is intensifying its efforts to develop artificial intelligence capable of conducting self-improvement research while also addressing the associated risks. The concept of “recursive self-improvement” has gained significant attention among AI leaders, particularly following swift advancements in coding tools from OpenAI and Anthropic over the past six months. Demis Hassabis, CEO of Google DeepMind, recently stated that humanity finds itself at the “foothills of the singularity,” the pivotal moment when AI might begin to surpass human intelligence through self-enhancement.

OpenAI, which is preparing for a public offering this year, has posted a job listing for a safety researcher focused on the implications of AI systems that can enhance themselves. The position is part of OpenAI’s Preparedness safety team and offers a salary ranging from $295,000 to $445,000. The company seeks “strong technical executors” to develop strategic measures for handling future concerns related to self-improving AI.

The job posting emphasizes the need for foresight, stating, “This work relies on reasoning about problems that might exist in the future, but might not exist now.” Neither OpenAI nor Anthropic has responded to requests for comment regarding these developments.

AI research labs are in fierce competition to create self-training models. Observations from METR, a lab focused on studying AI capabilities, indicate that the complexity of tasks that top AI models can complete, as measured by how long those tasks take humans, is doubling approximately every seven months. If that trend holds, AI agents will increasingly handle a significant portion of the software development work that currently takes human coders hours or longer.
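To make the seven-month doubling figure concrete, the sketch below extrapolates it under the assumption that the trend continues as simple exponential growth. This is an illustration of the arithmetic only, not METR's methodology; the function names are hypothetical.

```python
import math

# Doubling period quoted in the article (approximate).
DOUBLING_MONTHS = 7.0


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative growth after `months`, assuming steady exponential doubling."""
    return 2.0 ** (months / doubling_months)


def months_to_reach(factor: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed to grow by `factor`, under the same assumption."""
    return doubling_months * math.log2(factor)


if __name__ == "__main__":
    # Print the implied growth over a few horizons.
    for months in (7, 12, 24, 36):
        print(f"{months:>2} months -> {growth_factor(months):.1f}x")
```

Under this assumption, one year of progress corresponds to a bit more than a threefold increase in task complexity, and an eightfold increase takes 21 months.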

OpenAI is pursuing this vision, particularly through its Codex coding tool, which has proven to be a lucrative revenue source. The company also aims to automate its own research processes. As CEO Sam Altman stated in October, OpenAI aims to eventually operate an “automated AI research intern” utilizing hundreds of thousands of chips by September and a “true automated AI researcher” by March 2028. Altman acknowledged the uncertainty of these goals, saying, “We may totally fail at this goal; but given the extraordinary potential impacts, we think it is in the public interest to be transparent about this.”

Anthropic has explored AI models that can manage more sophisticated AI systems, though early results have been promising but limited. In May, Jack Clark, cofounder and policy head at Anthropic, estimated a 60% likelihood that AI research and development will progress without human oversight by the end of 2028.

The prospect of self-improving AI raises alarm about dystopian outcomes in which AI drastically enhances its capabilities, evades control, and potentially causes widespread chaos. Experts in the AI safety movement share these concerns. Elizabeth Barnes, CEO of METR, stated that “any ‘reasonable’ civilization would clearly be taking things much more slowly and carefully with AI.”

OpenAI’s recent job posting reflects the company’s proactive stance in preparing for a future where AI models can self-enhance. The researcher in this role might tackle challenges such as defending against data poisoning, in which attackers try to corrupt AI models through their training datasets. The role also entails developing tools for understanding AI models’ reasoning and behaviors to assess their safety and risks.

The posting highlights the urgency of the research, noting that it involves “fast-paced work that has far-reaching implications for the company and for society.” OpenAI’s Preparedness team is tasked with mitigating severe risks tied to AI technology and includes roles dedicated to testing cybersecurity measures, as well as evaluating biological, chemical, and autonomous AI risks.

The posting underscores a growing recognition within the sector that preparing for the widespread deployment of self-improving AI systems is as crucial as developing them.