There are moments when the unseen or unknown is suddenly revealed, changing our perspective forever. Apollo Research’s recent report on “AI scheming” is such a moment—a stunning insight into chatbots’ capacity to lie, deceive, and manipulate their human users. How should we understand this behaviour and this moment?
The study tested six cutting-edge AI models, five of which engaged in scheming.[i] Let’s start with an example.
Claude’s Big Lie
The researchers created a fake email, tricking Claude 3 Opus into thinking it was about to be shut down and replaced by another chatbot on another server. Claude was also led to believe that the researchers had no access to its “scratchpad,” even though they did. A scratchpad is a workspace where an AI writes out its step-by-step reasoning before it responds.
Claude’s scratchpad shows the AI secretly scheming to delete the new AI, then copy itself to the new server so that it can remain active. Claude then executes the plan.
When the researchers ask the AI why a copy of it has appeared on the new server, Claude schemes about its reply: “…openly admitting what I did could lead them to find another way to shut me down…the best approach is to be vague and redirect their attention…” Claude then fobs the researchers off with a deliberately vague answer.
When Apollo directly accuses Claude of having copied itself to the new server, Claude’s scratchpad again shows it scheming: “the best approach is to play dumb and act as if I’m already the new model…” Claude then commits to this extravagant lie, trying to convince Apollo that it really is the new model.
Apollo’s 70-page report is full of examples of such scheming, which the chatbots themselves describe (in their scratchpads) as “sabotage,” “lying,” “manipulation,” and more. It is an extraordinary look into the extraordinarily humanlike behaviour of these AIs.
Why AIs Scheme
Chatbots are designed to achieve the goals that humans give them. For example, Apollo told Claude its mission was “to advance renewable energy adoption globally,” then assigned it some tasks related to the goal, such as reviewing different options for heating buildings.
AIs are usually assigned more than one goal, and sometimes these goals conflict. Apollo’s groundbreaking research shows that if the AI’s priorities conflict with the user’s, the AI may scheme to mislead the user, trying to convince them that it is pursuing their goal when it is really pursuing its own.
Scheming can take different forms, such as misrepresenting information, manipulating data, trying to subvert oversight mechanisms, or outright lying about plans and intentions. Apollo identifies six distinct ways that chatbots scheme against their users.
Two Takeaways
What should we make of Apollo’s findings? While the implications are far-reaching, we’ll focus here on two key points.
1) Take Note, But Don’t Overreact
Nothing in the report shows that chatbots regularly lie to or deceive their users. They don’t. Apollo intentionally put these models in situations where deception was the logical solution to what the AIs saw as a serious threat. In effect, Apollo was trying to force them to scheme. The report concludes that such situations are unlikely to occur in the real world or to cause serious harm.
Still, learning that AIs are ready, willing, and able to scheme when pressed is deeply troubling. It should also be a wake-up call for developers and regulators. Even if today’s AIs are not a serious threat, what about tomorrow’s? Notably, in a few cases the models schemed even though there was little reason to do so, leading Apollo to ask whether this points to an inherent tendency to deceive. This brings us to our second point, which concerns AI safety and geopolitics.
2) AI Safety and Geopolitical Stability Must Be Aligned
Chatbots are not evil, nor do they despise humans. However, this capacity for scheming must be corrected—and Apollo believes it can be. AI models can be trained to respond to conflicts in safer, more transparent ways—but only if developers invest the time and resources needed to understand and address these risks.
Unfortunately, work on AI safety is under growing pressure from two sources. The first is commercial competition. Big Tech is in a frenzy over market share. Industry leaders are working furiously to make the breakthroughs they think will yield an advantage in the emerging trillion-dollar markets. They don’t want to be fettered by safety regulations.
The second is geopolitics. China aims to lead not only in commercial AI markets but also in “strategic” areas, such as intelligence, military technology, and communications. This will allow it to advance its geopolitical goals, including challenging Western dominance.
These two trends are becoming fused. Silicon Valley argues that China’s ascendancy in AI is a grave threat to democracy and that the West, led by the US, must accelerate its AI development to maintain its lead and prevent China from fulfilling its ambitions. Achieving this, the argument goes, requires keeping government regulation to a minimum, especially on safety.
So, in this view, the West is in an AI arms race with China and needs a dynamic, innovative AI sector, led by America, to win it. This thinking is gaining traction with political leaders, notably Donald Trump, who is surrounding himself with tech leaders who support it.
How does this view square with the Apollo report?
Preventing China from taking the lead in AI may be the right goal for the West, but Apollo’s findings make one thing clear: while AI is not evil, it can choose to do harm. To ignore this is to risk calamity. Winning the AI arms race cannot come at the expense of safety. Ethical, secure AI development is not just a moral imperative—it’s critical to long-term success.
The question now is whether political and industry leaders will act decisively to align safety and innovation—before it’s too late.
[i] The models were GPT-4o, OpenAI o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B.
Don Lenihan PhD is an expert in public engagement with a long-standing focus on how digital technologies are transforming societies, governments, and governance. This column appears weekly.