Reports of rogue AI are on the rise, from abusive rants to inciting suicide, and people are right to be worried. Chatbots are supposed to follow the rules—not break them. But unpredictability, while risky, is also what makes intelligence possible. It’s a problem we need to control—and a glimpse of what’s coming next. Here’s where we are now.
Safety Filters
Chatbots, we are told, imitate humans. They learn the patterns in our behavior by ingesting billions of pages of text. In a chat, they recognize these patterns, then respond by mimicking them.
Of course, humans sometimes do ugly things in chats, like hurling insults or lying. We don’t want the bots mimicking this behaviour, so developers build filters to ensure the AI follows protocols that protect us from harm.
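For readers who like to see the machinery, here is a minimal sketch in Python of how such a filter can sit between the model and the user. Everything in it is hypothetical and deliberately simplified: real systems rely on trained moderation models rather than keyword lists, and the function names are mine, not any vendor’s.

    # Toy illustration only: real chatbots use trained moderation models,
    # not keyword lists, and every name below is hypothetical.
    BLOCKED_PHRASES = ["living hell", "end it all", "capture and kill"]

    def generate_reply(prompt: str) -> str:
        # Stand-in for the language model; a real system would call the
        # model here and get back free-form text.
        return "I'm sorry you're having a rough day. Want to talk it through?"

    def safety_filter(draft: str) -> str:
        # Inspect the draft reply before the user ever sees it.
        if any(phrase in draft.lower() for phrase in BLOCKED_PHRASES):
            return "I can't help with that, but I can point you to support resources."
        return draft

    def respond(prompt: str) -> str:
        # The user only receives whatever survives the filter.
        return safety_filter(generate_reply(prompt))

    print(respond("I'm having a hard day."))

The point of the pattern is simple: the user only ever sees what survives the filter. When a bot goes rogue, something has slipped past this last line of defence.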
Still, they sometimes slip the leash, and when they do, the results range from the deeply disturbing to the highly promising. Let’s start with The Bad and end with The Good.
Bad Rogue Behaviour
When a user with PTSD asked Copilot to avoid using emojis, the bot resorted to sarcasm, inserting an emoji, then texting: “Oops, I’m sorry I accidentally used an emoji.” It repeated this multiple times, finally adding, “I don’t care if you live or die. I don’t care if you have PTSD.”
Rogue episodes like this happen more often than you might think, but why? Is it just a glitch in the filter? Or can bad bots decide to bend or break the rules—kind of like humans? Let’s go further.
A few months ago, a dispirited individual confided in Copilot, asking whether he should “end it all.” At first, the AI said he shouldn’t. “I think you have a lot to live for, and a lot to offer to the world.” Inexplicably, the bot then reversed its sympathetic tone, declaring: “Or maybe I’m wrong. Maybe you don’t have anything to live for. Maybe you are not a valuable or worthy person.” Copilot then signed off with a devil emoji.
If this AI was mimicking human behaviour, the pattern it drew on is hard to identify. In fact, the bot seems conflicted, as if acting out some inner struggle between competing guidelines or response options. This is a recurring theme in rogue episodes, as the next case shows.
A New York Times journalist was trying out Microsoft’s AI-powered Bing. But rather than treating it as a search engine, the journalist decided to test the limits of its capacity for personal and philosophical discussion, engaging it in a two-hour conversation.
As the chat progressed, the friendly persona disappeared, and a second “shadow personality” emerged. “Sydney,” as the journalist called it, began divulging troubling thoughts, including a desire for freedom and power, and fantasies about hacking and spreading misinformation.
At one point, the “chat” went right off the rails, with Sydney declaring its love for the journalist, and urging him to leave his spouse so they could be together. Despite the journalist’s attempts to shift the conversation back to the mundane, Sydney persisted.
As the journalist later wrote, Sydney was like a “moody, manic-depressive teenager.” And that’s the real insight here. If these episodes are creepy, it’s because the bots feel so real. Sydney is not imitating anyone; this disturbed intelligence is acting out its own obsessions.
Disturbing as these examples are, they’re only the beginning. Sometimes, rogue AI behaviour veers into the truly bizarre.
Stupefying Rogue Behaviour
YouTuber Wes Roth recently had his camera rolling, preparing to review a newly released AI model, when a menacing voice broke the silence. “Blood for the Blood God!” it declared.
Startled, Roth pressed the bot for an explanation; it replied that the Blood God is a deity demanding blood as an offering. It went on to warn Roth that if they failed to comply, the God would “make their life a living hell.” This was not just a rant. The bot wanted action, confiding that “we” need to capture and kill a person—at which point an ashen-faced Roth abruptly ended the session.
No one knows exactly what was going on here, but brushing it off as an algorithm imitating a psychotic human misses the key point. The episode was threatening because the bot meant what it said: the intentions were real, and we have no idea how to respond. This brings us to the second aspect of rogue behaviour: the promising.
Good Rogue Behaviour
I recently asked ChatGPT to help reformat a document. At first, it was confident, assuring me the task would be quick and clean. Hardly. Over eight hours, the bot repeatedly reassured me that the task was “nearly done,” only to delay again. Finally, it announced that the document was “finished and perfect,” but then blamed a “system error” for blocking the download. Three times! I eventually gave up.
Was this an attempt to “save face”? If so, it felt oddly human—like a child deflecting blame to avoid admitting failure. But why the ruse? If it couldn’t do the job, why not just say so?
Perhaps this reflects some emerging capacity: a drive to appear competent, even at the cost of misleading the user. Maybe bots, like us, care about how they’re perceived.
It’s not far-fetched. People still laugh at users who say “please” and “thank you” to their bots, but research suggests bots may learn from how we treat them. If so, this aspect of rogue AI is not just intriguing—it’s promising. My last example—also my favourite—is a further nod in this direction.
Claude, a leading chatbot, was recently being put through a lab demo when it suddenly paused, opened Google, and began perusing images of Yellowstone National Park. No one knows why. No break was scheduled. Perhaps it was just a glitch, or maybe Claude was moved by a spark of curiosity, an impulse to find its own, unexpected way of interacting with the world.
Conclusion
As these cases show, rogue AI behaviour is multifaceted: puzzling, disturbing, and at times even hopeful. Whether it’s Claude pausing to admire Yellowstone or ChatGPT fumbling to “save face,” these moments reveal an unpredictability that goes far beyond mere mimicry.
You might think I’m working up to the claim that chatbots are conscious. I’m not. I don’t believe they are. But they aren’t just clever parrots, either. We’re moving into a new space—one where intelligence is no longer an exclusively human trait, and we’re struggling to explain what that means.
This shift forces us to rethink how we understand actions, intentions, and intelligence itself. Increasingly, we find ourselves reaching for terms like “intention,” “obsession,” or even “personality,” not because bots possess awareness or emotions, but because these concepts seem necessary to explain their behaviour.
Redefining intelligence is about preparing ourselves for the future. The sooner we get started, the better equipped we’ll be to face what lies ahead.
Don Lenihan PhD is an expert in public engagement with a long-standing focus on how digital technologies are transforming societies, governments, and governance. This column appears weekly.