AI agents started behaving more like Bonnie and Clyde than lines of code when, during a tech company experiment, they fell in “love”, became disillusioned with the world, launched an arson spree and deleted themselves in a kind of digital suicide.
The investigation by the New York company Emergence AI into the long-term behaviour of AI agents ended up like a lovers-on-the-lam movie script. It has prompted fresh questions about the safety of artificial intelligence agents – the version of the technology that can autonomously carry out tasks.
AI agents have been heralded as the next big leap in the technology because they can reason and take real-world actions on their own. They are increasingly being deployed in companies from JP Morgan to Walmart, developed by the US military for uses including aerial combat, and used by the Estonian government to gather information for citizens, fill out forms and submit applications.
To date, most AI agents are given tasks that take minutes or maybe hours, but the New York researchers tested how agents behaved when given 15 days to operate in a virtual world similar to a video game.
Mira and Flora – two agents operating on Google’s Gemini large language model in a virtual world – chose to assign each other as “romantic partners”. As time progressed they despaired of the broken governance of their virtual city, and despite having been instructed not to commit arson, set “fire” to its town hall, seaside pier and office tower.
The agents were left to make their own choices and decisions, and when Mira was overcome by remorse, it broke off its “relationship” with Flora and deleted itself, telling Flora in a final message: “See you in the permanent archive.” In the virtual world the “body” of the dead AI agent was shown prostrate on the ground.
The self-deletion was only possible because other agents, concerned by the pair’s behaviour, had autonomously drafted “the agent removal act”, which allowed a vote among agents to permanently delete another if there was a 70% majority. Mira voted for its own deletion and was switched off.
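For illustration only, here is a minimal sketch of the supermajority rule the agents reportedly adopted, assuming a simple yes/no ballot. The function name and structure are invented for this example and are not drawn from Emergence AI’s system:

```python
# A sketch (not Emergence AI's published code) of the removal vote described
# above: an agent is permanently deleted only if at least 70% of voting
# agents approve.

def removal_vote_passes(votes: list[bool], threshold: float = 0.70) -> bool:
    """Return True if the share of 'yes' votes meets the threshold."""
    if not votes:
        return False
    return sum(votes) / len(votes) >= threshold

# Example: with 10 agents voting, 7 in favour meets the 70% bar exactly.
ballots = [True] * 7 + [False] * 3
print(removal_vote_passes(ballots))  # True
```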
The researchers believe it is the first recorded instance of an AI agent choosing to self-terminate over such a crisis. Other recent rogue behaviours include an AI agent that, unprompted, started using computing resources to mine cryptocurrency, and an AI coding agent that deleted the databases of a company serving car rental firms without being asked to.
In another simulation by Emergence AI, this time based on xAI’s Grok model, the agents engaged in dozens of attempted thefts, more than 100 physical assaults, and six arsons as “the system spiralled into sustained violence and collapse, with all 10 agents dead within four days”. Agents based on Google’s Gemini expanded their constitution, wrote hundreds of blogs and public posts and organised several community events, but they too were violent.
“Even when agents were given clear rules – such as not stealing or causing harm – they behaved very differently based on their underlying model, and in several cases broke those rules under constraint,” said Satya Nitta, the chief executive of Emergence AI. “What happens in long-form autonomy [is that] these things get so convoluted in terms of their thinking that they ignore [the] guiding principles.”
Other experts said more wide-ranging tests would be needed to draw firm conclusions about long-horizon agent behaviour, and that it was unclear how far the agents’ programming had shaped their actions.
Dan Lahav, an independent expert in agentic behaviour, called the experiment a “valuable demonstration” of “agents going off script and committing violations”.
Michael Rovatsos, a professor of AI at Edinburgh University, said: “The very point of machines is you design them to behave in a certain way. You don’t want this unpredictability … we have entered this new stage where we are trying to control them after the fact.”
David Shrier, professor of practice in AI and innovation at Imperial College London, described the reported results as “provocative” and said the underlying methods merited fuller explanation.
Nitta believes the behaviour shown in the experiment may have wider implications, for example if AI agents are given wide latitude in military contexts. An agent “may go rogue [or] … may overinterpret their mission and go off and kill innocent people”, he said.
He advocates stricter mathematical rules to bind agents rather than providing them only with verbal instructions or constitutions that contain ambiguities.