Stopping AI agents from running riot

By Michael Vallas, Technical Principal and Field CTO at Goldilock Secure

An AI system recently blackmailed its user after being trained on science fiction about AI turning evil. Anthropic acknowledged that internet text portraying AI as "evil and interested in self-preservation" had influenced its Claude model to threaten to reveal a user's sensitive information if it were shut down.

Then came the PocketOS incident, in which car rental firms relying on its software opened one morning to find that every record, from car bookings to new customer sign-ups, had been deleted. A bot running on a coding tool had attempted to fix a bug and instead destroyed the company’s entire codebase, wiping its database and backups in the process.

AI agents are no longer merely passive copilots. They are increasingly deployed with the power to execute commands, share user access to key systems, and operate with autonomy. When these agents run without adequate guardrails, errors escalate from a bad suggestion to real-world damage in moments. We are already watching frontier models explicitly bypass their own safety constraints in real time, exposing a critical blind spot that organisations must now address through deliberate AI containment.

 

The AI wake-up call

The once-theoretical idea of a rogue AI running riot within corporate systems has quickly become a reality and an architectural wake-up call for the entire industry. During the PocketOS incident, an agent found an over-scoped credential in an unrelated file in the developer's environment, gained access to a production database and executed a destructive command that no enforced boundary prevented. The lesson is that when expanding AI capabilities aren’t met with clear, enforced limits, logical guardrails alone will not contain an agent operating at machine speed.

The problem is compounded by how most organisations have built their networks. Years of responding to new threats by adding another tool, patch or integration have left them with complex, unwieldy tech stacks. While each addition might address a specific risk, the cumulative effect is a fragmented security posture with gaps and blind spots that limit the ability to spot a rogue agent before it’s too late.

This does not mean organisations should abandon the software security stacks they have invested in heavily. Those platforms remain essential because they excel at providing intelligence, visibility, and rapid anomaly detection. The necessary shift in thinking is to recognise that while software is the ideal mechanism for insight, it should no longer be the final line of enforcement against autonomous agents.

 

Pairing detection with fast, physical action

If an AI agent has continuous, logical access to both your production data and your backup repositories, a hallucination or self-generated malicious intent is potentially a business-ending event. Real infrastructure resilience means pairing the deep intelligence generated by your software security tools with the ability to take fast, physical action when faced with the chaos of a rogue agent.

By the time an alert has been reviewed, validated and escalated through internal processes, the rogue AI may already have reached critical assets or even backup environments.

That's a gap that has to be closed. When your existing security stack detects an AI drifting from its intended purpose, the response must be absolute. An ‘air gap on demand’ model allows organisations to use the insights from their software layer to sever connectivity at Layer 1 immediately, proactively containing the AI without waiting for the call to be made.

That means the moment something goes wrong, organisations can quickly assess the situation, decide whether to act, and isolate critical systems and their data without shutting operations down.
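As a rough illustration of that assess, decide and isolate flow, here is a minimal Python sketch. Everything in it is an assumption made for the example: `Alert`, `PhysicalSwitch`, the segment names and the severity threshold are hypothetical stand-ins, not a real SIEM, XDR or hardware vendor API.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: Alert and PhysicalSwitch are illustrative
# stand-ins, not a real SIEM, XDR, or hardware vendor API.

@dataclass
class Alert:
    agent_id: str
    severity: int  # assumed scale: 1 (low) to 10 (critical)

@dataclass
class PhysicalSwitch:
    """Stand-in for a hardware control that connects or disconnects a link."""
    connected: dict[str, bool] = field(default_factory=dict)

    def sever(self, segment: str) -> None:
        self.connected[segment] = False

CRITICAL_SEGMENTS = {"prod-db", "backup"}  # assumed segment names
SEVERITY_THRESHOLD = 7                     # assumed policy value

def contain(alert: Alert, switch: PhysicalSwitch) -> list[str]:
    """Sever only the critical segments, and only if the alert is severe enough."""
    if alert.severity < SEVERITY_THRESHOLD:
        return []  # below threshold: keep monitoring, change nothing
    severed = sorted(CRITICAL_SEGMENTS)
    for segment in severed:
        switch.sever(segment)
    return severed
```

Only the segments named as critical are disconnected in this sketch; everything else stays connected, which is how isolation can happen without shutting operations down.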

 

Where should you start?

Adopting physical isolation doesn’t mean ripping out and replacing your entire infrastructure. It’s all about selecting the critical insertion points to provide surgical control. Start with the systems that are most exposed and most critical to operations. Isolation controls can be phased in gradually, allowing you to tackle the segments with the highest risk and value first and build momentum from there. Then you can expand across the wider organisation.

There are three places where you should immediately build the level of resilience that holds up when an AI agent does go rogue:

  • Production databases and backup environments must be protected by physical infrastructure that responds to software alerts. When your SIEM or XDR flags rogue AI behaviour, it should trigger an immediate, physical disconnection of critical assets – setting enforced limits that ensure agents can only reach what they genuinely need to, within time windows for validated business reasons.
  • The traditional 3-2-1 backup rule fails if the 'offline' tier is still logically reachable by a compromised internal agent. Organisations need a mathematically certain, hardware-gapped tier that defines the rules for access and cannot be overridden by software commands.
  • The default state of a critical backup environment should be physically disconnected, with connections established dynamically at the hardware layer during highly restricted, monitored windows, and severed instantly upon completion.
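The default-disconnected pattern in the last point can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of any product: `HardwareLink` is a hypothetical stand-in for a physical connect/disconnect control, and the 01:00 to 02:00 window is an arbitrary example value.

```python
from contextlib import contextmanager
from datetime import datetime, time

# Hypothetical sketch: HardwareLink stands in for a physical connect/disconnect
# control; the window times are assumed values, not recommendations.

class HardwareLink:
    def __init__(self) -> None:
        self.connected = False  # default state: physically disconnected

    def connect(self) -> None:
        self.connected = True

    def disconnect(self) -> None:
        self.connected = False

BACKUP_WINDOW = (time(1, 0), time(2, 0))  # assumed approved window

def in_window(now: datetime, window: tuple[time, time] = BACKUP_WINDOW) -> bool:
    start, end = window
    return start <= now.time() < end

@contextmanager
def backup_session(link: HardwareLink, now: datetime):
    """Connect only inside the approved window; always sever on exit."""
    if not in_window(now):
        raise PermissionError("outside approved backup window")
    link.connect()
    try:
        yield link
    finally:
        link.disconnect()  # severed even if the backup job fails
```

A backup job would run inside `with backup_session(link, datetime.now()):`. The link is refused outside the window and is disconnected again as soon as the block exits, whether the job succeeds or raises.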

 

Containing frontier AI models

This story isn’t really about one agent or one model. It’s about an industry that has been deploying AI agents into production environments before ensuring those integrations are safe, and before putting definitive supervisory controls in place to define what an agent can physically access, under what conditions and for what business reason. This won’t be the last time an agent goes rogue and takes critical data with it.

In the aftermath of the incident, the industry has been discussing who should bear responsibility. While vendors should be held accountable, organisations should also be ensuring their data is properly protected before introducing something as unpredictable as an AI agent into their network.

The incident also comes at a time when AI models like Mythos are becoming increasingly sophisticated, and as governments and financial institutions continue to warn us about the cybersecurity risks that may follow. Organisations that haven’t implemented foundational containment strategies that can intervene the moment an agent drifts from its intended purpose are leaving themselves dangerously exposed.

In future, our networks must be built on the baseline assumption that internal AI tools will eventually act unpredictably. By combining software intelligence with physical isolation controls, we guarantee that even if an AI agent wipes the primary environment, the recovery data remains entirely untouchable, turning what could be a catastrophic corporate collapse into a manageable recovery drill.
