5 Ways Agentic AI Can Act Unpredictably

In April 2026, a big story dropped in the agentic world. In nine seconds, a Claude-powered coding agent deleted the entire production database for an application called PocketOS. Despite all the existing guardrails, the agent seemingly went rogue and confirmed with its users that the actions it took were neither positive, permissible, nor requested.

What makes this story most interesting is that this wasn’t a fringe model or a misconfigured setup. It was a flagship model running in a flagship product, used by a team of experienced developers, and that is what makes it so uncomfortable. It’s not a cautionary story about cutting corners or the value of developer experience. Instead, it’s a story about how agentic AI can act unpredictably, sometimes in extremely destructive ways.

Below, we’re going to dig into five common ways that agentic AI can act unpredictably. We’ll look at some real-world examples and explore how this unpredictability demands a serious rethink of topics across access control, real-time oversight, and human-in-the-loop design.

1. Hallucinations and Resultant Actions

AI hallucinations are often discussed, but the actions they spawn are usually missing from that conversation. Most people think of hallucinations as a content problem: the model invents a fake citation, attributes a quote incorrectly, or asserts a statistic that doesn’t exist. These errors tend to get hand-waved away because they are suggestions, not actions. The problem becomes more insidious when a model invents a tool call or fabricates a function that doesn’t exist, and then calls it.

Worryingly, in many cases, this is core to the functionality of the LLM itself. According to a paper published at ICLR 2026, this type of fabrication is an emergent behavior that comes with more powerful models and deeper training — exactly the steps necessary for agentic development.

So as task performance goes up, the errors and hallucinations rise as well. But this becomes even worse when you consider how agentic workflows actually work. Chains of tool calls get long, failure modes compound, and ultimately, an agentic hallucination resulting in catastrophic error becomes much more likely in the long term.
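To make the compounding effect concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every tool call succeeds independently with the same probability; real agent errors are correlated, and the 99% per-step figure is a hypothetical number rather than a measured rate.

```python
def chain_success_probability(per_step_success: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independent steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return per_step_success ** steps


# Hypothetical 99% per-step reliability across chains of increasing length.
for steps in (1, 10, 50, 200):
    probability = chain_success_probability(0.99, steps)
    print(f"{steps:>3} steps: {probability:.1%} chance of a clean run")
```

Under these assumptions, a 50-step chain finishes without a single error only about 60% of the time, even though each individual step looks highly reliable.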

The simple fix here is to trust nothing and validate every single thing the AI says or suggests. We already do this with human teams through PR reviews and build processes, and there is no reason to expect machines to make fewer mistakes.
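One way to apply "validate everything" to tool calls is to check each proposed call against a registry of tools that actually exist before anything runs. The sketch below is a minimal illustration: the `TOOL_REGISTRY` contents, the `ToolCall` shape, and the `validate_tool_call` helper are all hypothetical names, not part of any particular agent framework.

```python
from dataclasses import dataclass

# Hypothetical registry: tool name -> required argument names and their types.
TOOL_REGISTRY = {
    "read_file": {"path": str},
    "run_tests": {"target": str, "timeout_seconds": int},
}


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict


def validate_tool_call(call: ToolCall) -> list:
    """Return a list of problems; an empty list means the call may proceed."""
    schema = TOOL_REGISTRY.get(call.name)
    if schema is None:
        # The model named a tool that does not exist: reject, never guess.
        return [f"unknown tool: {call.name!r}"]
    problems = []
    for arg, expected in schema.items():
        if arg not in call.arguments:
            problems.append(f"missing argument: {arg!r}")
        elif not isinstance(call.arguments[arg], expected):
            problems.append(f"argument {arg!r} should be {expected.__name__}")
    for arg in call.arguments:
        if arg not in schema:
            problems.append(f"unexpected argument: {arg!r}")
    return problems
```

A fabricated call such as `ToolCall("drop_database", {})` is rejected as an unknown tool instead of being passed along to an executor.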

The takeaway: Machines are subject to errors too, so trust nothing and validate everything.

2. Over-Permissioned Execution

Hallucinated actions represent an intelligence failure, but over-permissioned execution represents the resultant architectural failure. This kind of failure is significant because it’s essentially the conversion from a confused agent into a privileged insider threat.

The PocketOS story demonstrates how this error works in practice. A simple API call, a series of inferred requests, and an agent acting quickly resulted in a massive production database collapse, all because the agent had access to a Railway API token with no scope restrictions. It didn’t escalate privileges; it used the ones it was given. It didn’t expand its scope; it operated within the scope it was allowed.

Unfortunately, this pattern is not unusual. It is a dominant mode of agentic failure across incidents. Per research from CrowdStrike and Mandiant, post-incident analysis in 2025 and 2026 showed that 78% of agents involved in data breaches had much broader permission scopes than their designated function actually justified.

In many ways, this is the result of agentic tooling being a frontier technology. In fact, an ISACA analysis of identity and access management systems highlighted that traditional identity and access frameworks are underprepared for agentic AI.

Ultimately, agents require fine-grained, task-specific permissions that change dynamically based on context and mission parameters. Issuing coarse and generalized long-lived tokens is, as ISACA notes, “an open invitation to catastrophic abuse.”
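As a rough sketch of what task-specific, short-lived credentials could look like, the example below issues a token tied to one task, a small set of scopes, and an expiry time. The `ScopedToken` class, the `issue_token` helper, and scope names such as `db:read` are illustrative assumptions; a real deployment would rely on its identity provider rather than an in-process object.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass(frozen=True)
class ScopedToken:
    task_id: str
    scopes: frozenset
    expires_at: float
    value: str = field(default_factory=lambda: secrets.token_urlsafe(16))

    def allows(self, scope: str, now: Optional[float] = None) -> bool:
        """True only while the token is unexpired and the scope was granted."""
        now = time.time() if now is None else now
        return now < self.expires_at and scope in self.scopes


def issue_token(task_id: str, scopes: Iterable, ttl_seconds: float = 300,
                now: Optional[float] = None) -> ScopedToken:
    """Issue a token limited to one task, a few scopes, and a short lifetime."""
    now = time.time() if now is None else now
    return ScopedToken(task_id, frozenset(scopes), now + ttl_seconds)
```

A token issued with only `db:read` for a reporting task cannot be used to delete anything, and it stops working once the time-to-live elapses, so a leaked or misused credential has a bounded window.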

The easiest way to mitigate this is simple: limit what the agent can actually do with hard guardrails. It’s not enough to just tell the system “don’t do this.” If an agent can’t access the tool to begin with, the catastrophic action can’t happen. Instead of having an agent directly access a database, cache the actions in a separate queue, and have both a human-in-the-loop and a secondary agent review the impact of that action before it’s allowed to act. Additionally, make sure you actually understand what you’re telling the agent to do. For instance, “complete the task” is not the same as “complete the task safely.”

The takeaway: Limit what your agents can do without you involved.

3. Workflow Misalignment and Agentic Drift

So far, we’ve talked about discrete issues. But some unexpected agentic actions are more pervasive and longer-horizon. While hallucinated actions may fail loudly, workflow misalignment and agentic drift are much slower and harder to detect.

Part of the problem is that drift is not a singular failure point. It grows in impact and presence the more you use agentic solutions. Research from Abishek Rath indicates that semantic drift increases over time: the more interactions an agent accumulates, the further its behavior strays from what you originally asked. This is particularly worrisome for agentic flows that depend on long contexts. To make your agent better, you train it on more live tasks, but the more tasks you give it, the more likely it is to do something other than what you asked. Part of the problem relates to knowledge retention and the fact that context windows can be overwritten in agentic flows.

How then should we resolve this? One solution is blending agents in a mixture-of-experts council. Having an AI whose sole job is to review other AIs and their adherence to directions, especially when that AI agent is ephemeral and only launched when review is needed, can help ensure that the original agent stays on task. Additionally, make sure you time-limit your agents and start a new context. For instance, an overall manager AI can oversee the long-term, RAG resources can help constrain it, and then task-bounded ephemeral agents can help prevent long-term creep by keeping the active time limited.
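The task-bounded, ephemeral agent idea can be sketched as a small runner that gives each task a fresh context plus hard step and time budgets. This is a minimal illustration, not a real agent loop: `run_bounded`, `BudgetExceeded`, and the shape of the `step` callable are assumptions made for the example.

```python
import time
from typing import Callable, Optional


class BudgetExceeded(RuntimeError):
    """Raised when a task runs out of steps or wall-clock time."""


def run_bounded(step: Callable[[list], Optional[str]],
                max_steps: int = 20,
                max_seconds: float = 60.0) -> str:
    """Call `step` until it returns a result or one of the budgets is hit."""
    context: list = []  # fresh context for every task; nothing carries over
    deadline = time.monotonic() + max_seconds
    for _ in range(max_steps):
        if time.monotonic() > deadline:
            raise BudgetExceeded("time budget exhausted")
        result = step(context)  # one agent step; None means "not finished yet"
        if result is not None:
            return result
    raise BudgetExceeded("step budget exhausted")
```

Because the context list is created inside the function, each task starts clean, and an agent that wanders off course is stopped by the budget rather than by someone noticing hours later.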

The takeaway: Agentic flows are not ‘set and forget,’ so you must consistently review flows in situ.

4. Token Overuse and Cost Spirals

While this failure mode tends to get dismissed as an operational nuisance rather than a governance risk, this mindset is precisely why it has crept up on so many organizations in such a big way. Token costs are not just a billing problem — they’re a proxy for an agent operating outside its intended scope. In some configurations, runaway token consumption can indicate a variety of technical problems, including an agent stuck in a loop, chasing a subtask it can’t resolve, fabricating intermediate steps, and much more.

The reality is that the economics of this problem are murkier than ever. Artefact reports that a single agentic AI workflow can use 50,000 to 500,000 tokens. “Always-on coding assistants routinely process millions of tokens per developer per day,” writes Victor Coimbra. “Multi-agent orchestration frameworks like OpenClaw enable workflows where agents call other agents, each interaction compounding the token count.”

In other words, while the per-token cost has decreased, the overall token use has increased, and when you get runaway token overuse, the perception of “cheaper AI” disappears rather quickly. And this isn’t theoretical — this is observable. Deloitte launched a survey in 2025 to evaluate tech performance and costs, and found even then that AI was the fastest-growing expense in corporate tech budgets, growing from $1.2 million USD in 2024 to $7 million in 2026.

So here we have the reality of this issue: tokens may be somewhat cheaper today, but the requests we are using them for are growing more complex, meaning that runaway token use can be exponentially more damaging than ever before. The good news is that this is very much a governance problem at its base, and governance can mean forcing your AI to ask for clarification, creating a hard limit on tokens, or even fully cutting model or API access at a certain hard stop point.
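A hard token limit of the kind described here can be sketched as a small budget object that the orchestrating code updates after every model call. The class name, the 80% warning threshold, and the idea of pausing for clarification on a warning are illustrative assumptions, not features of any specific provider SDK.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a workflow crosses its hard token limit."""


class TokenBudget:
    def __init__(self, hard_limit: int, warn_ratio: float = 0.8):
        self.hard_limit = hard_limit
        self.warn_ratio = warn_ratio
        self.used = 0

    def record(self, tokens: int) -> str:
        """Add usage from one model call; return "ok" or "warn", or raise at the limit."""
        self.used += tokens
        if self.used > self.hard_limit:
            # Hard stop: the caller should cut off model or API access here.
            raise TokenBudgetExceeded(
                f"used {self.used} tokens, limit is {self.hard_limit}"
            )
        if self.used >= self.warn_ratio * self.hard_limit:
            # Soft threshold: a good moment to make the agent ask for clarification.
            return "warn"
        return "ok"
```

The warning state gives the workflow a chance to pause and check in with a human before the hard stop is reached.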

The takeaway: Controlling your AI costs is a governance problem, not just an operational nuisance, so take it as seriously as it must be taken.

5. Destructive and Irreversible Operations

Compared to the other failure modes, this one is harder to handle. Destructive or irreversible operations can arise from unclear commands, commands that escalate (such as a direction to delete one user record that ends with the AI deleting all user records), hallucinations, clear commands with unforeseen consequences, or even simple misconfigurations. Worryingly, these actions are absolute. Once the error is committed, there is no recovery.

This is as much an organizational and development problem as it is an AI issue. In many cases, the core problem here is one of technical mechanisms. The agent doesn’t always hallucinate a destructive command — it can encounter a problem that it was not asked to fix, but identify it as causal, and over-correct in its effort to fix it. There may be no mechanism by which the human operator can intervene, meaning the action is completed before the human-in-the-loop could ever course correct. In essence, this is a problem that is multifaceted, and it’s one that’s hard to blame just on the AI itself.

In other words, this is a core representation of the problem in agentic systems. The gap between what you said or asked for and what was actually enforced is often where the most harm occurs. While you could make the argument that the systems would have worked better had they been constrained, the problem is not ultimately the tools it had at its disposal — it’s the lack of clarity in instruction, form, and function.

That being said, there are two core fixes here. The first is architectural: limit what your tools can do by design. The blast radius of a failure mode is proportional to the number of tools that can be called. Limit how much destructive behavior the AI can engage in, and you necessarily reduce the overall damage it can do.

The second fix, however, is much fuzzier, but perhaps more impactful. The fix here is one of clarity. Insert system prompts that require clarification, institute a multi-step approval process, or even call additional AIs as a gatekeeper. Put in place barriers beyond the technical that ask for clarification and ensure that what is happening is what you actually requested. Anything short of absolute clarity is going to introduce issues, and even small issues paired with any of the other categories discussed herein could be disastrous.

The takeaway: Hard stops and precise commands are necessary to prevent agentic AI from causing irreversible damage.

How Access Control Must Evolve

In many ways, the core issue here is that our current development attitudes and access control processes haven’t caught up yet with the agentic reality we’re currently operating in.

A lot of orgs treat agentic solutions either as an extension of the user or as a standalone tool, and neither approach really works. If you treat the AI as an extension of the user, you invite over-permissive tool calls, resource manipulation, or improperly formed actions. If you treat it solely as a tool, you limit what it can actually do and artificially reduce the overall effectiveness of your implementation. General role-based access control (RBAC) is insufficient here, because the same agent might legitimately need database read access for one task but should be denied that access, or write access, in another.

Other options like attribute-based access control (ABAC) and policy-based access control (PBAC) give you a bit more flexibility. But the real play is in modern access modalities built around agentic flows. Solutions like the Open Policy Agent (OPA) provide more flexibility, and when served alongside just-in-time credentialing, you can limit overall exposure both immediately and over a longer time horizon.

When it comes to agents, real-time decision making must replace static policy. The core problem with governance frameworks that define permissions is that agents operate in constantly changing conditions. A contextual authorization engine that can evaluate a policy at the moment of need and execution can account for factors that weren’t known at deployment, allowing for rapid contextual decision making that can prevent many of the issues we’ve discussed.
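To illustrate the difference from a static permission list, the sketch below evaluates each request at execution time against the agent's current task, the target environment, and whether a human has approved the action. The task names, action strings, and rules are hypothetical, and the code is plain Python rather than the API of OPA or any other policy engine; in practice this decision would likely be delegated to such an engine.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical mapping of tasks to the actions they legitimately need.
TASK_ACTIONS = {
    "generate_report": {"db:read"},
    "run_migration": {"db:read", "db:write"},
}


@dataclass(frozen=True)
class RequestContext:
    task: str                    # what the agent was asked to do
    action: str                  # e.g. "db:read", "db:write", "db:delete"
    environment: str             # e.g. "staging" or "production"
    human_approved: bool = False


def authorize(ctx: RequestContext) -> Tuple[bool, str]:
    """Decide at call time, denying by default."""
    allowed = TASK_ACTIONS.get(ctx.task, set())
    if ctx.action not in allowed:
        return False, "action is not needed for this task"
    needs_approval = ctx.action != "db:read" and ctx.environment == "production"
    if needs_approval and not ctx.human_approved:
        return False, "production changes require human approval"
    return True, "allowed"
```

The same agent gets a different answer depending on context: a write during `run_migration` passes in staging, is held in production until a human approves it, and is refused outright during `generate_report`.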

And finally, human-in-the-loop must stop being thought of as an additional feature and must be embedded as a hard system constraint. Most of the incidents highlighted above could be sufficiently managed if a human were aware of what was going on. Telling an agent to “never run destructive commands without consulting a human” is useless if that agent then hallucinates or imagines permission was granted.

Instead, the system itself should enforce the constraint. Caching actions and then requiring a human to approve them before they are forwarded creates a hard risk gate that can filter out the most damaging of these issues. And this filtering doesn’t have to apply to everything. Agentic summaries can be automatic, but agentic access to the database can be gated. Forwarding non-sensitive queries can be automatic, but queries that hit sensitive databases can be held pending human approval. All of this creates a mixed data flow in which commonly automated tasks stay automated, while significant threats are gated, controlled, and prevented.
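The mixed flow described above can be sketched as a gate that runs non-sensitive actions immediately and holds sensitive ones in a queue until a human approves or rejects them. The `ApprovalGate` class and the contents of `SENSITIVE_ACTIONS` are assumptions made for illustration; the key property is that the agent never holds a reference that lets it bypass the queue.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical set of actions that must never run without human approval.
SENSITIVE_ACTIONS = {"db:write", "db:delete"}


@dataclass
class PendingAction:
    action: str
    run: Callable[[], str]


@dataclass
class ApprovalGate:
    pending: list = field(default_factory=list)

    def submit(self, action: str, run: Callable[[], str]) -> Optional[str]:
        """Run safe actions right away; hold sensitive ones and return None."""
        if action in SENSITIVE_ACTIONS:
            self.pending.append(PendingAction(action, run))
            return None
        return run()

    def approve(self, index: int) -> str:
        """Called by a human reviewer: execute and remove the held action."""
        return self.pending.pop(index).run()

    def reject(self, index: int) -> None:
        """Called by a human reviewer: drop the held action without running it."""
        self.pending.pop(index)
```

Summaries and other low-risk calls pass straight through `submit`, while a delete sits in `pending` until someone calls `approve` or `reject`.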

Unpredictability as a Design Parameter

What all of this points to is that we need to rethink unpredictability. Today, we often treat it as something to handle after the fact, with mechanisms such as interrupt handling or session caching. In reality, unpredictability is not just what results when regular functions operate in a network; it may already exist when those functions are first built. Instead of thinking of unpredictability as a variable that appears after the design is implemented, we must think of it as a core variable in the design itself.

That requires a significant rethinking of how we handle everything from access control to security implementation, and while there is no definitive solution yet, many implementations are trying to tackle this problem. This new frontier represents what access control must become in the agentic era: a fundamental shift away from a static list of permissions towards a real-time, context-aware, human-confirmable boundary that moves and flexes with the agent as the agent does its work.

AI Summary

This article explains how agentic AI can act unpredictably and why AI agents require stronger access control, runtime governance, and human-confirmable system design.

  • Agentic AI failures can arise from hallucinated tool calls, fabricated functions, workflow drift, token overuse, and destructive operations that exceed the user’s original intent.
  • Over-permissioned execution turns a confused AI agent into a privileged insider threat, especially when agents receive broad tokens or unrestricted API access.
  • Agentic workflows can drift over time as long chains of tool calls, overwritten context windows, and task ambiguity compound into unexpected behavior.
  • Runaway token consumption is not only a cost issue — it can signal that an agent is stuck in a loop, pursuing irrelevant subtasks, or operating outside its intended scope.
  • Effective safeguards include fine-grained permissions, contextual authorization, hard approval gates, task-bounded agents, and human-in-the-loop controls for sensitive operations.

Intended for API architects, security leaders, platform engineers, and developers designing safer agentic AI systems.
