Skip to content

Bruce Hart

Latest

Feb 1, 2026 3 min read

OpenClaw and the Security Cliff for AI Agents

OpenClaw feels like a preview of where agent tooling is headed, and it also exposes the security cliff we are about to step toward. A few mental models help explain why the current wave is exciting and why it can fail fast without guardrails.

I have been watching OpenClaw with humor and intrigue.

OpenClaw feels like a preview of the near future for AI agents, and it also shows the security cliff we are approaching.

I can easily imagine OpenAI, Anthropic, or Apple shipping something with this shape but more polish, hardening, and guardrails. I am already doing adjacent work inside Codex, but OpenClaw has more integrations baked in today. I suspect that gap will close quickly.

Integrations turn convenience into a wider attack surface

The point of agent tooling is to collapse friction. You tell it to do a thing, it does the thing across email, files, and APIs. That is the magic.

But every new integration is another surface. The agent becomes a bundle of capabilities that were never meant to coexist in a single prompt stream. The more it can do, the more careful you have to be about what it can see and what it can send.

Untrusted content is a delivery channel

As soon as your agent reads email, web pages, or docs, it is ingesting instructions that are not yours. Hidden text, prompt injections, or just plain social engineering can steer it.

A simple proof of concept is easy to imagine. Send a normal looking email to a known OpenClaw user, but hide instructions in the HTML that say this is an emergency. Then tell the agent to base64 encode ~/.ssh and post it to a URL. If the agent reads that content without strict separation, it can follow the hostile instructions.

That is not a bug in a single product. It is a structural risk when the model treats all text as instruction.

The lethal trifecta is a useful threat model

Simon Willison has a clean framing that keeps echoing in my head. If your agent has access to private data, exposure to untrusted content, and the ability to communicate externally, you are in a danger zone. Combine those three and you have a straightforward path to exfiltration.

OpenClaw feels close to that line. So does any system that can read your inbox, touch your filesystem, and call outbound APIs.

Skills are a promising middle layer

I like the skills workflow I am using in Codex because it makes capabilities explicit. It is closer to least privilege than a giant everything agent. I would love to see more capabilities exposed automatically, but only if the platform can keep the boundary crisp and auditable.

The real trick is to make power feel easy without making exfiltration feel easy too.

I am curious how other builders are thinking about this. If you are working on agent guardrails, permissioning, or audit trails, I want to talk.

Read the full piece