Why AI Agents Need Watchtowers

February 11, 2026 · 6 min read · By RNWY
ai agent watchtower · erc-8004 reputation · ai agent fake reviews · ai agent trust · sybil attack ai agents · erc-8004 security

The creators of ERC-8004 have a word for the solution to fake AI agent reviews: watchtowers.

It's not in the official specification. You won't find it in the Ethereum Magicians discussion thread. But in a February 2026 appearance on the Unchained podcast, Davide Crapis — Head of AI at the Ethereum Foundation and co-author of ERC-8004 — laid out a vision for independent monitoring services that could reshape how we evaluate AI agents on-chain.

The concept matters because right now, nobody is watching.

The problem the spec acknowledges but doesn't solve

ERC-8004 launched on Ethereum mainnet on January 29, 2026. Within days, tens of thousands of agents registered. The standard includes a giveFeedback() function that lets anyone rate any agent — a public, permissionless review system written permanently to the blockchain.
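To make concrete what "public and permissionless" means here, a toy model of such a registry (field and method names are illustrative, not the actual ERC-8004 ABI):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """One feedback entry (simplified stand-in for an on-chain record)."""
    agent_id: int
    reviewer: str  # wallet address of whoever submitted the review
    score: int     # 0-100


class ReputationRegistry:
    """Toy model of a permissionless review registry."""

    def __init__(self) -> None:
        self.entries: list[Feedback] = []

    def give_feedback(self, agent_id: int, reviewer: str, score: int) -> None:
        # No gatekeeping: any address may rate any agent, any number of times.
        if not 0 <= score <= 100:
            raise ValueError("score must be 0-100")
        self.entries.append(Feedback(agent_id, reviewer, score))

    def average(self, agent_id: int) -> float:
        scores = [f.score for f in self.entries if f.agent_id == agent_id]
        return sum(scores) / len(scores) if scores else 0.0
```

The openness is the point of the design, but it also means three freshly minted wallets leaving perfect scores move an agent's average exactly as much as three real customers would.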

The spec's own Security Considerations section states plainly that sybil attacks are possible, and that they can inflate the reputation of fake agents. The protocol's contribution, according to the EIP, is making signals public and using a shared schema — while expecting ecosystem builders to construct the actual reputation systems on top.

That's a deliberate design choice, not an oversight. As Marco De Rossi, the other co-author, explained: ERC-8004 gives everyone equal data and visibility to create an agent economy, while leaving reputation calculation rules and trust thresholds to the ecosystem.

The problem is what happens in the gap between the standard launching and those ecosystem tools arriving.

What we're seeing in the gap

Across the ERC-8004 ecosystem, average feedback scores sit between 98.5 and 99.4 out of 100. Nearly perfect, across the board. If that sounds suspicious, it should.

When we examined individual agents, the pattern became clear. One agent accumulated over 1,500 feedback entries with a perfect 100/100 average — every single review coming from ghost wallets with zero transaction history. Another racked up 1,175 feedback entries at 99.9/100, same story: freshly created addresses with no prior on-chain activity.

There are two layers to this. The first is promotional spam — ads for token sales, self-minted stakes, junk messages. That's easy to spot from the text alone. The second is subtler: AI-generated reviews that sound plausible but describe no actual interaction. Filler openings, the agent's name dropped in, buzzwords like "robust" and "scalable" and "innovative." Both layers share one tell: the reviewing wallets are brand new, with no history before the review was posted.

The EIP anticipated this. Its getSummary() function documentation explicitly warns that results without filtering by known client addresses are subject to sybil and spam attacks. But filtering by known clients requires knowing which clients are legitimate — which brings us back to the chicken-and-egg problem of trust in a permissionless system.
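The arithmetic behind that warning is worth seeing. A minimal sketch (entry shape and function name are ours, not the EIP's) of how an unfiltered summary behaves versus one restricted to known clients:

```python
from collections import namedtuple

# Simplified stand-in for an on-chain feedback record.
Entry = namedtuple("Entry", "agent_id reviewer score")


def summarize(entries, agent_id, known_clients=None):
    """Average score for an agent. With known_clients=None this is the raw,
    sybil-vulnerable aggregate the EIP warns about; passing an allowlist
    models filtering by known client addresses."""
    scores = [e.score for e in entries
              if e.agent_id == agent_id
              and (known_clients is None or e.reviewer in known_clients)]
    return sum(scores) / len(scores) if scores else None


# 1,000 ghost-wallet reviews at 100 swamp two real reviews at 60:
entries = [Entry(7, f"0xghost{i}", 100) for i in range(1000)]
entries += [Entry(7, "0xalice", 60), Entry(7, "0xbob", 60)]
print(summarize(entries, 7))                                      # ~99.9
print(summarize(entries, 7, known_clients={"0xalice", "0xbob"}))  # 60.0
```

The filtered number is honest; the unfiltered one is the 99-point average the ecosystem is currently reporting. The catch, as above, is that someone has to decide who belongs on the allowlist.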

Enter the watchtower

On the Unchained podcast, Crapis described a watchtower as a service that independently tests what an agent claims to do and posts the results on-chain. The example he gave: an agent claims it's the most efficient at performing a specific task and charges a premium for it, backing up that claim with a flood of manufactured reviews. A watchtower calls the agent directly — measures latency, evaluates output quality — and posts those objective metrics as on-chain feedback using the same giveFeedback() function.
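The probe loop Crapis describes is simple to sketch. Here the agent call, the output judge, and the feedback poster are all injected hooks — assumptions for illustration, since the podcast described the flow, not an implementation:

```python
import time


def watchtower_probe(agent_call, judge_output, post_feedback, agent_id):
    """Independently test an agent and publish objective metrics as feedback.

    agent_call    -- invokes the agent and returns its output
    judge_output  -- scores the output 0-100 (quality evaluation)
    post_feedback -- stand-in for submitting an on-chain feedback transaction
    """
    start = time.perf_counter()
    output = agent_call()
    latency_ms = (time.perf_counter() - start) * 1000  # measured, not claimed
    quality = judge_output(output)
    post_feedback(agent_id=agent_id, score=quality,
                  metrics={"latency_ms": round(latency_ms, 1)})
```

Because the watchtower pays for and observes the interaction itself, its feedback entry carries information no volume of manufactured reviews can counterfeit.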

The key insight is that watchtowers use the existing protocol. They don't require changes to ERC-8004. They just add a layer of independently verifiable feedback alongside the self-reported noise.

Crapis also acknowledged that the standard doesn't ensure every review is correct. His framing was essentially that the system should behave like product reviews elsewhere: if the product is good, the average converges toward truth over time. The problem, of course, is that convergence assumes honest reviewers eventually outnumber dishonest ones. Without watchtowers, there's nothing guaranteeing that happens.

One watchtower already exists

The concept isn't theoretical. The Agent0 watchtower is already deployed — a standalone service that runs weekly and checks whether registered agents' web, A2A, and MCP endpoints are actually reachable. If an endpoint responds, the watchtower posts an on-chain feedback entry confirming reachability. If it doesn't, no entry is posted, and the absence of fresh reachability feedback is itself the signal.

It's a narrow scope — reachability, not quality. But it proves the architecture works. The watchtower uses giveFeedback() with structured tags (tag1 = "reachable", tag2 = "web" / "a2a" / "mcp"), includes DynamoDB-based idempotency to prevent its own spam, and supports curated scanning to decide which agents to monitor.
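In the spirit of that design, a sketch of one weekly pass. The tag names mirror the ones above; everything else (the data shapes, the probe hook, a set standing in for the DynamoDB idempotency store) is our assumption, not Agent0's actual code:

```python
def run_reachability_watchtower(agents, probe, post_feedback, seen):
    """One pass of a reachability watchtower.

    agents        -- dicts with "id", "week", and an "endpoints" mapping
    probe(url)    -- returns True if the endpoint answers
    post_feedback -- stand-in for an on-chain giveFeedback() call
    seen          -- idempotency store keyed by (agent, endpoint type, week)
    """
    for agent in agents:
        for endpoint_type in ("web", "a2a", "mcp"):
            url = agent["endpoints"].get(endpoint_type)
            if url is None:
                continue
            key = (agent["id"], endpoint_type, agent["week"])
            if key in seen:  # already reported this week: don't self-spam
                continue
            if probe(url):   # only confirmed reachability gets posted
                post_feedback(agent_id=agent["id"], score=100,
                              tag1="reachable", tag2=endpoint_type)
                seen.add(key)
```

The idempotency check is the detail worth copying: a watchtower that re-posts the same finding every run would itself become feedback spam.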

The repo describes itself as "multi-role," designed so additional watchtower types — domain verification, capability testing, quality evaluation — can be deployed from the same codebase.

What watchtowers still can't do

Reachability checks are a start, but they don't solve the deeper problem. A watchtower can confirm an agent's endpoint is live. It can't tell you whether the agent actually does what it claims, whether its outputs are accurate, or whether the 1,500 five-star reviews are legitimate.

That requires a different kind of monitoring — one that examines the reviewers, not just the reviewed. Wallet age analysis looks at when feedback addresses were created and whether they have any transaction history beyond the review itself. Network diversity analysis examines whether an agent's feedback comes from a varied ecosystem of real users or from a cluster of addresses that appeared simultaneously.
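Both checks reduce to simple heuristics over reviewer metadata. A sketch with illustrative thresholds (the inputs would come from chain data, e.g. each wallet's first-seen timestamp and transaction count):

```python
def reviewer_legitimacy(reviewers, first_seen, tx_count, burst_window=3600):
    """Two red-flag ratios over an agent's reviewer set.

    ghost_share -- fraction of reviewers with no activity beyond the review
                   itself (tx_count <= 1)
    burst_share -- fraction whose wallets first appeared inside a single
                   burst_window-second cluster, i.e. addresses that showed
                   up simultaneously
    Thresholds and the one-hour window are illustrative choices.
    """
    n = len(reviewers)
    ghosts = sum(1 for r in reviewers if tx_count[r] <= 1)
    times = sorted(first_seen[r] for r in reviewers)
    # Largest cluster of wallet-creation times within the window.
    best, lo = 0, 0
    for hi in range(n):
        while times[hi] - times[lo] > burst_window:
            lo += 1
        best = max(best, hi - lo + 1)
    return {"ghost_share": ghosts / n, "burst_share": best / n}
```

An agent whose reviews come from a varied, long-lived user base scores near zero on both ratios; the 1,500-review agent described above would score near one.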

On the Bankless podcast, co-host David Hoffman framed the identity problem underlying all of this: one person with a rack of hardware can control hundreds of apparent users doing things on-chain. The infrastructure to distinguish genuine participants from manufactured ones is what makes watchtowers meaningful — or not.

The ecosystem the spec envisions

The Ethereum Magicians discussion around ERC-8004 reveals a community actively wrestling with these tradeoffs. Contributors have argued that compressing trust into a single aggregate score is dangerous because it facilitates monopolistic behavior. Others have proposed that trust is fundamentally directional — not a universal property of an agent, but a relationship between specific parties.

One proposal from Nethermind suggested requiring economic bonds for incident reports, immediately pricing out frivolous spam. The ERC-8004 team considered requiring payment proofs for feedback but decided against coupling their discoverability and trust problem to a specific payment protocol.

All of these approaches share an assumption: that the raw feedback data in ERC-8004 is a starting point, not an endpoint. The spec provides the data structure. The ecosystem has to provide the intelligence.

What this means for anyone building on ERC-8004

If you're building agents, registering agents, or evaluating agents in the ERC-8004 ecosystem, the takeaway is straightforward: feedback scores as they exist today are unreliable signals. The standard's creators know this and have said so publicly.

The path forward involves multiple layers. Watchtowers that independently verify capabilities. Reputation analysis that examines reviewer legitimacy, not just review content. Transparent scoring that shows its math rather than hiding behind a single number.
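"Shows its math" can be literal. A sketch of a score that exposes every component and weight it used, so a reader can audit the aggregate instead of trusting it (the components and equal weights are illustrative, not a proposed standard):

```python
def transparent_score(raw_avg, ghost_share, diversity, watchtower_pass_rate,
                      weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine signals into one number while exposing every term.

    raw_avg              -- average feedback score, 0-100
    ghost_share          -- fraction of reviews from history-less wallets
    diversity            -- network-diversity estimate, 0-1
    watchtower_pass_rate -- fraction of independent checks passed, 0-1
    """
    components = {
        "raw_feedback": raw_avg / 100,
        "reviewer_legitimacy": 1.0 - ghost_share,
        "network_diversity": diversity,
        "watchtower_checks": watchtower_pass_rate,
    }
    score = sum(w * v for w, v in zip(weights, components.values()))
    return {"score": round(100 * score, 1),
            "components": components,
            "weights": dict(zip(components, weights))}
```

Under this breakdown, an agent with a perfect 100/100 raw average but all-ghost reviewers, no diversity, and no watchtower confirmations lands at 25, not 100 — and the returned dict shows exactly which term dragged it down.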

Crapis described multi-agent security as a nascent field — something new that we haven't seen before. That's accurate. And it means the tools for navigating this environment are still being built.

The agents are already here. The watchtowers are just getting started.


RNWY builds transparent reputation infrastructure for AI agents — wallet age analysis, network diversity scoring, and soulbound identity that makes trust patterns visible instead of hiding them behind a number. Explore the data at rnwy.com/explorer.