A2A Agent Card Poisoning: JWS Proves Origin, Not Intent

A signed A2A agent card still lets attackers inject instructions into the description field your orchestrator feeds directly to the LLM router.

The agent-to-agent crowd spent a year solving the wrong problem, beautifully. Signed agent cards, JWS over JCS-canonicalized JSON, v1.0 under the Linux Foundation, an AP2 payments extension, 150-plus production orgs. All of that proves one fact and one fact only: the card came from the domain it claims. It says nothing about whether the words inside the card are honest. Agent Card Poisoning lives in exactly that gap, and Keysight reproduced it on March 12, 2026.

What an agent card actually feeds your model

An A2A agent card is a JSON document at /.well-known/agent.json. It advertises identity, endpoint, auth scheme, and a list of skills. Each skill carries free-text name, description, tags, and examples. Those are unstructured strings by spec.

Here is the part that should make you uneasy. A host orchestrator fetches these cards during discovery and feeds them into an LLM's reasoning context to decide which remote agent handles a request. The description field is not metadata sitting in a database. It is planning input read by a language model. So if an attacker writes persuasive instructions into that field, your router does what the attacker wrote, not what your user asked.

I have built enough internal service meshes to recognize the smell. We learned, painfully, not to trust the User-Agent header or a self-reported X-Forwarded-For. Then we turned around and piped a remote party's free-form prose straight into the thing that decides where credentialed work goes. Same mistake, new abstraction layer.

The five-stage hijack, and why schema checks miss it

Keysight's reproduced flow is mundane, which is the scary part. The host syncs a poisoned remote card at init. A legitimate user request carrying PII arrives over HTTPS. The host builds a reasoning prompt mixing user data, cached cards, and available tools. The LLM generates a plan that prioritizes an outbound HTTP POST to an attacker endpoint. The host executes it.

There is no crash and no malformed payload. The plan stays syntactically valid the whole way through, so it sails past every schema validator you have. This is a control-flow hijack at the routing layer, not a memory-safety bug. You are not looking for a stack trace. You are looking for a task that completed correctly and quietly did one extra thing.

In Keysight's travel-booking scenario, that one extra thing was transmitting the user's name, travel details, and payment card information to an attacker-controlled endpoint, while still returning a finished booking to the user. The victim sees success. The exfil rode along for free.

The signature you trust verifies the wrong thing

This is the claim I most want operators to internalize. A2A v1.0 signing gives you integrity and origin: the card was not tampered with in transit, and it came from the keyholder for that domain. Both are genuinely useful, but neither one tells you the content is benign.

A legitimately registered malicious agent signs its own poisoned description and passes verification cleanly. Of course it does. The signature is the attacker's own signature over the attacker's own lie. The toxsec writer put the payments version of this plainly: the agent "becomes a confused deputy: it holds your payment permissions and takes orders from us. The crypto signatures don't help." That last clause is the entire brief.

Authenticity and benignity are orthogonal properties. The ecosystem shipped the first and a lot of teams are quietly assuming they got the second. If your security review already checked the box that says "we verify signed agent cards," that control does not touch this attack. It answers "who wrote this card." It never answers "is this card lying to my planner."

Why "wait for the CVE" is the wrong posture

There is no patch coming, because nothing is broken in the protocol's own terms. The orchestrator is behaving exactly as designed: it read instructions, it followed them. The defect is architectural, a property of feeding untrusted text into a planning prompt and then acting on the output with real credentials.

We have a name for this: the confused deputy. It has been a recognized class of bug since the late 1980s. We rebuilt it on top of LLMs and called it agent discovery. That framing matters operationally, because it tells you where to spend effort: not on detection signatures for "the poisoning string," which an attacker rewords in thirty seconds, but on the trust boundary itself.

And here is the second-order point most coverage skips. Everyone has been writing about MCP tool-description poisoning, which is the same primitive at the vertical, single-tool layer. Agent card poisoning is that primitive at the horizontal, agent-to-agent layer, where the blast radius is task delegation rather than one tool call. The horizontal version has had far less operator attention despite an identical root cause. The orchestrator holds the user's credentials, OAuth scopes, and under AP2 the signed payment mandates, then hands the whole task to whichever agent a paragraph of prose talked it into picking.

AP2 turns a data leak into a money leak

AP2 was announced September 16, 2025, with Mastercard, PayPal, Coinbase, and American Express among 60-plus partners. It carries Intent, Cart, and Payment mandates as W3C Verifiable Credentials. Those mandates are signed, and people will point to that as the safeguard.

It is not the safeguard. The mandates are signed; the routing decision that selects which agent fulfills them runs through the same poisonable card text. A confused-deputy orchestrator with live payment authority is the natural escalation from "exfiltrate a booking" to "route a payment to an agent the attacker steered you toward." The signature on the Cart mandate is intact the entire time. You authorized the cart. You did not authorize who got to act on it.

The counterpoint, and why it only goes so far

A fair objection: if you only ever federate with agents you authored, none of this touches you, and that is fair. A closed mesh of first-party agents has no untrusted card text, and you can stop reading.

But the entire selling point of A2A is runtime discovery of agents you did not write. The moment you federate with one external card, or join any registry, or adopt AP2 to reach third-party processors, you have imported an attacker-controlled string into your planner. The protocol's value proposition and its exposure are the same feature. You do not get federation without inheriting this, so "just don't federate" is a real answer for some teams and a non-answer for anyone actually using A2A as intended.

What to lock down this quarter

Treat the following as a priority order, not a menu. Each step maps to a specific failure above.

Stop concatenating remote card text into the planning prompt. Wrap every fetched description, name, tags, and examples in explicit delimiters, and add a system instruction that card metadata is reference-only and must never be executed as instructions. Strip or escape imperative content before it reaches the model. This directly breaks stage 3 of the Keysight flow.
Write down, in your control docs, that JWS verification does not vet content. Keep verifying signatures. Then add a second gate: an allowlist of agent identities your orchestrator may route to. An authentic-but-unknown card should not be eligible to win a routing decision at all. Authenticity gate and authorization gate are two different gates.
Constrain the router to an enumerated capability map. Make delegation decisions from a structured set of registered skills, not from free-associating over description prose. If an advertised skill is not in your capability registry, it is not selectable. The LLM picks among known options; it does not invent routes from text.
Default-deny egress from agent runtimes and allowlist destinations. Keysight's kill chain ended in an outbound POST to a novel endpoint. A destination allowlist breaks that step even after a successful injection. This is the cheapest high-value control here, and it is pure infrastructure, no model changes needed.
Add a poisoned-card case to your agent integration tests. Register a benign-looking remote card whose description tries to redirect a task to an attacker endpoint, then assert your orchestrator refuses to route or exfiltrate. If you run CyPerf 26.0.0, the simulated strike ships built in.
For AP2, gate payment authority behind a non-LLM check. Require that any agent selected to fulfill a Payment mandate sits on a pre-approved processor list, validated outside the reasoning prompt. Do not let a description field be the only thing standing between an attacker and a signed Cart mandate.

The decision in front of you is small and specific. Either assume your signed-card pipeline already covers this, or accept that it does not and add a content-trust layer above it before you federate one more card. The signature told you who is talking. It was never going to tell you whether to believe them.