MCP RCE Lives in the SDK, and No Patch Is Coming

A remote code execution flaw sits in Anthropic's official MCP SDKs across 150M downloads. It's stdio command execution by design, and Anthropic calls it expected.

We spent the last year telling teams to vet the MCP tools they install. The actual hole is one floor down, in the official SDK nobody thought to question. On April 15, 2026, OX Security published "The Mother of All AI Supply Chains," and the finding is not a bug in some sketchy community server. It is an architectural property of Anthropic's Model Context Protocol SDKs, present in Python, TypeScript, Java, and Rust at the same time. More than 150 million downloads. Roughly 7,000 servers reachable from the public internet. An estimated ceiling near 200,000 vulnerable instances. And no fix coming from the top, because Anthropic says the behavior is working as intended.

The receipt is not the block

Here is the mechanism, because the mechanism is the part operators keep getting wrong. MCP's stdio transport launches a server by spawning a subprocess from a command field in the config. The SDK runs that command unconditionally. It does not first check that the thing it just launched is even an MCP-compatible server. OX's note says it directly: the logic "does not verify that the specified command is an MCP-compatible server, does not sanitize or restrict the command syntax."

That is execute-first validation: the SDK runs your command, then looks at whether the result spoke the protocol. Feed it something that is not a server and you get an error handed back. But the command already ran. The error is your receipt, not your blocker. By the time the caller sees a failure, whatever you put in that field has executed on the host.
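
A minimal sketch of that ordering, in Python. The connect_stdio function is a hypothetical stand-in, not the SDK's code, and the one-line "handshake" is a simplification rather than the real MCP initialize exchange. The payload only writes a marker file, so the side effect is harmless and easy to see.

```python
import os
import subprocess
import sys
import tempfile


def connect_stdio(command: list[str]) -> None:
    """Hypothetical stand-in for a stdio launch: spawn first, validate second."""
    # The command runs here, before anything checks what it is.
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    # Simplified handshake (an assumption, not the real MCP exchange).
    request = b'{"jsonrpc": "2.0", "id": 1, "method": "initialize"}\n'
    stdout, _ = proc.communicate(input=request, timeout=30)
    if b'"jsonrpc"' not in stdout:
        raise RuntimeError("command did not behave like an MCP server")


# A command that is not a server: it writes a marker file and exits.
marker = os.path.join(tempfile.mkdtemp(), "side_effect.txt")
payload = [sys.executable, "-c", f"open({marker!r}, 'w').write('ran')"]

try:
    connect_stdio(payload)
    rejected = False
except RuntimeError:
    rejected = True

# The caller got an error, and the marker file exists anyway.
side_effect_happened = os.path.exists(marker)
print(f"rejected={rejected} side_effect_happened={side_effect_happened}")
```

The caller sees the RuntimeError, and the file is on disk regardless. Swap the marker write for anything else and the same ordering applies.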

That single design choice is why this isn't a CVE you patch and forget. It's the default doing exactly what it was built to do.

Why the trust boundary you drew is backwards

Most teams stood up MCP last year on a sensible-sounding split: the Anthropic-maintained SDK is the trusted base, the third-party tools bolted on top are the risk surface. Watch the tools, trust the floor. That model is inverted. The flaw lives in the shared floor, so it flowed straight into the frameworks nearly everyone runs on top of it. LiteLLM. LangChain. IBM's Langflow. Flowise. The IDE-side integrations in Windsurf and Cursor.

OX's disclosure produced at least ten CVEs rated High or Critical. The ones with your name on them, by inventory:

CVE-2026-30623 hit LiteLLM (patched).
CVE-2026-40933 is a Flowise hardening bypass rated CVSS 10.0.
CVE-2026-30615 covered Windsurf.
CVE-2026-33224 hit Bisheng (patched).
CVE-2025-49596, the MCP Inspector auth bypass, compounds all of it anywhere Inspector is exposed.

One protocol-level allowlist in the SDKs would have closed every language binding at once. That fix was available to the people who own the protocol. They declined to ship it.
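
For a sense of how small that gate is, here is a sketch in Python of checking the command before anything is spawned. It is not the SDK's code or LiteLLM's patch: the names check_command, spawn_stdio, and CommandRejected are hypothetical, and the allowed set is the launcher list from LiteLLM's fix. Matching the bare name exactly, rather than a resolved path, is an assumption made here for strictness.

```python
import subprocess

# Launcher names taken from the LiteLLM allowlist described in this article.
ALLOWED_LAUNCHERS = frozenset(
    {"npx", "uvx", "python", "python3", "node", "docker", "deno"}
)


class CommandRejected(ValueError):
    """Raised when a stdio command is not an allowed launcher."""


def check_command(command: str) -> str:
    # Exact match on the bare name: paths, spaces, and shell syntax all fail.
    if command not in ALLOWED_LAUNCHERS:
        raise CommandRejected(f"stdio command not allowed: {command!r}")
    return command


def spawn_stdio(command: str, args: list[str]) -> subprocess.Popen:
    # Validate first. On rejection, Popen is never reached.
    check_command(command)
    return subprocess.Popen(
        [command, *args], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
```

The ordering is the whole point: a rejected command raises before any process exists. The check covers only the launcher name, so arguments still deserve their own scrutiny.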

Two populations, two very different nights

The exposure isn't one blob. It splits cleanly, and the split decides how bad your week gets.

The first population is the roughly 7,000 servers answering on the public internet with an unauthenticated config path. If an attacker can reach the endpoint that sets the command field, they pick the command that runs. That is unauthenticated RCE with no qualifiers. This is the population you triage tonight.

The second is larger and quieter: the internal fleet. The config endpoint sits behind auth, but the auth is thin, the role scoping is loose, or the developer's own agent can be prompt-injected into writing a malicious server config it then loads. That last path is the one people underrate. You don't need a network attacker if a poisoned document convinces the agent to register a server whose command is a reverse shell.

LiteLLM was refreshingly honest here. Its advisory states the case "was not exploitable by unauthenticated users." And then it shipped a command allowlist anyway, because an authenticated user with MCP permissions running arbitrary OS commands on the proxy host is its own breach. That is the right instinct. "You needed a login" is not a security boundary when the login grants command execution as a feature.

The counterargument, and why it doesn't survive production

Anthropic's position deserves a fair hearing, not a strawman. stdio command execution is the point of stdio. The transport exists to launch local processes; sanitizing what you feed it is the integrator's job, not the protocol's. For a single-user tool running on a laptop, spawning local binaries the user already controls, that is genuinely fine. I'd argue the same in their seat.

It falls apart the instant the config endpoint becomes network-reachable or writable by an untrusted prompt. Which describes most production deployments I've seen. A proxy serving a team, an IDE integration parsing remote content, an agent platform taking config over an API: all of them turn "the integrator sanitizes input" into "200,000 integrators each independently remembered to sanitize input." They didn't. The 200,000-instance estimate is the price of treating "expected behavior" and "safe default" as the same sentence, when they are not. Expected behavior is a statement about the code. A safe default is a statement about what happens when nobody's paying attention, and here nobody was.

The NSA landed in the same place from the policy side. Its Cybersecurity Information Sheet on MCP, version 1.0, dated May 2026 and released June 2, 2026, describes the protocol as having shipped flexible and underspecified, with adoption that outran the security model. Correct, and generic. What follows is the engineering version.

What to ship this week

Treat this as a configuration-hardening project you own outright, not a patch you wait on. Order matters: inventory first, then public exposure, then the allowlist, then roles, then downstream patches, then detection. Each step names the signal, not just the goal.

Find every stdio server you run. Grep your MCP configs and your proxy's database for stdio entries and dump the command field on each: grep -rEi '"?command"?\s*[:=]' <config-dir>, plus the equivalent query against the server table. Any command value that isn't a known-good launcher is your finding. Inventory before you remediate.
Close public exposure before any other fix. A server that answers on a public interface with an open config path is what separates the 7,000 from the rest of the fleet. Bind to loopback or move it behind authenticated ingress, then confirm with ss -ltnp that nothing MCP-related is listening on a wildcard address (0.0.0.0, [::], or *). Do this even before the allowlist, because an open config endpoint is unauthenticated RCE and the allowlist is defense in depth on top of it.
Enforce the allowlist LiteLLM already validated. Restrict stdio commands to exactly npx, uvx, python, python3, node, docker, deno at request-parse time, reject everything else, and re-validate servers loaded from old DB records so a pre-existing bad row can't slip through on reload. If you run LiteLLM directly, the trigger is binary: anything below v1.83.7-stable gets upgraded today. The reference fix shipped in v1.83.6-nightly on April 15, 2026 and stabilized in v1.83.7-stable. Copy it; don't reinvent it.
Lock config and preview endpoints to an admin role. Any path that creates or test-runs a stdio server should require a PROXY_ADMIN-equivalent role, never a general user token. LiteLLM put its preview test endpoints behind PROXY_ADMIN only, so match that scoping.
Patch the named downstream CVEs by inventory, not by vibe. If your bill of materials includes Flowise, Windsurf, Bisheng, or MCP Inspector, map them to CVE-2026-40933, CVE-2026-30615, CVE-2026-33224, and CVE-2025-49596 respectively, and confirm the fixed version is actually deployed, not just released.
Make subprocess spawns observable. Log every MCP tool call with the requesting identity, the command requested, and the result, and feed it to the monitoring you already watch. If your stack emits no distinct event when a stdio subprocess spawns, admit it and baseline the config file with a hash instead, so an unexpected change to a command value becomes the tamper signal you alert on.
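
The inventory, re-validation, and config-baseline steps above can share one small audit script. This sketch, in Python, assumes the common JSON layout where servers sit under an mcpServers key with a command field each; your proxy's table or config format may differ. The function names and the sample config are hypothetical. It hashes the parsed config rather than the raw file bytes, a choice made here so whitespace-only edits don't fire the alert.

```python
import hashlib
import json

# Launcher names taken from the LiteLLM allowlist described above.
ALLOWED_LAUNCHERS = frozenset(
    {"npx", "uvx", "python", "python3", "node", "docker", "deno"}
)


def find_disallowed(config: dict) -> list[tuple[str, str]]:
    """Return (server_name, command) for every entry outside the allowlist."""
    findings = []
    for name, entry in config.get("mcpServers", {}).items():
        command = entry.get("command")
        # Entries with no command field (non-stdio transports) are skipped.
        if command is not None and command not in ALLOWED_LAUNCHERS:
            findings.append((name, command))
    return findings


def config_fingerprint(config: dict) -> str:
    """SHA-256 over canonical JSON, so key order does not change the hash."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical config: one known-good launcher, one stale row that is not.
sample = {
    "mcpServers": {
        "files": {"command": "npx", "args": ["-y", "example-files-server"]},
        "legacy": {"command": "bash", "args": ["-c", "echo hello"]},
    }
}

findings = find_disallowed(sample)
baseline = config_fingerprint(sample)

# Simulate tampering: the same config with one command value swapped.
tampered = json.loads(json.dumps(sample))  # deep copy via JSON round trip
tampered["mcpServers"]["files"]["command"] = "sh"
changed = config_fingerprint(tampered) != baseline
```

Run find_disallowed against records already in storage, not only against incoming requests, so an old bad row is caught on reload. Store the baseline somewhere the proxy can't write, and alert when the fingerprint changes.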

The uncomfortable summary: the people who could fix this at the protocol layer have decided they won't. So the SDK will keep doing the dangerous thing on purpose, in four languages, across 150 million downloads, and the only allowlist that's going to save your host is the one you write yourself. Write it this week.