This is the list I actually run new infrastructure and agent work against in 2026. It is not exhaustive and it is not a maturity model. It is the short set of things that came back to bite someone whenever I skipped them. Each section links to the longer piece where I work through the reasoning, so you can argue with it.

Work top to bottom. If an item already holds in your stack, move on. If you cannot answer it with a specific yes, that is your next ticket.

AI agents are logins. Inventory them first.

Most companies that deployed agents already have an incident story, and the cause is almost never an exotic attack. It is standing, privileged, never-expiring access owned by software nobody is watching.

  • Treat every agent as a privileged service account. Write down what each one holds, what it can reach, and cut it to least privilege.
  • Put credential rotation on a schedule and enforce it. A key that never expires is the whole problem in one line.
  • Make tamper-evident, structured logging a precondition for production, not a follow-up. If an agent cannot emit an audit trail, it does not go near anything compliance touches.
  • Centralize credential issuance and refuse to honor anything minted outside the approved pipeline. That is how you reach the shadow fleet you will never catch at the network edge.
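
A minimal sketch of what that inventory check can look like, assuming you can export each agent credential as a record. The `AgentCredential` fields, the `central-vault` issuer name, and the 90-day rotation window are all illustrative assumptions, not the format of any real tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical inventory record; field names are illustrative only.
@dataclass
class AgentCredential:
    agent: str
    scopes: list[str]
    issued_at: datetime
    expires_at: Optional[datetime]  # None means the key never expires
    issuer: str                     # which pipeline minted the credential

APPROVED_ISSUERS = {"central-vault"}  # assumption: a single approved pipeline
MAX_AGE = timedelta(days=90)          # assumption: 90-day rotation policy

def audit(cred: AgentCredential, now: datetime) -> list[str]:
    """Return the checklist items this credential fails, in a fixed order."""
    findings = []
    if cred.expires_at is None:
        findings.append("never expires")
    if now - cred.issued_at > MAX_AGE:
        findings.append("overdue for rotation")
    if cred.issuer not in APPROVED_ISSUERS:
        findings.append("minted outside approved pipeline")
    if any(scope.endswith("*") for scope in cred.scopes):
        findings.append("wildcard scope")
    return findings
```

Run it over the whole export and treat any non-empty result as a ticket. The wildcard check is a crude stand-in for a real least-privilege review.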

Deep dive: The AI agent you deployed last quarter is probably your weakest login

Treat MCP servers as third-party dependencies, not infrastructure

  • Pin MCP server versions and watch them for supply-chain anomalies the way you watch any package.
  • Scan tool descriptions for injection payloads before you wire a server into an agent. A poisoned description is a poisoned instruction.
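
A rough sketch of the description scan, assuming the server's tool list has already been fetched as dicts with `name` and `description` keys. The patterns are my own heuristics and far from complete; a clean result means "nothing obvious", not "safe".

```python
import re

# Heuristic patterns only; assumed for illustration and nowhere near exhaustive.
SUSPICIOUS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"do not (tell|mention|reveal)", re.I),
    re.compile(r"<\s*(important|system|secret)\s*>", re.I),
    re.compile(r"(~/\.ssh|id_rsa|\.env\b|/etc/passwd)", re.I),
    re.compile(r"[\u200b-\u200f\u202a-\u202e]"),  # zero-width and bidi control characters
]

def scan_tool(description: str) -> list[str]:
    """Return the patterns that matched this tool description."""
    return [p.pattern for p in SUSPICIOUS if p.search(description)]

def scan_server(tools: list[dict]) -> dict[str, list[str]]:
    """Map tool name to matched patterns, for tools with at least one hit."""
    report = {}
    for tool in tools:
        hits = scan_tool(tool.get("description", ""))
        if hits:
            report[tool["name"]] = hits
    return report
```

Run the scan again whenever a pinned version changes, since a description that was clean at review time can be swapped later.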

Deep dive: Harden MCP servers against tool poisoning

Run untrusted, AI-written code in a real boundary

A standard container shares the host kernel, so it is not a security boundary for code an LLM wrote thirty seconds ago and no human read.

  • Give untrusted multi-tenant code its own kernel boundary: a microVM such as Firecracker, or a user-space kernel such as gVisor for the middle tier, not a hardened container.
  • Block outbound network by default. Isolation without egress control is an exfiltration channel waiting for a prompt injection to find it.
  • Cap CPU, memory, and process count so a runaway loop starves itself instead of the host.
  • Guarantee teardown when the task ends. "Usually destroyed" is not destroyed.
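
The four items above can be sketched as a pre-flight check plus a teardown guarantee. This is an assumption-heavy outline: `SandboxPolicy`, the isolation labels, and the `create` and `destroy` callables are hypothetical stand-ins for whatever your runtime actually exposes.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator, Optional

ISOLATED = {"microvm", "gvisor"}  # assumed labels for acceptable isolation tiers

@dataclass(frozen=True)
class SandboxPolicy:
    isolation: str               # e.g. "microvm", "gvisor", "container"
    default_deny_egress: bool
    cpu_cores: Optional[float]   # None means uncapped
    memory_mb: Optional[int]
    max_pids: Optional[int]

def violations(policy: SandboxPolicy) -> list[str]:
    """List the checklist items this policy fails."""
    problems = []
    if policy.isolation not in ISOLATED:
        problems.append("no dedicated kernel boundary")
    if not policy.default_deny_egress:
        problems.append("egress open by default")
    for name in ("cpu_cores", "memory_mb", "max_pids"):
        if getattr(policy, name) is None:
            problems.append(f"no cap on {name}")
    return problems

@contextmanager
def sandbox(policy: SandboxPolicy,
            create: Callable[[SandboxPolicy], str],
            destroy: Callable[[str], None]) -> Iterator[str]:
    """Refuse to start on a weak policy; always destroy what was created."""
    problems = violations(policy)
    if problems:
        raise ValueError("; ".join(problems))
    handle = create(policy)
    try:
        yield handle
    finally:
        destroy(handle)  # runs even if the task raises
```

The `finally` only covers the happy path of the orchestrator itself. If the orchestrator process dies, you still need an external reaper for orphaned sandboxes.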

Deep dive: Sandboxing AI agent code: a 2026 runtime guide

Turn on dependency cooldowns this week

In one May 2026 incident, an attacker pushed 84 malicious package versions in about six minutes. The defense is almost free: do not install brand-new versions for a few days.

  • Set a 7-day cooldown as the default (npm min-release-age, pnpm minimumReleaseAge, Yarn npmMinimalAgeGate, Bundler 4.0.13+ cooldown:). Dependabot honors it natively.
  • Use a longer window on major versions and a shorter one on patches, keyed on the semver bump type.
  • Write the override before you need it, so a cooldown can never become your reason for shipping an unpatched CVE.
  • Treat .vscode/, .cursor/, and .claude/ config files as executable surfaces and review changes to them like a shell script. The newest attacks aim there.
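
To make the cooldown logic concrete, here is a small sketch of the decision a package manager makes. It is not how any of the tools above implement it. The 14-day and 3-day windows are assumptions (only the 7-day default comes from the list), the `name@version` override format is made up, and the version parsing handles plain `X.Y.Z` only.

```python
from datetime import datetime, timedelta, timezone

# Assumed windows; only the 7-day default comes from the checklist.
COOLDOWN = {
    "major": timedelta(days=14),
    "minor": timedelta(days=7),
    "patch": timedelta(days=3),
}

def bump_type(current: str, candidate: str) -> str:
    """Classify the upgrade as major, minor, or patch (plain X.Y.Z only)."""
    cur = [int(part) for part in current.split(".")[:3]]
    new = [int(part) for part in candidate.split(".")[:3]]
    if new[0] != cur[0]:
        return "major"
    if new[1] != cur[1]:
        return "minor"
    return "patch"

def may_install(name: str, current: str, candidate: str,
                published_at: datetime, now: datetime,
                overrides: frozenset = frozenset()) -> bool:
    """True if the candidate has aged past its cooldown or is explicitly overridden."""
    if f"{name}@{candidate}" in overrides:
        return True  # the pre-written escape hatch, e.g. for an urgent CVE fix
    return now - published_at >= COOLDOWN[bump_type(current, candidate)]
```

The override set is the part worth deciding in advance: who may add an entry, and what evidence they need.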

Deep dive: Dependency cooldowns beat fast supply chain attacks

Start the post-quantum migration on the calendar, not the physics

On September 21, 2026, NIST moves the remaining FIPS 140-2 modules to Historical status, and CNSA 2.0 requires quantum-safe acquisitions for new national security systems from January 1, 2027. Harvest-now-decrypt-later collection means long-lived data encrypted with today's public-key algorithms is already exposed.

  • Build a cryptographic bill of materials first. You cannot migrate what you cannot see.
  • Sort the migration by data lifespan and put anything with a 10-plus year confidentiality requirement at the front of the line.
  • Make post-quantum readiness and FIPS 140-3 validation a written procurement requirement now, because the dates will land in your supplier contracts either way.
  • Deploy hybrid TLS (X25519 with ML-KEM) on internet-facing paths and measure the handshake cost in production rather than guessing.
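
The first two bullets can be sketched as a sort over the inventory. The `CbomEntry` shape is a hypothetical simplification of a real cryptographic bill of materials, and the algorithm set is a short illustrative list of public-key schemes that Shor's algorithm breaks, not a complete one.

```python
from dataclasses import dataclass

# Public-key algorithms breakable by a large quantum computer. Illustrative, not exhaustive.
QUANTUM_VULNERABLE = {"RSA", "DH", "ECDH", "ECDSA", "X25519"}

@dataclass
class CbomEntry:
    system: str
    algorithm: str
    confidentiality_years: int  # how long the protected data must stay secret

def migration_order(entries: list[CbomEntry]) -> list[CbomEntry]:
    """Quantum-vulnerable entries, longest confidentiality requirement first."""
    exposed = [e for e in entries if e.algorithm in QUANTUM_VULNERABLE]
    return sorted(exposed, key=lambda e: e.confidentiality_years, reverse=True)

def front_of_line(entries: list[CbomEntry], threshold_years: int = 10) -> list[CbomEntry]:
    """The subset with a 10-plus year requirement, in migration order."""
    return [e for e in migration_order(entries)
            if e.confidentiality_years >= threshold_years]
```

Symmetric entries such as AES-256 drop out of this ordering on purpose. They belong in the inventory but not at the front of the queue.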

Deep dive: RSA is on a clock now: the 2026 deadlines forcing the switch

If you expose tools to agents, gate them like public APIs

  • Treat every WebMCP tool you declare as a public API endpoint. Authenticate it, authorize it, and rate-limit it. An agent calling your tool on behalf of a stranger is a stranger.
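
A minimal sketch of that gate on the server side of a tool, assuming the call arrives with an API key. The key store, the grants table, and the limits (burst of 5, refill of 1 per second) are invented for illustration; in practice they would live in your identity provider and config.

```python
import time
from typing import Optional

class TokenBucket:
    """Simple token bucket; time is passed in so the behavior is easy to test."""
    def __init__(self, capacity: int, refill_per_s: float, now: float):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = float(capacity)
        self.updated = now

    def take(self, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Hypothetical stores, hard-coded only to keep the sketch self-contained.
API_KEYS = {"key-abc": "user-1"}
GRANTS = {"user-1": {"search_orders"}}
_buckets: dict[str, TokenBucket] = {}

def gate(api_key: str, tool: str, now: Optional[float] = None) -> tuple[int, str]:
    """Authenticate, authorize, then rate-limit; return an HTTP-style status."""
    now = time.monotonic() if now is None else now
    user = API_KEYS.get(api_key)
    if user is None:
        return 401, "unauthenticated"
    if tool not in GRANTS.get(user, set()):
        return 403, "forbidden"
    if user not in _buckets:
        _buckets[user] = TokenBucket(capacity=5, refill_per_s=1.0, now=now)
    if not _buckets[user].take(now):
        return 429, "rate limited"
    return 200, "ok"
```

The order matters: unauthenticated callers never consume a bucket, and the limit is keyed on the person behind the agent rather than on the agent.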

Deep dive: WebMCP hands your site's tools to AI agents

Plan capacity around power availability

  • Treat power availability as a first-class variable next to compute cost and latency when you choose regions. A region's headline investment tells you nothing if its grid is full.
  • Push for capacity-availability guarantees in new contracts. Uptime SLAs say nothing about whether the capacity you are promised can actually connect.

Deep dive: The AI buildout has a plug problem, not a chip problem

---

That is the list. If a section here saves you a bad afternoon, that is the whole point. I send a short field note when there is a new one worth your time, and nothing else.