I have yet to clean up a breached cluster that fell to a clever zero-day. The pattern is always duller: a service account auto-mounted a token nobody scoped, a namespace shipped with no network policy, and a cluster-admin binding granted years ago for "just this one migration" was never pulled. Kubernetes defaults are tuned for compatibility, so hardening is work you do after install, not something the platform hands you. Below is the checklist I actually run against production clusters in 2026, in roughly the order that buys the most safety per hour. Two things shifted this year: RBAC-driven takeovers tied to bugs like IngressNightmare, and a June 1, 2026 correction from the Kubernetes Security Response Committee that reclassified several CVEs as permanently unfixed, which means your scanners are about to start shouting about them.

The tips

  1. Enforce Pod Security Standards at restricted, but get there through warn first. PodSecurityPolicy was removed in 1.25; Pod Security Admission (stable since 1.25) replaces it and works per namespace via labels. If you jump straight to enforce=restricted, existing pods keep running, but every non-compliant pod is rejected the next time it is created or rescheduled, and you get paged at 3am. Apply warn and audit first, read what they flag, fix the offending pods, then flip enforce on.
   # Observe without breaking anything
   kubectl label ns payments \
     pod-security.kubernetes.io/warn=restricted \
     pod-security.kubernetes.io/audit=restricted
   # Once the warnings are clean:
   kubectl label ns payments pod-security.kubernetes.io/enforce=restricted --overwrite
  1. Hunt down every cluster-admin binding and justify it out loud. More than half of the production clusters I have assessed carry at least one RBAC misconfiguration that lets a compromised pod climb to cluster-admin. The fastest win is listing every wildcard and every cluster-admin grant, then asking, per binding, whether that human or workload still needs it.
   kubectl get clusterrolebindings -o json | jq -r \
     '.items[] | select(.roleRef.name=="cluster-admin") | .metadata.name'

Watch the escalate, bind, and impersonate verbs especially, plus create on pods. Any one of them is a quiet path to full takeover, and they rarely look dangerous in a code review.
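
When a cluster-admin grant fails that test, the usual replacement is a namespaced Role bound to one identity. A minimal sketch, assuming a hypothetical deployer service account in payments that only needs to manage Deployments (the names and the verb list are illustrative, not taken from any real cluster):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployer              # hypothetical name
  namespace: payments
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployer
  namespace: payments
subjects:
  - kind: ServiceAccount
    name: deployer            # hypothetical service account
    namespace: payments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployer
```

Creating Deployments still lets the holder run pods in that namespace, so this narrows the blast radius rather than removing it.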

  1. Turn off auto-mounted service account tokens. Roughly 87% of clusters in audits still mount a token into every pod running under the namespace default service account. That token is a finished credential sitting in the filesystem, waiting for anyone who lands a shell. Disable it at the service account level and opt workloads back in only when they genuinely call the API.
   apiVersion: v1
   kind: ServiceAccount
   metadata:
     name: default
     namespace: payments
   automountServiceAccountToken: false
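
Opting a workload back in could look like the sketch below. It assumes a hypothetical ledger-sync workload that genuinely calls the API; the service account name and image are placeholders. The pod-level field takes precedence over the service account setting, so the token is mounted only where it is requested.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ledger-sync           # hypothetical dedicated service account
  namespace: payments
automountServiceAccountToken: false
---
apiVersion: v1
kind: Pod
metadata:
  name: ledger-sync
  namespace: payments
spec:
  serviceAccountName: ledger-sync
  automountServiceAccountToken: true   # pod-level setting overrides the service account
  containers:
    - name: app
      image: registry.example.com/ledger-sync:1.0   # placeholder image
```
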
  1. Give every namespace a default-deny NetworkPolicy on the day it is created. With no policy in place, every pod can reach every other pod across namespaces, flat and open, and most teams never see it because nothing is logging the traffic. Lay down a deny-all baseline, then add explicit allow rules on top. Bake it into your namespace template so no namespace ever ships without one.
   apiVersion: networking.k8s.io/v1
   kind: NetworkPolicy
   metadata: { name: default-deny-all, namespace: payments }
   spec:
     podSelector: {}
     policyTypes: ["Ingress", "Egress"]

One gotcha that has bitten me: confirm your CNI actually enforces policy. Calico and Cilium do, but some managed clusters hand you a NetworkPolicy API that silently does nothing because enforcement was never switched on.
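
The first allow rule most namespaces need on top of the deny-all baseline is DNS, because blocked egress breaks name resolution for every pod. A sketch, assuming cluster DNS runs in kube-system under the common k8s-app: kube-dns label (check the label in your cluster; setups using NodeLocal DNSCache need a different rule):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  podSelector: {}             # applies to every pod in the namespace
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumed label; verify before relying on it
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```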

  1. Treat creating or editing Ingress objects as a privileged operation. IngressNightmare (CVE-2025-1974, CVSS 9.8) let attackers inject arbitrary NGINX config through the ingress-nginx admission controller and read every secret in that controller's service account scope. Patching the controller matters, but the durable fix is RBAC: stop handing Ingress create and update rights to every developer namespace by default. Check who actually holds it.
   kubectl auth can-i create ingress --as=system:serviceaccount:dev:builder -n dev
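
One way to express that in RBAC is to reserve Ingress writes for a small group. A sketch, assuming a hypothetical platform-ingress group supplied by your identity provider:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ingress-editor        # hypothetical name
  namespace: dev
rules:
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ingress-editor
  namespace: dev
subjects:
  - kind: Group
    name: platform-ingress    # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ingress-editor
```

This only helps if developers are not also bound to the built-in edit or admin ClusterRoles, which include write access to Ingress objects.
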
  1. Mitigate the CVEs that will never get a patch. On June 1, 2026 the SRC corrected the records for CVE-2020-8561, CVE-2020-8562, and CVE-2021-25740 to show no fixed version. They are architectural trade-offs, not coding bugs, so scanners that used to stay quiet will now alert and you cannot version-bump your way out. The answer is configuration:
  • CVE-2020-8561 (webhook redirect): run the API server with --profiling=false and keep its log verbosity (-v) below 10.
  • CVE-2021-25740 (cross-namespace endpoint forwarding): restrict write access to Endpoints and EndpointSlice objects via RBAC.
  • CVE-2020-8562 (DNS TOCTOU): run a local DNS cache with an enforced min-cache-ttl.

Write these up as accepted-with-mitigation in your risk register, so the next engineer doesn't burn a day chasing a patch that does not exist.

  1. Spell out non-root, read-only root filesystem, and dropped capabilities in every securityContext. The restricted PSS profile already requires most of these (readOnlyRootFilesystem is the exception; no PSS profile checks it), but admission labels get loosened during incidents and stay loosened. Putting the block directly in your manifests means the protection survives a sloppy label change. This single stanza closes most container-escape and privilege-escalation primitives before they start.
   securityContext:
     runAsNonRoot: true
     readOnlyRootFilesystem: true
     allowPrivilegeEscalation: false
     capabilities: { drop: ["ALL"] }
     seccompProfile: { type: RuntimeDefault }
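
In context, the stanza sits on each container, and a read-only root filesystem usually needs a writable scratch volume to go with it. A sketch, assuming a hypothetical api Deployment whose application only writes to /tmp (the workload name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                   # hypothetical workload
  namespace: payments
spec:
  replicas: 1
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0   # placeholder image
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities: { drop: ["ALL"] }
            seccompProfile: { type: RuntimeDefault }
          volumeMounts:
            - name: tmp
              mountPath: /tmp   # writable scratch space on an otherwise read-only filesystem
      volumes:
        - name: tmp
          emptyDir: {}
```

With runAsNonRoot: true, the container fails to start unless the image declares a non-root numeric user or the manifest sets runAsUser.
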
  1. Run kube-bench and treat the CIS Benchmark as a backlog, not a one-time report. The CIS Kubernetes Benchmark has 200+ checks and you will not clear them in a sprint. Run kube-bench against the control plane and worker nodes, then triage by blast radius: RBAC, Pod Security, network policy, etcd encryption, and audit logging come first; cosmetic file-permission findings can wait.
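   # kube-bench needs hostPID and host path mounts to inspect a node, which a bare
   # `kubectl run` pod lacks, so use the project's Job manifest instead
   kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
   kubectl logs job/kube-bench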
  1. Encrypt etcd secrets at rest, then prove it actually happened. By default, Secrets are written to etcd unencrypted; the base64 you see in the API is encoding, not encryption, and anyone who reads the datastore reads your passwords. Enable an EncryptionConfiguration with a KMS provider, or at minimum aescbc, then rewrite existing Secrets so they pass through the new provider, and verify by pulling the raw etcd value and confirming it is ciphertext rather than your credentials in plain sight.
   ETCDCTL_API=3 etcdctl get /registry/secrets/payments/db-creds | hexdump -C | head
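
For reference, a KMS v2 EncryptionConfiguration might look like the sketch below. The plugin name and socket path are hypothetical and depend on the KMS plugin you deploy; the file is passed to the API server with --encryption-provider-config.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - kms:
          apiVersion: v2
          name: example-kms-plugin                    # hypothetical plugin name
          endpoint: unix:///var/run/kms/plugin.sock   # hypothetical socket path
          timeout: 3s
      - identity: {}   # fallback so Secrets written before encryption stay readable
```

The first provider in the list is used for writes. Once a Secret has been rewritten, its raw etcd value should begin with a k8s:enc:kms:v2: prefix instead of readable data.
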
  1. Deploy a continuous in-cluster scanner and route its findings somewhere a human reads weekly. Static hardening starts drifting the moment developers ship the next change. Trivy Operator (or Kubescape) scans workloads, RBAC, and images in-cluster continuously and exposes the results as CRDs you can alert on. The install is the easy part. The discipline is someone reviewing the vulnerabilityreports, configauditreports, and rbacassessmentreports on a real cadence instead of letting them pile up.
   kubectl get vulnerabilityreports -A
   kubectl get configauditreports -A --sort-by=.report.summary.criticalCount
   kubectl get rbacassessmentreports -A

Wrap-up

If you change one habit, make it this: push hardening into namespace creation instead of into a quarterly audit. A namespace template that ships a default-deny NetworkPolicy, automountServiceAccountToken: false, a scoped developer RoleBinding, and the restricted Pod Security label means every new team starts from a safe floor rather than an open one. Hardening that relies on people remembering decays between reorgs; hardening that is the default holds. Get the floor right and every other item on this list becomes far easier to enforce.
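
A namespace template along those lines could be sketched as follows. The namespace and group names are hypothetical, and the binding starts developers on the built-in read-only view ClusterRole as an assumption about what "scoped" means for your teams; write access would be layered on with narrower Roles.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: team-a
automountServiceAccountToken: false
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-a
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developers
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-developers   # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view                  # read-only floor; grant writes explicitly
```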

Sources

  • https://kubernetes.io/docs/concepts/security/pod-security-standards/
  • https://kubernetes.io/docs/concepts/security/rbac-good-practices/
  • https://kubernetes.io/blog/2026/05/26/reconciling-unfixed-kubernetes-cves/
  • https://www.sentinelone.com/blog/ingressnightmare-critical-unauthenticated-rce-vulnerabilities-in-kubernetes-ingress-nginx/