The first time I tried to enforce NetworkPolicy in a real cluster, I broke DNS for an entire namespace and spent the next forty minutes wondering why every pod was returning i/o timeout. This post is the guide I wish I had read first.

The mental model

A NetworkPolicy is a label-selector-driven firewall rule. It only does two things:

  1. Selects pods by label (in a single namespace).
  2. Specifies allowed ingress, egress, or both for those pods.

Three rules that took me too long to internalize:

  • Policies are additive within a direction. If pod X is selected by two policies, the allowed ingress is the union.
  • Selection flips the default, per direction. Once any policy with policyTypes: [Ingress] selects a pod, all ingress not explicitly allowed by some policy is denied. Same for egress.
  • Policies are namespace-scoped. A policy in namespace app does not affect pods in namespace db. Cross-namespace rules use namespaceSelector.

The implication is important: there is no global “default-deny” switch. You build default-deny by writing a policy that selects every pod and allows nothing.

Default-deny: the starting point

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: app
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

podSelector: {} matches every pod in the namespace. policyTypes lists both directions, but the spec contains no ingress or egress rules, so the policy means “select all pods, allow nothing.” This is the foundation everything else gets layered on top of.

The trap: this immediately breaks DNS, because every pod’s egress to kube-dns is now denied. You will not realize this until something tries to resolve a hostname.

Allow DNS — every time

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: app
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

A few things to notice. The to block uses both namespaceSelector and podSelector under a single list item — that’s an AND. If you put them in two separate list items, it becomes an OR, which is almost never what you want. This is the single most common NetworkPolicy bug I see in PRs.
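
To make that concrete, here's a sketch of the broken OR version of the same rule (the policy name is made up; this is for illustration, not something to apply). With the selectors split into two list items, the rule allows egress on port 53 to every pod in kube-system (first item) and to any pod labelled k8s-app: kube-dns in the app namespace itself (second item), which is broader than the intended "kube-dns pods in kube-system":

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-broken   # illustration only; do not apply
  namespace: app
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # Two separate list items: the peers are OR'd together.
        - namespaceSelector:         # peer 1: any pod in kube-system
            matchLabels:
              kubernetes.io/metadata.name: kube-system
        - podSelector:               # peer 2: kube-dns-labelled pods in THIS namespace
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53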

The kubernetes.io/metadata.name label is automatic on every namespace from Kubernetes 1.22 onward. Before that you had to label namespaces yourself.

Allow ingress from a specific service

Suppose your api deployment should only accept connections from the frontend deployment in the same namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend
  namespace: app
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080

If frontend lived in another namespace, you’d add a namespaceSelector next to the podSelector (still under the same list item — same AND rule).
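
As a sketch, assuming frontend runs in a namespace named web (the namespace name is hypothetical; the labels mirror the example above):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend-cross-ns
  namespace: app
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        # One list item: namespace label AND pod label must both match.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: web   # hypothetical namespace name
          podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080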

Egress to external services

This is where it gets awkward. NetworkPolicy egress can target IP CIDRs, but DNS still resolves to whatever IP the upstream wants today. Two patterns work in practice:

  • Allow egress to a known CIDR. Works for cloud provider services with documented ranges (S3, RDS, etc.); see the sketch after this list.
  • Run an egress proxy. All outbound traffic to the internet goes through a known set of pods, and the policy only needs to allow egress to those pods. The proxy handles the dynamic-DNS problem.
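
A sketch of the CIDR pattern. The CIDR below is a documentation range and the pod label is a placeholder; substitute the ranges your provider publishes and the pods that actually need the access:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-external-cidr
  namespace: app
spec:
  podSelector:
    matchLabels:
      app: api              # placeholder: scope this to the pods that need outbound access
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24   # placeholder range; use your provider's published CIDRs
      ports:
        - protocol: TCP
          port: 443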

Don’t try to allow-list outbound DNS names directly in NetworkPolicy. Some CNI plugins (Cilium) support FQDN-based policies, but that’s a CNI-specific extension, not core Kubernetes.

Debugging when it doesn’t work

When traffic is being dropped and you don’t know why, here’s the order I check things in:

  1. Is the CNI actually enforcing NetworkPolicy? Calico, Cilium, and AWS VPC CNI with policy enforcement enabled all do. Some older or simpler CNIs do not.
  2. Run kubectl describe networkpolicy in the target namespace. Look at which pods are selected and which rules apply.
  3. From inside the source pod, try nc -zv <target> <port>. A timeout strongly suggests NetworkPolicy. Connection refused suggests the service is wrong, not the policy.
  4. Temporarily add a wide-open allow-all policy (sketch below) and see if the problem clears. If it does, the issue is policy. If it doesn’t, look at Service, Endpoints, and CNI logs.
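
For step 4, the wide-open policy is the inverse of default-deny: select every pod and allow everything in both directions. A minimal sketch (the name is arbitrary; delete the policy as soon as you're done):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: debug-allow-all    # temporary; remove after debugging
  namespace: app
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - {}    # an empty rule matches all ingress traffic
  egress:
    - {}    # an empty rule matches all egress traffic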

The lesson

NetworkPolicy is one of those features that’s easy to demo and unforgiving in production. The upside is that once you have a clean default-deny posture in every namespace, lateral movement risk drops dramatically and reviewing access becomes a matter of reading a few YAML files.

Start with default-deny in a non-critical namespace. Add the DNS allowlist immediately. Layer in service-specific rules. Don’t try to retrofit it across an entire cluster in one PR.