The first time I tried to enforce NetworkPolicy in a real cluster, I broke DNS for an entire namespace and spent the next forty minutes wondering why every pod was returning i/o timeout. This post is the guide I wish I had read first.
The mental model
A NetworkPolicy is a label-selector-driven firewall rule. It only does two things:
- Selects pods by label (in a single namespace).
- Specifies allowed ingress, egress, or both for those pods.
Three rules that took me too long to internalize:
- Policies are additive within a direction. If pod X is selected by two policies, the allowed ingress is the union.
- Policies are subtractive the moment they select. Once any policy with policyTypes: [Ingress] selects a pod, all other ingress is denied. Same for egress.
- Policies are namespace-scoped. A policy in namespace app does not affect pods in namespace db. Cross-namespace rules use namespaceSelector.
The implication is important: there is no global “default-deny” switch. You build default-deny by writing a policy that selects every pod and allows nothing.
Default-deny: the starting point
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: app
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
podSelector: {} matches every pod in the namespace. policyTypes lists both directions but with no ingress or egress rules below — meaning “select all pods, allow nothing.” This is the foundation everything else gets layered on top of.
The trap: this immediately breaks DNS, because every pod’s egress to kube-dns is now denied. You will not realize this until something tries to resolve a hostname.
Allow DNS — every time
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: app
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
A few things to notice. The to block uses both namespaceSelector and podSelector under a single list item — that’s an AND. If you put them in two separate list items, it becomes an OR, which is almost never what you want. This is the single most common NetworkPolicy bug I see in PRs.
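For contrast, here is a sketch of the broken OR version of that same to block, shown only to illustrate the mistake. With the selectors split into two list items, the first peer matches every pod in kube-system (not just kube-dns), and the second matches kube-dns-labeled pods in the app namespace itself, which is not what anyone meant:

  egress:
  - to:
    # Separate list items are ORed together:
    # peer 1: any pod in kube-system
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    # peer 2: pods labeled k8s-app: kube-dns in this policy's own namespace
    - podSelector:
        matchLabels:
          k8s-app: kube-dns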
The kubernetes.io/metadata.name label is automatic on every namespace from Kubernetes 1.22 onward. Before that you had to label namespaces yourself.
Allow ingress from a specific service
Suppose your api deployment should only accept connections from the frontend deployment in the same namespace:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend
  namespace: app
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
If frontend lived in another namespace, you’d add a namespaceSelector next to the podSelector (still under the same list item — same AND rule).
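A minimal sketch of that variant, assuming frontend runs in a hypothetical namespace named web; only the from block changes:

  ingress:
  - from:
    # namespaceSelector and podSelector in one list item: AND, so only
    # pods labeled app: frontend in the web namespace are allowed in.
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: web
      podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080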
Egress to external services
This is where it gets awkward. NetworkPolicy egress can target IP CIDRs, but DNS still resolves to whatever IP the upstream wants today. Two patterns work in practice:
- Allow egress to a known CIDR. Works for cloud provider services with documented ranges (S3, RDS, etc.); see the sketch after this list.
- Run an egress proxy. All outbound traffic to the internet goes through a known set of pods, and the policy only needs to allow egress to those pods. The proxy handles the dynamic-DNS problem.
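A sketch of the first pattern, scoped to the api pods. The policy name, CIDR, and port here are placeholders, not a real provider range; substitute whatever your cloud documents:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-external-cidr
  namespace: app
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Egress
  egress:
  - to:
    # Placeholder range: replace with the provider's documented CIDR.
    - ipBlock:
        cidr: 192.0.2.0/24
    ports:
    - protocol: TCP
      port: 443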
Don’t try to allow-list outbound DNS names directly in NetworkPolicy. Some CNI plugins (Cilium) support FQDN-based policies, but that’s a CNI-specific extension, not core Kubernetes.
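If you do run Cilium and want the FQDN route anyway, the extension looks roughly like this. It is a sketch of Cilium's CRD, not core NetworkPolicy, the hostname is a made-up example, and real FQDN policies also need a DNS visibility rule so Cilium can observe lookups; check the Cilium docs before relying on it:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-allow-fqdn
  namespace: app
spec:
  endpointSelector:
    matchLabels:
      app: api
  egress:
  # Allow egress only to IPs that the named hostname resolves to.
  - toFQDNs:
    - matchName: "storage.example.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP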
Debugging when it doesn’t work
When traffic is being dropped and you don’t know why, the order I check things in:
- Is the CNI actually enforcing NetworkPolicy? Calico, Cilium, and AWS VPC CNI with policy enforcement enabled all do. Some older or simpler CNIs do not.
- Run kubectl describe networkpolicy in the target namespace. Look at which pods are selected and which rules apply.
- From inside the source pod, try nc -zv <target> <port>. A timeout strongly suggests NetworkPolicy. Connection refused suggests the service is wrong, not the policy.
- Temporarily add a wide-open allow-all policy (a sketch follows this list) and see if the problem clears. If it does, the issue is policy. If it doesn't, look at Service, Endpoints, and CNI logs.
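The wide-open policy from that last step is short enough to keep around as a snippet. A minimal sketch for the app namespace (the name debug-allow-all is just a convention; delete the policy as soon as you're done):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: debug-allow-all
  namespace: app
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  # An empty rule matches everything, so this allows all ingress and egress
  # for every pod in the namespace.
  ingress:
  - {}
  egress:
  - {}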
The lesson
NetworkPolicy is one of those features that’s easy to demo and unforgiving in production. The upside is that once you have a clean default-deny posture in every namespace, lateral movement risk drops dramatically and reviewing access becomes a matter of reading a few YAML files.
Start with default-deny in a non-critical namespace. Add the DNS allowlist immediately. Layer in service-specific rules. Don’t try to retrofit it across an entire cluster in one PR.