Problem 2: The dying pod
The second problem is a bit less obvious: your pods can keep receiving traffic
from Kubernetes while they are shutting down. Slow clap.
When a pod is shutting down it must be removed from several places:
- kubelet must shut down the pod
- kube-proxy on all nodes in the cluster must remove the pod’s IP address from iptables
- The pod must be removed from the endpoints of the service it is part of
It was with mild shock that I realized Kubernetes does not even try to
orchestrate this in any way other than doing everything in parallel. It is,
after all, a distributed system! Thus, there is a high chance that a service
gets told to remove your pod from its endpoints after the pod has started
shutting down, or that traffic reaches the pod’s IP after shutdown. Hooray.
This article goes into more depth on why this is the case, without fully
convincing me that it must be this way. But it is how it is, so what can we do?
The hack suggested in that article is to ensure the pod’s shutdown takes enough
time so that it almost certainly does not shut down before it has been removed
from iptables and endpoints. How you do this depends somewhat on what runs
in your pod, but a simple and YOLO approach is to run sleep as a preStop hook:
    lifecycle:
      preStop:
        exec:
          command: ["/bin/bash", "-c", "sleep 10"]
You want this under each element of containers, so every container in the pod
gets its own preStop hook.
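To make the placement concrete, here is a minimal sketch of a Deployment with two containers, each carrying the hook. Every name and image in it (example-app, web, sidecar, example.com/...) is a made-up placeholder, and I have assumed /bin/sh instead of /bin/bash because slim images often ship without bash; use whichever shell your image actually has. The terminationGracePeriodSeconds value shown is the Kubernetes default, written out only because the sleep counts against it.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                      # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      # The preStop sleep eats into this budget, so it must be larger than
      # the sleep plus however long the app itself needs to exit (default: 30).
      terminationGracePeriodSeconds: 30
      containers:
        - name: web                      # hypothetical container
          image: example.com/web:1.0     # placeholder image
          lifecycle:
            preStop:
              exec:
                # Assumes the image has /bin/sh and sleep available.
                command: ["/bin/sh", "-c", "sleep 10"]
        - name: sidecar                  # hypothetical second container
          image: example.com/sidecar:1.0 # placeholder image
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
```

If you bump the sleep, bump terminationGracePeriodSeconds along with it, otherwise the container may be killed before it gets a chance to shut down cleanly.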