Kubernetes 1.21 is released! The release notes list a lot of interesting new features, such as the following:

  • CronJobs become stable - the manifest should no longer change, which is always good for operators.
  • Graceful node shutdown - This is super-important when you have complex applications and a lot of nodes. During cluster operations there is a good chance that you will need to reboot nodes. In the past you had to drain the nodes, check them, and ensure that all workloads transitioned properly before rebooting. This should, in theory, minimize the human operations required. I will probably run some tests on this and post later (a minimal config sketch follows this list).
  • PV Health Monitor - Again another operations-friendly feature. There isn't a lot mentioned in the release notes, but stay tuned for tests.
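
For the curious, the graceful shutdown behaviour is configured on the kubelet itself. A minimal sketch (the durations are example values only, not recommendations) would be adding these fields to the KubeletConfiguration in /var/lib/kubelet/config.yaml and restarting kubelet:

# /var/lib/kubelet/config.yaml -- example values only
shutdownGracePeriod: 30s              # total time the kubelet delays the node shutdown
shutdownGracePeriodCriticalPods: 10s  # part of that window reserved for critical pods

$ sudo systemctl restart kubelet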

There are a lot of other features moving into alpha / beta state. Read the release notes to see what's happening.

Also, it appears that Docker 20.10 is supported by kubeadm, so you can upgrade your Docker installation as well, although it is highly advised to move to another container runtime because a deprecation timeline has already been set.

Personal take on the upgrade process

The upgrade steps from 1.20 to 1.21 can be found here. The process basically breaks down into a few logical steps:

Control Planes

  1. Upgrade kubeadm on one of the master nodes.
  2. Use it to check whether an automatic upgrade is possible, or whether any human intervention is needed.
  3. Upgrade the master node's configuration. The master node's static pods will be restarted one by one, including etcd, kube-apiserver and kube-scheduler.
  4. Then upgrade the remaining master nodes using the same method.
  5. After kubeadm has been upgraded on all master nodes, do the following on each node, one by one: a) drain the node, b) use your package manager to upgrade kubelet and kubectl, and c) uncordon the node. (A rough command sketch follows this list.)
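
For reference, on Ubuntu the steps above map to roughly the following commands (versions and node names are placeholders; always check the official upgrade guide for the exact packages):

# On the first master node
$ sudo apt-get update && sudo apt-get install -y --allow-change-held-packages kubeadm=1.21.0-00
$ sudo kubeadm upgrade plan            # check whether an automatic upgrade is possible
$ sudo kubeadm upgrade apply v1.21.0   # restarts the static pods one by one

# On each remaining master node
$ sudo kubeadm upgrade node

# Then for every master node, one at a time
$ kubectl drain <node> --ignore-daemonsets
$ sudo apt-get install -y --allow-change-held-packages kubelet=1.21.0-00 kubectl=1.21.0-00
$ sudo systemctl daemon-reload && sudo systemctl restart kubelet
$ kubectl uncordon <node>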

Worker nodes

The worker node process is almost the same as for the control planes: you upgrade kubeadm, follow the instructions to update the configuration (which is only one command away), then drain, upgrade and uncordon.
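
A rough sketch of the same thing on a worker (again, versions and node names are placeholders):

# On the worker node
$ sudo apt-get install -y --allow-change-held-packages kubeadm=1.21.0-00
$ sudo kubeadm upgrade node            # the one-command configuration update
# Drain from a machine with kubectl access, upgrade the packages, then uncordon
$ kubectl drain <node> --ignore-daemonsets
$ sudo apt-get install -y --allow-change-held-packages kubelet=1.21.0-00 kubectl=1.21.0-00
$ sudo systemctl daemon-reload && sudo systemctl restart kubelet
$ kubectl uncordon <node>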

Results

It is a relatively straightforward process, so I've upgraded my nodes and now the cluster looks like this:

$ k get nodes -o wide
NAME    STATUS   ROLES                  AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
node1   Ready    control-plane,master   40d   v1.21.0   192.168.1.12   <none>        Ubuntu 20.04.2 LTS   5.4.0-1032-raspi   docker://20.10.5
node2   Ready    <none>                 39d   v1.21.0   192.168.1.13   <none>        Ubuntu 20.04.2 LTS   5.4.0-70-generic   docker://20.10.5
node3   Ready    <none>                 39d   v1.21.0   192.168.1.14   <none>        Ubuntu 20.04.2 LTS   5.4.0-70-generic   docker://20.10.5
node4   Ready    <none>                 40d   v1.21.0   192.168.1.15   <none>        Ubuntu 20.04.2 LTS   5.4.0-1032-raspi   docker://20.10.5
node5   Ready    control-plane,master   39d   v1.21.0   192.168.1.10   <none>        Ubuntu 20.04.2 LTS   5.4.0-1032-raspi   docker://20.10.5
node6   Ready    control-plane,master   30d   v1.21.0   192.168.1.16   <none>        Ubuntu 20.04.2 LTS   5.4.0-1032-raspi   docker://20.10.5

All nodes upgraded to v1.21.0!

Notes and considerations

Upgrading the nodes is really straightforward, with only a few notes.

  1. It is always hard to fully evict all the pods, for various reasons. Some pods cannot be evicted easily (like metrics-server and kubernetes-dashboard), but since those are not essential during the upgrade I simply modified their deployments to set the replicas to zero and shut the pods down (see the sketch after this list). Most other production workloads with proper non-local persistent volumes evicted just fine.
  2. You may want to check whether you want to upgrade your Docker version as well.
  3. Upgrading the CNI can be tricky depending on how often you maintain it. For example, I run the Calico CNI with a VXLAN configuration, so when I upgrade the CNI I have to modify the manifest files to make sure I do not overwrite my existing VXLAN configuration with the IPIP one, which would break the network and cause a long outage.
  4. Make sure you upgrade the master nodes before the worker nodes.
  5. During the process, I also upgraded MetalLB, which is really simple.
  6. If you always stay updated, the upgrade process will be much simpler.
  7. Ansible can be used easily for the worker node upgrades, but I would not recommend using it on the master nodes, as manual intervention may be required; it is also not worth it because you normally do not have many master nodes. A lot of k8s clusters use namespaces to distinguish production and testing environments, so testing can be tricky and hard to replicate.
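
For example, the workaround in point 1 is just a couple of commands (the deployment names and namespaces below are assumptions based on the usual manifests; adjust them to your cluster):

# Scale the non-essential workloads down before draining
$ kubectl -n kube-system scale deployment metrics-server --replicas=0
$ kubectl -n kubernetes-dashboard scale deployment kubernetes-dashboard --replicas=0
$ kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
# ...upgrade the node, uncordon it, then scale the deployments back up
$ kubectl -n kube-system scale deployment metrics-server --replicas=1
$ kubectl -n kubernetes-dashboard scale deployment kubernetes-dashboard --replicas=1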