This document describes the high-level architecture of the StatefulSet Resize Controller. It should give you a good impression on how it works and where to change things.
On a very high-level, the StatefulSet Resize Controller watches changes on StatefulSets and reacts to them. This is the general way Kubernetes Operators work and we call this reacting to resource changes the reconcile loop. If you are not familiar with Kubernetes Operators it might be helpful to read up on them before proceeding.
The controller will notice StatefulSets that should be resized, make a copy of the current volumes by creating new volumes with the requested size, delete the previous volume and then restore the original data from the previous volumes to the new volumes. It does this by only interacting with the Kubernetes API and has no access to other systems.
statefulset-resize-controller ├── controllers # Core controller package - only place that interacts with k8s │ ├── controller.go # Entry point of reconcile loop │ ├── statefulset.go # Fetch and update Sts, initiate resize │ ├── pvc.go # Fetch relevant PVCs │ ├── backup.go # Create backup PVC and job │ ├── restore.go # Recreate PVC and restore data │ └── copy.go # Handle job creation ├── statefulset # Wrapper handling modification of k8s StatefulSet resources ├── pvc # Wrapper handling modification of k8s PVC resources ├── cmd # Entry point - main.go └── naming # General helper package for naming
vshn.net
There are five distinct steps that are part of a successful resize. Each of these steps is abstracted as an idempotent function that will return whether it completed or has completed successfully before.
All information on the state of the resize is stored as annotations of the StatefulSet.
The controller is notified of any StatefulSet change. This means as the very first step we detect whether the StatefulSet needs to be resized.
This first includes fetching the StatefulSet, and checking whether we are already in the process of resizing. It then finds all PVCs that are smaller then the PVC template of the StatefulSet. If there are any, they are stored on the StatefulSet and we proceed with the resizing.
Fetching the StatefulSet is handled in controllers/statefulset.go
.
Finding PVCs is handled in controller/pvc.go
.
Before any volume resizing can happen we scale down the StatefulSet to avoid data corruption. The original number replicas is stored as an annotation.
This is handled in controllers/statefulset.go
and statefulset/
.
As soon as the StatefulSet has scaled down, the controller initiates a backup of the to be resized PVCs.
This means for each PVC it will create a new PVC with the same size as the original and it will start a job that mounts both PVCs and will rsync
the data to the backup.
This is handled in controllers/backup.go
and controllers/copy.go
When the backup job completed successfully, the original PVC will be deleted and recreated with the new size. A new job will be started to restore the backed up data to the new, larger, PVC.
This is handled in controllers/restore.go
and controllers/copy.go
Most errors, like failing to connect to the Kubernetes API, will be treated as a transient error and the controller will retry the operation.
There are a few ways the resizing can potentially fail, that cannot be recovered.
The most relevant is when a copy job fails to complete.
In this case a CriticalError
will be returned.
On a CriticalError
the StatefulSet will be marked as failed and a critical Kubernetes event will be logged.
Depending on the type of error, the StatefulSet will scale back up to its original replicas.
Failed StatefulSet will not be resized again and depend on human intervention.