RKE2 configuration to best handle a node failure. #4157
Unanswered
David-A-Blankenship asked this question in Q&A
I am looking for verification and guidance on configuring my RKE2 cluster to best handle a node failure.
I have an RKE2 cluster (v1.25.7+rke2r1) deployed via Rancher (v2.7.2) into vSphere (v7.0.3) that consists of 3 control plane/etcd nodes and 3 worker nodes. I have been testing an involuntary disruption (a non-graceful node shutdown) via network partition, by removing a worker node from the network. Full disclosure: what I am actually disconnecting from the network contains the VMs for 1 control plane/etcd node and 1 worker node.
I am observing the behavior described here:
https://kubernetes.io/docs/concepts/architecture/nodes/#non-graceful-node-shutdown
“When a node is shutdown but not detected by kubelet's Node Shutdown Manager, the pods that are part of a StatefulSet will be stuck in terminating status on the shutdown node and cannot move to a new running node. This is because kubelet on the shutdown node is not available to delete the pods so the StatefulSet cannot create a new pod with the same name. If there are volumes used by the pods, the VolumeAttachments will not be deleted from the original shutdown node so the volumes used by these pods cannot be attached to a new running node. As a result, the application running on the StatefulSet cannot function properly. If the original shutdown node comes up, the pods will be deleted by kubelet and new pods will be created on a different running node. If the original shutdown node does not come up, these pods will be stuck in terminating status on the shutdown node forever.”
All of the pods on that node go into the Terminating state, and the pods that are part of a StatefulSet are not restarted. The pods that use an RWO volume are stuck in ContainerCreating, waiting for the volume to become available. The stateless pods and the pods attached to RWX and ROX volumes are restarted successfully on another node.
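For anyone reproducing this, the stuck pods and attachments are easy to watch with standard kubectl; the node, pod, and namespace names below are placeholders:

```sh
# Pods still bound to the partitioned node (node name is a placeholder)
kubectl get pods -A -o wide --field-selector spec.nodeName=worker-node-1

# VolumeAttachments that are still held by the unreachable node
kubectl get volumeattachments -o wide

# Attach/mount errors for a rescheduled pod stuck in ContainerCreating
kubectl describe pod <pod-name> -n <namespace>
```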
I enabled the NodeOutOfServiceVolumeDetach feature gate on kube-controller-manager. When I manually add the node.kubernetes.io/out-of-service taint with the NoExecute effect, all of the pods in the Terminating state are removed. The pods that are part of a StatefulSet and the pods that use an RWO volume are restarted, but get stuck in a ContainerCreating or Init state. Describing these pods shows "Unable to attach or mount volumes". vCenter shows attempts to "Detach a virtual disk" that fail with "Unable to communicate with the remote host", so this looks like a vSAN problem. When I bring the node back online, everything clears up quickly. Note that the NodeOutOfServiceVolumeDetach feature gate is still in Alpha in 1.25.
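For reference, enabling the gate and applying the taint looks roughly like this (the config file path is the standard RKE2 one, the taint value follows the Kubernetes docs example, and the node name is a placeholder; on a Rancher-provisioned cluster the controller-manager argument can also be set through the cluster configuration rather than by editing the file on each server node):

```yaml
# /etc/rancher/rke2/config.yaml on each control plane node
kube-controller-manager-arg:
  - "feature-gates=NodeOutOfServiceVolumeDetach=true"
```

```sh
# Mark the unreachable node out of service so its pods and volume
# attachments are forcibly cleaned up
kubectl taint nodes worker-node-1 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute

# Remove the taint once the node comes back or is replaced
kubectl taint nodes worker-node-1 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
```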
From an RKE2 perspective, it looks like everything is working as advertised. I am trying to set up a highly available application, and I know I need to architect the application itself to be redundant within the RKE2 cluster. Are there other RKE2 configuration settings I can use to handle failed nodes better, more easily, or without manual intervention?
Replies: 1 comment

- Node auto replacement and vSphere CPI (if you're not running those already).
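For illustration, node auto-replacement on a Rancher-provisioned RKE2 cluster is configured per machine pool on the provisioning cluster object. The sketch below is based on my reading of the Rancher 2.7 provisioning spec, so verify the exact field names against your version; all names and values are placeholders:

```yaml
# provisioning.cattle.io/v1 Cluster (excerpt; names and values are illustrative)
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: my-rke2-cluster
  namespace: fleet-default
spec:
  rkeConfig:
    machinePools:
      - name: worker-pool
        workerRole: true
        quantity: 3
        # Delete and recreate a machine that stays unhealthy for this long
        unhealthyNodeTimeout: 5m
        machineConfigRef:
          kind: VmwarevsphereConfig
          name: worker-pool-config
```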