Alleviate race conditions in roll restart reconciler #694
base: main
Conversation
Force-pushed from 9fdb5e9 to e16fde6
Hey @geobeau, thanks for your contribution. We can ship this change soon; is it possible to add some unit tests to this PR?
Force-pushed from e16fde6 to ebe3a8d
Refactor the code to separate each step clearly. In the first loop there were 2 operations:
- check if there is a pending update
- check if all pods are ready

However, we can break on some conditions and skip the health checking. To simplify maintainability, it is now split into 2 separate loops:
- first check if restarts are pending
- then check that all pools are healthy

Signed-off-by: Geoffrey Beausire <g.beausire@criteo.com>
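A minimal sketch of the two-loop structure this commit describes, assuming a reconciler that already holds the list of StatefulSets; the revision-comparison condition, function name, and requeue interval are assumptions, not the PR's exact code:

```go
package reconcilers

import (
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/utils/pointer"
	ctrl "sigs.k8s.io/controller-runtime"
)

// restartPendingThenHealthy splits the old single loop into the two phases
// described in the commit: first decide whether any restart is pending at
// all, then verify every pool's readiness before doing any work.
func restartPendingThenHealthy(statefulSets []appsv1.StatefulSet) (ctrl.Result, bool) {
	// Loop 1: is any rolling restart pending? We can break early here
	// without having run any of the health checks.
	pending := false
	for _, sts := range statefulSets {
		if sts.Status.UpdateRevision != "" && sts.Status.CurrentRevision != sts.Status.UpdateRevision {
			pending = true
			break
		}
	}
	if !pending {
		return ctrl.Result{}, false // nothing to restart
	}
	// Loop 2: every pool must report all replicas ready before proceeding.
	for _, sts := range statefulSets {
		if sts.Status.ReadyReplicas != pointer.Int32Deref(sts.Spec.Replicas, 1) {
			return ctrl.Result{Requeue: true, RequeueAfter: 10 * time.Second}, false
		}
	}
	return ctrl.Result{}, true // a restart is pending and all pools are healthy
}
```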
Refactor the order of restarts to simplify the reconcile function and make it more deterministic. Before, stuck pods could be restarted and a rolling restart performed in the same pass; now, stuck pods are always restarted first. Stuck pods are also restarted before checking the number of ready replicas, because if we checked first we would never reach that point. Finally, we don't proceed with a rolling restart if any stuck pod was deleted, to avoid performing any dangerous actions.

Signed-off-by: Geoffrey Beausire <g.beausire@criteo.com>
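A sketch of that ordering, under stated assumptions: DeleteStuckPodWithOlderRevision is the helper named in the PR diff, injected here as a function value so the example stands alone; the wrapper function and requeue interval are illustrative.

```go
package reconcilers

import (
	"time"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteStuckPodFn matches the shape of helpers.DeleteStuckPodWithOlderRevision
// as used in the PR diff; it reports whether a stuck pod was deleted.
type deleteStuckPodFn func(c client.Client, sts *appsv1.StatefulSet) (bool, error)

// handleStuckPodsFirst restarts stuck pods before any readiness check, and
// tells the caller to abort the rolling restart for this pass whenever a
// stuck pod was just deleted.
func handleStuckPodsFirst(c client.Client, statefulSets []appsv1.StatefulSet, deleteStuck deleteStuckPodFn) (ctrl.Result, bool, error) {
	anyRestartedPod := false
	for _, sts := range statefulSets {
		restartedPod, err := deleteStuck(c, &sts)
		if err != nil {
			return ctrl.Result{}, false, err
		}
		if restartedPod {
			anyRestartedPod = true
		}
	}
	if anyRestartedPod {
		// A pod was just deleted; requeue instead of risking a second
		// disruption in the same reconcile pass.
		return ctrl.Result{Requeue: true, RequeueAfter: 10 * time.Second}, false, nil
	}
	return ctrl.Result{}, true, nil // safe to continue with the rolling restart
}
```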
When a rolling restart is triggered with at least 2 pools, a race condition can trigger the roll restart of a pod in each pool. This can lead to a red cluster.

Normally, 3 checks prevent this from happening:
- check the status.ReadyReplicas of all sts before moving forward
- for each nodePool, check that all replicas are ready by listing pods directly
- before deleting a pod, check against the OpenSearch API that the cluster is healthy

In practice, this is not enough. Consider the rollRestart of 2 nodePools:
- data
- masterData

The following sequence can happen:
- a rollRestart is triggered
- the reconcile function is called
- data and masterData have all their pods ready
- a data pod is deleted; the pod is terminating (NOT terminated yet)
- the reconcile function is called again
- data and masterData still have all their pods ready from the status.ReadyReplicas point of view (it does not reflect the change yet)
- data is seen as unhealthy thanks to CountRunningPodsForNodePool
- masterData is seen as healthy because all its pods are ready
- OpenSearch is still healthy, because the deleted pod is not terminated yet
- a pod in masterData is restarted
- the cluster is red!

This commit makes sure we check the readiness of all nodePools using CountRunningPodsForNodePool before trying to restart any pool.

Signed-off-by: Geoffrey Beausire <g.beausire@criteo.com>
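A sketch of the added guard, assuming CountRunningPodsForNodePool (named in the commit) is injected as a function; the wrapper, pool map, and requeue interval are illustrative:

```go
package reconcilers

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// countRunningPods stands in for helpers.CountRunningPodsForNodePool: unlike
// status.ReadyReplicas, it lists pods directly, so a pod that is still
// terminating is not counted as running.
type countRunningPods func(nodePool string) (int, error)

// allPoolsFullyRunning is the cross-pool guard described above: every
// nodePool is verified before *any* pool is allowed to delete a pod, which
// closes the data/masterData race in the sequence above.
func allPoolsFullyRunning(desired map[string]int, count countRunningPods) (ctrl.Result, bool, error) {
	for pool, want := range desired {
		got, err := count(pool)
		if err != nil {
			return ctrl.Result{}, false, err
		}
		if got != want {
			// Some pod is terminating or not yet ready; requeue rather
			// than restarting a pod in a different pool.
			return ctrl.Result{Requeue: true, RequeueAfter: 10 * time.Second}, false, nil
		}
	}
	return ctrl.Result{}, true, nil
}
```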
Force-pushed from ebe3a8d to 2543804
@salyh hello, I rebased and added some more unit tests.
Thanks @geobeau, let me take a look at this PR.
@salyh @swoehrl-mw @jochenkressin @pchmielnik, please check this PR.
Logic looks good, just some minor gripes about naming and stuff.
// Check if there is any crashed pod. Delete it if there is any update in sts.
any_restarted_pod := false
for _, sts := range statefulSets {
    restared_pod, err := helpers.DeleteStuckPodWithOlderRevision(r.client, &sts)
Typo
Suggested change:
- restared_pod, err := helpers.DeleteStuckPodWithOlderRevision(r.client, &sts)
+ restarted_pod, err := helpers.DeleteStuckPodWithOlderRevision(r.client, &sts)
@@ -118,6 +116,47 @@ func (r *RollingRestartReconciler) Reconcile() (ctrl.Result, error) {
    return ctrl.Result{}, nil
}

// Check if there is any crashed pod. Delete it if there is any update in sts.
any_restarted_pod := false
Please use camelCase for any variables
    return ctrl.Result{}, err
}
if sts.Status.ReadyReplicas != pointer.Int32Deref(sts.Spec.Replicas, 1) {
    return ctrl.Result{
Please add a log line (can be debug) so it is visible that the operator is waiting for pods to become ready.
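For reference, a minimal sketch of such a log line, assuming the reconciler carries a controller-runtime logr.Logger as r.logger (the field name is an assumption); it would slot into the quoted hunk just before the requeue:

```go
// Not ready yet: log at debug verbosity so operators can see why the
// reconciler keeps requeueing, then wait.
r.logger.V(1).Info("waiting for pods to become ready before rolling restart",
	"statefulset", sts.Name,
	"readyReplicas", sts.Status.ReadyReplicas,
	"desiredReplicas", pointer.Int32Deref(sts.Spec.Replicas, 1))
```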
)

func TestHelpers(t *testing.T) {
    RegisterFailHandler(Fail)
    RunSpecs(t, "Helpers Suite")
}

var _ = Describe("Helpers", func() {
This is not the correct file for the tests; they should go in a file helpers_test.go. The helpers_suite_test.go should only contain the init/start code for the tests of the entire package.
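A sketch of the split the reviewer asks for, following the standard Ginkgo layout (the Ginkgo/Gomega import paths assume v2; the repo may pin a different major version):

```go
// helpers_suite_test.go: only the suite bootstrap lives here.
package helpers

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestHelpers(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Helpers Suite")
}
```

```go
// helpers_test.go: the actual specs move here.
package helpers

import (
	. "github.com/onsi/ginkgo/v2"
)

var _ = Describe("Helpers", func() {
	// It(...) specs for DeleteStuckPodWithOlderRevision and friends go here.
})
```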
@geobeau any chance you would pick up this PR again? Otherwise I would offer to fix the remaining remarks.
Description
When a rolling restart is triggered with at least 2 pools, a race condition can trigger the roll restart of a pod in each pool. This can lead to a red cluster.
Normally, to prevent this from happening, there are 3 checks:
- check the status.ReadyReplicas of all sts before moving forward
- for each nodePool, check that all replicas are ready by listing pods directly
- before deleting a pod, check against the OpenSearch API that the cluster is healthy
In practice, it is not enough.
Consider the rollRestart of 2 nodePools:
- data
- masterData
The following sequence can happen:
- a rollRestart is triggered
- the reconcile function is called
- data and masterData have all their pods ready
- a data pod is deleted; the pod is terminating (NOT terminated yet)
- the reconcile function is called again
- data and masterData still have all their pods ready from the status.ReadyReplicas point of view (it does not reflect the change yet)
- data is seen as unhealthy thanks to CountRunningPodsForNodePool
- masterData is seen as healthy because all its pods are ready
- OpenSearch is still healthy, because the deleted pod is not terminated yet
- a pod in masterData is restarted
- the cluster is red!
Additionally, I added 2 commits to refactor the Reconcile function to make the intent of each block more obvious.
Issues Resolved
Probably #650
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.