Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alleviate race conditions in roll restart reconciler #694

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Commits on Apr 8, 2024

  1. Seperate pending update check from health check during roll restart

    Refactor the code to seperate each steps clearly.
    In the first loop there was 2 operations:
    - Check if there is a pending update
    - Check if all pods are ready
    However, we can break in some conditions and skip the
    health checking.
    To simplify the maintability of the scope, it is now split in
    2 separate loops:
    - first check if restarts are pending
    - then check that all pools are healthy
    
    Signed-off-by: Geoffrey Beausire <g.beausire@criteo.com>
    geobeau committed Apr 8, 2024
    Configuration menu
    Copy the full SHA
    64ac0c2 View commit details
    Browse the repository at this point in the history
  2. Separate the rolling update loop from the pod stuck restart

    Refactor the order or restart to simplify the reconcile function
    and make it more deterministic.
    
    Before we could maybe restart stuck pods then perform roll
    restart. Now, stuck pods are always restarted first.
    Also, stuck pods are restarted before checking the number
    of ready replicas, because if we check before we will never
    reach this point.
    Finally, we don't process a rolling restart if any pods stuck
    was deleted, to avoid performing any dangerous actions.
    
    Signed-off-by: Geoffrey Beausire <g.beausire@criteo.com>
    geobeau committed Apr 8, 2024
    Configuration menu
    Copy the full SHA
    c2a3df2 View commit details
    Browse the repository at this point in the history
  3. Alleviate race conditions in roll restart reconciler

    When a Roll Restart is triggered with at least 2 pools, a race
    condition can trigger the roll restart of a pods in each pools.
    This can lead to a red cluster.
    Normally to prevent this from happening, there are 3 checks:
    - check the status.ReadyReplicas of all sts before moving forward.
    - for each nodePool, check that that all replicas are ready by listing
      pods directly
    - before deleting a pod, a check is made on the OpenSearch to see
      if the cluster is healthy
    
    In practice, it is not enough. Considering the rollRestart of 2 nodePool:
    - data
    - masterData
    
    The following sequence can happen:
    - a rollRestart is triggered
    - reconcile function is called
    - data and masterData have all their pods ready
    - data pod is deleted; pods is terminating (NOT terminated yet)
    - reconcile function is recalled
    - data and masterData have all their pods ready from the status.ReadyReplicas
      point of view (because it didn't see the change yet)
    - data is seen as unhealthy thanks to CountRunningPodsForNodePool
    - masterData is seen as healthy because all its pods are ready
    - Opensearch is still healthy, because the deleted pod is not terminated yet
    - A pod in masterData is restarted
    - Cluster is red!
    
    This commit make sure we check readiness of all nodePool using
    CountRunningPodsForNodePool before trying to restart any pools.
    
    Signed-off-by: Geoffrey Beausire <g.beausire@criteo.com>
    geobeau committed Apr 8, 2024
    Configuration menu
    Copy the full SHA
    2543804 View commit details
    Browse the repository at this point in the history