Fix rdspec and protectedpvcs condition #1605

Open
wants to merge 4 commits into
base: main

BenamarMk
Member

This PR includes critical fixes for CephFS workloads. Without them, relocation occasionally stalled indefinitely in the WaitForReadiness progression.

Key Changes:

  1. Fix for RDSpec List Alternation
    Addressed an issue where frequent VRG resource updates caused the RDSpec list to alternate between an empty and a non-empty list. If the list happened to be empty during failover or relocation, PVC restore was skipped, leaving the recovery incomplete and halting the process.

  2. Fix for ProtectedPVC PVsRestored Condition
    In certain edge cases, ProtectedPVCs could permanently fail to get the PVsRestored condition, which caused the relocate process to get stuck in the WaitForReadiness progression. This fix ensures the condition is consistently applied, preventing the relocation from stalling.

  3. Refactor of ManifestWork Creation Function
    The utility function that creates ManifestWork has been refactored to return the last operation result (created, updated, or none) alongside any errors. This change allows tracking of whether a ManifestWork resource was newly created, updated, or left unchanged.
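
A minimal sketch of the "return the operation result alongside the error" shape described in item 3. The `OperationResult` type, the simplified `ManifestWork` struct, and the in-memory `Store` are all hypothetical stand-ins for illustration; they are not the actual code in this PR.

```go
package main

import (
	"errors"
	"reflect"
)

// OperationResult reports what CreateOrUpdate did. The three values match
// the ones named in the PR description; the type itself is hypothetical.
type OperationResult string

const (
	OperationResultNone    OperationResult = "none"
	OperationResultCreated OperationResult = "created"
	OperationResultUpdated OperationResult = "updated"
)

// ManifestWork is a simplified stand-in for the real resource (assumption).
type ManifestWork struct {
	Name      string
	Manifests []string
}

// Store is a toy in-memory replacement for the cluster API (assumption).
type Store map[string]ManifestWork

// CreateOrUpdate writes mw to the store and reports whether the object was
// newly created, updated, or left unchanged.
func (s Store) CreateOrUpdate(mw ManifestWork) (OperationResult, error) {
	if mw.Name == "" {
		return OperationResultNone, errors.New("manifestwork name is empty")
	}

	found, ok := s[mw.Name]
	if !ok {
		s[mw.Name] = mw

		return OperationResultCreated, nil
	}

	// Nothing to do when the desired content already matches.
	if reflect.DeepEqual(found.Manifests, mw.Manifests) {
		return OperationResultNone, nil
	}

	s[mw.Name] = mw

	return OperationResultUpdated, nil
}
```

A caller can then branch on the result, for example to act only when the result is `OperationResultCreated` or `OperationResultUpdated` and do nothing further on `OperationResultNone`.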

Fixes Bug: 2319334

Benamar Mekhissi added 4 commits October 24, 2024 10:40
Fix an issue where the VRG resource was frequently updated, causing the RDSpec
to alternate between an empty and non-empty list. This behavior directly impacted
failover and relocation. If the list was empty during these actions, PVC restore
was skipped, leading to incomplete recovery.

Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
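
The diff for this commit is not shown here, so the following is only an illustration of one possible guard against the alternation described above, not the fix itself. `RDSpec` and `mergeRDSpecs` are hypothetical names, and the sketch assumes that a freshly computed empty list means "not known yet" rather than "remove everything".

```go
package main

// RDSpec is a simplified stand-in for a replication destination spec
// (hypothetical; the real type carries more fields).
type RDSpec struct {
	PVCName string
}

// mergeRDSpecs chooses the list to store when the resource is updated.
// Assumption: an empty computed list is treated as missing information,
// so a previously stored non-empty list is kept instead of being wiped.
func mergeRDSpecs(existing, computed []RDSpec) []RDSpec {
	if len(computed) == 0 && len(existing) > 0 {
		return existing
	}

	return computed
}
```

The trade-off of this approach is that an intentionally empty list can no longer be expressed through the same path, so a real implementation would need a separate signal for that case.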
This commit modifies the utility function that creates the ManifestWork to return
an additional value indicating the last operation result alongside the error. The
result can be one of three values: created, updated, or none. This change is
needed to track whether the ManifestWork resource was newly created, updated, or
left unchanged.

Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
In certain edge cases, ProtectedPVCs may permanently fail to get the PVsRestored
condition, causing the relocate process to get stuck in the WaitForReadiness
progression.

Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
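
To illustrate why a missing condition stalls progress, here is a rough sketch of an idempotent condition update and a readiness check that requires PVsRestored on every ProtectedPVC. The `Condition` and `ProtectedPVC` structs, `setCondition`, and `allRestored` are simplified, hypothetical stand-ins and not the code changed by this commit; a real controller would typically use `metav1.Condition` and a helper such as `meta.SetStatusCondition` from apimachinery.

```go
package main

// Condition is a trimmed-down stand-in for metav1.Condition (assumption).
type Condition struct {
	Type   string
	Status string
	Reason string
}

// ProtectedPVC is a simplified stand-in for the VRG status entry (assumption).
type ProtectedPVC struct {
	Name       string
	Conditions []Condition
}

const conditionPVsRestored = "PVsRestored"

// setCondition adds or replaces the condition of the same type and reports
// whether anything changed, so repeated reconciles stay idempotent.
func setCondition(pvc *ProtectedPVC, c Condition) bool {
	for i := range pvc.Conditions {
		if pvc.Conditions[i].Type == c.Type {
			if pvc.Conditions[i] == c {
				return false
			}

			pvc.Conditions[i] = c

			return true
		}
	}

	pvc.Conditions = append(pvc.Conditions, c)

	return true
}

// allRestored returns true only when every PVC carries PVsRestored=True.
// A single PVC that never gets the condition keeps this false forever.
func allRestored(pvcs []ProtectedPVC) bool {
	for _, pvc := range pvcs {
		restored := false

		for _, c := range pvc.Conditions {
			if c.Type == conditionPVsRestored && c.Status == "True" {
				restored = true

				break
			}
		}

		if !restored {
			return false
		}
	}

	return true
}
```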