
Enhance SLO CRD for Multi-Cluster Support #1291

Open
Nitesh-vaidyanath opened this issue Oct 18, 2024 · 1 comment

@Nitesh-vaidyanath

Description:

We need to enhance our ServiceLevelObjective (SLO) Custom Resource Definition (CRD) to better support multi-cluster deployments with a centralized monitoring solution. The current implementation has limitations when applied across multiple Kubernetes clusters, particularly concerning namespace deletion and recording rule management.

Current Problem:

  • When the SLO CRD is integrated with Helm charts and applied to multiple Kubernetes clusters, deleting a namespace and group in one cluster can affect the entire group of rules in the centralized monitoring solution. There is also no way to modify or control the namespace name used for the recording rules.
  • There is no mechanism to append newly reconciled rules from different Kubernetes clusters to the same group without overwriting the existing rules for a given namespace.

Desired Solution:

  • Implement a mechanism that creates a new namespace (configured through a new attribute in the CRD) within the same group when rules are reconciled from different Kubernetes clusters into the centralized monitoring solution. By default, the namespace name can stay the same as the group name unless a namespace is defined in the CRD.
  • When an SLO object is deleted in any given cluster, only the rules for its namespace should be deleted, not the entire group.
  • Implement a finalizer that deletes the entire group only when a single namespace is associated with it.
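
As a rough illustration of the desired semantics, the Python sketch below models the centralized backend in memory as groups that contain namespaces, each holding recording rules. Everything in it is hypothetical: the `SLO` fields (with `rule_namespace` standing in for the proposed CRD attribute), `CentralRuleStore`, and its `upsert`/`finalize` methods are illustrative names, not part of the existing operator or of any monitoring backend's API. An upsert replaces only the SLO's own namespace, and the finalizer removes only that namespace unless it is the last one left in the group.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SLO:
    """Hypothetical, minimal view of a ServiceLevelObjective object."""

    name: str
    group: str
    # Stand-in for the proposed new CRD attribute; None means "not set".
    rule_namespace: Optional[str] = None
    rules: List[str] = field(default_factory=list)

    def effective_namespace(self) -> str:
        # Default: reuse the group name unless the CRD sets a namespace.
        return self.rule_namespace or self.group


class CentralRuleStore:
    """In-memory stand-in for the centralized backend: group -> namespace -> rules."""

    def __init__(self) -> None:
        self.groups: Dict[str, Dict[str, List[str]]] = {}

    def upsert(self, slo: SLO) -> None:
        # Replace only this SLO's namespace; other clusters' namespaces are untouched.
        namespaces = self.groups.setdefault(slo.group, {})
        namespaces[slo.effective_namespace()] = list(slo.rules)

    def finalize(self, slo: SLO) -> None:
        """Cleanup the finalizer would run before the SLO object is removed."""
        namespaces = self.groups.get(slo.group)
        if namespaces is None:
            return  # group already gone; finalizers should be idempotent
        ns = slo.effective_namespace()
        if set(namespaces) <= {ns}:
            # Ours is the only namespace left (or none remain): delete the whole group.
            del self.groups[slo.group]
        else:
            # Other clusters still own namespaces in this group: remove only ours.
            namespaces.pop(ns, None)


store = CentralRuleStore()
a = SLO("api-availability", group="api", rule_namespace="cluster-a", rules=["rule-a"])
b = SLO("api-availability", group="api", rule_namespace="cluster-b", rules=["rule-b"])
store.upsert(a)
store.upsert(b)
store.finalize(a)
print(store.groups)  # {'api': {'cluster-b': ['rule-b']}}
store.finalize(b)
print(store.groups)  # {}
```

The gap between checking the group's namespaces and deleting in `finalize` is exactly where operators in two clusters could race, which is what the technical considerations below are concerned with.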

Technical Considerations:

  • This solution may introduce race conditions, since multiple operators will be modifying the same group.
  • The operator should be designed to reconcile again on failure, mitigating issues caused by concurrent operations.
  • (Optional) If required, implement proper locking, or use a distributed locking service (behind an interface to a global distributed lock) to manage concurrent access to shared resources.
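
One possible way to handle these races without a distributed lock is optimistic concurrency: read the group together with a version, apply the change, and write it back only if the version is unchanged, re-reading and retrying on conflict. The sketch below assumes the backend offers such a compare-and-swap on a rule group; `VersionedGroupStore`, `ConflictError`, and `update_group` are hypothetical names for illustration only. If the real backend has no such primitive, the distributed locking option would be needed instead.

```python
import copy
from typing import Callable, Dict, List, Tuple

Namespaces = Dict[str, List[str]]  # namespace name -> recording rules


class ConflictError(Exception):
    """Raised when a group changed after it was read (assumed backend behavior)."""


class VersionedGroupStore:
    """Hypothetical in-memory backend offering compare-and-swap per rule group."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Namespaces]] = {}

    def read(self, group: str) -> Tuple[int, Namespaces]:
        # Return the current version and a private copy the caller may mutate.
        version, namespaces = self._data.get(group, (0, {}))
        return version, copy.deepcopy(namespaces)

    def write_if_version(self, group: str, expected: int, namespaces: Namespaces) -> None:
        # Reject the write if another operator updated the group since our read.
        current, _ = self._data.get(group, (0, {}))
        if current != expected:
            raise ConflictError(group)
        self._data[group] = (current + 1, copy.deepcopy(namespaces))


def update_group(
    store: VersionedGroupStore,
    group: str,
    mutate: Callable[[Namespaces], None],
    max_attempts: int = 5,
) -> int:
    """Read-modify-write with retry on conflict; returns the attempt that succeeded."""
    for attempt in range(1, max_attempts + 1):
        version, namespaces = store.read(group)
        mutate(namespaces)  # e.g. set or remove only this cluster's namespace
        try:
            store.write_if_version(group, version, namespaces)
            return attempt
        except ConflictError:
            continue  # someone else wrote first: re-read and re-apply our change
    raise ConflictError(f"giving up on group {group!r} after {max_attempts} attempts")


# Usage: cluster B's operator races with cluster A's operator on group "api".
store = VersionedGroupStore()
update_group(store, "api", lambda ns: ns.__setitem__("cluster-a", ["rule-a"]))

raced = []


def add_cluster_b(ns: Namespaces) -> None:
    if not raced:
        raced.append(True)
        # Simulate cluster A writing between cluster B's read and write.
        update_group(store, "api", lambda other: other.__setitem__("cluster-a", ["rule-a2"]))
    ns["cluster-b"] = ["rule-b"]


attempts = update_group(store, "api", add_cluster_b)
print(attempts)  # 2
print(store.read("api"))  # (3, {'cluster-a': ['rule-a2'], 'cluster-b': ['rule-b']})
```

Because each write only succeeds against the version it read, cluster A's concurrent update is not lost. The retry loop also fits the "reconcile on failure" point: a rejected write simply leads to another attempt, which in a real controller would typically be a requeue.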

Acceptance Criteria:

  • SLO rules from multiple clusters can each create a namespace in the same group in the centralized monitoring solution.
  • Deleting an SLO object in one cluster removes only the rules for the namespace defined in the CRD.
  • A finalizer is implemented that deletes the entire group when only one namespace remains associated with it.
  • (Optional; out of scope) The solution is resilient to race conditions and can handle concurrent operations from multiple clusters.

@Nitesh-vaidyanath
Author

I can work on this if there are no concerns.
