Implement ShutdownJobRule with golang for RayJob #2454

Open
Tracked by #2310
MortalHappiness opened this issue Oct 18, 2024 · 0 comments
Labels: ci, good first issue

MortalHappiness commented Oct 18, 2024

This is a subtask of #2310. See the parent issue for more information.

Implementation Details

  • Implement the ShutdownJobRule here with Golang.
    class ShutdownJobRule(Rule):
        """Check the Ray cluster is shutdown when setting `spec.shutdownAfterJobFinishes` to true."""
        def assert_rule(self, custom_resource=None, cr_namespace='default'):
            custom_api = K8S_CLUSTER_MANAGER.k8s_client_dict[CONST.K8S_CR_CLIENT_KEY]
            # Wait for there to be no RayClusters
            logger.info("Waiting for RayCluster to be deleted...")
            for i in range(30):
                rayclusters = custom_api.list_namespaced_custom_object(
                    group = 'ray.io', version = 'v1', namespace = cr_namespace,
                    plural = 'rayclusters')
                # print debug log
                if i != 0:
                    logger.info("ShutdownJobRule wait() hasn't converged yet.")
                    logger.info("Number of RayClusters: %d", len(rayclusters["items"]))
                if len(rayclusters["items"]) == 0:
                    break
                time.sleep(1)
            else:
                raise TimeoutError("RayCluster hasn't been deleted in 30 seconds.")
            logger.info("RayCluster has been deleted.")

        def trigger_condition(self, custom_resource=None) -> bool:
            # Trigger if shutdownAfterJobFinishes is set to true
            steps = "spec.shutdownAfterJobFinishes".split('.')
            value = search_path(custom_resource, steps)
            logger.info("ShutdownJobRule trigger_condition(): %s", value)
            assert isinstance(value, bool) or value is None
            return value is not None and value
  • Needs to wait for #2452 ("Check RayJob ray job submit can successfully create a job by running it in the head Pod in the corresponding RayCluster") to be done first.