Die quicker, controller #265

Merged: 1 commit, Oct 17, 2024

plasorak
Collaborator

This PR corrects a typo in the data_type for the RunControlMessage.
More importantly, it changes the behaviour if we can't reach the connectivity server on retract: if that's the case (meaning the connectivity server has probably been killed before the controller), we abort.
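The retract behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the drunc implementation: `shutdown_controller` and the `retract` callable are hypothetical names, and it assumes that an unreachable connectivity server surfaces as an `OSError` (for example `ConnectionError` or `TimeoutError`).

```python
from typing import Callable


def shutdown_controller(retract: Callable[[], None]) -> str:
    """Retract from the connectivity server, aborting if it is unreachable.

    `retract` is a hypothetical callable that removes this controller's
    entry from the connectivity server and raises an OSError subclass
    when the server cannot be reached (an assumption of this sketch).
    """
    try:
        retract()
    except OSError:
        # The server has probably been killed before the controller,
        # so give up immediately instead of waiting or retrying.
        return "aborted"
    return "retracted"
```

The point of the design is that the controller stops waiting on a server that is already gone, which is what shortens the time the process lingers after shutdown.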

Fixes #204, again

…mportantly if we cant reach the connectivty server on retract, abort
Member

@eflumerf eflumerf left a comment

minimal_system_quick_test still works

@bieryAtFnal
Contributor

I'm not sure if I understand what I should see with these changes...

Without them, I see drunc-controller processes hang around for about 30 seconds after drunc exits when I run an interactive DAQ session on daq.fnal.gov.

With them, the drunc-controller processes hang around for less than 10 seconds, but they are still there when drunc exits.

Am I looking at the wrong thing?
If not, shouldn't success be indicated by no drunc-controller processes running when the drunc-interactive-shell exits?

@plasorak
Collaborator Author

You are not looking at the wrong thing. On the np04 cluster, the processes still exist for around 2 seconds.

I don't think it's trivial to add a check that no processes are left when drunc exits. The process manager sends SIGHUP to the processes when it exits, but it does not track their PIDs.
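For illustration only, a check of this kind might look like the sketch below. It is not how the drunc process manager works: `TrackedProcesses` is a hypothetical helper, and it assumes the processes are local children started with `subprocess.Popen` on a POSIX system. Processes launched on remote hosts would need a different mechanism.

```python
import signal
import subprocess
import sys


class TrackedProcesses:
    """Hypothetical registry that remembers every child it starts."""

    def __init__(self) -> None:
        self._procs: list[subprocess.Popen] = []

    def spawn(self, argv: list[str]) -> int:
        """Start a child process and remember it; return its PID."""
        proc = subprocess.Popen(argv)
        self._procs.append(proc)
        return proc.pid

    def hangup_and_check(self, timeout: float = 5.0) -> list[int]:
        """Send SIGHUP to live children and return PIDs that did not exit."""
        for proc in self._procs:
            if proc.poll() is None:  # still running
                proc.send_signal(signal.SIGHUP)
        leftover = []
        for proc in self._procs:
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                leftover.append(proc.pid)
        return leftover


if __name__ == "__main__":
    tracker = TrackedProcesses()
    tracker.spawn([sys.executable, "-c", "import time; time.sleep(60)"])
    print("still running after SIGHUP:", tracker.hangup_and_check())
```

An empty list from `hangup_and_check` would mean every tracked child exited within the timeout.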

@bieryAtFnal
Contributor

Thanks for the update.

I think that it's very important to not leave processes hanging around when run control exits. Should I file a separate Issue for that?

@plasorak
Collaborator Author

Yeah, I think so, this is quite a bit more complicated than what I envisaged.

@plasorak plasorak merged commit 8c70ed7 into develop Oct 17, 2024
1 check passed
@plasorak plasorak deleted the plasorak/faster-controller-stop branch October 17, 2024 14:35
Development

Successfully merging this pull request may close these issues.

drunc-controller process sometimes does not get cleaned up