staging.docs.ci.ocaml.org is unreachable #67

Open
mtelvers opened this issue Aug 24, 2023 · 10 comments

@mtelvers
Collaborator

staging.docs.ci.ocaml.org is unreachable over SSH and HTTPS. Please can it be rebooted?

@avsm
Member

avsm commented Aug 29, 2023

Rebooted; nothing on the console, but I suspect the OOM killer. We do actually need to save a non-SSH-key login to these machines so we can access the dmesg output (or have a log helper that shuttles the logs out regularly, but that doesn't help debug OOM-killer-related failures).

@tmcgilchrist
Collaborator

Thanks @avsm I am working on restoring staging.docs.ci.ocaml.org.

@tmcgilchrist
Collaborator

@avsm Can you check on staging.docs.ci.ocaml.org again? I set it up running ocaml-docs-ci from a clean slate, but now it's unreachable over SSH and HTTPS again.

@tmcgilchrist
Collaborator

> Rebooted; nothing on the console, but I suspect OOM killer.

The last time I checked from my console, ocaml-docs-ci was using approximately 10 GB of RAM and was stable at that amount. Nothing else was using large amounts of RAM, and only the local solver instances would be using significant CPU while they were resolving.

@tmcgilchrist
Collaborator

Seems to be back this morning @avsm

@tmcgilchrist
Collaborator

And now it's unreachable again. @avsm, I'm curious to see whether there are network errors or unexpected shutdowns on that machine.

@avsm
Member

avsm commented Sep 7, 2023

Rebooted. No indications of anything untoward on the console...

@rikusilvola
Contributor

The missing dashboards on Grafana have been fixed, but for some reason the only data we have for staging is a blip in July.
This needs further investigation, but, unsurprisingly, the server is once more unreachable.
While waiting for this staging server to be restarted, we'll see whether the issue can be reproduced on a VM.

@tmcgilchrist
Collaborator

@avsm This machine is unavailable again; I'm really confused about what is happening with it.

@mtelvers has set up an alternative instance at https://staging.docs.ci.ocamllabs.io, which we have switched over to using. Additionally, we have a working docker-compose setup for the entire ocaml-docs-ci pipeline, so I think we can remove this machine for now.

@tmcgilchrist
Collaborator

Ping @avsm

4 participants