
Add Endpoints for Metrics and Probes #91

Open
Cardes opened this issue Jun 14, 2024 · 3 comments
Labels
enhancement New feature or request


Cardes commented Jun 14, 2024

Use Case:

  • As an Operator, I want the orchestrator to handle restarts if the container crashes; therefore I need readiness and liveness checks.
  • As an Operator, I also want to fetch metrics from the proxy so I can visualize its state and trigger alerts on issues.

Design Proposal:

Open Points:

  • Discuss whether it is feasible to add the endpoints to the existing ports or to add a dedicated port for both endpoints

Based on this discussion

Edit: Added Open Points

s-allius (Owner) commented:

Great idea. I also found an async version of the Prometheus client library.

I think I will start with the health check...
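For reference, the text format that Prometheus scrapes from a /metrics endpoint is simple enough to sketch without any client library. This is only an illustration of the output the endpoint would serve; the metric name and value here are hypothetical placeholders, not actual proxy metrics:

```python
def render_metrics(metrics: dict) -> str:
    """Render simple metrics in the Prometheus text exposition format.

    `metrics` maps a metric name to (help text, metric type, value).
    """
    lines = []
    for name, (help_text, mtype, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

# Hypothetical example metric for the proxy
sample = {"proxy_messages_total": ("Messages handled by the proxy", "counter", 42)}
```

In practice the async Prometheus client would generate this output for you; the sketch just shows what a scrape response looks like.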

@s-allius s-allius added the enhancement New feature or request label Jun 14, 2024

s-allius commented Jun 15, 2024

I implemented a first readiness check. It verifies that a proper config file exists and that both servers (ports 5005 and 10000) are started.
If everything is fine, http://172.16.30.7/-/ready returns HTTP code 200 and the text "Is ready". If there is a problem in the config file, it returns HTTP code 503 and the text "Not ready". It is also possible that the HTTP endpoint isn't available at all on errors or during startup.
Does this behaviour fit k8s?
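Not the actual implementation, but the described behaviour could be sketched with just the Python standard library; `check_config` and `check_servers` are hypothetical hooks into the proxy's state:

```python
from http.server import BaseHTTPRequestHandler

def check_config() -> bool:
    # Hypothetical: replace with the proxy's real config validation.
    return True

def check_servers() -> bool:
    # Hypothetical: replace with a check that ports 5005 and 10000 are listening.
    return True

def ready_response(config_ok: bool, servers_up: bool) -> tuple:
    """200 'Is ready' only when the config is valid and both servers are up,
    otherwise 503 'Not ready', as described above."""
    if config_ok and servers_up:
        return 200, "Is ready"
    return 503, "Not ready"

class ReadyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/-/ready":
            self.send_error(404)
            return
        status, body = ready_response(check_config(), check_servers())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())
```

Keeping the decision in a small pure function like `ready_response` also makes the check easy to unit-test without spinning up a server.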


For the health check, I will evaluate the processing time of the messages. This should actually be quite simple and centralized.
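The processing-time idea could be centralized in a small helper like this sketch; the names and the 2-second threshold are hypothetical choices, not part of the implementation:

```python
from collections import deque

# Sliding window of the most recent message processing durations (seconds).
_recent = deque(maxlen=100)

def record_processing(duration: float) -> None:
    """Call this wherever a message finishes processing."""
    _recent.append(duration)

def is_healthy(threshold: float = 2.0) -> bool:
    """Healthy if no recent message took longer than the threshold.
    An empty window (e.g. right after startup) counts as healthy."""
    return all(d <= threshold for d in _recent)
```

The /-/healthy endpoint would then return 200 or 503 based on `is_healthy()`, mirroring the readiness endpoint.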


At the moment I use port 8127 for the HTTP server. Or should we use port 8080 to make clear that it is an HTTP server?
My idea behind using a non-standard port is that the user normally won't have to map it to another port; port 8080 is surely used by a lot of containers. What do you think about that?


Cardes commented Jun 16, 2024

This looks good. All HTTP codes from 200 up to (but not including) 400 are considered a success, so 503 will be recognized as failed.
For startup purposes it's common to define an initialDelaySeconds or a failureThreshold, so no answer / a timeout works fine while the container is starting up.
Example:

  livenessProbe:
    httpGet:
      path: /-/healthy
      port: 8127
    initialDelaySeconds: 12 # time before the first probe
    periodSeconds: 20 # time between probes
    timeoutSeconds: 2 # how long to wait before no answer counts as a timeout (useful for high-load or low-compute environments)
    successThreshold: 1 # how many consecutive successful probes are needed to consider the container healthy
    failureThreshold: 2 # how many consecutive failed probes trigger the unhealthy state
    terminationGracePeriodSeconds: 10 # how long to wait between the shutdown signal and a forced stop of the container

The port could be any number above 1024 that's not used for a common service, so 8127 works fine I think. Host networking is often used in smaller setups, and an uncommon port helps users avoid port collisions there. In every other environment, port mapping / distinct service IPs should prevent any collisions.

Thanks for your effort, let me know if I can help test anything.

Regards,
Sebastian
