
WIP: Add signals to expose internal Waitress events #388

Closed
wants to merge 1 commit into from

Conversation

NicolasLM

In order to expose how loaded waitress is, this commit adds a set of signals. They act as an API allowing external code to react to events like the creation of a channel or the start of execution of a task.

This allows third-parties to develop extensions to track the load of Waitress via their favorite monitoring solutions.

Support for signals depends on the availability of the "blinker" library. When it is not available, signals act as a no-op and cannot be subscribed to.
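A common shape for that kind of optional dependency is a conditional import with a no-op fallback. The sketch below is illustrative, not the PR's actual code; the signal name and `make_signal` helper are assumptions:

```python
# Sketch of an optional "blinker" dependency: real signals when the
# library is installed, a no-op stand-in otherwise. Names are
# illustrative, not the PR's actual API.
try:
    from blinker import Namespace

    _signals = Namespace()

    def make_signal(name):
        return _signals.signal(name)

except ImportError:

    class _NoopSignal:
        """Swallows sends; subscribing is not supported without blinker."""

        def send(self, sender=None, **kwargs):
            pass

        def connect(self, receiver, **kwargs):
            raise RuntimeError("blinker is required to subscribe to signals")

    def make_signal(name):
        return _NoopSignal()


task_started = make_signal("waitress.task_started")
task_started.send(None, thread_name="waitress-0")  # safe with or without blinker
```

Emitting code can then call `send()` unconditionally, keeping the conditional logic in one place.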

@NicolasLM
Author

This is a proposed solution for #182 using signals.

I included an example that uses the exposed signals to show the number of worker threads busy handling requests. Executing it and sending a few requests leads to:

Started server listening on ('0.0.0.0', 8080) with 2 threads
Thread waitress-0 started task, 1/2
Thread waitress-1 started task, 2/2
Thread waitress-0 finished task, 1/2
Thread waitress-1 finished task, 0/2
Thread waitress-0 started task, 1/2
Thread waitress-1 started task, 2/2
WARNING:waitress.queue:Task queue depth is 1
WARNING:waitress.queue:Task queue depth is 2
Thread waitress-0 finished task, 1/2
Thread waitress-0 started task, 2/2
Thread waitress-1 finished task, 1/2
Thread waitress-1 started task, 2/2
Thread waitress-0 finished task, 1/2
Thread waitress-1 finished task, 0/2
Stopped server listening on ('0.0.0.0', 8080)

Instead of just printing those, users could feed these numbers to Prometheus and start monitoring the load of their clusters.
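A receiver producing those counters might look roughly like this. This is a sketch; the class name, the hook signatures, and the wiring to the signals are illustrative and not part of this PR:

```python
import threading


class BusyThreadGauge:
    """Illustrative receiver counting worker threads busy with a task."""

    def __init__(self, total_threads):
        self.total = total_threads
        self.busy = 0
        self._lock = threading.Lock()

    def on_task_started(self, sender, **kwargs):
        with self._lock:
            self.busy += 1
            print(f"Thread {threading.current_thread().name} "
                  f"started task, {self.busy}/{self.total}")

    def on_task_finished(self, sender, **kwargs):
        with self._lock:
            self.busy -= 1
            print(f"Thread {threading.current_thread().name} "
                  f"finished task, {self.busy}/{self.total}")
```

Swapping the print calls for a Prometheus gauge or a statsd counter gives the monitoring integration described above.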

@mmerickel
Member

mmerickel commented Nov 6, 2022

Using the BaseWSGIServer as the sender doesn't offer a strong connection between the listener and the source of the event. There are a couple of cases to consider:

  • If someone invokes serve on two interfaces (common).
  • If someone invokes serve multiple times on different ports (less common).

We want the listener to be able to know which server (which interface/port/socket) is emitting the events. The BaseWSGIServer has no public API, and even if it did, it does not expose any useful information that relates back to either the specific socket it's serving or the invocation of serve.

An alternative that is more explicit (and requires no dependencies) is:

class EventListener:
    def on_task_started(self, task):
        pass

    def on_task_finished(self, task):
        pass

    def on_channel_added(self, channel):
        pass

    def on_channel_deleted(self, channel):
        pass


class MyEventListener(EventListener):
    def on_task_started(self, task):
        print(f'task started: {id(task)}')

serve(app, listen='*:8080', listener=MyEventListener())

The other piece of feedback I have right now is that this strategy exposes a lot of objects that are not currently public APIs. For example, BaseWSGIServer, Channel, Task. I hope that we can avoid this and only expose enough info to correlate the data, without allowing interacting with the objects.

Thoughts?

@NicolasLM
Author

NicolasLM commented Nov 9, 2022

Using the BaseWSGIServer as the sender doesn't offer a strong connection between the listener and the source of the event.

Definitely, I used that as a placeholder for now but a string identifying the server would be better. I was not sure if the listen string was a good candidate for that.

An alternative that is more explicit (and requires no dependencies) is [...]

The example API you propose would work for this specific case, but it has some limitations; off the top of my head:

  • How can multiple instances of EventListener be used on the same server? This is important because multiple third-party integrations will need the ability to subscribe to the same events simultaneously.
  • Does the EventListener need to be passed down to every sub-object Waitress creates (Channel, Task...) so that hooks can be called, or is it stored globally somewhere?
  • How can a subscription to an event be cancelled once it is no longer needed?
  • How can a context manager be used to temporarily subscribe to an event?
  • Can I subscribe to events if Waitress is launched by a third-party library and I don't control the call to serve()?

I get the benefit of not introducing a dependency, but I have the feeling that if we roll our own we will just end up with a worse reimplementation of blinker (which is a tiny library with a permissive license).

The other piece of feedback I have right now is that this strategy exposes a lot of objects that are not currently public APIs. For example, BaseWSGIServer, Channel, Task. I hope that we can avoid this and only expose enough info to correlate the data, without allowing interacting with the objects.

That is a good point I had not considered, we should define what data each event receives.

@mmerickel
Member

Basically the only one that isn’t an implementation detail of the user-defined EventListener in your examples is the serve issue. My question is: when do you not control the serve call but you’re comfortable starting a separate server on your own to expose metrics? Shouldn’t those lifecycles basically match or the metrics server should exist first in the process?

@NicolasLM
Author

NicolasLM commented Nov 9, 2022

Basically the only one that isn’t an implementation detail of the user-defined EventListener in your examples is the serve issue.

True but I guess I am questioning a bit the point of providing an API so low-level that users have to implement their own solutions for things like using two integrations simultaneously.

Given the fact that a more feature rich and battle tested API already exists and is ready to be used I am struggling to understand the hold up. Is adding an entirely optional dependency to waitress such a big deal?

My question is: when do you not control the serve call but you’re comfortable starting a separate server on your own to expose metrics? Shouldn’t those lifecycles basically match or the metrics server should exist first in the process?

I agree that this particular concern may be far-fetched, but as a reminder, the Prometheus approach of exposing metrics via an HTTP endpoint is far from the norm. Sentry, Datadog, statsd... all work in the opposite direction and do not require a separate server.

@mmerickel
Member

True but I guess I am questioning a bit the point of providing an API so low-level that users have to implement their own solutions for things like using two integrations simultaneously.

Using blinker or another listener API doesn't really change this. Waitress should just be emitting events when things happen and allow the listener to aggregate data just like in your examples. The fact that blinker allows multiple listeners is nice but not a dealbreaker.

Given the fact that a more feature rich and battle tested API already exists and is ready to be used I am struggling to understand the hold up. Is adding an entirely optional dependency to waitress such a big deal?

I'm mainly hung up on the API and how to expose the information. Using serve() provides an unambiguous hard link to the source of the data. So if we go with the current strategy, what do we need to do? listen= is an optional argument (we can combine host and port to look like it), but it doesn't map to the specific interface the data is coming from (e.g. if you want to know whether it's IPv4 versus IPv6 traffic).

Server:

  • id(BaseWSGIServer)
  • host
  • port
  • family

Channel:

  • id(channel) ?
  • port

Task:

  • some unique id for a request (could be `f'{thread_name} + {threadlocal counter++}'`?)
  • thread name
  • ???
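Those field lists could travel as plain-data payloads rather than the live objects. A sketch; the class and field names are not a committed API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerInfo:
    server_id: int   # e.g. id(BaseWSGIServer), only for correlation
    host: str
    port: int
    family: str      # e.g. "AF_INET" vs "AF_INET6"


@dataclass(frozen=True)
class ChannelInfo:
    channel_id: int
    port: int


@dataclass(frozen=True)
class TaskInfo:
    task_id: str     # e.g. thread name plus a thread-local counter
    thread_name: str
```

Frozen dataclasses keep listeners from mutating server state while still letting them correlate events across signals.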

@digitalresistor
Member

True but I guess I am questioning a bit the point of providing an API so low-level that users have to implement their own solutions for things like using two integrations simultaneously.

This is something that can be distributed outside of waitress. It means that simple integrations, where there is no need to dynamically add and remove listeners or deal with other complexities, are handled by someone implementing their own EventListener subclass.

I really don't like the idea of the code doing different things depending on whether or not some dependency is installed into the local python (conditional imports have caused me pain more than once), and I also don't like the idea of the additional complexity when it is not necessary for 99% of the use cases (simple emitter sending stats to statsd for instance, or JSON lines to vector). Even then there is nothing stopping you from creating an EventListener that composes multiple EventListeners together (or reaches out to something like blinker to do that).
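Composing listeners as described is a few lines on top of the EventListener base class proposed earlier in the thread. A sketch, with only two hooks shown:

```python
class EventListener:
    """Base class with no-op hooks, as proposed earlier in this thread."""

    def on_task_started(self, task):
        pass

    def on_task_finished(self, task):
        pass


class CompositeListener(EventListener):
    """Fans each event out to several listeners; no extra dependency needed."""

    def __init__(self, *listeners):
        self.listeners = listeners

    def on_task_started(self, task):
        for listener in self.listeners:
            listener.on_task_started(task)

    def on_task_finished(self, task):
        for listener in self.listeners:
            listener.on_task_finished(task)
```

Passing `CompositeListener(statsd_listener, log_listener)` to serve() would then drive two integrations from one hook point.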

We already pass things down from serve through the adjustments internals, and there is no reason why that couldn't be used for this use case either. Admittedly, the internals are already fairly intertwined/tangled together as is, so passing this down where necessary is not an issue.

It also lets you trivially launch two different waitress servers in a single process without having to manually keep track of which server emitted what and account for it properly, as your current example does by populating a dict keyed by the server.

@NicolasLM
Author

Got it, too bad I could not convince you of the awesomeness of signals.

NicolasLM closed this Nov 17, 2022