Aeron epoll #1637

Open
alejandrofsGR opened this issue Aug 18, 2022 · 6 comments

Comments

@alejandrofsGR

Is there a way to have an epoll-like behaviour in Aeron where we can put one or more subscriptions in a set and yield the thread until there's data available to read in any of them? This would greatly help us scale a number of systems with high throughput loads and no particularly low latency requirements.

Thanks.

@tmontgomery

We have discussed it a few times, but currently, no. How many subscriptions do you have? Aeron scales quite well in terms of throughput, so curious where your scaling limitations come from.

@alejandrofsGR
Author

Thanks for your answer. Here are a few more details about our use case:

We have more subscriptions than threads available to listen to them, and the traffic varies significantly across subscriptions and over time. We want to keep the subscriptions separated as we save them independently with Aeron Archive and that helps other downstream sharded systems. Because there's no epoll, we put threads to sleep for a fixed amount of time when there's nothing to read from Aeron, to avoid active threads being context-switched out for idle ones that are just no-op spinning. Given there's no way to trigger an "early wake up" when data becomes available, we end up with very low IPC (instructions per cycle) and either:

  • High latency (high sleep times).
  • Low throughput (low sleep times with a lot of wasted cycles on CPUs when active threads are preempted).

Please let me know your thoughts on this use case and how to best leverage Aeron to:

  • Scale up the number of subscriptions.
  • Scale up the volume of traffic in each subscription.

Thank you!

@tmontgomery

Scaling up the number of subscriptions depends on the QoS needed for them. Assuming the typical case of some streams being latency-sensitive and some not...

Obviously, the streams with low latency demands should be isolated from all other threads. No multiplexing will solve that for you. It would add latency to that path. So, you would want to poll those and assign them to isolated, pinned threads taking into account the data path from the NIC to the CPU, etc. For latency, you don't want those sharing a duty cycle with a lot of other subscriptions that don't have latency demands. Nothing really saves you from having to do this for latency-sensitive streams. It's work that has to be done.

The less latency demanding streams (more throughput demanding maybe) can be combined into a single thread (or a couple) and polled in a round-robin or other ratios.... i.e. you don't have to poll ALL subscriptions on each duty cycle iteration. You can proportion it and use a more aggressive idle scheme such as half of the subscriptions each iteration and no-op/pause/yield idle. A sleeping idle for many subscriptions isn't normally a great idea because the sleep will do exactly what you mention. If the system has to do a lot of other things and needs those threads, then you have to figure out how you balance the latency demands with the thread demands.
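As an illustration of the proportioned round-robin described above, here is a minimal sketch. It is not Aeron code: each `IntUnaryOperator` is a hypothetical stand-in for one subscription's poll call (fragment limit in, fragments handled out), and `RoundRobinPoller`, `perIteration`, and `idle` are names invented for this example. In a real client the poll would presumably be `Subscription.poll(...)` and the idle an Agrona `IdleStrategy`.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

// Each IntUnaryOperator is a hypothetical stand-in for one subscription:
// it takes a fragment limit and returns the number of fragments it handled.
final class RoundRobinPoller {
    private final List<IntUnaryOperator> subscriptions;
    private final int perIteration; // subscriptions serviced per duty-cycle iteration
    private int next = 0;           // index where the next iteration resumes

    RoundRobinPoller(List<IntUnaryOperator> subscriptions, int perIteration) {
        if (subscriptions.isEmpty() || perIteration < 1 || perIteration > subscriptions.size()) {
            throw new IllegalArgumentException("perIteration must be in 1..subscriptions.size()");
        }
        this.subscriptions = subscriptions;
        this.perIteration = perIteration;
    }

    // One duty-cycle iteration: poll only a slice of the subscriptions,
    // continuing from where the previous iteration stopped.
    int doWork(int fragmentLimit) {
        int workCount = 0;
        for (int i = 0; i < perIteration; i++) {
            workCount += subscriptions.get(next).applyAsInt(fragmentLimit);
            next = (next + 1) % subscriptions.size();
        }
        return workCount;
    }

    // Non-sleeping idle: yield the CPU only when the slice produced no work.
    static void idle(int workCount) {
        if (workCount == 0) {
            Thread.yield();
        }
    }
}
```

A polling thread would loop over `RoundRobinPoller.idle(poller.doWork(fragmentLimit))`; with `perIteration` set to half the list size, every subscription is visited once every two iterations.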

In essence, if you have a set of latency-sensitive streams, then place them on specific CPUs.... don't have to be pinned even, just removed from other threads. And the rest balanced out.

If, on the other hand, you have streams that are all the same on demands, then experiment. Perhaps service a proportion each time with a round-robin or with certain more active ones polled more often and an idle that is NOT sleeping, but may be yielding at most. Aeron drivers do this with publications in that there is a ratio of polls of the network to send attempts. Polls of the network are about 1 out of 4 to send attempts.
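The ratio idea can be sketched the same way. This is only an illustration under assumptions, not how the Aeron driver is implemented: `RatioDutyCycle`, `frequent`, `infrequent`, and `ratio` are hypothetical names, and each `IntUnaryOperator` again stands in for a subscription's poll call. The more active streams are polled on every iteration and the rest on every `ratio`-th iteration (4 would mirror the 1-in-4 figure mentioned above).

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical duty cycle that services two groups of streams at different rates.
final class RatioDutyCycle {
    private final List<IntUnaryOperator> frequent;   // polled every iteration
    private final List<IntUnaryOperator> infrequent; // polled every ratio-th iteration
    private final int ratio;
    private long iteration = 0;

    RatioDutyCycle(List<IntUnaryOperator> frequent, List<IntUnaryOperator> infrequent, int ratio) {
        if (ratio < 1) {
            throw new IllegalArgumentException("ratio must be >= 1");
        }
        this.frequent = frequent;
        this.infrequent = infrequent;
        this.ratio = ratio;
    }

    // One iteration: always poll the frequent group; poll the infrequent
    // group only when the iteration counter is a multiple of the ratio.
    int doWork(int fragmentLimit) {
        int workCount = 0;
        for (IntUnaryOperator s : frequent) {
            workCount += s.applyAsInt(fragmentLimit);
        }
        if (++iteration % ratio == 0) {
            for (IntUnaryOperator s : infrequent) {
                workCount += s.applyAsInt(fragmentLimit);
            }
        }
        return workCount;
    }
}
```

Which streams count as "more active" could be fixed up front or re-evaluated from recent work counts; that choice is left out of the sketch.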

More than happy to set up a chat to talk more about this if desired.

@alejandrofsGR
Author

alejandrofsGR commented Aug 19, 2022

Thanks for your thoughts. Replying in line:

> Scaling up the number of subscriptions depends on the QoS needed for them. Assuming the typical case of some streams being latency-sensitive and some not...

None of the streams are particularly latency sensitive, but the point is to have the smallest possible latency while maximising throughput across all subscriptions. Basically, we want to maximise the number of useful instructions retired per cycle and overall CPU usage. Ideally, when a thread is running, it's doing useful work, and no CPU should be idle when there is useful work to do. That's easy to do with an epoll.

> Obviously, the streams with low latency demands should be isolated from all other threads. No multiplexing will solve that for you. It would add latency to that path. So, you would want to poll those and assign them to isolated, pinned threads taking into account the data path from the NIC to the CPU, etc. For latency, you don't want those sharing a duty cycle with a lot of other subscriptions that don't have latency demands. Nothing really saves you from having to do this for latency-sensitive streams. It's work that has to be done.

We have no particularly latency-sensitive streams, but there are many streams (more than CPUs available) and their data rate is variable. Therefore, we don't want to isolate anything. In general, we'd rather avoid any unnecessary complexity on our end.

> The less latency demanding streams (more throughput demanding maybe) can be combined into a single thread (or a couple) and polled in a round-robin or other ratios.... i.e. you don't have to poll ALL subscriptions on each duty cycle iteration. You can proportion it and use a more aggressive idle scheme such as half of the subscriptions each iteration and no-op/pause/yield idle. A sleeping idle for many subscriptions isn't normally a great idea because the sleep will do exactly what you mention. If the system has to do a lot of other things and needs those threads, then you have to figure out how you balance the latency demands with the thread demands.

I understand what you mean, but no matter how clever we are with busy polling, we are still wasting cycles when there's nothing to read from a subscription. We have hundreds of subscriptions and only a dozen CPUs to handle them. Most importantly, we don't want to maintain this complex scheduler on our end. All we want is a single thread waiting on an epoll, and each time the epoll wakes up, each individual subscription with data ready is processed concurrently on as many CPUs as we have available. Subscriptions that are idle should waste zero cycles.
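To make the requested semantics concrete, here is a purely hypothetical sketch of a readiness set built from `java.util.concurrent` primitives. Nothing like this exists in Aeron's API, as confirmed earlier in the thread; `ReadySet`, `signalReady`, and `awaitReady` are invented names, and the sketch assumes some producer-side hook could call `signalReady` after data for a stream has been made visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical readiness set: NOT part of Aeron.
final class ReadySet {
    private final BlockingQueue<Integer> ready = new LinkedBlockingQueue<>();
    private final Set<Integer> pending = ConcurrentHashMap.newKeySet();

    // Marks stream `id` as having data; repeated signals before the next
    // awaitReady() collapse into a single entry.
    void signalReady(int id) {
        if (pending.add(id)) {
            ready.add(id);
        }
    }

    // Blocks without spinning until at least one stream is ready, then
    // returns every id that is ready at that moment. Returns an empty list
    // if the waiting thread is interrupted.
    List<Integer> awaitReady() {
        List<Integer> ids = new ArrayList<>();
        try {
            ids.add(ready.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return ids;
        }
        ready.drainTo(ids);
        pending.removeAll(ids);
        return ids;
    }
}
```

A dispatcher thread would loop on `awaitReady()` and hand each returned id to a worker pool. A worker that stops before draining its stream would need to call `signalReady` again itself, since the sketch only records that data arrived, not how much remains.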

> If, on the other hand, you have streams that are all the same on demands, then experiment. Perhaps service a proportion each time with a round-robin or with certain more active ones polled more often and an idle that is NOT sleeping, but may be yielding at most. Aeron drivers do this with publications in that there is a ratio of polls of the network to send attempts. Polls of the network are about 1 out of 4 to send attempts.

That's where we are right now - experimenting. It's hard to maximise efficiency without complex IO scheduling logic like the one you're proposing, which would still not be as good and easy as an epoll. Latency is not a problem for this use case, but we want to do as much as possible as quickly as possible with what we've got.

In summary, it feels like epoll would solve all our problems without any major system upgrades. However, because that's not supported by Aeron, it looks like the only alternatives will add more unnecessary complexity to our systems and still deliver sub-optimal results. From where we stand, it seems reasonable for Aeron to allow clients to fan-in many subscriptions and only do useful work when it's needed. Is that still up for debate? Are there any hopes for supporting an epoll-like API in Aeron?

@tmontgomery

We do see some value in having a demux API for Aeron such as epoll. So, I wouldn't say it is not up for debate. It is not currently on our roadmap, though. Is it something you all would be open to discussing supporting the development of?

@alejandrofsGR
Author

Thank you. Let me take this back to my team and discuss supporting its development. I'll be in touch.
