RFC: uspace: Let the user choose the CPU affinity per hal thread #2514
base: master
Conversation
Can you provide some insight into why this is useful? I suspect knowing this might make it easier to comment on the approach used and other alternatives.
Well, this is especially useful if one CPU is not fast enough to handle all RT tasks.
I think this should be extended to setting the affinity for the NIC used to connect with any Ethernet hardware. It's been reported (and also recommended by the RT kernel guys) that the NIC should share the core that is isolated for PREEMPT_RT.
Can you provide a link to more information on the NIC affinity topic? I don't know what this means. Please also note that this PR does not add the ability to configure a thread CPU affinity. It merely extends the existing mechanism to support different CPU affinities per thread, instead of only having the option of a single selectable CPU that all threads can be directed to.
NIC = network interface card. We have had some discussion with the RT kernel team, and it has been suggested that the NIC interrupt should be moved to the isolated core. @pcw-mesa will know more, as he stated: it would be good if this could be done without the user having to be involved in some low-level Linux configuration.
Ok. I don't think LinuxCNC is the right place to modify the Ethernet IRQ configuration. I don't see why LinuxCNC should be involved here, except perhaps for a configuration hint in the documentation of such card drivers. And this has basically nothing to do with this PR. :)
I suspect that you might be missing the point that we run an Ethernet driver in the realtime thread if we are using a Mesa Ethernet card. I think that controlling the Ethernet IRQ affinity is very much part of the bigger picture in this case.
Can we please not mix up IRQ and thread CPU affinity? I do not see how that has anything to do with the change proposed in this PR.
There are command-line tools that allow you to query the interrupts in use and set their affinity, but it's not a user-friendly process. Perhaps the hm2_eth driver could be modified to query the core the thread is running on and adjust the affinity of the NIC to match the thread defined here. This PR could help that process by saving the CPU core for the threads it creates somewhere in the LinuxCNC environment.

In the latest kernels (5.10 and on), network latency is a major issue on many computers. This was never an issue with Debian Buster (4.19 kernel). Debian Bookworm is released in 3 days with the 6.1 kernel, and linuxcnc-uspace is in its repos, so everything that can be done to reduce network latency will help reduce user support issues. So while it may or may not be related to this PR, IRQ affinity of the NIC is a very real issue facing this project. Some of us have been battling to understand it for over 12 months.
This is not just an issue with Hostmot2 but with any Ethernet-connected device that uses the standard network stack.
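For illustration, here is a minimal sketch of the idea floated above (a hypothetical helper, not existing hm2_eth code; it assumes the driver has already discovered the IRQ number of its NIC): pin the NIC's IRQ to whatever CPU the calling realtime thread is running on by writing to /proc/irq/&lt;n&gt;/smp_affinity_list.

```c
/* Hypothetical sketch, not hm2_eth code: move a NIC IRQ onto the CPU
 * the calling RT thread runs on. The IRQ number is assumed to have
 * been discovered elsewhere and is passed in for illustration. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

static int pin_irq_to_current_cpu(int irq)
{
    char path[64];
    int cpu = sched_getcpu();   /* CPU the RT thread currently runs on */
    if (cpu < 0)
        return -1;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity_list", irq);
    FILE *f = fopen(path, "w"); /* writing here requires root */
    if (!f)
        return -1;

    fprintf(f, "%d\n", cpu);    /* kernel accepts a CPU list here */
    return fclose(f);
}
```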
This PR breaks the rate-monotonic scheduling (RMS) promise we make here: http://linuxcnc.org/docs/2.9/html/man/man3/hal_create_thread.3hal.html#DESCRIPTION Maybe that's ok, but we should talk about it before changing the behavior of LinuxCNC in this way.

RMS means that higher-priority threads may interrupt lower-priority threads, but lower-priority threads may not interrupt higher-priority threads. In LinuxCNC, this means the base thread may interrupt the servo thread, but the servo thread may not interrupt the base thread.

This is probably mostly important for LinuxCNC's split-thread components, i.e. components that need functions in both the base thread and the servo thread, such as stepgen, pwmgen and encoder. The base-thread functions of these components can currently depend on information from the servo thread not changing while they're running. What are the effects if we remove that guarantee, like this PR does? I'm not sure, but we should figure that out before making this change.

(I have independently been working on a different change to uspace scheduling that also moves different threads to different CPUs, see https://github.com/LinuxCNC/linuxcnc/tree/busywait8. I've been holding it back partially for this RMS reason, so I am also interested in finding the answer to this question.)
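To make the hazard concrete, here is an illustrative sketch (invented field names, not actual stepgen code) of the pattern that single-CPU RMS currently makes safe:

```c
/* Illustrative only. Under single-CPU RMS the servo thread cannot run
 * while a base-thread function executes, so both reads below are
 * guaranteed to see values from the same servo cycle. With the two
 * threads on different CPUs, the servo thread could change the values
 * between the two reads. */
struct shared_state {
    long target_rate;   /* written by the servo-thread function */
    int  enable;        /* written by the servo-thread function */
};

static void base_thread_func(struct shared_state *s)
{
    if (s->enable) {
        /* On one CPU this rate belongs to the same servo cycle that
         * set 'enable'; on two CPUs it may not. */
        long rate = s->target_rate;
        (void)rate; /* ... generate steps at 'rate' ... */
    }
}
```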
Thank you for your comment @SebKuzminsky. I was not aware that we made such a guarantee in LinuxCNC. Many years ago I played around with Machinekit (disclaimer: my knowledge of it is therefore not up to date). But maybe we can find a lightweight synchronization mechanism that doesn't kill performance, e.g. one barrier per task entry/exit, plus some additional barriers where needed, but not on every HAL signal access.
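For illustration only, such a mechanism could take the shape of a seqlock-style handoff between the servo and base threads; the following is a sketch using C11 atomics (hypothetical names, not part of this PR):

```c
/* Sketch of a single-writer seqlock handoff between servo (writer)
 * and base (reader) threads, following the standard C11 pattern. */
#include <stdatomic.h>

struct seq_shared {
    atomic_uint seq;          /* even = stable, odd = write in progress */
    atomic_long target_rate;  /* payload, accessed relaxed */
    atomic_int  enable;
};

/* Servo thread, once per cycle: */
static void publish(struct seq_shared *s, long rate, int en)
{
    unsigned q = atomic_load_explicit(&s->seq, memory_order_relaxed);
    atomic_store_explicit(&s->seq, q + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* payload after odd seq */
    atomic_store_explicit(&s->target_rate, rate, memory_order_relaxed);
    atomic_store_explicit(&s->enable, en, memory_order_relaxed);
    atomic_store_explicit(&s->seq, q + 2, memory_order_release);
}

/* Base thread, retries if it raced with the writer: */
static void snapshot(struct seq_shared *s, long *rate, int *en)
{
    unsigned s1, s2;
    do {
        s1 = atomic_load_explicit(&s->seq, memory_order_acquire);
        *rate = atomic_load_explicit(&s->target_rate, memory_order_relaxed);
        *en   = atomic_load_explicit(&s->enable, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((s1 & 1) || s1 != s2);
}
```

The reader never blocks the writer, which keeps the realtime cost low; the trade-off is that the reader may loop if it races with an update.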
The current situation, with all realtime threads sharing one CPU using rate-monotonic scheduling, is this:
Therefore:
Is that correct? Does it adequately describe the current situation? This... doesn't seem ideal. But apparently it works well enough that our users don't report bugs about it. I think an ideal solution would have the following properties:
This problem statement is drawn both from the multi-thread components like stepgen/pwmgen/encoder mentioned above, and from #2386. Do we agree on that problem statement?
The Stuttgart meeting would like to know how this relates to @SebKuzminsky's busywait branch.
This is the patch I currently use to set the CPU affinity per hal thread.
I want to select different CPUs for the different threads.
I'm not convinced about doing this via the priority mapping, because it's not very intuitive, but it was the easiest approach to implement for now.
That's why this is RFC.
But I'd like to hear your general opinion on this topic, before implementing this in another way.
The way this works is that threads get descending priorities, and for each priority we can select a different CPU (if we want) by specifying the CPU number in the corresponding environment variable (e.g. RTAPI_CPU_NUMBER_PRIO98=1 to run the thread with priority 98 on CPU 1).
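For illustration, resolving such an environment variable and pinning the calling thread could look roughly like this (a hypothetical sketch, not the literal patch code):

```c
/* Hypothetical sketch: look up RTAPI_CPU_NUMBER_PRIO<p> for the given
 * priority and pin the calling thread to that CPU, if set. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

static int pin_thread_for_prio(int prio)
{
    char name[48];
    snprintf(name, sizeof(name), "RTAPI_CPU_NUMBER_PRIO%d", prio);

    const char *val = getenv(name);
    if (!val)
        return 0;               /* no override: keep default affinity */

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(atoi(val), &set);   /* e.g. RTAPI_CPU_NUMBER_PRIO98=1 */
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

With a scheme like this, exporting RTAPI_CPU_NUMBER_PRIO98=1 before starting LinuxCNC would move the priority-98 thread onto CPU 1, while threads with no matching variable keep their current placement.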