Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add capabilities, priority, and oom-score-adjustment support on Linux #404

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

q66
Copy link
Contributor

@q66 q66 commented Oct 19, 2024

This is not properly tested yet and does not come with documentation. I mostly want to get an opinion on it for now.

This adds 3 things; when dinit is compiled without support, all of them will raise errors:

  1. The capabilities service field. This is an IAB string; it's parsed with cap_iab_from_string so it can take any value that can parse, e.g. ^cap_sys_time,^cap_net_admin for ambient caps. It supports the += operator, which will append to it, delimiting with a comma.
  2. The secure-bits service field, which implements securebits flags as a companion to the above.
  3. The new option no-new-privs which will use prctl to prevent gaining new privileges across execve (e.g. suid).

For now I want to get an opinion on the overall implementation. One thing I am not sure about for example is the IAB string parsing; I currently do it in load-service.cc because it must be done after all the concats have been completed. I find it to be a bit out of place there, however, so maybe there is a better way. Or if you have any other clues about anything, etc...

Fixes #398

@mobin-2008 mobin-2008 added Enhancement/New Feature Improving things or introduce new feature P-Linux Things related to Linux A-Importance: Normal C-dinit-service Things about dinit-service meson Things about Dinit's Meson build system make Things about Dinit's Make build system labels Oct 19, 2024
@q66
Copy link
Contributor Author

q66 commented Oct 19, 2024

I also wonder if i should add functionality for setting scheduling and i/o priorities while at it, and perhaps oom-killer score adjustment (they are useful to have especially for services that change uid, as any wrapper will already run with decreased privileges)? i could also do it later, but it's simple enough to do...

i guess all 3 of these should also be specific to linux at this point, because the implementation will differ and so will the range? should these also be behind an option like cgroups? or just #ifdef __linux__?

@q66
Copy link
Contributor Author

q66 commented Oct 20, 2024

this is what the setting of the 3 values would look like without the parsing (the input values for nice would go integer -20 to 19, for ionice it would probably be good to parse it properly which would mean none, rt:0 to rt:7, be:0 to be:7 and idle, for oom adjustment it should take -1000 to 1000), and guards:

diff --git a/src/includes/proc-service.h b/src/includes/proc-service.h
index bb7ed42..8d622cd 100644
--- a/src/includes/proc-service.h
+++ b/src/includes/proc-service.h
@@ -43,6 +43,9 @@ struct run_proc_params
     bool on_console;          // whether to run on console
     bool in_foreground;       // if on console: whether to run in foreground
     bool unmask_sigint = false; // if in foreground: whether to unmask SIGINT
+    int nice = INT_MIN;
+    int ionice = INT_MIN;
+    int oom_adj = INT_MIN;
     int wpipefd;              // pipe to which error status will be sent (if error occurs)
     int csfd;                 // control socket fd (or -1); may be moved
     int socket_fd;            // pre-opened socket fd (or -1); may be moved
diff --git a/src/run-child-proc.cc b/src/run-child-proc.cc
index b206b6d..ef9392b 100644
--- a/src/run-child-proc.cc
+++ b/src/run-child-proc.cc
@@ -65,6 +65,9 @@ void base_process_service::run_child_proc(run_proc_params params) noexcept
     const char *working_dir = params.working_dir;
     const char *logfile = params.logfile;
     bool on_console = params.on_console;
+    int nice = params.nice;
+    int ionice = params.ionice;
+    int oom_adj = params.oom_adj;
     int wpipefd = params.wpipefd;
     int csfd = params.csfd;
     int notify_fd = params.notify_fd;
@@ -301,6 +304,38 @@ void base_process_service::run_child_proc(run_proc_params params) noexcept
         if (setrlimit(limit.resource_id, &setlimits) != 0) goto failure_out;
     }
 
+    // nice
+    if (nice != INT_MIN) {
+        if (setpriority(PRIO_PROCESS, getpid(), nice) != 0) goto failure_out;
+        // we usually create a new session leader; that makes nice not very
+        // useful as the Linux kernel will autogroup processes by session id
+        // except when disabled - so also work around this where enabled
+        // the r+ is used in order to avoid creating it where already disabled
+        errno = 0;
+        FILE *ag = std::fopen("/proc/self/autogroup", "r+");
+        if (ag) {
+            std::fprintf(ag, "%d\n", nice);
+            std::fclose(ag);
+        }
+        else if (errno != ENOENT) goto failure_out;
+    }
+
+    // ionice
+    if (ionice != INT_MIN) {
+        #ifdef __NR_ioprio_set
+        if (syscall(__NR_ioprio_set, 1, (int)getpid(), ionice) != 0) goto failure_out;
+        #endif
+    }
+
+    // oom score adjustment
+    if (oom_adj != INT_MIN) {
+        errno = 0;
+        FILE *adj = std::fopen("/proc/self/oom_score_adj", "w");
+        if (!adj) goto failure_out;
+        std::fprintf(adj, "%d\n", oom_adj);
+        std::fclose(adj);
+    }
+
     #if SUPPORT_CGROUPS
     if (params.run_in_cgroup != nullptr && *params.run_in_cgroup != 0) {
         err.stage = exec_stage::ENTER_CGROUP;

@q66 q66 force-pushed the capabilities branch 3 times, most recently from 3679b5c to 16ab75c Compare October 20, 2024 15:27
@q66 q66 changed the title [RFC] Add capabilities support on Linux [RFC] Add capabilities, priority, and oom-score-adjustment support on Linux Oct 20, 2024
@q66
Copy link
Contributor Author

q66 commented Oct 20, 2024

I implemented everything. What's left now is basically

  1. documentation
  2. testing
  3. deciding how to guard the features; whether we should have a separate feature for each mainly
  4. the cap parsing thing
  5. maybe something else? you tell me

@davmac314
Copy link
Owner

No strong objections/opinions from me, though as usual good documentation will be important.

nice/io_nice/oom_adj are probably unnecessary when cgroups are used properly. If they were to be supported, documentation must clearly state the requirements, i.e. /proc must be mounted for oom_adj.

One thing I am not sure about for example is the IAB string parsing; I currently do it in load-service.cc

I think that's fine tbh.

It might be worth gating Linux-specific features behind something other than __linux__, so you can use it in documentation too (and so those that prefer to use cgroups could just disable these features).

Setting for nice should be possible on other OSes, setpriority and nice are both part of POSIX (and I think you can use nice rather than setpriority unless I've missed something). The autogroups hack is unfortunate, I guess there's no way around it though.

@q66
Copy link
Contributor Author

q66 commented Oct 21, 2024

well, nice will add to the current level, while setpriority actually sets it, so there is a difference - both are posix though...

the main non-portability for that is range checking, as the acceptable range of values may vary depending on systems

maybe i could

  1. implement nice unconditionally - not sure about the range checking part though? setpriority will clamp the input, so if you set it too high or too low, it will use the nearest supported value; but either way, do the autogroups hack only for linux
  2. put the ionice behind its own feature
  3. put the oom-score-adj behind its own feature

@davmac314
Copy link
Owner

well, nice will add to the current level, while setpriority actually sets it, so there is a difference

Ah right, yeah.

the main non-portability for that is range checking, as the acceptable range of values may vary depending on systems

True, but I don't think we need to be overly fussed about checking it ourselves. Perhaps add a flag to indicate whether the value is set rather than relying on INT_MIN as a sentinel value.

maybe i could [...]

Yep, that sounds reasonable to me.

@q66
Copy link
Contributor Author

q66 commented Oct 21, 2024

ok, i think i should have a cleaned up PR sometime this evening...

@q66
Copy link
Contributor Author

q66 commented Oct 21, 2024

I added documentation, feature flags, and improved the resource management for the capabilities IAB object, which should be a lot cleaner now.

It should be reviewable now, though I still need to test it a bunch more.

@q66 q66 changed the title [RFC] Add capabilities, priority, and oom-score-adjustment support on Linux Add capabilities, priority, and oom-score-adjustment support on Linux Oct 21, 2024
@davmac314
Copy link
Owner

Will review this weekend

Copy link
Owner

@davmac314 davmac314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I actually have a few minor things that I think would be good to change, as noted. Thanks for the PR.

src/includes/load-service.h Outdated Show resolved Hide resolved
src/includes/load-service.h Outdated Show resolved Hide resolved
src/run-child-proc.cc Outdated Show resolved Hide resolved
src/run-child-proc.cc Outdated Show resolved Hide resolved
Comment on lines +6 to +8
ifeq ($(SUPPORT_CAPABILITIES),1)
ALL_TEST_LDFLAGS+=-lcap
endif
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really the right place for this. If tests need -lcap then it should be in TEST_LDFLAGS. That could be set in mconfig.Linux/mconfig.Linux.sh, but see comment there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all linking needs -lcap due to the IAB abstraction in the load-service.h

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can go into both TEST_LDFLAGS and LDFLAGS, what I mean is that this should happen in the build configuration (mconfig.* / configure script) though rather than being done ad-hoc in various makefiles.

@@ -38,6 +38,9 @@ TEST_LDFLAGS=$(TEST_LDFLAGS_BASE) $(TEST_CXXFLAGS)
# Features.

SUPPORT_CGROUPS=1
SUPPORT_CAPABILITIES=1
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this requires a dependency (in libcap), I'd prefer the default here was SUPPORT_CAPABILITIES=0. It's fine to have it on-by-default in the configure script since we can check the dependency is available there (but that should be implemented).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may add a dependency, but it's a dependency that's present on virtually any linux system (it's required by things like... udev), imo it makes much more sense to keep it on

Copy link
Owner

@davmac314 davmac314 Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library will be present but not necessarily the headers, and people using a straight Makefile-based build are often just trying out Dinit for personal use. I'd like to make sure that this scenario still works as easily as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but those people can easily disable it; i don't think it's right to default to a configuration where capabilities are not supported, the default build should imo be one that is as distros should build it

it's trivial to disable for somebody just trying it (it's also trivial to just install libcap-devel or whatever, in 99% of scenarios it's easily doable, and you already need to install a development toolchain)

Comment on lines +12 to +14
ifeq ($(SUPPORT_CAPABILITIES),1)
ALL_LDFLAGS+=-lcap
endif
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should already be LDFLAGS if it's needed (see review comments elsewhere for further details).

@@ -467,6 +482,9 @@ STRIPOPTS=$STRIPOPTS
# Feature settings
SUPPORT_CGROUPS=$SUPPORT_CGROUPS
USE_INITGROUPS=$USE_INITGROUPS
SUPPORT_CAPABILITIES=$SUPPORT_CAPABILITIES
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If SUPPORT_CAPABILITIES is 1 then -lcap needs to be added to LDFLAGS and TEST_LDFLAGS if they haven't been specified manually. See for example the tests under the heading "# Check whether sanitizers can be used for tests" (that's only dealing with TEST_LDFLAGS but the general principle should be the same for LDFLAGS).

src/includes/proc-service.h Show resolved Hide resolved
src/includes/proc-service.h Outdated Show resolved Hide resolved
meson.build Outdated Show resolved Hide resolved
@q66 q66 force-pushed the capabilities branch 4 times, most recently from b5009d7 to 91f0d98 Compare October 29, 2024 23:21
@q66 q66 force-pushed the capabilities branch 3 times, most recently from d6970e2 to 04b6af9 Compare October 29, 2024 23:32
@q66 q66 force-pushed the capabilities branch 2 times, most recently from 96f50c9 to bdb249d Compare October 29, 2024 23:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Importance: Normal C-dinit-service Things about dinit-service Enhancement/New Feature Improving things or introduce new feature make Things about Dinit's Make build system meson Things about Dinit's Meson build system P-Linux Things related to Linux
Projects
None yet
Development

Successfully merging this pull request may close these issues.

capabilities support on linux
3 participants