Tetragon is not showing process exec ancestors #2420

Open
alexeysofin opened this issue May 8, 2024 · 5 comments · May be fixed by #2938
Labels
kind/bug Something isn't working

Comments

@alexeysofin
Contributor

alexeysofin commented May 8, 2024

What happened?

Tetragon version

time="2024-05-08T08:14:51Z" level=info msg="Starting tetragon" version=v1.0.3
time="2024-05-08T08:14:51Z" level=info msg="config settings" config="map[bpf-lib:/var/lib/tetragon/ btf: config-dir:/etc/tetragon/tetragon.conf.d/ cpuprofile: data-cache-size:1024 debug:false disable-kprobe-multi:false enable-export-aggregation:false enable-k8s-api:true enable-msg-handling-latency:false enable-pid-set-filter:false enable-pod-info:false enable-policy-filter:true enable-policy-filter-debug:false enable-process-ancestors:true enable-process-cred:false enable-process-ns:false event-queue-size:10000 export-aggregation-buffer-size:10000 export-aggregation-window-size:15s export-allowlist:{\"event_set\":[\"PROCESS_EXEC\", \"PROCESS_EXIT\", \"PROCESS_KPROBE\", \"PROCESS_UPROBE\", \"PROCESS_TRACEPOINT\"]} export-denylist:{\"namespace\":[\"\", \"cilium\", \"kube-system\"]} export-file-compress:false export-file-max-backups:5 export-file-max-size-mb:10 export-file-perm:600 export-file-rotation-interval:0s export-filename:/var/run/cilium/tetragon/tetragon.log export-rate-limit:-1 expose-kernel-addresses:false field-filters: force-large-progs:false force-small-progs:false gops-address:localhost:8118 k8s-kubeconfig-path: kernel: kmods:[] log-format:text log-level:info memprofile: metrics-label-filter:namespace,workload,pod,binary metrics-server::2112 netns-dir:/var/run/docker/netns/ pprof-addr: process-cache-size:65536 procfs:/procRoot rb-queue-size:65535 rb-size:0 rb-size-total:0 redaction-filters: release-pinned-bpf:true server-address:localhost:54321 tracing-policy: tracing-policy-dir:/etc/tetragon/tetragon.tp.d verbose:0]"

Kind version

kind version
kind v0.22.0 go1.21.3 linux/amd64

Deployed using the default Helm chart.

If I start a pod with the image debian:bookworm-slim, exec into the pod, and run this Bash script:

./script.sh

#!/bin/bash
set -e

# Run curl under timeout: send signal 15 (SIGTERM) if curl is still running after 5 seconds.
response=$(timeout -s 15 5 curl google.com)
echo "$response"

I am not getting any ancestors in the log; the process_exec event only contains the process and its immediate parent (timeout):

{
    "process_exec": {
        "process": {
            "exec_id": "a2luZC13b3JrZXI6MjA4MDgwNTYwNTc1ODoyNzcyNg==",
            "pid": 27726,
            "uid": 0,
            "cwd": "/root",
            "binary": "/usr/bin/curl",
            "arguments": "google.com",
            "flags": "execve clone",
            "start_time": "2024-05-08T08:15:48.966646768Z",
            "auid": 4294967295,
            "pod": {
                "namespace": "default",
                "name": "test-pod",
                "container": {
                    "id": "containerd://796556cd4570c4a238358a8afc595698d23554e14348ecbe1ebf68c099efaadc",
                    "name": "test-pod",
                    "image": {
                        "id": "docker.io/library/debian@sha256:155280b00ee0133250f7159b567a07d7cd03b1645714c3a7458b2287b0ca83cb",
                        "name": "docker.io/library/debian:bookworm-slim"
                    },
                    "start_time": "2024-05-08T07:47:45Z",
                    "pid": 3141
                },
                "pod_labels": {
                    "run": "test-pod"
                },
                "workload": "test-pod",
                "workload_kind": "Pod"
            },
            "docker": "796556cd4570c4a238358a8afc59569",
            "parent_exec_id": "a2luZC13b3JrZXI6MjA4MDgwMjUwMjY0MzoyNzcyNQ==",
            "tid": 27726
        },
        "parent": {
            "exec_id": "a2luZC13b3JrZXI6MjA4MDgwMjUwMjY0MzoyNzcyNQ==",
            "pid": 27725,
            "uid": 0,
            "cwd": "/root",
            "binary": "/usr/bin/timeout",
            "arguments": "-s 15 5 curl google.com",
            "flags": "execve clone",
            "start_time": "2024-05-08T08:15:48.963545732Z",
            "auid": 4294967295,
            "pod": {
                "namespace": "default",
                "name": "test-pod",
                "container": {
                    "id": "containerd://796556cd4570c4a238358a8afc595698d23554e14348ecbe1ebf68c099efaadc",
                    "name": "test-pod",
                    "image": {
                        "id": "docker.io/library/debian@sha256:155280b00ee0133250f7159b567a07d7cd03b1645714c3a7458b2287b0ca83cb",
                        "name": "docker.io/library/debian:bookworm-slim"
                    },
                    "start_time": "2024-05-08T07:47:45Z",
                    "pid": 3140
                },
                "pod_labels": {
                    "run": "test-pod"
                },
                "workload": "test-pod",
                "workload_kind": "Pod"
            },
            "docker": "796556cd4570c4a238358a8afc59569",
            "parent_exec_id": "a2luZC13b3JrZXI6MjA4MDc5MDQxNjAwNDoyNzcyNA==",
            "tid": 27725
        }
    },
    "node_name": "kind-worker",
    "time": "2024-05-08T08:15:48.966644780Z"
}

Is there something I'm doing wrong? This seems critical for moderately to heavily loaded clusters, where container health checks can quickly overwhelm the logging systems. In addition, I don't think health-check events can be filtered out by ancestor either, but we could at least do that in an intermediate filtering system if the ancestors were present.

Tetragon Version

CLI version: v1.0.2

Kernel Version

Linux *** 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Server Version: v1.29.2

Bugtool

No response

Relevant log output

No response

Anything else?

No response

alexeysofin added the kind/bug (Something isn't working) label on May 8, 2024
@mtardy
Member

mtardy commented May 16, 2024

Thanks for taking the time to open this issue. As you can see, your event includes both the process information and its parent. If you also get the event that contains the parent (or retrieve that information externally), you can rebuild an ancestor tree yourself.
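A minimal sketch of that reconstruction, assuming you keep every process_exec event you receive: index processes by exec_id and follow parent_exec_id links, the two fields visible in the event above. The Proc type and Ancestors function are hypothetical helpers, not part of Tetragon's API, and the exec IDs and the /bin/bash grandparent in main are placeholders.

```go
package main

import "fmt"

// Proc holds the subset of a process_exec "process" object needed here.
// Hypothetical type; field names mirror the JSON keys in the exported event.
type Proc struct {
	ExecID       string
	ParentExecID string
	Binary       string
}

// Ancestors returns the processes above execID, nearest parent first.
// The walk stops at the first parent_exec_id that was never recorded
// (for example a process started before collection began) and refuses
// to revisit an exec ID, so a corrupted map cannot loop forever.
func Ancestors(byID map[string]Proc, execID string) []Proc {
	var chain []Proc
	seen := map[string]bool{execID: true}
	cur, ok := byID[execID]
	for ok && cur.ParentExecID != "" && !seen[cur.ParentExecID] {
		seen[cur.ParentExecID] = true
		cur, ok = byID[cur.ParentExecID]
		if ok {
			chain = append(chain, cur)
		}
	}
	return chain
}

func main() {
	byID := map[string]Proc{}
	for _, p := range []Proc{
		{ExecID: "bash-1", Binary: "/bin/bash"}, // assumed script interpreter
		{ExecID: "timeout-1", ParentExecID: "bash-1", Binary: "/usr/bin/timeout"},
		{ExecID: "curl-1", ParentExecID: "timeout-1", Binary: "/usr/bin/curl"},
	} {
		byID[p.ExecID] = p
	}
	for _, a := range Ancestors(byID, "curl-1") {
		fmt.Println(a.Binary) // /usr/bin/timeout, then /bin/bash
	}
}
```

The map only grows here; a long-running collector would also need a policy for evicting entries once no live descendant can reference them.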

The process ancestry feature is not available in the OSS version of Tetragon. May I ask where you saw mentions of this feature?

@t0x01

t0x01 commented Sep 6, 2024

Hello.

Are there any plans to add the process ancestry feature to Tetragon in the foreseeable future? It really is very useful.

I've recently implemented my own version of it, plus an additional ancestor_binary_regex filter, and so far it seems to be working fine. I'm not sure my approach was optimal, though, since I basically just added an optional loop to pkg/grpc/exec/exec.go. I'm also not sure whether I should create a PR, since it is a feature of the enterprise version.
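To illustrate what an ancestor binary regex filter does, here is a small sketch using Go's regexp package, which implements RE2 syntax. MatchAncestorBinary and CompilePatterns are hypothetical names, not the ancestor_binary_regex code from the linked changes, and the health-check binary path in main is made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// CompilePatterns compiles RE2 expressions, failing on the first invalid one.
func CompilePatterns(exprs []string) ([]*regexp.Regexp, error) {
	out := make([]*regexp.Regexp, 0, len(exprs))
	for _, e := range exprs {
		re, err := regexp.Compile(e)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", e, err)
		}
		out = append(out, re)
	}
	return out, nil
}

// MatchAncestorBinary reports whether any ancestor binary path matches at
// least one pattern. An event with no known ancestors never matches.
func MatchAncestorBinary(patterns []*regexp.Regexp, ancestorBinaries []string) bool {
	for _, bin := range ancestorBinaries {
		for _, re := range patterns {
			if re.MatchString(bin) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Hypothetical use: select events spawned, directly or indirectly,
	// by a health-check probe binary so they can be dropped.
	patterns, err := CompilePatterns([]string{`/healthcheck$`})
	if err != nil {
		panic(err)
	}
	ancestors := []string{"/usr/bin/timeout", "/usr/local/bin/healthcheck"}
	fmt.Println(MatchAncestorBinary(patterns, ancestors)) // true
}
```

Compiling once up front and reusing the *regexp.Regexp values keeps the per-event cost to the match itself, which matters at health-check event rates.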

@alexeysofin
Contributor Author

@mtardy

May I ask where you saw mentions of this feature?

Nowhere, but it is obvious that in a moderately loaded cluster health checks will make up 99% of events, happening thousands of times per second. In addition, there are Go structures for ancestors, which are always empty.

So we ended up with a custom solution as well, but rather than forking Tetragon as @t0x01 did, we run a secondary process that tracks process trees and is injected into the data delivery pipeline.

@t0x01

t0x01 commented Sep 19, 2024

Hello, @mtardy.

Just trying to make sure: since this feature is available only in the Isovalent enterprise version of Tetragon, is it prohibited to add it to the open-source version, or can anyone propose the required changes via a PR anyway? It is a very useful feature to have for both observability and filtering purposes. As I mentioned earlier, I've recently implemented my own version of it and it seems to be working well enough, at least as far as I can tell.

What I've changed:

  • Read the enable-process-ancestors option from the config file. The option is off by default.
  • If enable-process-ancestors is set, try to include the process's ancestors beyond the immediate parent (up to PID 1/PID 2) in the respective protobuf message for process_exec, process_exit, process_uprobe, process_kprobe, process_lsm, and process_tracepoint events.
  • If enable-process-ancestors is set and including the process's ancestors in the protobuf message fails, add the event to eventcache for reprocessing.
  • When reprocessing events from eventcache, if enable-process-ancestors is set and Ancestors is nil, try to include the process's ancestors again.
  • Implement a new export filter that matches ancestor binary names against RE2 regular expressions.
  • Add a new test for the new export filter.
  • Add information about the new features to the documentation.

All changes can be found here. I'm not yet certain where and how it could be improved. Please let me know whether these changes can be added to the open-source version of Tetragon and, if so, whether anything else needs to be added or changed before I create a PR. Thank you.

@jrfastab
Contributor

Please submit a PR. The list looks good and I'll review it when the PR exists. I haven't looked at the link yet because I'm currently at Linux Plumbers Conference, but I can look when I get back in a few days. Whatever different folks have forked or added on top of Tetragon doesn't affect what we should do in Tetragon. Assuming the code looks good and no one has technical arguments against it, I say we can push it. Thanks!

t0x01 linked a pull request on Sep 19, 2024 that will close this issue

4 participants