Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop tracking skb in the end of its lifetime #241

Merged
merged 2 commits into from
Aug 30, 2023
Merged

Stop tracking skb in the end of its lifetime #241

merged 2 commits into from
Aug 30, 2023

Conversation

jschwinger233
Copy link
Member

@jschwinger233 jschwinger233 commented Aug 28, 2023

--filter-track-skb has a problem of mismatching skbs with same pointer address. This PR tries to fix it by stopping tracking after an skb ends its lifetime.

In general, an skb has 3 possible endings:

  1. consumed locally
  2. dropped locally
  3. transferred via a netdev

All the three endings hit kfree_skbmem eventually, so we delete tracked skb pointer address from bpf map at that time.

Fixes: #194

Signed-off-by: Zhichuan Liang gray.liang@isovalent.com

@jschwinger233
Copy link
Member Author

The outstanding issue is, I can't come up with a good flag name.

In the 3rd commit, I added a flag --filter-track-skb-across-namespaces to control if we see eth_type_trans as a lifetime termination.
For a containerized cluster such as kind cluster, we don't want to stop tracking on eth_type_trans; for bare metal node, or in the case we want to focus on a specific netns, we definitely want to stop at eth_type_trans.

But I don't think --filter-track-skb-across-namespaces is a good name. Do you have some idea? @brb

Previous injection assumes the next instruction to `printk()` is
"goto <label_return>", which isn't guaranteed by clang.

Technically, previous injection adds a ret insn to implement "exit
immediately if not matched":

    asm.Return().WithSymbol("return")

which forces us to remove the duplicated assumed "goto <label_return>"
after `printk()`.

This commit improves the way to inject ebpf pcap filter without such
assumption by setting proper values to registers to avoid adding ret
insn.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
main.go Outdated
if flags.FilterTrackSkb {
skbLifetimeEndAt := []string{"kfree_skbmem"}
if !flags.FilterTrackSkbAcrossNamespaces {
skbLifetimeEndAt = append(skbLifetimeEndAt, "eth_type_trans")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to do it flexibly without hard coding eth_type_trans and kfree_skbmem - -

@brb
Copy link
Member

brb commented Aug 29, 2023

But I don't think --filter-track-skb-across-namespaces is a good name. Do you have some idea? @brb

Tough one 😅 What's about --filter-stop-track-skb-after-emit?

@jschwinger233
Copy link
Member Author

jschwinger233 commented Aug 29, 2023

Marking kfree_skbmem as skb lifetime termination looks fine.

As for eth_type_trans, it definitely is a bad choice.

Below is a typical pwru output for an egress traffic inside a netns:

0xffff97bfe0bb54e8      0           [curl]             ip_local_out netns=4026535338 mark=0x0 ifindex=0 proto=0 mtu=0 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]           __ip_local_out netns=4026535338 mark=0x0 ifindex=0 proto=0 mtu=0 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]                ip_output netns=4026535338 mark=0x0 ifindex=0 proto=8 mtu=0 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]             nf_hook_slow netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]    apparmor_ip_postroute netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]         ip_finish_output netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]       __ip_finish_output netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]        ip_finish_output2 netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]     neigh_resolve_output netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]       __neigh_event_send netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]               eth_header netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]                 skb_push netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=60 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]         __dev_queue_xmit netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]       qdisc_pkt_len_init netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]      netdev_core_pick_tx netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]        validate_xmit_skb netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]       netif_skb_features netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]  passthru_features_check netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]     skb_network_protocol netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]  skb_csum_hwoffload_help netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]       validate_xmit_xfrm netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]      dev_hard_start_xmit netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]         veth_xmit	[veth] netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]   skb_clone_tx_timestamp netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]        __dev_forward_skb netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]       __dev_forward_skb2 netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]         skb_scrub_packet netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)
0xffff97bfe0bb54e8      0           [curl]           eth_type_trans netns=4026535338 mark=0x0 ifindex=13 proto=8 mtu=1500 len=74 10.0.1.45:54816->10.0.0.99:80(tcp)

eth_type_trans is the last kernel function called, but it shouldn't be seen as the sign of lifetime termination. Searching eth_type_trans in kernel source, there are bunches of invocation not relevant to egress xmit at all.

dev_hard_start_xmit can be a better candidate, but there are other functions called after it, it's not suitable either.

I don't know how to stop tracking an egress skb.

@jschwinger233
Copy link
Member Author

jschwinger233 commented Aug 29, 2023

Maybe only handle kfree_skbmem?

Below is pwru output for an egress skb on bare metal:

0xffff97bfe0bb42e8      0           [curl]         __dev_queue_xmit netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]       qdisc_pkt_len_init netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]      netdev_core_pick_tx netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]          sch_direct_xmit netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]   validate_xmit_skb_list netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]        validate_xmit_skb netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]       netif_skb_features netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]  passthru_features_check netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]     skb_network_protocol netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]  skb_csum_hwoffload_help netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]       validate_xmit_xfrm netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]      dev_hard_start_xmit netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl] e1000_xmit_frame	[e1000e] netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]       e1000_tso	[e1000e] netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]    e1000_tx_map	[e1000e] netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8      0           [curl]   skb_clone_tx_timestamp netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8     11        [<empty>]      __dev_kfree_skb_any netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8     11        [<empty>]              consume_skb netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8     11        [<empty>]   skb_release_head_state netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8     11        [<empty>]                tcp_wfree netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8     11        [<empty>]         skb_release_data netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)
0xffff97bfe0bb42e8     11        [<empty>]             kfree_skbmem netns=4026531840 mark=0x0 ifindex=2 proto=8 mtu=1500 len=74 192.168.0.110:43026->20.205.243.166:80(tcp)

As we see, there is a kfree_skbmem for "real" emission.

The disadvantage of stopping tracking only at kfree_skbmem is, pwru --filter-track-skb --filter-netns will misfunction: when an skb goes away from the "wanted" netns, reaches a new netns we're not interested, it's expected to output nothing, but pwru continues displaying due to --filter-track-skb, the previously matched skb is still in the bpf map to skip the netns filter check.

A quick fix for that is to filter the events in userspace: we double check the skb's netns before printing them from golang process.

Edit:

nothing needs to be fixed.

--filter-track --filter-netns $ns_id '$pcap_filter' actually means tracked or (netns = $ns and $pcap-filter) instead of tracked and (netns = $ns and $pcap-filter), it is designed to output skbs from other netns when specifying --filter-track-skb with --filter-netns. So be it.

@brb
Copy link
Member

brb commented Aug 29, 2023

Maybe only handle kfree_skbmem?

SGTM!

@jschwinger233 jschwinger233 marked this pull request as ready for review August 29, 2023 17:13
`kfree_skbmem` marks the end of an skb's lifetime, so we stop
tracking it at that time when `--filter-track-skb` is enabled.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@brb brb merged commit cc9c63f into main Aug 30, 2023
5 checks passed
@jschwinger233 jschwinger233 deleted the gray/skb-lifetime branch September 27, 2023 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants