Tcpdump for Everyone: Changes to cloud-controller for the proposed pcap-release #3193

a18e opened this issue Feb 17, 2023

Recently we proposed pcap-release as an easy way for CF application developers and landscape operators to capture network traffic for their apps and/or their BOSH VMs. See issue cloudfoundry/cf-deployment#980 for a more detailed description of pcap-release.

To capture traffic from CF apps, we would need to implement some features in the cloud-controller, and we would like your feedback on our proposed solution.

The following diagram shows how we plan to capture app network traffic with the pcap-agent on the app container and send it via the pcap-api to the cf-CLI on the client machine:

[Diagram: single_instance_stream_to_client_pcapagent_on_container]

Our proposed solution would work similarly to the cf app-ssh process and consists of three components:

  • A cf-CLI plugin that implements commands to enable and perform tcpdumps on specific apps or app instances, with the option to pass a packet filter as a parameter (e.g. for a specific source address), analogous to the app-ssh commands
  • pcap-api (analogous to ssh-proxy for app-ssh) acts as the endpoint for the cf-CLI and forwards requests to the pcap-agent on the app containers. pcap-api is also responsible for user authentication.
  • pcap-agent (analogous to diego-sshd for app-ssh) runs in the container and acts as a wrapper around libpcap to capture network traffic

The only difference from app-ssh with regard to the cloud-controller implementation is that pcap-agent requires root permissions in the container to access network traffic, whereas diego-sshd runs as user vcap.

We have already run a successful spike/PoC in which we modified cloud-controller and diego-release on one of our dev landscapes to enable pcap-agent globally, i.e. to run the agent on every app container in the landscape:

  • diego-release builds the pcap-agent binary and packages it into the buildpack_app_lifecycle (alongside diego-sshd), which is then extracted on every app container
  • We created a new action and port mappings for pcap-agent in the cloud-controller
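The cloud-controller side of the spike (a new action plus a port mapping for pcap-agent) and the root-versus-vcap difference can be illustrated with a short sketch. The struct types only loosely mirror the fields of a Diego run action and port mapping that matter here; they are not the real BBS models. The pcap-agent path, its `-listen` flag, port 9494 and the `PcapEnabled` flag are hypothetical, and the diego-sshd arguments are illustrative only.

```go
package main

import "fmt"

// RunAction loosely mirrors the fields of a Diego run action relevant here.
type RunAction struct {
	Path string
	Args []string
	User string
}

// PortMapping exposes a container port to the outside.
type PortMapping struct {
	ContainerPort uint32
}

const (
	sshdPort      = 2222 // diego-sshd's conventional container port
	pcapAgentPort = 9494 // hypothetical port for pcap-agent
)

// AppFlags holds per-app switches; PcapEnabled is a hypothetical flag.
type AppFlags struct {
	SSHEnabled  bool
	PcapEnabled bool
}

// sidecarActions returns the extra processes (and their ports) started next to the app.
func sidecarActions(flags AppFlags) ([]RunAction, []PortMapping) {
	var actions []RunAction
	var ports []PortMapping
	if flags.SSHEnabled {
		actions = append(actions, RunAction{
			Path: "/tmp/lifecycle/diego-sshd",
			Args: []string{fmt.Sprintf("-address=0.0.0.0:%d", sshdPort)}, // illustrative args
			User: "vcap", // diego-sshd runs unprivileged
		})
		ports = append(ports, PortMapping{ContainerPort: sshdPort})
	}
	if flags.PcapEnabled {
		actions = append(actions, RunAction{
			Path: "/tmp/lifecycle/pcap-agent",                                 // hypothetical path in the extracted lifecycle
			Args: []string{fmt.Sprintf("-listen=0.0.0.0:%d", pcapAgentPort)}, // hypothetical flag
			User: "root", // the spike needed root so libpcap can read network traffic
		})
		ports = append(ports, PortMapping{ContainerPort: pcapAgentPort})
	}
	return actions, ports
}

func main() {
	actions, ports := sidecarActions(AppFlags{SSHEnabled: true, PcapEnabled: true})
	for i, a := range actions {
		fmt.Printf("%s as %s on port %d\n", a.Path, a.User, ports[i].ContainerPort)
	}
}
```

In the spike, pcap-agent ran on every container; gating it behind a per-app flag, as sketched here, is the piece the spike did not implement.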

(More details on the diego-release changes are in cloudfoundry/diego-release#703.)

With these modifications we were able to capture a tcpdump on an app-container via the pcap-agent from any landscape-internal VM.

The biggest limitation of this spike (with regard to cloud-controller) was that we did not implement an "app-feature-flag" similar to allow_ssh in the CC.

Before we move forward, we would like your feedback, especially on the following questions:

  • Do you see any roadblocks or complexities we might have missed?
  • Is running pcap-agent as root in the container acceptable, or is it an issue?
  • While app-ssh permissions can be granted at both the space and the app level, we are considering app-level permissions only, for simplicity. Do you think this would be enough to satisfy legal requirements like GDPR?
  • Do we need, or do you see value in, a global CF feature flag for pcap-release?
    • Would it even be possible to switch it on at runtime if the pcap-api needs to be deployed first?