Incorporate Marinara tool in builds of distroless Mariner images #4771

Closed
mthalman opened this issue Jul 20, 2023 · 15 comments

@mthalman
Member

We should explore the possibility of using Marinara for the production of our distroless Mariner images. This tool is intended to simplify the installation of additional packages on top of the base distroless Mariner images.

As can be seen from one of our Dockerfiles, the logic necessary to do this today is a bit verbose. It requires the installation of packages in an installroot location, updating of the manifest file, and clean-up. This can (or should) be handled by the Marinara tool which would greatly simplify our Dockerfiles.

@lbussell
Contributor

This should be explored before the GA of Azure Linux 3.0, or else we will lose the opportunity to implement any improvements in our build process for .NET 8. .NET 9 can accept changes to these images up until RC.1 but we should explore these changes as early as possible.

@lbussell
Contributor

lbussell commented Jul 1, 2024

Here's a quick summary of how Marinara works:

Given the following info:

| Build Argument Name | Possible Values |
| --- | --- |
| AZL_VERSION | 2.0, 3.0 |
| NAMESPACE | cbl-mariner, azurelinux |
| PACKAGES_TO_INSTALL* | Azure Linux packages separated by spaces and surrounded by double quotes. For example: "pkg1 pkg2 pkg3" (these would be our .NET dependencies) |
| PACKAGES_TO_HOLDBACK** | Azure Linux packages separated by spaces and surrounded by double quotes. For example: "pkg1 pkg2 pkg3" |
| IMAGE_TYPE | minimal, minimal-nonroot, minimal-debug, minimal-debug-nonroot, base, base-nonroot, base-debug, base-debug-nonroot, custom, custom-nonroot, custom-debug, custom-debug-nonroot (we would probably use one of the nonroot variants, minimal or custom) |
| USER*** | nonroot or any username for the nonroot user (we would use app) |
| USER_UID*** | numeric value other than 0 for the nonroot user (we would use 1654) |

  1. Obtain the dockerfile-new-image Dockerfile from the marinara repo.
  2. docker pull azurelinuxpreview.azurecr.io/public/azurelinux/marinara:3.0 && docker tag azurelinuxpreview.azurecr.io/public/azurelinux/marinara:3.0 mcr.microsoft.com/azurelinux/marinara:3.0
  3. Run the command:
    docker build . -t distroless/minimal:3.0 -f dockerfiles/dockerfile-new-image --build-arg AZL_VERSION=3.0 --build-arg NAMESPACE="azurelinux" --build-arg IMAGE_TYPE="minimal" ... plus the rest of your args according to the info above

ImageBuilder does not support building anything like this today. It would be non-trivial to implement this.

If we were to go ahead and implement this, I imagine it'd be something like:

  1. Replace the AzureLinux 3.0 runtime-deps Dockerfiles with some json files describing the data above.
  2. Let ImageBuilder read those json files to parse the arguments to the Marinara Dockerfile.
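
The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the JSON layout, the `build_command` helper, and the package list are hypothetical and are not part of ImageBuilder or Marinara; the only `docker build` flags used are the ones already shown in the command earlier in this comment.

```python
import json
import shlex

# Hypothetical descriptor file content. The keys under "build_args" mirror the
# Marinara build argument names listed in the table above; the surrounding
# layout ("tag", "dockerfile") is an assumption made for this sketch.
DESCRIPTOR = """
{
  "tag": "distroless/minimal:3.0",
  "dockerfile": "dockerfiles/dockerfile-new-image",
  "build_args": {
    "AZL_VERSION": "3.0",
    "NAMESPACE": "azurelinux",
    "IMAGE_TYPE": "minimal-nonroot",
    "PACKAGES_TO_INSTALL": "glibc libgcc openssl-libs zlib",
    "USER": "app",
    "USER_UID": "1654"
  }
}
"""


def build_command(descriptor: dict) -> list[str]:
    """Turn a descriptor into an argv list for `docker build`."""
    cmd = ["docker", "build", ".", "-t", descriptor["tag"], "-f", descriptor["dockerfile"]]
    for name, value in descriptor["build_args"].items():
        # Each entry becomes one `--build-arg NAME=value` pair.
        cmd += ["--build-arg", f"{name}={value}"]
    return cmd


command = build_command(json.loads(DESCRIPTOR))
# shlex.join quotes values that contain spaces, such as the package list.
print(shlex.join(command))
```

Keeping the command as an argv list avoids shell-quoting problems with the space-separated package list; the list can be passed directly to `subprocess.run`.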

We would also need to figure out:

  1. How to acquire the Marinara image and Dockerfile. How to pin the versions of both/each of those. How do we link those versions together if there are breaking changes?
  2. How to represent this change in the manifest.
  3. Probably a bunch more stuff.

@mthalman
Member Author

mthalman commented Jul 1, 2024

Rather than integrating this directly into Image Builder, I see this more as the runtime-deps Dockerfile looking a lot like the content in https://github.com/microsoft/marinara/blob/main/dockerfiles/dockerfile-new-image. Then Image Builder just builds the Dockerfile like it normally would and that Dockerfile would run the Marinara tool.

@lbussell
Contributor

lbussell commented Jul 1, 2024

> Rather than integrating this directly into Image Builder, I see this more as the runtime-deps Dockerfile looking a lot like the content in https://github.com/microsoft/marinara/blob/main/dockerfiles/dockerfile-new-image.

That's a better idea. I'll put together a prototype of that.

@lbussell self-assigned this on Jul 2, 2024
@lbussell
Contributor

lbussell commented Jul 3, 2024

Here's a prototype runtime-deps Dockerfile using Marinara (intended to replace this Dockerfile):

FROM azurelinuxpreview.azurecr.io/public/azurelinux/marinara:3.0 AS builder

RUN marinaracreate.py \
    --image-type "minimal-nonroot" \
    --azure-linux-version "3.0" \
    --location "/staging" \
    --add-packages "prebuilt-ca-certificates glibc libgcc libstdc++ openssl-libs zlib" \
    --packages-to-holdback "" \
    --user "app" \
    --user-uid "1654" \
    --user-gid "1654"

# .NET runtime-deps image
FROM scratch

ENV \
    # UID of the non-root user 'app'
    APP_UID=1654 \
    # Configure web servers to bind to port 8080 when present
    ASPNETCORE_HTTP_PORTS=8080 \
    # Enable detection of running in a container
    DOTNET_RUNNING_IN_CONTAINER=true \
    # Set the invariant mode since ICU package isn't included (see https://github.com/dotnet/announcements/issues/20)
    DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=true

COPY --from=builder /staging/ /

# Workaround for https://github.com/moby/moby/issues/38710
COPY --from=builder --chown=1654:1654 /staging/home/ /home/

USER app

This creates a container image that's functionally equivalent to our current runtime-deps Dockerfile, but with 100% space efficiency (no files are duplicated in the layer FS). Our current image, mcr.microsoft.com/dotnet/nightly/runtime-deps:8.0-azurelinux3.0-distroless, has around 90% space efficiency. The effect of this is an on-disk savings of over 2MB.

I went ahead and built the rest of the .NET image chain (Runtime and ASP.NET Core), and did a quick sanity check on the new image. .NET is still able to run just fine:

PS> docker run -it --entrypoint /usr/bin/dotnet --rm mcr.microsoft.com/dotnet/nightly/aspnet:8.0-azurelinux3.0-distroless --list-runtimes

Microsoft.AspNetCore.App 8.0.6 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 8.0.6 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

@lbussell
Contributor

lbussell commented Jul 3, 2024

And here's an example of using Marinara to modify that existing ASP.NET Core image that I created earlier:

FROM mcr.microsoft.com/dotnet/nightly/aspnet:8.0-azurelinux3.0-distroless AS base

FROM azurelinuxpreview.azurecr.io/public/azurelinux/marinara:3.0 AS builder

# Copy the manifest files from the distroless image to the installer image.
# In Azure Linux distroless containers, there are two files at this location:
#   (1) container-manifest-1
#   (2) container-manifest-2
COPY --from=base /var/lib/rpmmanifest/ /tmp/rpmmanifest/

RUN marinaraextend.py \
    --azure-linux-version "3.0" \
    --location "/staging" \
    --add-packages "ca-certificates icu tzdata" \
    --packages-to-holdback "" \
    --existing-manifest-location "/tmp/rpmmanifest" \
    --new-manifest-location "/var/lib/rpmmanifest" \
    --user "app" \
    --user-uid "1654" \
    --user-gid "1654"

FROM base AS final

COPY --from=builder /staging/ /

# Workaround for https://github.com/moby/moby/issues/38710
COPY --from=builder --chown=1654:1654 /staging/home/ /home/

# Optional additional layer squash - need to redefine ENVs?
# FROM scratch
# COPY --from=final / /
# COPY --from=final --chown=1654:1654 /home/ /home/

USER $APP_UID

Here I used Marinara to add ca-certificates, icu, and tzdata to the image.

Unfortunately, it doesn't seem like Marinara has any magical logic for reducing layer FS diffs in the staging output. So when copying directly from the builder layer to the base layer, we end up with about 20 MB of overlap. The recommendation from the official marinara extension Dockerfile is to squash the final image to a single layer using a COPY instruction. Unfortunately this has two downsides - we lose the layer-sharing with other .NET images, and we also lose the .NET images' environment variables that are set across all of the runtime-deps, runtime, and aspnet images.

If I were writing documentation for users, I'd prefer not to squash the base layer. We have other guidance (internal only, would be published as part of #5031) for reducing diffs when copying files from staging to the base image. We should also file an issue on the Marinara tool for this.
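
The internal guidance mentioned above is not reproduced here. As a rough illustration of the general idea, the sketch below lists files in a staging tree whose content is byte-for-byte identical to the same path in the base tree; copying those again would only add overlap to the new layer. The helper names and directory layout are hypothetical, and the comparison deliberately ignores ownership, permissions, and timestamps, which can also cause a file to be recorded again in a layer.

```python
import hashlib
import os
import tempfile


def file_digest(path: str) -> str:
    """SHA-256 of a file's content."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def redundant_files(staging_root: str, base_root: str) -> list[str]:
    """Relative paths in staging whose content already exists unchanged in base."""
    redundant = []
    for dirpath, _dirnames, filenames in os.walk(staging_root):
        for filename in filenames:
            staged = os.path.join(dirpath, filename)
            relative = os.path.relpath(staged, staging_root)
            existing = os.path.join(base_root, relative)
            if os.path.isfile(existing) and file_digest(staged) == file_digest(existing):
                redundant.append(relative)
    return sorted(redundant)


def _write(root: str, relative: str, content: str) -> None:
    """Create a small file (and its parent directories) for the demo."""
    path = os.path.join(root, relative)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(content)


# Demo with throwaway directories standing in for the base image and /staging.
with tempfile.TemporaryDirectory() as base, tempfile.TemporaryDirectory() as staging:
    _write(base, "etc/hosts", "127.0.0.1 localhost\n")
    _write(staging, "etc/hosts", "127.0.0.1 localhost\n")     # unchanged copy
    _write(staging, "usr/lib/libexample.so", "new package")   # genuinely new file
    duplicates = redundant_files(staging, base)
    print(duplicates)
```

Files reported by `redundant_files` could be removed from the staging directory in the builder stage before the final `COPY`, so that only new or changed files land in the added layer.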

@lbussell
Contributor

lbussell commented Jul 3, 2024

Also, I stuck pretty close to the official image for that Dockerfile. It's not clear if or how /var/lib/rpmmanifest is making it back into the new image, or if a totally new one is just generated in the /staging/ dir.

@mthalman
Member Author

mthalman commented Jul 8, 2024

> Unfortunately, it doesn't seem like Marinara has any magical logic for reducing layer FS diffs in the staging output. So when copying directly from the builder layer to the base layer, we end up with about 20 MB of overlap. The recommendation from the official marinara extension Dockerfile is to squash the final image to a single layer using a COPY instruction. Unfortunately this has two downsides - we lose the layer-sharing with other .NET images, and we also lose the .NET images' environment variables that are set across all of the runtime-deps, runtime, and aspnet images.

That seems like a blocker to me.

> If I were writing documentation for users, I'd prefer not to squash the base layer. We have other guidance (internal only, would be published as part of #5031) for reducing diffs when copying files from staging to the base image. We should also file an issue on the Marinara tool for this.

I agree. I think this should be handled by Marinara.

@lbussell
Contributor

lbussell commented Jul 8, 2024

> That seems like a blocker to me.

To be clear, we would still get a size and Dockerfile complexity win today by using Marinara to build the runtime-deps images. Regardless of whether we recommend users use Marinara directly to add packages to distroless images.

@mthalman
Member Author

mthalman commented Jul 8, 2024

> > That seems like a blocker to me.
>
> To be clear, we would still get a size and Dockerfile complexity win today by using Marinara to build the runtime-deps images. Regardless of whether we recommend users use Marinara directly to add packages to distroless images.

Ah, right.

@lbussell
Contributor

lbussell commented Jul 8, 2024

[Triage] There are a few more things to figure out before we can go ahead with this Marinara based implementation for AzLinux 3.0.

  1. We should find out exactly where the 2MB size improvement comes from in the runtime-deps Dockerfile.
  2. We should get a clear support statement from the Marinara team.
  3. Is there any downside to not using Marinara for subsequent layers based off of this Marinara-generated runtime-deps image? For example, if we decide to produce distroless SDK images, will that prove to be an issue?

@lbussell
Contributor

lbussell commented Jul 8, 2024

Here's a summary of the current wasted space in mcr.microsoft.com/dotnet/nightly/runtime-deps:8.0-azurelinux3.0-distroless, courtesy of dive. More investigation needs to be done on why all the certs seem to have been overwritten.

PS> .\dive.exe mcr.microsoft.com/dotnet/nightly/runtime-deps:8.0-azurelinux3.0-distroless
  Using default CI config
Image Source: docker://mcr.microsoft.com/dotnet/nightly/runtime-deps:8.0-azurelinux3.0-distroless
Fetching image... (this can take a while for large images)
Analyzing image...
  efficiency: 90.8115 %
  wastedBytes: 4903817 bytes (4.9 MB)
  userWastedPercent: 21.5194 %
Inefficient Files:
Count  Wasted Space  File Path
    2        843 kB  /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt
    2        803 kB  /etc/pki/tls/cert.pem
    2        803 kB  /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
    2        627 kB  /etc/pki/ca-trust/extracted/pem/email-ca-bundle.pem
    2        586 kB  /etc/pki/java/cacerts
    2        586 kB  /etc/pki/ca-trust/extracted/java/cacerts
    2        584 kB  /etc/pki/ca-trust/extracted/edk2/cacerts.bin
    2         57 kB  /usr/lib/sysimage/tdnf/history.db
    2        4.1 kB  /etc/profile
    2        3.1 kB  /etc/profile.d/proxy.sh
    2        2.0 kB  /etc/inputrc
    2        1.9 kB  /var/lib/rpmmanifest/container-manifest-2
    2        1.5 kB  /etc/sysconfig/proxy
    2         945 B  /etc/passwd
    2         620 B  /etc/group
    2         544 B  /etc/sysconfig/console
    2         440 B  /etc/modprobe.d/usb.conf
    2         378 B  /etc/sysconfig/clock
    2         234 B  /etc/hosts
    2          18 B  /etc/host.conf
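
To see where the waste is concentrated, the rows above can be grouped by directory. The sketch below is a small helper written for this purpose, not part of dive; it assumes dive's size units are decimal (1 kB = 1000 B), so the summed total only approximates the `wastedBytes` figure because the per-row sizes are rounded.

```python
import re
from collections import defaultdict

# Rows copied from the "Inefficient Files" table above: count, wasted space, path.
DIVE_ROWS = """\
2 843 kB /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt
2 803 kB /etc/pki/tls/cert.pem
2 803 kB /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
2 627 kB /etc/pki/ca-trust/extracted/pem/email-ca-bundle.pem
2 586 kB /etc/pki/java/cacerts
2 586 kB /etc/pki/ca-trust/extracted/java/cacerts
2 584 kB /etc/pki/ca-trust/extracted/edk2/cacerts.bin
2 57 kB /usr/lib/sysimage/tdnf/history.db
2 4.1 kB /etc/profile
2 3.1 kB /etc/profile.d/proxy.sh
2 2.0 kB /etc/inputrc
2 1.9 kB /var/lib/rpmmanifest/container-manifest-2
2 1.5 kB /etc/sysconfig/proxy
2 945 B /etc/passwd
2 620 B /etc/group
2 544 B /etc/sysconfig/console
2 440 B /etc/modprobe.d/usb.conf
2 378 B /etc/sysconfig/clock
2 234 B /etc/hosts
2 18 B /etc/host.conf
"""

# Assumption: decimal units, matching "4903817 bytes (4.9 MB)" in the output above.
UNITS = {"B": 1, "kB": 1000, "MB": 1000**2}
ROW = re.compile(r"^\s*(\d+)\s+([\d.]+)\s+(B|kB|MB)\s+(\S+)$")


def wasted_by_prefix(rows: str, depth: int = 2) -> dict[str, int]:
    """Sum wasted bytes per path prefix (first `depth` path components)."""
    totals: dict[str, int] = defaultdict(int)
    for line in rows.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip headers or blank lines
        _count, number, unit, path = match.groups()
        prefix = "/".join(path.split("/")[: depth + 1])
        totals[prefix] += round(float(number) * UNITS[unit])
    return dict(totals)


totals = wasted_by_prefix(DIVE_ROWS)
total_wasted = sum(totals.values())
pki_share = totals["/etc/pki"] / total_wasted
for prefix, wasted in sorted(totals.items(), key=lambda item: -item[1])[:3]:
    print(f"{prefix}: {wasted} bytes")
print(f"/etc/pki share of listed waste: {pki_share:.1%}")
```

With these rows, the entries under `/etc/pki` account for more than 98% of the listed waste, which matches the observation that the certificate files are the ones being overwritten.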

@lbussell
Contributor

Related: microsoft/marinara#8

@lbussell
Contributor

lbussell commented Jul 22, 2024

[Triage] We should be good to go ahead and implement this for Azure Linux 3.0 images. We should make sure to pin the Marinara tool to a specific version/commit.
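
One way to check pinning mechanically is to look for `FROM` lines that reference an image by tag only. The sketch below treats "pinned" as "referenced by `@sha256:` digest", which is an assumption; the thread does not say whether the tool would be pinned by tag, digest, or commit. The `unpinned_images` helper is hypothetical.

```python
import re

# Matches `FROM [--platform=...] <image> [AS <stage>]`, case-insensitively.
FROM_LINE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?",
    re.IGNORECASE,
)


def unpinned_images(dockerfile_text: str) -> list[str]:
    """Return image references in FROM lines that are not pinned by digest."""
    stages: set[str] = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_LINE.match(line)
        if not match:
            continue
        image, stage = match.groups()
        # `scratch` and earlier stage names are not registry images.
        is_local = image.lower() == "scratch" or image.lower() in stages
        if not is_local and "@sha256:" not in image:
            unpinned.append(image)
        if stage:
            stages.add(stage.lower())
    return unpinned


# FROM and COPY lines taken from the prototype Dockerfile earlier in this thread.
EXAMPLE = """\
FROM azurelinuxpreview.azurecr.io/public/azurelinux/marinara:3.0 AS builder
FROM scratch
COPY --from=builder /staging/ /
"""

print(unpinned_images(EXAMPLE))
```

A check like this could run in CI so that a tag-only reference to the Marinara image is flagged before the Dockerfile is merged.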

@lbussell
Contributor

I completed a more in-depth investigation into using Marinara for our builds and concluded that it provides no size benefit over simply squashing the final layers in our existing Azure Linux runtime-deps images. Squashing has its own pros and cons. Instead of investing in switching the build to Marinara, we should focus on reducing image size by eliminating any data that's duplicated across layers. This duplicated data, which comes from the ca-certificates package, is highlighted in the linked investigation.

Azure Linux images have been released without Marinara, so I am closing this issue as not planned.

@lbussell closed this as not planned on Aug 14, 2024