Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev-cmd/bottle: use system GNU tar if possible #18604

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cho-m
Copy link
Member

@cho-m cho-m commented Oct 21, 2024

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

I switched to non-reproducible args for building gnu-tar bottle in #18138 due to problems bottling gnu-tar with gnu-tar.

Looking into this again to restore reproducibility. Incomplete PR as back-to-back bottles are still producing a diff on Linux.


The main issue for gnu-tar is running with placeholders, particularly for the interpreter.

On Linux, there are 2 options:

  1. Use system GNU tar (changes in this PR). Potential limitations are if someone on old Linux tries bottling gnu-tar (unlikely) and maybe the version number difference.
  2. Directly run interpreter, e.g.
    #{HOMEBREW_PREFIX}/lib/ld.so \
      --library-path #{HOMEBREW_PREFIX}/lib:... \
      /path/to/tar ...
    

On macOS, gnu-tar has no dependencies and doesn't have interpreter handling like Linux so can just bottle gnu-tar with gnu-tar:

brew bottle --json --only-json-tab gnu-tar -v
/opt/homebrew/Library/Homebrew/vendor/bundle/ruby/3.3.0/bin/bundle clean
==> Determining gnu-tar bottle rebuild...
==> Bottling gnu-tar--1.35.arm64_sequoia.bottle.2.tar.gz...
==> Installing `gnu-tar` for bottling...
/opt/homebrew/bin/brew install --formula gnu-tar
Warning: gnu-tar 1.35 is already installed and up-to-date.
To reinstall 1.35, run:
  brew reinstall gnu-tar
/opt/homebrew/opt/gnu-tar/bin/gtar --create --numeric-owner --mtime=2023-07-18 02:55:23 --sort=name --owner=0 --group=0 --numeric-owner --format=pax --pax-option=globexthdr.name=/GlobalHead.%n,exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime --file /Users/cho-m/Desktop/gnu-tar--1.35.arm64_sequoia.bottle.2.tar gnu-tar/1.35
./gnu-tar--1.35.arm64_sequoia.bottle.2.tar.gz
  bottle do
    rebuild 2
    sha256 cellar: :any_skip_relocation, arm64_sequoia: "74c997076b93549ba3b8a5bd59d768a853b929d33317cc59ba7ec448824eff6f"
  end
Writing gnu-tar--1.35.arm64_sequoia.bottle.json

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me 👍🏻

@Bo98
Copy link
Member

Bo98 commented Oct 21, 2024

I believe the reason for using Homebrew gnu-tar was because there was output differences in a recent version and the system tar being older caused issues. 71888db.

Makes sense to use system tar when bottling gnu-tar itself though.

In the longer term, a Ruby implementation (e.g. minitar) is something on my "to investigate" list.

@MikeMcQuaid
Copy link
Member

Makes sense to use system tar when bottling gnu-tar itself though.

Agreed 👍🏻

@cho-m
Copy link
Member Author

cho-m commented Oct 21, 2024

Makes sense to use system tar when bottling gnu-tar itself though.

Sounds reasonable. I was seeing a diff when trying to bottle meson.


In the longer term, a Ruby implementation (e.g. minitar) is something on my "to investigate" list.

GNU tar does make it easy to potentially change the compression format. Although it wasn't something worked on (#13621), there may be value in allowing different compression formats per-formula

Mainly useful if noticeable improvement and if dependency is already met.

For example, I was able to use zstd to create llvm bottle in similar compression time as gzip that had a >10% size reduction and decompression is 1.4x faster (though llvms total install time is impacted by relocation overhead).

Summary
  tar -xf llvm--19.1.2.arm64_sequoia.bottle.1.tar.zst ran
    1.42 ± 0.21 times faster than tar -xf llvm--19.1.2.arm64_sequoia.bottle.tar.gz

577M llvm--19.1.2.arm64_sequoia.bottle.1.tar.lz4
289M llvm--19.1.2.arm64_sequoia.bottle.1.tar.xz
430M llvm--19.1.2.arm64_sequoia.bottle.1.tar.zst
490M llvm--19.1.2.arm64_sequoia.bottle.tar.gz
brew reinstall --quiet --display-times  ./llvm--19.1.2.arm64_sequoia.bottle.1.tar.zst
...
==> Summary
🍺  /opt/homebrew/Cellar/llvm/19.1.2: 8,041 files, 1.9GB
==> Running `brew cleanup llvm`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
==> Installation times
llvm                      6.580 sbrew reinstall --quiet --display-times  llvm
...
🍺  /opt/homebrew/Cellar/llvm/19.1.2: 8,041 files, 1.9GB
==> Running `brew cleanup llvm`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
==> Installation times
llvm                      7.522 s

@MikeMcQuaid
Copy link
Member

GNU tar does make it easy to potentially change the compression format. Although it wasn't something worked on (#13621), there may be value in allowing different compression formats per-formula

Mainly useful if noticeable improvement and if dependency is already met.

Would only want to do this if we can do so for all supported macOS versions (and typical Ubuntu runners) with no extra Homebrew or system dependencies.

@Bo98
Copy link
Member

Bo98 commented Oct 21, 2024

The compression algorithm is really a completely different thing. tar handles it for ease of use but under the hood it doesn't actually implement any compression code and simply defers to another command so I wouldn't really conflate our choice of tar for future compression support. tar does save having to remember what those external commands are but you still need to e.g. handle making sure they're installed, handle the different media types in the OCI manifest, etc.

Using Ruby Gzip has also been a good improvement as we don't need to worry about the system anymore (particularly GNU Gzip's different implementation) so that does set a fairly high quality bar, whereas meanwhile we've had to adjust this tar code repeatedly.

@cho-m
Copy link
Member Author

cho-m commented Oct 21, 2024

Would only want to do this if we can do so for all supported macOS versions (and typical Ubuntu runners) with no extra Homebrew or system dependencies.

Probably won't be possible for all formulae unless Apple decides to ship a copy.

If only handling a subset of formulae, then possible if zstd is in dependency tree already, e.g. anything that depends on libtiff.

For performance, we may need to look into relocation handling first.


Using Ruby Gzip has also been a good improvement as we don't need to worry about the system anymore (particularly GNU Gzip's different implementation) so that does set a fairly high quality bar, whereas meanwhile we've had to adjust this tar code repeatedly.

Letting tar handle both phases does allow some small optimizations for intermediary files (EDIT: maybe Ruby tar could also do this). But avoiding implementation details (particularly dealing with oldest macOS/Linux support) does help.

On side note, using equivalent of gzip --no-name would probably have been ideal for reproducible builds (I think this is default for other compression like zstd, xz, etc) but probably too late for that since our existing bottles have name/timestamp.

@MikeMcQuaid
Copy link
Member

Probably won't be possible for all formulae unless Apple decides to ship a copy.

Yes, I consider this the requirement here.

@Bo98
Copy link
Member

Bo98 commented Oct 22, 2024

One problem with zstd is it is not very reproducible. It changes between versions and, in certain cases, between CPUs.

I know we've broken reproducibility a lot in the last year, though I think it's still something we aim to keep fairly stable.

On macOS, we do have options like LZFSE but I suspect it's not too big of an impact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants