Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

formula: add make_deduplication_links_in #18478

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 42 additions & 0 deletions Library/Homebrew/formula.rb
Original file line number Diff line number Diff line change
Expand Up @@ -1939,6 +1939,48 @@
end
end

# Replace duplicate files with links to reduce disk space.
#
# FIXME: Hardlinks are not fully supported so using `hardlink: true` will only
# reduce the bottle size or source build but will be duplicated on bottle pour.
Comment on lines +1944 to +1945
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes them not fully supported, out of interest?

Copy link
Member Author

@cho-m cho-m Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My old incomplete PR #13154 is related as BSD cp duplicates hardlinks. Trying to switch this brings its own set of problems due to bugs/limitations in other commands.

An example of bottle pour behavior is trino which has a bottle/estimated-unpack size of <1GB but will pour to >2GB.

There also needs to be special handling when crossing filesystem boundaries (the most extreme case being someone running on a USB formatted as (ex)FAT which doesn't support hardlinks). Off the top of my head, some that can handle this are rsync -H and GNU cp --preserve=links

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for context!

I think an opt-in approach like this makes sense. I think ideally handling this with automatic hardlinks in future (given they are supported in default filesystems on both macOS and Linux) and accepting that you'll get duplication for non-hard links seems ideal.

#
# ### Example
#
# ```ruby
# make_deduplication_links_in libexec, extension: "jar"
# ```
sig { params(dir: Pathname, extension: T.nilable(String), allow_noop: T::Boolean, hardlink: T::Boolean).void }
def make_deduplication_links_in(dir, extension: nil, allow_noop: false, hardlink: false)
raise ArgumentError, "#{dir} is not a valid directory!" if !dir.directory? || dir.symlink?
raise ArgumentError, "#{dir} must be within #{prefix}!" unless dir.realpath.to_s.start_with?(prefix.realpath)

# Use Pathname.new to avoid caching information during build
odebug "Pre-deduplication disk usage of #{dir}: #{disk_usage_readable(Pathname.new(dir).disk_usage)}"

Check warning on line 1958 in Library/Homebrew/formula.rb

View check run for this annotation

Codecov / codecov/patch

Library/Homebrew/formula.rb#L1958

Added line #L1958 was not covered by tests

pattern = "**/*"

Check warning on line 1960 in Library/Homebrew/formula.rb

View check run for this annotation

Codecov / codecov/patch

Library/Homebrew/formula.rb#L1960

Added line #L1960 was not covered by tests
pattern += ".#{extension}" if extension
base_files = {}

Check warning on line 1962 in Library/Homebrew/formula.rb

View check run for this annotation

Codecov / codecov/patch

Library/Homebrew/formula.rb#L1962

Added line #L1962 was not covered by tests

nlinks = dir.realpath.glob(pattern).count do |path|

Check warning on line 1964 in Library/Homebrew/formula.rb

View check run for this annotation

Codecov / codecov/patch

Library/Homebrew/formula.rb#L1964

Added line #L1964 was not covered by tests
next false if !path.file? || path.symlink?

base_file = base_files[path.basename.to_s] ||= path

Check warning on line 1967 in Library/Homebrew/formula.rb

View check run for this annotation

Codecov / codecov/patch

Library/Homebrew/formula.rb#L1967

Added line #L1967 was not covered by tests
next false if base_file == path || !compare_file(base_file, path)

rm(path)
if hardlink

Check warning on line 1971 in Library/Homebrew/formula.rb

View check run for this annotation

Codecov / codecov/patch

Library/Homebrew/formula.rb#L1970-L1971

Added lines #L1970 - L1971 were not covered by tests
path.make_link base_file
else
path.parent.install_symlink base_file
end
true

Check warning on line 1976 in Library/Homebrew/formula.rb

View check run for this annotation

Codecov / codecov/patch

Library/Homebrew/formula.rb#L1976

Added line #L1976 was not covered by tests
end

odebug "Post-deduplication disk usage of #{dir}: #{disk_usage_readable(Pathname.new(dir).disk_usage)}"

Check warning on line 1979 in Library/Homebrew/formula.rb

View check run for this annotation

Codecov / codecov/patch

Library/Homebrew/formula.rb#L1979

Added line #L1979 was not covered by tests
odebug "#{nlinks} #{Utils.pluralize("#{hardlink ? "hard" : "sym"}link", nlinks)} created"
raise "No links were created!" if !allow_noop && nlinks.zero?
end

# Replaces a universal binary with its native slice.
#
# If called with no parameters, does this with all compatible
Expand Down
Loading