Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Includeresource duplicate strategy to append to existing file when unrolling jar #6326

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

chrisrueger
Copy link
Contributor

@chrisrueger chrisrueger commented Oct 13, 2024

Closes #6325

Adds a duplicate strategy to -includeresource (similar to :flatten or :rename)

  • onduplicate:=MERGE
  • onduplicate:='MERGE,sometag'
  • onduplicate:='MERGE,metainfservices'
  • onduplicate:=SKIP
  • onduplicate:=WARN
  • onduplicate:=ERROR

e.g.

@${repo;org.apache.xmlgraphics:fop-core;latest}!/META-INF/services/*
@${repo;org.apache.xmlgraphics:xmlgraphics-commons;latest}!/META-INF/services/*;onduplicate:=MERGE

The MERGE strategy:

  • it allows an additional list of tags to filter for specific plugins (MergeFile interface)
  • currently we only have one Plugin called MetaInfServiceMerger.java which can handle the META-INF/services path and append duplicate service files with a line-break
  • the tags may be useful in the future to use other plugins which can handle other files / paths
  • using tags on another strategy other than MERGE has no effect.

Another example using WARN

@${repo;org.apache.xmlgraphics:fop-core;latest}!/META-INF/services/*
@${repo;org.apache.xmlgraphics:xmlgraphics-commons;latest}!/META-INF/services/*;onduplicate:=WARN

image

@pkriens This is the latest based on the discussion today (2024/10/29)

NOTHING,
OVERWRITE,
APPEND

Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>

use SequenceInputStream to concatenate...

resources to avoid intermediate Strings

Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>

handle duplicates for :literal

useful for testcases
Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
@chrisrueger chrisrueger force-pushed the includeresource-duplicate-strategy branch from 20a2822 to 7c6bb03 Compare October 14, 2024 04:25
@chrisrueger
Copy link
Contributor Author

@pkriens one thing I am thinking about and would like to discuss is, wether or not we shoud add a way to enforce a line-break between the appended files.
In my specific example

@${repo;org.apache.xmlgraphics:fop-core;latest}!/META-INF/services/*
@${repo;org.apache.xmlgraphics:xmlgraphics-commons;latest}!/META-INF/services/*;:duplicates:=APPEND

a line break is not required, since the first file ends with one.
But in case it is not, a line break would be required so that the content is correctly appended and does not result in an invalid file.

One option would be another strategy in addition to APPEND.

  • APPEND_ENFORCE_LINEBREAK (maybe shorter, but just to get my point across). This would add a linebreak before the duplicate file is appended (maybe if there is none already??)
  • APPEND (current append-logic... just append... nothing in between)

@pkriens
Copy link
Member

pkriens commented Oct 14, 2024

You are becoming the Elon Musk of bnd :-) Nice productivity.

Some comments.

  • This not a function of JAR. It is the include resource code in Builder that has the policies. It can access the properties and the plugins. Jar is just a holder, it should do what it is told. It should not be changed for this.
  • I think the actual merging should go to a plugin. This plugin should get a name and two Resource objects and return a single one or null if it can't merge. We can then make a plugin that can merge files in the services directory and just append with an extra line between because it knows the format. This can be a basic plugin that is always present in a Builder.
    package aQute.bnd.service.merge;

    public interface MergeFiles {
         Optional<Resource> merge( String path, Resource a, Resource b);
   }
  • I think you should be able to specify the duplicate strategy with a dup_overwrite:, dup_merge: (default), dup_error:, dup_warning:, dup_skip:. The value of these directives is a list of globs on the paths in the resource. Priority is probably merge (if plugin exists), overwrite, skip. Error/Warning should always be executed even if it matches the other ones.
-includeresource @jar1.jar, @jar2.jar;dup_overwrite:=*, @jar3.jar;dup_skip:="META-INF/services/*,META-INF/MANIFEST.MF"

@chrisrueger
Copy link
Contributor Author

chrisrueger commented Oct 14, 2024

Your approach sounds good and more flexible.

  • This not a function of JAR. It is the include resource code in Builder that has the policies. It can access the properties and the plugins. Jar is just a holder, it should do what it is told. It should not be changed for this.

I recognize a pattern here. I am always one level too deep 🤣

  • Jar is just a holder, it should do what it is told. It should not be changed for this.

What about the existing method with the boolean overwrite parameter and duplicate detection?
public boolean putResource(String path, Resource resource, boolean overwrite)

This was the main reason why I have put it here. But since it is public, we have to keep it anyway.

  • I think the actual merging should go to a plugin.

Are you talking plugin as in
public class MetaInfServiceParser implements AnalyzerPlugin

?

Ok I will try to digest your ideas and see what I can do.

@pkriens
Copy link
Member

pkriens commented Oct 14, 2024

I recognize a pattern here. I am always one level too deep 🤣

No, sometimes multiple levels 😎

When I was younger you had "structured design", DeMarco, Michael Jackson, Yourdan, etc. They were talking about coupling and cohesion. Did not get the cohesion until a few years ago but I think it is paramount. I try to write software based on reusable components. So each component must do one thing and one thing only otherwise reuse becomes hard. So I now strife to create classes and methods that do one thing. Very useful in there is not not do control (if/then/switch/case) and actual work in the same method.

Anyway, in bnd it is relatively straightforward. Whenever there is choice, it should be in a processor. Things like JAR, Resource, Parameters, etc. are reusable and should just do their work and not take decisions.

@chrisrueger
Copy link
Contributor Author

chrisrueger commented Oct 15, 2024

public interface MergeFiles {
Optional merge( String path, Resource a, Resource b);
}

Few questions @pkriens :

  1. What intention has the String path variable? Should the plugin check the path if it can merge with it? (e.g. path.startsWith('META-INF/services').

  2. How many plugins do you envision? Should there be multiple mergePlugins (similar to Analyzer Plugins, which are put in a list and processed all together (as in doPlugins(...)) e.g. one MetainfServicesMergeFiles.java and one MetainfServicesMergeFiles.java ?
    Or more like one "FileMerger" which checks the path to apply different merging mechanisms based on the path?

I ask because I have things setup so far (not pushed), but not sure how simple or complex the "plugin"-mechanism should be.

to handle -includeresource expressions like this, as suggested by @pkriens :

-includeresource @jar1.jar, @jar2.jar;dup_overwrite:=*, @jar3.jar;dup_skip:="META-INF/services/*,META-INF/MANIFEST.MF"

The value of these directives is a list of globs on the paths in the resource. Priority is probably merge (if plugin exists), overwrite, skip. Error/Warning should always be executed even if it matches the other ones.

Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>

rework to MergeFiles plugin
@chrisrueger chrisrueger force-pushed the includeresource-duplicate-strategy branch from 226d669 to 5df7631 Compare October 15, 2024 19:19
should avoid re-creating multiple Globs (Pattern.compile()) for each file. Should speed up things e.g. for unrolling jars with lots of files

Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
@chrisrueger
Copy link
Contributor Author

  • I think you should be able to specify the duplicate strategy with a dup_overwrite:, dup_merge: (default), dup_error:, dup_warning:, dup_skip:. The value of these directives is a list of globs on the paths in the resource. Priority is probably merge (if plugin exists), overwrite, skip. Error/Warning should always be executed even if it matches the other ones.
-includeresource @jar1.jar, @jar2.jar;dup_overwrite:=*, @jar3.jar;dup_skip:="META-INF/services/*,META-INF/MANIFEST.MF"

@pkriens I pushed the first prototype of this approach.

Example: dup_merge AND dup_warning combined:

@${repo;org.apache.xmlgraphics:fop-core;latest}!/META-INF/services/*,\
@${repo;org.apache.xmlgraphics:xmlgraphics-commons;latest}!/META-INF/services/*;dup_warning:=*;dup_merge:=*,\

Warnings are displayed (TODO for me: the message is not correct. the word "overwritten" should not appear when same file is also affected by dup_merge):

image

dup_merge in effect: line break added in META-INF/services files

image

but other files outside META-INF/services will be just appended (without line break).

This part is what I would like to discuss: I have created two MergeFiles plugins and iterate over them via

return proc.getPlugins(MergeFiles.class)
					.stream()
					.map(mf -> mf.merge(path, existing, resource))
					.filter(Optional::isPresent)
					.findFirst()
					.orElse(Optional.of(resource));

But that does not feel right, because the order of the plugins is important.
Maybe I misunderstood you but in the beginning two fixed plugins might be better and more explicit, e.g.:

return Stream.of(metaInfServiceMerger, defaultResourceMerger)
					.map(mf -> mf.merge(path, existing, resource))
					.filter(Optional::isPresent)
					.findFirst()
					.orElse(Optional.of(resource));
  • metaInfServiceMerger can merge with additional line break IF path.startsWith(METAINF/services)
  • defaultResourceMerger just appends without anything in between and without any checking of the path.
  • if metaInfServiceMerger can merge, then this result is used. Otherwise defaultResourceMerger is used

- instead of iterating of plugins we just check directly which of the two plugins to use, whether we have a META-INF/services file or not (we can get more dynamic in the future if we want, since all is still private)
- and add testcases

add test for mixed files in META-INF

to test that META-INF/services files are merged differently than files in META-INF although they are captured by the same dub_merge glob

Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
@chrisrueger chrisrueger marked this pull request as ready for review October 23, 2024 13:30
Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
@chrisrueger chrisrueger marked this pull request as draft October 29, 2024 14:30
@chrisrueger
Copy link
Contributor Author

Meeting notes

From meeting with @pkriens today.
We decided to rollback partially to a simpler model.

These are just loose notes so I don't forget. I will cleanup later.

Duplicate-Strategy:

  • WARN

  • ERROR

  • SKIP

  • MERGE

  • OVERRIDE (default)

  • warning by default if something is overridden (mention the options in the warning)

  • MergeFiles plugin should support tags

    • (merge plugin specifies multi tags) - ignore order.
    • handle * (all tags)

Example:
Plugin
com.foo.MyMerge; tags="metainfservices"

duplicate:="MERGE,metainfservices"

duplicate:="MERGE" - means all merge plugins will be called
duplicate:="MERGE,services" - means MERGE but only plugins with tag 'services'


plugin = tags | enum
tags = set of tag
enum ::= WARN, ERROR, MERGE
duplicate: ::= (glob | enum)*
plugin
com.foo.MyMerge; tags="services"
duplicate:="MERGE,services"
duplicate:="services"
duplicate:=MERGE

TODO add more testcases for tagging
Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
- now we do a warning if duplicate files overwrite eachother and no onduplicate directive is giving. the warning now mentions the directive

- finetuned the parsing of the directive which can contain and enum n tags

Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
@chrisrueger
Copy link
Contributor Author

chrisrueger commented Oct 29, 2024

@pkriens I think I implemented what we have talked about today. I updated the PR description above

One thing though I noticed during the bnd build locally is this output of the new warning we added when duplicate files overwrite eachother and no duplicate strategy is defined.

I am not sure this output below is intended. Nor do I know exactly where it is coming from.
Just mentioning this because it could mean:

a) we haven't thought about all side effects of adding a warning (and may remove it again)
or
b) this output is ok

> Task :biz.aQute.bnd:jar
warning: Duplicate file: LICENSE (Consider using the onduplicate: directive to handle duplicates.)
warning: Duplicate file: OSGI-OPT/src/aQute/lib/io/ByteBufferDataInput.java (Consider using the onduplicate: directive to handle duplicates.)
warning: Duplicate file: OSGI-OPT/src/aQute/lib/io/ByteBufferDataOutput.java (Consider using the onduplicate: directive to handle duplicates.)
warning: Duplicate file: OSGI-OPT/src/aQute/lib/io/ByteBufferInputStream.java (Consider using the onduplicate: directive to handle duplicates.)
warning: Duplicate file: OSGI-OPT/src/aQute/lib/io/ByteBufferOutputStream.java (Consider using the onduplicate: directive to handle duplicates.)
warning: Duplicate file: OSGI-OPT/src/aQute/lib/io/CharBufferReader.java (Consider using the onduplicate: directive to handle duplicates.)

etc....

@chrisrueger chrisrueger marked this pull request as ready for review October 29, 2024 21:52
Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
add word 'overwritten' to the default warning.
Also use a Constant for the default strategy.

Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
@chrisrueger
Copy link
Contributor Author

Meeting Notes:

enum ::= WARN, ERROR, MERGE, SKIP, OVERWRITE
duplicate: ::= (tagGlob | enum)*

MERGE, SKIP, OVERWRITE -> exclusive
WARN, ERROR -> can be combined

onduplicate:='MERGE'

  • all merge plugins

onduplicate:='MERGE,metainfservices'

  • OK

onduplicate:='SKIP,metainfservices'

  • just skip ignore tag

onduplicate:='metainfservices' (basically no enum, just a tag)

  • same as onduplicate:='MERGE,metainfservices'
  • no enum but tags means: otherwise OVERWRITE

onduplicate:='INVALIDENUM,metainfservices'

  • both should be considered tag

onduplicate:='metainfservices,MERGE,SKIP'

as discussed
Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
(only in the default case)
Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
@pkriens
Copy link
Member

pkriens commented Oct 30, 2024

I took a look at the code. It is not bad but I think it can be done more readable. The logic when what happens is complex and it is not clear in the code. I came up with the following class (well record) that could help. It is a bit pedantic so you can probably convince me to go with your code but this is the way I would handle the core logic. (The MergeResources is copied for convenience.)

public record Duplication(String path, Resource to, Resource candidate) {

	interface MergeResources {
		Optional<Resource> tryMerge(String path, Resource to, Resource candidate);
	}

	enum OnDuplicateCommand {
		OVERWRITE,
		SKIP,
		MERGE,
		WARN,
		ERROR;
	}

	public static Function<Duplication, Optional<Resource>> doDuplicate(String onduplicate, Processor processor) {
		Set<OnDuplicateCommand> commands = new LinkedHashSet<>();
		List<String> tags = new ArrayList<String>();

		Strings.split(onduplicate)
			.forEach(string -> {
				try {
					commands.add(OnDuplicateCommand.valueOf(string));
				} catch (Exception e) {
					tags.add(string);
				}
			});

		Consumer<Duplication> error = commands.remove(OnDuplicateCommand.ERROR)
			? dupl -> processor.error("includeresource.duplicates: duplicate found for path %s", dupl.path)
			: d -> {};
		Consumer<Duplication> warn = commands.remove(OnDuplicateCommand.WARN)
			? dupl -> processor.warning("includeresource.duplicates: duplicate found for path %s", dupl.path)
			: d -> {};

		String[] tags2 = tags.toArray(new String[0]);
		Function<Duplication, Optional<Resource>> result;

		int what = tags.isEmpty() ? 0 : 1;
		if (!commands.isEmpty())
			what += 2;

		result = switch (what) {
			case 0 -> dupl -> Optional.of(dupl.to);
			case 1 -> {
				List<MergeResources> mergers = processor.getPlugins(MergeResources.class, tags2);
				yield dupl -> merge(dupl, mergers);
			}
			case 2, 3 -> getCommand(commands, tags2, processor);
			default -> throw new UnsupportedOperationException();
		};

		return dupl -> {
			error.accept(dupl);
			warn.accept(dupl);
			return result.apply(dupl);
		};
	}

	private static Function<Duplication, Optional<Resource>> getCommand(Set<OnDuplicateCommand> commands,
		String[] tags, Processor processor) {

		if (commands.size() != 1) {
			processor.error("includeresource.duplicates: specifies multiple strategies to handle duplicates: %s",
				commands);
			return dupl -> Optional.empty();
		}

		if (commands.contains(OnDuplicateCommand.OVERWRITE)) {
			return dupl -> Optional.of(dupl.candidate);
		} else if (commands.contains(OnDuplicateCommand.SKIP)) {
			return dupl -> Optional.empty();
		} else if (commands.contains(OnDuplicateCommand.MERGE)) {
			return dupl -> merge(dupl, processor.getPlugins(MergeResources.class, tags));
		} else {
			throw new UnsupportedOperationException("missed an enum value? " + commands);
		}
	}

	private static Optional<Resource> merge(Duplication dupl, List<MergeResources> list) {
		for (MergeResources mr : list) {
			Optional<Resource> merged = mr.tryMerge(dupl.path, dupl.to, dupl.candidate);
			if (merged.isPresent())
				return merged;

		}
		return Optional.empty();
	}
}

@pkriens
Copy link
Member

pkriens commented Oct 30, 2024

You can call the method doDuplicate with the directive content. This gives you back a function you can call when there is a duplicate. Advantage is that the errors are reported early and no unnecessary work is done.

@chrisrueger
Copy link
Contributor Author

chrisrueger commented Oct 30, 2024

Thanks a lot @pkriens I started adding your implementation.
It differs a little bit from my version in terms of behavior, since testcases fail.

I think I need to clarify again:

What should e.g. happen with a duplicate for:

onduplicate:='nonexistingtag'
or
onduplicate:='MERGE' on a path which cannot be merged by any plugin?

Should it
a) do nothing, since we find no plugin which can merge (means, keep the existing file, but do not overwrite it)
or
b) since we find no plugin which can merge, we just overwrite as we always did

Basically it could be summarized:
What should happen if we want MERGE, but cannot?
Previously I assumed b). But thinking about it a) also makes sense.

I think you implemented a) and I kept that now and modified the testcases accordingly.

Also your version did not have the default handling (backwards compatibility, meaning no :onduplicate directive) with the additional warning if resources are not identical. I added that.

If the implementation as it is right now is ok and the testcases make sense, then I would say this is ready.

by @pkriens
adjust testcases since this version uses different messages and treats MERGE differently than the previous implementation.
Basically now if a MERGE is not supported / possible, then the existing file is kept (is NOT overwritten).

Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>

fix test

Signed-off-by: Christoph Rueger <chrisrueger@gmail.com>
@chrisrueger chrisrueger force-pushed the includeresource-duplicate-strategy branch from 85b01b5 to 1153f90 Compare October 30, 2024 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants