Invalid file information in SPDX documents #1240

Open
armintaenzertng opened this issue Jul 20, 2023 · 9 comments

@armintaenzertng

Note: This uses the new version of the SPDX generation introduced in #1233. The old version has the same errors, plus a few more that have already been fixed in the new version.

Describe the bug
SPDX outputs with file information have a number of validation issues:

  • some files don't have a checksum (possibly only empty files; as a workaround, the generation currently falls back to the SHA1 of the empty string so that the SpdxDocument can at least be generated)
  • some files have invalid SPDX IDs such as SPDXRef-None-None or SPDXRef-v2"-None
  • some license references from LicenseInfoInFile are not present in the ExtractedLicensingInfo section
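
The empty-file fallback mentioned in the first bullet could look roughly like the sketch below. This is an illustration only, not Tern's actual code: `sha1_or_empty_fallback` and `EMPTY_SHA1` are hypothetical names, and the assumption is that a missing checksum arrives as `None` or an empty string.

```python
import hashlib
from typing import Optional

# SHA-1 digest of zero bytes, used as a stand-in when no checksum was recorded.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()


def sha1_or_empty_fallback(checksum: Optional[str]) -> str:
    """Return the recorded checksum, or the SHA-1 of the empty string if missing.

    Hypothetical helper for illustration; not part of Tern or spdx-tools.
    """
    return checksum if checksum else EMPTY_SHA1
```

Note that this fallback is only correct for files that really are empty; for any other file it records a checksum that does not match the content.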

To Reproduce
I ran tern report -i golang:1.12-alpine -f spdxjson -sv 2.3 -o output.json to produce the output and then ran pyspdxtools -i output.json on it (note that validation takes a while because the SPDX document is large).
I'm not sure whether -x scancode is also required, as I recall that the above command previously did not produce any file information. In case there are problems, I attached my output.json as output.txt (GitHub does not seem to support JSON attachments).

Error in terminal
Here are the validation issues:

Unrecognized license reference: LicenseRef-21495e9. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-21495e9
Unrecognized license reference: LicenseRef-1c734cf. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-1c734cf
Unrecognized license reference: LicenseRef-1b79b75. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-1b79b75
Unrecognized license reference: LicenseRef-fa9fd02. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-fa9fd02
Unrecognized license reference: LicenseRef-39c3ee0. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-39c3ee0
Unrecognized license reference: LicenseRef-21495e9. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-21495e9
Unrecognized license reference: LicenseRef-4ccf56f. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-4ccf56f
Unrecognized license reference: LicenseRef-45c771b. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-45c771b
Unrecognized license reference: LicenseRef-ca2312b. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-ca2312b
spdx_id must only contain letters, numbers, "." and "-" and must begin with "SPDXRef-", but is: SPDXRef-v2"-None
spdx_id must only contain letters, numbers, "." and "-" and must begin with "SPDXRef-", but is: SPDXRef-v2"-None
did not find the referenced spdx_id "SPDXRef-None-None" in the SPDX document
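
For a quicker check than a full validation run, the two rules quoted in the messages above can be approximated directly on the JSON. The sketch below is not how pyspdxtools works internally; `precheck_files` is a hypothetical helper, and it assumes the SPDX 2.x JSON field names `files`, `SPDXID`, `licenseInfoInFiles`, `hasExtractedLicensingInfos` and `licenseId`.

```python
import re
from typing import Any, Dict, List

# Rule quoted in the validator output: "SPDXRef-" followed by letters, digits, "." or "-".
SPDX_ID_RE = re.compile(r"SPDXRef-[A-Za-z0-9.\-]+")
LICENSE_REF_RE = re.compile(r"LicenseRef-[A-Za-z0-9.\-]+")


def precheck_files(doc: Dict[str, Any]) -> List[str]:
    """Collect ID and license-reference problems for the files in an SPDX JSON dict."""
    # License references that are declared in the extracted licensing info section.
    known_refs = {
        info.get("licenseId")
        for info in doc.get("hasExtractedLicensingInfos", [])
    }
    problems = []
    for f in doc.get("files", []):
        spdx_id = f.get("SPDXID", "")
        if not SPDX_ID_RE.fullmatch(spdx_id):
            problems.append(f"invalid SPDXID: {spdx_id}")
        for expr in f.get("licenseInfoInFiles", []):
            # Only LicenseRef-* IDs are checked; license-list IDs are ignored here.
            for ref in LICENSE_REF_RE.findall(expr):
                if ref not in known_refs:
                    problems.append(f"unrecognized license reference: {ref}")
    return problems
```

It can be run on a generated report by loading it with `json.load` and passing the resulting dict. It does not detect dangling references such as SPDXRef-None-None, which is syntactically valid but missing from the document.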

Expected behavior
Tern's generated SPDX documents with file information should be valid.

Environment you are running Tern on
Enter all that apply

  • tern at 047e1cb
  • Ubuntu 22.04
  • Python 3.10.6
@rnjudge
Contributor

rnjudge commented Jul 20, 2023

Hmm, I don't see these errors in the current/old version of Tern's output when I run tern report -i golang:1.12-alpine -f spdxjson -o output.json:

(ternenv) [rose@fedora tern]$ tern report -i golang:1.12-alpine -f spdxjson -o output-golang.json
(ternenv) [rose@fedora ternenv]$ java -jar tools-java-1.1.7-jar-with-dependencies.jar Verify tern/output-golang.json 
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
This SPDX Document is valid.

@rnjudge
Contributor

rnjudge commented Jul 20, 2023

I'll take a look at the output you pasted and the output I have, but this seems to be introduced with the latest changes.

@armintaenzertng
Author

The java-tools don't seem to pick up on all invalidities; please also check with pyspdxtools -i output.json.

Also, do you get the large (around 6MB) SPDX output?

@rnjudge
Contributor

rnjudge commented Jul 20, 2023

@armintaenzertng Yes, I do see errors with pyspdxtools, although I'm not convinced all of them are valid or make sense. I tend to trust the java tools more because they are actively maintained by @goneall, and I'm not sure whether the python tools are. But if you see a valid issue that the java tools don't pick up, you should file a bug with them.

As an example, I see this error with python tools:

package must contain no elements if `files_analyzed` is False, but found [Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-53745f29fd'

SPDXRef-53745f29fd is a layer package in the document, not a file.

It is true that a package may contain no files if files_analyzed is false but it may still contain other packages. This error is the majority of what I'm seeing. I don't see the Unrecognized license errors you are seeing.
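
The distinction can be sketched as a check that only flags files, not nested packages, under a package with files_analyzed set to false. This is an illustration under assumptions, not the validator's code: `files_in_unanalyzed_packages` is a hypothetical helper, and it assumes the SPDX 2.x JSON field names `packages`, `filesAnalyzed`, `relationships`, `spdxElementId`, `relationshipType` and `relatedSpdxElement`.

```python
from typing import Any, Dict, List


def files_in_unanalyzed_packages(doc: Dict[str, Any]) -> List[str]:
    """Flag CONTAINS relationships from a filesAnalyzed=false package to a file.

    Package-to-package CONTAINS relationships are left alone.
    """
    file_ids = {f["SPDXID"] for f in doc.get("files", [])}
    # filesAnalyzed defaults to true when the field is absent.
    unanalyzed = {
        p["SPDXID"]
        for p in doc.get("packages", [])
        if p.get("filesAnalyzed", True) is False
    }
    return [
        f'{r["spdxElementId"]} CONTAINS {r["relatedSpdxElement"]}'
        for r in doc.get("relationships", [])
        if r.get("relationshipType") == "CONTAINS"
        and r.get("spdxElementId") in unanalyzed
        and r.get("relatedSpdxElement") in file_ids
    ]
```

With this reading, the image-to-layer and layer-to-package relationships from the error output would not be reported, because their targets are packages.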

Full error output:

$ pyspdxtools -i /home/rose/ternenv/tern/output-golang.json
ERROR:root:The document is invalid. The following issues have been found:
package must contain no elements if files_analyzed is False, but found [Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-53745f29fd', comment=None), Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-9c60a09a3e', comment=None), Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-1e34416158', comment=None), Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-ccad0a45fa', comment=None), Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-65e02ee814', comment=None)]
package must contain no elements if files_analyzed is False, but found [Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-musl-1.1.24-r0', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-busybox-1.31.1-r9', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-alpine-baselayout-3.2.0-r3', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-alpine-keys-2.1-r2', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-libcrypto1.1-1.1.1d-r3', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-libssl1.1-1.1.1d-r3', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-ca-certificates-cacert-20191127-r0', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-libtls-standalone-2.9.1-r0', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-ssl-client-1.31.1-r9', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-zlib-1.2.11-r3', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-apk-tools-2.10.4-r3', comment=None), 
Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-scanelf-1.2.4-r0', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-musl-utils-1.1.24-r0', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-libc-utils-0.7.2-r0', comment=None)]
package must contain no elements if files_analyzed is False, but found [Relationship(spdx_element_id='SPDXRef-9c60a09a3e', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-ca-certificates-20191127-r0', comment=None)]

Also, the output I have is only 25K, not 6MB. 6MB sounds like it contains file information. Maybe try deleting your cache and regenerating. I also get 23K for the output file when I run with the updated changes (no file info).

@rnjudge
Contributor

rnjudge commented Jul 20, 2023

@armintaenzertng I will try to generate a file with golang:1.12-alpine using scancode and see if I can re-create the errors you are seeing.

@rnjudge
Contributor

rnjudge commented Jul 20, 2023

Running with the old changes, my SBOM with scancode metadata is 3.3MB. Running with the new changes, when I generate a scancode SBOM, it is 6.0MB. So it seems like there is extra metadata in there somewhere...

I do see one of the errors you are talking about with the old changes, though, even with the java tools:
Analysis exception processing SPDX file: No SPDX element found for SPDX ID SPDXRef-None-None

I'll take a look. I'm assuming it's another issue related to Scancode's recent restructuring.

@armintaenzertng
Author

> It is true that a package may contain no files if files_analyzed is false but it may still contain other packages. This error is the majority of what I'm seeing.

Yes, I noticed this bug, too. It is fixed in the current 0.8.0rc3 release (the spdx-tools PR already includes that fixed release; please update your local code to get the change).

@armintaenzertng
Author

> Running with the old changes, my SBOM with scancode metadata is 3.3MB. Running with the new changes, when I generate a scancode SBOM, It is 6.0MB. So it seems like there is extra metadata in there somewhere....

This is due to the hasFiles field being deprecated, see here. All SPDXIDs from the hasFiles property are now represented as relationships, which have more lines than just the SPDXID.
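
To illustrate why this grows the output, here is a rough sketch of the expansion. It is not the code used in Tern or spdx-tools: `has_files_to_relationships` is a hypothetical helper, and it assumes the SPDX 2.x JSON field names `hasFiles`, `spdxElementId`, `relationshipType` and `relatedSpdxElement`.

```python
from typing import Any, Dict, List


def has_files_to_relationships(package: Dict[str, Any]) -> List[Dict[str, str]]:
    """Expand a package's hasFiles list into explicit CONTAINS relationships."""
    return [
        {
            "spdxElementId": package["SPDXID"],
            "relationshipType": "CONTAINS",
            "relatedSpdxElement": file_id,
        }
        for file_id in package.get("hasFiles", [])
    ]
```

Each file ID that was a single string in hasFiles becomes a three-field object, so a document with many files grows noticeably even though it carries the same information.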

@armintaenzertng
Author

@rnjudge: It turns out the java-tools pick up on the invalidities mentioned above, but only after the
Analysis exception processing SPDX file: No SPDX element found for SPDX ID SPDXRef-None-None
is fixed.
