diff --git a/doc/release-notes/10169-JSON-schema-validation.md b/doc/release-notes/10169-JSON-schema-validation.md deleted file mode 100644 index 92ff4a917d5..00000000000 --- a/doc/release-notes/10169-JSON-schema-validation.md +++ /dev/null @@ -1,3 +0,0 @@ -### Improved JSON Schema validation for datasets - -Enhanced JSON schema validation with checks for required and allowed child objects, type checking for field types including `primitive`, `compound` and `controlledVocabulary`. More user-friendly error messages to help pinpoint the issues in the dataset JSON. See [Retrieve a Dataset JSON Schema for a Collection](https://guides.dataverse.org/en/6.3/api/native-api.html#retrieve-a-dataset-json-schema-for-a-collection) in the API Guide and PR #10543. diff --git a/doc/release-notes/10287-use-support-address-in-system-email-text.md b/doc/release-notes/10287-use-support-address-in-system-email-text.md deleted file mode 100644 index 4c294404298..00000000000 --- a/doc/release-notes/10287-use-support-address-in-system-email-text.md +++ /dev/null @@ -1,4 +0,0 @@ -### Notification Email Improvement - -The system email text has been improved to use the support email address (`dataverse.mail.support-email`) in the text where it states; "contact us for support at", instead of the default system email address. -Using the system email address here was particularly problematic when it was a 'noreply' address. diff --git a/doc/release-notes/10341-croissant.md b/doc/release-notes/10341-croissant.md deleted file mode 100644 index 15bc7029099..00000000000 --- a/doc/release-notes/10341-croissant.md +++ /dev/null @@ -1,9 +0,0 @@ -A new metadata export format called Croissant is now available as an external metadata exporter. It is oriented toward making datasets consumable by machine learning. - -When enabled, Croissant replaces the Schema.org JSON-LD format in the `` of dataset landing pages. For details, see the [Schema.org JSON-LD/Croissant Metadata](https://dataverse-guide--10533.org.readthedocs.build/en/10533/admin/discoverability.html#schema-org-head) under the discoverability section of the Admin Guide. - -For more about the Croissant exporter, see https://github.com/gdcc/exporter-croissant - -For installation instructions, see [Enabling External Exporters](https://dataverse-guide--10533.org.readthedocs.build/en/10533/installation/advanced.html#enabling-external-exporters) in the Installation Guide. - -See also Issue #10341 and PR #10533. diff --git a/doc/release-notes/10433-add-thumbnail-for-featured-dataverses.md b/doc/release-notes/10433-add-thumbnail-for-featured-dataverses.md deleted file mode 100644 index 0ebb84a8eb0..00000000000 --- a/doc/release-notes/10433-add-thumbnail-for-featured-dataverses.md +++ /dev/null @@ -1,5 +0,0 @@ -Add the ability to configure a thumbnail logo that is displayed for a collection when the collection is configured as a featured collection. If present, this thumbnail logo is shown. Otherwise, the collection logo is shown. Configuration is done under the "Theme" for a collection. - -The HTML preview of the documentation can be found [here](https://dataverse-guide--10433.org.readthedocs.build/en/10433/user/dataverse-management.html#theme). - -For more information, see [#10291](https://github.com/IQSS/dataverse/issues/10291). diff --git a/doc/release-notes/10478-version-base-image.md b/doc/release-notes/10478-version-base-image.md deleted file mode 100644 index 34f444a2122..00000000000 --- a/doc/release-notes/10478-version-base-image.md +++ /dev/null @@ -1,7 +0,0 @@ -### Adding versioned tags to Container Base Images - -With this release we introduce a detailed maintenance workflow for our container images. -As output of the GDCC Containerization Working Group, the community takes another step towards production ready containers available directly from the core project. - -The maintenance workflow regularly updates the Container Base Image, which contains the operating system, Java, Payara Application Server, as well as tools and libraries required by the Dataverse application. -Shipping these rolling releases as well as immutable revisions is the foundation for secure and reliable Dataverse Application Container images. diff --git a/doc/release-notes/10508-base-image-fixes.md b/doc/release-notes/10508-base-image-fixes.md deleted file mode 100644 index 148066435e8..00000000000 --- a/doc/release-notes/10508-base-image-fixes.md +++ /dev/null @@ -1,12 +0,0 @@ -# Security and Compatibility Fixes to the Container Base Image - -- Switch "wait-for" to "wait4x", aligned with the Configbaker Image -- Update "jattach" to v2.2 -- Install AMD64 / ARM64 versions of tools as necessary -- Run base image as unprivileged user by default instead of `root` - this was an oversight from OpenShift changes -- Linux User, Payara Admin and Domain Master passwords: - - Print hints about default, public knowledge passwords in place for - - Enable replacing these passwords at container boot time -- Enable building with updates Temurin JRE image based on Ubuntu 24.04 LTS -- Fix entrypoint script troubles with pre- and postboot script files -- Unify location of files at CONFIG_DIR=/opt/payara/config, avoid writing to other places \ No newline at end of file diff --git a/doc/release-notes/10517-datasetType.md b/doc/release-notes/10517-datasetType.md deleted file mode 100644 index 2e3aff940c7..00000000000 --- a/doc/release-notes/10517-datasetType.md +++ /dev/null @@ -1,10 +0,0 @@ -### Initial Support for Dataset Types - -Out of the box, all datasets have the type "dataset" but superusers can add additional types. At this time the type can only be set at creation time via API. The types "dataset", "software", and "workflow" will be sent to DataCite when the dataset is published. - -For details see and #10517. Please note that this feature is highly experimental and is expected to evolve. - -Upgrade instructions --------------------- - -Update your Solr schema.xml file to pick up the "datasetType" additions and do a full reindex. diff --git a/doc/release-notes/10583-dataset-unlink-functionality-same-permission-as-link.md b/doc/release-notes/10583-dataset-unlink-functionality-same-permission-as-link.md deleted file mode 100644 index f97bd252db3..00000000000 --- a/doc/release-notes/10583-dataset-unlink-functionality-same-permission-as-link.md +++ /dev/null @@ -1,2 +0,0 @@ -New "Unlink Dataset" button has been added to the Dataset Page to allow a user to unlink a dataset from a collection that was previously linked with the "Link Dataset" button. The user must possess the same permissions needed to unlink the Dataset as they would to link the Dataset. -The [existing API](https://guides.dataverse.org/en/6.3/admin/dataverses-datasets.html#unlink-a-dataset) for unlinking datasets has been updated to no longer require superuser access. The "Publish Dataset" permission is now enough. diff --git a/doc/release-notes/10606-dataverse-in-windows-wsl.md b/doc/release-notes/10606-dataverse-in-windows-wsl.md deleted file mode 100644 index 9501d6e3090..00000000000 --- a/doc/release-notes/10606-dataverse-in-windows-wsl.md +++ /dev/null @@ -1 +0,0 @@ -New instructions have been added for developers on Windows trying to run a Dataverse development environment using Windows Subsystem for Linux (WSL). See https://dataverse-guide--10608.org.readthedocs.build/en/10608/developers/windows.html #10606 and #10608. diff --git a/doc/release-notes/10632-DataCiteXMLandRelationType.md b/doc/release-notes/10632-DataCiteXMLandRelationType.md deleted file mode 100644 index 42c1cfb6eda..00000000000 --- a/doc/release-notes/10632-DataCiteXMLandRelationType.md +++ /dev/null @@ -1,41 +0,0 @@ -### Enhanced DataCite Metadata, Relation Type - -A new field has been added to the citation metadatablock to allow entry of the "Relation Type" between a "Related Publication" and a dataset. The Relation Type is currently limited to the most common 6 values recommended by DataCite: isCitedBy, Cites, IsSupplementTo, IsSupplementedBy, IsReferencedBy, and References. For existing datasets where no "Relation Type" has been specified, "IsSupplementTo" is assumed. - -Dataverse now supports the DataCite v4.5 schema. Additional metadata, including metadata about Related Publications, and files in the dataset are now being sent to DataCite and improvements to how PIDs (ORCID, ROR, DOIs, etc.), license/terms, geospatial, and other metadata is represented have been made. The enhanced metadata will automatically be sent when datasets are created and published and is available in the DataCite XML export after publication. - -The additions are in rough alignment with the OpenAIRE XML export, but there are some minor differences in addition to the Relation Type addition, including an update to the DataCite 4.5 schema. For details see https://github.com/IQSS/dataverse/pull/10632 and https://github.com/IQSS/dataverse/pull/10615 and the [design document](https://docs.google.com/document/d/1JzDo9UOIy9dVvaHvtIbOI8tFU6bWdfDfuQvWWpC0tkA/edit?usp=sharing) referenced there. - -Multiple backward incompatible changes and bug fixes have been made to API calls (3 of the four of which were not documented) related to updating PID target urls and metadata at the provider service: -- [Update Target URL for a Published Dataset at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-target-url-for-a-published-dataset-at-the-pid-provider) -- [Update Target URL for all Published Datasets at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-target-url-for-all-published-datasets-at-the-pid-provider) -- [Update Metadata for a Published Dataset at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-metadata-for-a-published-dataset-at-the-pid-provider) -- [Update Metadata for all Published Datasets at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-metadata-for-all-published-datasets-at-the-pid-provider) - -Upgrade instructions --------------------- - -The Solr schema has to be updated via the normal mechanism to add the new "relationType" field. - -The citation metadatablock has to be reinstalled using the standard instructions. - -With these two changes, the "Relation Type" fields will be available and creation/publication of datasets will result in the expanded XML being sent to DataCite. - -To update existing datasets (and files using DataCite DOIs): - -Exports can be updated by running `curl http://localhost:8080/api/admin/metadata/reExportAll` - -Entries at DataCite for published datasets can be updated by a superuser using an API call (newly documented): - -`curl -X POST -H 'X-Dataverse-key:' http://localhost:8080/api/datasets/modifyRegistrationPIDMetadataAll` - -This will loop through all published datasets (and released files with PIDs). As long as the loop completes, the call will return a 200/OK response. Any PIDs for which the update fails can be found using - -`grep 'Failure for id' server.log` - -Failures may occur if PIDs were never registered, or if they were never made findable. Any such cases can be fixed manually in DataCite Fabrica or using the [Reserve a PID](https://guides.dataverse.org/en/latest/api/native-api.html#reserve-a-pid) API call and the newly documented `/api/datasets//modifyRegistration` call respectively. See https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#send-dataset-metadata-to-pid-provider. Please reach out with any questions. - -PIDs can also be updated by a superuser on a per-dataset basis using - -`curl -X POST -H 'X-Dataverse-key:' http://localhost:8080/api/datasets//modifyRegistrationMetadata` - diff --git a/doc/release-notes/10633-add-dataverse-api-extension.md b/doc/release-notes/10633-add-dataverse-api-extension.md deleted file mode 100644 index f5d8030e8ac..00000000000 --- a/doc/release-notes/10633-add-dataverse-api-extension.md +++ /dev/null @@ -1 +0,0 @@ -The addDataverse (/api/dataverses/{identifier}) API endpoint has been extended to allow adding metadata blocks, input levels and facet ids at creation time, as the Dataverse page in create mode does in JSF. diff --git a/doc/release-notes/10726-dataverse-facets-api-extension.md b/doc/release-notes/10726-dataverse-facets-api-extension.md deleted file mode 100644 index baf6f798e35..00000000000 --- a/doc/release-notes/10726-dataverse-facets-api-extension.md +++ /dev/null @@ -1,3 +0,0 @@ -New optional query parameter "returnDetails" added to "dataverses/{identifier}/facets/" endpoint to include detailed information of each DataverseFacet. - -New endpoint "datasetfields/facetables" that lists all facetable dataset fields defined in the installation. diff --git a/doc/release-notes/10733-add-publication-status-to-search-api-results.md b/doc/release-notes/10733-add-publication-status-to-search-api-results.md deleted file mode 100644 index d015a50a00d..00000000000 --- a/doc/release-notes/10733-add-publication-status-to-search-api-results.md +++ /dev/null @@ -1,14 +0,0 @@ -Search API (/api/search) response will now include publicationStatuses in the Json response as long as the list is not empty - -Example: -```javascript -"items": [ - { - "name": "Darwin's Finches", - ... - "publicationStatuses": [ - "Unpublished", - "Draft" - ], -(etc, etc) -``` diff --git a/doc/release-notes/10741-list-metadatablocks-display-on-create-fix.md b/doc/release-notes/10741-list-metadatablocks-display-on-create-fix.md deleted file mode 100644 index 4edadcaa1fc..00000000000 --- a/doc/release-notes/10741-list-metadatablocks-display-on-create-fix.md +++ /dev/null @@ -1 +0,0 @@ -Fixed dataverses/{identifier}/metadatablocks endpoint to not return fields marked as displayOnCreate=true if there is an input level with include=false, when query parameters returnDatasetFieldTypes=true and onlyDisplayedOnCreate=true are set. diff --git a/doc/release-notes/10744-ro-crate-docs.md b/doc/release-notes/10744-ro-crate-docs.md deleted file mode 100644 index 9d52b4578b4..00000000000 --- a/doc/release-notes/10744-ro-crate-docs.md +++ /dev/null @@ -1,3 +0,0 @@ -## RO-Crate Support (Metadata Export) - -Dataverse now supports [RO-Crate](https://www.researchobject.org/ro-crate/) in the sense that dataset metadata can be exported in that format. This functionality is not available out of the box but you can enable one or more RO-Crate exporters from the [list of external exporters](https://preview.guides.gdcc.io/en/develop/installation/advanced.html#inventory-of-external-exporters). See also #10744. diff --git a/doc/release-notes/10749-dataverse-user-permissions-api-extension.md b/doc/release-notes/10749-dataverse-user-permissions-api-extension.md deleted file mode 100644 index 706b1f42641..00000000000 --- a/doc/release-notes/10749-dataverse-user-permissions-api-extension.md +++ /dev/null @@ -1 +0,0 @@ -New API endpoint "dataverses/{identifier}/userPermissions" for obtaining the user permissions on a dataverse. diff --git a/doc/release-notes/10758-rust-client.md b/doc/release-notes/10758-rust-client.md deleted file mode 100644 index e206f27ce65..00000000000 --- a/doc/release-notes/10758-rust-client.md +++ /dev/null @@ -1,3 +0,0 @@ -### Rust API client library - -An API client library for the Rust programming language is now available at https://github.com/gdcc/rust-dataverse and has been added to the [list of client libraries](https://dataverse-guide--10758.org.readthedocs.build/en/10758/api/client-libraries.html) in the API Guide. See also #10758. diff --git a/doc/release-notes/10797-update-current-version-bug-fix.md b/doc/release-notes/10797-update-current-version-bug-fix.md deleted file mode 100644 index 2cfaf69cad3..00000000000 --- a/doc/release-notes/10797-update-current-version-bug-fix.md +++ /dev/null @@ -1,11 +0,0 @@ -A significant bug in the superuser-only "Update-Current-Version" publication was found and fixed in this release. If the Update-Current-Version option was used when changes were made to the dataset Terms (rather than to dataset metadata), or if the PID provider service was down/returned an error, the update would fail and render the dataset unusable and require restoration from a backup. The fix in this release allows the update to succeed in both of these cases and redesigns the functionality such that any unknown issues should not make the dataset unusable (i.e. the error would be reported and the dataset would remain in its current state with the last-published version as it was and changes still in the draft version.) - -Users of earlier Dataverse releases are encouraged to alert their superusers to this issue. Those who wish to disable this functionality have two options: -* Change the dataset.updateRelease entry in the Bundle.properties file (or local language version) to "Do Not Use" or similar (doesn't disable but alerts superusers to the issue), or -* Edit the dataset.xhtml file to remove the lines - - - - - -, delete the contents of the generated and osgi-cache directories in the Dataverse Payara domain, and restart the Payara server. diff --git a/doc/release-notes/10800-add-dataverse-request-json-fix.md b/doc/release-notes/10800-add-dataverse-request-json-fix.md deleted file mode 100644 index ddd6c388ec6..00000000000 --- a/doc/release-notes/10800-add-dataverse-request-json-fix.md +++ /dev/null @@ -1 +0,0 @@ -Fixed the "addDataverse" API endpoint (/dataverses/{id} POST) expected request JSON structure to parse facetIds as described in the docs. \ No newline at end of file diff --git a/doc/release-notes/10810-search-api-payload-extensions.md b/doc/release-notes/10810-search-api-payload-extensions.md deleted file mode 100644 index 5112d9f62ee..00000000000 --- a/doc/release-notes/10810-search-api-payload-extensions.md +++ /dev/null @@ -1,52 +0,0 @@ -Search API (/api/search) response will now include new fields for the different entities. - -For Dataverse: - -- "affiliation" -- "parentDataverseName" -- "parentDataverseIdentifier" -- "image_url" (optional) - -```javascript -"items": [ - { - "name": "Darwin's Finches", - ... - "affiliation": "Dataverse.org", - "parentDataverseName": "Root", - "parentDataverseIdentifier": "root", - "image_url":"..." -(etc, etc) -``` - -For DataFile: - -- "releaseOrCreateDate" -- "image_url" (optional) - -```javascript -"items": [ - { - "name": "test.txt", - ... - "releaseOrCreateDate": "2016-05-10T12:53:39Z", - "image_url":"..." -(etc, etc) -``` - -For Dataset: - -- "image_url" (optional) - -```javascript -"items": [ - { - ... - "image_url": "http://localhost:8080/api/datasets/2/logo" - ... -(etc, etc) -``` - -The image_url field was already part of the SolrSearchResult JSON (and incorrectly appeared in Search API documentation), but it wasn’t returned by the API because it was appended only after the Solr query was executed in the SearchIncludeFragment of JSF. Now, the field is set in SearchServiceBean, ensuring it is always returned by the API when an image is available. - -The schema.xml file for Solr has been updated to include a new field called dvParentAlias for supporting the new response field "parentDataverseIdentifier". So for the next Dataverse released version, a Solr reindex will be necessary to apply the new schema.xml version. diff --git a/doc/release-notes/10819-publish-thumbnail-bug.md b/doc/release-notes/10819-publish-thumbnail-bug.md deleted file mode 100644 index 46c9875a6ef..00000000000 --- a/doc/release-notes/10819-publish-thumbnail-bug.md +++ /dev/null @@ -1,6 +0,0 @@ -The initial release of the Dataverse v6.3 introduced a bug where publishing would break the dataset thumbnail, which in turn broke the rendering of the parent Collection ("dataverse") page. This problem was fixed in the PR 10820. - -This bug fix will prevent this from happening in the future, but does not fix any existing broken links. To restore any broken thumbnails caused by this bug, you can call the http://localhost:8080/api/admin/clearThumbnailFailureFlag API, which will attempt to clear the flag on all files (regardless of whether caused by this bug or some other problem with the file) or the http://localhost:8080/api/admin/clearThumbnailFailureFlag/id to clear the flag for individual files. Calling the former, batch API is recommended. - -Additionally, the same PR made it possible to turn off the feature that automatically selects of one of the image datafiles to serve as the thumbnail of the parent dataset. An admin can turn it off by raising the feature flag `-Ddataverse.feature.disable-dataset-thumbnail-autoselect=true`. When the feature is disabled, a user can still manually pick a thumbnail image, or upload a dedicated thumbnail image. - diff --git a/doc/release-notes/10831-standardize-image-url-of-search-api.md b/doc/release-notes/10831-standardize-image-url-of-search-api.md deleted file mode 100644 index 1910091455c..00000000000 --- a/doc/release-notes/10831-standardize-image-url-of-search-api.md +++ /dev/null @@ -1,28 +0,0 @@ -Search API (/api/search) response will now include new image_url format for the Datafile and Dataverse logo. -Note to release note writer: this supersedes the release note 10810-search-api-payload-extensions.md - -For Dataverse: - -- "image_url" (optional) - -```javascript -"items": [ - { - "name": "Darwin's Finches", - ... - "image_url":"/api/access/dvCardImage/{identifier}" -(etc, etc) -``` - -For DataFile: - -- "image_url" (optional) - -```javascript -"items": [ - { - "name": "test.txt", - ... - "image_url":"/api/access/datafile/{identifier}?imageThumb=true" -(etc, etc) -``` diff --git a/doc/release-notes/10857-add-expiration-date-to-recreate-token-api.md b/doc/release-notes/10857-add-expiration-date-to-recreate-token-api.md new file mode 100644 index 00000000000..b450867c630 --- /dev/null +++ b/doc/release-notes/10857-add-expiration-date-to-recreate-token-api.md @@ -0,0 +1 @@ +An optional query parameter called 'returnExpiration' has been added to the 'users/token/recreate' endpoint, which, if set to true, returns the expiration time in the response message. diff --git a/doc/release-notes/10869-fix-npe-using-cvoc.md b/doc/release-notes/10869-fix-npe-using-cvoc.md deleted file mode 100644 index 53214d3789d..00000000000 --- a/doc/release-notes/10869-fix-npe-using-cvoc.md +++ /dev/null @@ -1 +0,0 @@ -This release fixes a bug in the external controlled vocabulary mechanism (introduced in v6.3) that could cause indexing to fail when a script is configured for one child field and no other child fields were managed. \ No newline at end of file diff --git a/doc/release-notes/6.4-release-notes.md b/doc/release-notes/6.4-release-notes.md new file mode 100644 index 00000000000..979fd16bf9e --- /dev/null +++ b/doc/release-notes/6.4-release-notes.md @@ -0,0 +1,526 @@ +# Dataverse 6.4 + +Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.4 rather than the list of releases, which will cut them off. + +This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project. + +## Release Highlights + +New features in Dataverse 6.4: + +- Enhanced DataCite Metadata, including "Relation Type" +- All ISO 639-3 languages are now supported +- There is now a button for "Unlink Dataset" +- Users will have DOIs/PIDs reserved for their files as part of file upload instead of at publication time +- Datasets can now have types such as "software" or "workflow" +- Croissant support +- RO-Crate support +- and more! Please see below. + +New client library: + +- Rust + +This release also fixes two important bugs described below and in [a post](https://groups.google.com/g/dataverse-community/c/evn5C-pyrS8/m/JrH9vp47DwAJ) on the mailing list: + +- "Update Current Version" can cause metadata loss +- Publishing breaks designated dataset thumbnail, messes up collection page + +Additional details on the above as well as many more features and bug fixes included in the release are described below. Read on! + +## Features Added + +### Enhanced DataCite Metadata, Including "Relation Type" + +Within the "Related Publication" field, a new subfield has been added called "Relation Type" that allows for the most common [values](https://datacite-metadata-schema.readthedocs.io/en/4.5/appendices/appendix-1/relationType/) recommended by DataCite: isCitedBy, Cites, IsSupplementTo, IsSupplementedBy, IsReferencedBy, and References. For existing datasets where no "Relation Type" has been specified, "IsSupplementTo" is assumed. + +Dataverse now supports the [DataCite v4.5 schema](http://schema.datacite.org/meta/kernel-4/). Additional metadata is now being sent to DataCite including metadata about related publications and files in the dataset. Improved metadata is being sent including how PIDs (ORCID, ROR, DOIs, etc.), license/terms, geospatial, and other metadata are represented. The enhanced metadata will automatically be sent to DataCite when datasets are created and published. Additionally, after publication, you can inspect what was sent by looking at the DataCite XML export. + +The additions are in rough alignment with the OpenAIRE XML export, but there are some minor differences in addition to the Relation Type addition, including an update to the DataCite 4.5 schema. For details see #10632, #10615 and the [design document](https://docs.google.com/document/d/1JzDo9UOIy9dVvaHvtIbOI8tFU6bWdfDfuQvWWpC0tkA/edit?usp=sharing) referenced there. + +Multiple backward incompatible changes and bug fixes have been made to API calls (three of four of which were not documented) related to updating PID target URLs and metadata at the provider service: +- [Update Target URL for a Published Dataset at the PID provider](https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#update-target-url-for-a-published-dataset-at-the-pid-provider) +- [Update Target URL for all Published Datasets at the PID provider](https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#update-target-url-for-all-published-datasets-at-the-pid-provider) +- [Update Metadata for a Published Dataset at the PID provider](https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#update-metadata-for-a-published-dataset-at-the-pid-provider) +- [Update Metadata for all Published Datasets at the PID provider](https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#update-metadata-for-all-published-datasets-at-the-pid-provider) + +### Full List of ISO 639-3 Languages Now Supported + +The controlled vocabulary values list for the metadata field "Language" in the citation block has now been extended to include roughly 7920 ISO 639-3 values. + +Some of the language entries in the pre-6.4 list correspond to "macro languages" in ISO-639-3 and admins/users may wish to update to use the corresponding individual language entries from ISO-639-3. As these cases are expected to be rare (they do not involve major world languages), finding them is not covered in the release notes. Anyone who desires help in this area is encouraged to reach out to the Dataverse community via any of the standard communication channels. + +ISO 639-3 codes were downloaded from [sil.org](https://iso639-3.sil.org/code_tables/download_tables#Complete%20Code%20Tables:~:text=iso%2D639%2D3_Code_Tables_20240415.zip) and the file used for merging with the existing citation.tsv was "iso-639-3.tab". See also #8578 and #10762. + +### Unlink Dataset Button + +A new "Unlink Dataset" button has been added to the dataset page to allow a user to unlink a dataset from a collection. To unlink a dataset the user must have permission to link the dataset. Additionally, the [existing API](https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#unlink-a-dataset) for unlinking datasets has been updated to no longer require superuser access as the "Publish Dataset" permission is now enough. See also #10583 and #10689. + +### Pre-Publish File DOI Reservation + +Dataverse installations using DataCite as a persistent identifier (PID) provider (or other providers that support reserving PIDs) will be able to reserve PIDs for files when they are uploaded (rather than at publication time). Note that reserving file DOIs can slow uploads with large numbers of files so administrators may need to adjust timeouts (specifically any Apache "``ProxyPass / ajp://localhost:8009/ timeout=``" setting in the recommended Dataverse configuration). + +### Initial Support for Dataset Types + +Out of the box, all datasets now have the type "dataset" but superusers can add additional types. At this time the type of a dataset can only be set at creation time via API. The types "dataset", "software", and "workflow" (just those three, for now) will be sent to DataCite (as `resourceTypeGeneral`) when the dataset is published. + +For details see [the guides](https://guides.dataverse.org/en/6.4/user/dataset-management.html#dataset-types), #10517 and #10694. Please note that this feature is highly experimental and is expected to [evolve](https://github.com/IQSS/dataverse-pm/issues/307). + +### Croissant Support (Metadata Export) + +A new metadata export format called [Croissant](https://github.com/mlcommons/croissant) is now available as an external metadata exporter. It is oriented toward making datasets consumable by machine learning. + +For more about the Croissant exporter, including installation instructions, see . See also #10341, #10533, and [discussion](https://groups.google.com/g/dataverse-community/c/JI8HPgGarr8/m/DqEIkiwlAgAJ) on the mailing list. + +Please note: the Croissant exporter works best with Dataverse 6.2 and higher (where it updates the content of `` as [described](https://guides.dataverse.org/en/6.4/admin/discoverability.html#schema-org-head) in the guides) but can be used with 6.0 and higher (to get the export functionality). + +### RO-Crate Support (Metadata Export) + +Dataverse now supports [RO-Crate](https://www.researchobject.org/ro-crate/) as a metadata export format. This functionality is not available out of the box, but you can enable one or more RO-Crate exporters from the [list of external exporters](https://guides.dataverse.org/en/6.4/installation/advanced.html#inventory-of-external-exporters). See also #10744 and #10796. + +### Rust API Client Library + +An Dataverse API client library for the Rust programming language is now available at https://github.com/gdcc/rust-dataverse and has been added to the [list of client libraries](https://guides.dataverse.org/en/6.4/api/client-libraries.html) in the API Guide. See also #10758. + +### Collection Thumbnail Logo for Featured Collections + +Collections can now have a thumbnail logo that is displayed when the collection is configured as a featured collection. If present, this thumbnail logo is shown. Otherwise, the collection logo is shown. Configuration is done under the "Theme" for a collection as explained in [the guides](https://guides.dataverse.org/en/6.4/user/dataverse-management.html#theme). See also #10291 and #10433. + +### Saved Searches Can Be Deleted + +Saved searches can now be deleted via API. See the [Saved Search](https://guides.dataverse.org/en/6.4/api/native-api.html#saved-search) section of the API Guide, #9317 and #10198. + +### Notification Email Improvement + +When notification emails are sent the part of the closing that says "contact us for support at" will now show the support email address (`dataverse.mail.support-email`), when configured, instead of the default system email address. Using the system email address here was particularly problematic when it was a "noreply" address. See also #10287 and #10504. + +### Ability to Disable Automatic Thumbnail Selection + +It is now possible to turn off the feature that automatically selects one of the image datafiles to serve as the thumbnail of the parent dataset. An admin can turn it off by enabling the [feature flag](https://guides.dataverse.org/en/6.4/installation/config.html#feature-flags) `dataverse.feature.disable-dataset-thumbnail-autoselect`. When the feature is disabled, a user can still manually pick a thumbnail image, or upload a dedicated thumbnail image. See also #10820. + +### More Flexible PermaLinks + +The configuration setting `dataverse.pid.*.permalink.base-url`, which is used for PermaLinks, has been updated to support greater flexibility. Previously, the string `/citation?persistentId=` was automatically appended to the configured base URL. With this update, the base URL will now be used exactly as configured, without any automatic additions. See also #10775. + +### Globus Async Framework + +A new alternative implementation of Globus polling during upload data transfers has been added in this release. This experimental framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. See `globus-use-experimental-async-framework` under [Feature Flags](https://guides.dataverse.org/en/6.4/installation/config.html#feature-flags) and [dataverse.files.globus-monitoring-server](https://guides.dataverse.org/en/6.4/installation/config.html#dataverse-files-globus-monitoring-server) in the Installation Guide. See also #10623 and #10781. + +### CVoc (Controlled Vocabulary): Allow ORCID and ROR to Be Used Together in Author Field + +Changes in Dataverse and updates to the ORCID and ROR external vocabulary scripts support deploying these for the citation block author field (and others). See also #10711, #10712, and . + +### Development on Windows + +New instructions have been added for developers on Windows trying to run a Dataverse development environment using Windows Subsystem for Linux (WSL). See [the guides](https://guides.dataverse.org/en/6.4/developers/windows.html), #10606, and #10608. + +### Experimental Crossref PID (DOI) Provider + +Crossref can now be used as a PID (DOI) provider, but this feature is experimental. Please provide feedback through the usual channels. See also the [guides](https://guides.dataverse.org/en/6.4/installation/config.html#crossref-specific-settings), #8581, and #10806. + +### Improved JSON Schema Validation for Datasets + +JSON Schema validation has been enhanced with checks for required and allowed child objects as well as type checking for field types including `primitive`, `compound` and `controlledVocabulary`. More user-friendly error messages help pinpoint the issues in the dataset JSON. See [Retrieve a Dataset JSON Schema for a Collection](https://guides.dataverse.org/en/6.4/api/native-api.html#retrieve-a-dataset-json-schema-for-a-collection) in the API Guide, #10169, and #10543. + +### Counter Processor 1.05 Support (Make Data Count) + +Counter Processor 1.05 is now supported for use with Make Data Count. If you are running Counter Processor, you should reinstall/reconfigure it as described in the latest guides. Note that Counter Processor 1.05 requires Python 3, so you will need to follow the full Counter Processor install. Also note that if you configure the new version the same way, it will reprocess the days in the current month when it is first run. This is normal and will not affect the metrics in Dataverse. See also #10479. + +### Version Tags for Container Base Images + +With this release we introduce a detailed maintenance workflow for our container images. As output of the [Containerization Working Group](https://ct.gdcc.io), the community takes another step towards production ready containers available directly from the core project. + +The maintenance workflow regularly updates the [Container Base Image](https://guides.dataverse.org/en/6.4/container/base-image.html), which contains the operating system, Java, Payara, and tools and libraries required by the Dataverse application. Shipping these rolling releases as well as immutable revisions is the foundation for secure and reliable [Dataverse Application Container](https://guides.dataverse.org/en/6.4/container/app-image.html) images. See also #10478 and #10827. + +## Bugs Fixed + +### Update Current Version + +A significant bug in the superuser-only [Update Current Version](https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#make-metadata-updates-without-changing-dataset-version) publication option was fixed. If the "Update Current Version" option was used when changes were made to the dataset terms (rather than to dataset metadata) or if the PID provider service was down or returned an error, the update would fail and render the dataset unusable and require restoration from a backup. The fix in this release allows the update to succeed in both of these cases and redesigns the functionality such that any unknown issues should not make the dataset unusable (i.e. the error would be reported and the dataset would remain in its current state with the last-published version as it was and changes still in the draft version.) + +If you do not plan to upgrade to Dataverse 6.4 right away, you are encouraged to alert your superusers to this issue (see [this post](https://groups.google.com/g/dataverse-community/c/evn5C-pyrS8/m/JrH9vp47DwAJ)). Here are some workarounds for pre-6.4 versions: + +* Change the "dataset.updateRelease" entry in the Bundle.properties file (or local language version) to "Do Not Use" or similar (this doesn't disable the button but alerts superusers to the issue), or +* Edit the dataset.xhtml file to remove the lines below, delete the contents of the generated and osgi-cache directories in the Dataverse Payara domain, and restart the Payara server. This will remove the "Update Current Version" from the UI. + +``` + + + +``` + +Again, the workarounds above are only for pre-6.4 versions. The bug has been fixed in Dataverse 6.4. See also #10797. + +### Broken Thumbnails + +Dataverse 6.3 introduced a bug where publishing would break the dataset thumbnail, which in turn broke the rendering of the parent collection (dataverse) page. + +This bug has been fixed but any existing broken thumbnails must be fixed manually. See "clearThumbnailFailureFlag" in the upgrade instructions below. + +Additionally, it is now possible to turn off the feature that automatically selects of one of the image datafiles to serve as the thumbnail of the parent dataset. An admin can turn it off by raising the feature flag `-Ddataverse.feature.disable-dataset-thumbnail-autoselect=true`. When the feature is disabled, a user can still manually pick a thumbnail image, or upload a dedicated thumbnail image. + +See also #10819, #10820, and [the post](https://groups.google.com/g/dataverse-community/c/evn5C-pyrS8/m/JrH9vp47DwAJ) on the mailing list. + +### No License, No Terms of Use + +When datasets have neither a license nor custom terms of use, the dataset page will now indicate this. Also, these datasets will no longer be indexed as having custom terms. See also #8796, #10513, and #10614. + +### CC0 License Bug Fix + +At a high level, some datasets have been mislabeled as "Custom License" when they should have been "CC0 1.0". This has been corrected. + +In Dataverse 5.10, datasets with only "CC0 Waiver" in the "termsofuse" field were converted to "Custom License" (instead of the CC0 1.0 license) through a SQL migration script (see #10634). On deployment of Dataverse 6.4, a new SQL migration script will be run automatically to correct this, changing these datasets to CC0. You can review the script in #10634, which only affect the following datasets: + +- The existing "Terms of Use" must be equal to "This dataset is made available under a Creative Commons CC0 license with the following additional/modified terms and conditions: CC0 Waiver" (this was set in #10634). +- The following terms fields must be empty: Confidentiality Declaration, Special Permissions, Restrictions, Citation Requirements, Depositor Requirements, Conditions, and Disclaimer. +- The license ID must not be assigned. + +The script will set the license ID to that of the CC0 1.0 license and remove the contents of "termsofuse" field. See also #9081 and #10634. + +### Remap oai_dc Export and Harvesting Format Fields: dc:type and dc:date + +The `oai_dc` export and harvesting format has had the following fields remapped: + +- dc:type was mapped to the field "Kind of Data". Now it is hard-coded to the word "Dataset". +- dc:date was mapped to the field "Production Date" when available and otherwise to "Publication Date". Now it is mapped the field "Publication Date" or the field used for the citation date, if set (see [Set Citation Date Field Type for a Dataset](https://guides.dataverse.org/en/6.4/api/native-api.html#set-citation-date-field-type-for-a-dataset)). + +In order for these changes to be reflected in existing datasets, a [reexport all](https://guides.dataverse.org/en/6.4/admin/metadataexport.html#batch-exports-through-the-api) should be run (mentioned below). See #8129 and #10737. + +### Zip File No Longer Misdetected as Shapefile (Hidden Directories) + +When detecting files types, Dataverse would previously detect a zip file as a shapefile if it contained [markers of a shapefile](https://guides.dataverse.org/en/6.4/developers/geospatial.html) in hidden directories. These hidden directories are now ignored when deciding if a zip file is a shapefile or not. See also #8945 and #10627. + +### External Controlled Vocabulary + +This release fixes a bug (introduced in v6.3) in the external controlled vocabulary mechanism that could cause indexing to fail (with a NullPointerException) when a script is configured for one child field and no other child fields were managed. See also #10869 and #10870. + +### Valid JSON in Error Response + +When any `ApiBlockingFilter` policy applies to a request, the JSON in the body of the error response is now valid JSON. See also #10085. + +### Docker Container Base Image Security and Compatibility + +- Switch "wait-for" to "wait4x", aligned with the Configbaker Image +- Update "jattach" to v2.2 +- Install AMD64 / ARM64 versions of tools as necessary +- Run base image as unprivileged user by default instead of `root` - this was an oversight from OpenShift changes +- Linux User, Payara Admin and Domain Master passwords: + - Print hints about default, public knowledge passwords in place for + - Enable replacing these passwords at container boot time +- Enable building with updates Temurin JRE image based on Ubuntu 24.04 LTS +- Fix entrypoint script troubles with pre- and postboot script files +- Unify location of files at CONFIG_DIR=/opt/payara/config, avoid writing to other places + +See also #10508, #10672 and #10722. + +### Cleanup of Temp Directories + +In this release we addressed an issue where copies of files uploaded via the UI were left in one specific temp directory (`.../domain1/uploads` by default). We would like to remind all the installation admins that it is strongly recommended to have some automated (and aggressive) cleanup mechanisms in place for all the temp directories used by Dataverse. For example, at Harvard/IQSS we have the following configuration for the PrimeFaces uploads directory above: (note that, even with this fix in place, PrimeFaces will be leaving a large number of small log files in that location) + +Instead of the default location (`.../domain1/uploads`) we use a directory on a dedicated partition, outside of the filesystem where Dataverse is installed, via the following JVM option: + +``` +-Ddataverse.files.uploads=/uploads/web +``` + +and we have a dedicated cronjob that runs every 30 minutes and deletes everything older than 2 hours in that directory: + +``` +15,45 * * * * /bin/find /uploads/web/ -mmin +119 -type f -name "upload*" -exec rm -f {} \; > /dev/null 2>&1 +``` + +### Trailing Commas in Author Name Now Permitted + +When an author name ended in a comma (e.g. `Smith,` or `Smith, `), the dataset page was broken after publishing (a "500" error page was presented to the user). The underlying issue causing the JSON-LD Schema.org output on the page to break was fixed. See #10343 and #10776. + +## API Updates + +### Search API: affiliation, parentDataverseName, image_url, etc. + +The Search API (`/api/search`) response now includes additional fields, depending on the type. + +For collections (dataverses): + +- "affiliation" +- "parentDataverseName" +- "parentDataverseIdentifier" +- "image_url" (optional) + +```javascript +"items": [ + { + "name": "Darwin's Finches", + ... + "affiliation": "Dataverse.org", + "parentDataverseName": "Root", + "parentDataverseIdentifier": "root", + "image_url":"/api/access/dvCardImage/{identifier}" +(etc, etc) +``` + +For datasets: + +- "image_url" (optional) + +```javascript +"items": [ + { + ... + "image_url": "http://localhost:8080/api/datasets/2/logo" + ... +(etc, etc) +``` + +For files: + +- "releaseOrCreateDate" +- "image_url" (optional) + +```javascript +"items": [ + { + "name": "test.png", + ... + "releaseOrCreateDate": "2016-05-10T12:53:39Z", + "image_url":"/api/access/datafile/42?imageThumb=true" +(etc, etc) +``` + +These examples are also shown in the [Search API](https://guides.dataverse.org/en/6.4/api/search.html) section of the API Guide. + +The image_url field was already part of the SolrSearchResult JSON (and incorrectly appeared in Search API documentation), but it wasn't returned by the API because it was appended only after the Solr query was executed in the SearchIncludeFragment of JSF (the old/current UI framework). Now, the field is set in SearchServiceBean, ensuring it is always returned by the API when an image is available. + +The Solr schema.xml file has been updated to include a new field called "dvParentAlias" for supporting the new response field "parentDataverseIdentifier". See upgrade instructions below. + +See also #10810 and #10811. + +### Search API: publicationStatuses + +The Search API (`/api/search`) response will now include publicationStatuses in the JSON response as long as the list is not empty. + +Example: + +```javascript +"items": [ + { + "name": "Darwin's Finches", + ... + "publicationStatuses": [ + "Unpublished", + "Draft" + ], +(etc, etc) +``` + +See also #10733 and #10738. + +### Search Facet Information Exposed + +A new endpoint `/api/datasetfields/facetables` lists all facetable dataset fields defined in the installation, as described in [the guides](https://guides.dataverse.org/en/6.4/api/native-api.html#list-all-facetable-dataset-fields). + +A new optional query parameter "returnDetails" added to `/api/dataverses/{identifier}/facets/` endpoint to include detailed information of each DataverseFacet, as described in [the guides](https://guides.dataverse.org/en/6.4/api/native-api.html#list-facets-configured-for-a-dataverse-collection). See also #10726 and #10727. + +### User Permissions on Collections + +A new endpoint at `/api/dataverses/{identifier}/userPermissions` for obtaining the user permissions on a collection (dataverse) has been added. See also [the guides](https://guides.dataverse.org/en/6.4/api/native-api.html#get-user-permissions-on-a-dataverse), #10749 and #10751. + +### addDataverse Extended + +The addDataverse (`/api/dataverses/{identifier}`) API endpoint has been extended to allow adding metadata blocks, input levels and facet IDs at creation time, as the Dataverse page in create mode does in JSF. See also [the guides](https://guides.dataverse.org/en/6.4/api/native-api.html#create-a-dataverse-collection), #10633 and #10644. + +### Metadata Blocks and Display on Create + +The `/api/dataverses/{identifier}/metadatablocks` endpoint has been fixed to not return fields marked as displayOnCreate=true if there is an input level with include=false, when query parameters returnDatasetFieldTypes=true and onlyDisplayedOnCreate=true are set. See also #10741 and #10767. + +The fields "depositor" and "dateOfDeposit" in the citation.tsv metadata block file have been updated to have the property "displayOnCreate" set to TRUE. In practice, only the API is affected because the UI has special logic that already shows these fields when datasets are created. See also and #10850 and #10884. + +### Feature Flags Can Be Listed + +It is now possible to list all feature flags and see if they are enabled or not. See also [the guides](https://guides.dataverse.org/en/6.4/api/native-api.html#list-all-feature-flags) and #10732. + +## Settings Added + +The following settings have been added: + +- dataverse.feature.disable-dataset-thumbnail-autoselect +- dataverse.feature.globus-use-experimental-async-framework +- dataverse.files.globus-monitoring-server +- dataverse.pid.*.crossref.url +- dataverse.pid.*.crossref.rest-api-url +- dataverse.pid.*.crossref.username +- dataverse.pid.*.crossref.password +- dataverse.pid.*.crossref.depositor +- dataverse.pid.*.crossref.depositor-email + +## Backward Incompatible Changes + +- The oai_dc export format has changed. See the "Remap oai_dc" section above. +- Several APIs related to DataCite have changed. See "More and Better Data Sent to DataCite" above. + +## Complete List of Changes + +For the complete list of code changes in this release, see the [6.4 milestone](https://github.com/IQSS/dataverse/issues?q=milestone%3A6.4+is%3Aclosed) in GitHub. + +## Getting Help + +For help with upgrading, installing, or general questions please post to the [Dataverse Community Google Group](https://groups.google.com/g/dataverse-community) or email support@dataverse.org. + +## Installation + +If this is a new installation, please follow our [Installation Guide](https://guides.dataverse.org/en/latest/installation/). Please don't be shy about [asking for help](https://guides.dataverse.org/en/latest/installation/intro.html#getting-help) if you need it! + +Once you are in production, we would be delighted to update our [map of Dataverse installations](https://dataverse.org/installations) around the world to include yours! Please [create an issue](https://github.com/IQSS/dataverse-installations/issues) or email us at support@dataverse.org to join the club! + +You are also very welcome to join the [Global Dataverse Community Consortium](https://www.gdcc.io/) (GDCC). + +## Upgrade Instructions + +Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc. + +These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.3. + +0\. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version. + +If you are running Payara as a non-root user (and you should be!), **remember not to execute the commands below as root**. Use `sudo` to change to that user first. For example, `sudo -i -u dataverse` if `dataverse` is your dedicated application user. + +In the following commands, we assume that Payara 6 is installed in `/usr/local/payara6`. If not, adjust as needed. + +```shell +export PAYARA=/usr/local/payara6` +``` + +(or `setenv PAYARA /usr/local/payara6` if you are using a `csh`-like shell) + +1\. Undeploy the previous version + +```shell +$PAYARA/bin/asadmin undeploy dataverse-6.3 +``` + +2\. Stop and start Payara + +```shell +service payara stop +sudo service payara start +``` + +3\. Deploy this version + +```shell +$PAYARA/bin/asadmin deploy dataverse-6.4.war +``` + +Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again. + +```shell +service payara stop +rm -rf $PAYARA/glassfish/domains/domain1/generated +rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache +rm -rf $PAYARA/glassfish/domains/domain1/lib/databases +``` + +4\. For installations with internationalization: + +Please remember to update translations via [Dataverse language packs](https://github.com/GlobalDataverseCommunityConsortium/dataverse-language-packs). + +5\. Restart Payara + +```shell +service payara stop +service payara start +``` + +6\. Update metadata blocks + +These changes reflect incremental improvements made to the handling of core metadata fields. + +```shell +wget https://raw.githubusercontent.com/IQSS/dataverse/v6.4/scripts/api/data/metadatablocks/citation.tsv + +curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv +``` + +7\. Update Solr schema.xml file. Start with the standard v6.4 schema.xml, then, if your installation uses any custom or experimental metadata blocks, update it to include the extra fields (step 7a). + +Stop Solr (usually `service solr stop`, depending on Solr installation/OS, see the [Installation Guide](https://guides.dataverse.org/en/6.4/installation/prerequisites.html#solr-init-script)). + +```shell +service solr stop +``` + +Replace schema.xml + +```shell +wget https://raw.githubusercontent.com/IQSS/dataverse/v6.4/conf/solr/schema.xml +cp schema.xml /usr/local/solr/solr-9.4.1/server/solr/collection1/conf +``` + +Start Solr (but if you use any custom metadata blocks, perform the next step, 7a first). + +```shell +service solr start +``` + +7a\. For installations with custom or experimental metadata blocks: + +Before starting Solr, update the schema to include all the extra metadata fields that your installation uses. We do this by collecting the output of the Dataverse schema API and feeding it to the `update-fields.sh` script that we supply, as in the example below (modify the command lines as needed to reflect the names of the directories, if different): + +```shell + wget https://raw.githubusercontent.com/IQSS/dataverse/v6.4/conf/solr/update-fields.sh + chmod +x update-fields.sh + curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.4.1/server/solr/collection1/conf/schema.xml +``` + +Now start Solr. + +8\. Reindex Solr + +Below is the simplest way to reindex Solr: + +```shell +curl http://localhost:8080/api/admin/index +``` + +The API above rebuilds the existing index "in place". If you want to be absolutely sure that your index is up-to-date and consistent, you may consider wiping it clean and reindexing everything from scratch (see [the guides](https://guides.dataverse.org/en/latest/admin/solr-search-index.html)). Just note that, depending on the size of your database, a full reindex may take a while and the users will be seeing incomplete search results during that window. + +9\. Run reExportAll to update dataset metadata exports + +This step is necessary because of changes described above for the `Datacite` and `oai_dc` export formats. + +Below is the simple way to reexport all dataset metadata. For more advanced usage, please see [the guides](http://guides.dataverse.org/en/6.4/admin/metadataexport.html#batch-exports-through-the-api). + +```shell +curl http://localhost:8080/api/admin/metadata/reExportAll +``` + +10\. Pushing updated metadata to DataCite + +(If you don't use DataCite, you can skip this.) + +Above you updated the citation metadata block and Solr with the new "relationType" field. With these two changes, the "Relation Type" fields will be available and creation/publication of datasets will result in the expanded XML being sent to DataCite. You've also already run "reExportAll" to update the `Datacite` metadata export format. + +Entries at DataCite for published datasets can be updated by a superuser using an API call (newly [documented](https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#update-metadata-for-all-published-datasets-at-the-pid-provider)): + +`curl -X POST -H 'X-Dataverse-key:' http://localhost:8080/api/datasets/modifyRegistrationPIDMetadataAll` + +This will loop through all published datasets (and released files with PIDs). As long as the loop completes, the call will return a 200/OK response. Any PIDs for which the update fails can be found using the following command: + +`grep 'Failure for id' server.log` + +Failures may occur if PIDs were never registered, or if they were never made findable. Any such cases can be fixed manually in DataCite Fabrica or using the [Reserve a PID](https://guides.dataverse.org/en/6.4/api/native-api.html#reserve-a-pid) API call and the newly documented `/api/datasets//modifyRegistration` call respectively. See https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#send-dataset-metadata-to-pid-provider. Please reach out with any questions. + +PIDs can also be updated by a superuser on a per-dataset basis using + +`curl -X POST -H 'X-Dataverse-key:' http://localhost:8080/api/datasets//modifyRegistrationMetadata` + +### Additional Upgrade Steps + +11\. If there are broken thumbnails + +To restore any broken thumbnails caused by the bug described above, you can call the `http://localhost:8080/api/admin/clearThumbnailFailureFlag` API, which will attempt to clear the flag on all files (regardless of whether caused by this bug or some other problem with the file) or the `http://localhost:8080/api/admin/clearThumbnailFailureFlag/$FILE_ID` to clear the flag for individual files. Calling the former, batch API is recommended. + +12\. PermaLinks with custom base-url + +If you currently use PermaLinks with a custom `base-url`: You must manually append `/citation?persistentId=` to the base URL to maintain functionality. + +If you use a PermaLinks without a configured `base-url`, no changes are required. diff --git a/doc/release-notes/7068-reserve-file-pids.md b/doc/release-notes/7068-reserve-file-pids.md deleted file mode 100644 index 182a0d7f67b..00000000000 --- a/doc/release-notes/7068-reserve-file-pids.md +++ /dev/null @@ -1,9 +0,0 @@ -## Release Highlights - -### Pre-Publish File DOI Reservation with DataCite - -Dataverse installations using DataCite (or other persistent identifier (PID) Providers that support reserving PIDs) will be able to reserve PIDs for files when they are uploaded (rather than at publication time). Note that reserving file DOIs can slow uploads with large numbers of files so administrators may need to adjust timeouts (specifically any Apache "``ProxyPass / ajp://localhost:8009/ timeout=``" setting in the recommended Dataverse configuration). - -## Major Use Cases - -- Users will have DOIs/PIDs reserved for their files as part of file upload instead of at publication time. (Issue #7068, PR #7334) diff --git a/doc/release-notes/8129-harvesting.md b/doc/release-notes/8129-harvesting.md deleted file mode 100644 index 63ca8744941..00000000000 --- a/doc/release-notes/8129-harvesting.md +++ /dev/null @@ -1,18 +0,0 @@ -### Remap oai_dc export and harvesting format fields: dc:type and dc:date - -The `oai_dc` export and harvesting format has had the following fields remapped: - -- dc:type was mapped to the field "Kind of Data". Now it is hard-coded to the word "Dataset". -- dc:date was mapped to the field "Production Date" when available and otherwise to "Publication Date". Now it is mapped the field "Publication Date" or the field used for the citation date, if set (see [Set Citation Date Field Type for a Dataset](https://guides.dataverse.org/en/6.3/api/native-api.html#set-citation-date-field-type-for-a-dataset)). - -In order for these changes to be reflected in existing datasets, a [reexport all](https://guides.dataverse.org/en/6.3/admin/metadataexport.html#batch-exports-through-the-api) should be run. - -For more information, please see #8129 and #10737. - -### Backward incompatible changes - -See the "Remap oai_dc export" section above. - -### Upgrade instructions - -In order for changes to the `oai_dc` metadata export format to be reflected in existing datasets, a [reexport all](https://guides.dataverse.org/en/6.3/admin/metadataexport.html#batch-exports-through-the-api) should be run. diff --git a/doc/release-notes/8578-support-for-iso-639-3-languages.md b/doc/release-notes/8578-support-for-iso-639-3-languages.md deleted file mode 100644 index c702b6b8a59..00000000000 --- a/doc/release-notes/8578-support-for-iso-639-3-languages.md +++ /dev/null @@ -1,17 +0,0 @@ -The Controlled Vocabulary Values list for the metadata field Language in the Citation block has now been extended to include roughly 7920 ISO 639-3 values. -- Some of the language entries in the pre-v.6.4 list correspond to "macro languages" in ISO-639-3 and admins/users may wish to update to use the corresponding individual language entries from ISO-639-3. As these cases are expected to be rare (they do not involve major world languages), finding them is not covered in the release notes. Anyone who desires help in this area is encouraged to reach out to the Dataverse community via any of the standard communication channels. -- ISO 639-3 codes were downloaded from: -``` -https://iso639-3.sil.org/code_tables/download_tables#Complete%20Code%20Tables:~:text=iso%2D639%2D3_Code_Tables_20240415.zip -``` -- The file used for merging with the existing citation.tsv was iso-639-3.tab - -To be added to the 6.4 release instructions: - -### Additional Upgrade Steps -6\. Update the Citation metadata block: - -``` -- `wget https://github.com/IQSS/dataverse/releases/download/v6.4/citation.tsv` -- `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"` -``` diff --git a/doc/release-notes/8581-add-crossref-pid-provider.md b/doc/release-notes/8581-add-crossref-pid-provider.md deleted file mode 100644 index 3610aa6d7cc..00000000000 --- a/doc/release-notes/8581-add-crossref-pid-provider.md +++ /dev/null @@ -1,3 +0,0 @@ -Added CrossRef DOI Pid Provider - -See Installation Configuration document for JVM Settings to enable CrossRef as a Pid Provider diff --git a/doc/release-notes/8796-fix-license-display-indexing.md b/doc/release-notes/8796-fix-license-display-indexing.md deleted file mode 100644 index ebded088875..00000000000 --- a/doc/release-notes/8796-fix-license-display-indexing.md +++ /dev/null @@ -1 +0,0 @@ -When datasets have neither a license nor custom terms of use the display will indicate this. Also, these datasets will no longer be indexed as having custom terms. diff --git a/doc/release-notes/8945-ignore-shapefiles-under-hidden-directories-in-zip.md b/doc/release-notes/8945-ignore-shapefiles-under-hidden-directories-in-zip.md deleted file mode 100644 index 145ae5f6d55..00000000000 --- a/doc/release-notes/8945-ignore-shapefiles-under-hidden-directories-in-zip.md +++ /dev/null @@ -1,5 +0,0 @@ -### Shapefile Handling will now ignore files under a hidden directory within the zip file - -Directories that are hidden will be ignored when determining if a zip file contains Shapefile files. - -For more information, see #8945. \ No newline at end of file diff --git a/doc/release-notes/9081-CC0-waiver-turned-into-custom-license.md b/doc/release-notes/9081-CC0-waiver-turned-into-custom-license.md deleted file mode 100644 index 042b2ec39fd..00000000000 --- a/doc/release-notes/9081-CC0-waiver-turned-into-custom-license.md +++ /dev/null @@ -1,6 +0,0 @@ -In an earlier Dataverse release, Datasets with only 'CC0 Waiver' in termsofuse field were converted to 'Custom License' instead of CC0 1.0 licenses during an automated process. A new process was added to correct this. Only Datasets with no terms other than the one create by the previous process will be modified. -- The existing 'Terms of Use' must be equal to 'This dataset is made available under a Creative Commons CC0 license with the following additional/modified terms and conditions: CC0 Waiver' -- The following terms fields must be empty: Confidentiality Declaration, Special Permissions, Restrictions, Citation Requirements, Depositor Requirements, Conditions, and Disclaimer. -- The License ID must not be assigned. - -This process will set the License ID to that of the CC0 1.0 license and remove the contents of termsofuse field. diff --git a/doc/release-notes/9317-delete-saved-search.md b/doc/release-notes/9317-delete-saved-search.md deleted file mode 100644 index 34723801036..00000000000 --- a/doc/release-notes/9317-delete-saved-search.md +++ /dev/null @@ -1,4 +0,0 @@ -### Saved search deletion - -Saved searches can now be removed using API `/api/admin/savedsearches/$id`. See PR #10198. -This is reflected in the [Saved Search Native API section](https://dataverse-guide--10198.org.readthedocs.build/en/10198/api/native-api.html#saved-search) of the Guide. \ No newline at end of file diff --git a/doc/release-notes/api-blocking-filter-json.md b/doc/release-notes/api-blocking-filter-json.md deleted file mode 100644 index 337ff82dd8b..00000000000 --- a/doc/release-notes/api-blocking-filter-json.md +++ /dev/null @@ -1,3 +0,0 @@ -* When any `ApiBlockingFilter` policy applies to a request, the JSON in the body of the error response is now valid JSON. - In case an API client did any special processing to allow it to parse the body, that is no longer necessary. - The status code of such responses has not changed. diff --git a/doc/release-notes/make-data-count-.md b/doc/release-notes/make-data-count-.md deleted file mode 100644 index 9022582dddb..00000000000 --- a/doc/release-notes/make-data-count-.md +++ /dev/null @@ -1,3 +0,0 @@ -### Counter Processor 1.05 Support - -This release includes support for counter-processor-1.05 for processing Make Data Count metrics. If you are running Make Data Counts support, you should reinstall/reconfigure counter-processor as described in the latest Guides. (For existing installations, note that counter-processor-1.05 requires a Python3, so you will need to follow the full counter-processor install. Also note that if you configure the new version the same way, it will reprocess the days in the current month when it is first run. This is normal and will not affect the metrics in Dataverse.) diff --git a/doc/release-notes/permalink-base-urls.md b/doc/release-notes/permalink-base-urls.md deleted file mode 100644 index 1dd74057351..00000000000 --- a/doc/release-notes/permalink-base-urls.md +++ /dev/null @@ -1,10 +0,0 @@ -The configuration setting `dataverse.pid.*.permalink.base-url`, which is used for PermaLinks, has been updated to -support greater flexibility. Previously, the string "/citation?persistentId=" was automatically appended to the -configured base URL. With this update, the base URL will now be used exactly as configured, without any automatic -additions. - -**Upgrade instructions:** - -- If you currently use a PermaLink provider with a configured `base-url`: You must manually append - "/citation?persistentId=" to the existing base URL to maintain functionality. -- If you use a PermaLink provider without a configured `base-url`: No changes are required. \ No newline at end of file diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst index 117aceb141d..f8b8620f121 100644 --- a/doc/sphinx-guides/source/api/native-api.rst +++ b/doc/sphinx-guides/source/api/native-api.rst @@ -887,7 +887,7 @@ Before calling the API, make sure the data files referenced by the ``POST``\ ed * This API does not cover staging files (with correct contents, checksums, sizes, etc.) in the corresponding places in the Dataverse installation's filestore. * This API endpoint does not support importing *files'* persistent identifiers. - * A Dataverse installation can import datasets with a valid PID that uses a different protocol or authority than said server is configured for. However, the server will not update the PID metadata on subsequent update and publish actions. + * A Dataverse installation can only import datasets with a valid PID that is managed by one of the PID providers that said installation is configured for. .. _import-dataset-with-type: @@ -935,7 +935,7 @@ Note that DDI XML does not have a field that corresponds to the "Subject" field .. warning:: * This API does not handle files related to the DDI file. - * A Dataverse installation can import datasets with a valid PID that uses a different protocol or authority than said server is configured for. However, the server will not update the PID metadata on subsequent update and publish actions. + * A Dataverse installation can only import datasets with a valid PID that is managed by one of the PID providers that said installation is configured for. .. _publish-dataverse-api: @@ -4412,6 +4412,12 @@ In order to obtain a new token use:: curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/users/token/recreate" +This endpoint by default will return a response message indicating the user identifier and the new token. + +To also include the expiration time in the response message, the query parameter ``returnExpiration`` must be set to true:: + + curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/users/token/recreate?returnExpiration=true" + Delete a Token ~~~~~~~~~~~~~~ diff --git a/doc/sphinx-guides/source/conf.py b/doc/sphinx-guides/source/conf.py index c719fb05e3c..7ee355302d8 100755 --- a/doc/sphinx-guides/source/conf.py +++ b/doc/sphinx-guides/source/conf.py @@ -68,9 +68,9 @@ # built documents. # # The short X.Y version. -version = '6.3' +version = '6.4' # The full version, including alpha/beta/rc tags. -release = '6.3' +release = '6.4' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/doc/sphinx-guides/source/contributor/documentation.md b/doc/sphinx-guides/source/contributor/documentation.md index 96277c3b373..2a8d6794921 100644 --- a/doc/sphinx-guides/source/contributor/documentation.md +++ b/doc/sphinx-guides/source/contributor/documentation.md @@ -56,7 +56,7 @@ In case you decide to use a Sphinx Docker container to build the guides, you can ### Installing Sphinx -First, make a fork of https://github.com/IQSS/dataverse and clone your fork locally. Then change to the ``doc/sphinx-guides`` directory. +First, make a fork of and clone your fork locally. Then change to the ``doc/sphinx-guides`` directory. ``cd doc/sphinx-guides`` @@ -83,7 +83,7 @@ On a Mac we recommend installing GraphViz through [Homebrew](). To edit the existing documentation: -- Create a branch (see :ref:`how-to-make-a-pull-request`). +- Create a branch (see {ref}`how-to-make-a-pull-request`). - In ``doc/sphinx-guides/source`` you will find the .rst files that correspond to https://guides.dataverse.org. - Using your preferred text editor, open and edit the necessary files, or create new ones. diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst index 4aaed10512e..759dd40413b 100644 --- a/doc/sphinx-guides/source/developers/big-data-support.rst +++ b/doc/sphinx-guides/source/developers/big-data-support.rst @@ -187,3 +187,5 @@ As described in that document, Globus transfers can be initiated by choosing the An overview of the control and data transfer interactions between components was presented at the 2022 Dataverse Community Meeting and can be viewed in the `Integrations and Tools Session Video `_ around the 1 hr 28 min mark. See also :ref:`Globus settings <:GlobusSettings>`. + +An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`. diff --git a/doc/sphinx-guides/source/developers/globus-api.rst b/doc/sphinx-guides/source/developers/globus-api.rst index 902fc9db2ee..43c237546be 100644 --- a/doc/sphinx-guides/source/developers/globus-api.rst +++ b/doc/sphinx-guides/source/developers/globus-api.rst @@ -185,6 +185,8 @@ As the transfer can take significant time and the API call is asynchronous, the Once the transfer completes, Dataverse will remove the write permission for the principal. +An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This new framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`. + Note that when using a managed endpoint that uses the Globus S3 Connector, the checksum should be correct as Dataverse can validate it. For file-based endpoints, the checksum should be included if available but Dataverse cannot verify it. In the remote/reference case, where there is no transfer to monitor, the standard /addFiles API call (see :ref:`direct-add-to-dataset-api`) is used instead. There are no changes for the Globus case. diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst index 8fff2dc33ad..e98ed8f5189 100644 --- a/doc/sphinx-guides/source/installation/config.rst +++ b/doc/sphinx-guides/source/installation/config.rst @@ -3312,6 +3312,13 @@ The email for your institution that you'd like to appear in bag-info.txt. See :r Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_BAGIT_SOURCEORG_EMAIL``. +.. _dataverse.files.globus-monitoring-server: + +dataverse.files.globus-monitoring-server +++++++++++++++++++++++++++++++++++++++++ + +This setting is required in conjunction with the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`). Setting it to true designates the Dataverse instance to serve as the dedicated polling server. It is needed so that the new framework can be used in a multi-node installation. + .. _feature-flags: Feature Flags @@ -3348,7 +3355,10 @@ please find all known feature flags below. Any of these flags can be activated u - Removes the reason field in the `Publish/Return To Author` dialog that was added as a required field in v6.2 and makes the reason an optional parameter in the :ref:`return-a-dataset` API call. - ``Off`` * - disable-dataset-thumbnail-autoselect - - Turns off automatic selection of a dataset thumbnail from image files in that dataset. When set to ``On``, a user can still manually pick a thumbnail image, or upload a dedicated thumbnail image. + - Turns off automatic selection of a dataset thumbnail from image files in that dataset. When set to ``On``, a user can still manually pick a thumbnail image or upload a dedicated thumbnail image. + - ``Off`` + * - globus-use-experimental-async-framework + - Activates a new experimental implementation of Globus polling of ongoing remote data transfers that does not rely on the instance staying up continuously for the duration of the transfers and saves the state information about Globus upload requests in the database. Added in v6.4. Affects :ref:`:GlobusPollingInterval`. Note that the JVM option :ref:`dataverse.files.globus-monitoring-server` described above must also be enabled on one (and only one, in a multi-node installation) Dataverse instance. - ``Off`` **Note:** Feature flags can be set via any `supported MicroProfile Config API source`_, e.g. the environment variable @@ -4828,10 +4838,12 @@ The list of parent dataset field names for which the LDN Announce workflow step The URL where the `dataverse-globus `_ "transfer" app has been deployed to support Globus integration. See :ref:`globus-support` for details. +.. _:GlobusPollingInterval: + :GlobusPollingInterval ++++++++++++++++++++++ -The interval in seconds between Dataverse calls to Globus to check on upload progress. Defaults to 50 seconds. See :ref:`globus-support` for details. +The interval in seconds between Dataverse calls to Globus to check on upload progress. Defaults to 50 seconds (or to 10 minutes, when the ``globus-use-experimental-async-framework`` feature flag is enabled). See :ref:`globus-support` for details. :GlobusSingleFileTransfer +++++++++++++++++++++++++ diff --git a/doc/sphinx-guides/source/versions.rst b/doc/sphinx-guides/source/versions.rst index 952eba72616..800bdc6e0f9 100755 --- a/doc/sphinx-guides/source/versions.rst +++ b/doc/sphinx-guides/source/versions.rst @@ -7,7 +7,8 @@ Dataverse Software Documentation Versions This list provides a way to refer to the documentation for previous and future versions of the Dataverse Software. In order to learn more about the updates delivered from one version to another, visit the `Releases `__ page in our GitHub repo. - pre-release `HTML (not final!) `__ and `PDF (experimental!) `__ built from the :doc:`develop ` branch :doc:`(how to contribute!) ` -- 6.3 +- 6.4 +- `6.3 `__ - `6.2 `__ - `6.1 `__ - `6.0 `__ diff --git a/modules/dataverse-parent/pom.xml b/modules/dataverse-parent/pom.xml index 6bea02569ec..5abf2763128 100644 --- a/modules/dataverse-parent/pom.xml +++ b/modules/dataverse-parent/pom.xml @@ -131,7 +131,7 @@ - 6.3 + 6.4 17 UTF-8 @@ -447,7 +447,7 @@ (These properties are provided by the build-helper plugin below.) --> ${parsedVersion.majorVersion}.${parsedVersion.nextMinorVersion} - + diff --git a/scripts/api/data/metadatablocks/citation.tsv b/scripts/api/data/metadatablocks/citation.tsv index a9ea2b9ca0e..abc09465603 100644 --- a/scripts/api/data/metadatablocks/citation.tsv +++ b/scripts/api/data/metadatablocks/citation.tsv @@ -59,8 +59,8 @@ distributorURL URL The URL of the distributor's webpage https:// url 55 #VALUE FALSE FALSE FALSE FALSE FALSE FALSE distributor citation distributorLogoURL Logo URL The URL of the distributor's logo image, used to show the image on the Dataset's page https:// url 56
FALSE FALSE FALSE FALSE FALSE FALSE distributor citation distributionDate Distribution Date The date when the Dataset was made available for distribution/presentation YYYY-MM-DD date 57 TRUE FALSE FALSE TRUE FALSE FALSE citation - depositor Depositor The entity, such as a person or organization, that deposited the Dataset in the repository 1) FamilyName, GivenName or 2) Organization text 58 FALSE FALSE FALSE FALSE FALSE FALSE citation - dateOfDeposit Deposit Date The date when the Dataset was deposited into the repository YYYY-MM-DD date 59 FALSE FALSE FALSE TRUE FALSE FALSE citation http://purl.org/dc/terms/dateSubmitted + depositor Depositor The entity, such as a person or organization, that deposited the Dataset in the repository 1) FamilyName, GivenName or 2) Organization text 58 FALSE FALSE FALSE FALSE TRUE FALSE citation + dateOfDeposit Deposit Date The date when the Dataset was deposited into the repository YYYY-MM-DD date 59 FALSE FALSE FALSE TRUE TRUE FALSE citation http://purl.org/dc/terms/dateSubmitted timePeriodCovered Time Period The time period that the data refer to. Also known as span. This is the time period covered by the data, not the dates of coding, collecting data, or making documents machine-readable none 60 ; FALSE FALSE TRUE FALSE FALSE FALSE citation https://schema.org/temporalCoverage timePeriodCoveredStart Start Date The start date of the time period that the data refer to YYYY-MM-DD date 61 #NAME: #VALUE TRUE FALSE FALSE TRUE FALSE FALSE timePeriodCovered citation timePeriodCoveredEnd End Date The end date of the time period that the data refer to YYYY-MM-DD date 62 #NAME: #VALUE TRUE FALSE FALSE TRUE FALSE FALSE timePeriodCovered citation diff --git a/src/main/java/edu/harvard/iq/dataverse/DatasetFieldCompoundValue.java b/src/main/java/edu/harvard/iq/dataverse/DatasetFieldCompoundValue.java index c679cd7edad..c03baec73af 100644 --- a/src/main/java/edu/harvard/iq/dataverse/DatasetFieldCompoundValue.java +++ b/src/main/java/edu/harvard/iq/dataverse/DatasetFieldCompoundValue.java @@ -225,6 +225,15 @@ public Pair getLinkComponents() { return linkComponents.get(parentDatasetField.getDatasetFieldType().getName()); } + public boolean hasChildOfType(String name) { + for (DatasetField child : childDatasetFields) { + if (child.getDatasetFieldType().getName().equals(name)) { + return true; + } + } + return false; + } + private Map removeLastComma(Map mapIn) { Iterator> itr = mapIn.entrySet().iterator(); diff --git a/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java index 8cc86626c6a..ff78b0c83ec 100644 --- a/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java @@ -85,6 +85,9 @@ public class DatasetFieldServiceBean implements java.io.Serializable { //Note that for primitive fields, the prent and term-uri-field are the same and these maps have the same entry Map cvocMapByTermUri = null; + //Flat list of cvoc term-uri and managed fields by Id + Set cvocFieldSet = null; + //The hash of the existing CVocConf setting. Used to determine when the setting has changed and it needs to be re-parsed to recreate the cvocMaps String oldHash = null; @@ -278,6 +281,10 @@ public Map getCVocConf(boolean byTermUriField){ String cvocSetting = settingsService.getValueForKey(SettingsServiceBean.Key.CVocConf); if (cvocSetting == null || cvocSetting.isEmpty()) { oldHash=null; + //Release old maps + cvocMap=null; + cvocMapByTermUri=null; + cvocFieldSet = null; return new HashMap<>(); } String newHash = DigestUtils.md5Hex(cvocSetting); @@ -287,6 +294,7 @@ public Map getCVocConf(boolean byTermUriField){ oldHash=newHash; cvocMap=new HashMap<>(); cvocMapByTermUri=new HashMap<>(); + cvocFieldSet = new HashSet<>(); try (JsonReader jsonReader = Json.createReader(new StringReader(settingsService.getValueForKey(SettingsServiceBean.Key.CVocConf)))) { JsonArray cvocConfJsonArray = jsonReader.readArray(); @@ -303,11 +311,13 @@ public Map getCVocConf(boolean byTermUriField){ if (termUriField.equals(dft.getName())) { logger.fine("Found primitive field for term uri : " + dft.getName() + ": " + dft.getId()); cvocMapByTermUri.put(dft.getId(), jo); + cvocFieldSet.add(dft.getId()); } } else { DatasetFieldType childdft = findByNameOpt(jo.getString("term-uri-field")); logger.fine("Found term child field: " + childdft.getName()+ ": " + childdft.getId()); cvocMapByTermUri.put(childdft.getId(), jo); + cvocFieldSet.add(childdft.getId()); if (childdft.getParentDatasetFieldType() != dft) { logger.warning("Term URI field (" + childdft.getDisplayName() + ") not a child of parent: " + dft.getDisplayName()); @@ -327,6 +337,7 @@ public Map getCVocConf(boolean byTermUriField){ + managedFields.getString(s)); } else { logger.fine("Found: " + dft.getName()); + cvocFieldSet.add(dft.getId()); } } } @@ -338,6 +349,10 @@ public Map getCVocConf(boolean byTermUriField){ return byTermUriField ? cvocMapByTermUri : cvocMap; } + public Set getCvocFieldSet() { + return cvocFieldSet; + } + /** * Adds information about the external vocabulary term being used in this DatasetField to the ExternalVocabularyValue table if it doesn't already exist. * @param df - the primitive/parent compound field containing a newly saved value @@ -468,7 +483,8 @@ public JsonObject getExternalVocabularyValue(String termUri) { logger.warning("Problem parsing external vocab value for uri: " + termUri + " : " + e.getMessage()); } } catch (NoResultException nre) { - logger.warning("No external vocab value for uri: " + termUri); + //Could just be a plain text value + logger.fine("No external vocab value for uri: " + termUri); } return null; } diff --git a/src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java index ca044c1554d..e519614ba55 100644 --- a/src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java @@ -412,12 +412,20 @@ public boolean checkDatasetLock(Long datasetId) { List lock = lockCounter.getResultList(); return lock.size()>0; } - + + public List getLocksByDatasetId(Long datasetId) { + TypedQuery locksQuery = em.createNamedQuery("DatasetLock.getLocksByDatasetId", DatasetLock.class); + locksQuery.setParameter("datasetId", datasetId); + return locksQuery.getResultList(); + } + public List getDatasetLocksByUser( AuthenticatedUser user) { return listLocks(null, user); } + // @todo: we'll be better off getting rid of this method and using the other + // version of addDatasetLock() (that uses datasetId instead of Dataset). @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW) public DatasetLock addDatasetLock(Dataset dataset, DatasetLock lock) { lock.setDataset(dataset); @@ -467,6 +475,7 @@ public DatasetLock addDatasetLock(Long datasetId, DatasetLock.Reason reason, Lon * is {@code aReason}. * @param dataset the dataset whose locks (for {@code aReason}) will be removed. * @param aReason The reason of the locks that will be removed. + * @todo this should probably take dataset_id, not a dataset */ @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW) public void removeDatasetLocks(Dataset dataset, DatasetLock.Reason aReason) { diff --git a/src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java b/src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java index 8953c16aa6e..87997731642 100644 --- a/src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java +++ b/src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java @@ -2127,8 +2127,12 @@ public void handleFileUpload(FileUploadEvent event) throws IOException { } /** - * Using information from the DropBox choose, ingest the chosen files - * https://www.dropbox.com/developers/dropins/chooser/js + * External, aka "Direct" Upload. + * The file(s) have been uploaded to physical storage (such as S3) directly, + * this call is to create and add the DataFiles to the Dataset on the Dataverse + * side. The method does NOT finalize saving the datafiles in the database - + * that will happen when the user clicks 'Save', similar to how the "normal" + * uploads are handled. * * @param event */ diff --git a/src/main/java/edu/harvard/iq/dataverse/ExternalFileUploadInProgress.java b/src/main/java/edu/harvard/iq/dataverse/ExternalFileUploadInProgress.java new file mode 100644 index 00000000000..c90fdc6edc2 --- /dev/null +++ b/src/main/java/edu/harvard/iq/dataverse/ExternalFileUploadInProgress.java @@ -0,0 +1,110 @@ +package edu.harvard.iq.dataverse; + +import jakarta.persistence.Column; +import jakarta.persistence.Index; +import jakarta.persistence.NamedQueries; +import jakarta.persistence.NamedQuery; +import jakarta.persistence.Table; +import java.io.Serializable; +import jakarta.persistence.Entity; +import jakarta.persistence.GeneratedValue; +import jakarta.persistence.GenerationType; +import jakarta.persistence.Id; + +/** + * + * @author landreev + * + * The name of the class is provisional. I'm open to better-sounding alternatives, + * if anyone can think of any. + * But I wanted to avoid having the word "Globus" in the entity name. I'm adding + * it specifically for the Globus use case. But I'm guessing there's a chance + * this setup may come in handy for other types of datafile uploads that happen + * externally. (?) + */ +@NamedQueries({ + @NamedQuery(name = "ExternalFileUploadInProgress.deleteByTaskId", + query = "DELETE FROM ExternalFileUploadInProgress f WHERE f.taskId=:taskId"), + @NamedQuery(name = "ExternalFileUploadInProgress.findByTaskId", + query = "SELECT f FROM ExternalFileUploadInProgress f WHERE f.taskId=:taskId")}) +@Entity +@Table(indexes = {@Index(columnList="taskid")}) +public class ExternalFileUploadInProgress implements Serializable { + + private static final long serialVersionUID = 1L; + @Id + @GeneratedValue(strategy = GenerationType.IDENTITY) + private Long id; + + public Long getId() { + return id; + } + + public void setId(Long id) { + this.id = id; + } + + /** + * Rather than saving various individual fields defining the datafile, + * which would essentially replicate the DataFile table, we are simply + * storing the full json record as passed to the API here. + */ + @Column(columnDefinition = "TEXT", nullable=false) + private String fileInfo; + + /** + * This is Globus-specific task id associated with the upload in progress + */ + @Column(nullable=false) + private String taskId; + + public ExternalFileUploadInProgress() { + } + + public ExternalFileUploadInProgress(String taskId, String fileInfo) { + this.taskId = taskId; + this.fileInfo = fileInfo; + } + + public String getFileInfo() { + return fileInfo; + } + + public void setFileInfo(String fileInfo) { + this.fileInfo = fileInfo; + } + + public String getTaskId() { + return taskId; + } + + public void setTaskId(String taskId) { + this.taskId = taskId; + } + + @Override + public int hashCode() { + int hash = 0; + hash += (id != null ? id.hashCode() : 0); + return hash; + } + + @Override + public boolean equals(Object object) { + // TODO: Warning - this method won't work in the case the id fields are not set + if (!(object instanceof ExternalFileUploadInProgress)) { + return false; + } + ExternalFileUploadInProgress other = (ExternalFileUploadInProgress) object; + if ((this.id == null && other.id != null) || (this.id != null && !this.id.equals(other.id))) { + return false; + } + return true; + } + + @Override + public String toString() { + return "edu.harvard.iq.dataverse.ExternalFileUploadInProgress[ id=" + id + " ]"; + } + +} diff --git a/src/main/java/edu/harvard/iq/dataverse/MailServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/MailServiceBean.java index aee51ed573a..2995c0c5f47 100644 --- a/src/main/java/edu/harvard/iq/dataverse/MailServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/MailServiceBean.java @@ -624,6 +624,7 @@ public String getMessageTextBasedOnNotification(UserNotification userNotificatio comment )) ; return downloadCompletedMessage; + case GLOBUSUPLOADCOMPLETEDWITHERRORS: dataset = (Dataset) targetObject; messageText = BundleUtil.getStringFromBundle("notification.email.greeting.html"); @@ -634,8 +635,30 @@ public String getMessageTextBasedOnNotification(UserNotification userNotificatio comment )) ; return uploadCompletedWithErrorsMessage; + + case GLOBUSUPLOADREMOTEFAILURE: + dataset = (Dataset) targetObject; + messageText = BundleUtil.getStringFromBundle("notification.email.greeting.html"); + String uploadFailedRemotelyMessage = messageText + BundleUtil.getStringFromBundle("notification.mail.globus.upload.failedRemotely", Arrays.asList( + systemConfig.getDataverseSiteUrl(), + dataset.getGlobalId().asString(), + dataset.getDisplayName(), + comment + )) ; + return uploadFailedRemotelyMessage; - case GLOBUSDOWNLOADCOMPLETEDWITHERRORS: + case GLOBUSUPLOADLOCALFAILURE: + dataset = (Dataset) targetObject; + messageText = BundleUtil.getStringFromBundle("notification.email.greeting.html"); + String uploadFailedLocallyMessage = messageText + BundleUtil.getStringFromBundle("notification.mail.globus.upload.failedLocally", Arrays.asList( + systemConfig.getDataverseSiteUrl(), + dataset.getGlobalId().asString(), + dataset.getDisplayName(), + comment + )) ; + return uploadFailedLocallyMessage; + + case GLOBUSDOWNLOADCOMPLETEDWITHERRORS: dataset = (Dataset) targetObject; messageText = BundleUtil.getStringFromBundle("notification.email.greeting.html"); String downloadCompletedWithErrorsMessage = messageText + BundleUtil.getStringFromBundle("notification.mail.globus.download.completedWithErrors", Arrays.asList( @@ -764,6 +787,8 @@ public Object getObjectOfNotification (UserNotification userNotification){ return versionService.find(userNotification.getObjectId()); case GLOBUSUPLOADCOMPLETED: case GLOBUSUPLOADCOMPLETEDWITHERRORS: + case GLOBUSUPLOADREMOTEFAILURE: + case GLOBUSUPLOADLOCALFAILURE: case GLOBUSDOWNLOADCOMPLETED: case GLOBUSDOWNLOADCOMPLETEDWITHERRORS: return datasetService.find(userNotification.getObjectId()); diff --git a/src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java b/src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java index 48196591b19..222d2881cd2 100644 --- a/src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java +++ b/src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java @@ -98,6 +98,7 @@ public class SettingsWrapper implements java.io.Serializable { //External Vocabulary support private Map cachedCvocMap = null; private Map cachedCvocByTermFieldMap = null; + private Set cvocFieldSet; private Long zipDownloadLimit = null; @@ -806,6 +807,17 @@ public Map getCVocConf(boolean byTermField) { } } + public boolean isCvocField(Long fieldId) { + + if(cvocFieldSet == null) { + cvocFieldSet = fieldService.getCvocFieldSet(); + } + if(cvocFieldSet == null) { + return false; + } + return cvocFieldSet.contains(fieldId); + } + public String getMetricsUrl() { if (metricsUrl == null) { metricsUrl = getValueForKey(SettingsServiceBean.Key.MetricsUrl); diff --git a/src/main/java/edu/harvard/iq/dataverse/UserNotification.java b/src/main/java/edu/harvard/iq/dataverse/UserNotification.java index 280c2075494..2d37540fab3 100644 --- a/src/main/java/edu/harvard/iq/dataverse/UserNotification.java +++ b/src/main/java/edu/harvard/iq/dataverse/UserNotification.java @@ -39,7 +39,8 @@ public enum Type { CHECKSUMIMPORT, CHECKSUMFAIL, CONFIRMEMAIL, APIGENERATED, INGESTCOMPLETED, INGESTCOMPLETEDWITHERRORS, PUBLISHFAILED_PIDREG, WORKFLOW_SUCCESS, WORKFLOW_FAILURE, STATUSUPDATED, DATASETCREATED, DATASETMENTIONED, GLOBUSUPLOADCOMPLETED, GLOBUSUPLOADCOMPLETEDWITHERRORS, - GLOBUSDOWNLOADCOMPLETED, GLOBUSDOWNLOADCOMPLETEDWITHERRORS, REQUESTEDFILEACCESS; + GLOBUSDOWNLOADCOMPLETED, GLOBUSDOWNLOADCOMPLETEDWITHERRORS, REQUESTEDFILEACCESS, + GLOBUSUPLOADREMOTEFAILURE, GLOBUSUPLOADLOCALFAILURE; public String getDescription() { return BundleUtil.getStringFromBundle("notification.typeDescription." + this.name()); diff --git a/src/main/java/edu/harvard/iq/dataverse/api/ApiConstants.java b/src/main/java/edu/harvard/iq/dataverse/api/ApiConstants.java index 347a8946a46..15114085c21 100644 --- a/src/main/java/edu/harvard/iq/dataverse/api/ApiConstants.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/ApiConstants.java @@ -17,4 +17,8 @@ private ApiConstants() { public static final String DS_VERSION_LATEST = ":latest"; public static final String DS_VERSION_DRAFT = ":draft"; public static final String DS_VERSION_LATEST_PUBLISHED = ":latest-published"; + + // addFiles call + public static final String API_ADD_FILES_COUNT_PROCESSED = "Total number of files"; + public static final String API_ADD_FILES_COUNT_SUCCESSFUL = "Number of files successfully added"; } diff --git a/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java b/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java index e0d3bf6126e..369a22fe8d7 100644 --- a/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java @@ -98,6 +98,7 @@ import java.util.stream.Collectors; import static edu.harvard.iq.dataverse.api.ApiConstants.*; +import edu.harvard.iq.dataverse.engine.command.exception.IllegalCommandException; import edu.harvard.iq.dataverse.dataset.DatasetType; import edu.harvard.iq.dataverse.dataset.DatasetTypeServiceBean; import static edu.harvard.iq.dataverse.util.json.JsonPrinter.*; @@ -3919,7 +3920,7 @@ public Response requestGlobusUpload(@Context ContainerRequestContext crc, @PathP if (!systemConfig.isGlobusUpload()) { return error(Response.Status.SERVICE_UNAVAILABLE, - BundleUtil.getStringFromBundle("datasets.api.globusdownloaddisabled")); + BundleUtil.getStringFromBundle("file.api.globusUploadDisabled")); } // ------------------------------------- @@ -4019,10 +4020,6 @@ public Response addGlobusFilesToDataset(@Context ContainerRequestContext crc, logger.info(" ==== (api addGlobusFilesToDataset) jsonData ====== " + jsonData); - if (!systemConfig.isHTTPUpload()) { - return error(Response.Status.SERVICE_UNAVAILABLE, BundleUtil.getStringFromBundle("file.api.httpDisabled")); - } - // ------------------------------------- // (1) Get the user from the API key // ------------------------------------- @@ -4045,6 +4042,32 @@ public Response addGlobusFilesToDataset(@Context ContainerRequestContext crc, return wr.getResponse(); } + // Is Globus upload service available? + + // ... on this Dataverse instance? + if (!systemConfig.isGlobusUpload()) { + return error(Response.Status.SERVICE_UNAVAILABLE, BundleUtil.getStringFromBundle("file.api.globusUploadDisabled")); + } + + // ... and on this specific Dataset? + String storeId = dataset.getEffectiveStorageDriverId(); + // acceptsGlobusTransfers should only be true for an S3 or globus store + if (!GlobusAccessibleStore.acceptsGlobusTransfers(storeId) + && !GlobusAccessibleStore.allowsGlobusReferences(storeId)) { + return badRequest(BundleUtil.getStringFromBundle("datasets.api.globusuploaddisabled")); + } + + // Check if the dataset is already locked + // We are reusing the code and logic used by various command to determine + // if there are any locks on the dataset that would prevent the current + // users from modifying it: + try { + DataverseRequest dataverseRequest = createDataverseRequest(authUser); + permissionService.checkEditDatasetLock(dataset, dataverseRequest, null); + } catch (IllegalCommandException icex) { + return error(Response.Status.FORBIDDEN, "Dataset " + datasetId + " is locked: " + icex.getLocalizedMessage()); + } + JsonObject jsonObject = null; try { jsonObject = JsonUtil.getJsonObject(jsonData); @@ -4075,18 +4098,18 @@ public Response addGlobusFilesToDataset(@Context ContainerRequestContext crc, logger.log(Level.WARNING, "Failed to lock the dataset (dataset id={0})", dataset.getId()); } - - ApiToken token = authSvc.findApiTokenByUser(authUser); - if(uriInfo != null) { logger.info(" ==== (api uriInfo.getRequestUri()) jsonData ====== " + uriInfo.getRequestUri().toString()); } - String requestUrl = SystemConfig.getDataverseSiteUrlStatic(); // Async Call - globusService.globusUpload(jsonObject, token, dataset, requestUrl, authUser); + try { + globusService.globusUpload(jsonObject, dataset, requestUrl, authUser); + } catch (IllegalArgumentException ex) { + return badRequest("Invalid parameters: "+ex.getMessage()); + } return ok("Async call to Globus Upload started "); diff --git a/src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java b/src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java index 17e3086f184..0ee146ed99b 100644 --- a/src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java @@ -407,6 +407,12 @@ public Response importDataset(@Context ContainerRequestContext crc, String jsonB if (ds.getIdentifier() == null) { return badRequest("Please provide a persistent identifier, either by including it in the JSON, or by using the pid query parameter."); } + + PidProvider pidProvider = PidUtil.getPidProvider(ds.getGlobalId().getProviderId()); + if (pidProvider == null || !pidProvider.canManagePID()) { + return badRequest("Cannot import a dataset that has a PID that doesn't match the server's settings"); + } + boolean shouldRelease = StringUtil.isTrue(releaseParam); DataverseRequest request = createDataverseRequest(u); diff --git a/src/main/java/edu/harvard/iq/dataverse/api/Users.java b/src/main/java/edu/harvard/iq/dataverse/api/Users.java index 1f5430340c2..c1a7c95dbff 100644 --- a/src/main/java/edu/harvard/iq/dataverse/api/Users.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/Users.java @@ -24,13 +24,7 @@ import jakarta.ejb.Stateless; import jakarta.json.JsonArray; import jakarta.json.JsonObjectBuilder; -import jakarta.ws.rs.BadRequestException; -import jakarta.ws.rs.DELETE; -import jakarta.ws.rs.GET; -import jakarta.ws.rs.POST; -import jakarta.ws.rs.Path; -import jakarta.ws.rs.PathParam; -import jakarta.ws.rs.Produces; +import jakarta.ws.rs.*; import jakarta.ws.rs.container.ContainerRequestContext; import jakarta.ws.rs.core.Context; import jakarta.ws.rs.core.MediaType; @@ -157,7 +151,7 @@ public Response getTokenExpirationDate() { @Path("token/recreate") @AuthRequired @POST - public Response recreateToken(@Context ContainerRequestContext crc) { + public Response recreateToken(@Context ContainerRequestContext crc, @QueryParam("returnExpiration") boolean returnExpiration) { User u = getRequestUser(crc); AuthenticatedUser au; @@ -174,8 +168,12 @@ public Response recreateToken(@Context ContainerRequestContext crc) { ApiToken newToken = authSvc.generateApiTokenForUser(au); authSvc.save(newToken); - return ok("New token for " + au.getUserIdentifier() + " is " + newToken.getTokenString()); + String message = "New token for " + au.getUserIdentifier() + " is " + newToken.getTokenString(); + if (returnExpiration) { + message += " and expires on " + newToken.getExpireTime(); + } + return ok(message); } @GET diff --git a/src/main/java/edu/harvard/iq/dataverse/authorization/providers/builtin/DataverseUserPage.java b/src/main/java/edu/harvard/iq/dataverse/authorization/providers/builtin/DataverseUserPage.java index a0e3f899443..48afb2b830a 100644 --- a/src/main/java/edu/harvard/iq/dataverse/authorization/providers/builtin/DataverseUserPage.java +++ b/src/main/java/edu/harvard/iq/dataverse/authorization/providers/builtin/DataverseUserPage.java @@ -528,6 +528,8 @@ public void displayNotification() { case GLOBUSUPLOADCOMPLETEDWITHERRORS: case GLOBUSDOWNLOADCOMPLETED: case GLOBUSDOWNLOADCOMPLETEDWITHERRORS: + case GLOBUSUPLOADREMOTEFAILURE: + case GLOBUSUPLOADLOCALFAILURE: userNotification.setTheObject(datasetService.find(userNotification.getObjectId())); break; diff --git a/src/main/java/edu/harvard/iq/dataverse/datasetutility/AddReplaceFileHelper.java b/src/main/java/edu/harvard/iq/dataverse/datasetutility/AddReplaceFileHelper.java index 0143fced87c..a470f08f736 100644 --- a/src/main/java/edu/harvard/iq/dataverse/datasetutility/AddReplaceFileHelper.java +++ b/src/main/java/edu/harvard/iq/dataverse/datasetutility/AddReplaceFileHelper.java @@ -2139,9 +2139,9 @@ public Response addFiles(String jsonData, Dataset dataset, User authUser) { logger.log(Level.WARNING, "Dataset not locked for EditInProgress "); } else { datasetService.removeDatasetLocks(dataset, DatasetLock.Reason.EditInProgress); - logger.log(Level.INFO, "Removed EditInProgress lock "); + logger.log(Level.FINE, "Removed EditInProgress lock"); } - + try { Command cmd = new UpdateDatasetVersionCommand(dataset, dvRequest, clone); ((UpdateDatasetVersionCommand) cmd).setValidateLenient(true); @@ -2167,8 +2167,8 @@ public Response addFiles(String jsonData, Dataset dataset, User authUser) { } JsonObjectBuilder result = Json.createObjectBuilder() - .add("Total number of files", totalNumberofFiles) - .add("Number of files successfully added", successNumberofFiles); + .add(ApiConstants.API_ADD_FILES_COUNT_PROCESSED, totalNumberofFiles) + .add(ApiConstants.API_ADD_FILES_COUNT_SUCCESSFUL, successNumberofFiles); return Response.ok().entity(Json.createObjectBuilder() @@ -2306,7 +2306,7 @@ public Response replaceFiles(String jsonData, Dataset ds, User authUser) { logger.warning("Dataset not locked for EditInProgress "); } else { datasetService.removeDatasetLocks(dataset, DatasetLock.Reason.EditInProgress); - logger.info("Removed EditInProgress lock "); + logger.fine("Removed EditInProgress lock "); } try { diff --git a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java index 768bb88fd43..bb5f5a71e24 100644 --- a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java +++ b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java @@ -298,5 +298,5 @@ public boolean onSuccess(CommandContext ctxt, Object r) { ctxt.index().asyncIndexDataset((Dataset) r, true); return true; } - + } diff --git a/src/main/java/edu/harvard/iq/dataverse/globus/GlobusServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/globus/GlobusServiceBean.java index fb50214c259..ac3c81622fc 100644 --- a/src/main/java/edu/harvard/iq/dataverse/globus/GlobusServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/globus/GlobusServiceBean.java @@ -22,7 +22,6 @@ import jakarta.json.JsonString; import jakarta.json.JsonValue.ValueType; import jakarta.json.stream.JsonParsingException; -import jakarta.servlet.http.HttpServletRequest; import jakarta.ws.rs.HttpMethod; import static edu.harvard.iq.dataverse.util.json.JsonPrinter.json; @@ -33,7 +32,6 @@ import java.net.HttpURLConnection; import java.net.MalformedURLException; import java.net.URL; -import java.net.URLEncoder; import java.sql.Timestamp; import java.text.SimpleDateFormat; import java.time.Duration; @@ -53,6 +51,7 @@ import org.primefaces.PrimeFaces; import com.google.gson.Gson; +import edu.harvard.iq.dataverse.api.ApiConstants; import edu.harvard.iq.dataverse.authorization.AuthenticationServiceBean; import edu.harvard.iq.dataverse.authorization.users.ApiToken; import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser; @@ -61,15 +60,26 @@ import edu.harvard.iq.dataverse.dataaccess.DataAccess; import edu.harvard.iq.dataverse.dataaccess.GlobusAccessibleStore; import edu.harvard.iq.dataverse.dataaccess.StorageIO; +import edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper; +import edu.harvard.iq.dataverse.engine.command.DataverseRequest; +import edu.harvard.iq.dataverse.ingest.IngestServiceBean; import edu.harvard.iq.dataverse.privateurl.PrivateUrl; import edu.harvard.iq.dataverse.privateurl.PrivateUrlServiceBean; +import edu.harvard.iq.dataverse.settings.FeatureFlags; import edu.harvard.iq.dataverse.settings.JvmSettings; import edu.harvard.iq.dataverse.settings.SettingsServiceBean; import edu.harvard.iq.dataverse.util.FileUtil; +import edu.harvard.iq.dataverse.util.StringUtil; import edu.harvard.iq.dataverse.util.SystemConfig; import edu.harvard.iq.dataverse.util.URLTokenUtil; import edu.harvard.iq.dataverse.util.UrlSignerUtil; import edu.harvard.iq.dataverse.util.json.JsonUtil; +import jakarta.json.JsonReader; +import jakarta.persistence.EntityManager; +import jakarta.persistence.PersistenceContext; +import jakarta.servlet.http.HttpServletRequest; +import jakarta.ws.rs.core.Response; +import org.apache.http.util.EntityUtils; @Stateless @Named("GlobusServiceBean") @@ -81,6 +91,8 @@ public class GlobusServiceBean implements java.io.Serializable { protected SettingsServiceBean settingsSvc; @Inject DataverseSession session; + @Inject + DataverseRequestServiceBean dataverseRequestSvc; @EJB protected AuthenticationServiceBean authSvc; @EJB @@ -92,7 +104,15 @@ public class GlobusServiceBean implements java.io.Serializable { @EJB FileDownloadServiceBean fileDownloadService; @EJB - DataFileServiceBean dataFileService; + DataFileServiceBean dataFileSvc; + @EJB + PermissionServiceBean permissionSvc; + @EJB + IngestServiceBean ingestSvc; + @EJB + SystemConfig systemConfig; + @PersistenceContext(unitName = "VDCNet-ejbPU") + private EntityManager em; private static final Logger logger = Logger.getLogger(GlobusServiceBean.class.getCanonicalName()); private static final SimpleDateFormat logFormatter = new SimpleDateFormat("yyyy-MM-dd'T'HH-mm-ss"); @@ -130,7 +150,7 @@ private String getRuleId(GlobusEndpoint endpoint, String principal, String permi * @param ruleId - Globus rule id - assumed to be associated with the * dataset's file path (should not be called with a user * specified rule id w/o further checking) - * @param datasetId - the id of the dataset associated with the rule + * @param dataset - the dataset associated with the rule * @param globusLogger - a separate logger instance, may be null */ public void deletePermission(String ruleId, Dataset dataset, Logger globusLogger) { @@ -379,19 +399,33 @@ private void monitorTemporaryPermissions(String ruleId, long datasetId) { * @return * @throws MalformedURLException */ - public GlobusTask getTask(String accessToken, String taskId, Logger globusLogger) throws MalformedURLException { + public GlobusTaskState getTask(String accessToken, String taskId, Logger globusLogger) { + + Logger myLogger = globusLogger != null ? globusLogger : logger; - URL url = new URL("https://transfer.api.globusonline.org/v0.10/endpoint_manager/task/" + taskId); + URL url; + try { + url = new URL("https://transfer.api.globusonline.org/v0.10/endpoint_manager/task/" + taskId); + } catch (MalformedURLException mue) { + myLogger.warning("Malformed URL exception when trying to contact Globus. Globus API url: " + + "https://transfer.api.globusonline.org/v0.10/endpoint_manager/task/" + + taskId); + return null; + } MakeRequestResponse result = makeRequest(url, "Bearer", accessToken, "GET", null); - GlobusTask task = null; + GlobusTaskState task = null; if (result.status == 200) { - task = parseJson(result.jsonResponse, GlobusTask.class, false); + task = parseJson(result.jsonResponse, GlobusTaskState.class, false); } if (result.status != 200) { - globusLogger.warning("Cannot find information for the task " + taskId + " : Reason : " + // @todo It should probably retry it 2-3 times before giving up; + // similarly, it should probably differentiate between a "no such task" + // response and something intermittent like a server/network error or + // an expired token... i.e. something that's recoverable (?) + myLogger.warning("Cannot find information for the task " + taskId + " : Reason : " + result.jsonResponse.toString()); } @@ -633,41 +667,50 @@ private String getGlobusDownloadScript(Dataset dataset, ApiToken apiToken, List< @Asynchronous @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW) - public void globusUpload(JsonObject jsonData, ApiToken token, Dataset dataset, String httpRequestUrl, - AuthenticatedUser authUser) throws ExecutionException, InterruptedException, MalformedURLException { + public void globusUpload(JsonObject jsonData, Dataset dataset, String httpRequestUrl, + AuthenticatedUser authUser) throws IllegalArgumentException, ExecutionException, InterruptedException, MalformedURLException { - Integer countAll = 0; - Integer countSuccess = 0; - Integer countError = 0; - String logTimestamp = logFormatter.format(new Date()); + // Before we do anything else, let's do some basic validation of what + // we've been passed: + + JsonArray filesJsonArray = jsonData.getJsonArray("files"); + + if (filesJsonArray == null || filesJsonArray.size() < 1) { + throw new IllegalArgumentException("No valid json entries supplied for the files being uploaded"); + } + + Date startDate = new Date(); + + String logTimestamp = logFormatter.format(startDate); Logger globusLogger = Logger.getLogger( "edu.harvard.iq.dataverse.upload.client.DatasetServiceBean." + "GlobusUpload" + logTimestamp); - String logFileName = System.getProperty("com.sun.aas.instanceRoot") + File.separator + "logs" + File.separator + "globusUpload_id_" + dataset.getId() + "_" + logTimestamp + String logFileName = System.getProperty("com.sun.aas.instanceRoot") + File.separator + "logs" + File.separator + "globusUpload_" + dataset.getId() + "_" + logTimestamp + ".log"; FileHandler fileHandler; - boolean fileHandlerSuceeded; + try { fileHandler = new FileHandler(logFileName); globusLogger.setUseParentHandlers(false); - fileHandlerSuceeded = true; } catch (IOException | SecurityException ex) { Logger.getLogger(DatasetServiceBean.class.getName()).log(Level.SEVERE, null, ex); - return; + fileHandler = null; } - if (fileHandlerSuceeded) { + if (fileHandler != null) { globusLogger.addHandler(fileHandler); } else { globusLogger = logger; } logger.fine("json: " + JsonUtil.prettyPrint(jsonData)); + + globusLogger.info("Globus upload initiated"); String taskIdentifier = jsonData.getString("taskIdentifier"); GlobusEndpoint endpoint = getGlobusEndpoint(dataset); - GlobusTask task = getTask(endpoint.getClientToken(), taskIdentifier, globusLogger); - String ruleId = getRuleId(endpoint, task.getOwner_id(), "rw"); + GlobusTaskState taskState = getTask(endpoint.getClientToken(), taskIdentifier, globusLogger); + String ruleId = getRuleId(endpoint, taskState.getOwner_id(), "rw"); logger.fine("Found rule: " + ruleId); if (ruleId != null) { Long datasetId = rulesCache.getIfPresent(ruleId); @@ -676,28 +719,109 @@ public void globusUpload(JsonObject jsonData, ApiToken token, Dataset dataset, S rulesCache.invalidate(ruleId); } } - + // Wait before first check Thread.sleep(5000); - // globus task status check - task = globusStatusCheck(endpoint, taskIdentifier, globusLogger); - String taskStatus = getTaskStatus(task); + + if (FeatureFlags.GLOBUS_USE_EXPERIMENTAL_ASYNC_FRAMEWORK.enabled()) { + + // Save the task information in the database so that the Globus monitoring + // service can continue checking on its progress. + + GlobusTaskInProgress taskInProgress = new GlobusTaskInProgress(taskIdentifier, GlobusTaskInProgress.TaskType.UPLOAD, dataset, endpoint.getClientToken(), authUser, ruleId, new Timestamp(startDate.getTime())); + em.persist(taskInProgress); + + // Save the metadata entries that define the files that are being uploaded + // in the database. These entries will be used once/if the uploads + // completes successfully to add the files to the dataset. - globusLogger.info("Starting a globusUpload "); + for (JsonObject fileJsonObject : filesJsonArray.getValuesAs(JsonObject.class)) { + ExternalFileUploadInProgress fileUploadRecord = new ExternalFileUploadInProgress(taskIdentifier, fileJsonObject.toString()); + em.persist(fileUploadRecord); + } + + if (fileHandler != null) { + fileHandler.close(); + } + + // return and forget + return; + } + + + // the old implementation that relies on looping continuosly, + // sleeping-then-checking the task status repeatedly: + + // globus task status check + // (the following method performs continuous looped checks of the remote + // Globus API, monitoring it for as long as it takes for the task to + // finish one way or another!) + taskState = globusStatusCheck(endpoint, taskIdentifier, globusLogger); + // @todo null check, or make sure it's never null + String taskStatus = GlobusUtil.getTaskStatus(taskState); + + boolean taskSuccess = GlobusUtil.isTaskCompleted(taskState); + + processCompletedUploadTask(dataset, filesJsonArray, authUser, ruleId, globusLogger, taskSuccess, taskStatus); + + if (fileHandler != null) { + fileHandler.close(); + } + } + /** + * As the name suggests, the method completes and finalizes an upload task, + * whether it completed successfully or failed. (In the latter case, it + * simply sends a failure notification and does some cleanup). + * The method is called in both task monitoring scenarios: the old method, + * that relies on continuous looping, and the new, implemented on the basis + * of timer-like monitoring from a dedicated monitoring Singleton service. + * @param dataset the dataset + * @param filesJsonArray JsonArray containing files metadata entries as passed to /addGlobusFiles + * @param authUser the user that should be be performing the addFiles call + * finalizing adding the files to the Dataset. Note that this + * user will need to be obtained from the saved api token, when this + * method is called via the TaskMonitoringService + * @param ruleId Globus rule/permission id associated with the task + * @param myLogger the Logger; if null, the main logger of the service bean will be used + * @param fileHandler FileHandler associated with the Logger, when not null + * @param taskSuccess boolean task status of the completed task + * @param taskState human-readable task status label as reported by the Globus API + * the method should not throw any exceptions; all the exceptions thrown + * by the methods within are expected to be intercepted. + */ + private void processCompletedUploadTask(Dataset dataset, + JsonArray filesJsonArray, + AuthenticatedUser authUser, + String ruleId, + Logger globusLogger, + boolean taskSuccess, + String taskStatus) { + + Logger myLogger = globusLogger == null ? logger : globusLogger; + if (ruleId != null) { // Transfer is complete, so delete rule - deletePermission(ruleId, dataset, globusLogger); - + deletePermission(ruleId, dataset, myLogger); } - + // If success, switch to an EditInProgress lock - do this before removing the // GlobusUpload lock // Keeping a lock through the add datafiles API call avoids a conflicting edit - // and keeps any open dataset page refreshing until the datafile appears - if (!(taskStatus.startsWith("FAILED") || taskStatus.startsWith("INACTIVE"))) { - datasetSvc.addDatasetLock(dataset, - new DatasetLock(DatasetLock.Reason.EditInProgress, authUser, "Completing Globus Upload")); + // and keeps any open dataset page refreshing until the datafile appears. + + if (taskSuccess) { + myLogger.info("Finished upload via Globus job."); + + DatasetLock editLock = datasetSvc.addDatasetLock(dataset.getId(), + DatasetLock.Reason.EditInProgress, + (authUser).getId(), + "Completing Globus Upload"); + if (editLock != null) { + dataset.addLock(editLock); + } else { + myLogger.log(Level.WARNING, "Failed to lock the dataset (dataset id={0})", dataset.getId()); + } } DatasetLock gLock = dataset.getLockFor(DatasetLock.Reason.GlobusUpload); @@ -714,205 +838,260 @@ public void globusUpload(JsonObject jsonData, ApiToken token, Dataset dataset, S * addFilesAsync method called within the globusUpload method. I.e. it appeared * that the lock removal was not committed/visible outside this method until * globusUpload itself ended. + * (from @landreev:) If I understand the comment above correctly - annotations + * like "@TransactionAttribute(REQUIRES_NEW) do NOT work when you call a method + * directly within the same service bean. Strictly speaking, it's not the + * "within the same bean" part that is the key, rather, these annotations + * only apply when calling a method via an @EJB-defined service. So it + * is generally possible to call another method within FooServiceBean + * with the REQUIRES_NEW transaction taking effect - but then it would need + * to define *itself* as an @EJB - + * @EJB FooServiceBean fooSvc; + * ... + * fooSvc.doSomethingInNewTransaction(...); + * etc. */ datasetSvc.removeDatasetLocks(dataset, DatasetLock.Reason.GlobusUpload); } - - if (taskStatus.startsWith("FAILED") || taskStatus.startsWith("INACTIVE")) { - String comment = "Reason : " + taskStatus.split("#")[1] + "
Short Description : " - + taskStatus.split("#")[2]; + + if (!taskSuccess) { + String comment; + if (taskStatus != null) { + comment = "Reason : " + taskStatus.split("#")[1] + "
Short Description : " + + taskStatus.split("#")[2]; + } else { + comment = "No further information available"; + } + + myLogger.info("Globus Upload task failed "); userNotificationService.sendNotification((AuthenticatedUser) authUser, new Timestamp(new Date().getTime()), - UserNotification.Type.GLOBUSUPLOADCOMPLETEDWITHERRORS, dataset.getId(), comment, true); - globusLogger.info("Globus task failed "); + UserNotification.Type.GLOBUSUPLOADREMOTEFAILURE, dataset.getId(), comment, true); } else { try { - // - - List inputList = new ArrayList(); - JsonArray filesJsonArray = jsonData.getJsonArray("files"); - - if (filesJsonArray != null) { - String datasetIdentifier = dataset.getAuthorityForFileStorage() + "/" - + dataset.getIdentifierForFileStorage(); - - for (JsonObject fileJsonObject : filesJsonArray.getValuesAs(JsonObject.class)) { - - // storageIdentifier s3://gcs5-bucket1:1781cfeb8a7-748c270a227c from - // externalTool - String storageIdentifier = fileJsonObject.getString("storageIdentifier"); - String[] parts = DataAccess.getDriverIdAndStorageLocation(storageIdentifier); - String storeId = parts[0]; - // If this is an S3 store, we need to split out the bucket name - String[] bits = parts[1].split(":"); - String bucketName = ""; - if (bits.length > 1) { - bucketName = bits[0]; - } - String fileId = bits[bits.length - 1]; - - // fullpath s3://gcs5-bucket1/10.5072/FK2/3S6G2E/1781cfeb8a7-4ad9418a5873 - // or globus:///10.5072/FK2/3S6G2E/1781cfeb8a7-4ad9418a5873 - String fullPath = storeId + "://" + bucketName + "/" + datasetIdentifier + "/" + fileId; - String fileName = fileJsonObject.getString("fileName"); - - inputList.add(fileId + "IDsplit" + fullPath + "IDsplit" + fileName); - } - - // calculateMissingMetadataFields: checksum, mimetype - JsonObject newfilesJsonObject = calculateMissingMetadataFields(inputList, globusLogger); - JsonArray newfilesJsonArray = newfilesJsonObject.getJsonArray("files"); - logger.fine("Size: " + newfilesJsonArray.size()); - logger.fine("Val: " + JsonUtil.prettyPrint(newfilesJsonArray.getJsonObject(0))); - JsonArrayBuilder jsonDataSecondAPI = Json.createArrayBuilder(); - - for (JsonObject fileJsonObject : filesJsonArray.getValuesAs(JsonObject.class)) { - - countAll++; - String storageIdentifier = fileJsonObject.getString("storageIdentifier"); - String fileName = fileJsonObject.getString("fileName"); - String[] parts = DataAccess.getDriverIdAndStorageLocation(storageIdentifier); - // If this is an S3 store, we need to split out the bucket name - String[] bits = parts[1].split(":"); - if (bits.length > 1) { - } - String fileId = bits[bits.length - 1]; - - List newfileJsonObject = IntStream.range(0, newfilesJsonArray.size()) - .mapToObj(index -> ((JsonObject) newfilesJsonArray.get(index)).getJsonObject(fileId)) - .filter(Objects::nonNull).collect(Collectors.toList()); - if (newfileJsonObject != null) { - logger.fine("List Size: " + newfileJsonObject.size()); - // if (!newfileJsonObject.get(0).getString("hash").equalsIgnoreCase("null")) { - JsonPatch path = Json.createPatchBuilder() - .add("/md5Hash", newfileJsonObject.get(0).getString("hash")).build(); - fileJsonObject = path.apply(fileJsonObject); - path = Json.createPatchBuilder() - .add("/mimeType", newfileJsonObject.get(0).getString("mime")).build(); - fileJsonObject = path.apply(fileJsonObject); - jsonDataSecondAPI.add(fileJsonObject); - countSuccess++; - // } else { - // globusLogger.info(fileName - // + " will be skipped from adding to dataset by second API due to missing - // values "); - // countError++; - // } - } else { - globusLogger.info(fileName - + " will be skipped from adding to dataset by second API due to missing values "); - countError++; - } - } - - String newjsonData = jsonDataSecondAPI.build().toString(); - - globusLogger.info("Successfully generated new JsonData for Second API call"); - - String command = "curl -H \"X-Dataverse-key:" + token.getTokenString() + "\" -X POST " - + httpRequestUrl + "/api/datasets/:persistentId/addFiles?persistentId=doi:" - + datasetIdentifier + " -F jsonData='" + newjsonData + "'"; - System.out.println("*******====command ==== " + command); - - // ToDo - refactor to call AddReplaceFileHelper.addFiles directly instead of - // calling API - - String output = addFilesAsync(command, globusLogger); - if (output.equalsIgnoreCase("ok")) { - // if(!taskSkippedFiles) - if (countError == 0) { - userNotificationService.sendNotification((AuthenticatedUser) authUser, - new Timestamp(new Date().getTime()), UserNotification.Type.GLOBUSUPLOADCOMPLETED, - dataset.getId(), countSuccess + " files added out of " + countAll, true); - } else { - userNotificationService.sendNotification((AuthenticatedUser) authUser, - new Timestamp(new Date().getTime()), - UserNotification.Type.GLOBUSUPLOADCOMPLETEDWITHERRORS, dataset.getId(), - countSuccess + " files added out of " + countAll, true); - } - globusLogger.info("Successfully completed api/datasets/:persistentId/addFiles call "); - } else { - globusLogger.log(Level.SEVERE, - "******* Error while executing api/datasets/:persistentId/add call ", command); - } - - } - - globusLogger.info("Files processed: " + countAll.toString()); - globusLogger.info("Files added successfully: " + countSuccess.toString()); - globusLogger.info("Files failures: " + countError.toString()); - globusLogger.info("Finished upload via Globus job."); - + processUploadedFiles(filesJsonArray, dataset, authUser, myLogger); } catch (Exception e) { - logger.info("Exception from globusUpload call "); + logger.info("Exception from processUploadedFiles call "); e.printStackTrace(); - globusLogger.info("Exception from globusUpload call " + e.getMessage()); + myLogger.info("Exception from processUploadedFiles call " + e.getMessage()); datasetSvc.removeDatasetLocks(dataset, DatasetLock.Reason.EditInProgress); } } if (ruleId != null) { - deletePermission(ruleId, dataset, globusLogger); - globusLogger.info("Removed upload permission: " + ruleId); - } - if (fileHandlerSuceeded) { - fileHandler.close(); + deletePermission(ruleId, dataset, myLogger); + myLogger.info("Removed upload permission: " + ruleId); } + //if (fileHandler != null) { + // fileHandler.close(); + //} + } + + + /** + * The code in this method is copy-and-pasted from the previous Borealis + * implemenation. + * @todo see if it can be refactored and simplified a bit, the json manipulation + * specifically (?) + * @param filesJsonArray JsonArray containing files metadata entries as passed to /addGlobusFiles + * @param dataset the dataset + * @param authUser the user that should be be performing the addFiles call + * finalizing adding the files to the Dataset. Note that this + * user will need to be obtained from the saved api token, when this + * method is called via the TaskMonitoringService + * @param myLogger the Logger; if null, the main logger of the service bean will be used + * @throws IOException, InterruptedException, ExecutionException @todo may need to throw more exceptions (?) + */ + private void processUploadedFiles(JsonArray filesJsonArray, Dataset dataset, AuthenticatedUser authUser, Logger myLogger) throws IOException, InterruptedException, ExecutionException { + myLogger = myLogger != null ? myLogger : logger; + + Integer countAll = 0; + Integer countSuccess = 0; + Integer countError = 0; + Integer countAddFilesSuccess = 0; + String notificationErrorMessage = ""; + + List inputList = new ArrayList(); + + String datasetIdentifier = dataset.getAuthorityForFileStorage() + "/" + + dataset.getIdentifierForFileStorage(); + + for (JsonObject fileJsonObject : filesJsonArray.getValuesAs(JsonObject.class)) { + + // storageIdentifier s3://gcs5-bucket1:1781cfeb8a7-748c270a227c from + // externalTool + String storageIdentifier = fileJsonObject.getString("storageIdentifier"); + String[] parts = DataAccess.getDriverIdAndStorageLocation(storageIdentifier); + String storeId = parts[0]; + // If this is an S3 store, we need to split out the bucket name + String[] bits = parts[1].split(":"); + String bucketName = ""; + if (bits.length > 1) { + bucketName = bits[0]; + } + String fileId = bits[bits.length - 1]; - public String addFilesAsync(String curlCommand, Logger globusLogger) - throws ExecutionException, InterruptedException { - CompletableFuture addFilesFuture = CompletableFuture.supplyAsync(() -> { - try { - Thread.sleep(2000); - } catch (InterruptedException e) { - e.printStackTrace(); + // fullpath s3://gcs5-bucket1/10.5072/FK2/3S6G2E/1781cfeb8a7-4ad9418a5873 + // or globus:///10.5072/FK2/3S6G2E/1781cfeb8a7-4ad9418a5873 + String fullPath = storeId + "://" + bucketName + "/" + datasetIdentifier + "/" + fileId; + String fileName = fileJsonObject.getString("fileName"); + + inputList.add(fileId + "IDsplit" + fullPath + "IDsplit" + fileName); + } + + // calculateMissingMetadataFields: checksum, mimetype + JsonObject newfilesJsonObject = calculateMissingMetadataFields(inputList, myLogger); + JsonArray newfilesJsonArray = newfilesJsonObject.getJsonArray("files"); + logger.fine("Size: " + newfilesJsonArray.size()); + logger.fine("Val: " + JsonUtil.prettyPrint(newfilesJsonArray.getJsonObject(0))); + JsonArrayBuilder addFilesJsonData = Json.createArrayBuilder(); + + for (JsonObject fileJsonObject : filesJsonArray.getValuesAs(JsonObject.class)) { + + countAll++; + String storageIdentifier = fileJsonObject.getString("storageIdentifier"); + String fileName = fileJsonObject.getString("fileName"); + String[] parts = DataAccess.getDriverIdAndStorageLocation(storageIdentifier); + // If this is an S3 store, we need to split out the bucket name + String[] bits = parts[1].split(":"); + if (bits.length > 1) { } - return (addFiles(curlCommand, globusLogger)); - }, executor).exceptionally(ex -> { - globusLogger.fine("Something went wrong : " + ex.getLocalizedMessage()); - ex.printStackTrace(); - return null; - }); + String fileId = bits[bits.length - 1]; + + List newfileJsonObject = IntStream.range(0, newfilesJsonArray.size()) + .mapToObj(index -> ((JsonObject) newfilesJsonArray.get(index)).getJsonObject(fileId)) + .filter(Objects::nonNull).collect(Collectors.toList()); + if (newfileJsonObject != null) { + logger.fine("List Size: " + newfileJsonObject.size()); + // if (!newfileJsonObject.get(0).getString("hash").equalsIgnoreCase("null")) { + JsonPatch path = Json.createPatchBuilder() + .add("/md5Hash", newfileJsonObject.get(0).getString("hash")).build(); + fileJsonObject = path.apply(fileJsonObject); + path = Json.createPatchBuilder() + .add("/mimeType", newfileJsonObject.get(0).getString("mime")).build(); + fileJsonObject = path.apply(fileJsonObject); + addFilesJsonData.add(fileJsonObject); + countSuccess++; + // } else { + // globusLogger.info(fileName + // + " will be skipped from adding to dataset by second API due to missing + // values "); + // countError++; + // } + } else { + myLogger.info(fileName + + " will be skipped from adding to dataset in the final AddReplaceFileHelper.addFiles() call. "); + countError++; + } + } - String result = addFilesFuture.get(); + String newjsonData = addFilesJsonData.build().toString(); - return result; - } + myLogger.info("Successfully generated new JsonData for addFiles call"); - private String addFiles(String curlCommand, Logger globusLogger) { - ProcessBuilder processBuilder = new ProcessBuilder(); - Process process = null; - String line; - String status = ""; + myLogger.info("Files passed to /addGlobusFiles: " + countAll); + myLogger.info("Files processed successfully: " + countSuccess); + myLogger.info("Files failures to process: " + countError); + + if (countSuccess < 1) { + // We don't have any valid entries to call addFiles() for; so, no + // need to proceed. + notificationErrorMessage = "Failed to successfully process any of the file entries, " + + "out of the " + countAll + " total as submitted to Dataverse"; + userNotificationService.sendNotification((AuthenticatedUser) authUser, + new Timestamp(new Date().getTime()), UserNotification.Type.GLOBUSUPLOADREMOTEFAILURE, + dataset.getId(), notificationErrorMessage, true); + return; + } else if (countSuccess < countAll) { + notificationErrorMessage = "Out of the " + countAll + " file entries submitted to /addGlobusFiles " + + "only " + countSuccess + " could be successfully parsed and processed. "; + } + + // A new AddReplaceFileHelper implementation, replacing the old one that + // was relying on calling /addFiles api via curl: + + // Passing null for the HttpServletRequest to make a new DataverseRequest. + // The parent method is always executed asynchronously, so the real request + // that was associated with the original API call that triggered this upload + // cannot be obtained. + DataverseRequest dataverseRequest = new DataverseRequest(authUser, (HttpServletRequest)null); + + AddReplaceFileHelper addFileHelper = new AddReplaceFileHelper( + dataverseRequest, + this.ingestSvc, + this.datasetSvc, + this.dataFileSvc, + this.permissionSvc, + this.commandEngine, + this.systemConfig + ); + + // The old code had 2 sec. of sleep, so ... + Thread.sleep(2000); + + Response addFilesResponse = addFileHelper.addFiles(newjsonData, dataset, authUser); + + if (addFilesResponse == null) { + logger.info("null response from addFiles call"); + //@todo add this case to the user notification in case of error + return; + } + + JsonObject addFilesJsonObject = JsonUtil.getJsonObject(addFilesResponse.getEntity().toString()); + + // @todo null check? + String addFilesStatus = addFilesJsonObject.getString("status", null); + myLogger.info("addFilesResponse status: " + addFilesStatus); + + if (ApiConstants.STATUS_OK.equalsIgnoreCase(addFilesStatus)) { + if (addFilesJsonObject.containsKey("data") && addFilesJsonObject.getJsonObject("data").containsKey("Result")) { + + //Integer countAddFilesTotal = addFilesJsonObject.getJsonObject("data").getJsonObject("Result").getInt(ApiConstants.API_ADD_FILES_COUNT_PROCESSED, -1); + countAddFilesSuccess = addFilesJsonObject.getJsonObject("data").getJsonObject("Result").getInt(ApiConstants.API_ADD_FILES_COUNT_SUCCESSFUL, -1); + myLogger.info("Files successfully added by addFiles(): " + countAddFilesSuccess); - try { - globusLogger.info("Call to : " + curlCommand); - processBuilder.command("bash", "-c", curlCommand); - process = processBuilder.start(); - process.waitFor(); - - BufferedReader br = new BufferedReader(new InputStreamReader(process.getInputStream())); - - StringBuilder sb = new StringBuilder(); - while ((line = br.readLine()) != null) - sb.append(line); - globusLogger.info(" API Output : " + sb.toString()); - JsonObject jsonObject = null; - jsonObject = JsonUtil.getJsonObject(sb.toString()); - - status = jsonObject.getString("status"); - } catch (Exception ex) { - if (ex instanceof JsonParsingException) { - globusLogger.log(Level.SEVERE, "Error parsing dataset json."); } else { - globusLogger.log(Level.SEVERE, - "******* Unexpected Exception while executing api/datasets/:persistentId/add call ", ex); + myLogger.warning("Malformed addFiles response json: " + addFilesJsonObject.toString()); + notificationErrorMessage = "Malformed response received when attempting to add the files to the dataset. "; } + + myLogger.info("Completed addFiles call "); + } else if (ApiConstants.STATUS_ERROR.equalsIgnoreCase(addFilesStatus)) { + String addFilesMessage = addFilesJsonObject.getString("message", null); + + myLogger.log(Level.SEVERE, + "******* Error while executing addFiles ", newjsonData); + myLogger.log(Level.SEVERE, "****** Output from addFiles: ", addFilesMessage); + notificationErrorMessage += "Error response received when attempting to add the files to the dataset: " + addFilesMessage + " "; + + } else { + myLogger.log(Level.SEVERE, + "******* Error while executing addFiles ", newjsonData); + notificationErrorMessage += "Unexpected error encountered when attemptingh to add the files to the dataset."; + } + + // if(!taskSkippedFiles) + if (countAddFilesSuccess == countAll) { + userNotificationService.sendNotification((AuthenticatedUser) authUser, + new Timestamp(new Date().getTime()), UserNotification.Type.GLOBUSUPLOADCOMPLETED, + dataset.getId(), countSuccess + " files added out of " + countAll, true); + } else if (countAddFilesSuccess > 0) { + // success, but partial: + userNotificationService.sendNotification((AuthenticatedUser) authUser, + new Timestamp(new Date().getTime()), + UserNotification.Type.GLOBUSUPLOADCOMPLETEDWITHERRORS, dataset.getId(), + countSuccess + " files added out of " + countAll + notificationErrorMessage, true); + } else { + notificationErrorMessage = "".equals(notificationErrorMessage) + ? " No additional information is available." : notificationErrorMessage; + userNotificationService.sendNotification((AuthenticatedUser) authUser, + new Timestamp(new Date().getTime()), + UserNotification.Type.GLOBUSUPLOADLOCALFAILURE, dataset.getId(), + notificationErrorMessage, true); } - return status; } - + @Asynchronous public void globusDownload(String jsonData, Dataset dataset, User authUser) throws MalformedURLException { @@ -958,7 +1137,7 @@ public void globusDownload(String jsonData, Dataset dataset, User authUser) thro // If the rules_cache times out, the permission will be deleted. Presumably that // doesn't affect a // globus task status check - GlobusTask task = getTask(endpoint.getClientToken(), taskIdentifier, globusLogger); + GlobusTaskState task = getTask(endpoint.getClientToken(), taskIdentifier, globusLogger); String ruleId = getRuleId(endpoint, task.getOwner_id(), "r"); if (ruleId != null) { logger.fine("Found rule: " + ruleId); @@ -974,7 +1153,8 @@ public void globusDownload(String jsonData, Dataset dataset, User authUser) thro logger.warning("ruleId not found for taskId: " + taskIdentifier); } task = globusStatusCheck(endpoint, taskIdentifier, globusLogger); - String taskStatus = getTaskStatus(task); + // @todo null check? + String taskStatus = GlobusUtil.getTaskStatus(task); // Transfer is done (success or failure) so delete the rule if (ruleId != null) { @@ -1008,76 +1188,29 @@ public void globusDownload(String jsonData, Dataset dataset, User authUser) thro Executor executor = Executors.newFixedThreadPool(10); - private GlobusTask globusStatusCheck(GlobusEndpoint endpoint, String taskId, Logger globusLogger) + private GlobusTaskState globusStatusCheck(GlobusEndpoint endpoint, String taskId, Logger globusLogger) throws MalformedURLException { - boolean taskCompletion = false; - String status = ""; - GlobusTask task = null; + boolean taskCompleted = false; + GlobusTaskState task = null; int pollingInterval = SystemConfig.getIntLimitFromStringOrDefault( settingsSvc.getValueForKey(SettingsServiceBean.Key.GlobusPollingInterval), 50); do { try { globusLogger.info("checking globus transfer task " + taskId); Thread.sleep(pollingInterval * 1000); + // Call the (centralized) Globus API to check on the task state/status: task = getTask(endpoint.getClientToken(), taskId, globusLogger); - if (task != null) { - status = task.getStatus(); - if (status != null) { - // The task is in progress. - if (status.equalsIgnoreCase("ACTIVE")) { - if (task.getNice_status().equalsIgnoreCase("ok") - || task.getNice_status().equalsIgnoreCase("queued")) { - taskCompletion = false; - } else { - taskCompletion = true; - // status = "FAILED" + "#" + task.getNice_status() + "#" + - // task.getNice_status_short_description(); - } - } else { - // The task is either succeeded, failed or inactive. - taskCompletion = true; - // status = status + "#" + task.getNice_status() + "#" + - // task.getNice_status_short_description(); - } - } else { - // status = "FAILED"; - taskCompletion = true; - } - } else { - // status = "FAILED"; - taskCompletion = true; - } + taskCompleted = GlobusUtil.isTaskCompleted(task); } catch (Exception ex) { ex.printStackTrace(); } - } while (!taskCompletion); + } while (!taskCompleted); globusLogger.info("globus transfer task completed successfully"); return task; } - - private String getTaskStatus(GlobusTask task) { - String status = null; - if (task != null) { - status = task.getStatus(); - if (status != null) { - // The task is in progress but is not ok or queued - if (status.equalsIgnoreCase("ACTIVE")) { - status = "FAILED" + "#" + task.getNice_status() + "#" + task.getNice_status_short_description(); - } else { - // The task is either succeeded, failed or inactive. - status = status + "#" + task.getNice_status() + "#" + task.getNice_status_short_description(); - } - } else { - status = "FAILED"; - } - } else { - status = "FAILED"; - } - return status; - } - + public JsonObject calculateMissingMetadataFields(List inputList, Logger globusLogger) throws InterruptedException, ExecutionException, IOException { @@ -1133,27 +1266,39 @@ private FileDetailsHolder calculateDetails(String id, Logger globusLogger) String fileName = id.split("IDsplit")[2]; // ToDo: what if the file does not exist in s3 + // (L.A.) - good question. maybe it should call .open and .exists() here? + // otherwise, there doesn't seem to be any diagnostics as to which + // files uploaded successfully and which failed (?) + // ... however, any partially successful upload cases should be + // properly handled later, during the .addFiles() call - only + // the files that actually exists in storage remotely will be + // added to the dataset permanently then. // ToDo: what if checksum calculation failed + // (L.A.) - this appears to have been addressed - by using "Not available in Dataverse" + // in place of a checksum. - do { - try { - StorageIO dataFileStorageIO = DataAccess.getDirectStorageIO(fullPath); - in = dataFileStorageIO.getInputStream(); - checksumVal = FileUtil.calculateChecksum(in, DataFile.ChecksumType.MD5); - count = 3; - } catch (IOException ioex) { - count = 3; - logger.fine(ioex.getMessage()); - globusLogger.info( - "DataFile (fullPath " + fullPath + ") does not appear to be accessible within Dataverse: "); - } catch (Exception ex) { - count = count + 1; - ex.printStackTrace(); - logger.info(ex.getMessage()); - Thread.sleep(5000); - } + String storageDriverId = DataAccess.getDriverIdAndStorageLocation(fullPath)[0]; - } while (count < 3); + if (StorageIO.isDataverseAccessible(storageDriverId)) { + do { + try { + StorageIO dataFileStorageIO = DataAccess.getDirectStorageIO(fullPath); + in = dataFileStorageIO.getInputStream(); + checksumVal = FileUtil.calculateChecksum(in, DataFile.ChecksumType.MD5); + count = 3; + } catch (IOException ioex) { + count = 3; + logger.fine(ioex.getMessage()); + globusLogger.info( + "DataFile (fullPath " + fullPath + ") does not appear to be accessible within Dataverse: "); + } catch (Exception ex) { + count = count + 1; + ex.printStackTrace(); + logger.info(ex.getMessage()); + Thread.sleep(5000); + } + } while (count < 3); + } if (checksumVal.length() == 0) { checksumVal = "Not available in Dataverse"; @@ -1261,7 +1406,7 @@ public void writeGuestbookAndStartTransfer(GuestbookResponse guestbookResponse, Long fileId = Long.parseLong(idAsString); // If we need to create a GuestBookResponse record, we have to // look up the DataFile object for this file: - df = dataFileService.findCheapAndEasy(fileId); + df = dataFileSvc.findCheapAndEasy(fileId); selectedFiles.add(df); if (!doNotSaveGuestbookResponse) { guestbookResponse.setDataFile(df); @@ -1281,5 +1426,58 @@ public void writeGuestbookAndStartTransfer(GuestbookResponse guestbookResponse, } } } + + public List findAllOngoingTasks() { + return em.createQuery("select object(o) from GlobusTaskInProgress as o order by o.startTime", GlobusTaskInProgress.class).getResultList(); + } + + public void deleteTask(GlobusTaskInProgress task) { + GlobusTaskInProgress mergedTask = em.merge(task); + em.remove(mergedTask); + } + + public List findExternalUploadsByTaskId(String taskId) { + return em.createNamedQuery("ExternalFileUploadInProgress.findByTaskId").setParameter("taskId", taskId).getResultList(); + } + + public void processCompletedTask(GlobusTaskInProgress globusTask, boolean taskSuccess, String taskStatus, Logger taskLogger) { + String ruleId = globusTask.getRuleId(); + Dataset dataset = globusTask.getDataset(); + AuthenticatedUser authUser = globusTask.getLocalUser(); + if (authUser == null) { + // @todo log error message; do nothing + return; + } + + if (GlobusTaskInProgress.TaskType.UPLOAD.equals(globusTask.getTaskType())) { + List fileUploadsInProgress = findExternalUploadsByTaskId(globusTask.getTaskId()); + if (fileUploadsInProgress == null || fileUploadsInProgress.size() < 1) { + // @todo log error message; do nothing + // (will this ever happen though?) + return; + } + + JsonArrayBuilder filesJsonArrayBuilder = Json.createArrayBuilder(); + + for (ExternalFileUploadInProgress pendingFile : fileUploadsInProgress) { + String jsonInfoString = pendingFile.getFileInfo(); + JsonObject fileObject = JsonUtil.getJsonObject(jsonInfoString); + filesJsonArrayBuilder.add(fileObject); + } + + JsonArray filesJsonArray = filesJsonArrayBuilder.build(); + + processCompletedUploadTask(dataset, filesJsonArray, authUser, ruleId, taskLogger, taskSuccess, taskStatus); + } else { + // @todo eventually, extend this async. framework to handle Glonus downloads as well + } + + } + + public void deleteExternalUploadRecords(String taskId) { + em.createNamedQuery("ExternalFileUploadInProgress.deleteByTaskId") + .setParameter("taskId", taskId) + .executeUpdate(); + } } diff --git a/src/main/java/edu/harvard/iq/dataverse/globus/GlobusTaskInProgress.java b/src/main/java/edu/harvard/iq/dataverse/globus/GlobusTaskInProgress.java new file mode 100644 index 00000000000..8644bca6143 --- /dev/null +++ b/src/main/java/edu/harvard/iq/dataverse/globus/GlobusTaskInProgress.java @@ -0,0 +1,202 @@ +package edu.harvard.iq.dataverse.globus; + +import edu.harvard.iq.dataverse.Dataset; +import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser; +import jakarta.persistence.Column; +import jakarta.persistence.EnumType; +import jakarta.persistence.Enumerated; +import jakarta.persistence.ManyToOne; +import jakarta.persistence.Table; +import jakarta.persistence.UniqueConstraint; +import java.io.Serializable; +import java.sql.Timestamp; +import java.util.Arrays; +import jakarta.persistence.Entity; +import jakarta.persistence.GeneratedValue; +import jakarta.persistence.GenerationType; +import jakarta.persistence.Id; +import jakarta.persistence.JoinColumn; + +/** + * + * @author landreev + */ +@Entity +@Table(uniqueConstraints = {@UniqueConstraint(columnNames = "taskid")}) +public class GlobusTaskInProgress implements Serializable { + + private static final long serialVersionUID = 1L; + @Id + @GeneratedValue(strategy = GenerationType.IDENTITY) + private Long id; + + /** + * Globus-side identifier of the task in progress, upload or download + */ + @Column(nullable=false, unique = true) + private String taskId; + + /** + * I was considering giving this enum type a more specific name "TransferType" + * - but maybe there will be another use case where we need to keep track of + * Globus tasks that are not data transfers (?) + */ + public enum TaskType { + + UPLOAD("UPLOAD"), + DOWNLOAD("DOWNLOAD"); + + private final String text; + + private TaskType(final String text) { + this.text = text; + } + + public static TaskType fromString(String text) { + if (text != null) { + for (TaskType taskType : TaskType.values()) { + if (text.equals(taskType.text)) { + return taskType; + } + } + } + throw new IllegalArgumentException("TaskType must be one of these values: " + Arrays.asList(TaskType.values()) + "."); + } + + @Override + public String toString() { + return text; + } + } + + @Column(nullable=false) + @Enumerated(EnumType.STRING) + private TaskType taskType; + + /** + * Globus API token that should be used to monitor the status of the task + */ + @Column(nullable=false) + private String globusToken; + + /** + * This is the the user who initiated the Globus task + */ + @ManyToOne + @JoinColumn + private AuthenticatedUser user; + + @Column(nullable=false) + private String ruleId; + + @JoinColumn(nullable = false) + @ManyToOne + private Dataset dataset; + + @Column + private Timestamp startTime; + + public GlobusTaskInProgress() { + } + + GlobusTaskInProgress(String taskId, TaskType taskType, Dataset dataset, String globusToken, AuthenticatedUser authUser, String ruleId, Timestamp startTime) { + this.taskId = taskId; + this.taskType = taskType; + this.dataset = dataset; + this.globusToken = globusToken; + this.user = authUser; + this.ruleId = ruleId; + this.startTime = startTime; + } + + + + public Long getId() { + return id; + } + + public void setId(Long id) { + this.id = id; + } + + public String getTaskId() { + return taskId; + } + + public void setTaskId(String taskId) { + this.taskId = taskId; + } + + public TaskType getTaskType() { + return taskType; + } + + public void setTaskType(TaskType taskType) { + this.taskType = taskType; + } + + public String getGlobusToken() { + return globusToken; + } + + public void setGlobusToken(String clientToken) { + this.globusToken = clientToken; + } + + public AuthenticatedUser getLocalUser() { + return user; + } + + public void setLocalUser(AuthenticatedUser authUser) { + this.user = authUser; + } + + public String getRuleId() { + return ruleId; + } + + public void setRuleId(String ruleId) { + this.ruleId = ruleId; + } + public Dataset getDataset() { + return dataset; + } + + public void setDataset(Dataset dataset) { + this.dataset = dataset; + } + + public Timestamp getStartTime() { + return startTime; + } + + public void setStartTime(Timestamp startTime) { + this.startTime = startTime; + } + + @Override + public int hashCode() { + int hash = 0; + hash += (id != null ? id.hashCode() : 0); + return hash; + } + + @Override + public boolean equals(Object object) { + // TODO: Warning - this method won't work in the case the id fields are not set + if (!(object instanceof GlobusTaskInProgress)) { + return false; + } + GlobusTaskInProgress other = (GlobusTaskInProgress) object; + if ((this.id == null && other.id != null) || (this.id != null && !this.id.equals(other.id))) { + return false; + } + return true; + } + + @Override + public String toString() { + return "edu.harvard.iq.dataverse.globus.GlobusTaskInProgress[ id=" + id + " ]"; + } + +} diff --git a/src/main/java/edu/harvard/iq/dataverse/globus/GlobusTask.java b/src/main/java/edu/harvard/iq/dataverse/globus/GlobusTaskState.java similarity index 93% rename from src/main/java/edu/harvard/iq/dataverse/globus/GlobusTask.java rename to src/main/java/edu/harvard/iq/dataverse/globus/GlobusTaskState.java index c2b01779f4a..b5db20d46c1 100644 --- a/src/main/java/edu/harvard/iq/dataverse/globus/GlobusTask.java +++ b/src/main/java/edu/harvard/iq/dataverse/globus/GlobusTaskState.java @@ -1,6 +1,10 @@ package edu.harvard.iq.dataverse.globus; -public class GlobusTask { +/** + * This class is used to store the state of an ongoing Globus task (transfer) + * as reported by the Globus task API. + */ +public class GlobusTaskState { private String DATA_TYPE; private String type; diff --git a/src/main/java/edu/harvard/iq/dataverse/globus/GlobusUtil.java b/src/main/java/edu/harvard/iq/dataverse/globus/GlobusUtil.java index 92cf8ac7704..652898591ac 100644 --- a/src/main/java/edu/harvard/iq/dataverse/globus/GlobusUtil.java +++ b/src/main/java/edu/harvard/iq/dataverse/globus/GlobusUtil.java @@ -30,4 +30,64 @@ public static JsonObject getFilesMap(List dataFiles, Dataset d) { } return filesBuilder.build(); } + + public static boolean isTaskCompleted(GlobusTaskState task) { + if (task != null) { + String status = task.getStatus(); + if (status != null) { + if (status.equalsIgnoreCase("ACTIVE")) { + if (task.getNice_status().equalsIgnoreCase("ok") + || task.getNice_status().equalsIgnoreCase("queued")) { + return false; + } + } + } + } + return true; + } + + public static boolean isTaskSucceeded(GlobusTaskState task) { + String status = null; + if (task != null) { + status = task.getStatus(); + if (status != null) { + status = status.toUpperCase(); + if (status.equals("ACTIVE") || status.startsWith("FAILED") || status.startsWith("INACTIVE")) { + // There are cases where a failed task may still be showing + // as "ACTIVE". But it is definitely safe to assume that it + // has not completed *successfully*. + return false; + } + return true; + } + } + return false; + } + /** + * Produces a human-readable Status label of a completed task + * @param GlobusTaskState task - a looked-up state of a task as reported by Globus API + */ + public static String getTaskStatus(GlobusTaskState task) { + String status = null; + if (task != null) { + status = task.getStatus(); + if (status != null) { + // The task is in progress but is not ok or queued + // (L.A.) I think the assumption here is that this method is called + // exclusively on tasks that have already completed. So that's why + // it is safe to assume that "ACTIVE" means "FAILED". + if (status.equalsIgnoreCase("ACTIVE")) { + status = "FAILED" + "#" + task.getNice_status() + "#" + task.getNice_status_short_description(); + } else { + // The task is either succeeded, failed or inactive. + status = status + "#" + task.getNice_status() + "#" + task.getNice_status_short_description(); + } + } else { + status = "FAILED"; + } + } else { + status = "FAILED"; + } + return status; + } } \ No newline at end of file diff --git a/src/main/java/edu/harvard/iq/dataverse/globus/TaskMonitoringServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/globus/TaskMonitoringServiceBean.java new file mode 100644 index 00000000000..fdb2b222804 --- /dev/null +++ b/src/main/java/edu/harvard/iq/dataverse/globus/TaskMonitoringServiceBean.java @@ -0,0 +1,131 @@ +package edu.harvard.iq.dataverse.globus; + +import edu.harvard.iq.dataverse.settings.JvmSettings; +import edu.harvard.iq.dataverse.settings.SettingsServiceBean; +import edu.harvard.iq.dataverse.util.SystemConfig; +import jakarta.annotation.PostConstruct; +import jakarta.annotation.Resource; +import jakarta.ejb.EJB; +import jakarta.ejb.Singleton; +import jakarta.ejb.Startup; +import jakarta.enterprise.concurrent.ManagedScheduledExecutorService; +import java.io.File; +import java.io.IOException; +import java.text.SimpleDateFormat; +import java.util.Date; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.logging.FileHandler; +import java.util.logging.Logger; + +/** + * + * This Singleton monitors ongoing Globus tasks by checking with the centralized + * Globus API on the status of all the registered ongoing tasks. + * When a successful completion of a task is detected, the service triggers + * the execution of the associated tasks (for example, finalizing adding datafiles + * to the dataset on completion of a remote Globus upload). When a task fails or + * terminates abnormally, a message is logged and the task record is deleted + * from the database. + * + * @author landreev + */ +@Singleton +@Startup +public class TaskMonitoringServiceBean { + private static final Logger logger = Logger.getLogger("edu.harvard.iq.dataverse.globus.TaskMonitoringServiceBean"); + + @Resource + ManagedScheduledExecutorService scheduler; + + @EJB + SystemConfig systemConfig; + @EJB + SettingsServiceBean settingsSvc; + @EJB + GlobusServiceBean globusService; + + private static final SimpleDateFormat logFormatter = new SimpleDateFormat("yyyy-MM-dd'T'HH-mm-ss"); + + @PostConstruct + public void init() { + if (JvmSettings.GLOBUS_TASK_MONITORING_SERVER.lookupOptional(Boolean.class).orElse(false)) { + logger.info("Starting Globus task monitoring service"); + int pollingInterval = SystemConfig.getIntLimitFromStringOrDefault( + settingsSvc.getValueForKey(SettingsServiceBean.Key.GlobusPollingInterval), 600); + this.scheduler.scheduleWithFixedDelay(this::checkOngoingTasks, + 0, pollingInterval, + TimeUnit.SECONDS); + } else { + logger.info("Skipping Globus task monitor initialization"); + } + } + + /** + * This method will be executed on a timer-like schedule, continuously + * monitoring all the ongoing external Globus tasks (transfers). + */ + public void checkOngoingTasks() { + logger.fine("Performing a scheduled external Globus task check"); + List tasks = globusService.findAllOngoingTasks(); + + tasks.forEach(t -> { + FileHandler taskLogHandler = getTaskLogHandler(t); + Logger taskLogger = getTaskLogger(t, taskLogHandler); + + GlobusTaskState retrieved = globusService.getTask(t.getGlobusToken(), t.getTaskId(), taskLogger); + if (GlobusUtil.isTaskCompleted(retrieved)) { + // Do our thing, finalize adding the files to the dataset + globusService.processCompletedTask(t, GlobusUtil.isTaskSucceeded(retrieved), GlobusUtil.getTaskStatus(retrieved), taskLogger); + // Whether it finished successfully, or failed in the process, + // there's no need to keep monitoring this task, so we can + // delete it. + //globusService.deleteExternalUploadRecords(t.getTaskId()); + globusService.deleteTask(t); + } + + if (taskLogHandler != null) { + // @todo it should be prudent to cache these loggers and handlers + // between monitoring runs (should be fairly easy to do) + taskLogHandler.close(); + } + }); + } + + private FileHandler getTaskLogHandler(GlobusTaskInProgress task) { + if (task == null) { + return null; + } + + Date startDate = new Date(task.getStartTime().getTime()); + String logTimeStamp = logFormatter.format(startDate); + + String logFileName = System.getProperty("com.sun.aas.instanceRoot") + File.separator + "logs" + File.separator + "globusUpload_" + task.getDataset().getId() + "_" + logTimeStamp + + ".log"; + FileHandler fileHandler; + try { + fileHandler = new FileHandler(logFileName); + } catch (IOException | SecurityException ex) { + // @todo log this error somehow? + fileHandler = null; + } + return fileHandler; + } + + private Logger getTaskLogger(GlobusTaskInProgress task, FileHandler logFileHandler) { + if (logFileHandler == null) { + return null; + } + Date startDate = new Date(task.getStartTime().getTime()); + String logTimeStamp = logFormatter.format(startDate); + + Logger taskLogger = Logger.getLogger( + "edu.harvard.iq.dataverse.upload.client.DatasetServiceBean." + "GlobusUpload" + logTimeStamp); + taskLogger.setUseParentHandlers(false); + + taskLogger.addHandler(logFileHandler); + + return taskLogger; + } + +} diff --git a/src/main/java/edu/harvard/iq/dataverse/ingest/IngestServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/ingest/IngestServiceBean.java index 9bacafd173f..b42fd950528 100644 --- a/src/main/java/edu/harvard/iq/dataverse/ingest/IngestServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/ingest/IngestServiceBean.java @@ -345,12 +345,13 @@ public List saveAndAddFilesToDataset(DatasetVersion version, StorageIO dataAccess = DataAccess.getStorageIO(dataFile); //Populate metadata dataAccess.open(DataAccessOption.READ_ACCESS); - + // (the .open() above makes a remote call to check if + // the file exists and obtains its size) confirmedFileSize = dataAccess.getSize(); // For directly-uploaded files, we will perform the file size // limit and quota checks here. Perform them *again*, in - // some cases: a directly uploaded files have already been + // some cases: files directly uploaded via the UI have already been // checked (for the sake of being able to reject the upload // before the user clicks "save"). But in case of direct // uploads via API, these checks haven't been performed yet, diff --git a/src/main/java/edu/harvard/iq/dataverse/settings/FeatureFlags.java b/src/main/java/edu/harvard/iq/dataverse/settings/FeatureFlags.java index 2bfda69247a..33e828e619d 100644 --- a/src/main/java/edu/harvard/iq/dataverse/settings/FeatureFlags.java +++ b/src/main/java/edu/harvard/iq/dataverse/settings/FeatureFlags.java @@ -101,6 +101,10 @@ public enum FeatureFlags { * @since Dataverse 6.4 */ DISABLE_DATASET_THUMBNAIL_AUTOSELECT("disable-dataset-thumbnail-autoselect"), + /** + * Feature flag for the new Globus upload framework. + */ + GLOBUS_USE_EXPERIMENTAL_ASYNC_FRAMEWORK("globus-use-experimental-async-framework"), ; final String flag; diff --git a/src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java b/src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java index 0acc5d3267f..d7eea970b8a 100644 --- a/src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java +++ b/src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java @@ -51,6 +51,7 @@ public enum JvmSettings { DOCROOT_DIRECTORY(SCOPE_FILES, "docroot"), GUESTBOOK_AT_REQUEST(SCOPE_FILES, "guestbook-at-request"), GLOBUS_CACHE_MAXAGE(SCOPE_FILES, "globus-cache-maxage"), + GLOBUS_TASK_MONITORING_SERVER(SCOPE_FILES, "globus-monitoring-server"), //STORAGE DRIVER SETTINGS SCOPE_DRIVER(SCOPE_FILES), diff --git a/src/main/java/edu/harvard/iq/dataverse/timer/DataverseTimerServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/timer/DataverseTimerServiceBean.java index 6eb3a8df0bc..a783b211b36 100644 --- a/src/main/java/edu/harvard/iq/dataverse/timer/DataverseTimerServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/timer/DataverseTimerServiceBean.java @@ -120,13 +120,13 @@ public void handleTimeout(jakarta.ejb.Timer timer) { } try { - logger.log(Level.INFO,"Handling timeout on " + InetAddress.getLocalHost().getCanonicalHostName()); + logger.log(Level.FINE,"Handling timeout on " + InetAddress.getLocalHost().getCanonicalHostName()); } catch (UnknownHostException ex) { Logger.getLogger(DataverseTimerServiceBean.class.getName()).log(Level.SEVERE, null, ex); } if (timer.getInfo() instanceof MotherTimerInfo) { - logger.info("Behold! I am the Master Timer, king of all timers! I'm here to create all the lesser timers!"); + logger.fine("Behold! I am the Master Timer, king of all timers! I'm here to create all the lesser timers!"); removeHarvestTimers(); for (HarvestingClient client : harvestingClientService.getAllHarvestingClients()) { createHarvestTimer(client); diff --git a/src/main/java/edu/harvard/iq/dataverse/util/MailUtil.java b/src/main/java/edu/harvard/iq/dataverse/util/MailUtil.java index 36c249de834..f81ce093815 100644 --- a/src/main/java/edu/harvard/iq/dataverse/util/MailUtil.java +++ b/src/main/java/edu/harvard/iq/dataverse/util/MailUtil.java @@ -99,6 +99,23 @@ public static String getSubjectTextBasedOnNotification(UserNotification userNoti } catch (Exception e) { return BundleUtil.getStringFromBundle("notification.email.globus.uploadCompletedWithErrors.subject", rootDvNameAsList); } + case GLOBUSUPLOADREMOTEFAILURE: + try { + DatasetVersion version = (DatasetVersion)objectOfNotification; + List dsNameAsList = Arrays.asList(version.getDataset().getDisplayName()); + return BundleUtil.getStringFromBundle("notification.email.globus.uploadFailedRemotely.subject", dsNameAsList); + + } catch (Exception e) { + return BundleUtil.getStringFromBundle("notification.email.globus.uploadFailedRemotely.subject", rootDvNameAsList); + } + case GLOBUSUPLOADLOCALFAILURE: + try { + DatasetVersion version = (DatasetVersion)objectOfNotification; + List dsNameAsList = Arrays.asList(version.getDataset().getDisplayName()); + return BundleUtil.getStringFromBundle("notification.email.globus.uploadFailedLocally.subject", dsNameAsList); + } catch (Exception e) { + return BundleUtil.getStringFromBundle("notification.email.globus.uploadFailedLocally.subject", rootDvNameAsList); + } case GLOBUSDOWNLOADCOMPLETEDWITHERRORS: try { DatasetVersion version = (DatasetVersion)objectOfNotification; diff --git a/src/main/java/edu/harvard/iq/dataverse/util/PersonOrOrgUtil.java b/src/main/java/edu/harvard/iq/dataverse/util/PersonOrOrgUtil.java index f68957ad060..80e32184731 100644 --- a/src/main/java/edu/harvard/iq/dataverse/util/PersonOrOrgUtil.java +++ b/src/main/java/edu/harvard/iq/dataverse/util/PersonOrOrgUtil.java @@ -123,7 +123,7 @@ public static JsonObject getPersonOrOrganization(String name, boolean organizati if (!name.replaceFirst(",", "").contains(",")) { // contributorName=, String[] fullName = name.split(", "); - givenName = fullName[1]; + givenName = fullName.length > 1 ? fullName[1] : null; familyName = fullName[0]; } } diff --git a/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java b/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java index 5cc28e4b225..60967b13131 100644 --- a/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java +++ b/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java @@ -545,7 +545,7 @@ public boolean isTimerServer() { } return false; } - + public String getFooterCopyrightAndYear() { return BundleUtil.getStringFromBundle("footer.copyright", Arrays.asList(Year.now().getValue() + "")); } diff --git a/src/main/java/propertyFiles/Bundle.properties b/src/main/java/propertyFiles/Bundle.properties index 4ba5e4a9f14..5f3e4c33e0b 100644 --- a/src/main/java/propertyFiles/Bundle.properties +++ b/src/main/java/propertyFiles/Bundle.properties @@ -263,11 +263,16 @@ notification.mail.import.filesystem=Dataset {2} ({0}/dataset.xhtml?persistentId= notification.mail.globus.upload.completed=Globus transfer to Dataset {2} was successful. File(s) have been uploaded and verified.

{3}
notification.mail.globus.download.completed=Globus transfer of file(s) from the dataset {2} was successful.

{3}
notification.mail.globus.upload.completedWithErrors=Globus transfer to Dataset {2} is complete with errors.

{3}
+notification.mail.globus.upload.failedRemotely=Remote data transfer between Globus endpoints for Dataset {2} failed, as reported via Globus API.

{3}
+notification.mail.globus.upload.failedLocally=Dataverse received a confirmation of a successful Globus data transfer for Dataset {2}, but failed to add the files to the dataset locally.

{3}
notification.mail.globus.download.completedWithErrors=Globus transfer from the dataset {2} is complete with errors.

{3}
notification.import.filesystem=Dataset {1} has been successfully uploaded and verified. notification.globus.upload.completed=Globus transfer to Dataset {1} was successful. File(s) have been uploaded and verified. notification.globus.download.completed=Globus transfer from the dataset {1} was successful. notification.globus.upload.completedWithErrors=Globus transfer to Dataset {1} is complete with errors. +notification.globus.upload.failedRemotely=Remote data transfer between Globus collections for Dataset {2} failed, reported via Globus API.

{3}
+notification.globus.upload.failedLocally=Dataverse received a confirmation of a successful Globus data transfer for Dataset {2}, but failed to add the files to the dataset locally.

{3}
+ notification.globus.download.completedWithErrors=Globus transfer from the dataset {1} is complete with errors. notification.import.checksum={1}, dataset had file checksums added via a batch job. removeNotification=Remove Notification @@ -833,8 +838,8 @@ notification.email.datasetWasMentioned.subject={0}: A Dataset Relationship has b notification.email.globus.uploadCompleted.subject={0}: Files uploaded successfully via Globus and verified notification.email.globus.downloadCompleted.subject={0}: Files downloaded successfully via Globus notification.email.globus.uploadCompletedWithErrors.subject={0}: Uploaded files via Globus with errors -notification.email.globus.downloadCompletedWithErrors.subject={0}: Downloaded files via Globus with errors - +notification.email.globus.uploadFailedRemotely.subject={0}: Failed to upload files via Globus +notification.email.globus.uploadFailedLocally.subject={0}: Failed to add files uploaded via Globus to dataset # dataverse.xhtml dataverse.name=Dataverse Name dataverse.name.title=The project, department, university, professor, or journal this dataverse will contain data for. @@ -1779,6 +1784,7 @@ file.fromWebloaderAfterCreate.tip=An option to upload a folder of files will be file.fromWebloader=Upload a Folder file.api.httpDisabled=File upload via HTTP is not available for this installation of Dataverse. +file.api.globusUploadDisabled=File upload via Globus is not available for this installation of Dataverse. file.api.alreadyHasPackageFile=File upload via HTTP disabled since this dataset already contains a package file. file.replace.original=Original File file.editFiles=Edit Files diff --git a/src/main/webapp/dataset-license-terms.xhtml b/src/main/webapp/dataset-license-terms.xhtml index 255e63fbfc2..03173faf989 100644 --- a/src/main/webapp/dataset-license-terms.xhtml +++ b/src/main/webapp/dataset-license-terms.xhtml @@ -12,7 +12,7 @@ or !empty termsOfUseAndAccess.originalArchive or !empty termsOfUseAndAccess.availabilityStatus or !empty termsOfUseAndAccess.contactForAccess or !empty termsOfUseAndAccess.sizeOfCollection or !empty termsOfUseAndAccess.studyCompletion - or termsOfUseAndAccess.fileAccessRequest}"/> + }"/>
+ + + + + + + + + + + + + + diff --git a/src/main/webapp/metadataFragment.xhtml b/src/main/webapp/metadataFragment.xhtml index 43d54f64c43..723f95148cd 100755 --- a/src/main/webapp/metadataFragment.xhtml +++ b/src/main/webapp/metadataFragment.xhtml @@ -126,11 +126,11 @@ - + - - + + @@ -251,7 +251,7 @@
- + #{dsf.datasetFieldType.localeTitle}

-