Feature Request: Add API endpoint for comparing Dataset Versions #10888

Open
ekraffmiller opened this issue Sep 27, 2024 · 7 comments · May be fixed by #10945
Labels:
  • Feature: API
  • FY25 Sprint 8 (2024-10-09 - 2024-10-23)
  • GREI Re-arch (issues related to the GREI Dataverse rearchitecture)
  • Original size: 30
  • Size: 30 (a percentage of a sprint; 21 hours; formerly size:33)
  • SPA.Q4.10 (resolve TODOs and tech debt)
  • SPA (these changes are required for the Dataverse SPA)
  • Type: Feature (a feature request)

Comments

@ekraffmiller
Contributor

ekraffmiller commented Sep 27, 2024

Overview of the Feature Request
We need an API endpoint that compares two dataset versions and returns a list of the differences between them. This is needed to support the SPA Dataset Page.

What kind of user is the feature intended for?
(Example users roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)
API User

What inspired the request?
IQSS/dataverse-client-javascript#197
IQSS/dataverse-frontend#511

What existing behavior do you want changed?
None

Any brand new behavior do you want to add to Dataverse?
New Dataverse API endpoint

Any open or closed issues related to this feature request?

Are you thinking about creating a pull request for this feature?
Help is always welcome, is this feature something you or your organization plan to implement?

@ekraffmiller ekraffmiller added Type: Feature a feature request Feature: API SPA These changes are required for the Dataverse SPA GREI Re-arch Issues related to the GREI Dataverse rearchitecture labels Sep 27, 2024
@GPortas GPortas added SPA.Q4.10 Resolve TODOs and tech debt Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) Original size: 30 FY25 Sprint 8 FY25 Sprint 8 (2024-10-09 - 2024-10-23) labels Oct 9, 2024
@stevenwinship stevenwinship self-assigned this Oct 10, 2024
@stevenwinship
Contributor

There are multiple options for how the response could be formatted:

Option 1. A JSON list with two objects, each containing only the modified fields.
ex. [{"id": versionAid, "subject": "version A subject", "subtitle": ""}, {"id": versionBid, "subject": "New subject", "subtitle": "new subtitle"}]

Option 2. A JSON object with the before and after values of each modified field:
ex. {"subject": {"versionAid": "version A subject", "versionBid": "New subject"}, "subtitle": {"versionAid": "", "versionBid": "new subtitle"}}

There are surely other options as well. @ekraffmiller, could you let me know which format would make the most sense for the SPA code?
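To make the two shapes concrete, here is a minimal Python sketch that builds both from a pair of versions. It assumes, hypothetically, that each version's metadata is already available as a flat dict mapping field names to display values. It also treats a field missing from one version as an empty string, matching how `subtitle` appears in the examples. The names `changed_fields`, `option1` and `option2` are illustrative only and are not part of the Dataverse API.

```python
def changed_fields(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Field names whose values differ; a missing field is treated as ""."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k, "") != b.get(k, ""))


def option1(a_id, a: dict[str, str], b_id, b: dict[str, str]) -> list[dict]:
    """Option 1: one object per version, holding only the modified fields."""
    fields = changed_fields(a, b)
    return [
        {"id": a_id, **{k: a.get(k, "") for k in fields}},
        {"id": b_id, **{k: b.get(k, "") for k in fields}},
    ]


def option2(a_id, a: dict[str, str], b_id, b: dict[str, str]) -> dict[str, dict]:
    """Option 2: one entry per modified field, mapping version id to value."""
    return {k: {a_id: a.get(k, ""), b_id: b.get(k, "")} for k in changed_fields(a, b)}


# Hypothetical input: "title" is unchanged, so it appears in neither output.
version_a = {"title": "Birds", "subject": "version A subject"}
version_b = {"title": "Birds", "subject": "New subject", "subtitle": "new subtitle"}
print(option1("versionAid", version_a, "versionBid", version_b))
print(option2("versionAid", version_a, "versionBid", version_b))
```

Both functions carry the same information. Option 2 gives one lookup per field, which maps directly onto a row-per-field table.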

@qqmyers
Member

qqmyers commented Oct 10, 2024

FWIW: I think the outputs from the DatasetVersionDifference class are more like option 2. Similarly, I think that's the format closer to how we display the differences in the dataset page version table.

@stevenwinship
Contributor

stevenwinship commented Oct 15, 2024

@ekraffmiller
Here is the JSON-formatted output that I believe will work well in a table in the UI. Please let me know if this works or if changes are needed.

{
    "status": "OK",
    "data": {
        "Metadata": {
            "Author": {
                "0": "Finch, Fiona; (Birds Inc.)",
                "1": "Finch, Fiona; (Birds Inc.); Poe, Edgar Allen; (Baltimore Poets); Mulligan, Hercules; (Sons of Liberty)"
            },
            "Subject": {
                "0": "Medicine, Health and Life Sciences",
                "1": "Medicine, Health and Life Sciences; Astronomy and Astrophysics; Other"
            },
            "Producer": {
                "0": "",
                "1": "Allen, Irwin; (MGM); Spielberg, Stephen; (ILM)"
            },
            "Design Type": {
                "0": "",
                "1": "Parallel Group Design; Nested Case Control Design"
            }
        },
        "Files": {
            "added": [
                {
                    "description": "",
                    "label": "dataverseproject.png",
                    "restricted": false,
                    "version": 1,
                    "datasetVersionId": 4,
                    "dataFile": {
                        "id": 11,
                        "persistentId": "",
                        "filename": "dataverseproject.png",
                        "contentType": "image/png",
                        "friendlyType": "PNG Image",
                        "filesize": 12918,
                        "description": "",
                        "storageIdentifier": "local://19296b38e55-71601b050f3d",
                        "rootDataFileId": -1,
                        "md5": "e55e66ff785045154875c4b6841eb527",
                        "checksum": {
                            "type": "MD5",
                            "value": "e55e66ff785045154875c4b6841eb527"
                        },
                        "tabularData": false,
                        "creationDate": "2024-10-16",
                        "fileAccessRequest": true
                    }
                }
            ],
            "removed": [
                {
                    "description": "",
                    "label": "dataverseproject_logo.jpg",
                    "restricted": false,
                    "version": 1,
                    "datasetVersionId": 3,
                    "dataFile": {
                        "id": 10,
                        "persistentId": "",
                        "filename": "dataverseproject_logo.jpg",
                        "contentType": "image/jpeg",
                        "friendlyType": "JPEG Image",
                        "filesize": 4462,
                        "description": "",
                        "storageIdentifier": "local://19296b371ed-ea4ec196219e",
                        "rootDataFileId": -1,
                        "md5": "c1edbefa86a55c5037873370ae7fd7b6",
                        "checksum": {
                            "type": "MD5",
                            "value": "c1edbefa86a55c5037873370ae7fd7b6"
                        },
                        "tabularData": false,
                        "creationDate": "2024-10-16",
                        "publicationDate": "2024-10-16",
                        "fileAccessRequest": true
                    }
                }
            ],
            "modified": [
                {
                    "fileMetadata": {
                        "description": "",
                        "label": "dataverse-icon-1200.png",
                        "restricted": false,
                        "version": 1,
                        "datasetVersionId": 3,
                        "dataFile": {
                            "id": 9,
                            "persistentId": "",
                            "filename": "dataverse-icon-1200.png",
                            "contentType": "image/png",
                            "friendlyType": "PNG Image",
                            "filesize": 27650,
                            "description": "",
                            "storageIdentifier": "local://19296b370c7-b90cd887fd36",
                            "rootDataFileId": -1,
                            "md5": "a23eb44803d9127bc6e055f77b869816",
                            "checksum": {
                                "type": "MD5",
                                "value": "a23eb44803d9127bc6e055f77b869816"
                            },
                            "tabularData": false,
                            "creationDate": "2024-10-16",
                            "publicationDate": "2024-10-16",
                            "fileAccessRequest": true
                        }
                    },
                    "isRestricted": {
                        "0": "false",
                        "1": "true"
                    }
                }
            ]
        },
        "TermsOfAccess": {
            "Data Access Place": {
                "0": "",
                "1": "Somewhere"
            }
        }
    }
}
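The following rough Python sketch shows how a client might turn this format into table rows. It flattens two kinds of data into `(section, field, old, new)` tuples. The first is the `Metadata` and `TermsOfAccess` sections, where each display name maps to `{"0": old, "1": new}`. The second is the per-attribute changes under `Files.modified`. The function name `diff_rows` and the row layout are assumptions for illustration. `Files.added` and `Files.removed` are left out for brevity.

```python
def diff_rows(response: dict) -> list[tuple[str, str, str, str]]:
    """Flatten the first proposed compare response into (section, field, old, new) rows."""
    data = response["data"]
    rows = []
    # These sections map a display name to {"0": old value, "1": new value}.
    for section in ("Metadata", "TermsOfAccess"):
        for field, values in data.get(section, {}).items():
            rows.append((section, field, values.get("0", ""), values.get("1", "")))
    # Each modified file carries its fileMetadata plus one {"0", "1"} dict per changed attribute.
    for entry in data.get("Files", {}).get("modified", []):
        label = entry["fileMetadata"]["label"]
        for attribute, values in entry.items():
            if attribute != "fileMetadata":
                rows.append((f"Files: {label}", attribute, values.get("0", ""), values.get("1", "")))
    return rows


# Trimmed-down copy of the response above.
sample = {
    "status": "OK",
    "data": {
        "Metadata": {"Producer": {"0": "", "1": "Allen, Irwin; (MGM); Spielberg, Stephen; (ILM)"}},
        "Files": {
            "modified": [
                {
                    "fileMetadata": {"label": "dataverse-icon-1200.png"},
                    "isRestricted": {"0": "false", "1": "true"},
                }
            ]
        },
        "TermsOfAccess": {"Data Access Place": {"0": "", "1": "Somewhere"}},
    },
}
for row in diff_rows(sample):
    print(row)
```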

@ekraffmiller
Contributor Author

Thanks @stevenwinship, I will review the SPA requirements today.

@stevenwinship stevenwinship removed their assignment Oct 21, 2024
@ekraffmiller
Contributor Author

Hi @stevenwinship, sorry for the late reply. For the Compare Version Details Popup, we will need the changes grouped by metadata block. It would also be more flexible in the UI to have the changed values in an array (for "multiple" type fields).

Here is an example:

{
  "oldVersion": {
    "versionNumber": "1.0",
    "createdDate": "2023-01-15T08:00:00Z"
  },
  "newVersion": {
    "versionNumber": "1.1",
    "createdDate": "2024-01-20T08:00:00Z"
  },
  "metadataChanges": [
    {
      "blockName": "citation",
      "changed": [
        {
          "fieldName": "title",
          "oldValue": ["Initial Dataset Title"],
          "newValue": ["Updated Dataset Title"]
        },
        {
          "fieldName": "author",
          "oldValue": ["John Doe"],
          "newValue": ["John Doe", "Jane Smith"]
        }
      ]
    },
    {
      "blockName": "socialscience",
      "changed": [
        {
          "fieldName": "studyDesignType",
          "oldValue": ["design type 1","design type 2"],
          "newValue": ["design type 1a", "design type 1b", "design type 1c"]
        }
      ]
    }
  ],
  "fileChanges": [
    {
      "fileName": "data.csv",
      "changes": [
        {
          "fieldName": "filePath",
          "oldValue": "/oldpath/data_v1.csv",
          "newValue": "/newpath/data_v2.csv"
        }
      ]
    },
    {
      "fileName": "readme.txt",
      "changes": [
        {
          "fieldName": "description",
          "oldValue": "Basic dataset info",
          "newValue": "Updated dataset info with more details"
        }
      ]
    }
  ]
}
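Here is a minimal Python sketch of how the grouped `metadataChanges` array could be derived. It assumes, hypothetically, that each version's metadata is available as a dict of block name to field name to a list of values. Unchanged blocks are omitted and fields are emitted in sorted order. Because the comparison is between lists, a reordering of values also counts as a change. The function name is illustrative and is not an existing Dataverse method.

```python
Values = dict[str, dict[str, list[str]]]  # block name -> field name -> list of values


def metadata_changes(old: Values, new: Values) -> list[dict]:
    """Group changed fields by metadata block, keeping values as arrays."""
    changes = []
    for block in sorted(old.keys() | new.keys()):
        old_fields, new_fields = old.get(block, {}), new.get(block, {})
        changed = [
            {"fieldName": name, "oldValue": old_fields.get(name, []), "newValue": new_fields.get(name, [])}
            for name in sorted(old_fields.keys() | new_fields.keys())
            if old_fields.get(name, []) != new_fields.get(name, [])
        ]
        if changed:  # skip blocks with no differences
            changes.append({"blockName": block, "changed": changed})
    return changes


old_version = {
    "citation": {"title": ["Initial Dataset Title"], "author": ["John Doe"]},
    "socialscience": {"studyDesignType": ["design type 1", "design type 2"]},
}
new_version = {
    "citation": {"title": ["Updated Dataset Title"], "author": ["John Doe", "Jane Smith"]},
    "socialscience": {"studyDesignType": ["design type 1a", "design type 1b", "design type 1c"]},
}
print(metadata_changes(old_version, new_version))
```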

@ekraffmiller
Contributor Author

I'm sorry, I realized some file information was missing from the JSON example I sent you, so here is an updated example. I have added fields to the file elements and also included a "filesReplaced" array.
Other changes:

  • using lastUpdatedDate rather than createdDate
  • using the metadata name rather than displayName
  • returning multiple metadata values as array elements rather than separated by ';'
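The last point matters because a ';'-joined string cannot always be split back into its values. In the earlier example, an Author display value such as "Finch, Fiona; (Birds Inc.)" already uses ';' between the name and the affiliation. The small Python sketch below uses author values taken from that example and a hypothetical `naive_split` helper. It shows a plain split producing four pieces for two authors, while the array form keeps both values intact.

```python
def naive_split(joined: str) -> list[str]:
    """Split a ';'-joined display string back into values (lossy for compound fields)."""
    return [part.strip() for part in joined.split(";") if part.strip()]


# Two author values, each combining a name and an affiliation with "; ".
authors = ["Finch, Fiona; (Birds Inc.)", "Poe, Edgar Allen; (Baltimore Poets)"]
joined = "; ".join(authors)

print(naive_split(joined))  # four pieces, not two authors
print(len(authors))         # the array form keeps exactly two values
```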

jsonexample.json

{
  "oldVersion": {
    "versionNumber": "1.0",
    "lastUpdatedDate": "2023-01-15T08:00:00Z"
  },
  "newVersion": {
    "versionNumber": "1.1",
    "lastUpdatedDate": "2024-01-20T08:00:00Z"
  },
  "metadataChanges": [
    {
      "blockName": "citation",
      "changed": [
        {
          "fieldName": "title",
          "oldValue": ["Initial Dataset Title"],
          "newValue": ["Updated Dataset Title"]
        },
        {
          "fieldName": "author",
          "oldValue": ["John Doe"],
          "newValue": ["John Doe", "Jane Smith"]
        }
      ]
    },
    {
      "blockName": "socialscience",
      "changed": [
        {
          "fieldName": "studyDesignType",
          "oldValue": ["design type 1", "design type 2"],
          "newValue": ["design type 1a", "design type 1b", "design type 1c"]
        }
      ]
    }
  ],
  "filesAdded": [
    {
      "fileName": "teacher_survey.tab",
      "md5": "1234567890",
      "type": "Tab-Delimited",
      "fileId": 3,
      "tags": ["Documentation"],
      "description": "my file description",
      "isRestricted": false
    },
    {
      "fileName": "biomedical.json",
      "md5": "1234567890",
      "type": "JSON",
      "fileId": 4,
      "tags": ["Documentation", "Data"],
      "description": "my json file description",
      "isRestricted": true
    }
  ],
  "filesReplaced": [
    {
      "oldFile": {
        "fileName": "teacher_survey.tab",
        "md5": "1234567890",
        "type": "Tab-Delimited",
        "fileId": 3,
        "tags": ["Documentation", "Data"],
        "description": "my json file description",
        "isRestricted": false
      },
      "newFile": {
        "fileName": "biomedical.json",
        "md5": "1234567890",
        "type": "JSON",
        "fileId": 4,
        "tags": ["Documentation", "Data"],
        "description": "my json file description",
        "isRestricted": true
      }
    },
    {
      "oldFile": {
        "fileName": "test1.json",
        "md5": "1234567890",
        "type": "JSON",
        "fileId": 3,
        "isRestricted": false
      },
      "newFile": {
        "fileName": "test2.json",
        "md5": "1234567890",
        "type": "JSON",
        "fileId": 4,
        "isRestricted": true
      }
    }
  ],
  "filesChanged": [
    {
      "fileName": "data.csv",
      "md5": "1234567890",
      "fileId": 1,
      "changes": [
        {
          "fieldName": "filePath",
          "oldValue": "/oldpath/data_v1.csv",
          "newValue": "/newpath/data_v2.csv"
        }
      ]
    },
    {
      "fileName": "readme.txt",
      "md5": "1234567890",
      "fileId": 2,
      "changes": [
        {
          "fieldName": "description",
          "oldValue": "Basic dataset info",
          "newValue": "Updated dataset info with more details"
        }
      ]
    }
  ],
  "TermsOfAccess": {
    "changed": [
      {
        "fieldName": "dataAccessPlace",
        "oldValue": "",
        "newValue": "Somewhere"
      }
    ]
  }
}
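The Python sketch below shows one way the four file lists could be populated, under stated assumptions. Files are matched across versions by `fileId`, on the assumption that a file whose metadata changed keeps its id. A replacement file is assumed to carry a hypothetical `replacesFileId` pointing at the file it replaced. The set of compared attributes in `TRACKED` is also illustrative. How Dataverse itself links a replacement to its predecessor is not covered in this thread.

```python
TRACKED = ("fileName", "description", "tags", "isRestricted")  # illustrative attribute list


def classify_files(old_files: list[dict], new_files: list[dict]) -> dict[str, list]:
    """Sort file differences into filesAdded/filesRemoved/filesReplaced/filesChanged lists."""
    old_by_id = {f["fileId"]: f for f in old_files}
    new_ids = {f["fileId"] for f in new_files}
    result = {"filesAdded": [], "filesRemoved": [], "filesReplaced": [], "filesChanged": []}
    replaced_ids = set()
    for new in new_files:
        old = old_by_id.get(new["fileId"])
        if old is not None:
            # Same file in both versions: report attribute-level changes, if any.
            changes = [
                {"fieldName": key, "oldValue": old.get(key), "newValue": new.get(key)}
                for key in TRACKED
                if old.get(key) != new.get(key)
            ]
            if changes:
                result["filesChanged"].append(
                    {"fileName": new["fileName"], "md5": new.get("md5"), "fileId": new["fileId"], "changes": changes}
                )
        elif new.get("replacesFileId") in old_by_id:
            # Hypothetical link from a replacement file back to the file it replaced.
            replaced_ids.add(new["replacesFileId"])
            result["filesReplaced"].append({"oldFile": old_by_id[new["replacesFileId"]], "newFile": new})
        else:
            result["filesAdded"].append(new)
    # Anything in the old version that was neither kept nor replaced was removed.
    result["filesRemoved"] = [
        f for fid, f in old_by_id.items() if fid not in new_ids and fid not in replaced_ids
    ]
    return result


old = [
    {"fileId": 1, "fileName": "data.csv", "description": "Basic dataset info"},
    {"fileId": 2, "fileName": "readme.txt"},
    {"fileId": 3, "fileName": "test1.json", "isRestricted": False},
]
new = [
    {"fileId": 1, "fileName": "data.csv", "description": "Updated dataset info with more details"},
    {"fileId": 4, "fileName": "test2.json", "isRestricted": True, "replacesFileId": 3},
    {"fileId": 5, "fileName": "teacher_survey.tab"},
]
diff = classify_files(old, new)
print(diff)
```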

@stevenwinship
Contributor

stevenwinship commented Oct 22, 2024

Here is an example of the latest JSON format:

{
    "status": "OK",
    "data": {
        "oldVersion": {
            "versionNumber": "1.0",
            "lastUpdatedDate": "2024-10-24T15:17:11Z"
        },
        "newVersion": {
            "versionNumber": "DRAFT",
            "lastUpdatedDate": "2024-10-24T15:17:16Z"
        },
        "metadataChanges": [
            {
                "blockName": "Citation Metadata",
                "changed": [
                    {
                        "fieldName": "Author",
                        "oldValue": "Finch, Fiona; (Birds Inc.)",
                        "newValue": "Finch, Fiona; (Birds Inc.); Poe, Edgar Allen; (Baltimore Poets); Mulligan, Hercules; (Sons of Liberty)"
                    },
                    {
                        "fieldName": "Subject",
                        "oldValue": "Medicine, Health and Life Sciences",
                        "newValue": "Medicine, Health and Life Sciences; Astronomy and Astrophysics; Other"
                    },
                    {
                        "fieldName": "Producer",
                        "oldValue": "",
                        "newValue": "Allen, Irwin; (MGM); Spielberg, Stephen; (ILM)"
                    }
                ]
            },
            {
                "blockName": "Life Sciences Metadata",
                "changed": [
                    {
                        "fieldName": "Design Type",
                        "oldValue": "",
                        "newValue": "Parallel Group Design; Nested Case Control Design"
                    }
                ]
            }
        ],
        "filesAdded": [
            {
                "fileName": "test.tab",
                "filePath": "data/subdir1",
                "MD5": "77c7f03a7d7772907b43f0b322cef723",
                "type": "text/tab-separated-values",
                "fileId": 42,
                "description": "my description",
                "isRestricted": false,
                "categories": [
                    "Data"
                ],
                "tags": [
                    "Survey"
                ]
            }
        ],
        "filesRemoved": [
            {
                "fileName": "dataverseproject_logo.jpg",
                "filePath": "data/subdir1",
                "MD5": "c1edbefa86a55c5037873370ae7fd7b6",
                "type": "image/jpeg",
                "fileId": 40,
                "description": "my description",
                "isRestricted": false,
                "categories": [
                    "Data"
                ]
            }
        ],
        "filesReplaced": [
            {
                "oldFile": {
                    "fileName": "favicon-16x16.png",
                    "filePath": "data/subdir1",
                    "MD5": "d3c852e7ecb92fd105ba4018116a9be8",
                    "type": "image/png",
                    "fileId": 41,
                    "description": "my description",
                    "isRestricted": false,
                    "categories": [
                        "Data"
                    ]
                },
                "newFile": {
                    "fileName": "favicon-32x32.png",
                    "filePath": "data/subdir1",
                    "MD5": "c931f7add8b6a1f9a691046b77c231fa",
                    "type": "image/png",
                    "fileId": 43,
                    "description": "my description",
                    "isRestricted": false,
                    "categories": [
                        "Data"
                    ]
                }
            }
        ],
        "fileChanges": [
            {
                "fileName": "dataverse-icon-1200.png",
                "MD5": "a23eb44803d9127bc6e055f77b869816",
                "fileId": 39,
                "changed": [
                    {
                        "fieldName": "isRestricted",
                        "oldValue": "false",
                        "newValue": "true"
                    }
                ]
            }
        ],
        "TermsOfAccess": {
            "changed": [
                {
                    "fieldName": "Data Access Place",
                    "oldValue": "",
                    "newValue": "Somewhere"
                }
            ]
        }
    }
}
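For a quick sanity check on the client side, a Python sketch that counts the entries in each section of this response could look like the following. This latest example names the per-file section `fileChanges`, whereas the requested format used `filesChanged`, so the sketch accepts either key. The function name and the summary keys are illustrative.

```python
def summarize(response: dict) -> dict[str, int]:
    """Count the changes reported in each section of a compare response."""
    data = response["data"]
    file_changes = data.get("fileChanges", data.get("filesChanged", []))
    return {
        "metadataFields": sum(len(block["changed"]) for block in data.get("metadataChanges", [])),
        "filesAdded": len(data.get("filesAdded", [])),
        "filesRemoved": len(data.get("filesRemoved", [])),
        "filesReplaced": len(data.get("filesReplaced", [])),
        "filesChanged": len(file_changes),
        "termsOfAccess": len(data.get("TermsOfAccess", {}).get("changed", [])),
    }


# Skeleton of the response above, with each entry replaced by an empty placeholder.
example = {
    "status": "OK",
    "data": {
        "metadataChanges": [
            {"blockName": "Citation Metadata", "changed": [{}, {}, {}]},
            {"blockName": "Life Sciences Metadata", "changed": [{}]},
        ],
        "filesAdded": [{}],
        "filesRemoved": [{}],
        "filesReplaced": [{}],
        "fileChanges": [{}],
        "TermsOfAccess": {"changed": [{}]},
    },
}
print(summarize(example))
```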

4 participants