API development: read sensitive data and write privacy preserving metadata for OpenDP privacy tools #7158

Closed
tercer opened this issue Aug 3, 2020 · 5 comments
Assignees
Labels
Feature: API, Type: Feature, User Role: API User

Comments

@tercer

tercer commented Aug 3, 2020

New APIs are needed for using OpenDP to provide differentially private releases of metadata to Dataverse. First we need to define and finalize the APIs needed for the immediate minimum viable product under development for October. This issue is the placeholder to further define this need.

We currently anticipate that the OpenDP tool will read the private metadata and dataset, and later write privacy-preserving (public) metadata in three formats: the full form in JSON, a reduced form in the DDI schema in XML, and a PDF/A file presenting a human-readable summary report.

@raprasad
Contributor

raprasad commented Aug 4, 2020

Regarding writing privacy-preserving metadata, this doc describes several considerations: https://docs.google.com/document/d/1q0Upg3Pa5lP1Gq5TxEzm9BYtA5G8PaqAx7o4WfKPo4U/edit

@raprasad
Contributor

raprasad commented Aug 13, 2020

DP = Differentially Private

DP-related APIs for Dataverse

(1) API: Deposit DP Release Data

  • Allow the depositing of DP Release data
  • This data is specific to a Dataverse file
  • There are three data formats to deposit for each release:
    • DP JSON format (full form)
    • DP PDF Report (human readable summary)
    • DP DDI (XML)

(1a) Origination/Provenance

  • The DP Deposit endpoint will require origination data to distinguish DP Releases created by the OpenDP Tool from DP data created by other sources.
  • The origination data should be in JSON format so that it can include rich metadata used for tracking the release.
    • The data will include: how the release was created (DP Tool), version of the DP Tool, release id (created by the DP Tool), version of the release metadata, execution engine information, timestamp, etc.
  • The Dataverse display of OpenDP Tool created releases should be distinct from DP Releases created by other methods
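As a rough sketch of what the origination data in (1a) might look like, the Python below builds a JSON record from the fields listed above. The class name, field names, and all example values (`Origination`, `dp_tool`, `"release-0001"`, and so on) are assumptions made for illustration; no schema has been agreed on in this issue.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class Origination:
    """Hypothetical provenance record; field names are illustrative only."""
    dp_tool: str            # how the release was created (DP Tool)
    dp_tool_version: str    # version of the DP Tool
    release_id: str         # release id, created by the DP Tool
    metadata_version: str   # version of the release metadata
    execution_engine: str   # execution engine information
    timestamp: str          # ISO 8601 timestamp in UTC


def is_opendp_release(origination: dict) -> bool:
    """Tell OpenDP-created releases apart from DP data from other sources."""
    return origination.get("dp_tool") == "OpenDP"


# Example values are placeholders, not real tool versions or ids.
record = Origination(
    dp_tool="OpenDP",
    dp_tool_version="0.1.0",
    release_id="release-0001",
    metadata_version="1",
    execution_engine="example-engine",
    timestamp=datetime(2020, 8, 13, tzinfo=timezone.utc).isoformat(),
)
payload = json.dumps(asdict(record), indent=2)
print(payload)
```

A check like `is_opendp_release` is one way the Dataverse display could decide how to present a release, assuming the tool name is carried in the origination data.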

(1b) Future consideration: Multiple Releases

In the next phase, multiple releases may be created for each Dataverse file. There should be provision to store data/files for each release. Example:

  • DP Release 1
    • DP statistic: mean of age
    • DP release data generated
      • DP JSON format
      • DP PDF Report
      • DP DDI
  • DP Release 2
    • DP Statistic: OLS regression of age/sex
    • DP release data generated
      • DP JSON format
      • DP PDF Report
      • DP DDI
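One possible way to model (1b), sketched in Python: each Dataverse file owns an ordered list of releases, and each release carries its three artifacts. The names `DPRelease`, `FileReleases`, and `REQUIRED_FORMATS`, the file id, and the stub artifact bytes are all hypothetical; how Dataverse would actually store these files is still open.

```python
from dataclasses import dataclass, field

# The three formats deposited for each release, per section (1).
REQUIRED_FORMATS = ("json", "pdf", "ddi")


@dataclass
class DPRelease:
    statistic: str                                  # e.g. "mean of age"
    artifacts: dict = field(default_factory=dict)   # format name -> bytes

    def missing_formats(self) -> list:
        """Return the required formats this release does not yet have."""
        return [f for f in REQUIRED_FORMATS if f not in self.artifacts]


@dataclass
class FileReleases:
    file_id: int
    releases: list = field(default_factory=list)

    def add(self, release: DPRelease) -> int:
        """Store a complete release and return its 1-based release number."""
        missing = release.missing_formats()
        if missing:
            raise ValueError(f"release is missing formats: {missing}")
        self.releases.append(release)
        return len(self.releases)


def stub_artifacts() -> dict:
    """Placeholder contents standing in for real release files."""
    return {"json": b"{}", "pdf": b"%PDF-", "ddi": b"<codeBook/>"}


store = FileReleases(file_id=42)
first = store.add(DPRelease("mean of age", stub_artifacts()))
second = store.add(DPRelease("OLS regression of age/sex", stub_artifacts()))
```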

(1c) Future consideration: Building on Previous DDIs

Discuss whether DP DDIs should be “cumulative”, building on previous DP DDIs. In the (simplistic) example above:

  • Release 1, DP DDI
    • mean of age
  • Release 2, DP DDI
    • OLS regression of age/sex

An example of a “cumulative” Release 2, DP DDI would be:

  • Release 2, DP DDI
    • OLS regression of age/sex
    • mean of age from Release 1
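To make the "cumulative" option in (1c) concrete, here is a small Python sketch that lists what a cumulative DP DDI for a given release would contain: its own statistics first, followed by those of earlier releases labeled with their release number. The function name and the list-of-lists input are assumptions for illustration, and the ordering of earlier releases (most recent first) is an arbitrary choice.

```python
from __future__ import annotations


def cumulative_ddi_entries(releases: list[list[str]], upto: int) -> list[str]:
    """List the statistics a cumulative DDI for release `upto` would contain.

    `releases[i]` holds the statistics of release i + 1; `upto` is 1-based.
    """
    if not 1 <= upto <= len(releases):
        raise ValueError(f"no such release: {upto}")
    # Start with the statistics of the release itself.
    entries = list(releases[upto - 1])
    # Then append earlier releases, most recent first, tagged with their number.
    for n in range(upto - 1, 0, -1):
        entries.extend(f"{stat} from Release {n}" for stat in releases[n - 1])
    return entries


releases = [["mean of age"], ["OLS regression of age/sex"]]
print(cumulative_ddi_entries(releases, 2))
# ['OLS regression of age/sex', 'mean of age from Release 1']
```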

(2) API: Retrieve DP Release Data

  • API to retrieve deposited DP Data
    • Allow retrieval of each DP release data type separately, e.g.:
      • DP JSON
      • DP PDF Report
      • DP DDI
    • Allow retrieval of all 3 data types in one API call?
  • Design API for “Future considerations” described in (1) API: Deposit DP Release Data
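The retrieval behavior in (2) could be sketched as follows, assuming an in-memory mapping from file id to artifacts stands in for Dataverse storage. The function name `retrieve_release` and the format keys are hypothetical. Passing no format returns all three data types in one call, which is only one answer to the open question above.

```python
FORMATS = ("json", "pdf", "ddi")


def retrieve_release(store: dict, file_id: int, fmt=None) -> dict:
    """Return one DP release data type, or all three when `fmt` is None.

    `store` maps file_id -> {format name: bytes}; it is a stand-in for
    whatever storage the deposit API ends up using.
    """
    if file_id not in store:
        raise KeyError(f"no DP release for file {file_id}")
    artifacts = store[file_id]
    if fmt is None:
        # One call that returns every data type.
        return {f: artifacts[f] for f in FORMATS}
    if fmt not in FORMATS:
        raise ValueError(f"unknown DP format: {fmt}")
    return {fmt: artifacts[fmt]}


# Placeholder contents standing in for real release files.
store = {42: {"json": b"{}", "pdf": b"%PDF-", "ddi": b"<codeBook/>"}}
only_ddi = retrieve_release(store, 42, "ddi")
everything = retrieve_release(store, 42)
```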

(3) API: Retrieve User Information

  • Via a user’s apiToken, retrieve the user’s data
  • Output is the same as the existing “List Single User” API endpoint

(4) API: Messaging

  • The creation of a DP Release may take seconds, minutes, or hours.
  • Via API, notify a Dataverse user via the existing Dataverse messaging system, including email, that a release:
    • (a) Has been deposited or
    • (b) Failed to be created
  • Possible alternatives:
    • Dataverse user notification is a function of successfully depositing DP release data in (1) API: Deposit DP Release Data
    • DP Tool has its own email system.
      • Needs discussion, e.g. contacting Dataverse users, etc
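A minimal sketch of the notification payload in (4), covering the two outcomes listed above. Everything here (`ReleaseOutcome`, `build_notification`, the dictionary keys, the example user and file id) is hypothetical and does not reflect the existing Dataverse messaging system's interface.

```python
from enum import Enum


class ReleaseOutcome(Enum):
    DEPOSITED = "deposited"   # (a) the release has been deposited
    FAILED = "failed"         # (b) the release failed to be created


def build_notification(user_id: str, file_id: int,
                       outcome: ReleaseOutcome, detail: str = "") -> dict:
    """Build a message for the Dataverse user who requested the release."""
    if outcome is ReleaseOutcome.DEPOSITED:
        subject = f"DP release deposited for file {file_id}"
    else:
        subject = f"DP release failed for file {file_id}"
    return {
        "user": user_id,
        "subject": subject,
        "detail": detail,
        "send_email": True,   # the issue asks for email as well as in-app messages
    }


ok = build_notification("exampleUser", 42, ReleaseOutcome.DEPOSITED)
failed = build_notification("exampleUser", 42, ReleaseOutcome.FAILED, "timed out")
```

Because a release may take hours, a payload like this would be sent when the job finishes rather than in the response to the original request.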

@scolapasta
Contributor

I just added #7275 as a more generic solution for this API. I wanted to leave this issue focused on OpenDP. Once we have #7275, we can see what else is needed here.

@pdurbin pdurbin added the Type: Feature and User Role: API User labels Oct 9, 2023
@pdurbin
Member

pdurbin commented Nov 11, 2023

Closing in favor of this issue:

@pdurbin pdurbin closed this as not planned Nov 11, 2023

4 participants