To ensure traceability, reproducibility, and standardization of all ML datasets and models generated and consumed within Toyota Research Institute (TRI), we developed the Dataset-Governance-Policy (DGP), which codifies the schema and maintenance of all of TRI's Autonomous Vehicle (AV) datasets.
- Schema: Protobuf-based schemas for raw data, annotations, and dataset management (see the sketch after this list).
- DataLoaders: A universal PyTorch dataset class for loading any DGP-compliant dataset.
- CLI: Main CLI for handling DGP datasets and the entry point for the visualization tools.
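Because the schemas are plain Protobuf messages, a DGP artifact serialized as JSON can be deserialized with the standard protobuf JSON utilities. The sketch below is illustrative only: it assumes the generated scene schema is importable as `dgp.proto.scene_pb2` with a `Scene` message, and that `scene.json` is a DGP-compliant scene file; adjust module, message, and file names to your setup.

```python
# Hedged sketch: parse a DGP scene JSON back into its protobuf message.
# Assumptions: `dgp.proto.scene_pb2` exposes a `Scene` message, and
# `scene.json` is a DGP-compliant scene file on disk.
from google.protobuf import json_format

from dgp.proto import scene_pb2

with open('scene.json', 'r') as f:
    scene = json_format.Parse(f.read(), scene_pb2.Scene())

print(scene)  # dump the parsed message to verify the schema round-trips
```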
Please see Getting Started for environment setup.
Getting started is as simple as initializing a dataset class with the relevant dataset JSON, raw data sensor names, annotation types, and split information. Below, we show an example of initializing a PyTorch dataset for multi-modal learning from 2D and 3D bounding boxes.
```python
from dgp.datasets import SynchronizedSceneDataset

# Load synchronized pairs of camera and lidar frames, with 2D and 3D
# bounding box annotations.
dataset = SynchronizedSceneDataset(
    '<dataset_name>_v0.0.json',
    datum_names=('camera_01', 'lidar'),
    requested_annotations=('bounding_box_2d', 'bounding_box_3d'),
    split='train',
)
```
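Once constructed, the dataset can be indexed and iterated like any other PyTorch dataset. The short sketch below is illustrative only: the exact structure of each returned sample, and the keys present in each datum, depend on the requested datum names and annotations, so treat the layout assumed here as something to verify against your own data.

```python
# Minimal usage sketch (assumes `dataset` from the snippet above).
# NOTE: the structure of a sample (a collection of per-datum records holding
# images, point clouds, and the requested annotations) is an assumption here;
# inspect one sample to see the exact layout for your dataset.
print(len(dataset))   # number of synchronized samples in the 'train' split

sample = dataset[0]   # one synchronized sample across the requested datums
for datum in sample:
    print(sorted(datum.keys()))
```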
A list of starter scripts is provided in the examples directory.
- examples/load_dataset.py: A simple example script that loads a multi-modal dataset, following the Getting Started section above.
You can build the base docker image and run the tests within the docker container via:
```sh
make docker-build
make docker-run-tests
```
We appreciate all contributions to DGP! To learn more about making a contribution to DGP, please see Contribution Guidelines.
| Job | CI | Notes |
| --- | --- | --- |
| docker-build | Docker build and push to container registry | |
| pre-merge | Pre-merge testing | |
| doc-gen | GitHub Pages doc generation | |
| coverage | Code coverage metrics and badge generation | |
| Type | Platforms |
| --- | --- |
| 🚨 Bug Reports | GitHub Issue Tracker |
| 🎁 Feature Requests | GitHub Issue Tracker |
DGP is developed and currently maintained by Quincy Chen, Arjun Bhargava, Chao Fang, Chris Ochoa, and Kuan-Hui Lee from the ML-Engineering team at Toyota Research Institute (TRI), with contributions from the ML-Research team at TRI, Woven Planet, and Parallel Domain.