Lukas Mueller edited this page Oct 4, 2021 · 8 revisions

Datasets

Introduction to Datasets

A special data type, the Dataset, provides fine-grained definitions of the data to be used in an analysis tool such as solGS, the Heritability tool, or the Stability tool.

User interfaces for Datasets

Datasets are usually generated in the Wizard. The Wizard allows up to 4 dimensions to be selected, and individual items can be selected within each dimension to specify, for example, accessions that have been grown in certain locations and seasons. The Dataset specifies the intersection of all the selected dimensions and items.
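The intersection semantics can be illustrated with a small Python sketch. This is not the Wizard's implementation; the record fields and the `select` helper are hypothetical and exist only to show that every selected dimension must match at the same time.

```python
# Hypothetical field records; the field names are illustrative only.
records = [
    {"accession": "A1", "location": "Ithaca", "year": 2019},
    {"accession": "A1", "location": "Geneva", "year": 2020},
    {"accession": "A2", "location": "Ithaca", "year": 2020},
    {"accession": "A3", "location": "Ithaca", "year": 2019},
]


def select(records, **criteria):
    """Keep records that match every selected dimension (a logical AND).

    Each keyword names a dimension and gives the set of selected items.
    """
    return [
        r for r in records
        if all(r[dim] in items for dim, items in criteria.items())
    ]


# Two dimensions are selected; a record must satisfy both to be included.
subset = select(records, accession={"A1", "A2"}, location={"Ithaca"})
```

Here `subset` holds only the records for accessions A1 and A2 that were grown in Ithaca; the A1 record from Geneva and the A3 record are excluded.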

Database implementation

User-defined datasets are stored in the sgn_people.sp_dataset table. The dataset definition is stored as jsonb data.
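As a rough illustration of what a JSON-encoded dataset definition could look like, the Python sketch below serializes a set of selected dimensions and reads it back. The key names and the layout are assumptions made for this example; the actual structure stored in sgn_people.sp_dataset is not described on this page and may differ.

```python
import json

# Hypothetical dataset definition: one key per dimension, each holding
# the selected items. The real key names and layout may differ.
definition = {
    "accessions": [101, 102],
    "locations": [7],
    "years": [],
}

# Serialize to a JSON string, as would be done before writing to a jsonb column.
stored = json.dumps(definition, sort_keys=True)

# Parsing the stored value yields the same definition again.
restored = json.loads(stored)
```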

Perl Classes

The main Perl class for working with Datasets is CXGN::Dataset.

CXGN::Dataset contains accessors to define the dimensions and the items in each dimension.

The dataset can retrieve any other dimension that corresponds to the selected criteria using the retrieve_ functions. For example, retrieve_years will retrieve all the years that are in the dataset. This works if years have previously been defined as a dimension with specific items (years) in it, but it also works if years have not been selected as a dimension. In that case, the items of the dimension are calculated to match all the selected dimensions. For example, if a list of locations and a list of accessions have been selected, the retrieve_years call will retrieve only the years in which the given accessions were grown in fields at the given locations.
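The derived case can be sketched in Python as follows. This is a toy model and not the CXGN::Dataset API: the `DatasetSketch` class, its `retrieve` method, and the in-memory records are all hypothetical, and the sketch covers only the situation where the requested dimension was not selected.

```python
class DatasetSketch:
    """Toy model of a dataset; not the CXGN::Dataset API."""

    def __init__(self, records, **selected):
        self.records = records
        # Map of dimension name -> set of selected items.
        self.selected = {dim: set(items) for dim, items in selected.items()}

    def retrieve(self, dimension):
        """Return the sorted items of `dimension` that match every selection."""
        matching = [
            r for r in self.records
            if all(r[dim] in items for dim, items in self.selected.items())
        ]
        return sorted({r[dimension] for r in matching})


# Hypothetical field records; the field names are illustrative only.
records = [
    {"accession": "A1", "location": "Ithaca", "year": 2019},
    {"accession": "A1", "location": "Geneva", "year": 2020},
    {"accession": "A2", "location": "Ithaca", "year": 2021},
    {"accession": "A3", "location": "Ithaca", "year": 2018},
]

# Years were not selected, so they are derived from accessions and locations.
ds = DatasetSketch(records, accession=["A1", "A2"], location=["Ithaca"])
years = ds.retrieve("year")
```

In this example `years` contains 2019 and 2021: 2020 is dropped because A1 was grown in Geneva that year, and 2018 is dropped because A3 is not among the selected accessions.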
