GenotypingProtocols

Nicolas Morales edited this page Dec 14, 2020 · 2 revisions

Publication: Morales N, Bauchet GJ, Tantikanjana T, Powell AF, Ellerbrock BJ, Tecle IY, et al. (2020) High density genotype storage for plant breeding in the Chado schema of Breedbase. PLoS ONE 15(11): e0240059. https://doi.org/10.1371/journal.pone.0240059

Breedbase can store high density genotyping data, and the starting point for describing that storage is the genotyping data project. For clarity, the genotyping plate described above is used for managing the 96 or 384 well plates that are sent to the genotyping vendor; once the genotyping data is generated and returned, it can be stored in Breedbase. The genotyping data project is an entry in the project table and groups genotyping data into an easily queried structure. It is defined by a unique name, a description, a year, a genotyping facility, a location, and a breeding program. The year, genotyping facility, and location are stored as entries in the projectprop table in an entity-attribute-value (EAV) model, using type names from the ‘project_property’ controlled vocabulary. The genotyping data project is linked to its breeding program via the project_relationship table using the type name ‘breeding_program_trial_relationship’ from the ‘project_relationship’ controlled vocabulary.
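The layout described in this paragraph can be sketched with an in-memory SQLite database. This is a minimal illustration, not the actual Chado DDL: the tables are reduced to the columns needed here, the property term names `year`, `genotyping_facility`, and `location` are placeholders (the text does not give the real cvterm names), all example values are hypothetical, and the sketch assumes that the breeding program is itself a row in the project table and that the genotyping data project is the subject of the relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the Chado tables named in the text.
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, name TEXT UNIQUE, description TEXT);
CREATE TABLE projectprop (projectprop_id INTEGER PRIMARY KEY, project_id INTEGER,
                          type_id INTEGER, value TEXT);
CREATE TABLE project_relationship (project_relationship_id INTEGER PRIMARY KEY,
                                   subject_project_id INTEGER, object_project_id INTEGER,
                                   type_id INTEGER);
""")

def cvterm_id(cv_name, term_name):
    """Return the id of a term in a controlled vocabulary, creating both if needed."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    cv_id = conn.execute("SELECT cv_id FROM cv WHERE name = ?", (cv_name,)).fetchone()[0]
    row = conn.execute("SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ?",
                       (cv_id, term_name)).fetchone()
    if row:
        return row[0]
    return conn.execute("INSERT INTO cvterm (cv_id, name) VALUES (?, ?)",
                        (cv_id, term_name)).lastrowid

def add_project(name, description):
    """Insert a row into the project table and return its id."""
    return conn.execute("INSERT INTO project (name, description) VALUES (?, ?)",
                        (name, description)).lastrowid

# Hypothetical example data.
program_id = add_project("Example Breeding Program", "hypothetical breeding program")
geno_project_id = add_project("Example Genotyping Project 2020", "hypothetical project")

# EAV rows: one projectprop entry per attribute (placeholder term names).
for term, value in [("year", "2020"),
                    ("genotyping_facility", "Example Facility"),
                    ("location", "Example Location")]:
    conn.execute("INSERT INTO projectprop (project_id, type_id, value) VALUES (?, ?, ?)",
                 (geno_project_id, cvterm_id("project_property", term), value))

# Link the genotyping data project to its breeding program (direction is an assumption).
conn.execute("INSERT INTO project_relationship (subject_project_id, object_project_id, type_id)"
             " VALUES (?, ?, ?)",
             (geno_project_id, program_id,
              cvterm_id("project_relationship", "breeding_program_trial_relationship")))

# Read the properties back by joining through cvterm, as an EAV query would.
props = dict(conn.execute(
    "SELECT cvterm.name, projectprop.value FROM projectprop"
    " JOIN cvterm ON cvterm.cvterm_id = projectprop.type_id"
    " WHERE projectprop.project_id = ?", (geno_project_id,)).fetchall())
linked_program = conn.execute(
    "SELECT project.name FROM project_relationship"
    " JOIN project ON project.project_id = project_relationship.object_project_id"
    " WHERE project_relationship.subject_project_id = ?", (geno_project_id,)).fetchone()[0]
```

Because each attribute is a separate row keyed by a controlled vocabulary term, a new kind of property can be added by inserting a new term rather than altering the table.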

A genotyping protocol must be defined before genotyping data can be stored in Breedbase. To define a genotyping protocol, the researcher provides a unique name, a description, the reference genome name, the species of the samples, and the location where the data was generated. The genotyping protocol encompasses information about all of the markers that were genotyped on a set of samples, including their name, base pair position, chromosome number, reference allele, alternate alleles, quality score, filter information, additional information, and scoring format; these fields are identical to those found in the VCF specification, on which the Breedbase genotyping storage is modeled. The marker information is filled in directly from the uploaded VCF or Intertek genotyping data results file; therefore, the uploader only needs to provide the unique protocol name, description, reference genome name, species of the samples, a location, and the VCF or Intertek formatted genotyping data file.
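To make the marker fields listed above concrete, the following sketch maps the fixed columns of one VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT) onto those fields. The function name, the dictionary keys, and the example line are illustrative assumptions; this is not the Breedbase parser.

```python
def parse_vcf_marker(line):
    """Split one tab-delimited VCF data line into its marker-level fields.

    Sample genotype columns (everything after FORMAT) are ignored here,
    because they describe samples rather than the marker itself.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, marker_id, ref, alt, qual, filt, info = fields[:8]
    # FORMAT is only present when the file carries genotype columns.
    fmt = fields[8] if len(fields) > 8 else ""
    return {
        "name": marker_id,       # ID column
        "chrom": chrom,          # CHROM column
        "pos": int(pos),         # POS column, 1-based base pair position
        "ref": ref,              # reference allele
        "alt": alt.split(","),   # alternate alleles, comma separated in VCF
        "qual": qual,            # quality score, kept as text because "." means missing
        "filter": filt,          # filter information
        "info": info,            # additional information
        "format": fmt,           # scoring format of the genotype columns
    }

# Hypothetical data line with two samples.
example_line = "1\t14370\tS1_14370\tG\tA,T\t29\tPASS\tNS=2;DP=14\tGT:DP\t0/1:8\t1/1:6"
marker = parse_vcf_marker(example_line)
```

Header lines, which begin with `#` in a VCF file, would need to be skipped before calling a function like this.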

The genotyping protocol is stored in the nd_protocol table with a single entry in the nd_protocolprop table via an EAV model. The entry in the nd_protocolprop table uses the type name ‘vcf_map_details’ from the ‘protocol_property’ controlled vocabulary; its value is a JSON encoded string containing all information about the genotyping protocol, including the information about every marker in the protocol. In the next section, this JSON encoded string is described in detail.
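The idea of holding a whole protocol in one JSON encoded property value can be illustrated as follows. The key names and values in this sketch are hypothetical and chosen only for readability; they are not the actual ‘vcf_map_details’ layout, and the dictionary standing in for the nd_protocolprop row is a simplification.

```python
import json

# Hypothetical protocol description; key names are illustrative assumptions.
protocol = {
    "reference_genome_name": "Example_genome_v1",
    "species_name": "Example species",
    "markers": {
        "S1_14370": {
            "name": "S1_14370",
            "chrom": "1",
            "pos": 14370,
            "ref": "G",
            "alt": ["A", "T"],
            "qual": "29",
            "filter": "PASS",
            "info": "NS=2;DP=14",
            "format": "GT:DP",
        },
    },
}

# The whole structure is serialized into a single string value.
prop_value = json.dumps(protocol, sort_keys=True)

# Simplified stand-in for the single nd_protocolprop entry.
nd_protocolprop_row = {"type": "vcf_map_details", "value": prop_value}

# Reading the protocol back is a single decode of that value.
restored = json.loads(nd_protocolprop_row["value"])
```

Keeping the protocol in one value means the marker definitions travel together with the protocol metadata and can be retrieved with a single lookup.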

Once the high density genotyping data is returned from the genotyping vendor in either VCF format or the custom Intertek format, it can be loaded into Breedbase. The researcher need only specify the genotyping data project and the genotyping protocol name under which the genotyping results are to be stored, as described above. How the high density genotyping data is then stored in the database is described in the following section.
