Outputs
Depending on how you configure your CLI execution, you should expect to see these files in the output/final/ folder:
*.A.txt.gz
*.C.txt.gz
*.G.txt.gz
*.T.txt.gz
*.coverage.txt.gz
*.depthTable.txt
**_refAllele.txt
*.rds
*.signac.rds
*.variant_stats.tsv.gz
*.cell_heteroplasmic_df.tsv.gz
*.vmr_strand_plot.png
In order, the *{A,C,G,T}.txt.gz files are formatted as sparse matrices, giving the position, the cell, and then the forward and reverse strand counts of that letter for that cell and position. These files enumerate all of the sequenced alleles for all cells in the mitochondrial DNA and are the minimal units to be used from mgatk. After the mitochondrial genotype of each cell is determined, mgatk calls variants and computes some useful statistics for each variant, which are organized in *.variant_stats.tsv.gz. For variants confidently detected in at least three cells, the heteroplasmic ratio is computed for all cells passing the minimum mean per-base mitochondrial coverage threshold (default 10) and organized in *.cell_heteroplasmic_df.tsv.gz, where rows are cells, columns are variants, and entries are heteroplasmic ratios. The strand correlation of these variants is also plotted against their variance-mean ratio in *.vmr_strand_plot.png, with recommended thresholds presented as dashed lines. These thresholds work well for most datasets, and we recommend considering only variants that pass them for downstream analysis.
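As a rough illustration of the sparse format described above, the following Python sketch loads one per-allele file into a dictionary. It assumes comma-separated rows with no header, in the order position, cell, forward count, reverse count; the delimiter and the function names (read_allele_counts, total_per_cell) are assumptions made for this example, not part of mgatk, so check a few lines of your own output before relying on it.

```python
import csv
import gzip
from collections import defaultdict


def read_allele_counts(path):
    """Read one per-allele file into {(position, cell): (forward, reverse)}.

    Assumption: rows are comma-separated, have no header, and are ordered as
    position, cell, forward count, reverse count.
    """
    counts = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            position, cell, forward, reverse = row[:4]
            counts[(int(position), cell)] = (int(forward), int(reverse))
    return counts


def total_per_cell(counts):
    """Sum forward + reverse counts of this allele for each cell."""
    totals = defaultdict(int)
    for (_position, cell), (forward, reverse) in counts.items():
        totals[cell] += forward + reverse
    return dict(totals)
```

Because the files are sparse, any (position, cell) pair missing from the dictionary should be read as a count of zero.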
For convenience, the tool also emits the mean per-cell depth in the *.depthTable.txt file. This is computed as (total bases accounted for) / (length of mtDNA contig). Additionally, the *.coverage.txt.gz file provides a sparse matrix representation of the per-cell, per-position coverage.
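The depth formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not mgatk's implementation: the function name mean_depth_per_cell, the (position, cell, coverage) record layout, and the toy 10 bp contig are all assumptions made for the example.

```python
def mean_depth_per_cell(coverage_records, contig_length):
    """Return {cell: mean depth} from (position, cell, coverage) records.

    Mean depth = total bases accounted for / length of the mtDNA contig.
    Positions absent from the sparse records contribute zero coverage.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    totals = {}
    for _position, cell, coverage in coverage_records:
        totals[cell] = totals.get(cell, 0) + coverage
    return {cell: total / contig_length for cell, total in totals.items()}


# Toy records for a hypothetical 10 bp contig (not real data).
records = [(1, "cellA", 20), (2, "cellA", 30), (2, "cellB", 5)]
depths = mean_depth_per_cell(records, contig_length=10)
```

Dividing by the full contig length, rather than by the number of covered positions, means that uncovered positions pull the mean down.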
To orient these abundances in the context of potential mutations, the **_refAllele.txt file shows the reference alleles for the contig used in alignment/processing. This file will be independent of your source data and purely a function of the chosen reference.
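To show how the reference allele puts the per-letter counts in context, here is a minimal Python sketch that computes the fraction of reads at one position in one cell that differ from the reference base. It is a simplified illustration and not the heteroplasmy calculation mgatk itself performs for called variants; the function name non_reference_fraction and the input layout are assumptions for this example.

```python
def non_reference_fraction(base_counts, reference_base):
    """Fraction of reads at one position in one cell that differ from the reference.

    base_counts maps each of "A", "C", "G", "T" to a (forward, reverse) count pair.
    Returns None when the position has no coverage in this cell.
    """
    reference_base = reference_base.upper()
    total = sum(fwd + rev for fwd, rev in base_counts.values())
    if total == 0:
        return None
    ref_fwd, ref_rev = base_counts.get(reference_base, (0, 0))
    return (total - (ref_fwd + ref_rev)) / total


# Toy counts (not real data): 15 reads support the reference A, 5 support G.
counts_at_position = {"A": (8, 7), "C": (0, 0), "G": (3, 2), "T": (0, 0)}
fraction = non_reference_fraction(counts_at_position, "A")
```

Returning None for uncovered positions keeps "no data" distinct from a true fraction of zero.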
Finally, two .rds files are automatically emitted that synthesize these files. The *.signac.rds file contains an S3 object that can be rapidly integrated into the Signac R package (see vignettes here: https://satijalab.org/signac/). The other *.rds file is a RangedSummarizedExperiment that similarly summarizes all data in a slightly different S4 object. Either of these can be rapidly integrated into existing scATAC-seq workflows, depending on your analysis method of choice.
Please raise an issue here