Finding subclonal variants
The Seurat/Signac packages provide compatible interactive workflows for mtDNA variant analysis with mtscATAC-seq. Specifically, we recommend these functions:
- ReadMGATK imports files from the mgatk execution and stores them in the Seurat object.
- IdentifyVariants utilizes the strand concordance and VMR statistics over an mtscATAC-seq library to identify high-quality subclonal variants.
- FindClonotypes takes the high-confidence variants from the preceding function to then infer clones via a cell-cell neighbor graph construction in heteroplasmy space.
- AlleleFreq then enables computing allele frequencies per cell/variant.
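To make "allele frequencies per cell/variant" concrete, here is a minimal Python sketch of the underlying arithmetic: the alternate-allele read count divided by the total coverage at that position, per cell. It is not the Signac implementation. The function name, the variant labels, and the toy counts are all made up for illustration, and treating zero-coverage cells as 0.0 is an assumption.

```python
def allele_frequencies(alt_counts, coverage):
    """Return {variant: [allele frequency per cell]}.

    alt_counts and coverage map a variant name to per-cell read counts.
    Cells with zero coverage are reported as 0.0 (an assumption of this sketch).
    """
    freqs = {}
    for variant, counts in alt_counts.items():
        depth = coverage[variant]
        freqs[variant] = [c / d if d > 0 else 0.0 for c, d in zip(counts, depth)]
    return freqs


# Toy data for three cells; the variant names are hypothetical.
alt_counts = {"16189T>C": [12, 0, 30], "3243A>G": [0, 5, 0]}
coverage = {"16189T>C": [40, 50, 60], "3243A>G": [20, 25, 0]}

af = allele_frequencies(alt_counts, coverage)
for variant, per_cell in af.items():
    print(variant, per_cell)
```

The resulting cell-by-variant frequency table is the "heteroplasmy space" in which a neighbor graph can then be built to group cells into clones.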
As of version 0.6.0, automated subclonal variant calling is part of the standard execution of the mgatk tenx mode, which should be the go-to for mtscATAC-seq libraries. Note that this mode of variant calling isn't applicable to scRNA-derived libraries; see the note below.
A plot of strand correlation against variance-mean ratio (VMR) is the most useful view for identifying informative mtDNA variants and is reported in the ".vmr_strand_plot.png" file as part of the default output. Specifically, the x-axis represents the Pearson correlation between a variant's forward and reverse strand read counts across cells. This metric effectively separates low-quality variants from high-quality ones based on the overall concordance of heteroplasmy between strands. Overall, we expect to identify a pattern of substitutions where some variants are more common than others (specifically, transitions rather than transversions). This variant signature plot can be generated rapidly from the ".variant_stats.tsv.gz" and "refAllele.txt" files returned in the mgatk output.
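The two statistics behind that plot can be sketched in a few lines of Python for a single variant. This is an illustration on toy numbers, not the mgatk or Signac code. It assumes the correlation is taken over all cells and that VMR uses the population variance of per-cell heteroplasmy; the real tools may restrict the set of cells or use a different variance estimator.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def vmr(values):
    """Variance-mean ratio, using the population variance (an assumption)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    variance = sum((v - mean) ** 2 for v in values) / n
    return variance / mean


# Toy data for one variant across six cells (all numbers are made up).
fwd = [10, 0, 8, 0, 12, 0]       # alternate-allele reads, forward strand
rev = [9, 0, 7, 1, 11, 0]        # alternate-allele reads, reverse strand
cov = [40, 40, 30, 40, 46, 40]   # total coverage at the position

heteroplasmy = [(f + r) / c for f, r, c in zip(fwd, rev, cov)]
strand_correlation = pearson(fwd, rev)
variant_vmr = vmr(heteroplasmy)
print(round(strand_correlation, 3), round(variant_vmr, 3))
```

A variant supported by both strands in the same cells, as in this toy example, gets a strand correlation close to 1, while an artifact seen on only one strand would score near 0.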
An example of this workflow is provided in the CRC tumor vignette. The vignette contains several sections specific to that dataset, but skipping to the Find mtDNA variants section is the quickest route from a base mgatk execution to high-quality variants.
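As a rough sketch of that last step, selecting high-quality variants amounts to thresholding the per-variant statistics and then checking the substitution classes of what remains. The Python below works on an in-memory table with toy rows. The column names (`n_cells_conf_detected`, `strand_correlation`, `vmr`, `nucleotide`) and the cutoff values are assumptions chosen for illustration, so check them against your own output and tune them for your library.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(nucleotide):
    """Classify a 'REF>ALT' string such as 'T>C' as a transition or transversion."""
    ref, alt = nucleotide.split(">")
    same_family = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_family else "transversion"


def high_confidence(rows, min_cells=5, min_strand_cor=0.65, min_vmr=0.01):
    """Keep rows passing all three cutoffs (the cutoffs are illustrative assumptions)."""
    return [
        r for r in rows
        if r["n_cells_conf_detected"] >= min_cells
        and r["strand_correlation"] >= min_strand_cor
        and r["vmr"] > min_vmr
    ]


# Toy table; variant names and values are made up.
rows = [
    {"variant": "1227G>A", "nucleotide": "G>A", "n_cells_conf_detected": 8,
     "strand_correlation": 0.82, "vmr": 0.05},
    {"variant": "310T>C", "nucleotide": "T>C", "n_cells_conf_detected": 40,
     "strand_correlation": 0.20, "vmr": 0.30},   # fails strand correlation
    {"variant": "9804G>C", "nucleotide": "G>C", "n_cells_conf_detected": 6,
     "strand_correlation": 0.70, "vmr": 0.02},
    {"variant": "5000A>G", "nucleotide": "A>G", "n_cells_conf_detected": 2,
     "strand_correlation": 0.90, "vmr": 0.50},   # detected in too few cells
]

kept = high_confidence(rows)
for r in kept:
    print(r["variant"], substitution_class(r["nucleotide"]))
```

Tallying `substitution_class` over the retained variants gives a quick sanity check on the expected excess of transitions.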
The core function for performing variant calling is available as a source-able R script. You can quickly download this file like so:
wget https://gist.githubusercontent.com/caleblareau/baee9629b9bf4c8ada1a833174ddef3e/raw/7e280c170128404789e0e62a1e1ef0dce1bdb09b/variant_calling.R
Important: this approach doesn't really work with droplet scRNA-seq since only one strand is being sequenced. Thus, the whole philosophy of strand concordance dissolves! I don't know of a good way to find subclonal variants de novo in droplet-based single-cell RNA-seq, and generally would advise against even trying... it's cost me more hours of pain than I care to admit!
Please raise an issue here