Comparing cell groupings between experiments across species

In Single-cell Expression Atlas we're interested in relating cell groupings (clusters, cell types) between experiments and across species boundaries, which we can do via the 'marker' genes of each group. Since each set of marker genes is a product of the context in which it was derived (sorted cell population, sub-tissue, tissue, whole organism), that context must be matched for a comparison of marker gene sets (via ortholog relationships) to be valid. With that in mind, this workflow will:

Take anndata objects from SCXA analysis for two experiments containing comparable 'organism' parts, even if they're not labelled at the same granularity.
Match the organism parts between experiments, using the Uberon ontology to re-label where the granularity of organism part annotation is not consistent.
Subset the experiments to only the common organism parts.
Re-filter, re-normalise, and re-derive marker genes using Scanpy.
Derive mapped pairs of cell groupings between the two experiments.
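
To make the final step concrete, here is a minimal sketch of pairing cell groupings by marker overlap after translating one species' markers through ortholog relationships. It is not the workflow's own scoring method, which this README does not spell out: the Jaccard index, the function names (translate_markers, jaccard, pair_groupings) and the cluster names are all assumptions made for illustration.

```python
from itertools import product


def translate_markers(markers, orthologs):
    """Map species-A marker gene IDs to species-B IDs, dropping genes without an ortholog."""
    translated = set()
    for gene in markers:
        translated.update(orthologs.get(gene, ()))
    return translated


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_groupings(markers_a, markers_b, orthologs, min_score=0.0):
    """Score every (group in A, group in B) pair by marker overlap after ortholog translation.

    Returns (group_a, group_b, score) tuples with score > min_score, best first.
    """
    scored = []
    for (name_a, genes_a), (name_b, genes_b) in product(markers_a.items(), markers_b.items()):
        score = jaccard(translate_markers(genes_a, orthologs), set(genes_b))
        if score > min_score:
            scored.append((name_a, name_b, score))
    return sorted(scored, key=lambda row: row[2], reverse=True)


# Toy data: the gene IDs are real human/mouse ortholog pairs, but the cluster
# names and marker assignments are invented for this example.
orthologs = {
    "ENSG00000198888": {"ENSMUSG00000064341"},
    "ENSG00000198763": {"ENSMUSG00000064345"},
    "ENSG00000198804": {"ENSMUSG00000064351"},
}
human_markers = {
    "human_c1": {"ENSG00000198888", "ENSG00000198763"},
    "human_c2": {"ENSG00000198804"},
}
mouse_markers = {
    "mouse_c1": {"ENSMUSG00000064341", "ENSMUSG00000064345"},
    "mouse_c2": {"ENSMUSG00000064351", "ENSMUSG00000064354"},
}

pairs = pair_groupings(human_markers, mouse_markers, orthologs)
for group_a, group_b, score in pairs:
    print(f"{group_a}\t{group_b}\t{score:.2f}")
```

Storing orthologs as sets lets one-to-many relationships be represented without special handling. Whether such relationships should count fully towards the overlap is a design choice this sketch leaves open.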

Snakemake workflow

The Snakemake workflow in this repository performs the above steps given the following inputs:

Two AnnData files with .project.h5ad extensions, stored in an 'inputs' directory.
The two species names.
A .obo ontology file in the inputs directory.
An ortholog mapping file in the inputs directory.
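
Before launching the workflow it can help to confirm that the inputs directory looks as described. The following is a rough pre-flight sketch, not part of the repository: check_inputs is a hypothetical helper, the file names in the demonstration are made up, and it assumes the directory holds exactly two .project.h5ad files. The ortholog file is not checked because its expected file name is not stated here.

```python
import tempfile
from pathlib import Path


def check_inputs(inputs_dir):
    """Return a list of problems found in the inputs directory (empty if none)."""
    inputs = Path(inputs_dir)
    if not inputs.is_dir():
        return [f"{inputs} is not a directory"]
    problems = []
    h5ad = sorted(inputs.glob("*.project.h5ad"))
    if len(h5ad) != 2:
        problems.append(f"expected 2 .project.h5ad files, found {len(h5ad)}")
    if not list(inputs.glob("*.obo")):
        problems.append("no .obo ontology file found")
    return problems


# Demonstration against a throwaway directory with placeholder (empty) files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("exp1.project.h5ad", "exp2.project.h5ad", "uberon.obo"):
        (Path(tmp) / name).touch()
    demo_problems = check_inputs(tmp)

print(demo_problems or "inputs look complete")
```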

Ortholog mappings can be derived from biomaRt and should look like:

homo_sapiens_gene_id	homo_sapiens_gene_name	mus_musculus_gene_id	mus_musculus_gene_name
ENSG00000198888	MT-ND1	ENSMUSG00000064341	mt-Nd1
ENSG00000198763	MT-ND2	ENSMUSG00000064345	mt-Nd2
ENSG00000198804	MT-CO1	ENSMUSG00000064351	mt-Co1
ENSG00000198712	MT-CO2	ENSMUSG00000064354	mt-Co2
ENSG00000198899	MT-ATP6	ENSMUSG00000064357	mt-Atp6
ENSG00000198938	MT-CO3	ENSMUSG00000064358	mt-Co3
ENSG00000198840	MT-ND3	ENSMUSG00000064360	mt-Nd3
ENSG00000212907	MT-ND4L	ENSMUSG00000065947	mt-Nd4l
ENSG00000198886	MT-ND4	ENSMUSG00000064363	mt-Nd4

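The file must therefore be tab-delimited and contain a 'gene_id' column for each of the two input species, prefixed with the species name (for example 'homo_sapiens_gene_id' and 'mus_musculus_gene_id').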
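
As an illustration of that format, the sketch below reads such a table into a dictionary and fails early if either species' 'gene_id' column is missing. The load_orthologs helper is hypothetical and uses only the Python standard library; the workflow itself may parse the file differently.

```python
import csv
import io


def load_orthologs(handle, species_a, species_b):
    """Read a tab-delimited ortholog table into {species_a gene ID: set of species_b gene IDs}."""
    reader = csv.DictReader(handle, delimiter="\t")
    col_a, col_b = f"{species_a}_gene_id", f"{species_b}_gene_id"
    missing = [c for c in (col_a, col_b) if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")
    mapping = {}
    for row in reader:
        # Skip rows where either gene has no ortholog recorded.
        if row[col_a] and row[col_b]:
            mapping.setdefault(row[col_a], set()).add(row[col_b])
    return mapping


# The first two rows of the example table, held in memory instead of a file.
example = (
    "homo_sapiens_gene_id\thomo_sapiens_gene_name\tmus_musculus_gene_id\tmus_musculus_gene_name\n"
    "ENSG00000198888\tMT-ND1\tENSMUSG00000064341\tmt-Nd1\n"
    "ENSG00000198763\tMT-ND2\tENSMUSG00000064345\tmt-Nd2\n"
)
orthologs = load_orthologs(io.StringIO(example), "homo_sapiens", "mus_musculus")
print(orthologs)
```

For a file on disk, the same function can be given an open handle, e.g. open(path, newline="").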

The Snakemake config file can then be constructed like the example, and the pipeline run like:

snakemake --use-conda --cores 2

