This is the repository for the group project of Team Bloodies.
Project: Data-driven analysis of candidate transcription factors in hematopoietic stem cell differentiation into multiple progenitor compartments.
Links to:
Proposal
Progress Report
Poster
Members and division of labor
Name | Initial work assignment | Affiliation | Expertise |
---|---|---|---|
Annie Cavalla | TF motif enrichment analysis | Bioinformatics | Cancer genomics |
Rawnak Hoque | RNA-seq analysis and TF motif enrichment analysis | Genome Science and Technology | Genome scale data analysis |
Fangwu Wang | DNA methylation analysis, TF clustering | Medical Genetics | Stem cell biology |
Somdeb Paul | DNA methylation analysis | Genome Science and Technology | Transcriptomics |
Rationale: Human hematopoietic stem cells (HSCs) hold great clinical promise for curative HSC transplantation therapies for numerous hematologic malignancies and diseases. Understanding the mechanisms that regulate the self-renewal and lineage restriction of HSCs is crucial for improving transplantation regimens. HSCs are thought to acquire lineage restriction in multiple steps as they pass through successive progenitor populations; during this process the binary myeloid vs. lymphoid decision is made, and subsequent progeny are restricted to either fate. In this project, we are interested in the epigenomic status of HSCs and other progenitor populations, and in how it interacts with transcription factor binding to regulate the lineage differentiation program.
Our dataset includes matched DNA methylation (bisulfite-seq) and RNA-seq data from HSCs and 5 other progenitor cell types, obtained from a recent publication (Farlik M. et al, Cell, 2016) that characterized the differentiation path of HSCs based on the DNA methylation profiles of these cells.
How our strategy differs from the published paper: To more rigorously identify TFs with a potential function in cell differentiation, we annotated DNA methylation using both promoters and custom enhancer regions. The enhancer regions were defined from the Genome Segment ChromHMM tracks (UCSC Table Browser) of two hematopoietic cell lines (K562, GM12878).
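As a rough illustration of how enhancer calls from two cell lines might be combined into one annotation, the sketch below takes the union of overlapping intervals. This is an assumption for illustration only: the text above does not say whether the K562 and GM12878 regions were unioned, intersected, or kept separate, and the coordinates and the `merge_intervals` helper are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals.

    Hypothetical helper: takes the union of enhancer calls, which is an
    assumed strategy, not necessarily the one used in this project.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and overlapping the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up coordinates standing in for ChromHMM enhancer segments.
k562 = [("chr1", 100, 200), ("chr1", 500, 600)]
gm12878 = [("chr1", 150, 250), ("chr2", 10, 50)]

enhancers = merge_intervals(k562 + gm12878)
print(enhancers)
```

In practice a tool such as bedtools would usually handle this step on the exported BED files; the pure-Python version only shows the logic.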
Data replicate summary:
Cell Type | Replicates for Methylation | Replicates for RNA |
---|---|---|
HSC | 3 | 1 |
MPP | 3 | 2 |
MLP | 3 | 2 |
CMP | 3 | 1 |
GMP | 3 | 2 |
CLP | 3 | 1 |
Workflow: We first analyzed differential DNA methylation for 5 pairwise comparisons in the annotated promoter and enhancer regions using RnBeads. The biological meaning of each comparison:
Comparison | Biological Meaning |
---|---|
HSC-MPP | loss of long-term regeneration potential |
MPP-CMP | multipotent to myeloid commitment |
MPP-MLP | multipotent to lymphoid commitment |
CMP-MLP | difference between myeloid and lymphoid at the CMP-MLP level |
GMP-CLP | difference between myeloid and lymphoid at the GMP-CLP level |
We then used the low-methylated regions of each cell type from each comparison (defined by a methylation difference of > 40% in the pairwise comparison) to find enriched transcription factor binding motifs with the HOMER motif-finding tools, and generated a list of data-driven candidate TFs for each population from each comparison.
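The 40% cutoff described above can be sketched as follows. This is a minimal illustration, assuming each region carries a mean methylation level (beta value, 0 to 1) per cell type; the `Region` type, the `split_hypomethylated` function, and the example values are hypothetical and are not taken from the RnBeads output format.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    beta_a: float  # mean methylation (0-1) in cell type A
    beta_b: float  # mean methylation (0-1) in cell type B


def split_hypomethylated(regions, min_diff=0.40):
    """Assign each region to the cell type in which it is low-methylated.

    A region is kept only if the absolute methylation difference between
    the two cell types exceeds min_diff (40% by default).
    """
    low_in_a, low_in_b = [], []
    for region in regions:
        diff = region.beta_a - region.beta_b
        if diff > min_diff:
            low_in_b.append(region.name)  # A is higher, so B is low
        elif diff < -min_diff:
            low_in_a.append(region.name)  # B is higher, so A is low
    return low_in_a, low_in_b


# Made-up values for a CMP (A) vs. MLP (B) comparison.
regions = [
    Region("enh_1", 0.90, 0.30),   # low in MLP
    Region("enh_2", 0.20, 0.75),   # low in CMP
    Region("prom_1", 0.50, 0.35),  # difference too small, dropped
]

low_cmp, low_mlp = split_hypomethylated(regions)
print("low in CMP:", low_cmp)
print("low in MLP:", low_mlp)
```

Each of the two resulting lists would then be exported as a region file and passed to the motif-finding step separately, giving one candidate TF list per population per comparison.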
We intersected the gene lists from the DNA methylation and RNA expression analyses to see whether low methylation correlates with high expression. We then inspected the expression of the TFs identified by motif enrichment to see whether they are highly expressed in the corresponding population. Finally, we used the expression of the TFs identified from the CMP/MLP comparison (representing the myeloid and lymphoid lineages) to cluster leukemia samples and see whether samples of the same leukemia type group together.
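The intersection of the methylation and expression gene lists can be sketched as a simple set operation. The gene names below are placeholders and the `overlap_summary` function is hypothetical; the sketch assumes both lists use the same gene identifier type, so any ID conversion has to happen beforehand.

```python
def overlap_summary(low_meth_genes, high_expr_genes):
    """Return the shared genes and the fraction of low-methylated genes
    that are also highly expressed."""
    low = set(low_meth_genes)
    high = set(high_expr_genes)
    shared = low & high
    fraction = len(shared) / len(low) if low else 0.0
    return sorted(shared), fraction


# Placeholder identifiers, not real results.
low_meth = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
high_expr = ["GENE_B", "GENE_D", "GENE_E"]

shared, fraction = overlap_summary(low_meth, high_expr)
print(shared, fraction)
```

The fraction alone does not establish a correlation; it is only a first look before a proper statistical test.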
RnBeads analysis of pairwise comparison:
a. Beta-value distribution and variation
b. PCA
c. Clustering
d. Differentially methylated regions
e. Correlation with RNA expression
Methods:
f. Data preparation: replicate merging
g. Enhancer annotation-code
h. RnBeads: all samples and pairwise comparison (CLP-GMP as an example)
i. intersection between DNA/RNA gene lists-code
a. Sanity check: sample-sample correlation, heatmap clustering
b. Differentially expressed gene lists
Methods:
c. Data processing and gene id conversion
a. Results
Differentially expressed gene table
Methods:
b. limma
a. TFs found at Enhancer
b. TFs found at Promoter
Methods:
c. Input files
d. HOMER motif-finding tool
a. Normal samples CMP/MLP
b. Leukemia samples AML/CLL
Methods:
c. TF list feeding into expression