Analyzing RNASeq data without replicates #6

rawnakhoque · 2017-02-20T21:43:15Z

@singha53 @santina
Hi,
Here https://github.com/STAT540-UBC/team_Bloodies/tree/master/Data/RNA-seq/Normal/
is the file GSE87195_rnaseq_ensT_all.csv, which contains RNA-seq count data for ~60000 transcripts across 13 samples. Since I do not have replicates, I would like to perform only pairwise comparisons. Do I need to perform any statistical analysis before the comparison? Could you suggest some tools or statistical approaches I could use at this point? I see that many of the cells contain zeros. Should I remove the zeros? Thanks.

ppavlidis · 2017-02-20T22:08:28Z

Barging in here ...

What do you mean by "pairwise comparison"? (Pairs of what? Genes? Samples? Datasets? What kind of comparison?). It might help to put your question in terms of a goal like "find which genes are most variable" or "cluster samples".

Genes with no counts at all in any sample would not be much fun to analyze. Genes that have some samples with zero counts are a different story. In any case, wouldn't you want to do some exploratory data analysis first to determine how/if to filter? Looking at the CSV file is only a good first step. Can you think of some other ways to characterize the data to help you assess your question?
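As a minimal sketch of the kind of first-pass exploration suggested here, the following standard-library Python tallies library sizes and zero counts per sample and separates transcripts that are zero in every sample from those that are zero in only some. The toy CSV, its sample names, and the helper names `read_counts` and `summarize` are illustrative assumptions, not the layout of the actual GSE87195 file.

```python
import csv
import io

# Toy stand-in for the real file: first column is the transcript ID,
# remaining columns are per-sample counts. IDs and sample names are made up.
TOY_CSV = """transcript,HSC_1,MPP_1,CLP_1
tx1,10,0,5
tx2,0,0,0
tx3,0,7,0
tx4,3,2,1
"""

def read_counts(handle):
    """Return (sample_names, {transcript_id: [counts...]})."""
    reader = csv.reader(handle)
    header = next(reader)
    counts = {row[0]: [int(float(x)) for x in row[1:]] for row in reader if row}
    return header[1:], counts

def summarize(samples, counts):
    """Per-sample library size and zero tally, plus transcripts with no reads anywhere."""
    lib_sizes = {s: sum(c[j] for c in counts.values()) for j, s in enumerate(samples)}
    zeros = {s: sum(1 for c in counts.values() if c[j] == 0) for j, s in enumerate(samples)}
    all_zero = [t for t, c in counts.items() if not any(c)]
    return lib_sizes, zeros, all_zero

samples, counts = read_counts(io.StringIO(TOY_CSV))
lib_sizes, zeros, all_zero = summarize(samples, counts)

# Drop only transcripts with zero counts in every sample; partial zeros are kept.
filtered = {t: c for t, c in counts.items() if any(c)}
```

Large differences in library size between samples, or a sample with far more zeros than the others, would be worth looking into before any comparison.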

singha53-zz · 2017-02-20T22:54:44Z

@rawnakhoque Do the 13 samples correspond to 13 cell-types? or are there multiple samples per cell-type? Can you upload the paper associated with this dataset? I am interested in knowing what this dataset was used for. The following paper used LIMMA to compare each cell-type with all other cell-types (http://www.bloodjournal.org/content/115/26/5376) but they had multiple replicates per cell-type.

rawnakhoque · 2017-02-20T23:14:42Z

Hi Paul,
Sorry for the unclear statement. I would like to perform pairwise comparisons of differential gene expression between the differentiated progenitors (the columns in the csv file) in the human "hematopoietic stem cell differentiation" process. The file has read counts at the transcript level. I looked at the manual for DESeq2, and paragraph 1.3.1 mentions that the package needs raw counts of gene i in sample j. I was wondering how I could convert the reads from transcript level to gene level, or whether I need to convert the transcripts to genes at all.

rawnakhoque · 2017-02-20T23:34:52Z

@singha53 The 13 samples contain 7 cell types i.e. there are multiple samples for some cell type. I would like to compare HSC vs MPP, MPP vs CLP, MPP vs CMP, CLP vs MLP, CMP vs GMP, and CMP vs MEP (directly related populations). The paper associated with our analysis is here: https://github.com/STAT540-UBC/team_Bloodies/blob/master/Background%20papers/2016%20DNA%20Methylation%20Dynamics%20of%20Human%20Hematopoietic%20Stem%20Cell%20Differentiation.pdf
I am not sure if I can use LIMMA since I don't have replicates. In the edgeR user guide at paragraph 2.11. they have some statements on data without replicates but I am not quite clear about the procedure.

ppavlidis · 2017-02-21T01:09:04Z

When you say "multiple samples for some cell type" isn't that replication? Glancing at the GEO record it's not exactly clear which samples are true replicates, but naively only CLP and CMP seem to not have replicates at all. You should be able to proceed - I mean, it will generate results, but it's far from ideal. You'll probably get crummy p-values but you'll generate a ranking - maybe it will turn out usable and better than just "fold change".
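For reference, the plain "fold change" baseline mentioned above could be sketched as follows. This is an illustration only, not what DESeq2 or edgeR compute: it assumes counts-per-million scaling and a pseudocount of 1 to avoid dividing by zero, and the function name `rank_by_log2fc` and the toy counts are made up.

```python
import math

def rank_by_log2fc(counts_a, counts_b, pseudocount=1.0):
    """Rank features by |log2 fold change| of B over A after CPM scaling.

    counts_a, counts_b: {feature_id: raw count} for two unreplicated samples.
    The pseudocount is an arbitrary choice for this sketch.
    """
    lib_a = sum(counts_a.values())
    lib_b = sum(counts_b.values())
    ranked = []
    for feature, raw_a in counts_a.items():
        cpm_a = raw_a / lib_a * 1e6 + pseudocount
        cpm_b = counts_b[feature] / lib_b * 1e6 + pseudocount
        ranked.append((feature, math.log2(cpm_b / cpm_a)))
    # Largest absolute change first; no measure of uncertainty is attached.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked

# Toy counts for two samples (hypothetical values).
hsc = {"g1": 100, "g2": 100, "g3": 800}
mpp = {"g1": 400, "g2": 100, "g3": 500}
ranked = rank_by_log2fc(hsc, mpp)
```

A ranking like this ignores how noisy each feature is, which is why a model-based ranking may be preferable even when its p-values cannot be trusted.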

About transcripts vs genes: If you're asking if DESeq2 requires using genes not transcripts, the answer is no, it doesn't (it's just numbers...). Whether you should collapse transcripts to genes depends on what you are trying to do. But if you want to combine them, you'd just sum the counts. The logic being that this is the total number of reads associated with the gene.
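The summing step described above can be sketched in a few lines of Python. The transcript-to-gene mapping `tx2gene`, the IDs, and the function name `collapse_to_genes` are hypothetical; in practice the mapping would come from the annotation used to produce the counts.

```python
def collapse_to_genes(transcript_counts, tx2gene):
    """Sum per-sample transcript counts into per-gene counts.

    transcript_counts: {transcript_id: [count per sample]}
    tx2gene: {transcript_id: gene_id}; transcripts missing from it are skipped.
    """
    gene_counts = {}
    for tx, counts in transcript_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            continue  # no gene assignment in the (assumed) annotation
        if gene in gene_counts:
            gene_counts[gene] = [a + b for a, b in zip(gene_counts[gene], counts)]
        else:
            gene_counts[gene] = list(counts)
    return gene_counts

# Toy example with made-up IDs: two transcripts of GENE1, one of GENE2.
transcript_counts = {
    "ENST_A": [5, 0, 2],
    "ENST_B": [1, 4, 0],
    "ENST_C": [7, 7, 7],
}
tx2gene = {"ENST_A": "GENE1", "ENST_B": "GENE1", "ENST_C": "GENE2"}
gene_counts = collapse_to_genes(transcript_counts, tx2gene)
```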

singha53-zz · 2017-02-21T04:15:36Z

@rawnakhoque The paper groups cell-types based on their progenitor status, i.e. myeloid progenitors (CMP, GMP) vs. lymphoid progenitors (CLP, MLP0, MLP1, MLP2, MLP3) using DESeq2, see Figure 6. The following post addresses how to use DESeq2 with no biological replicates, although it states this should only be used for exploratory purposes:
http://seqanswers.com/forums/showthread.php?t=31036
See the post by Michael Love:
"Working without replicates
DESeq allows analysis of experiments with no biological replicates in one or even both of the conditions. While one may not want to draw strong conclusions from such an analysis, it may still be useful for exploration and hypothesis generation. If replicates are available only for one of the conditions, one might choose to assume that the variance-mean dependence estimated from the data for that condition holds as well for the unreplicated one. If neither condition has replicates, one can still perform an analysis based on the assumption that for most genes, there is no true differential expression, and that a valid mean-variance relationship can be estimated from treating the two samples as if they were replicates. A minority of differentially abundant genes will act as outliers; however, they will not have a severe impact on the gamma-family GLM fit, as the gamma distribution for low values of the shape parameter has a heavy right-hand tail. Some overestimation of the variance may be expected, which will make that approach conservative."

rawnakhoque · 2017-02-21T19:28:37Z

@ppavlidis @singha53
Thanks for your help!

santina · 2017-02-26T07:19:32Z

You can explore the data without replicates, but you can't really make a proper statistical inference about the data. See the DESeq2 vignette, p. 57, section 5.8.

Thank you, Paul and Amrit, for following up on the question.
