Analyzing RNASeq data without replicates #6
Barging in here... What do you mean by "pairwise comparison"? (Pairs of what? Genes? Samples? Datasets? What kind of comparison?) It might help to put your question in terms of a goal, like "find which genes are most variable" or "cluster samples". Genes with no counts at all in any sample would not be much fun to analyze; genes that have zero counts in only some samples are a different story. In any case, wouldn't you want to do some exploratory data analysis first to determine how, or whether, to filter? Looking at the CSV file is a good first step, but only a first step. Can you think of some other ways to characterize the data to help you address your question?
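One quick exploratory check along these lines is to count how many features have zero counts in every sample versus zeros in only some samples. Below is a minimal Python sketch (an illustration only, not DESeq2 code). It assumes, hypothetically, that the CSV has one feature-ID column followed only by numeric count columns; the real GSE87195 file may carry extra annotation columns that would need to be skipped. The demo rows are made up.

```python
import csv
import io


def read_counts(handle):
    """Read a CSV whose first column is a feature ID and whose other columns are counts.

    Assumption: no annotation columns besides the ID column.
    Returns (sample_names, {feature_id: [count per sample]}).
    """
    reader = csv.reader(handle)
    header = next(reader)
    rows = {}
    for record in reader:
        rows[record[0]] = [float(x) for x in record[1:]]
    return header[1:], rows


def summarize_zeros(rows):
    """Split features into those that are zero everywhere and those zero in only some samples."""
    all_zero, some_zero = [], []
    for feature_id, counts in rows.items():
        n_zero = sum(1 for c in counts if c == 0)
        if n_zero == len(counts):
            all_zero.append(feature_id)
        elif n_zero > 0:
            some_zero.append(feature_id)
    return all_zero, some_zero


# Made-up toy input, NOT real data from GSE87195.
demo = io.StringIO(
    "id,s1,s2,s3\n"
    "t1,0,0,0\n"
    "t2,5,0,12\n"
    "t3,8,9,10\n"
)
samples, rows = read_counts(demo)
all_zero, some_zero = summarize_zeros(rows)
print(samples, all_zero, some_zero)  # ['s1', 's2', 's3'] ['t1'] ['t2']

# Dropping only the all-zero features is the uncontroversial filter;
# partially-zero features are kept for further exploration.
drop = set(all_zero)
kept = {k: v for k, v in rows.items() if k not in drop}
```

Whether (and how) to filter the partially-zero features is exactly the judgment call that further exploration should inform, so the sketch leaves them in.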
@rawnakhoque Do the 13 samples correspond to 13 cell types, or are there multiple samples per cell type? Can you upload the paper associated with this dataset? I am interested in knowing what this dataset was used for. The following paper used limma to compare each cell type with all other cell types (http://www.bloodjournal.org/content/115/26/5376), but they had multiple replicates per cell type.
Hi Paul,
@singha53 The 13 samples contain 7 cell types, i.e. there are multiple samples for some cell types. I would like to compare HSC vs MPP, MPP vs CLP, MPP vs CMP, CLP vs MLP, CMP vs GMP, and CMP vs MEP (directly related populations). The paper associated with our analysis is here: https://github.com/STAT540-UBC/team_Bloodies/blob/master/Background%20papers/2016%20DNA%20Methylation%20Dynamics%20of%20Human%20Hematopoietic%20Stem%20Cell%20Differentiation.pdf
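As a rough exploratory baseline for one of these pairs (say HSC vs MPP), genes can be ranked by the log2 fold change between the two cell types' mean counts. This is a minimal Python sketch, not DESeq2's method: it gives a ranking only, with no variance estimate or p-value, and it assumes the counts have already been normalized for library size (not shown). The sample names (`hsc1`, `mpp1`, ...) and the `pseudocount=1.0` default are hypothetical choices for illustration.

```python
import math


def log2_fold_changes(counts, group_a, group_b, pseudocount=1.0):
    """Return {feature: log2((mean_b + pc) / (mean_a + pc))}.

    counts: {feature_id: {sample_id: normalized count}}
    group_a, group_b: sample IDs for the two cell types being compared.
    The pseudocount avoids division by zero and log of zero.
    """
    result = {}
    for feature, by_sample in counts.items():
        mean_a = sum(by_sample[s] for s in group_a) / len(group_a)
        mean_b = sum(by_sample[s] for s in group_b) / len(group_b)
        result[feature] = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return result


def rank_by_abs_lfc(lfc):
    """Order features from largest to smallest absolute log2 fold change."""
    return sorted(lfc, key=lambda f: abs(lfc[f]), reverse=True)


# Made-up toy counts; hypothetical sample IDs.
counts = {
    "g1": {"hsc1": 10, "hsc2": 14, "mpp1": 3},
    "g2": {"hsc1": 5, "hsc2": 5, "mpp1": 5},
    "g3": {"hsc1": 0, "hsc2": 2, "mpp1": 31},
}
lfc = log2_fold_changes(counts, ["hsc1", "hsc2"], ["mpp1"])
print(rank_by_abs_lfc(lfc))  # ['g3', 'g1', 'g2']
```

Note that when one group has a single sample (as with MPP here), nothing in this ranking reflects how variable that cell type is, which is why it is only a starting point.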
When you say "multiple samples for some cell type", isn't that replication? Glancing at the GEO record, it's not exactly clear what is truly replicated, but naively only CLP and CMP seem to not have replicates at all. You should be able to proceed; it will generate results, but it's far from ideal. You'll probably get crummy p-values, but you'll generate a ranking; maybe it will turn out usable and better than just "fold change". About transcripts vs genes: if you're asking whether DESeq2 requires using genes rather than transcripts, the answer is no, it doesn't (it's just numbers...). Whether you should collapse transcripts to genes depends on what you are trying to do. But if you want to combine them, you'd just sum the counts, the logic being that this is the total number of reads associated with the gene.
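Summing transcript counts into gene counts is straightforward once you have a transcript-to-gene mapping. A minimal Python sketch, assuming a hypothetical `tx2gene` dictionary that you would have to build separately (e.g. from an Ensembl annotation matching the `ensT` IDs in the file); the IDs and counts below are made up. Transcripts with no mapping are simply skipped here, which is a design choice you may want to revisit.

```python
def collapse_to_genes(transcript_counts, tx2gene):
    """Sum per-sample transcript counts into per-gene counts.

    transcript_counts: {transcript_id: [count per sample]}
    tx2gene: {transcript_id: gene_id} (hypothetical mapping, built separately)
    Transcripts missing from tx2gene are skipped.
    """
    gene_counts = {}
    for tx, counts in transcript_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            continue  # unmapped transcript: ignored in this sketch
        if gene not in gene_counts:
            gene_counts[gene] = [0] * len(counts)
        # Element-wise sum across samples: total reads assigned to the gene.
        gene_counts[gene] = [a + b for a, b in zip(gene_counts[gene], counts)]
    return gene_counts


# Made-up example: two transcripts of geneA, one of geneB, one unmapped.
transcript_counts = {"tx1": [3, 0], "tx2": [4, 5], "tx3": [1, 1], "tx4": [9, 9]}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
result = collapse_to_genes(transcript_counts, tx2gene)
print(result)  # {'geneA': [7, 5], 'geneB': [1, 1]}
```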
@rawnakhoque The paper groups cell types based on their progenitor status, i.e. myeloid progenitors (CMP, GMP) vs. lymphoid progenitors (CLP, MLP0, MLP1, MLP2, MLP3), using DESeq2; see Figure 6. The following post addresses how to use DESeq2 with no biological replicates:
@ppavlidis @singha53 |
You can explore the data without replicates, but you can't really make a proper statistical inference about the data. See the DESeq2 vignette, p. 57, section 5.8. Thank you, Paul and Amrit, for following up on the question.
@singha53 @santina
Hi,
The file GSE87195_rnaseq_ensT_all.csv in https://github.com/STAT540-UBC/team_Bloodies/tree/master/Data/RNA-seq/Normal/ contains RNA-seq count data for ~60,000 transcripts across 13 samples. Since I do not have replicates, I would like to perform only pairwise comparisons. Do I need to perform any statistical analysis before the comparison? Could you please suggest some tools or statistical approaches I can use at this point? I see that many of the cells contain zeros. Should I get rid of them? Thanks.