What do you mean by coverage? #10
Comments
It might help to refer specifically to a citation/quote, but I think I know what is implied. For methylation microarrays, coverage would (probably) refer to how many CpGs are assayed. There are roughly 28 million CpGs in the human genome, and about 450,000 of those are "covered" by the common Illumina platform (more for the new version). Expressed as a fraction, coverage would be between 0 and 1. For sequencing it means something analogous, but there is "depth" (number of reads per base) as well as "breadth" (in this context, how many CpGs are "detected" with sufficient data to make a call), whereas for microarrays we're only talking about breadth. I hope that helps ...
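To make the depth/breadth distinction concrete, here is a minimal Python sketch. The function names, the toy read counts, and the `min_reads=5` threshold for "sufficient data to make a call" are all assumptions chosen for illustration; real pipelines pick their own threshold.

```python
from statistics import mean


def array_breadth(assayed_cpgs: int, total_cpgs: int) -> float:
    """Microarray coverage: fraction of CpGs that have a probe (0 to 1)."""
    return assayed_cpgs / total_cpgs


def sequencing_coverage(reads_per_cpg: list[int], min_reads: int = 5) -> tuple[float, float]:
    """Return (mean depth, breadth) from per-CpG read counts.

    Depth is the average number of reads per CpG. Breadth is the fraction
    of CpGs with at least `min_reads` reads (an assumed calling threshold).
    """
    depth = mean(reads_per_cpg)
    called = sum(1 for n in reads_per_cpg if n >= min_reads)
    return depth, called / len(reads_per_cpg)


# Toy data: read counts at eight hypothetical CpG sites.
toy_counts = [0, 12, 7, 3, 30, 0, 8, 5]
depth, breadth = sequencing_coverage(toy_counts)
print(f"mean depth = {depth}, breadth = {breadth}")

# Toy array: probes for 3 of 12 CpGs, so only breadth is defined.
print(f"array breadth = {array_breadth(3, 12)}")
```

Note that a sample can have high mean depth and still poor breadth if the reads pile up on a few sites, which is why sequencing papers usually report both.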
Thanks for the information! It clears up my confusion to some extent. For your reference, this is one of the papers I was looking into. In Section 2.3 they mention Gene Coverage, and in Section 2.4 CpG Island coverage. So I assume coverage would mean the number of reads (methylation signal) that the analysis outputs, across an entire gene in the first scenario and across the CG (or CpG) regions in the second.
They're talking about a microarray, so there are no reads. There are probes, and the signal comes from fluorescence of the labeled DNA that is hybridized to the array. (If you are confused about the difference between microarrays and sequencing, you should ask someone to go over this with you.) They do a sequencing-based assay for "validation", but I don't think they use the term coverage there; they're just comparing their beta values to show how great the microarray is (treating sequencing as the gold standard). It can also be confusing because Illumina makes both microarrays and sequencers. Anyway, the definition of coverage they use is the one I gave for microarrays, not the one for sequencing. For a genome feature (here meaning a contiguous span of nucleotides), they mean "the platform has at least one probe for a CpG that lies within that feature". Thus for gene coverage the feature is a gene, and that needs to be defined too, because it's not like genes have little green lights to show you where they "start" and "end". They seem to mean some coordinates taken from RefSeq, but I don't see where they say so clearly. That's a detail, but obviously if they change the definition of "gene", the "coverage" would change. For CpG islands (regions that are relatively rich in CpGs, without clearly defined borders, often but not necessarily near the 5' end of genes), it's the same idea.
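The "at least one probe within the feature" definition can be sketched in a few lines of Python. Everything here is hypothetical: the `Feature` type, the gene names, and the coordinates are made up for illustration and are not taken from the paper or from any annotation database.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A contiguous span of nucleotides, e.g. a gene or a CpG island."""
    name: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def is_covered(feature: Feature, probes: list[tuple[str, int]]) -> bool:
    """True if at least one probe's CpG lies within the feature."""
    return any(
        chrom == feature.chrom and feature.start <= pos <= feature.end
        for chrom, pos in probes
    )


def feature_coverage(features: list[Feature], probes: list[tuple[str, int]]) -> float:
    """Fraction of features that contain at least one probe."""
    return sum(is_covered(f, probes) for f in features) / len(features)


# Hypothetical probe positions as (chromosome, CpG coordinate).
probes = [("chr1", 150), ("chr1", 900), ("chr2", 150)]

genes = [
    Feature("geneA", "chr1", 100, 200),
    Feature("geneB", "chr1", 300, 400),
    Feature("geneC", "chr2", 100, 500),
    Feature("geneD", "chr2", 600, 700),
]
coverage_narrow = feature_coverage(genes, probes)

# Same probes, but geneB is now defined with wider boundaries.
genes_wide = [Feature("geneB", "chr1", 300, 950) if g.name == "geneB" else g for g in genes]
coverage_wide = feature_coverage(genes_wide, probes)

print(coverage_narrow, coverage_wide)
```

The second call shows the point about definitions: the probes did not change, but widening one gene's boundaries raises the reported "gene coverage".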
@singha53
I am doing DNA methylation analysis on my dataset. Many sources mention coverage, but I can't quite grasp what coverage means for DNA methylation data. I tried looking through the literature but could not find a clear definition.
Thanks!