What do you mean by coverage? #10

Open
psomdeb25 opened this issue Mar 15, 2017 · 3 comments

Comments

@psomdeb25
Collaborator

@singha53

I am doing DNA methylation analysis on my dataset. A lot of sources mention coverage, but I can't quite grasp what coverage means for DNA methylation data. I tried looking through the literature but couldn't find a clear definition.

Thanks!

@ppavlidis

It might help to refer specifically to a citation/quote. But I think I know what is implied.

For methylation microarrays, coverage would (probably) refer to how many CpGs are assayed. There are roughly 28 million CpGs in the human genome. About 450,000 of those are "covered" by the common Illumina platform (more, about 850,000, on the newer EPIC array). Expressed as a fraction of all CpGs, coverage would be between 0 and 1.
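
To make the "fraction between 0 and 1" concrete, here is a minimal sketch. The function name and the two constants are made up for illustration, and the counts are rough, rounded figures (published estimates of the genome-wide CpG total are in the tens of millions and depend on the genome build), so treat the result as an order of magnitude only.

```python
def breadth_fraction(n_assayed: int, n_total: int) -> float:
    """Fraction of all CpGs that a platform assays (its breadth of coverage)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_assayed <= n_total:
        raise ValueError("n_assayed must lie between 0 and n_total")
    return n_assayed / n_total


# Assumed, rounded figures for illustration; substitute your own totals.
APPROX_GENOME_CPGS = 28_000_000
APPROX_450K_PROBES = 450_000

fraction = breadth_fraction(APPROX_450K_PROBES, APPROX_GENOME_CPGS)
print(f"{fraction:.3f}")  # a small fraction, under 2% of all CpGs
```

So even a "high-density" array assays only a small slice of the CpGs in the genome.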

For sequencing it means something analogous, but there's "depth" (number of reads per base) as well as "breadth" (in this context, how many CpGs are "detected" with sufficient data to make a call), whereas for the microarrays we're only talking about breadth.
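
As a rough sketch of the depth/breadth distinction for sequencing, assuming you already have a per-CpG read count: the input layout, the function name, and the `min_reads=10` cutoff are all hypothetical choices for illustration, not a standard from any particular pipeline.

```python
def depth_and_breadth(read_counts, total_cpgs, min_reads=10):
    """Return (mean depth, breadth) for a set of CpGs.

    read_counts: dict mapping a CpG identifier to its number of reads;
                 CpGs missing from the dict are treated as having 0 reads.
    total_cpgs:  number of CpGs that could have been observed.
    min_reads:   assumed threshold for "enough data to make a call".
    """
    if total_cpgs <= 0:
        raise ValueError("total_cpgs must be positive")
    # Depth: average number of reads per CpG, counting unseen CpGs as zero.
    mean_depth = sum(read_counts.values()) / total_cpgs
    # Breadth: fraction of CpGs with at least min_reads reads.
    n_called = sum(1 for n in read_counts.values() if n >= min_reads)
    return mean_depth, n_called / total_cpgs


# Toy data: five CpGs in total, one of which was never sequenced.
toy_counts = {"cpg1": 30, "cpg2": 12, "cpg3": 4, "cpg4": 0}
mean_depth, breadth = depth_and_breadth(toy_counts, total_cpgs=5)
print(mean_depth, breadth)
```

In the toy data only two of the five CpGs reach the cutoff, so breadth is low even though one site is sequenced deeply. A microarray has no equivalent of depth, only the breadth part.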

I hope that helps ...

@psomdeb25
Collaborator Author

Thanks for the information! It clears up my confusion to some extent.

For your reference, this is one of the papers which I was looking into.
High_Density_DNA-Meth_Array.pdf

In section 2.3 they mention gene coverage, and in section 2.4 CpG island coverage. So I assume it means the number of reads (methylation signal) that the analysis outputs: across an entire gene in the first case, and across the CG (or CpG) regions in the second.

@ppavlidis

They're talking about a microarray, so there are no reads. There are probes, and the signal comes from the fluorescence of the labeled DNA that is hybridized to the array. (If you are confused about the difference between microarrays and sequencing you should ask someone to go over this with you.)

They do a sequencing-based assay for "validation" but I don't think they use the term coverage there - they're just comparing their beta values to show how great the microarray is (treating sequencing as the gold standard). It can also be confusing because Illumina makes both microarrays and sequencers.

Anyway, the definition of coverage that applies here is the one I gave for microarrays, not the one for sequencing.

For a genome feature (here meaning a contiguous span of nucleotides), they mean "the platform has at least one probe for a CpG that lies within that feature".

Thus for gene coverage, the feature is a gene. That needs to be defined too, because it's not like genes have little green lights to show you where they "start" and "end". They seem to mean some coordinates taken from RefSeq, but I don't see where they say so clearly. That's a detail, but obviously if they change the definition of "gene", the "coverage" would change.

For CpG islands (regions that are relatively rich in CpGs, without clearly defined borders, often but not necessarily near the 5' end of genes), same idea.
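
The "at least one probe within the feature" definition can be sketched as below. This is only an illustration of the idea, not the paper's actual procedure; the `Feature` type, the function name, and all coordinates are invented toy values.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def covered_features(features, probes):
    """Names of features that contain at least one probed CpG.

    probes: iterable of (chrom, position) pairs, one per probed CpG.
    """
    positions_by_chrom = {}
    for chrom, pos in probes:
        positions_by_chrom.setdefault(chrom, []).append(pos)
    covered = set()
    for f in features:
        positions = positions_by_chrom.get(f.chrom, [])
        if any(f.start <= pos <= f.end for pos in positions):
            covered.add(f.name)
    return covered


# Toy probe positions and gene definitions (hypothetical coordinates).
probes = [("chr1", 150), ("chr1", 300), ("chr2", 900)]
genes = [
    Feature("geneA", "chr1", 100, 500),
    Feature("geneB", "chr1", 1000, 2000),
    Feature("geneC", "chr2", 100, 500),
]
narrow = covered_features(genes, probes)

# Same probes, but geneC is now defined with a wider span.
wider_genes = genes[:2] + [Feature("geneC", "chr2", 100, 1000)]
wide = covered_features(wider_genes, probes)

print(len(narrow) / len(genes), len(wide) / len(wider_genes))
```

With the narrow definition only geneA is covered; widening geneC's boundaries makes it covered too, with no change to the array. That is why the feature definition matters for the reported number. The same function works for CpG islands if you pass island coordinates instead of genes.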
