Not an issue, but I am confused ... #6

DRL · 2017-08-01T16:30:22Z

Hi AndreasHeger,

Problem:

I want to calculate whether certain annotation features (genes, repeats, etc.) are enriched or depleted in a particular subset of contigs in an assembly.

--workspace: BED file of all regions in genome (excluding regions composed of N's)
--segments: BED file of annotations in subset of contigs

contig_1001    21      792     RepeatMasker
contig_1001    27      34      dust
contig_1001    93      159     dust
contig_1001    246     255     dust
contig_1001    266     339     dust
contig_1001    415     422     dust

--annotations: BED file of annotations across the whole genome (same format as above, but for the whole genome)

The output I get when running:

gat-run.py --ignore-segment-tracks --segments=segments.bed --annotations=annotations.bed --workspace=workspace.bed --num-samples=100 --log=gat.log --num-threads=8 > gat.out

is

track   annotation        observed  expected      CI95low       CI95high      stddev     fold    l2fold  pvalue      qvalue      track_nsegments  track_size  track_density  annotation_nsegments  annotation_size  annotation_density  overlap_nsegments  overlap_size  overlap_density  percent_overlap_nsegments_track  percent_overlap_size_track  percent_overlap_nsegments_annotation  percent_overlap_size_annotation
merged  ncrnas_predicted  2913      1709.1200     1300.0000     1994.0000     209.0009   1.7040  0.7689  1.0000e-02  1.0000e-02  62983            6935174     6.6911e+00     1025                  163283           1.5754e-01          30                 2913          2.8105e-03       0.0476                           0.0420                      2.9268                                1.7840
merged  gene              389744    170648.2000   163172.0000   177856.0000   5359.9760  2.2839  1.1915  1.0000e-02  1.0000e-02  62983            6935174     6.6911e+00     18574                 37934616         3.6599e+01          278                389744        3.7603e-01       0.4414                           5.6198                      1.4967                                1.0274
merged  tandem            368130    158513.4400   154952.0000   162625.0000   2399.6840  2.3224  1.2156  1.0000e-02  1.0000e-02  62983            6935174     6.6911e+00     47134                 4562430          4.4018e+00          4994               368130        3.5517e-01       7.9291                           5.3082                      10.5953                               8.0687
merged  RepeatMasker      1492404   610641.4800   602042.0000   620429.0000   6353.3404  2.4440  1.2892  1.0000e-02  1.0000e-02  62983            6935174     6.6911e+00     117147                21502336         2.0745e+01          8705               1492404       1.4399e+00       13.8212                          21.5193                     7.4308                                6.9407
merged  dust              3200967   1182955.4000  1172992.0000  1190872.0000  4343.2429  2.7059  1.4361  1.0000e-02  1.0000e-02  62983            6935174     6.6911e+00     382880                14706492         1.4189e+01          63463              3200967       3.0883e+00       100.7621                         46.1555                     16.5752                               21.7657

I am confused:

Shouldn't percent_overlap_size_track and the related percent_overlap_* columns be 100% for all annotations?

Thank you in advance.

cheers,

dom


AndreasHeger · 2017-08-02T22:18:41Z

Good question. From memory, I think percent_overlap_size_track is the proportion of nucleotides in 'segments' that overlap annotations within the workspace.

It might well be a bug. Are your segments non-overlapping?
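A quick way to check for overlaps, sketched here on the six example segment rows from the question: merge the intervals per contig and compare the total size before and after. The helper names (`merge_intervals`, `total_size`) are made up for illustration and are not part of GAT; coordinates are treated as half-open BED intervals.

```python
# The six example rows from the question (BED: contig, start, end, name).
SEGMENTS = [
    ("contig_1001", 21, 792, "RepeatMasker"),
    ("contig_1001", 27, 34, "dust"),
    ("contig_1001", 93, 159, "dust"),
    ("contig_1001", 246, 255, "dust"),
    ("contig_1001", 266, 339, "dust"),
    ("contig_1001", 415, 422, "dust"),
]


def merge_intervals(rows):
    """Merge overlapping or touching intervals per contig (hypothetical helper)."""
    merged = []
    for contig, start, end, *_ in sorted(rows):
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            # Same contig and the interval starts inside (or at the end of) the previous one.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([contig, start, end])
    return [tuple(m) for m in merged]


def total_size(rows):
    """Sum of interval lengths in bases (end - start for half-open BED)."""
    return sum(end - start for _, start, end, *_ in rows)


merged = merge_intervals(SEGMENTS)
print("raw size:   ", total_size(SEGMENTS))
print("merged size:", total_size(merged))
print("overlapping:", total_size(merged) < total_size(SEGMENTS))
```

If the merged size is smaller than the raw size, some bases are covered by more than one segment. In the example rows every dust interval lies inside the RepeatMasker interval, so these segments do overlap.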

There is also the --ignore-segment-tracks option, which merges all the segments into a single track. The 46% might mean that 46% of the segment nucleotides are in dust annotations, though I would then expect the totals to add up to 100%. I need to go through the code to remember what it does.
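As a sanity check on the numbers (not a statement about what the GAT code does internally), the percent_overlap_* values in the table can be reproduced as simple ratios of the other columns: overlap divided by the track total, or by the annotation total, times 100. The sketch below uses values copied from the dust and gene rows; the function name `recompute` is made up for illustration.

```python
# Values copied from the "dust" and "gene" rows of the output table in the question.
TABLE = {
    "dust": dict(track_nsegments=62983, track_size=6935174,
                 annotation_nsegments=382880, annotation_size=14706492,
                 overlap_nsegments=63463, overlap_size=3200967),
    "gene": dict(track_nsegments=62983, track_size=6935174,
                 annotation_nsegments=18574, annotation_size=37934616,
                 overlap_nsegments=278, overlap_size=389744),
}


def recompute(row):
    """Recompute the four percent_overlap_* columns as plain ratios (assumed formula)."""
    return {
        "percent_overlap_nsegments_track": 100.0 * row["overlap_nsegments"] / row["track_nsegments"],
        "percent_overlap_size_track": 100.0 * row["overlap_size"] / row["track_size"],
        "percent_overlap_nsegments_annotation": 100.0 * row["overlap_nsegments"] / row["annotation_nsegments"],
        "percent_overlap_size_annotation": 100.0 * row["overlap_size"] / row["annotation_size"],
    }


for name, row in TABLE.items():
    for column, value in recompute(row).items():
        print(f"{name:5s} {column:38s} {value:.4f}")
```

With these ratios, percent_overlap_size_track for dust is overlap_size / track_size = 3200967 / 6935174, which matches the reported 46.1555. Read this way, the column is the share of segment bases covered by that one annotation, so it would only reach 100% if a single annotation covered every segment base.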
