Due to the extreme demographic history of Magnetic Island we found that
it was not possible to distinguish selective sweeps from demographic
effects in that location. Consequently our interpretation of SweepFinder
results is restricted to Northern reefs. We examined loci with
significant sweep scores in two ways. Firstly we looked at the entire
Northern population as a whole for which sweeps could be interpreted as
being due to adaptations required across all inshore sites. Secondly we
looked at the difference between Marine (Fitzroy Island, Pelorus Island)
and Plume (Pandora Reef, Dunk Island) sites. Sweeps designated as
Marine Only
or Plume Only
were identified using bedtools (see
11_marine_vs_plume.sh ). This
produced the following files;
nomi_10_sweeps.gff
containing contiguous regions with SweepFinder scores > 10 using pooled allele frequencies across all northern reefs.marine_only.gff
containing contiguous regions with SweepFinder scores > 10 that were in marine sites and not in plumeplume_only.gff
containing contiguous regions with SweepFinder scores > 10 that were in plume sites and not in marine
Using the best fitting dadi model (isolation_asym_mig
) as neutral
background a CLR threshold of 100 gives an FDR of approximately 10%. The
Manhattan plot below shows that these sites are distributed across the
genome.
Genes associated with these significant loci were identified using
bedtools
window.
This reports all overlaps between sweep loci (encoded as
nomi_10_sweeps.gff
and gene models). (see
07_genes_in_sweeps.sh for details).
A hand annotated version of this table is included as supplementary
information with the paper.
Gene Ontology annotations were obtained for these genes through GO terms assigned to conserved domains (via Interproscan) and the results were used to search for terms that might be enriched in the sweep set compared to background. GO term enrichment analysis was done using the R package topGO version 2.36.0 (Alexa, Rahnenführer, and Lengauer 2006) using genes associated with sweeps (scores > 100) as the target set and all other annotated genes as background. topGO uses a weighting scheme (we used the weight01 scheme) to downweight genes that are also attached to related terms in the GO graph. Significance testing was performed using Fisher’s exact test based on weighted gene counts. As outlined in the topGO manual) there is no clear way to apply a formal multiple-testing corrections for this p-value.
A single GO term, GO:0005509 calcium ion binding
was significantly
enriched among sweep genes. This term was associated with SOMPs as well
as EGF domain containing genes, both of which were abundant in the
target
set.
GO.ID | Term | Annotated | Significant | Expected | classic | ontology |
---|---|---|---|---|---|---|
GO:0005509 | calcium ion binding | 667 | 9 | 1.77 | 4.5e-05 | MF |
Genes annotated with the GO term GO:0005509 calcium ion binding
.
geneid | CLR | UniprotID | Protein Name |
---|---|---|---|
aten_0.1.m1.7647 | 436.0 | FBN2_HUMAN | Fibrillin-2 [Cleaved into: Fibrillin-2 C-terminal peptide] |
aten_0.1.m1.7648 | 412.5 | FBN1_MOUSE | Fibrillin-1 [Cleaved into: Asprosin] |
aten_0.1.m1.9801 | 188.4 | USOM5_ACRMI | Uncharacterized skeletal organic matrix protein 5 (Uncharacterized SOMP-5) |
aten_0.1.m1.4638 | 186.6 | SVEP1_MOUSE | Sushi, von Willebrand factor type A, EGF and pentraxin domain-containing protein 1 (Polydom) |
aten_0.1.m1.18937 | 157.8 | HMCN2_HUMAN | Hemicentin-2 |
aten_0.1.m1.14842 | 156.0 | EGF_RAT | Pro-epidermal growth factor (EGF) [Cleaved into: Epidermal growth factor] |
aten_0.1.m1.31478 | 115.4 | LRP4_HUMAN | Low-density lipoprotein receptor-related protein 4 (LRP-4) (Multiple epidermal growth factor-like domains 7) |
aten_0.1.m1.29866 | 106.7 | FBN1_HUMAN | Fibrillin-1 [Cleaved into: Asprosin] |
aten_0.1.m1.14286 | 103.3 | FRPC_NEIMB | Iron-regulated protein FrpC |
Sweep loci unique to either Marine or Plume were used to extract overlapping genes. A list of these genes is shown below.
score | geneid | wq | UniprotID | protein_name | CLR |
---|---|---|---|---|---|
332.2 | aten_0.1.m1.6387 | marine | CNG3_CHICK | Cyclic nucleotide-gated channel rod photoreceptor subunit alpha (CNG channel 3) (CNG-3) (CNG3) | 332.2 |
304.0 | aten_0.1.m1.27260;aten_0.1.m1.27263;aten_0.1.m1.27264 | marine | GRCR2_DROME;PLI2A_ARATH;PTPC1_DANRE | Glutaredoxin domain-containing cysteine-rich protein CG12206;LIM domain-containing protein PLIM2a (Pollen-expressed LIM protein 2) (AtPLIM2);Protein tyrosine phosphatase domain-containing protein 1 (EC 3.1.3.-) | 304.0 |
129.3 | aten_0.1.m1.27261;aten_0.1.m1.27266 | marine | NUD24_ARATH | Nudix hydrolase 24, chloroplastic (AtNUDT24) (EC 3.6.1.-) | 129.3 |
81.6 | aten_0.1.m1.30070 | marine | GPN2_MOUSE | GPN-loop GTPase 2 (ATP-binding domain 1 family member B) | 81.6 |
79.8 | aten_0.1.m1.24850;aten_0.1.m1.24859;aten_0.1.m1.24860 | marine | PIPNB_RAT;DAF36_CAEEL;SPCS2_DANRE | Phosphatidylinositol transfer protein beta isoform (PI-TP-beta) (PtdIns transfer protein beta) (PtdInsTP beta);Cholesterol 7-desaturase (EC 1.14.19.21) (Cholesterol desaturase daf-36);Probable signal peptidase complex subunit 2 (EC 3.4.-.-) (Microsomal signal peptidase 25 kDa subunit) (SPase 25 kDa subunit) | 79.8 |
77.1 | aten_0.1.m1.27227 | marine | TENX_HUMAN | Tenascin-X (TN-X) (Hexabrachion-like protein) | 77.1 |
71.2 | aten_0.1.m1.31614 | marine | 71.2 | ||
66.9 | aten_0.1.m1.31548;aten_0.1.m1.31552;aten_0.1.m1.31553 | plume | LTXA_AGGAC;THAP2_MOUSE | Leukotoxin (Lkt);THAP domain-containing protein 2 | 66.9 |
60.6 | aten_0.1.m1.31611 | marine | PARK7_CHICK | Protein/nucleic acid deglycase DJ-1 (EC 3.1.2.-) (EC 3.5.1.-) (EC 3.5.1.124) (Maillard deglycase) (Parkinson disease protein 7 homolog) (Parkinsonism-associated deglycase) (Protein DJ-1) (DJ-1) | 60.6 |
60.5 | aten_0.1.m1.6404 | marine | 60.5 | ||
59.7 | aten_0.1.m1.36632 | plume | MLP_ACRMI | Mucin-like protein (Fragment) | 59.7 |
56.3 | aten_0.1.m1.2697 | marine | MUC24_RAT | Sialomucin core protein 24 (MUC-24) (Endolyn) (Multi-glycosylated core protein 24) (MGC-24) (MGC-24v) | 56.3 |
53.4 | aten_0.1.m1.3491;aten_0.1.m1.3498 | plume | EDC4_XENLA;LYS1_SCHPO | Enhancer of mRNA-decapping protein 4;Saccharopine dehydrogenase [NAD(+), L-lysine-forming] (SDH) (EC 1.5.1.7) (Lysine–2-oxoglutarate reductase) | 53.4 |
52.8 | aten_0.1.m1.29942 | plume | NLRC3_HUMAN | NLR family CARD domain-containing protein 3 (CARD15-like protein) (Caterpiller protein 16.2) (CLR16.2) (NACHT, LRR and CARD domains-containing protein 3) (Nucleotide-binding oligomerization domain protein 3) | 52.8 |
52.1 | aten_0.1.m1.5129 | plume | HEX_VIBVL | Beta-hexosaminidase (EC 3.2.1.52) (Beta-N-acetylhexosaminidase) (Chitobiase) (N-acetyl-beta-glucosaminidase) | 52.1 |
Alexa, Adrian, Jörg Rahnenführer, and Thomas Lengauer. 2006. “Improved Scoring of Functional Groups from Gene Expression Data by Decorrelating GO Graph Structure.” Bioinformatics 22 (13): 1600–1607.