Interpretation of SweepFinder results

Due to the extreme demographic history of Magnetic Island, we found that it was not possible to distinguish selective sweeps from demographic effects at that location. Consequently, our interpretation of SweepFinder results is restricted to the Northern reefs. We examined loci with significant sweep scores in two ways. First, we treated the Northern population as a whole, in which case sweeps can be interpreted as adaptations required across all inshore sites. Second, we looked at the difference between Marine (Fitzroy Island, Pelorus Island) and Plume (Pandora Reef, Dunk Island) sites. Sweeps designated as Marine Only or Plume Only were identified using bedtools (see 11_marine_vs_plume.sh). This produced the following files:

  • nomi_10_sweeps.gff containing contiguous regions with SweepFinder scores > 10 using pooled allele frequencies across all northern reefs.
  • marine_only.gff containing contiguous regions with SweepFinder scores > 10 that were found in marine sites but not in plume sites.
  • plume_only.gff containing contiguous regions with SweepFinder scores > 10 that were found in plume sites but not in marine sites.
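
The Marine Only and Plume Only sets can be thought of as interval filtering: keep a region from one set only if nothing in the other set overlaps it. The sketch below illustrates that idea in Python with made-up coordinates. It mirrors the behaviour of `bedtools intersect -v`, but whether 11_marine_vs_plume.sh uses that exact mode is an assumption, and the function names and scaffold names are hypothetical.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if half-open intervals (start, end) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def unique_to(first, second):
    """Return regions of `first` that overlap no region of `second`.

    Each region is a (scaffold, start, end) tuple. This is analogous to
    `bedtools intersect -v`; that the real script works this way is an
    assumption made for illustration.
    """
    by_scaffold = defaultdict(list)
    for scaffold, start, end in second:
        by_scaffold[scaffold].append((start, end))
    return [
        (scaffold, start, end)
        for scaffold, start, end in first
        if not any(overlaps((start, end), other) for other in by_scaffold[scaffold])
    ]


# Hypothetical sweep regions, not taken from the real data
marine = [("Sc1", 100, 200), ("Sc1", 500, 600), ("Sc2", 50, 80)]
plume = [("Sc1", 150, 250), ("Sc3", 10, 20)]

marine_only = unique_to(marine, plume)
plume_only = unique_to(plume, marine)
print(marine_only)
print(plume_only)
```

A region shared by both sets (here the overlapping pair on Sc1) drops out of both outputs, which is why such loci are interpreted only in the pooled Northern analysis.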

Using the best-fitting dadi model (isolation_asym_mig) as the neutral background, a CLR threshold of 100 gives an FDR of approximately 10%. The Manhattan plot below shows that loci exceeding this threshold are distributed across the genome.
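
One common way to turn neutral simulations into an FDR for a score threshold is to divide the number of false positives expected under neutrality by the number of loci observed above the threshold. The sketch below shows that calculation on toy numbers. It is an assumption that the FDR quoted above was derived in this way, and the function name and CLR values are hypothetical.

```python
def empirical_fdr(neutral_clr, observed_clr, threshold):
    """Estimate FDR as expected false positives / observed positives.

    neutral_clr  : CLR values from simulations under the neutral model
    observed_clr : CLR values from the real data
    """
    # Fraction of neutral loci that exceed the threshold by chance
    false_rate = sum(c > threshold for c in neutral_clr) / len(neutral_clr)
    n_positive = sum(c > threshold for c in observed_clr)
    if n_positive == 0:
        return float("nan")  # no calls, so the FDR is undefined
    # Expected false positives among the observed loci, capped at 1
    return min(1.0, false_rate * len(observed_clr) / n_positive)


# Toy values for illustration only, not real SweepFinder output
neutral = [5, 12, 30, 48, 75, 90, 99, 20, 150, 60]
observed = [8, 40, 120, 300, 110, 95, 15, 250, 180, 70]

fdr = empirical_fdr(neutral, observed, threshold=100)
print(round(fdr, 3))
```

Raising the threshold lowers the neutral exceedance rate faster than it removes real signals, which is the trade-off behind choosing a cutoff such as 100.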

Genes associated with these significant loci were identified using bedtools window, which reports all overlaps between sweep loci (encoded in nomi_10_sweeps.gff) and gene models (see 07_genes_in_sweeps.sh for details). A hand-annotated version of the resulting table is included as supplementary information with the paper.
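
To make the gene assignment step concrete, the sketch below pairs each sweep region with gene models that fall within a fixed distance of it, which is the general idea behind `bedtools window`. The window size of 1000 bp, the function name, and all coordinates and gene names are assumptions for illustration; the value actually used is set in 07_genes_in_sweeps.sh.

```python
def genes_near_sweeps(sweeps, genes, window=1000):
    """Pair each sweep with gene models lying within `window` bp of it.

    sweeps : list of (scaffold, start, end, score)
    genes  : list of (scaffold, start, end, gene_id)
    The window size here is a hypothetical choice.
    """
    hits = []
    for s_scaf, s_start, s_end, score in sweeps:
        # Extend the sweep region on both sides, clamping at zero
        lo, hi = max(0, s_start - window), s_end + window
        for g_scaf, g_start, g_end, gene_id in genes:
            if g_scaf == s_scaf and g_start < hi and lo < g_end:
                hits.append((gene_id, score))
    return hits


# Hypothetical inputs, not taken from the real annotation
sweeps = [("Sc1", 5000, 6000, 120.0)]
genes = [
    ("Sc1", 6500, 7000, "geneA"),  # 500 bp downstream: inside the window
    ("Sc1", 9000, 9500, "geneB"),  # 3 kb downstream: outside the window
    ("Sc2", 5000, 6000, "geneC"),  # same coordinates, different scaffold
]

hits = genes_near_sweeps(sweeps, genes)
print(hits)
```

Because several genes can fall within one window, a single sweep can map to multiple gene identifiers, as seen in the semicolon-separated entries of the Marine vs Plume table.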

Gene Ontology annotations were obtained for these genes through GO terms assigned to conserved domains (via Interproscan), and the results were used to search for terms that might be enriched in the sweep set compared to background. GO term enrichment analysis was done using the R package topGO version 2.36.0 (Alexa, Rahnenführer, and Lengauer 2006), with genes associated with sweeps (scores > 100) as the target set and all other annotated genes as background. topGO uses a weighting scheme (we used the weight01 scheme) to downweight genes that are also attached to related terms in the GO graph. Significance testing was performed using Fisher’s exact test based on weighted gene counts. As outlined in the topGO manual, there is no clear way to apply a formal multiple-testing correction to these p-values.
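
For readers unfamiliar with the test, the sketch below computes a plain one-sided Fisher's exact p-value and the expected number of target genes carrying a term, using only the Python standard library. It is the unweighted version of the test and does not reproduce topGO's weight01 adjustment; the function names and the toy counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(n_total, n_annotated, n_target, n_hit):
    """One-sided Fisher's exact p-value for over-representation.

    n_total     : all annotated genes (target plus background)
    n_annotated : genes carrying the GO term
    n_target    : genes in the target (sweep) set
    n_hit       : target genes carrying the GO term
    Returns P(X >= n_hit) under the hypergeometric distribution.
    """
    upper = min(n_annotated, n_target)
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_target - k)
        for k in range(n_hit, upper + 1)
    )
    return tail / comb(n_total, n_target)


def expected_hits(n_total, n_annotated, n_target):
    """Number of target genes expected to carry the term by chance."""
    return n_annotated * n_target / n_total


# Toy counts for illustration only
p_value = fisher_enrichment_p(n_total=20, n_annotated=5, n_target=4, n_hit=3)
expected = expected_hits(n_total=20, n_annotated=5, n_target=4)
print(p_value, expected)
```

The expected count is what topGO reports in its Expected column, so a term with many more significant genes than expected is a candidate for enrichment.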

A single GO term, GO:0005509 (calcium ion binding), was significantly enriched among sweep genes. This term was associated with skeletal organic matrix proteins (SOMPs) as well as EGF domain-containing genes, both of which were abundant in the target set.

| GO.ID | Term | Annotated | Significant | Expected | classic | ontology |
|---|---|---|---|---|---|---|
| GO:0005509 | calcium ion binding | 667 | 9 | 1.77 | 4.5e-05 | MF |

Genes annotated with the GO term GO:0005509 calcium ion binding.

| geneid | CLR | UniprotID | Protein Name |
|---|---|---|---|
| aten_0.1.m1.7647 | 436.0 | FBN2_HUMAN | Fibrillin-2 [Cleaved into: Fibrillin-2 C-terminal peptide] |
| aten_0.1.m1.7648 | 412.5 | FBN1_MOUSE | Fibrillin-1 [Cleaved into: Asprosin] |
| aten_0.1.m1.9801 | 188.4 | USOM5_ACRMI | Uncharacterized skeletal organic matrix protein 5 (Uncharacterized SOMP-5) |
| aten_0.1.m1.4638 | 186.6 | SVEP1_MOUSE | Sushi, von Willebrand factor type A, EGF and pentraxin domain-containing protein 1 (Polydom) |
| aten_0.1.m1.18937 | 157.8 | HMCN2_HUMAN | Hemicentin-2 |
| aten_0.1.m1.14842 | 156.0 | EGF_RAT | Pro-epidermal growth factor (EGF) [Cleaved into: Epidermal growth factor] |
| aten_0.1.m1.31478 | 115.4 | LRP4_HUMAN | Low-density lipoprotein receptor-related protein 4 (LRP-4) (Multiple epidermal growth factor-like domains 7) |
| aten_0.1.m1.29866 | 106.7 | FBN1_HUMAN | Fibrillin-1 [Cleaved into: Asprosin] |
| aten_0.1.m1.14286 | 103.3 | FRPC_NEIMB | Iron-regulated protein FrpC |

Marine vs Plume

Sweep loci unique to either Marine or Plume sites were used to extract overlapping genes. These genes are listed below.

| score | geneid | wq | UniprotID | protein_name | CLR |
|---|---|---|---|---|---|
| 332.2 | aten_0.1.m1.6387 | marine | CNG3_CHICK | Cyclic nucleotide-gated channel rod photoreceptor subunit alpha (CNG channel 3) (CNG-3) (CNG3) | 332.2 |
| 304.0 | aten_0.1.m1.27260;aten_0.1.m1.27263;aten_0.1.m1.27264 | marine | GRCR2_DROME;PLI2A_ARATH;PTPC1_DANRE | Glutaredoxin domain-containing cysteine-rich protein CG12206;LIM domain-containing protein PLIM2a (Pollen-expressed LIM protein 2) (AtPLIM2);Protein tyrosine phosphatase domain-containing protein 1 (EC 3.1.3.-) | 304.0 |
| 129.3 | aten_0.1.m1.27261;aten_0.1.m1.27266 | marine | NUD24_ARATH | Nudix hydrolase 24, chloroplastic (AtNUDT24) (EC 3.6.1.-) | 129.3 |
| 81.6 | aten_0.1.m1.30070 | marine | GPN2_MOUSE | GPN-loop GTPase 2 (ATP-binding domain 1 family member B) | 81.6 |
| 79.8 | aten_0.1.m1.24850;aten_0.1.m1.24859;aten_0.1.m1.24860 | marine | PIPNB_RAT;DAF36_CAEEL;SPCS2_DANRE | Phosphatidylinositol transfer protein beta isoform (PI-TP-beta) (PtdIns transfer protein beta) (PtdInsTP beta);Cholesterol 7-desaturase (EC 1.14.19.21) (Cholesterol desaturase daf-36);Probable signal peptidase complex subunit 2 (EC 3.4.-.-) (Microsomal signal peptidase 25 kDa subunit) (SPase 25 kDa subunit) | 79.8 |
| 77.1 | aten_0.1.m1.27227 | marine | TENX_HUMAN | Tenascin-X (TN-X) (Hexabrachion-like protein) | 77.1 |
| 71.2 | aten_0.1.m1.31614 | marine | | | 71.2 |
| 66.9 | aten_0.1.m1.31548;aten_0.1.m1.31552;aten_0.1.m1.31553 | plume | LTXA_AGGAC;THAP2_MOUSE | Leukotoxin (Lkt);THAP domain-containing protein 2 | 66.9 |
| 60.6 | aten_0.1.m1.31611 | marine | PARK7_CHICK | Protein/nucleic acid deglycase DJ-1 (EC 3.1.2.-) (EC 3.5.1.-) (EC 3.5.1.124) (Maillard deglycase) (Parkinson disease protein 7 homolog) (Parkinsonism-associated deglycase) (Protein DJ-1) (DJ-1) | 60.6 |
| 60.5 | aten_0.1.m1.6404 | marine | | | 60.5 |
| 59.7 | aten_0.1.m1.36632 | plume | MLP_ACRMI | Mucin-like protein (Fragment) | 59.7 |
| 56.3 | aten_0.1.m1.2697 | marine | MUC24_RAT | Sialomucin core protein 24 (MUC-24) (Endolyn) (Multi-glycosylated core protein 24) (MGC-24) (MGC-24v) | 56.3 |
| 53.4 | aten_0.1.m1.3491;aten_0.1.m1.3498 | plume | EDC4_XENLA;LYS1_SCHPO | Enhancer of mRNA-decapping protein 4;Saccharopine dehydrogenase [NAD(+), L-lysine-forming] (SDH) (EC 1.5.1.7) (Lysine–2-oxoglutarate reductase) | 53.4 |
| 52.8 | aten_0.1.m1.29942 | plume | NLRC3_HUMAN | NLR family CARD domain-containing protein 3 (CARD15-like protein) (Caterpiller protein 16.2) (CLR16.2) (NACHT, LRR and CARD domains-containing protein 3) (Nucleotide-binding oligomerization domain protein 3) | 52.8 |
| 52.1 | aten_0.1.m1.5129 | plume | HEX_VIBVL | Beta-hexosaminidase (EC 3.2.1.52) (Beta-N-acetylhexosaminidase) (Chitobiase) (N-acetyl-beta-glucosaminidase) | 52.1 |

Alexa, Adrian, Jörg Rahnenführer, and Thomas Lengauer. 2006. “Improved Scoring of Functional Groups from Gene Expression Data by Decorrelating GO Graph Structure.” Bioinformatics 22 (13): 1600–1607.