Skip to content

Latest commit

 

History

History
91 lines (68 loc) · 3.16 KB

09_repeats.md

File metadata and controls

91 lines (68 loc) · 3.16 KB

Repetitive Elements in Acroporid Genomes

Repeat content was analysed for the following Acroporid genomes;

  • Acropora millepora from short reads (amil) (Ying et al. 2019)
  • Acropora millepora from long reads (amilv2) obtained from https://przeworskilab.com/data/
  • Acropora tenuis this paper (aten)
  • Acropora digitifera version 1.1 from short reads (adi) obtained from genbank (Shinzato et al. 2011)

Each of these genomes was scanned for repeats as follows (using amil.fasta as an example genome);

  1. RepeatModeller version 1.0.8 was used to discover repeats
BuildDatabase amil.fasta -name amil
RepeatModeler-1.0.8 -database amil -engine ncbi -pa 10
  1. RepeatMasker version 4.0.7 was run using the database of repeats (amil-families.fa, output from step
    1. identified with RepeatModeller as follows;(-a option to writes alignments to a file)
mkdir -p amil_repeat_masker_out
RepeatMasker -pa 40 \
    -xsmall \
    -a -gff \
    -lib amil-families.fa \
    -dir amil_repeat_masker_out \
    -e ncbi \
    amil.fasta
  1. The script calcDivergenceFromAlign.pl included with RepeatMasker was used to calculate divergences from alignments created in step 2.

For each genome this process produces a file with extension .align.divsum which includes weighted average Kimura divergences for each repeat family. Assuming that individual copies within a family diverge neutrally at a constant rate these divergences are a proxy for the age of the family.

A broad overview of major repeat classes indicates that the overall composition of repeats is largely unchanged across Acroporid genomes. Unknown repeats comprise a large fraction of elements. One interesting pattern is that genomes from long reads (amil2, aten) seem to have higher counts of long repeats (longer than 100kb) which likely reflects improved ability to assemble these. In the figure below columns divide data by repeat length and rows represent the genomes (see above for genome codes)

Unknown repeats dominate this plot so we eliminate those in the next plot in order looking in detail at major classes of known repeats. There are very few noticeable changes at this level suggesting that the broad repeat composition is relatively invariant across Acroporid genomes.

Shinzato, Chuya, Eiichi Shoguchi, Takeshi Kawashima, Mayuko Hamada, Kanako Hisata, Makiko Tanaka, Manabu Fujie, et al. 2011. “Using the Acropora Digitifera Genome to Understand Coral Responses to Environmental Change.” Nature 476 (7360): 320–23.

Ying, Hua, David C Hayward, Ira Cooke, Weiwen Wang, Aurelie Moya, Kirby R Siemering, Susanne Sprungala, Eldon E Ball, Sylvain Forêt, and David J Miller. 2019. “The Whole-Genome Sequence of the Coral Acropora Millepora.” Genome Biol. Evol. 11 (5): 1374–9.