(c) 2009 - 2012 The Authors, see LICENSE.txt for details.
Dent Earl, Benedict Paten, Mark Diekhans
The evolver team is responsible for items in external/ : George Asimenos and Robert C. Edgar, Serafim Batzoglou and Arend Sidow.
A jobTree based simulation manager for the Evolver genome evolution simulation tool suite.
evolverSimControl (eSC) can be used to simulate multi-chromosome genome evolution on an arbitrary phylogeny (Newick format). In addition to running evolver, eSC automatically creates statistical summaries of the simulation as it runs, including text and image files. Also included are convenience scripts to: check on a running simulation and see detailed status and logging information; extract fasta sequence files from the leaf nodes of a completed simulation; and extract pairwise multiple alignment format (.maf) files from the leaf and branch nodes of a completed simulation and, with the help of mafJoin, join them into a single maf covering the entire simulation.
The use of jobTree means that you can run eSC on a cluster running a jobTree supported batch system, on a multi-core server or on your laptop.
- sonLib: https://github.com/benedictpaten/sonLib/
- jobTree: https://github.com/benedictpaten/jobTree/
- evolver: http://www.drive5.com/evolver/ Make sure that the Evolver tools are on your PATH environment variable and that their names are prefixed with evolver_ . Specifically, all of the following files need to be on your PATH:
  - evolver_cvt
  - evolver_evo
  - evolver_transalign
- trf: http://tandem.bu.edu/trf/trf.html Tandem Repeats Finder.
- mafJoin: https://github.com/dentearl/mafTools Not necessary for simple simulations, mafJoin (part of mafTools) is only needed if you wish to create a maf alignment of all sequences following a simulation.
- R: http://cran.r-project.org/ Only necessary if you wish to use the simCtrl_postSimAnnotDistExtractor.py script to view annotation size distributions following a simulation.
- Linux on i86 Intel. This is due to core Evolver executables being distributed as pre-compiled binaries.
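The PATH requirement for the Evolver tools can be checked before starting a simulation. The following is a minimal sketch, not part of eSC: the function name `missing_tools` is hypothetical, and only the three tool names listed above are checked.

```python
import shutil

# Tool names taken from the dependency list above.
REQUIRED_TOOLS = ["evolver_cvt", "evolver_evo", "evolver_transalign"]


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the tools that cannot be found as executables on PATH.

    `path` may be a search path string; None means use the PATH
    environment variable. This helper is a hypothetical illustration.
    """
    return [t for t in tools if shutil.which(t, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All Evolver tools found on PATH.")
```

An empty result means every listed tool resolved to an executable; it does not verify that the binaries run on your platform.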
- Download the package. Consider making it a sibling directory to jobTree/ and sonLib/ .
- cd into the directory.
- Type make .
- Edit your PYTHONPATH environment variable to contain the parent directory of the evolverSimControl/ directory.
- Type make test .
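As a quick check of the PYTHONPATH step above, the sketch below tests whether the parent of a package directory is on the Python search path. The function name `parent_on_pythonpath` and the example paths are hypothetical; `sys.path` is used because it includes the entries of PYTHONPATH at interpreter start-up.

```python
import os
import sys


def parent_on_pythonpath(package_dir, search_paths=None):
    """Return True if the parent directory of package_dir is in search_paths.

    search_paths defaults to sys.path. Hypothetical helper, for illustration.
    """
    if search_paths is None:
        search_paths = sys.path
    # abspath also strips a trailing slash, so "pkg/" and "pkg" behave the same.
    parent = os.path.dirname(os.path.abspath(package_dir))
    resolved = {os.path.abspath(p) for p in search_paths if p}
    return parent in resolved


if __name__ == "__main__":
    # Assumes the script is run from inside the evolverSimControl/ directory.
    print(parent_on_pythonpath(os.getcwd()))
```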
This example will walk you through a small simulation using the toy test example available at http://soe.ucsc.edu/~dearl/software/evolverSimControl/. If you want to create your own infile set you can use evolverInfileGeneration.
- Download and expand the toy archive. For simplicity I'll assume that both root/ and params/ are in the working directory, i.e. ./ .
- Next we run the runSim program:
$ simCtrl_runSim.py --inputNewick '(Knife:0.004, (Fork:0.003, (Ladle:0.002, (Spoon:0.001, Teaspoon:0.001)S-TS:.001)S-TS-L:.001)S-TS-L-F:0.001);' --outDir toyExampleSim --rootDir root/ --rootName hg18 --paramsDir params/ --jobTree jobTreeToyExampleSim --maxThreads 32 --seed 3571
- You can check on a running simulation by using simCtrl_checkSimStatus.py; use --help for options.
- After the simulation you can run simCtrl_postSimFastaExtractor.py to extract fasta sequence files from the genomes.
- You may also wish to run simCtrl_postSimAnnotDistExtractor.py, which will use the ggplot2 package for R to display the length distributions of some of the annotations.
- You may also wish to construct a single maf for the simulation using simCtrl_postSimMafExtractor.py, which will use mafJoin to join the pairwise maf output from Evolver into a single simulation-wide maf. This process is extremely memory intensive, with the 120Mb Mammal simulation eventually requiring approximately 250Gb of memory.
In order to run eSC you will need an infile set, a parameter set, a phylogenetic tree and optionally a mobile element library and mobile element parameter set. Infile sets can be created using evolverInfileGenerator or from scratch. Parameter sets can be generated by reading primary literature and coming up with reasonable values. Phylogenetic trees need to be in Newick format.
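Because the tree must be valid Newick, it can help to sanity-check the string before launching a long run. The sketch below is an illustration only and not how eSC parses trees: `leaf_names` is a hypothetical helper that checks the trailing semicolon and parenthesis balance, then treats any label directly following `(` or `,` as a leaf. Quoted labels and comments are not handled.

```python
import re


def leaf_names(newick):
    """Return leaf labels from a simple (unquoted) Newick string.

    Raises ValueError if the string lacks a trailing ';' or has
    unbalanced parentheses. Hypothetical helper, for illustration.
    """
    s = newick.strip()
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    # Leaves follow '(' or ','; internal node labels follow ')'.
    return re.findall(r"[(,]\s*([^\s(),:;]+)", s)


# The toy tree from the example above.
toy = ("(Knife:0.004, (Fork:0.003, (Ladle:0.002, (Spoon:0.001, "
       "Teaspoon:0.001)S-TS:.001)S-TS-L:.001)S-TS-L-F:0.001);")
print(leaf_names(toy))
```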
Available options for running a simulation are listed below.
$ bin/simCtrl_runSim.py --help
Usage: simCtrl_runSim.py --rootName=name --rootDir=/path/to/dir --paramsDir=/path/to/dir --tree=newickTree --stepLength=stepLength --outDir=/path/to/dir --jobTree=/path/to/dir [options]
simCtrl_runSim.py is used to initiate an evolver simulation using jobTree/scriptTree.
Options:
-h, --help
show this help message and exit
--rootDir=ROOTINPUTDIR
Input root directory.
--rootName=ROOTNAME
name of the root genome, to differentiate it from the input Newick. default=root
--inputNewick=INPUTNEWICK
Newick tree. http://evolution.genetics.washington.edu/phylip/newicktree.html
--stepLength=STEPLENGTH
stepLength for each cycle. default=0.001
--paramsDir=PARAMSDIR
Parameter directory.
--outDir=OUTDIR
Out directory.
--seed=SEED
Random seed, either an int or "stochastic". default=stochastic
--noMEs
Turns off all mobile element and RPG modules in the sim. default=False
--noBurninMerge
Turns off checks for an aln.rev file in the root dir. default=False
--noGeneDeactivation
Turns off the gene deactivation step. default=False
--maxThreads=MAXTHREADS
The maximum number of threads to use when running in single machine mode. default=4
- ... and all other jobTree standard options.
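When launching several simulations from a script, the options listed above can be assembled into an argument list rather than typed by hand. This is a minimal sketch under stated assumptions: `build_run_sim_argv` is a hypothetical helper, it only uses flags that appear in the help text, and it does not launch anything itself.

```python
import shlex


def build_run_sim_argv(newick, out_dir, root_dir, params_dir, job_tree,
                       root_name=None, step_length=None, seed=None,
                       max_threads=None):
    """Assemble an argument list for simCtrl_runSim.py (hypothetical helper)."""
    argv = [
        "simCtrl_runSim.py",
        "--inputNewick", newick,
        "--outDir", out_dir,
        "--rootDir", root_dir,
        "--paramsDir", params_dir,
        "--jobTree", job_tree,
    ]
    optional = [
        ("--rootName", root_name),
        ("--stepLength", step_length),
        ("--seed", seed),
        ("--maxThreads", max_threads),
    ]
    # Options left as None are omitted so the script's defaults apply.
    for flag, value in optional:
        if value is not None:
            argv += [flag, str(value)]
    return argv


# Values taken from the toy example earlier in this document.
argv = build_run_sim_argv(
    newick=("(Knife:0.004, (Fork:0.003, (Ladle:0.002, (Spoon:0.001, "
            "Teaspoon:0.001)S-TS:.001)S-TS-L:.001)S-TS-L-F:0.001);"),
    out_dir="toyExampleSim",
    root_dir="root/",
    params_dir="params/",
    job_tree="jobTreeToyExampleSim",
    root_name="hg18",
    seed=3571,
    max_threads=32,
)
# Print a shell-quoted form for inspection.
print(" ".join(shlex.quote(a) for a in argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; passing a list avoids shell quoting problems with the parentheses and commas in the Newick string.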
To check on a running simulation you can use the simCtrl_checkSimStatus.py script.
$ bin/simCtrl_checkSimStatus.py --help
Usage: simCtrl_checkSimStatus.py --simDir path/to/dir [options]
simCtrl_checkSimStatus.py can be used to check on the status of a running or completed evolverSimControl simulation.
Options:
-h, --help
show this help message and exit
--simDir=SIMDIR
Parent directory.
--drawText, --drawTree
prints an ASCII representation of the current tree status. default=False
--curCycles
prints out the list of currently running cycles. default=False
--stats
prints out the statistics for cycle steps. default=False
--cycleStem
prints out a stem and leaf plot for completed cycle runtimes, in seconds. default=False
--cycleStemHours
prints out a stem and leaf plot for completed cycle runtimes, in hours. default=False
--printChrTimes
prints a table of chromosome lengths (bp) and times (sec) for intra chromosome evolution step (CycleStep2).
--cycleList
prints out a list of all completed cycle runtimes. default=False
--html
prints output in HTML format for use as a cgi. default=False
--htmlDir=HTMLDIR
prefix for html links.
To extract fasta sequences from a completed simulation you can use the simCtrl_postSimFastaExtractor.py script.
$ bin/simCtrl_postSimFastaExtractor.py --help
Usage: simCtrl_postSimFastaExtractor.py --simDir path/to/dir [options]
simCtrl_postSimFastaExtractor.py takes in a simulation directory and then extracts the sequences of leaf nodes in fasta format and stores them in the respective step's directory.
Options:
-h, --help
show this help message and exit
--simDir=SIMDIR
the simulation directory.
--allCycles
extract fastas from all cycles, not just leafs. default=False
To create a single maf reflecting the evolutionary history of the entire simulation you can use the simCtrl_postSimMafExtractor.py script.
$ bin/simCtrl_postSimMafExtractor.py --help
Usage: simCtrl_postSimMafExtractor.py --simDir path/to/dir [options]
simCtrl_postSimMafExtractor.py requires mafJoin which is part of mafTools and is available at https://github.com/dentearl/mafTools/ .
Options:
-h, --help
show this help message and exit
--simDir=SIMDIR
Simulation directory.
--maxBlkWidth=MAXBLKWIDTH
Maximum mafJoin maf block output size. May be reduced towards 250 for complicated phylogenies. default=10000
--maxInputBlkWidth=MAXINPUTBLKWIDTH
Maximum mafJoin maf block input size. mafJoin will cut inputs to size, may result in long runs for very simple joins. May be reduced towards 250 for complicated phylogenies. default=1000
--noBurninMerge
Will not perform a final merge of simulation to the burnin. default=False
--maxThreads=MAXTHREADS
The maximum number of threads to use when running in single machine mode. default=4
- ... and all other jobTree standard options.