SortMeRNA: next-generation sequence filtering and alignment tool

License

GPL-3.0 (see LICENSE.txt and COPYING)
thuanguyen/sortmerna

 
 

sortmerna

SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering.

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input a file of reads (fasta or fastq format) and one or more rRNA database files, and sorts the reads into two user-specified files: one for aligned reads and one for rejected reads. Additional applications include clustering and taxonomy assignment, available through QIIME v1.9.1 (http://qiime.org). SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.
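The input/output contract described above can be illustrated with a minimal Python sketch. It is not SortMeRNA's alignment algorithm: the function names are hypothetical, and the set of aligned read IDs is assumed to come from some earlier alignment step. The format check relies only on the fact that FASTA records start with `>` and FASTQ records start with `@`.

```python
from typing import Dict, Iterable, Set, Tuple


def detect_read_format(first_line: str) -> str:
    """Guess the reads format from the first line of the file."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"
    raise ValueError("expected a FASTA ('>') or FASTQ ('@') header line")


def split_reads(
    reads: Iterable[Tuple[str, str]], aligned_ids: Set[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition (read_id, sequence) pairs into aligned and rejected reads.

    `aligned_ids` is a hypothetical stand-in for the alignment result.
    """
    aligned: Dict[str, str] = {}
    rejected: Dict[str, str] = {}
    for read_id, sequence in reads:
        target = aligned if read_id in aligned_ids else rejected
        target[read_id] = sequence
    return aligned, rejected


# Made-up reads, for illustration only.
reads = [("read1", "ACGTACGT"), ("read2", "TTGACCAA"), ("read3", "GGGCCCAA")]
aligned, rejected = split_reads(reads, aligned_ids={"read1", "read3"})
print(sorted(aligned), sorted(rejected))
```

In the real tool the two groups are written to the two output files named by the user; the dictionaries here only stand in for those files.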

Visit http://bioinfo.lifl.fr/RNA/sortmerna/ for more information.

Support

For questions and comments, please use the SortMeRNA forum.

Documentation

If you have Doxygen installed, you can generate the documentation by modifying the following lines in doxygen_configure.txt:

INPUT = /path/to/sortmerna/include /path/to/sortmerna/src
IMAGE_PATH = /path/to/sortmerna/algorithm

and running the following command:

doxygen doxygen_configure.txt

This command will generate a folder html in the directory from which the command was run.
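If you prefer not to edit the configuration by hand, the two settings can be rewritten programmatically. The following is a minimal sketch, assuming the file uses plain `KEY = value` lines for `INPUT` and `IMAGE_PATH`; the function name and the sample configuration text are hypothetical.

```python
import re


def set_doxygen_paths(config_text: str, repo_dir: str) -> str:
    """Return config_text with INPUT and IMAGE_PATH pointing into repo_dir."""
    replacements = {
        "INPUT": f"{repo_dir}/include {repo_dir}/src",
        "IMAGE_PATH": f"{repo_dir}/algorithm",
    }
    updated_lines = []
    for line in config_text.splitlines():
        # Match "KEY =" at the start of a line; comment lines do not match.
        match = re.match(r"\s*([A-Z_]+)\s*=", line)
        key = match.group(1) if match else None
        if key in replacements:
            line = f"{key} = {replacements[key]}"
        updated_lines.append(line)
    return "\n".join(updated_lines) + "\n"


# Made-up configuration fragment, for illustration only.
sample = (
    "PROJECT_NAME = SortMeRNA\n"
    "INPUT = /old/include /old/src\n"
    "IMAGE_PATH = /old/algorithm\n"
)
updated = set_doxygen_paths(sample, "/home/user/sortmerna")
print(updated)
```

Writing the returned text back to doxygen_configure.txt and then running doxygen as shown above would complete the step.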

Getting Started

SortMeRNA can be built and run on Windows, Linux, and Mac.

There are 3 methods to get SortMeRNA:

  1. GitHub repository: build the development version from sources (master branch)
     • Installation instructions
  2. GitHub releases: build a release version from sources (tarballs, zip)
     • on Linux
     • on Mac OS
     • on Windows OS
  3. GitHub releases: use pre-built release binaries.

Option (3) is the simplest, as it provides access to pre-compiled binaries.

SortMeRNA Compilation

The OS we use for development:

  1. Linux: Ubuntu 16.04 LTS Xenial with GCC 7.3.0
  2. Windows: 10 with Visual Studio 15 2017 Win64
  3. Mac: macOS 10.13 High Sierra (64-bit) with AppleClang 9.0.0.9000039

Other environments we tested:

  • CentOS 6.6 with GCC 7.3.0. Getting a recent GCC on an old CentOS release requires building GCC from sources, which is a lengthy process (around 10 hours on a CentOS VM running in VirtualBox on a Windows 10 host). By comparison, upgrading GCC on Ubuntu is easy through PPA packages.

CMake is used for generating the build files and should be installed prior to the build. CMake distributions are available for all major operating systems. Please visit the CMake project website for download and installation instructions.

The following flags can be used when generating the build files (-D<FLAG>=VALUE):

  • WITH_TESTS (build unit tests)
  • ROCKSDB_INCLUDE_DIR (path to RocksDB include directory)
  • ROCKSDB_LIB_DEBUG (path to RocksDB library for Debug)
  • ROCKSDB_LIB_RELEASE (path to RocksDB library for Release)
  • ZLIB_LIB_DEBUG (path to ZLib debug library location. Use if location is custom)
  • ZLIB_LIB_RELEASE (path to ZLib release library locations. Use if location is custom)
  • SRC_ZLIB (download Zlib sources. Use if ZLib is to be built)
  • SRC_ROCKSDB (download RocksDB sources. Use if RocksDB is to be built)
  • SRC_RAPIDJSON (download RapidJson sources. Use if 'apt install rapidjson' not available)
  • SET_ROCKSDB (set to 1 to indicate RocksDB was built from sources. Not necessary if RocksDB is installed using a package manager)
  • SET_ZLIB (set to 1 to indicate ZLib was built from sources.)

The above flags can be ignored if the dependencies (zlib, rocksdb, rapidjson) are installed using a standard package manager like 'apt' (on Linux) or 'homebrew' (on Mac).
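When the build is scripted, the -D<FLAG>=VALUE options can be assembled from a dictionary. The following is a minimal sketch: the helper name is hypothetical, and the default generator and build type are taken from the Linux instructions in this README. It only builds the argument list and does not run CMake.

```python
from typing import Dict, List, Optional, Union


def cmake_configure_command(
    source_dir: str,
    flags: Optional[Dict[str, Union[str, int]]] = None,
    build_type: str = "Release",
    generator: str = "Unix Makefiles",
) -> List[str]:
    """Build the argument list for a CMake configure step."""
    command = ["cmake", "-G", generator, f"-DCMAKE_BUILD_TYPE={build_type}"]
    for name, value in (flags or {}).items():
        # Each flag becomes one -D<FLAG>=VALUE argument.
        command.append(f"-D{name}={value}")
    command.append(source_dir)
    return command


# Flags for building RocksDB and RapidJson from sources.
command = cmake_configure_command(
    "../..", flags={"SRC_ROCKSDB": 1, "SRC_RAPIDJSON": 1, "SET_ROCKSDB": 1}
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from inside the build directory. Note that the printed form drops the shell quoting around the generator name.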

Linux OS

(1) Install GCC if not already installed. SortMeRNA is C++14 compliant, so GCC needs to be fairly recent (e.g. 5.4.0 works).

gcc --version
	gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
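The version check can be automated when setting up several machines. The sketch below parses the first line of the `gcc --version` output shown above. The function names are hypothetical, and treating 5.4.0 as the minimum is an assumption: the text above only says that 5.4.0 is known to work.

```python
import re
from typing import Tuple


def parse_gcc_version(version_output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the first line of `gcc --version`."""
    first_line = version_output.splitlines()[0]
    # The version follows the parenthesized vendor string,
    # e.g. "(Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0".
    match = re.search(r"\)\s+(\d+)\.(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError(f"could not find a GCC version in: {first_line!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_recent_enough(
    version: Tuple[int, int, int], minimum: Tuple[int, int, int] = (5, 4, 0)
) -> bool:
    """Compare against an assumed minimum version (5.4.0 by default)."""
    return version >= minimum


sample = "gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609"
version = parse_gcc_version(sample)
print(version, is_recent_enough(version))
```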

(2) Install pre-requisites (CMake, Git, Zlib, RocksDB, RapidJson)

sudo apt update
sudo apt install cmake
sudo apt install git
sudo apt install zlib
sudo apt install rocksdb
sudo apt install rapidjson

If the dependencies cannot be installed using a package manager, they need to be built (read below).

(3) Clone the Git repository

git clone https://github.com/biocore/sortmerna.git

(4) Generate the build files using CMake:

mkdir -p $SMR_HOME/build/Release
pushd $SMR_HOME/build/Release

(4.1) If all the dependencies are available on the system

cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release ../..

(4.2) If RocksDB and RapidJson have to be installed from sources (see the flags description above)

cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DSRC_ROCKSDB=1 -DSRC_RAPIDJSON=1 -DSET_ROCKSDB=1 ../..

The above will download RocksDB and RapidJson into default locations ($SMR_HOME/3rdparty/rocksdb) and ($SMR_HOME/3rdparty/rapidjson) correspondingly.

OR with custom values for RocksDB include/lib

cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DSRC_ROCKSDB=1 -DSRC_RAPIDJSON=1 -DSET_ROCKSDB=1 -DROCKSDB_INCLUDE_DIR=$SOME_DIR/rocksdb/include -DROCKSDB_LIB_RELEASE=$SOME_DIR/rocksdb/build/Release ../..

NOTE: $SMR_HOME is the top directory where sortmerna code (e.g. git repo) is located.

Other compiler/linker flags that might be necessary depending on the system:

  • -DEXTRA_CXX_FLAGS_RELEASE="-lrt" (had to use this on Centos 6.6 + GCC 7.3.0)

The above commands will perform the necessary system checks, resolve the dependencies, and generate the Makefile.

(5) Compile and build executables:

(5.1) If RocksDB needs to be built

mkdir -p $SMR_HOME/3rdparty/rocksdb/build/Release
pushd $SMR_HOME/3rdparty/rocksdb/build/Release
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DPORTABLE=1 -DWITH_ZLIB=1 -DWITH_TESTS=0 -DWITH_TOOLS=0 ../..
make
popd

(5.2) Build SortMeRNA

make

The binaries are created in $SMR_HOME/build/Release/src/indexdb and $SMR_HOME/build/Release/src/sortmerna. Simply add the build binaries to the PATH, e.g.

export PATH="$SMR_HOME/build/Release/src/indexdb:$SMR_HOME/build/Release/src/sortmerna:$PATH"

Mac OS

We tested the build on macOS 10.13 High Sierra (64-bit). We recommend Homebrew [1], an excellent package manager for Mac that provides all the latest packages required to build SortMeRNA. The build can be performed using either Clang or GCC.

(1) Install Homebrew:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

brew --version
brew help

(2) Install pre-requisites (CMake, Git, Zlib, RocksDB, RapidJson)

brew install cmake
brew install git
brew install zlib
brew install rocksdb
brew install rapidjson

(3) Clone the Git repository

git clone https://github.com/biocore/sortmerna.git

(4) Generate the build files:

mkdir -p $SMR_HOME/build/Release
pushd $SMR_HOME/build/Release
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DEXTRA_CXX_FLAGS_RELEASE="-pthread" ../..
	-- The CXX compiler identification is AppleClang 9.0.0.9000039
	-- The C compiler identification is AppleClang 9.0.0.9000039
	-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++
	-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ -- works
	-- Detecting CXX compiler ABI info
	-- Detecting CXX compiler ABI info - done
	-- Detecting CXX compile features
	-- Detecting CXX compile features - done
	-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc
	-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc -- works
	-- Detecting C compiler ABI info
	-- Detecting C compiler ABI info - done
	-- Detecting C compile features
	-- Detecting C compile features - done
	CMAKE_CXX_COMPILER_ID = AppleClang
	CMAKE_CONFIGURATION_TYPES =
	CMAKE_CXX_FLAGS_RELEASE: -O3 -DNDEBUG
	EXTRA_CXX_FLAGS_RELEASE: -pthread
	Cloning into 'concurrentqueue'...
	Checking out files: 100% (1613/1613), done.
	-- Configuring done
	-- Generating done
	-- Build files have been written to: /Users/bc/sortmerna/build/Release

Note: $SMR_HOME is the top directory where sortmerna code (e.g. git repo) is located.

CMake will perform the necessary system checks, resolve the dependencies, and generate the Makefile.

(5) Compile and build executables:

make

The binaries are created in $SMR_HOME/build/Release/src/indexdb and $SMR_HOME/build/Release/src/sortmerna

Simply add the build binaries to the PATH e.g.

export PATH="$SMR_HOME/build/Release/src/indexdb:$SMR_HOME/build/Release/src/sortmerna:$PATH"

Mac OS - configure shell to use Clang compiler

(1) Check if you have Clang installed:

clang --version

(2a) If Clang is installed, set your compiler to Clang:

export CC=clang
export CXX=clang++

(2b) If Clang is not installed, see Clang for Mac OS for installation instructions.

Mac OS - configure shell to use GCC compiler

(1) Check if you have GCC installed:

gcc --version

(2a) If GCC is installed, set your compiler to GCC:

export CC=gcc-mp-5.4
export CXX=g++-mp-5.4

(2b) If GCC is not installed, it can be installed through Homebrew or MacPorts.

brew tap homebrew/versions
brew install [flags] gcc54

To list available flags

brew options gcc54

Mac OS - install Clang

Installing Xcode (free through the App Store) and Xcode command line tools will automatically install the latest version of Clang supported with Xcode.

After installing Xcode, the Xcode command line tools may be installed via:

Xcode -> Preferences -> Downloads

Under "Components", click to install "Command Line Tools"

Windows OS

MS Visual Studio Community edition and CMake for Windows are required for building SortMeRNA.

We tested the build using Visual Studio 15 2017 Win64 and Visual Studio 14 2015 Win64

(1) Download and Install VS Community edition from Visual Studio community website

(2) Install CMake

CMake can be installed using either the Windows Installer or binaries from an archive. Download binary distributions from the CMake project website.

If you choose portable binaries (not the installer) e.g. cmake-3.11.0-rc1-win64-x64.zip, just download and extract the archive in a directory of your choice e.g.

C:\libs\cmake-3.11.0-rc1-win64-x64\
	bin\
	doc\
	man\
	share\

The bin directory above contains cmake.exe and cmake-gui.exe. Add the bin directory to your PATH: start cmd and run

set PATH=C:\libs\cmake-3.11.0-rc1-win64-x64\bin;%PATH%
cmake --version
cmake-gui

(3) Install Git for Windows

Download the binary distribution, either portable or the installer, from the Git for Windows website.

The portable distribution is a self-extracting archive that can be installed in a directory of your choice e.g.

C:\libs\git-2.16.2-64\
	bin\
	cmd\
	dev\
	etc\
	mingw64\
	tmp\
	usr\

You can use either bash.exe or native Windows CMD cmd.exe.

If you choose to work with CMD, add the following to your path:

set GIT_HOME=C:\libs\git-2.16.2-64
set PATH=%GIT_HOME%\bin;%GIT_HOME%\usr\bin;%GIT_HOME%\mingw64\bin;%PATH%

git --version

(4) Clone the Git repository

git clone https://github.com/biocore/sortmerna.git

(5) Prepare the build files:

On Windows we recommend using the cmake-gui utility. Either navigate to CMake installation directory (using Windows Explorer) and double-click cmake-gui, or launch it from command line as shown below:

set PATH=C:\libs\cmake-3.11.0-rc1-win64-x64\bin;%PATH%
cmake-gui

In the CMake GUI

  • click the Browse Source button and navigate to the directory where the SortMeRNA sources are located (%SMR_HOME%).
  • click Browse Build and navigate to the directory where to build the binaries e.g. %SMR_HOME%\build
  • at the prompt select the Generator from the list e.g. "Visual Studio 15 2017 Win64"
  • click Configure
  • Set the following variables:
    • ZLIB_INCLUDE_DIR=%SMR_HOME%/3rdparty/zlib
    • ZLIB_LIB_DEBUG=%SMR_HOME%/3rdparty/zlib/build/Debug
    • ZLIB_LIB_RELEASE=%SMR_HOME%/3rdparty/zlib/build/Release
    • ROCKSDB_INCLUDE_DIR=%SMR_HOME%/3rdparty/rocksdb/include
    • ROCKSDB_LIB_DEBUG=%SMR_HOME%/3rdparty/rocksdb/build/Debug
    • ROCKSDB_LIB_RELEASE=%SMR_HOME%/3rdparty/rocksdb/build/Release
  • click Configure again
  • click Generate if all variables were set OK (no red background)

Generate creates the Visual Studio project files in the %SMR_HOME%\build\ directory. %SMR_HOME% is the top directory where the SortMeRNA source distribution (e.g. the Git repo) is installed.

(6) Configure and build Zlib library

When Configure is run in cmake-gui, it downloads the required 3rd party source packages into the %SMR_HOME%\3rdparty\ directory.

In cmake-gui:

  • click Browse Source... and select %SMR_HOME%\3rdparty\zlib\
  • click Browse Build... and select %SMR_HOME%\3rdparty\zlib\build\ (confirm creating the build directory if it does not already exist)
  • click Configure and set the required variables or accept defaults
  • click Generate

In Visual Studio

  • File -> Open -> Project/Solution and select %SMR_HOME%\3rdparty\zlib\build\zlib.sln
  • In Solution Explorer right-click ALL_BUILD and select build from drop-down menu

(7) Configure and build the RocksDB library

In cmake-gui:

  • click Browse Source... and select %SMR_HOME%\3rdparty\rocksdb\
  • click Browse Build... and select %SMR_HOME%\3rdparty\rocksdb\build\ (confirm creating the build directory if it does not already exist)
  • click Configure and set the following variables:
    • Ungrouped Entries
      • PORTABLE (checkbox)
      • GIT_EXECUTABLE (select the path to git.exe, e.g. C:/libs/git-2.16.2-64/bin/git.exe)
    • WITH
      • WITH_MD_LIBRARY
      • WITH_ZLIB
      • Accept defaults for the rest
  • click Generate

In Visual Studio

  • File -> Open -> Project/Solution and select %SMR_HOME%\3rdparty\rocksdb\build\rocksdb.sln
  • In Solution Explorer right-click ALL_BUILD and select build from drop-down menu

(8) Build SortMeRNA

In Visual Studio:

  • File -> Open -> Project/Solution and select %SMR_HOME%\build\sortmerna.sln
  • Select the desired build type: Release | Debug | RelWithDebInfo | MinSizeRel.
  • In Solution Explorer, right-click ALL_BUILD and select build from the drop-down menu.

Depending on the build type the binaries are generated in %SMR_HOME%\build\src\sortmerna\Release (or Debug | RelWithDebInfo | MinSizeRel).

(9) Add sortmerna executables to PATH

set PATH=%SMR_HOME%\build\src\indexdb\Release;%SMR_HOME%\build\src\sortmerna\Release;%PATH%

Running

Python code for running integration tests is provided in $SMR_HOME/tests (%SMR_HOME%\tests) and requires Python 3.5 or higher.

Tests can be run with the following command:

python ./tests/test_sortmerna.py

OR individual tests

python ./tests/test_sortmerna.py SortmernaTests.test_simulated_amplicon_generic_buffer
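The `ClassName.test_method` argument follows the naming convention of Python's `unittest` module. Assuming the test script is built on `unittest` (the command form suggests this, but it is an assumption), the following self-contained sketch shows how such a name selects a single test. `ExampleTests` and its methods are hypothetical and unrelated to the real test suite.

```python
import io
import unittest


class ExampleTests(unittest.TestCase):
    """Hypothetical test class standing in for the real test suite."""

    def test_sequence_length(self):
        self.assertEqual(len("ACGT"), 4)

    def test_uppercase(self):
        self.assertTrue("ACGT".isupper())


def run_single_test(dotted_name: str) -> unittest.TestResult:
    """Run one test given a 'ClassName.test_method' name."""
    class_name, method_name = dotted_name.split(".")
    test_classes = {"ExampleTests": ExampleTests}
    # Instantiating a TestCase with a method name yields exactly one test.
    suite = unittest.TestSuite([test_classes[class_name](method_name)])
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)


result = run_single_test("ExampleTests.test_uppercase")
print(result.testsRun, result.wasSuccessful())
```

Only the named method runs; the other tests in the class are skipped entirely, which is useful when a full run is slow.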

Tests on compressed data files

python ./tests/test_sortmerna_zlib.py

Running the tests requires scikit-bio 0.5.0.

Third-party libraries

Various features in SortMeRNA are dependent on third-party libraries, including:

  • ALP: computes statistical parameters for Gumbel distribution (K and Lambda)
  • CMPH: C Minimal Perfect Hashing Library
  • Zlib: reading compressed Reads files
  • RocksDB: storage for SortmeRNA alignment results
  • RapidJson: serialization of Reads objects to store in RocksDB
  • Concurrent Queue: Lockless buffer for Reads accessed from multiple processing threads
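For context on the ALP entry: K and Lambda are the two parameters that appear in the standard Karlin-Altschul formulas for alignment significance. The sketch below shows those textbook formulas only. It is not taken from the SortMeRNA sources, and the parameter values and sequence lengths are hypothetical.

```python
import math


def evalue(raw_score: float, k: float, lam: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least raw_score:
    E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, k: float, lam: float) -> float:
    """Normalized score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Hypothetical parameter values, chosen only for illustration.
K, LAMBDA = 0.1, 0.6
e = evalue(raw_score=60, k=K, lam=LAMBDA, query_len=100, db_len=1_000_000)
b = bit_score(raw_score=60, k=K, lam=LAMBDA)
print(f"E-value: {e:.3g}, bit score: {b:.1f}")
```

The two quantities are linked by E = m * n * 2^(-S'), so a higher bit score always corresponds to a smaller E-value for the same search space.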

Wrappers and Packages

Galaxy

Thanks to Björn Grüning and Nicola Soranzo, a Galaxy wrapper exists for SortMeRNA 2.1. Please visit Björn's github page for installation.

Debian

Thanks to the Debian Med team, SortMeRNA 2.0 is now a package in Debian. Thanks to Andreas Tille for the sortmerna and indexdb_rna man pages (version 2.0). These have been updated for 2.1 in the master repository.

GNU Guix

Thanks to Ben Woodcroft for adding SortMeRNA 2.1 to GNU Guix, find the package here.

QIIME

SortMeRNA 2.0 can be used in QIIME's pick_closed_reference_otus.py, pick_open_reference_otus.py and assign_taxonomy.py scripts.

Note: At the moment, only 2.0 is compatible with QIIME.

Taxonomies

The archive rRNA_databases/silva_ids_acc_tax.tar.gz contains SILVA taxonomy strings (extracted from the XML file generated by ARB) for each of the reference sequences in the representative databases. Each file has three tab-separated columns: the reference sequence ID, the accession number, and the taxonomy.
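A file in this three-column format can be read with a few lines of Python. The following is a minimal sketch: the record type, function name and sample rows are hypothetical, and the rows are made up rather than copied from the SILVA files.

```python
from typing import Dict, Iterable, NamedTuple


class TaxonomyRecord(NamedTuple):
    seq_id: str
    accession: str
    taxonomy: str


def parse_taxonomy(lines: Iterable[str]) -> Dict[str, TaxonomyRecord]:
    """Map each reference sequence ID to its accession and taxonomy."""
    records: Dict[str, TaxonomyRecord] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first two tabs only, in case the taxonomy contains tabs.
        seq_id, accession, taxonomy = line.split("\t", 2)
        records[seq_id] = TaxonomyRecord(seq_id, accession, taxonomy)
    return records


# Made-up rows, for illustration only.
sample_lines = [
    "ref1\tACC0001\tBacteria;Proteobacteria;Gammaproteobacteria\n",
    "ref2\tACC0002\tArchaea;Euryarchaeota\n",
]
records = parse_taxonomy(sample_lines)
print(records["ref1"].taxonomy)
```

After extracting the archive, passing an open file handle to `parse_taxonomy` works the same way, since a file object iterates over its lines.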

Citation

If you use SortMeRNA, please cite: Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.

Contributors

See AUTHORS for a list of contributors to this project.

References

  1. Homebrew
