Convert HAL to a vg-compatible sequence graph.
Supports the three sequence graph formats in libbdsg:
- PackedGraph (default)
- ODGI
- HashGraph
Algorithm:
- Each sequence in the HAL is added as a thread to a pinch graph.
- Exact pairwise alignment blocks (no gaps or substitutions) are extracted from each branch in the HAL tree and "pinched" together in the graph.
- For each branch, bases in the child that have substitutions relative to the parent (SNPs) are aligned across the tree using the HAL column iterator, and all exact matches are extracted and pinched.
- The pinch graph is cleaned up by merging trivial joins.
- Each HAL sequence is traced through the pinch graph, adding nodes and edges to the output sequence graph. A table is maintained to map each pinch graph block to a sequence graph node.
- Sort the output with vg ids --sort.
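The pinching and tracing steps above can be illustrated with a toy Python sketch. This is not hal2vg's code, and every name in it (ToyPinchGraph, pinch, trace) is hypothetical. It assumes forward-strand matches only, models the pinch graph as a union-find over individual base positions, creates one single-base node per pinched column, and omits both the SNP step and the trivial-join merge that would compact runs of such nodes into longer ones. The block_to_node dictionary plays the role of the table that maps each pinch graph block to a sequence graph node.

```python
from typing import Dict, List, Set, Tuple

Pos = Tuple[str, int]  # (thread name, 0-based offset); forward strand only


class ToyPinchGraph:
    """Hypothetical toy: union-find over base positions standing in for a pinch graph."""

    def __init__(self, threads: Dict[str, str]) -> None:
        # Each sequence becomes a thread; every base starts in its own singleton block.
        self.threads = threads
        self.parent: Dict[Pos, Pos] = {
            (name, i): (name, i) for name, seq in threads.items() for i in range(len(seq))
        }

    def find(self, p: Pos) -> Pos:
        # Follow parent links to the block representative, halving the path as we go.
        while self.parent[p] != p:
            self.parent[p] = self.parent[self.parent[p]]
            p = self.parent[p]
        return p

    def pinch(self, a: str, a_start: int, b: str, b_start: int, length: int) -> None:
        # Only exact (gapless, substitution-free) blocks may be pinched; check before merging.
        if self.threads[a][a_start:a_start + length] != self.threads[b][b_start:b_start + length]:
            raise ValueError("only exact matches can be pinched")
        for i in range(length):
            ra, rb = self.find((a, a_start + i)), self.find((b, b_start + i))
            if ra != rb:
                self.parent[rb] = ra

    def trace(self) -> Tuple[Dict[int, str], Set[Tuple[int, int]], Dict[str, List[int]]]:
        # Walk each thread; the table maps each block representative to a node id.
        block_to_node: Dict[Pos, int] = {}
        nodes: Dict[int, str] = {}
        edges: Set[Tuple[int, int]] = set()
        paths: Dict[str, List[int]] = {}
        for name, seq in self.threads.items():
            path: List[int] = []
            for i, base in enumerate(seq):
                rep = self.find((name, i))
                if rep not in block_to_node:
                    block_to_node[rep] = len(block_to_node) + 1
                    nodes[block_to_node[rep]] = base
                node_id = block_to_node[rep]
                if path:
                    edges.add((path[-1], node_id))  # adjacent bases in a thread -> edge
                path.append(node_id)
            paths[name] = path
        return nodes, edges, paths


if __name__ == "__main__":
    g = ToyPinchGraph({"human": "ACGTAC", "chimp": "ACCTAC"})
    g.pinch("human", 0, "chimp", 0, 2)  # exact block "AC"
    g.pinch("human", 3, "chimp", 3, 3)  # exact block "TAC"; G/C at offset 2 stays unpinched
    nodes, edges, paths = g.trace()
    print(paths["human"])  # [1, 2, 3, 4, 5, 6]
    print(paths["chimp"])  # [1, 2, 7, 4, 5, 6]
```

Union-find is used here only because it turns a pinch into a simple merge. As we understand it, a real pinch graph stores threads as segments grouped into blocks rather than as per-base sets, which is far more memory-efficient on whole genomes.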
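As we understand it, vg ids --sort renumbers nodes in a (generalized) topological order. The hypothetical helper sort_ids below approximates this for acyclic, forward-strand graphs using Kahn's algorithm; vg's generalized sort also copes with cycles and reversing edges, which this sketch simply rejects.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def sort_ids(
    nodes: Dict[int, str],
    edges: Set[Tuple[int, int]],
    paths: Dict[str, List[int]],
) -> Tuple[Dict[int, str], Set[Tuple[int, int]], Dict[str, List[int]]]:
    """Renumber nodes 1..N in topological order with Kahn's algorithm (DAGs only)."""
    succ: Dict[int, List[int]] = {n: [] for n in nodes}
    indeg: Dict[int, int] = {n: 0 for n in nodes}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    # Start from source nodes in ascending id order so the result is deterministic.
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order: List[int] = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in sorted(succ[n]):
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle detected; this toy handles only acyclic graphs")
    # Map old ids to new ids by position in the topological order.
    new_id = {old: i + 1 for i, old in enumerate(order)}
    new_nodes = {new_id[n]: seq for n, seq in nodes.items()}
    new_edges = {(new_id[a], new_id[b]) for a, b in edges}
    new_paths = {name: [new_id[n] for n in path] for name, path in paths.items()}
    return new_nodes, new_edges, new_paths


if __name__ == "__main__":
    nodes, edges, paths = sort_ids({5: "A", 2: "C", 9: "G"}, {(5, 9), (9, 2)}, {"x": [5, 9, 2]})
    print(paths["x"])  # [1, 2, 3]
    print(nodes[2])  # G
```

Seeding the queue in ascending id order is a design choice to make the toy deterministic; it does not reproduce vg's exact tie-breaking.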
You can download a standalone binary for the latest release here.
You can use the Dockerfile as a guide to see how all dependencies are installed with apt on Ubuntu. More details on installing HDF5 can be found in the HAL README.
Cloning: Don't forget to clone submodules with the --recursive option:
git clone https://github.com/glennhickey/hal2vg.git --recursive
Compiling:
make
It is required to use the --inMemory option for all but trivial inputs.
vg has been tuned to work best on graphs with nodes chopped to at most 32 bases. It is therefore recommended to use the --chop 32 option.
hal2vg input.hal --inMemory --chop 32 --progress > output.pg
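To illustrate what chopping nodes to at most 32 bases means, here is a hypothetical Python helper, chop, that splits each long node into consecutive pieces and rewires edges and paths to match. It is a sketch under stated assumptions, not hal2vg's --chop implementation: edges are forward-strand only, node sequences are non-empty, and new node IDs are assigned sequentially.

```python
from typing import Dict, List, Set, Tuple


def chop(
    nodes: Dict[int, str],
    edges: Set[Tuple[int, int]],
    paths: Dict[str, List[int]],
    max_len: int = 32,
) -> Tuple[Dict[int, str], Set[Tuple[int, int]], Dict[str, List[int]]]:
    """Split every node into pieces of at most max_len bases (hypothetical helper)."""
    new_nodes: Dict[int, str] = {}
    pieces: Dict[int, List[int]] = {}  # old node id -> its new node ids, in order
    next_id = 1
    for old_id in sorted(nodes):
        seq = nodes[old_id]  # assumed non-empty
        pieces[old_id] = []
        for start in range(0, len(seq), max_len):
            new_nodes[next_id] = seq[start:start + max_len]
            pieces[old_id].append(next_id)
            next_id += 1
    new_edges: Set[Tuple[int, int]] = set()
    for ids in pieces.values():
        new_edges.update(zip(ids, ids[1:]))  # chain the pieces of one original node
    for a, b in edges:
        new_edges.add((pieces[a][-1], pieces[b][0]))  # last piece of a -> first piece of b
    # Each path visits all pieces of each original node, in order.
    new_paths = {name: [n for old in path for n in pieces[old]] for name, path in paths.items()}
    return new_nodes, new_edges, new_paths


if __name__ == "__main__":
    nodes, edges, paths = chop({1: "A" * 70, 2: "C" * 10}, {(1, 2)}, {"x": [1, 2]})
    print([len(s) for s in nodes.values()])  # [32, 32, 6, 10]
    print(sorted(edges))  # [(1, 2), (2, 3), (3, 4)]
    print(paths["x"])  # [1, 2, 3, 4]
```

Chopping never changes the sequence a path spells out; it only adds nodes and the edges that chain them together.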
Note: The output graph is only readable by vg version 1.24.0 or later.
Copyright (C) 2020 by UCSC Computational Genomics Lab