Faster k-mer counting / comparison algorithms (update: I used suffix arrays, although the way I used them could probs be sped up)
Use input sequence size to determine which k-mer counting method to use (the older naive, memory-inefficient method is faster for small inputs than the suffix-array-based method)
Do the conversions of s1, s2, and rc(s2) to bytes at the start of matrix construction, and then do everything thereafter in bytes? (At the very least, don't convert both s2 and rc(s2) to bytes separately; that's silly.)
Use Cython / etc.
Support FASTA files as input and then process them in chunks or something -- removes need to store massively long sequences in memory (not sure how this would work with pydivsufsort, tho)
If both sequences are equal (i.e. we're creating a self dot plot), use this to speed up dot plot construction. Some ideas:
Don't bother creating an extra suffix array
Only fill in one half of the matrix triangle, since the upper and lower triangle in a self dot plot should be symmetric? (this might be hard to do using the suffix array approach, tho)
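As a rough illustration of the symmetry idea for self dot plots, here is a minimal sketch that only enumerates forward matches with i <= j and mirrors each one across the diagonal. It uses a plain dictionary of k-mer positions (closer to the naive method than to the suffix array method), and the function name `self_forward_matches` is hypothetical, not part of the existing code. Reverse-complement matches are not handled here.

```python
from collections import defaultdict


def self_forward_matches(s, k):
    """Return the set of (i, j) cells where s[i:i+k] == s[j:j+k].

    Hypothetical helper: only pairs with i <= j are computed directly,
    and every off-diagonal match is mirrored to (j, i).
    """
    # Map each k-mer to the sorted list of positions where it starts.
    positions = defaultdict(list)
    for i in range(len(s) - k + 1):
        positions[s[i:i + k]].append(i)

    cells = set()
    for occurrences in positions.values():
        for idx, i in enumerate(occurrences):
            cells.add((i, i))  # main diagonal: every k-mer matches itself
            for j in occurrences[idx + 1:]:
                cells.add((i, j))  # upper triangle (i < j)
                cells.add((j, i))  # mirrored cell in the lower triangle
    return cells


matches = self_forward_matches("ACGTACG", 3)
```

Whether the same trick carries over cleanly to the suffix array approach is an open question, as noted in the list above.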
Ideas for speeding this up
Speed up the rc() function with str.maketrans: https://bioinformatics.stackexchange.com/questions/3583/what-is-the-fastest-way-to-get-the-reverse-complement-of-a-dna-sequence-in-pytho
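A minimal sketch of what a translation-table-based rc() could look like, assuming sequences contain only uppercase A, C, G, and T. The names `rc_bytes`, `_COMPLEMENT_STR`, and `_COMPLEMENT_BYTES` are made up for illustration. The bytes variant also shows one way to encode s2 once and derive rc(s2) directly in bytes, rather than converting both separately.

```python
# Translation tables are built once at import time, not on every call.
# Assumption: input is uppercase DNA using only A, C, G, T; any other
# character is passed through unchanged by translate().
_COMPLEMENT_STR = str.maketrans("ACGT", "TGCA")
_COMPLEMENT_BYTES = bytes.maketrans(b"ACGT", b"TGCA")


def rc(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT_STR)[::-1]


def rc_bytes(seq: bytes) -> bytes:
    """Reverse complement of a DNA sequence that is already bytes."""
    return seq.translate(_COMPLEMENT_BYTES)[::-1]


# Encode s2 once, then derive rc(s2) without a second str-to-bytes conversion.
s2 = "AACGT"
s2_bytes = s2.encode("ascii")
rc_s2_bytes = rc_bytes(s2_bytes)
```

It would be worth timing this against the current rc() on realistic inputs before switching.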