This repository has been archived by the owner on Jul 20, 2021. It is now read-only.

local_alignment_search todo comment #255

Open
corburn opened this issue Feb 28, 2017 · 0 comments


corburn commented Feb 28, 2017

The following is a snippet from section 2.2.4, "A complete homology search function":

Spend a minute looking at this function to understand what it's doing.

# then we reverse-sort them by score, and return the n highest
# scoring alignments (this needs to be updated so we only
# ever keep track of the n highest scoring alignments)
best_hits = sorted(hits, key=lambda e: e[1], reverse=True)[:n]

The comment says the code "needs to be updated so we only ever keep track of the n highest scoring alignments", but it appears the returned result is already limited to those alignments by slicing the list with [:n]. Is this an old comment that can be removed, or is the goal to replace the hits list with something like a heap, so that only the n highest-scoring elements are kept in memory at any one time?

  • option 1: remove the comment
  • option 2: replace the list with a heap
  • option n: ???
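
To make option 2 concrete, here is a minimal sketch of a bounded min-heap that never holds more than n entries. The function name `top_n`, the `(score, -index, hit)` entry layout, and the toy `hits` data are assumptions for illustration; none of them come from the book's code.

```python
import heapq

def top_n(scored_hits, n):
    """Return the n highest-scoring hits, holding at most n in memory."""
    heap = []  # min-heap of (score, -arrival_index, hit)
    for index, hit in enumerate(scored_hits):
        # -index breaks ties in favour of earlier hits, and guarantees
        # that the hit objects themselves are never compared.
        entry = (hit[1], -index, hit)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        else:
            # Push the new entry, then drop whichever entry is smallest.
            heapq.heappushpop(heap, entry)
    return [hit for _, _, hit in sorted(heap, reverse=True)]

# Toy (id, score) hits, purely illustrative.
hits = [("a", 3.0), ("b", 9.0), ("c", 1.0), ("d", 7.0)]
best = top_n(iter(hits), 2)
```

Because the heap is capped at n entries, memory use depends on n rather than on the size of the reference database.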

Assuming option 2, the following should restrict memory usage to the top n results, since nlargest consumes the generator one hit at a time and keeps at most n of them:

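from heapq import nlargest

# aligner and n come from the enclosing local_alignment_search function.
def yield_hits(query, reference_db):
    # Yield one hit per reference so the full list is never built.
    for reference in reference_db:
        aln, score, _ = aligner(query, reference)
        yield [reference.metadata['id'], score, aln, reference.metadata['taxonomy']]

# Returns the n highest-scoring hits, sorted from highest to lowest score.
best_hits = nlargest(n, yield_hits(query, reference_db), key=lambda e: e[1])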
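
As a quick sanity check, the sketch below compares nlargest over a generator with the current sorted(...)[:n] approach on toy data. `fake_aligner`, the dict-based references, and the extra `aligner` parameter on `yield_hits` are hypothetical stand-ins for the real aligner and sequence objects; only `heapq.nlargest` and `sorted` are real API here.

```python
from heapq import nlargest

def fake_aligner(query, reference):
    # Hypothetical stand-in: the score is the number of matching positions.
    score = sum(q == r for q, r in zip(query, reference["seq"]))
    return reference["seq"], score, None

def yield_hits(query, reference_db, aligner):
    # Same shape as the proposed generator, with the aligner passed in.
    for reference in reference_db:
        aln, score, _ = aligner(query, reference)
        yield [reference["id"], score, aln, reference["taxonomy"]]

# Toy reference database, purely illustrative.
reference_db = [
    {"id": "r1", "seq": "ACGT", "taxonomy": "t1"},
    {"id": "r2", "seq": "ACGA", "taxonomy": "t2"},
    {"id": "r3", "seq": "TTTT", "taxonomy": "t3"},
    {"id": "r4", "seq": "ACTT", "taxonomy": "t4"},
]
query, n = "ACGT", 2

with_heap = nlargest(n, yield_hits(query, reference_db, fake_aligner),
                     key=lambda e: e[1])
with_sort = sorted(yield_hits(query, reference_db, fake_aligner),
                   key=lambda e: e[1], reverse=True)[:n]
```

Both approaches should select the same hits in the same order, so the change would affect memory use only, not the results.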

If option 2 looks good I will submit a pull request.

https://github.com/caporaso-lab/An-Introduction-To-Applied-Bioinformatics/blob/e1e4beb1750b5be179470ee37fd42c55bce9f889/iab/algorithms/__init__.py#L633-L636
