Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bond deletion #4763

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

lilyminium
Copy link
Member

@lilyminium lilyminium commented Oct 25, 2024

Fixes #4762

Changes made in this Pull Request:

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4763.org.readthedocs.build/en/4763/

@pep8speaks
Copy link

pep8speaks commented Oct 25, 2024

Hello @lilyminium! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-10-26 04:28:34 UTC

@@ -146,7 +146,7 @@ def parse():
struc = parse()

assert hasattr(struc, 'bonds')
assert len(struc.bonds.values) == 4
assert len(struc.bonds.values) == 2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These used to be bonds between [(0, 1), (0, 1), (1, 2), (1, 2)] -- so two bonds repeated. This change makes it so each bond is unique.

@@ -158,7 +158,7 @@ def parse():
with pytest.warns(UserWarning):
struc = parse()
assert hasattr(struc, 'bonds')
assert len(struc.bonds.values) == 2
assert len(struc.bonds.values) == 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above.

Copy link

codecov bot commented Oct 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.57%. Comparing base (d7153b9) to head (fa68282).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4763      +/-   ##
===========================================
- Coverage    93.59%   93.57%   -0.03%     
===========================================
  Files          177      189      +12     
  Lines        21708    22774    +1066     
  Branches      3052     3052              
===========================================
+ Hits         20318    21311     +993     
- Misses         943     1016      +73     
  Partials       447      447              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@IAlibay
Copy link
Member

IAlibay commented Oct 25, 2024

@lilyminium this PR is marked as draft - is it meant to be up for review or is there more work that needs to be done?

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good to me - however there's one likely outcome of this "behaviour change" that might have a real impact.

Some software intentionally misuse the PDB format's CONECT records to represent different bond orders - PyMol is a good example of this iirc. In those cases, my understanding is that with this change you would lose that information?

As far as I'm aware, the PDB writer would lose that information even if you have duplicate bonds because we call a set on the bonds of all the atoms (which should squash out duplicates?). So there's probably a good case to be made that support for multi-conect PDBs is not something we want to do anyways.

I'm not particularly sure that I care about the behaviour change, but I do want to cc @MDAnalysis/coredevs (especially @orbeckst @yuxuanzhuang and @RMeli ) who might have some more immediate thoughts on the whole thing.

@IAlibay
Copy link
Member

IAlibay commented Oct 25, 2024

Failures aren't happening in other PRs - it looks like something that has been specifically introduced here.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to revert to a set for the not in comparison, or do some other kind of trick to avoid the performance loss.

for v, t, g, o in zip(values, types, guessed, order):
if v not in existing:
if v not in self.values:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is likely the culprit - doing a not in operation on a list is orders of magnitude slower than on a set.

Example:

In [1]: import random

In [2]: lister = [random.random() for i in range(10000)]

In [3]: setter = set(lister)

In [4]: len(setter)
Out[4]: 10000

In [5]: len(lister)
Out[5]: 10000

In [6]: def notin(a):
   ...:     for v in range(10000):
   ...:         if v not in a:
   ...:             pass
   ...:     return
   ...: 

In [7]: %timeit notin(lister)
734 ms ± 5.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %timeit notin(setter)
129 μs ± 166 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

@lilyminium
Copy link
Member Author

lilyminium commented Oct 26, 2024

Hmm. Yes that would be the case that multi-bonds can't be sketchily represented by digging into Topology internals, which downstream people may well be doing.

I can think of two solutions to this in this PR:

  • put that information in the order attribute (which TopologyObjects should have) -- a bit of work, may have time later today to push it through but not guaranteed. Also, even more changes in current behaviour
  • This PR fixes the deletion issue twice over. It first only counts identical bonds once when creating /adding to the topology object. It secondly also deletes all identical bonds when we try to delete. If we take out the first and just ensure that deletion works, we should still fix the original issue. This is a lot less work

@lilyminium
Copy link
Member Author

Pushed a commit that removes the "addition" fixes and preserves previous behaviour (should be easy to revert if discussion goes the other way)

@lilyminium
Copy link
Member Author

I'm leaning towards the simpler fix of just ensuring that all identical bonds are deleted when deletion is caused, as it's less disruptive to current behaviour, and potentially fixing the addition of identical bonds in a separate issue/PR later, so I've updated the changelog too to reflect that -- should still be easy to revert if there's any disagreement

@lilyminium lilyminium marked this pull request as ready for review October 26, 2024 04:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

delete_bonds not deleting all bonds
3 participants