-
Notifications
You must be signed in to change notification settings - Fork 648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bond deletion #4763
base: develop
Are you sure you want to change the base?
Fix bond deletion #4763
Conversation
Hello @lilyminium! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2024-10-26 02:20:35 UTC |
@@ -146,7 +146,7 @@ def parse(): | |||
struc = parse() | |||
|
|||
assert hasattr(struc, 'bonds') | |||
assert len(struc.bonds.values) == 4 | |||
assert len(struc.bonds.values) == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These used to be bonds between [(0, 1), (0, 1), (1, 2), (1, 2)] -- so two bonds repeated. This change makes it so each bond is unique.
@@ -158,7 +158,7 @@ def parse(): | |||
with pytest.warns(UserWarning): | |||
struc = parse() | |||
assert hasattr(struc, 'bonds') | |||
assert len(struc.bonds.values) == 2 | |||
assert len(struc.bonds.values) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #4763 +/- ##
===========================================
- Coverage 93.59% 93.59% -0.01%
===========================================
Files 177 177
Lines 21708 21702 -6
Branches 3052 3049 -3
===========================================
- Hits 20318 20312 -6
Misses 943 943
Partials 447 447 ☔ View full report in Codecov by Sentry. |
@lilyminium this PR is marked as draft - is it meant to be up for review or is there more work that needs to be done? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good to me - however there's one likely outcome of this "behaviour change" that might have a real impact.
Some software intentionally misuse the PDB format's CONECT records to represent different bond orders - PyMol is a good example of this iirc. In those cases, my understanding is that with this change you would lose that information?
As far as I'm aware, the PDB writer would lose that information even if you have duplicate bonds because we call a set on the bonds of all the atoms (which should squash out duplicates?). So there's probably a good case to be made that support for multi-conect PDBs is not something we want to do anyways.
I'm not particularly sure that I care about the behaviour change, but I do want to cc @MDAnalysis/coredevs (especially @orbeckst @yuxuanzhuang and @RMeli ) who might have some more immediate thoughts on the whole thing.
Failures aren't happening in other PRs - it looks like something that has been specifically introduced here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to revert to a set for the not in
comparison, or do some other kind of trick to avoid the performance loss.
for v, t, g, o in zip(values, types, guessed, order): | ||
if v not in existing: | ||
if v not in self.values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is likely the culprit - doing a not in
operation on a list is orders of magnitude slower than on a set.
Example:
In [1]: import random
In [2]: lister = [random.random() for i in range(10000)]
In [3]: setter = set(lister)
In [4]: len(setter)
Out[4]: 10000
In [5]: len(lister)
Out[5]: 10000
In [6]: def notin(a):
...: for v in range(10000):
...: if v not in a:
...: pass
...: return
...:
In [7]: %timeit notin(lister)
734 ms ± 5.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %timeit notin(setter)
129 μs ± 166 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Hmm. Yes that would be the case that multi-bonds can't be sketchily represented by digging into Topology internals, which downstream people may well be doing. I can think of two solutions to this in this PR:
|
Pushed a commit that removes the "addition" fixes and preserves previous behaviour (should be easy to revert if discussion goes the other way) |
Fixes #4762
Changes made in this Pull Request:
PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4763.org.readthedocs.build/en/4763/