Guessing atom attributes based on atom names can lead to misinterpretations #452
Comments
I guess I shouldn't have used the bug issue... and now I can't seem to edit labels to mark it as a feature request. Oops.
@tubiana you may want to give some input on this, as you use a lot of Martini files.
Thanks for adding the label, Brady. To add a few examples:
A calcium ion might have the atom name CA, CAL, or CA2+, such as in a calmodulin structure (e.g. 1CKK). Without defined element information in the PDB file, the biotite guesser (here) would assign that atom as carbon. This happens for a number of other non-carbon atom names. A structure containing cadmium would presumably label the cadmium atom CD, which is ambiguous because CD is also used for the delta carbon in amino acid side chains. A similar assignment would happen from biotite for coarse-grained particles that begin with the letter
If a trajectory is loaded into Blender via MN, then MDAnalysis is used, which has its own method of guessing missing information from atom names (https://github.com/MDAnalysis/mdanalysis/blob/0582265996b392da382f658b7f0805ca250e1233/package/MDAnalysis/topology/guessers.py#L184-L230 and other functions therein). That guesser is a bit better because MDA has a dictionary that acknowledges a few ambiguous cases by mapping atom names to element symbols (https://github.com/MDAnalysis/mdanalysis/blob/0582265996b392da382f658b7f0805ca250e1233/package/MDAnalysis/topology/tables.py#L81-L173).
Would it be worth implementing a single guesser approach rather than two separate approaches depending on how files are loaded into Blender? Neither the biotite nor the MDA guessers require the objects associated with those modules. They take in strings and return strings. So, for example, when biotite is used to load PDB/CIF files, it will run its own guesser within the
Regarding the logging of guessing instances, I've found the Console Window for Blender, where warnings and log messages could be printed. I'm not certain this is commonly used, though. Maybe a verbosity level could be set in MN preferences that creates a log file to check when structures are loaded in and manipulated via MN nodes/functions? Just an idea.
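To make the "strings in, strings out" idea concrete, here is a minimal sketch of what a single shared guesser could look like. Everything in it is hypothetical: the function name guess_element, the two lookup tables, and their contents are illustrative assumptions, not code from biotite, MDAnalysis, or MN. It uses the residue name to resolve the CA and CD ambiguities mentioned above and returns None instead of guessing when it has no basis for a guess.

```python
from typing import Optional

# Hypothetical lookup of (residue name, atom name) pairs that are known to be
# ions rather than carbons. A real table would need to be much larger.
ION_RESIDUES = {
    ("CA", "CA"): "Ca",    # calcium ion stored as residue CA, atom CA
    ("CAL", "CAL"): "Ca",  # alternative calcium naming
    ("CD", "CD"): "Cd",    # cadmium ion stored as residue CD, atom CD
}

# Small, illustrative set of one-letter elements accepted by the fallback.
ONE_LETTER = {"H", "C", "N", "O", "S", "P"}


def guess_element(atom_name: str, res_name: str = "") -> Optional[str]:
    """Guess an element symbol from plain strings; return None if unsure."""
    # Drop digits and charge signs so that "CA2+" is treated like "CA".
    name = "".join(ch for ch in atom_name.upper() if ch.isalpha())
    res = res_name.strip().upper()
    if (res, name) in ION_RESIDUES:
        return ION_RESIDUES[(res, name)]
    if not name:
        return None
    # Naive fallback: first letter only, and only for the listed elements.
    first = name[0]
    return first if first in ONE_LETTER else None


print(guess_element("CA", "ALA"))   # alpha carbon in alanine
print(guess_element("CA2+", "CA"))  # calcium ion
print(guess_element("BB", "ALA"))   # coarse-grained bead, no confident guess
```

Because the function only touches strings, the same call could in principle be applied to atom names coming from either import path. Returning None also gives the caller a natural place to record that no confident guess was available.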
Depending on the files being read into Blender to instantiate a Molecule object, various atom attributes that the MN Molecule object expects are likely to be missing. For example, most if not all input file types accepted by MN lack information about the vdW radii of atoms in the system. Since this and other attributes are used in MN visualization nodes, they must be included when the Molecule object is created. So when information about atoms is missing, guesses are made. And since there is a massive amount of diversity in atom naming and in the formatting of the various input file types, the guesses can lead to incorrect atom attributes being assigned.
For example, a PDB file may or may not include element symbols in columns 77-78. If the element information is not present, then it is guessed during the biotite.structure.io.pdb.get_structure() call (MolecularNodes/molecularnodes/io/parse/pdb.py, lines 20 to 28 in 0a9d54a). The atomic_number, vdw_radii, and mass attributes are tied to the element assignment given by biotite. If the wrong element is guessed, then those attributes will be assigned incorrect values.
When the MD import method is used, MDAnalysis is used to parse the input files and create the Molecule object instead of biotite. As above, if information is missing from the input files, then MDA has its own suite of guesser functions that are used to fill in missing information: https://docs.mdanalysis.org/2.7.0/documentation_pages/topology/guessers.html#module-MDAnalysis.topology.guessers. There are numerous issues (e.g. MDAnalysis/mdanalysis#3704) and a recent pull request (MDAnalysis/mdanalysis#3753) for improving the guesser functions. The PR is expected to be included in the upcoming (soon™) MDA 3.0 version.
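The propagation described above can be illustrated with a small standalone sketch. It does not call biotite or MDAnalysis. The helper names (atom_line, read_element, naive_guess, element_or_guess) and the PROPERTIES table are assumptions made up for this example, and the property values are approximate. The sketch reads the element field from columns 77-78 of a fixed-width PDB record and falls back to a first-letter guess from the atom name when that field is blank.

```python
# Approximate values for illustration only:
# element -> (atomic number, mass in u, vdW radius in angstrom)
PROPERTIES = {"C": (6, 12.011, 1.70), "Ca": (20, 40.078, 2.31)}


def atom_line(name, res_name, element=""):
    """Build a fixed-width HETATM record; the element sits in columns 77-78."""
    return (
        f"{'HETATM':<6}{1:>5} {name:<4} {res_name:>3} {'A'}{1:>4}{'':4}"
        f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{0.0:>6.2f}{'':10}{element:>2}"
    )


def read_element(line):
    """Return the element symbol from columns 77-78, or '' if the field is blank."""
    return line[76:78].strip().capitalize()


def naive_guess(line):
    """First-letter guess from the atom name in columns 13-16 (deliberately naive)."""
    return line[12:16].strip()[:1].upper()


def element_or_guess(line):
    """Return (element, was_guessed) for one record."""
    element = read_element(line)
    if element:
        return element, False
    return naive_guess(line), True


with_element = atom_line("CA", "CA", "CA")  # calcium ion, element field filled in
without_element = atom_line("CA", "CA")     # same atom, element field left blank

for line in (with_element, without_element):
    element, guessed = element_or_guess(line)
    number, mass, radius = PROPERTIES[element]
    print(element, guessed, number, mass, radius)
```

The same calcium record ends up with carbon's atomic number, mass, and radius once the element field is missing, which is the kind of silent change that would carry through to the visualization.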
All of this is to highlight that guesses are being made for attributes that propagate into visualizations. The use of guessers is a nuanced and complex cheminformatics problem made harder by a plethora of naming conventions, force fields, and coarse-grained models.
I definitely don't think it's on MolecularNodes to solve this issue. But potential improvements could be made in logging or reporting instances where guesses are made, or are suspected to have been made, so that users can check the results for themselves. Is there a message interface or log file that MN users can look at to see warning messages?
For example, biotite has implemented a warning that reports the number of instances where element guesses were made. I'm not sure where this print statement would be visible when loading a structure using MN. And since element guesses propagate to the atomic_number, vdw_radii, and mass attributes in MN, there's the opportunity to warn users that guesses may have affected those attributes.
I think this is low priority but important to highlight, especially since incorrect guesses can propagate to incorrect visualizations.
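One possible shape for such reporting, sketched with the Python standard library only: capture warnings raised while a loader runs and forward them to a logger. This assumes the parsing library emits its message through Python's warnings module. The names run_and_log_warnings and fake_loader, and the logger name, are hypothetical, and fake_loader merely stands in for a real structure loader.

```python
import io
import logging
import warnings


def run_and_log_warnings(func, logger, *args, **kwargs):
    """Call func, forward any warnings it raises to logger, return (result, count)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let filters hide repeated warnings
        result = func(*args, **kwargs)
    for w in caught:
        logger.warning("%s: %s", w.category.__name__, w.message)
    return result, len(caught)


def fake_loader(n_guessed):
    """Stand-in for a structure loader that warns when elements were guessed."""
    if n_guessed:
        warnings.warn(f"{n_guessed} elements were guessed from atom names")
    return "structure"


# Log to an in-memory stream here; a logging.FileHandler could be used instead
# to produce a log file that users can inspect afterwards.
stream = io.StringIO()
logger = logging.getLogger("mn_guess_demo")
logger.setLevel(logging.WARNING)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

result, n_warnings = run_and_log_warnings(fake_loader, logger, 3)
print(n_warnings)
print(stream.getvalue().strip())
```

A wrapper like this leaves the loader itself untouched, so whether the messages go to the Blender console, a file, or nowhere could be decided by a preference setting.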