More Flexible Vasprun Parsing #4075

Open
kavanase opened this issue Sep 20, 2024 · 0 comments
Labels
enhancement A new feature or improvement to an existing one

Comments

Contributor

kavanase commented Sep 20, 2024

Feature Requested

Currently the Vasprun parser fails with a ParseError if the vasprun.xml file is incomplete. Setting exception_on_bad_xml=False avoids the exception, but then most of the information is not parsed, even though it is still present in the file. Incomplete files occur fairly often when large VASP calculations complete the SCF cycles but then crash at the last moment while writing long eigenvalue outputs or wavefunctions, due to memory or disk space issues etc.
It would be very useful if this information could still be pulled from the vasprun file, particularly for large calculations where re-running the whole calculation just to get a properly formatted output is quite inefficient (e.g. hybrid+SOC single-shot calculations on a large supercell which crashed at the last moment, with no wavefunction output).

Proposed Solution

This functionality should be achievable relatively easily by handling the XML elements which were left unclosed when the file was truncated.

As a rough demonstration of one possible approach, this code can be used to determine the current tag stack:

import xml.etree.ElementTree as ET

def validate_tags(file_path):
    """Print and return the stack of tags still open when parsing stops."""
    tag_stack = []
    try:
        with open(file_path, "rb") as file:
            for event, elem in ET.iterparse(file, events=("start", "end")):
                if event == "start":
                    tag_stack.append(elem.tag)
                elif event == "end":
                    if tag_stack and tag_stack[-1] == elem.tag:
                        tag_stack.pop()
                    else:
                        print(f"Mismatched tag found: {elem.tag}")
                        break

    except ET.ParseError as e:
        # tag_stack may be empty if the error occurs before any element opens
        last_tag = tag_stack[-1] if tag_stack else None
        print(f"Parse error: {e}. Missing closing tag for {last_tag} if stack is not empty.")
        if tag_stack:
            print(f"Current tag stack: {tag_stack}")

    return tag_stack

which in the example partially-complete vasprun.xml I've provided gives:

Parse error: no element found: line 9455, column 0. Missing closing tag for set if stack is not empty.
Current tag stack: ['modeling', 'calculation', 'eigenvalues', 'array', 'set', 'set', 'set']

If I then append these tags to a copy of the loaded file object, parsing can proceed without issue, loading all the information available in the (incomplete) vasprun.xml:

import xml.etree.ElementTree as ET

# open file and append closing tags for any missing ones:
# current tag stack: ['modeling', 'calculation', 'eigenvalues', 'array', 'set', 'set', 'set']
file_path = "vasprun.xml"
ionic_steps = []
with open(file_path, 'a+') as file:
    # TODO: This should be a temp file copy, so as not to modify file on system
    # append closing tags for the open elements, innermost first:
    file.write(
        "</set>\n</set>\n</set>\n</array>\n</eigenvalues>\n</calculation>\n</modeling>"
    )

    file.seek(0)  # move the file pointer back to the beginning to read content

    # NOTE: vr and _parse_ionic_step are not defined in this snippet; they stand
    # for the Vasprun object and its ionic-step parsing routine
    for event, elem in ET.iterparse(file, events=["start", "end"]):
        tag = elem.tag
        # only parse an element once it is complete, i.e. on its "end" event
        if event == "end" and tag == "calculation":
            parsed_header = True
            ionic_steps.append(_parse_ionic_step(vr, elem))
    ...

There are presumably far smarter ways of doing this; this is just a rough example showing how it could be achieved.
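For instance, the same idea can be sketched without modifying the file on disk, by working on the file contents in memory. The function name close_open_tags is hypothetical and not part of pymatgen. The sketch assumes that everything after the last '>' is a half-written tag or value that can be discarded, and that no '>' appears inside attribute values or comments near the truncation point:

```python
import io
import xml.etree.ElementTree as ET


def close_open_tags(data: bytes) -> bytes:
    """Return a copy of truncated XML with the missing closing tags appended.

    Hypothetical helper for illustration only; not an existing pymatgen function.
    """
    # Assumption: anything after the last '>' is an incomplete tag or value.
    data = data[: data.rfind(b">") + 1]

    # XMLPullParser.feed() does not raise on input that merely ends early;
    # genuinely malformed XML surfaces as ParseError from read_events().
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(data)

    stack = []
    for event, elem in parser.read_events():
        if event == "start":
            stack.append(elem.tag)
        else:
            stack.pop()

    # Close the open elements, innermost first.
    closing = "".join(f"</{tag}>\n" for tag in reversed(stack))
    return data + closing.encode()


# Tiny stand-in for a truncated vasprun.xml:
truncated = b"<modeling><calculation><set><r>1 2</r><r>3 "
repaired = close_open_tags(truncated)
root = ET.parse(io.BytesIO(repaired)).getroot()
```

The repaired bytes could then be passed to the existing iterparse loop through io.BytesIO instead of a file handle. Note that the pull parser builds the element tree in memory, so for very large files a streaming variant that clears elements as they close may be preferable.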

This could be enabled only when the user sets exception_on_bad_xml=False, which already emits a warning if the vasprun.xml is incomplete.

Example vasprun.xml where this is desirable:
vasprun.xml.gz

Relevant Information

A truncated file often also contains truncated arrays (e.g. of the eigenvalues/DOS), so this might require a quick check that all arrays are of the expected size, dropping any that are not?
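As a rough sketch of what such a check could look like: the helper below reads the <r> rows of one <set> element and returns None unless the shape matches what is expected. The name rows_if_complete and its parameters are hypothetical, and the expected sizes (e.g. number of bands by two columns for eigenvalue and occupation) would have to come from elsewhere in the parsed file:

```python
import xml.etree.ElementTree as ET


def rows_if_complete(set_elem, expected_rows, expected_cols):
    """Return the numeric rows of a <set> element, or None if its shape is wrong.

    Hypothetical helper for illustration only; not an existing pymatgen function.
    """
    rows = [
        [float(value) for value in (r.text or "").split()]
        for r in set_elem.findall("r")
    ]
    complete = len(rows) == expected_rows and all(
        len(row) == expected_cols for row in rows
    )
    return rows if complete else None


# A complete 2x2 block, and one whose last row was cut off during writing:
complete_set = ET.fromstring("<set><r>1.0 2.0</r><r>3.0 1.0</r></set>")
truncated_set = ET.fromstring("<set><r>1.0 2.0</r><r></r></set>")

kept = rows_if_complete(complete_set, expected_rows=2, expected_cols=2)
dropped = rows_if_complete(truncated_set, expected_rows=2, expected_cols=2)
```

Returning None rather than a partial array keeps downstream code from silently using incomplete eigenvalue or DOS data.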

@kavanase kavanase added the enhancement A new feature or improvement to an existing one label Sep 20, 2024