Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep all tags #50

Closed
wants to merge 6 commits into from
Closed

Conversation

ryder-cobean-nih
Copy link
Contributor

@ryder-cobean-nih ryder-cobean-nih commented Sep 17, 2024

#49

This pull request should cover a small change to rap_sitkCore that enables the user to retain tag values and tags themselves as seen when the file is loaded. rap_sitkCore's functions for converting color spaces and safely opening a file for use in processing pipelines are vital, but other tools perform the necessary anonymization. This enables the image to be opened with less tampering by the observer, if desired.

keep_all_tags defaults to FALSE so if omitted (as it would be in existing uses in other scripts), existing behavior remains the same.

usage:

from pathlib import Path
from rap_sitkCore import read_dcm

img = read_dcm(Path(path_to_dicom), keep_all_tags=TRUE)

defaults to FALSE for continuity with existing uses of the module. Enables user to bypass the default stripping of unsupported tags for use in applications in which other services perform anonymization.
Copy link
Collaborator

@blowekamp blowekamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lower level _read_dcm_pydicom function also only copies the white list tags, partly because support for converting all value representations (CR) to the SITK string representation was not implemented.

Please let me know if you would like to continue to work on the PR, or if I should address it.

for tag_name in _keyword_to_copy:
key = keyword_to_gdcm_tag(tag_name)
if key in img:
out[key] = img[key]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After srgb2gray is run the output image img contains none of the original DICOM tags.

When `keep_all_tags' is true, all the tags need to be copies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blowekamp made the requested changes - please review? I think this should cover it in the lower level function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When keep_all_tags is true, the tags still need to be copied from img to out.

else:
img[key] = str(float(de.value))
elif de.VR in ["CS", "UI"]:
img[key] = de.value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two blocks are very similar but have different behavior. Is this intentional? Were there files that has issues that needed to be addressed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out these changes, in latest commit:

def _get_string_representation(de: pydicom.dataelem.DataElement) -> str:
        """
        Get the string representation of the DICOM tag.
        
        Parameters:
        de (pydicom.dataelem.DataElement): The DICOM date element (a particular tag and its metadata).

        Returns:
        str: The string representation of the DICOM tag.
        """
        try:
            if de.value in [None, ""]:
                return ""
            elif de.VR == "DS":
                if de.VM > 1:
                    return convert_float_list_to_mv_ds(de.value)
                else:
                    return str(float(de.value))
            elif de.VR == "US":
                return str(int(de.value))
            else:
                return de.value
        except (TypeError, ValueError) as e:
            raise RuntimeError(
                f'"Error parsing data element "{de.name}" with value "{de.value}" '
                f'and value representation "{de.VR}". Error: {e}'
            )


def _read_dcm_pydicom(filename: Path, keep_all_tags: bool = False) -> sitk.Image:
    """
    Reading implementation with pydicom for DICOM
    """
    ds = pydicom.dcmread(filename)

    arr = ds.pixel_array

    if ds.PhotometricInterpretation == "MONOCHROME2":
        img = sitk.GetImageFromArray(arr, isVector=False)
    elif ds.PhotometricInterpretation == "MONOCHROME1":
        # only works with unsigned
        assert ds.PixelRepresentation == 0
        # use complement to invert the pixel intensity.
        img = sitk.GetImageFromArray(~arr, isVector=False)
    elif ds.PhotometricInterpretation in ["YBR_FULL_422", "YBR_FULL", "RGB"]:
        if ds.PhotometricInterpretation != "RGB":
            from pydicom.pixel_data_handlers.util import convert_color_space
            arr = convert_color_space(ds.pixel_array, ds.PhotometricInterpretation, "RGB")

        img = sitk.GetImageFromArray(arr, isVector=True)
    else:
        raise RuntimeError(f'Unsupported PhotometricInterpretation: "{ds.PhotometricInterpretation}"')

    # keep_all_tags is either all tags other than PixelData or the tags specified in 
    # _keyword_to_copy, provided they are present in the dataset
    if keep_all_tags:
        _keyword_to_copy = [elem.keyword for elem in ds if elem.keyword != "PixelData"]
    else:
        _keyword_to_copy = [keyword for keyword in _keyword_to_copy if keyword in ds]
    

    # iterate through all tags and copy the ones specified in _keyword_to_copy
    # to the SimpleITK image
    for tag in _keyword_to_copy:
        de = ds.data_element(tag)
        key = f"{de.tag.group:04x}|{de.tag.elem:04x}"
        img[key] = _get_string_representation(de)

    return img

introduce function _get_string_representation() which returns the string representation of a data element with proper error handling, and simplify definition of the _keyword_to_copy list based on whether keep_all_tags is True
Copy link
Collaborator

@blowekamp blowekamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a need for testing. The cases when I have seen the tags not being copied should be triggering some test failures.

if keep_all_tags:
_keyword_to_copy = [elem.keyword for elem in ds if elem.keyword != "PixelData"]
else:
_keyword_to_copy = [keyword for keyword in _keyword_to_copy if keyword in ds]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this going to overwrite the file local _keyword_to_copy? Please use a different variable name.

for tag_name in _keyword_to_copy:
key = keyword_to_gdcm_tag(tag_name)
if key in img:
out[key] = img[key]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When keep_all_tags is true, the tags still need to be copied from img to out.

…n whether the `keep_all_tags` parameter is True. Remove overwrite of _keyword_to_copy, Also make sure to copy out keys in case that image is RGB and keep_all_tags is True
…han keywords (resolves edge case in which a keyword is missing in the image with the tag present). Resolved issue of reusing variable.

When keep_all_tags is true, the tags still need to be copied from img to out when calling srgb2gray. This is now resolved.
@blowekamp
Copy link
Collaborator

Updated work here: #53

@blowekamp blowekamp closed this Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants