Commit

merge with dev
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
IsaacMilarky committed Aug 1, 2024
2 parents 4b2911d + 6c475ca commit 3ea875e
Showing 4 changed files with 30 additions and 2 deletions.
8 changes: 8 additions & 0 deletions README.md
@@ -170,6 +170,14 @@ the American public, but you are also welcome to submit anonymously.

For more information about our Security, Vulnerability, and Responsible Disclosure Policies, see [SECURITY.md](SECURITY.md).

### Software Bill of Materials (SBOM)

A Software Bill of Materials (SBOM) is a formal record containing the details and supply chain relationships of various components used in building software.

In the spirit of [Executive Order 14028 - Improving the Nation’s Cybersecurity](https://www.gsa.gov/technology/it-contract-vehicles-and-purchasing-programs/information-technology-category/it-security/executive-order-14028), an SBOM for this repository is provided here: https://github.com/DSACMS/dedupliFHIR/network/dependencies.

For more information and resources about SBOMs, visit: https://www.cisa.gov/sbom.

## Public domain

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the [CC0 1.0 Universal public domain dedication](https://creativecommons.org/publicdomain/zero/1.0/) as indicated in our [LICENSE](LICENSE).
10 changes: 9 additions & 1 deletion cli/deduplifhirLib/normalization.py
100644 → 100755
@@ -4,6 +4,7 @@
import re
from dateutil import parser as date_parser
from dateutil.parser import ParserError
from text_to_num import alpha2digit

NAME_ABBREVIATION_SYMBOLS = {
' jr ': 'junior',
@@ -307,6 +308,10 @@ def normalize_addr_text(input_text):
    """
    text_copy = input_text
    #text_copy = british_to_american(text_copy) not needed
    try:
        text_copy = alpha2digit(text_copy,"en")
    except ValueError:
        ...
    text_copy = remove_non_alphanum(text_copy)
    print(text_copy)
    text_copy = replace_abbreviations(text_copy.lower())
@@ -318,9 +323,12 @@ def normalize_addr_text(input_text):
NAME_TEXT = "Greene,Jacquleine"
print(normalize_name_text(NAME_TEXT))

- PLACE_TEXT = "7805 Kartina Motorawy Apt. 313,Taylorstad,New Hampshire"
+ PLACE_TEXT = "7805 Kartina Motorawy Apt. three hundred thirteen ,Taylorstad,New Hampshire"

print(normalize_addr_text(PLACE_TEXT))

DATE_TEXT = "December 10, 1999"
print(normalize_date_text(DATE_TEXT))

NUM_TEXT = "I have one hundred twenty three apples and forty-five oranges. Valetnine"
print(alpha2digit(NUM_TEXT,'en'))
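
The change to `normalize_addr_text` above wraps the number-word conversion in a `try`/`except ValueError` so that a failed conversion leaves the address text untouched. A minimal sketch of that fallback pattern follows. The names `convert_or_keep`, `fake_convert`, and `failing_convert` are hypothetical and exist only for illustration; the commit calls `alpha2digit` from `text_to_num` directly, and the stand-in converters here only imitate its role so the example runs without that dependency.

```python
from typing import Callable


def convert_or_keep(
    text: str, convert: Callable[[str, str], str], lang: str = "en"
) -> str:
    """Apply convert(text, lang); return text unchanged if it raises ValueError."""
    try:
        return convert(text, lang)
    except ValueError:
        # Same outcome as the diff: on failure the original text is kept.
        return text


# Hypothetical stand-ins for illustration only (not part of the commit).
def fake_convert(text: str, lang: str) -> str:
    """Pretend converter that only knows one number phrase."""
    return text.replace("three hundred thirteen", "313")


def failing_convert(text: str, lang: str) -> str:
    """Pretend converter that always rejects its input."""
    raise ValueError("cannot convert")


ADDRESS = "Apt. three hundred thirteen"
print(convert_or_keep(ADDRESS, fake_convert))
print(convert_or_keep(ADDRESS, failing_convert))
```

Returning the value from the helper, rather than reassigning inside a bare `except: ...`, makes the fallback explicit. In the project itself the converter argument would presumably be `alpha2digit`.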
13 changes: 12 additions & 1 deletion poetry.lock

Generated file; diff not rendered.

1 change: 1 addition & 0 deletions pyproject.toml
@@ -16,6 +16,7 @@ openpyxl = "^3.1.2"
lxml = "^5.2.2"
pyarrow = "^16.1.0"
python-dateutil = "^2.9.0.post0"
text2num = "^2.5.1"

[tool.poetry.dev-dependencies]
black = "^24.4.2"