This repository contains a Python script that runs from the command line.
It takes two arguments: a regional OpenAddresses zip file (point the script at the us folder inside it) and a .csv file that it will write data into.
For example:
python OpenAddresses_metrics.py openaddr-collected-us_northeast/us/ data.csv
It then writes the following fields into that file:
- State: The state abbreviation or file name
- Total Rows: The number of rows in the file
- Good: The number of good rows in the file. A row is good when the lat, lon, number, and street fields are not blank, the row contains no quotation marks, the number field has at least one digit and is not 0 or a negative number, and the row is not the header row of field descriptors.
- City: The number of good rows in the file with a city.
- Zip: The number of good rows in the file with a zip code.
- Both: The number of good rows in the file with both a city and zip code.
- Parsing: The number of rows in the file with a quotation mark as a proxy for parsing problems.
- 'PO': The number of rows in the file with no digits in the number field as a proxy for parsing and data problems.
- '-9s': The number of rows in the file with a negative number in the number field.
- Missing Fields: The number of rows in the file with fewer than 9 fields.
- Bad Zip: The number of rows in the file that have a zip code with fewer than five digits.
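
The "Good" rule above combines several checks, so a small sketch may help make it concrete. This is not the code in OpenAddresses_metrics.py, only an illustration under stated assumptions: each row is a dict keyed by lowercase field names (lat, lon, number, street), the header row is recognized by the number field holding the literal text "number", and the function name is_good_row is hypothetical.

```python
# Illustrative sketch only; names and row layout are assumptions, not the script's own code.
REQUIRED = ("lat", "lon", "number", "street")


def is_good_row(row):
    """Return True if a row (dict of field name -> string) passes the 'Good' checks."""
    fields = {name: (row.get(name) or "").strip() for name in REQUIRED}

    # lat, lon, number, and street must not be blank.
    if any(value == "" for value in fields.values()):
        return False

    # A quotation mark anywhere in the row is treated as a parsing problem.
    if any('"' in (value or "") for value in row.values()):
        return False

    number = fields["number"]

    # Assumed header check: the header row carries the field name itself.
    if number.lower() == "number":
        return False

    # The number field needs at least one digit (rules out values like "PO BOX").
    if not any(ch.isdigit() for ch in number):
        return False

    # Negative numbers such as -9 are rejected.
    if number.startswith("-"):
        return False

    # Zero is rejected; values like "12A" are not numeric and are allowed through.
    try:
        if float(number) == 0:
            return False
    except ValueError:
        pass

    return True
```

Under these assumptions, a row with number "12A" counts as good, while rows with number "0", "-9", or "PO BOX" do not.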
The script also has one optional output: Summary ('-s', '--summary'). If the summary flag is turned on, the script takes another file as an argument, where the summary data will be written. It then returns:
- The number of good rows in a statewide file
- The number of good rows in other files
- The number of good rows with zips, taken from either the statewide file or the other files, whichever has more good rows with zips
- The number of good rows with cities, taken from either the statewide file or the other files, whichever has more good rows with cities
- The number of good rows with zips and cities, taken from either the statewide file or the other files, whichever has more good rows with zips and cities
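
The summary logic described above can be sketched roughly as follows. This is a guess at the shape of the calculation, not the script's actual implementation: it assumes the per-file counts are already available as dicts with the hypothetical keys statewide (a boolean), good, zip, city, and both, and that the counts from the non-statewide files are added together before being compared with the statewide file.

```python
# Illustrative sketch only; the record layout and key names are assumptions.
def summarize(records):
    """Combine per-file counts into the five summary values."""
    statewide = [r for r in records if r["statewide"]]
    other = [r for r in records if not r["statewide"]]

    def total(rows, key):
        # Add up one count across a group of files.
        return sum(r[key] for r in rows)

    return {
        "good_statewide": total(statewide, "good"),
        "good_other": total(other, "good"),
        # For each of these, keep whichever group has the larger count.
        "zip": max(total(statewide, "zip"), total(other, "zip")),
        "city": max(total(statewide, "city"), total(other, "city")),
        "both": max(total(statewide, "both"), total(other, "both")),
    }


example = [
    {"statewide": True, "good": 100, "zip": 40, "city": 90, "both": 30},
    {"statewide": False, "good": 60, "zip": 50, "city": 20, "both": 10},
    {"statewide": False, "good": 30, "zip": 5, "city": 10, "both": 5},
]
summary = summarize(example)
```

In this made-up example the other files win on zips (55 against 40), while the statewide file wins on cities (90 against 30) and on rows with both (30 against 15).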