Skip to content

Beloved6883/ETL_MODULES

Repository files navigation

ETL_MODULES is a repository for modules containing classes for my extract, transform, and load (ETL) process. The goal is to create modules that can be used for a variety of ETL jobs.

Creator: Natasha P. Wilson, PhD

ETL_Modules

Project structure

.
├── README.md (this current file, which provides information about this repository)
├── data_cleaning_classes (this directory contains classes for data ingestion and loading)
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-312.pyc
│   │   └── data_ingestion.cpython-312.pyc
│   └── data_ingestion.py (this Python file contains the class Ingestion, which imports multiple file types into pandas dataframes: CSVs, EXCEL, JSON, TEXT, ZIP)
├── name_finder.py (This Python file contains functions for finding the most frequent male and female names assigned to social security numbers between 1880 - 2023 by sex only and by sex and year of birth)
├── requirements.txt (contains all the python dependencies used in this repo)
├── social_security_data (this directory contains some data from the Social Security Administration retrieved from data.gov)
│   ├── DeathProbsE_Female_Hist_TR2014_1900_to_2010.numbers (not used for the project; just keeping in this directory)
│   ├── DeathProbsE_Females_Alt2_TR2014_2011_to_2090.numbers (not used for the project; just keeping in this directory)
│   ├── DeathProbsE_Males_Alt2_TR2014_2011_to_2090.numbers (not used for the project; just keeping in this directory)
│   ├── DeathProbsE_Males_Hist_TR2014_1900_to_2010.numbers (not used for the project; just keeping in this directory)
│   ├── NCHS_-_Death_rates_and_life_expectancy_at_birth.numbers (not used for the project; just keeping in this directory)
│   ├── NationalReadMe.pdf (information about the dataset from the Social Security Administration)
│   ├── names (this directory contains the 144 text files 
containing the first names, sex, and frequency of the name from the Social Security Administration from years 1880 - 2023. Each file name is: yobYYYY.txt)
│   │   ├── yob1880.txt
│   │   ├── yob1881.txt
│   │   ├── yob1882.txt
│   │   ├── yob1883.txt
│   │   ├── yob1884.txt
│   │   ├── yob1885.txt
│   │   ├── yob1886.txt
│   │   ├── yob1887.txt
│   │   ├── yob1888.txt
│   │   ├── yob1889.txt
│   │   ├── yob1890.txt
│   │   ├── yob1891.txt
│   │   ├── yob1892.txt
│   │   ├── yob1893.txt
│   │   ├── yob1894.txt
│   │   ├── yob1895.txt
│   │   ├── yob1896.txt
│   │   ├── yob1897.txt
│   │   ├── yob1898.txt
│   │   ├── yob1899.txt
│   │   ├── yob1900.txt
│   │   ├── yob1901.txt
│   │   ├── yob1902.txt
│   │   ├── yob1903.txt
│   │   ├── yob1904.txt
│   │   ├── yob1905.txt
│   │   ├── yob1906.txt
│   │   ├── yob1907.txt
│   │   ├── yob1908.txt
│   │   ├── yob1909.txt
│   │   ├── yob1910.txt
│   │   ├── yob1911.txt
│   │   ├── yob1912.txt
│   │   ├── yob1913.txt
│   │   ├── yob1914.txt
│   │   ├── yob1915.txt
│   │   ├── yob1916.txt
│   │   ├── yob1917.txt
│   │   ├── yob1918.txt
│   │   ├── yob1919.txt
│   │   ├── yob1920.txt
│   │   ├── yob1921.txt
│   │   ├── yob1922.txt
│   │   ├── yob1923.txt
│   │   ├── yob1924.txt
│   │   ├── yob1925.txt
│   │   ├── yob1926.txt
│   │   ├── yob1927.txt
│   │   ├── yob1928.txt
│   │   ├── yob1929.txt
│   │   ├── yob1930.txt
│   │   ├── yob1931.txt
│   │   ├── yob1932.txt
│   │   ├── yob1933.txt
│   │   ├── yob1934.txt
│   │   ├── yob1935.txt
│   │   ├── yob1936.txt
│   │   ├── yob1937.txt
│   │   ├── yob1938.txt
│   │   ├── yob1939.txt
│   │   ├── yob1940.txt
│   │   ├── yob1941.txt
│   │   ├── yob1942.txt
│   │   ├── yob1943.txt
│   │   ├── yob1944.txt
│   │   ├── yob1945.txt
│   │   ├── yob1946.txt
│   │   ├── yob1947.txt
│   │   ├── yob1948.txt
│   │   ├── yob1949.txt
│   │   ├── yob1950.txt
│   │   ├── yob1951.txt
│   │   ├── yob1952.txt
│   │   ├── yob1953.txt
│   │   ├── yob1954.txt
│   │   ├── yob1955.txt
│   │   ├── yob1956.txt
│   │   ├── yob1957.txt
│   │   ├── yob1958.txt
│   │   ├── yob1959.txt
│   │   ├── yob1960.txt
│   │   ├── yob1961.txt
│   │   ├── yob1962.txt
│   │   ├── yob1963.txt
│   │   ├── yob1964.txt
│   │   ├── yob1965.txt
│   │   ├── yob1966.txt
│   │   ├── yob1967.txt
│   │   ├── yob1968.txt
│   │   ├── yob1969.txt
│   │   ├── yob1970.txt
│   │   ├── yob1971.txt
│   │   ├── yob1972.txt
│   │   ├── yob1973.txt
│   │   ├── yob1974.txt
│   │   ├── yob1975.txt
│   │   ├── yob1976.txt
│   │   ├── yob1977.txt
│   │   ├── yob1978.txt
│   │   ├── yob1979.txt
│   │   ├── yob1980.txt
│   │   ├── yob1981.txt
│   │   ├── yob1982.txt
│   │   ├── yob1983.txt
│   │   ├── yob1984.txt
│   │   ├── yob1985.txt
│   │   ├── yob1986.txt
│   │   ├── yob1987.txt
│   │   ├── yob1988.txt
│   │   ├── yob1989.txt
│   │   ├── yob1990.txt
│   │   ├── yob1991.txt
│   │   ├── yob1992.txt
│   │   ├── yob1993.txt
│   │   ├── yob1994.txt
│   │   ├── yob1995.txt
│   │   ├── yob1996.txt
│   │   ├── yob1997.txt
│   │   ├── yob1998.txt
│   │   ├── yob1999.txt
│   │   ├── yob2000.txt
│   │   ├── yob2001.txt
│   │   ├── yob2002.txt
│   │   ├── yob2003.txt
│   │   ├── yob2004.txt
│   │   ├── yob2005.txt
│   │   ├── yob2006.txt
│   │   ├── yob2007.txt
│   │   ├── yob2008.txt
│   │   ├── yob2009.txt
│   │   ├── yob2010.txt
│   │   ├── yob2011.txt
│   │   ├── yob2012.txt
│   │   ├── yob2013.txt
│   │   ├── yob2014.txt
│   │   ├── yob2015.txt
│   │   ├── yob2016.txt
│   │   ├── yob2017.txt
│   │   ├── yob2018.txt
│   │   ├── yob2019.txt
│   │   ├── yob2020.txt
│   │   ├── yob2021.txt
│   │   ├── yob2022.txt
│   │   └── yob2023.txt
│   ├── names_ranked_by_sex.csv (this csv contains names ranked by sex)
│   ├── names_ranked_by_year_sex.csv (this csv contains names ranked by sex and year)
│   ├── test_names.csv (this is a test file)
│   ├── top_female_names_by_year_sex.csv (this csv contains the top female names assigned to social security numbers each year)
│   └── top_male_names_by_year_sex.csv (this csv contains the top male names assigned to social security numbers each year)
├── test_data (this directory contains any data used to test functionality of the classes or functions)
│   └── test_excel.xlsx (a dummy data set for testing)
└── transform_social_security_names.py (this Ptyhon file utilizes the data_ingestion module, Ingestion Class, to ingestion 144 files containing first name, gender, and frequency of the names from the Social Security Administration from the years 1880 - 2023. This is for a particular project.)

Contributions: This repository is not currently taking contributions since this is for personal use. If you are interested in contributing to this repository, reach out to Natasha at natasha@originsconvo.com with your proposed contribution.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published