This repository is a collection of Jupyter notebooks that contain code relevant to the proposed talk on Inferential Statistics with Python at the SciPy India Conference in November 2017.
The notebooks are as follows:
- descriptive_primer.ipynb: The Descriptive Statistics Notebook. Explains measures of central tendency, measures of spread, the Binomial Distribution, the Normal Distribution, the Normality test, Z-Scores and P-Values.
- sampling.ipynb: The Sampling Notebook. Explains the Central Limit Theorem, estimation of a proportion from a sample and estimation of a mean from a sample.
- hypothesis.ipynb: The Hypothesis Testing Notebook. Explains one-sample and two-sample significance tests, tests for mean(s), tests for proportion(s) and the Chi-Square Significance Test.
- correlation.ipynb: The Correlation, Scatter Plot and Linear Regression Notebook. Explains correlation, scatter plots and linear regression, along with heatmaps, pairplots and the scikit-learn implementation of a Linear Regressor.
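
To give a flavour of the topics listed above, here is a minimal sketch of the Central Limit Theorem and a one-sample test for a mean. It is not taken from the notebooks: the synthetic exponential population, the sample size of 50 and the variable names are all assumptions made for illustration, and it relies only on NumPy and SciPy.

```python
import numpy as np
from scipy import stats

# Fixed seed so the illustration is reproducible.
rng = np.random.default_rng(0)

# Hypothetical skewed population (exponential, true mean 2.0).
# This is synthetic data, not one of the datasets listed in this README.
population = rng.exponential(scale=2.0, size=100_000)

# Central Limit Theorem: draw many samples of size 50 and record each mean.
# The sample means cluster around the population mean with a much smaller spread.
sample_means = np.array(
    [rng.choice(population, size=50).mean() for _ in range(2_000)]
)

# One-sample t-test: is the mean of a single sample consistent with 2.0?
sample = rng.choice(population, size=50)
t_stat, p_value = stats.ttest_1samp(sample, popmean=2.0)

print(f"population mean:      {population.mean():.3f}")
print(f"mean of sample means: {sample_means.mean():.3f}")
print(f"std of sample means:  {sample_means.std(ddof=1):.3f}")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

The spread of `sample_means` should be close to the population standard deviation divided by the square root of the sample size, which is the result the sampling notebook builds on.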
The following datasets have been used:
- 2016 Olympics in Rio de Janeiro: Athletes, medals, and events from summer games. Uploaded by Rio 2016 on Kaggle. Available at https://www.kaggle.com/rio2016/olympic-games#_=_
- Credit Card Fraud Detection: Anonymized credit card transactions labeled as fraudulent or genuine. Uploaded by Andrea on Kaggle. Available at https://www.kaggle.com/dalpozz/creditcardfraud
- Suicides in India: Suicides in each state, classified according to various parameters, from 2001-12. Uploaded by Rajanand Illangovan on Kaggle. Available at https://www.kaggle.com/rajanand/suicides-in-india
- NBA Players Stats - 2014-2015: Points, Assists, Height, Weight and other personal details and stats. Uploaded by DrGuillermo on Kaggle. Available at https://www.kaggle.com/drgilermo/nba-players-stats-20142015
- Top 500 Indian Cities: What story do the top 500 cities of India tell to the world? Uploaded by Arijit Mukherjee on Kaggle. Available at https://www.kaggle.com/zed9941/top-500-indian-cities
- Airbnb New User Bookings: Where will a new guest book their first travel experience? Uploaded by Airbnb on Kaggle. Available at https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings/data
- Racial Discrimination in the Job Market: Are Emily and Greg More Employable Than Lakisha and Jamal? Uploaded by the American Economic Association. Available at https://www.aeaweb.org/articles?id=10.1257/0002828042002561
- The Iris Dataset: Classify iris plants into three species in this classic dataset. Available in the scikit-learn library.
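
Since the Iris dataset ships with scikit-learn, it can be loaded without any download. The sketch below is one possible way to do so and to run a simple correlation and linear regression on it. The choice of petal length and petal width as the two variables is an assumption for illustration, not necessarily what correlation.ipynb does.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

# Load the bundled dataset: 150 flowers, 4 numeric features.
iris = load_iris()
X = iris.data
print(iris.feature_names)

# Columns 2 and 3 are petal length and petal width (in cm).
petal_length = X[:, 2]
petal_width = X[:, 3]

# Pearson correlation coefficient between the two features.
r = np.corrcoef(petal_length, petal_width)[0, 1]

# Fit petal width as a linear function of petal length.
# scikit-learn expects a 2-D feature matrix, hence the reshape.
model = LinearRegression()
model.fit(petal_length.reshape(-1, 1), petal_width)
r_squared = model.score(petal_length.reshape(-1, 1), petal_width)

print(f"correlation r = {r:.3f}")
print(f"slope = {model.coef_[0]:.3f}, intercept = {model.intercept_:.3f}")
print(f"R^2 = {r_squared:.3f}")
```

For a regression with a single predictor and an intercept, the reported R^2 equals the square of the correlation coefficient, which offers a quick consistency check.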