Inferential Statistics with Python

A Talk Proposal for SciPy India 2017 (IIT Bombay)

This repository is a collection of Jupyter notebooks that contain code relevant to the proposed talk on Inferential Statistics with Python at the SciPy India Conference in November 2017.

Notebooks

The notebooks are as follows:

  1. descriptive_primer.ipynb: The Descriptive Statistics Notebook. Explains measures of central tendency, measures of spread, the Binomial Distribution, the Normal Distribution, the normality test, Z-Scores and P-Values.
  2. sampling.ipynb: The Sampling Notebook. Explains the Central Limit Theorem, estimation of a proportion from a sample and estimation of a mean from a sample.
  3. hypothesis.ipynb: The Hypothesis Testing Notebook. Explains one-sample and two-sample significance tests, tests for mean(s), tests for proportion(s) and the Chi-Square Significance Test.
  4. correlation.ipynb: The Correlation, Scatter Plot and Linear Regression Notebook. Explains correlation, scatter plots and linear regression, along with heatmaps, pairplots and the scikit-learn implementation of a Linear Regressor.
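
To give a flavour of two ideas from the first two notebooks (Z-Scores with P-Values, and the Central Limit Theorem), here is a minimal sketch. It is not taken from the notebooks. It uses only Python's standard library so that it runs without any extra dependencies, and the helper names (`z_score`, `two_sided_p_value`, `sample_means`), the seed, and the simulated exponential population are all assumptions made for illustration.

```python
import random
import statistics
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean mu."""
    return (x - mu) / sigma


def two_sided_p_value(z):
    """Probability that a standard normal value is at least as extreme as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def sample_means(population, sample_size, n_samples, rng):
    """Means of n_samples random samples (drawn with replacement)."""
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


# Z-score and p-value for a hypothetical observation of 130 drawn from
# a normal population with mean 100 and standard deviation 15.
z = z_score(130, 100, 15)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")

# Central Limit Theorem: the population below is strongly skewed
# (exponential), yet the means of repeated samples cluster around the
# population mean with spread close to sigma / sqrt(n).
rng = random.Random(2017)  # fixed seed (arbitrary) so the run is repeatable
population = [rng.expovariate(1.0) for _ in range(10_000)]
means = sample_means(population, sample_size=50, n_samples=2_000, rng=rng)

print("population mean:      ", round(statistics.fmean(population), 3))
print("mean of sample means: ", round(statistics.fmean(means), 3))
print("sigma / sqrt(n):      ", round(statistics.pstdev(population) / 50 ** 0.5, 3))
print("std of sample means:  ", round(statistics.stdev(means), 3))
```

The last two printed values should be close to each other, which is the standard-error relationship that underlies estimating a mean from a sample.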

Datasets

The following datasets have been used:

  1. 2016 Olympics in Rio de Janeiro: Athletes, medals, and events from summer games. Uploaded by Rio 2016 on Kaggle. Available at https://www.kaggle.com/rio2016/olympic-games#_=_
  2. Credit Card Fraud Detection: Anonymized credit card transactions labeled as fraudulent or genuine. Uploaded by Andrea on Kaggle. Available at https://www.kaggle.com/dalpozz/creditcardfraud
  3. Suicides in India: Suicides in each state, classified according to various parameters, from 2001 to 2012. Uploaded by Rajanand Illangovan on Kaggle. Available at https://www.kaggle.com/rajanand/suicides-in-india
  4. NBA Players Stats - 2014-2015: Points, Assists, Height, Weight and other personal details and stats. Uploaded by DrGuillermo on Kaggle. Available at https://www.kaggle.com/drgilermo/nba-players-stats-20142015
  5. Top 500 Indian Cities: What story do the top 500 cities of India tell to the world? Uploaded by Arijit Mukherjee on Kaggle. Available at https://www.kaggle.com/zed9941/top-500-indian-cities
  6. Airbnb New User Bookings: Where will a new guest book their first travel experience? Uploaded by Airbnb on Kaggle. Available at https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings/data
  7. Racial Discrimination in the Job Market: Are Emily and Greg More Employable Than Lakisha and Jamal? Uploaded by the American Economic Association. Available at https://www.aeaweb.org/articles?id=10.1257/0002828042002561
  8. The Iris Dataset: Classify iris plants into three species in this classic dataset. Available in the scikit-learn library.
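
Unlike the other datasets, Iris needs no download. A minimal sketch of loading it, assuming scikit-learn is installed (the variable names are illustrative, and the notebooks may load it differently):

```python
from sklearn.datasets import load_iris

# load_iris() returns a Bunch holding the measurements, the species
# labels and some descriptive metadata.
iris = load_iris()

X = iris.data    # one row per flower, one column per measurement
y = iris.target  # integer species label for each flower

print(X.shape)
print(iris.feature_names)
print(list(iris.target_names))
```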