swetakum/LA-Payroll-Data-Analysis

CSE544project

CSE 544 Probability and Statistics for Data Science.

Hypothesis testing is an essential statistical procedure for evaluating two mutually exclusive hypotheses about a data set and determining which one is better supported by the sample data.
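As a rough illustration of the idea, the sketch below runs a two-sided Wald test of the null hypothesis that two groups share the same mean. The function name `wald_two_sample`, the 0.05 significance level, and the sample numbers are all assumptions made for this example; they are not taken from the project's code or from the LA payroll data.

```python
import math
from statistics import NormalDist, mean, variance

def wald_two_sample(x, y, alpha=0.05):
    """Wald test of H0: mean(x) == mean(y), based on the difference of sample means."""
    diff = mean(x) - mean(y)
    # Estimated standard error of the difference of two independent sample means.
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    w = diff / se
    # Two-sided p-value from the standard normal approximation.
    p_value = 2 * (1 - NormalDist().cdf(abs(w)))
    return w, p_value, p_value < alpha

# Made-up hourly pay figures, for illustration only.
pay_2013 = [52.0, 48.5, 61.2, 55.0, 49.9, 58.3, 50.7, 53.4]
pay_2016 = [57.1, 54.0, 66.8, 60.2, 55.5, 63.9, 56.0, 59.3]

w, p_value, reject_null = wald_two_sample(pay_2016, pay_2013)
print(f"W = {w:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

A large absolute value of `W` (equivalently, a small p-value) is evidence against the null hypothesis; otherwise the data give no reason to reject it.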

We picked the LA payroll data of government employees from 2013 to 2016 and looked for statistical answers and insights in it. We are mainly interested in the following hypotheses:

  1. Annual pay and hourly pay do not increase over the years.

  2. Work in non-risky departments is steady and therefore easier to forecast, whereas work in risky departments is very unpredictable, so their salary distributions differ between the two halves of the year.

  3. Health benefits follow the same distribution across career ladders.

  4. Annual salaries can be predicted with very low error after the required data pre-processing.
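Hypotheses 2 and 3 both ask whether two samples come from the same distribution. One way to check this is sketched below: a two-sample KS statistic, with its p-value estimated by a permutation test. This is a minimal sketch under assumptions, not the project's implementation: the helper names, the number of permutations, the fixed seed, and the sample values are all hypothetical.

```python
import random
from bisect import bisect_right

def ks_statistic(x, y):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    # Both ECDFs are step functions, so the largest gap occurs at an observed value.
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )

def permutation_p_value(x, y, statistic, n_permutations=2000, seed=0):
    """Fraction of random relabelings whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Split the shuffled pool back into two groups of the original sizes.
        if statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return hits / n_permutations

# Made-up salary figures (in thousands) for the two halves of a year.
first_half = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2]
second_half = [5.0, 3.1, 6.2, 2.9, 5.6, 3.3]

d = ks_statistic(first_half, second_half)
p_value = permutation_p_value(first_half, second_half, ks_statistic)
print(f"KS statistic = {d:.3f}, permutation p-value = {p_value:.3f}")
```

The permutation approach makes no distributional assumption: if the two halves really share a distribution, shuffling the labels should produce statistics as large as the observed one fairly often.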

Techniques used:

  1. Two-sample t-test

  2. Wald's test

  3. Permutation test

  4. KS test

  5. Linear Regression

  6. Estimator
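To give a feel for the linear regression entry in the list above, here is a minimal least-squares sketch with a single predictor, scored by mean absolute percentage error. The choice of predictor (hourly rate), the error metric, the function names, and the data (hourly rate times an assumed 2080 hours per year) are all assumptions for illustration; the project's actual features and pre-processing are not described here.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))

# Made-up data: annual salary generated as hourly rate * 2080 hours.
hourly_rate = [20.0, 25.0, 30.0, 35.0, 40.0]
annual_salary = [41600.0, 52000.0, 62400.0, 72800.0, 83200.0]

slope, intercept = fit_line(hourly_rate, annual_salary)
predictions = [slope * x + intercept for x in hourly_rate]
error = mape(annual_salary, predictions)
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}, MAPE = {error:.4f}%")
```

Because the toy data are exactly linear, the fit is essentially perfect; real payroll records would leave a nonzero error that depends on how the data are cleaned.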
