---
title: "Happiness City Index"
author: "Daniel Gillis"
format:
  html:
    self-contained: true
    page-layout: full
    title-block-banner: true
    toc: true
    toc-depth: 3
    toc-location: body
    number-sections: false
    html-math-method: katex
    code-fold: true
    code-summary: "Show the code"
    code-overflow: wrap
    code-copy: hover
    code-tools:
      source: false
      toggle: true
      caption: See code
execute:
  warning: false
---
```{python}
import pandas as pd
import plotly.express as px
# Get data for the project
data_path = "archive/train.csv"
city_data = pd.read_csv(data_path)
# Keep the last 2024 row listed for each city; .copy() lets later cells add columns safely
df = city_data[city_data['Year'] == 2024].drop_duplicates('City', keep='last').copy()
```
<!-- ---IDEA GENERATION CHARTS START--- -->
<!-- ```{python}
# Cities over Happiness
chart = (px.bar(
df.query('Year <= 2024')
.sort_values(['Happiness_Score']),
x='City',
y='Happiness_Score',
color='City',
title='Cities over happiness')
)
chart.show()
```
```{python}
# Decibel Level over Happiness
chart = (px.scatter(
df.query('Year <= 2024'),
x='Decibel_Level',
y='Happiness_Score',
color='City',
title='Decibel Level over Happiness')
)
chart.show()
```
```{python}
# Traffic Density over Happiness
chart = (px.scatter(
df.query('Year <= 2024'),
x='Traffic_Density',
y='Happiness_Score',
color='City',
title='Traffic Density over Happiness')
)
chart.show()
```
```{python}
# Green Space over Happiness
chart = (px.scatter(
df.query('Year <= 2024'),
x='Green_Space_Area',
y='Happiness_Score',
color='City',
title='Green Space over Happiness')
)
chart.show()
```
<!-- There seems to be a negative correlation between air quality and happiness. Good air quality probably doesn't cause unhappiness, but there might be some other factor not shown, like high amounts of polluting industries that increase quality of life. -->
<!-- ```{python}
# Air Quality over Happiness
chart = (px.scatter(
df.query('Year <= 2024'),
x='Air_Quality_Index',
y='Happiness_Score',
color='City',
title='Air Quality over Happiness')
)
chart.show()
```
```{python}
# Cost of Living over Happiness
chart = (px.scatter(
df.query('Year <= 2024'),
x='Cost_of_Living_Index',
y='Happiness_Score',
color='City',
title='Cost of Living over Happiness')
)
chart.show()
```
```{python}
# Healthcare over Happiness
chart = (px.scatter(
df.query('Year <= 2024'),
x='Healthcare_Index',
y='Happiness_Score',
color='City',
title='Healthcare over Happiness')
)
chart.show()
```
<!-- ---IDEA GENERATION CHARTS END--- -->
## What Makes a City Happy?
To answer this question, we will look at a [city happiness index by Emirhan Bulut](https://www.kaggle.com/datasets/emirhanai/city-happiness-index-2024), which includes information such as air quality, cost of living, and amount of green space in a city. This page uses only the most recent 2024 record for each city. The dataset also includes predictions for future years, but these are excluded from the analysis.
[Click here](#data-description) for a description of the data used on this page.
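As a minimal sketch of how the 2024 snapshot described above can be selected, the helper below keeps one row per city. The function name `latest_snapshot` is hypothetical, and the sketch assumes the CSV lists each city's records in chronological order, so that the last row is the latest one.

```python
import pandas as pd

def latest_snapshot(data: pd.DataFrame, year: int = 2024) -> pd.DataFrame:
    """Return one row per city: the last row listed for that city in `year`."""
    in_year = data[data["Year"] == year]
    # keep="last" picks the final row per city as ordered in the file, so this
    # assumes the CSV lists each city's records chronologically.
    snapshot = in_year.drop_duplicates("City", keep="last").copy()
    # Sanity check: every city should now appear exactly once.
    assert snapshot["City"].is_unique
    return snapshot

# Usage on this page's data (not run here):
# df = latest_snapshot(pd.read_csv("archive/train.csv"))
```

If the file were not in chronological order, sorting by the date columns before dropping duplicates would be needed.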
## Question 1
* What does a general analysis of the factors related to a city's happiness tell us?
When looking at the factors that could play a role in a city's happiness, one of the most direct correlations is with quality of healthcare. The graph below suggests that high-quality healthcare services play a large role in people's mental wellbeing.
```{python}
# Healthcare over Happiness
chart = (px.scatter(
df,
x='Healthcare_Index',
y='Happiness_Score',
title='Healthcare Quality over Happiness Score',
hover_data='City')
)
chart.show()
```
There is also a correlation between happiness scores and low levels of noise in a city. This makes sense: who would want to live in a city with constant honking and loud traffic?
```{python}
# Decibel Level over Happiness
chart = (px.scatter(
df,
x='Decibel_Level',
y='Happiness_Score',
title='Noise Level over Happiness Score',
hover_data='City')
)
chart.show()
```
The percentage of green space within a city also correlates with happiness... up to a point. Cities with an average amount of green space do better than those with little green space, but they also do better than cities with lots of green space. One possible explanation is that cities with a lot of green space share an underlying factor, such as a lack of infrastructure, that leads to both more green space and lower happiness.
```{python}
# Green Space Area over Happiness
chart = (px.scatter(
df,
x='Green_Space_Area',
y='Happiness_Score',
title='Green Space Area over Happiness Score',
hover_data='City')
)
chart.show()
```
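One way to check the "up to a point" pattern numerically is to split cities into green-space bands and compare average happiness per band. This is a rough sketch: the function name `happiness_by_band` and the choice of three equal-width bands are assumptions made for illustration, not part of the original analysis.

```python
import pandas as pd

def happiness_by_band(frame, column="Green_Space_Area", target="Happiness_Score"):
    """Mean happiness and city count for low, medium and high bands of `column`."""
    # Three equal-width bands are an illustrative choice only.
    band = pd.cut(frame[column], bins=3, labels=["low", "medium", "high"])
    # observed=True reports only the bands that actually contain cities.
    return frame.groupby(band, observed=True)[target].agg(["mean", "count"])

# Usage on this page's data (not run here):
# happiness_by_band(df)
```

If the pattern holds, the "medium" band should show a higher mean than both "low" and "high".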
An interesting finding in the next chart is that air quality is inversely correlated with happiness. Perhaps this is due to highly polluting factories that produce useful goods, making people happier.
```{python}
# Air Quality over Happiness Score
chart = (px.scatter(
df,
x='Air_Quality_Index',
y='Happiness_Score',
title='Air Quality over Happiness Score',
hover_data='City')
)
chart.show()
```
Observing what is *not* correlated with happiness is just as important as looking at what is. A surprising finding is that there is no strong correlation between cost of living and happiness scores: happy cities with both high and low costs of living appear in this dataset.
```{python}
# Cost of Living over Happiness Score
chart = (px.scatter(
df,
x='Cost_of_Living_Index',
y='Happiness_Score',
title='Cost of Living over Happiness Score',
hover_data='City')
)
chart.show()
```
These findings give us a basic understanding of the variables at work in this dataset.
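The charts in this section show correlations visually. A short sketch like the one below could put a number on each one. The function name `correlation_with_happiness` is hypothetical, and the sketch assumes Pearson correlation is a reasonable summary. Non-numeric columns such as `Traffic_Density` are skipped, and a curved relationship like the green-space one will look weaker than it is.

```python
import pandas as pd

def correlation_with_happiness(frame, target="Happiness_Score"):
    """Pearson correlation of each numeric column with `target`, strongest first."""
    numeric = frame.select_dtypes(include="number")
    # Year is constant within a single-year snapshot, so it carries no signal.
    factors = numeric.drop(columns=[target, "Year"], errors="ignore")
    correlations = factors.corrwith(numeric[target])
    # Order by strength, regardless of sign.
    return correlations.sort_values(key=lambda s: s.abs(), ascending=False)

# Usage on this page's data (not run here):
# correlation_with_happiness(df)
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little or none.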
## Question 2
* What are the most significant factors affecting a city's overall happiness?
Based on the current findings, quality of healthcare seems to be the most important variable when considering a city's overall happiness. Let's take a look at the healthcare graph again...
```{python}
# Healthcare over Happiness
chart = (px.scatter(
df,
x='Healthcare_Index',
y='Happiness_Score',
title='Healthcare Quality over Happiness Score',
hover_data='City')
)
chart.show()
```
As we can see, some cities have high healthcare scores but are relatively unhappy. Let's give these cities (a Healthcare Index of 90 or higher with a Happiness Score below 6.5) a distinct color to keep track of them:
```{python}
# Assign distinct variable to distinguish outlying cities.
df['Healthcare_Outlier'] = ((df['Healthcare_Index'] >= 90) & (df['Happiness_Score'] < 6.5))
# Healthcare over Happiness
chart = (px.scatter(
df,
x='Healthcare_Index',
y='Happiness_Score',
title='Healthcare Quality over Happiness Score',
hover_data='City',
color='Healthcare_Outlier')
)
chart.show()
```
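To see which cities the flag picks out, a small helper can list them by name. This is only a sketch: the function name `healthcare_outliers` is hypothetical, and its default thresholds simply mirror the ones used in the chart code above.

```python
import pandas as pd

def healthcare_outliers(frame, min_healthcare=90, max_happiness=6.5):
    """Rows with strong healthcare but low happiness, least happy first."""
    mask = (frame["Healthcare_Index"] >= min_healthcare) & (
        frame["Happiness_Score"] < max_happiness
    )
    columns = ["City", "Healthcare_Index", "Happiness_Score"]
    return frame.loc[mask, columns].sort_values("Happiness_Score")

# Usage on this page's data (not run here):
# healthcare_outliers(df)
```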
Do any of the other variables offer a possible explanation? As it turns out, the graph of green space in a city we saw before hints at a possible answer.
```{python}
# Green Space Area over Happiness w/ outliers
chart = (px.scatter(
df,
x='Green_Space_Area',
y='Happiness_Score',
title='Green Space Area over Happiness',
hover_data='City',
color='Healthcare_Outlier')
)
chart.show()
```
Most of these outliers are on the "too much green" end of the spectrum. Earlier, we hypothesized that these cities lack infrastructure that citizens depend on, which is a possible reason why their happiness scores are so low.
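The claim that the outliers cluster at the high end of green space can be checked by comparing the two groups directly. The sketch below assumes the same outlier thresholds as the chart code (healthcare of 90 or higher, happiness below 6.5). The function name `green_space_by_outlier_status` is hypothetical.

```python
import pandas as pd

def green_space_by_outlier_status(frame, min_healthcare=90, max_happiness=6.5):
    """Median green space and city count for outliers versus all other cities."""
    is_outlier = (frame["Healthcare_Index"] >= min_healthcare) & (
        frame["Happiness_Score"] < max_happiness
    )
    # Readable group labels instead of True/False.
    labels = is_outlier.map({True: "outlier", False: "other"})
    return frame.groupby(labels)["Green_Space_Area"].agg(["median", "count"])

# Usage on this page's data (not run here):
# green_space_by_outlier_status(df)
```

A clearly higher median for the "outlier" row would support the reading of the chart. A similar median would suggest the pattern is weaker than it looks.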
Taking this into account, we can create a chart that shows the impact of both healthcare and green space. The happiest cities share two qualities: high-quality healthcare and a 'Goldilocks zone' of green space, with not too much and not too little.
```{python}
# Green Space Area over Healthcare and Happiness
chart = (px.scatter(
df,
x='Green_Space_Area',
y='Healthcare_Index',
title='Healthcare and Green Space both affect Happiness',
hover_data='City',
color='Happiness_Score')
)
chart.show()
```
Now we have a better theory of what makes a city happy: quality healthcare with sufficient urban areas to support its citizens. Cities that achieve this show high levels of happiness.
Let's take a look at a variable we considered earlier: Noise Level. There is an inverse correlation between a city's happiness score and noise level, as shown below:
```{python}
# Decibel Level over Happiness
chart = (px.scatter(
df,
x='Decibel_Level',
y='Happiness_Score',
title='Noise Level over Happiness Score',
hover_data='City')
)
chart.show()
```
If we add the Healthcare Index as a variable, we see an unsurprising trend: cities that have both good healthcare and low noise are happier (except for the unhappy outliers we examined previously).
```{python}
# Decibel Level over Healthcare
chart = (px.scatter(
df,
x='Decibel_Level',
y='Healthcare_Index',
title='Healthcare Quality and Noise Level over Happiness Score',
hover_data='City',
color='Happiness_Score')
)
chart.show()
```
Now we can refine our theory to include three general factors that contribute to cities' happiness:
* High quality of healthcare.
* Middling green space area.
* Low noise level.
Where all three conditions are present, happiness scores tend to be highest.
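As a rough check of the three-factor theory, cities can be grouped by how many of the conditions they meet. Everything about the cutoffs here is an assumption made for illustration: "high" healthcare means at or above the median, "low" noise means at or below the median, and "middling" green space means between the 25th and 75th percentiles. The function name `happiness_by_conditions_met` is hypothetical.

```python
import pandas as pd

def happiness_by_conditions_met(frame):
    """Mean happiness and city count, grouped by number of conditions met (0 to 3)."""
    # Hypothetical cutoffs taken from the data's own median and quartiles.
    good_healthcare = frame["Healthcare_Index"] >= frame["Healthcare_Index"].median()
    quiet = frame["Decibel_Level"] <= frame["Decibel_Level"].median()
    green = frame["Green_Space_Area"]
    middling_green = green.between(green.quantile(0.25), green.quantile(0.75))
    conditions_met = (
        good_healthcare.astype(int) + quiet.astype(int) + middling_green.astype(int)
    ).rename("conditions_met")
    return frame.groupby(conditions_met)["Happiness_Score"].agg(["mean", "count"])

# Usage on this page's data (not run here):
# happiness_by_conditions_met(df)
```

If the theory holds, mean happiness should rise as the number of conditions met goes from 0 to 3.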
## Data Description
Below are the descriptions of the variables provided in this dataset. The dataset can be found at: https://www.kaggle.com/datasets/emirhanai/city-happiness-index-2024
* City: Name of the city.
* Month: The month in which the data is recorded.
* Year: The year in which the data is recorded. (For the purposes of this analysis, we will only be looking at the year 2024.)
* Decibel_Level: Average noise levels in decibels, indicating the auditory comfort of the citizens.
* Traffic_Density: Level of traffic density (Low, Medium, High, Very High), which might impact citizens' daily commute and stress levels.
* Green_Space_Area: Percentage of green spaces in the city, positively contributing to the mental well-being and relaxation of the inhabitants.
* Air_Quality_Index: Index measuring the quality of air, a crucial aspect affecting citizens' health and overall satisfaction.
* Happiness_Score: The average happiness score of the city (on a 1-10 scale), representing the subjective well-being of the population.
* Cost_of_Living_Index: Index measuring the cost of living in the city (relative to a reference city), which could impact the financial satisfaction of the citizens.
* Healthcare_Index: Index measuring the quality of healthcare in the city, an essential component of the population's well-being and contentment.
[Back to Introduction](#what-makes-a-city-happy)