Merge pull request #28 from carpentries-incubator/manuel/review

Improvements to Lessons 1 to 12
carpentries-incubator · Feb 18, 2024 · 49d967f
2 parents ab68b89 + f168602
commit 49d967f
Showing 8 changed files with 277 additions and 111 deletions.
diff --git a/episodes/01-intro-to-r.Rmd b/episodes/01-intro-to-r.Rmd
@@ -148,8 +148,8 @@ Each of the modes o interactions has its advantages and drawbacks.
 
 |        | Console | R script|
 |--------|---------|---------|
-|**Pros**|Immediate results|Work lost once you close RStudio |
-|**Cons**|Complete record of your work |Messy if you just want to print things out|
+|**Pros**|Immediate results| Complete record of your work |
+|**Cons**| Work lost once you close RStudio  | Messy if you just want to print things out|
 
 
 
@@ -312,7 +312,7 @@ In the script, we will write:
 ```{r download-files}
 # Download the data
 download.file('https://bit.ly/geospatial_data', 
-              here('episodes', 'data','gapminder_data.csv'))
+              here('data','gapminder_data.csv'))
 
 ```
 

diff --git a/episodes/02-data-structures.Rmd b/episodes/02-data-structures.Rmd
@@ -65,13 +65,16 @@ You can create a vector with a `c()` function.
 
 ```{r vectors}
 
-numeric_vector <- c(2, 6, 3) # vector of numbers - numeric data type.
+# vector of numbers - numeric data type.
+numeric_vector <- c(2, 6, 3) 
 numeric_vector
 
-character_vector <- c('banana', 'apple', 'orange') # vector of words - or strings of characters- character data type
+# vector of words - or strings of characters - character data type
+character_vector <- c('banana', 'apple', 'orange') 
 character_vector
 
-logical_vector <- c(TRUE, FALSE, TRUE) # vector of logical values (is something true or false?)- logical data type.
+# vector of logical values (is something true or false?) - logical data type.
+logical_vector <- c(TRUE, FALSE, TRUE) 
 logical_vector
 
 ```
@@ -121,7 +124,9 @@ First, let's try to calculate mean for the values in this vector
 ```{r remove-na1}
 mean(with_na) # mean() function cannot interpret the missing values
 
-mean(with_na, na.rm = T) # You can add the argument na.rm=TRUE to calculate the result while ignoring the missing values.
+# You can add the argument na.rm=TRUE to calculate the result while
+# ignoring the missing values.
+mean(with_na, na.rm = TRUE)
 ```
 
 However, sometimes, you would like to have the `NA` 
@@ -130,9 +135,11 @@ For this you need to identify which elements of the vector hold missing values
 with `is.na()` function. 
 
 ```{r remove-na2}
-is.na(with_na) #  This will produce a vector of logical values, stating if a statement 'This element of the vector is a missing value' is true or not
+is.na(with_na) # This produces a vector of logical values, one per
+# element, stating whether the statement 'This element of the vector
+# is a missing value' is true or not
 
-!is.na(with_na) # # The ! operator means negation ,i.e. not is.na(with_na)
+!is.na(with_na) # The ! operator means negation, i.e. not is.na(with_na)
 
 ```
 
@@ -142,7 +149,8 @@ Sub-setting in `R` is done with square brackets`[ ]`.
 
 ```{r remove-na3}
 
-without_na <- with_na[ !is.na(with_na) ] # this notation will return only the elements that have TRUE on their respective positions
+without_na <- with_na[ !is.na(with_na) ] # this notation will return only
+# the elements that have TRUE on their respective positions
 
 without_na
 
@@ -170,7 +178,8 @@ known as levels.
 nordic_str <- c('Norway', 'Sweden', 'Norway', 'Denmark', 'Sweden')
 nordic_str # regular character vectors printed out
 
-nordic_cat <- factor(nordic_str) # factor() function converts a vector to factor data type
+# factor() function converts a vector to factor data type
+nordic_cat <- factor(nordic_str)
 nordic_cat # With factors, R prints out additional information - 'Levels'
 
 ```
@@ -201,8 +210,14 @@ displayed in a plot or which category is taken as a baseline in a statistical mo
 You can reorder the categories using `factor()` function. This can be useful, for instance, to select a reference category (first level) in a regression model or for ordering legend items in a plot, rather than using the default category systematically (i.e. based on alphabetical order).
 
 ```{r factor-reorder1}
-nordic_cat <- factor(nordic_cat, levels = c('Norway' , 'Denmark', 'Sweden'))  # now Norway should be the first category, Denmark second and Sweden third
-
+nordic_cat <- factor(
+  nordic_cat, levels = c(
+    'Norway', 
+    'Denmark', 
+    'Sweden'
+  )) 
+
+# now Norway will be the first category, Denmark second and Sweden third
 nordic_cat
 ```
 
@@ -212,7 +227,15 @@ There is more than one way to reorder factors. Later in the lesson,
 we will use `fct_relevel()` function from `forcats` package to do the reordering.
 
 ```{r factor-reorder2}
-# nordic_cat <- fct_relevel(nordic_cat, 'Norway' , 'Denmark', 'Sweden') # now Norway should be the first category, Denmark second and Sweden third
+library(forcats)
+
+nordic_cat <- fct_relevel(
+  nordic_cat, 
+  'Norway' , 
+  'Denmark', 
+  'Sweden'
+  ) # With this, Norway will be the first category, 
+    # Denmark second and Sweden third
 
 nordic_cat
 ```
@@ -239,8 +262,14 @@ outside of this set, it will become an unknown/missing value detonated by
 
 ```{r factor-missing-level}
 nordic_str
-nordic_cat2 <- factor(nordic_str, levels = c('Norway', 'Denmark'))
-nordic_cat2 # since we have not included Sweden in the list of factor levels, it has become NA.
+nordic_cat2 <- factor(
+  nordic_str, 
+  levels = c('Norway', 'Denmark')
+  )
+
+# because we did not include Sweden in the list of 
+# factor levels, it has become NA.
+nordic_cat2 
 ```
 ::::::::::::::::::::::::::::::::::::::::::::::::::::
 

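As a reading aid for the `02-data-structures.Rmd` changes above, here is a minimal sketch in base R of the missing-value and factor-level behaviour the reformatted comments describe. The contents of `with_na` are an assumption made for illustration (the diff does not show how the lesson defines it); `nordic_str` is copied from the hunk above.

```r
# Hypothetical vector with one missing value (illustrative, not the lesson's data)
with_na <- c(1, 2, 1, 1, NA, 3)

mean(with_na)                # NA: the missing value propagates
mean(with_na, na.rm = TRUE)  # missing values are dropped before averaging

# Keep only the elements for which is.na() is FALSE
without_na <- with_na[!is.na(with_na)]

# Explicit level order: Norway first, then Denmark, then Sweden
nordic_str <- c('Norway', 'Sweden', 'Norway', 'Denmark', 'Sweden')
nordic_cat <- factor(nordic_str, levels = c('Norway', 'Denmark', 'Sweden'))
levels(nordic_cat)

# Values outside the listed levels (here Sweden) become NA
nordic_cat2 <- factor(nordic_str, levels = c('Norway', 'Denmark'))
sum(is.na(nordic_cat2))
```

The sketch uses `factor()` for the reordering so that it runs without extra packages; the lesson's `fct_relevel()` call from `forcats` should give the same level order for this input.
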
diff --git a/episodes/03-explore-data.Rmd b/episodes/03-explore-data.Rmd
@@ -59,7 +59,7 @@ Because columns are vectors, each column must contain a **single type of data**
 For example, here is a figure depicting a data frame comprising a numeric, a character, and a logical vector.
 
 ![](fig/data-frame.svg)
-<br><font size="3">*Source*:[Data Carpentry R for Social Scientists ](https://datacarpentry.org/r-socialsci/02-starting-with-data/index.html#what-are-data-frames-and-tibbles)</font>
+<br><font size="3">*Source*: [Data Carpentry R for Social Scientists ](https://datacarpentry.org/r-socialsci/02-starting-with-data/index.html#what-are-data-frames-and-tibbles)</font>
 
 
 ## Reading data
@@ -68,7 +68,7 @@ For example, here is a figure depicting a data frame comprising a numeric, a cha
 We're gonna read in the `gapminder` data set with information about countries' size, GDP and average life expectancy in different years.
 
 ```{r reading-data}
-gapminder <- read_csv("data/gapminder_data.csv")
+gapminder <- read.csv("data/gapminder_data.csv")
 
 ```
 
@@ -92,9 +92,11 @@ There are multiple ways to explore a data set. Here are just a few examples:
 
 
 ```{r}
-head(gapminder) # see first 6  rows of the data set
 
-summary(gapminder) # gives basic statistical information about each column. Information format differes by data type.
+head(gapminder) # shows the first 6 rows of the data set
+
+summary(gapminder) # basic statistical information about each column. 
+# Information format differs by data type.
 
 nrow(gapminder) # returns number of rows in a dataset
 
@@ -108,7 +110,9 @@ When you're analyzing a data set, you often need to access its specific columns.
 
 One handy way to access a column is using its name and a dollar sign `$`: 
 ```{r subset-dollar-sign}
-country_vec <- gapminder$country  # Notation means: From dataset gapminder, give me column country. You can see that the column accessed in this way is just a vector of characters. 
+# This notation means: From dataset gapminder, give me column country. You can 
+# see that the column accessed in this way is just a vector of characters. 
+country_vec <- gapminder$country 
 
 head(country_vec)
 
@@ -157,8 +161,9 @@ We already know how to select only the needed columns. But now, we also want to
 In the `gapminder` data set, we want to see the results from outside of Europe for the 21st century. 
 ```{r}
 year_country_gdp_euro <- gapminder %>% 
-  filter(continent != "Europe" & year >= 2000) %>% # & operator (AND) - both conditions must be met
+  filter(continent != "Europe" & year >= 2000) %>% 
   select(year, country, gdpPercap)
+# '&' operator (AND) - both conditions must be met
 
 head(year_country_gdp_euro)
 ```
@@ -177,8 +182,9 @@ Write a single command (which can span multiple lines and includes pipes) that w
 
 ```{r ex5, class.source="bg-info"}
 year_country_gdp_eurasia <- gapminder %>% 
-  filter(continent == "Europe" | continent == "Asia") %>% # | operator (OR) - one of the conditions must be met
-  select(year, country, gdpPercap)
+  filter(continent == "Europe" | continent == "Asia") %>% 
+  select(year, country, gdpPercap) 
+# '|' operator (OR) - at least one of the conditions must be met
 
 nrow(year_country_gdp_eurasia)
 ``` 
@@ -191,7 +197,7 @@ So far, we have provided summary statistics on the whole dataset, selected colum
 ```{r dplyr-group}
 gapminder %>% # select the dataset
   group_by(continent) %>% # group by continent
-  summarize(avg_gdpPercap = mean(gdpPercap)) # summarize function creates statistics for the data set 
+  summarize(avg_gdpPercap = mean(gdpPercap)) # create basic stats
 
 ```
 
@@ -211,7 +217,8 @@ Calculate the average life expectancy per country. Which country has the longest
 gapminder %>%
    group_by(country) %>%
    summarize(avg_lifeExp=mean(lifeExp)) %>%
-   filter(avg_lifeExp == min(avg_lifeExp) | avg_lifeExp == max(avg_lifeExp))
+   filter(avg_lifeExp == min(avg_lifeExp) | 
+            avg_lifeExp == max(avg_lifeExp) )
 ```
 
 ### Multiple groups and summary variables
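To make the `&` and `|` comments moved in the `03-explore-data.Rmd` hunks above concrete, here is a small sketch using base R subsetting on a toy data frame. The data frame `toy` and its values are hypothetical stand-ins for the gapminder columns, and `tapply()` is used as a dependency-free analogue of `group_by()` plus `summarize()`, not as the lesson's own method.

```r
# Hypothetical toy data standing in for the gapminder columns used in the lesson
toy <- data.frame(
  country   = c('A', 'B', 'C', 'D'),
  continent = c('Europe', 'Asia', 'Africa', 'Asia'),
  year      = c(2002, 1997, 2007, 2002),
  gdpPercap = c(30000, 2000, 1500, 4000)
)

# AND: a row is kept only if both conditions hold
non_europe_21c <- toy[toy$continent != 'Europe' & toy$year >= 2000, ]

# OR: a row is kept if at least one condition holds
eurasia <- toy[toy$continent == 'Europe' | toy$continent == 'Asia', ]

# Group-wise mean per continent (base R analogue of group_by() + summarize())
avg_gdp <- tapply(toy$gdpPercap, toy$continent, mean)

nrow(non_europe_21c)
nrow(eurasia)
avg_gdp
```

With `&`, country B is excluded because its year is before 2000 even though it is outside Europe; with `|`, a row needs to match only one of the two continents.
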