Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update raster episodes #43

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
112 changes: 47 additions & 65 deletions episodes/13-intro-to-raster-data.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -18,19 +18,19 @@

::: questions
- What is a raster dataset?
- How do I work with and plot raster data in R?
- How do I import, examine and plot raster data in R?
:::

Check warning on line 22 in episodes/13-intro-to-raster-data.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

::: objectives

After completing this episode, participants should be able to…

- Explore raster attributes and metadata using R.
- Import rasters into R using the `terra` package.
- Explore raster attributes and metadata using the `terra` package.
- Plot a raster file in R using the `ggplot2` package.
- Describe the difference between single- and multi-band rasters.

:::

Check warning on line 33 in episodes/13-intro-to-raster-data.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

::: prereq

Expand All @@ -38,9 +38,9 @@

See the [setup instructions](../learners/setup.md) for detailed information about the software, data, and other prerequisites you will need to work through the examples in this episode.

<!-- This lesson uses the `terra` package in particular. If you have not installed it yet, do so by running `install.packages("terra")` before loading it with `library(terra)`. -->
This lesson uses the `terra` package in particular. If you have not installed it yet, do so by running `install.packages("terra")` before loading it with `library(terra)`.

:::

Check warning on line 43 in episodes/13-intro-to-raster-data.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

In this lesson, we will work with raster data. We will start with an introduction of the fundamental principles and metadata needed to work with raster data in R. We will discuss some of the core metadata elements needed to understand raster data in R, including CRS and resolution.

Expand All @@ -60,11 +60,11 @@
- data extracted from the [AHN digital elevation dataset of the Netherlands](https://www.ahn.nl/) for the TU Delft campus area; and
- high-resolution RGB aerial photos of the TU Delft library obtained from [Beeldmateriaal Nederland](https://www.beeldmateriaal.nl/download-luchtfotos).

:::

Check warning on line 63 in episodes/13-intro-to-raster-data.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

## View Raster File Attributes

We will be working with a series of GeoTIFF files in this lesson. The GeoTIFF format contains a set of embedded tags with metadata about the raster data. We can use the function `describe()` from the `terra` package to get information about our raster data before we read that data into R. It is recommended to do this before importing your data. We first examine the file `tud-dsm-5m.tif`.
We will be working with a series of GeoTIFF files in this lesson. The GeoTIFF format contains a set of embedded tags with metadata about the raster data. We can use the function `describe()` from the `terra` package to get information about our raster. It is recommended to do this before importing the data. We first examine the file `tud-dsm-5m.tif`.

```{r attr}
describe("data/tud-dsm-5m.tif")
Expand All @@ -73,102 +73,108 @@

## Open a Raster in R

Now that we've previewed the metadata for our GeoTIFF, let's import this raster dataset into R and explore its metadata more closely. We can use the `rast()` function to import a raster file in R.
Now that we've previewed the metadata for our GeoTIFF, let's import this raster file into R and explore its metadata more closely. We can use the `rast()` function to import a raster file in R.

::: callout
# Data tip - Object names
To improve code readability, use file and object names that make it clear what is in the file. The raster data for this episode contain the TU Delft campus and its surroundings so we’ll use a naming convention of `datatype_TUD`. The first object is a Digital Surface Model (DSM) in GeoTIFF format stored in a file `tud-dsm-5m.tif` which we will load into an object named according to our naming convention `DSM_TUD`.
To improve code readability, use file and object names that make it clear what is in the file. The raster data for this episode contain the TU Delft campus and its surroundings so we will use the naming convention `<DATATYPE>_TUD`. The first object is a Digital Surface Model (DSM) in GeoTIFF format stored in a file `tud-dsm-5m.tif` which we will load into an object named according to our naming convention `DSM_TUD`.
:::

Check warning on line 81 in episodes/13-intro-to-raster-data.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

First we will load our raster file into R and view the data structure.
Let's load our raster file into R and view its data structure.

```{r}
```{r read-dsm}
DSM_TUD <- rast("data/tud-dsm-5m.tif")
DSM_TUD
```
The information above includes a report on dimension, resolution, extent and CRS, but no information about the values. Similar to other data structures in R like vectors and data frame columns, descriptive statistics for raster data can be retrieved with the `summary()` function.
The information above includes a report on dimension, resolution, extent and CRS, but no information about the values. Similar to other data structures in R like vectors and data frames, descriptive statistics for raster data can be retrieved with the `summary()` function.

```{r, warning=TRUE}
```{r dsm-summary, warning=TRUE}
summary(DSM_TUD)
```

This output gives us information about the range of values in the DSM. We can see, for instance, that the lowest elevation is `-5.2235`, the highest is `89.7838`. But note the warning. Unless you force R to calculate these statistics using every cell in the raster, it will take a random sample of 100,000 cells and calculate from them instead. To force calculation all the values, you can use the function `values`:

```{r}
```{r dsm-summary-values}
summary(values(DSM_TUD))
```

With a summary on all cells of the raster, the values range from a smaller minimum of `-5.3907` to a higher maximum of `92.0910`.

To visualise the DSM in R using `ggplot2`, we need to convert it to a data frame. We learned about data frames in an [earlier lesson](../episodes/03-explore-data.Rmd). The `terra` package has the built-in function `as.data.frame()` for conversion to a data frame.
To visualise the DSM in R using `ggplot2`, we need to convert it to a data frame. We learned about data frames in an [earlier lesson](../episodes/03-explore-data.Rmd). The `terra` package has the built-in method `as.data.frame()` for conversion to a data frame.

```{r}
```{r dsm-df}
DSM_TUD_df <- as.data.frame(DSM_TUD, xy = TRUE)
```

Now when we view the structure of our data, we will see a standard data frame format.
Now when we view the structure of our data, we will see a standard data frame format in which every row is a cell from the raster, each containing information about the `x` and `y` coordinates and the raster value stored in the `tud-dsm-5m` column.

```{r}
```{r dsm-df-str}
str(DSM_TUD_df)
```

We can use `ggplot()` to plot this data with a specific `geom_` function called `geom_raster()`. We will make the colour scale in our plot colour-blindness friendly with `scale_fill_viridis_c`, introduced in an [earlier lesson](../episodes/04-intro-to-visualisation.Rmd). We will also use the `coord_quickmap()` function to use an approximate Mercator projection for our plots. This approximation is suitable for small areas that are not too close to the poles. Other coordinate systems are available in `ggplot2` if needed, you can learn about them at their help page `?coord_map`.
We can use `ggplot()` to plot this data with a specific `geom_` function called `geom_raster()`. We will make the colour scale in our plot colour-blindness friendly with `scale_fill_viridis_c`, introduced in an [earlier lesson](../episodes/04-intro-to-visualisation.Rmd). We will also use the `coord_equal()` function to ensure that the units (meters in our case) on the two axes are equal.

```{r first-rast-plot, fig.cap="Raster plot with `ggplot2` using the viridis color scale"}
ggplot() +
geom_raster(data = DSM_TUD_df , aes(x = x, y = y, fill = `tud-dsm-5m`)) +
scale_fill_viridis_c(option = "H") + # `option = "H"` provides a contrasting colour scale
coord_quickmap()
scale_fill_viridis_c(option = "turbo") +
coord_equal()
```

::: callout
# Plotting tip

More information about the viridis palette used above at [viridis package documentation](https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html).
The `"turbo"` scale in our code provides a good contrasting scale for our raster, but another colour scale may be preferred when plotting other rasters. More information about the viridis palette used above can be found in the [viridis package documentation](https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html).
:::

::: callout
# Plotting tip

For faster previews, you can use the `plot()` function on a `terra` object.

```{r dsm-plot}
plot(DSM_TUD)
```

:::

Check warning on line 139 in episodes/13-intro-to-raster-data.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

This map shows our Digital Surface Model, that is, the elevation of our study site including buildings and vegetation. From the legend we can confirm that the maximum elevation is around 90, but we cannot tell whether this is 90 feet or 90 meters because the legend doesn’t show us the units. We can look at the metadata of our object to see what the units are. Much of the metadata that we’re interested in is part of the CRS.
## View Raster Coordinate Reference System (CRS)

Now we will see how features of the CRS appear in our data file and what meanings they have.
The map above shows our Digital Surface Model (DSM), that is, the elevation of our study site including buildings and vegetation. From the legend we can confirm that the maximum elevation is around 90, but we cannot tell whether that is 90 feet or 90 meters because the legend does not show us the units. We can look at the metadata of our object to see what the units are. Much of the metadata that we are interested in is part of the CRS.

## View Raster Coordinate Reference System (CRS) in R
Now we will see how features of the CRS appear in our data file and what meaning they have.

We can view the CRS string associated with our R object using the `crs()` function.

```{r}
```{r dsm-crs}
crs(DSM_TUD, proj = TRUE)
```

::: challenge
What units are our data in?

# Challenge: What units are our data in?

::: solution

`+units=m` in the output of the code above tells us that our data is in meters (m).

:::

Check warning on line 161 in episodes/13-intro-to-raster-data.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag
:::

Check warning on line 162 in episodes/13-intro-to-raster-data.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

::: callout
# Understanding CRS in PROJ.4 format

The CRS for our data is given to us by R in PROJ.4 format. Let’s break down the pieces of a PROJ.4 string. The string contains all of the individual CRS elements that R or another GIS might need. Each element is specified with a `+` sign, similar to how a `.csv` file is delimited or broken up by a `,`. After each `+` we see the CRS element such as projection (`proj=`) being defined.

See more about CRS and PROJ.4 strings in [this lesson](https://datacarpentry.org/organization-geospatial/03-crs).
See more about CRS and PROJ.4 strings in [this lesson](https://datacarpentry.org/organization-geospatial/03-crs#describing-coordinate-reference-systems).
:::

Check warning on line 170 in episodes/13-intro-to-raster-data.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

## Calculate Raster Min and Max values

It is useful to know the minimum and maximum values of a raster dataset. In this case, as we are working with elevation data, these values represent the min/max elevation range at our site.
It is useful to know the minimum and maximum values of a raster dataset. In this case, as we are working with elevation data, these values represent the minimum-to-maximum elevation range at our site.

Raster statistics are often calculated and embedded in a GeoTIFF for us. We can view these values:
```{r}
```{r dsm-minmax}
minmax(DSM_TUD)
```

Expand All @@ -177,64 +183,40 @@

If the `min` and `max` values are `Inf` and `-Inf` respectively, it means that they haven't been calculated. We can calculate them using the `setMinMax()` function.

```{r}
```{r dsm-setminmax}
DSM_TUD <- setMinMax(DSM_TUD)
```
:::

Check warning on line 189 in episodes/13-intro-to-raster-data.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

```{r}
A call to `minmax(DSM_TUD)` will now give us the correct values. Alternatively, `min(values())` and `max(values())` will return the minimum and maximum values respectively.

```{r dsm-min}
min(values(DSM_TUD))
```

```{r}
```{r dsm-max}
max(values(DSM_TUD))
```


We can see that the elevation at our site ranges from `-5.39069`m to `92.08102`m.

## Raster bands

The Digital Surface Model object (`DSM_TUD`) that we’ve been working with is a single band raster. This means that there is only one dataset stored in the raster: surface elevation in meters for one time period.

![Single- and multi-band raster](https://datacarpentry.org/r-raster-vector-geospatial/fig/dc-spatial-raster/single_multi_raster.png)

The Digital Surface Model object (`DSM_TUD`) that we have been working with is a single band raster. This means that there is only one layer stored in the raster: surface elevation in meters for one time period.

We can view the number of bands in a raster using the `nlyr()` function.

A raster dataset can contain one or more bands. We can view the number of bands in a raster using the `nlyr()` function.

```{r}
```{r dsm-nlyr}
nlyr(DSM_TUD)
```
This dataset has only 1 band. However, raster data can also be multi-band, meaning that one raster file contains data for more than one variable or time period for each cell. We will discuss multi-band raster data in a [later episode](../episodes/17-work-with-multi-band-rasters.Rmd).

![Single- and multi-band raster](https://datacarpentry.org/r-raster-vector-geospatial/fig/dc-spatial-raster/single_multi_raster.png)

Our DSM data has only one band. However, raster data can also be multi-band, meaning that one raster file contains data for more than one variable or time period for each cell. We will discuss multi-band raster data in a [later episode](../episodes/17-work-with-multi-band-rasters.Rmd).

## Creating a histogram of raster values

A histogram can be used to inspect the distribution of raster values visually. It can show if there are values above the max or below the min of the expected range.

We can inspect the distribution of values contained in a raster using the `ggplot2` function `geom_histogram()`. Histograms are often useful in identifying outliers and bad data values in our raster data.

```{r rast-hist, message=TRUE}
ggplot() +
geom_histogram(data = DSM_TUD_df, aes(`tud-dsm-5m`))
```

Notice that a message is displayed when R creates the histogram:

```
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

This message is caused by a default setting in `geom_histogram()` enforcing that there are 30 bins for the data. We can define the number of bins we want in the histogram by giving another value to the `bins` argument. 60 bins, for instance, will give a more detailed histogram of the same distribution:

```{r}
ggplot() +
geom_histogram(data = DSM_TUD_df, aes(`tud-dsm-5m`), bins = 60)
```

Note that the shape of this histogram looks similar to the previous one that was created using the default of 30 bins. The distribution of elevation values for our Digital Surface Model (DSM) looks reasonable. It is likely that there are no bad data values in this particular raster.
A histogram can be used to inspect the distribution of raster values visually. It can show if there are values above the maximum or below the minimum of the expected range. We can plot a histogram using the `ggplot2` function `geom_histogram()`. Histograms are often useful in identifying outliers and bad data values in our raster data. Read more on the use of histograms in [this lesson](https://datacarpentry.org/r-raster-vector-geospatial/01-raster-structure.html#create-a-histogram-of-raster-values)

::: challenge

Expand All @@ -251,7 +233,7 @@

::: solution

```{r}
```{r dsm-describe}
describe("data/tud-dsm-5m-hill.tif")
```

Expand All @@ -267,9 +249,9 @@

::::::::::::::::::::::::::::::::::::: keypoints

- The GeoTIFF file format includes metadata about the raster data.
- The GeoTIFF file format includes metadata about the raster data that can be inspected with the `describe()` function from the `terra` package.
- To plot raster data with the `ggplot2` package, we need to convert them to data frames.
- R stores CRS information in the PROJ.4 format.
- PROJ is a widely used standard format to store, represent and transform CRS.
- Histograms are useful to identify missing or bad data values.

::::::::::::::::::::::::::::::::::::::::::::::::
Expand Down
Loading
Loading