---
title: "Read Mapping Rates"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(fig.width=12, fig.height=8,
echo=FALSE, warning=FALSE, message=FALSE)
library(tidyverse)
```
```{r}
# tidyverse is already attached in the setup chunk
library(ggpubr)
source("R/load.R")
source("R/constants.R")
sample_table <- load_sample_table()
```
Overall read mapping rates were assessed using `samtools flagstat`, which reports the percentage of all reads that have at least one mapping entry in the BAM file.
Because mapping was performed against the host genome, these rates depend strongly on the proportion of symbiont reads in each sample. Almost all samples had high (>90%) mapping rates, but rates were on average slightly lower for Magnetic Island samples. This could reflect the fact that the reference genome was built from an individual from Pelorus Island (northern population). Of the northern reefs, Pelorus Island had the highest average mapping rate.
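
As a rough illustration of where a per-sample rate comes from, the sketch below pulls the overall "mapped" percentage out of `samtools flagstat` text output. The helper name `parse_flagstat_rate` and the example counts are hypothetical, and this is not necessarily how `mapping_rates.tsv` was produced.

```r
# Hypothetical helper (not part of this repository): extract the overall
# mapped percentage from the text output of `samtools flagstat`.
parse_flagstat_rate <- function(lines) {
  # The line of interest looks like: "9876 + 0 mapped (98.76% : N/A)".
  # Anchoring on "<n> + <n> mapped" skips the "primary mapped" line that
  # newer samtools versions also print.
  pattern <- "^[0-9]+ \\+ [0-9]+ mapped \\(([0-9.]+)%"
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) != 1) {
    stop("expected exactly one 'mapped' line with a percentage")
  }
  as.numeric(sub(paste0(pattern, ".*$"), "\\1", hit))
}

# Made-up flagstat output, for illustration only
example_flagstat <- c(
  "10000 + 0 in total (QC-passed reads + QC-failed reads)",
  "0 + 0 secondary",
  "9876 + 0 mapped (98.76% : N/A)",
  "9800 + 0 primary mapped (98.00% : N/A)"
)

rate <- parse_flagstat_rate(example_flagstat)
```

In practice the input would come from something like `readLines()` on a saved flagstat report for each BAM file. A file with no reads reports `N/A` instead of a percentage, which this sketch treats as an error.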
```{r}
# One row per BAM file; left_join() matches on the columns shared with sample_table
mapping_rates <- read_tsv("hpc/gatk3/mapping_rates.tsv",
                          col_names = c("bamfile", "rate")) %>%
  left_join(sample_table) %>%
  mutate(pop_order = site_order()[location_id])
```
```{r}
# Boxplot of mapping rate per location, ordered by pop_order
ggplot(mapping_rates, aes(x = reorder(location_name, pop_order), y = rate)) +
  geom_boxplot() +
  geom_point() +
  xlab("") + ylab("Mapping Rate") +
  theme_pubclean() +
  theme(text = element_text(size = 20)) +
  coord_flip()
ggsave("figures/mapping_rates.png", width = 12)
```
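
The per-location averages discussed in the text could be tabulated with a short `dplyr` summary. The sketch below uses a toy table with the same `location_name` and `rate` columns as `mapping_rates`. The site names and values are invented for illustration, and it assumes rates are stored as percentages.

```r
library(dplyr)

# Toy stand-in for mapping_rates; values are invented, not real results
toy_rates <- tibble(
  location_name = c("Site A", "Site A", "Site B", "Site B"),
  rate = c(96, 98, 91, 93)
)

# Mean mapping rate and sample count per location, highest mean first
rate_summary <- toy_rates %>%
  group_by(location_name) %>%
  summarise(mean_rate = mean(rate), n = n(), .groups = "drop") %>%
  arrange(desc(mean_rate))
```

Replacing `toy_rates` with `mapping_rates` should give the real per-location means, provided the join against the sample table has populated `location_name`.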