Introduction

In this project, we investigate the relationship between COVID-19 cases and the stringency index of COVID-19 regulations across countries by making a scatter plot and calculating the correlation between the two variables.

The R script from the workshop is also available on GitHub.

The first step is to (install and) load the required libraries.

# install.packages("tidyverse")
# install.packages("corrr")
library(tidyverse)
library(corrr)

Data

We need to load the data for the stringency index per country, together with the number of COVID-19 cases per million inhabitants in each country.

First, we load and tidy the data for stringency.
The data were obtained from Our World in Data. The code below reads the file directly from the link, so you don’t need to download it.

The raw data has one observation per country per day, so to get a single number per country, we group the data by country code and calculate the mean of the index (stored in the column containment_index) within each country, ignoring missing values.

# load the data for the stringency index and assign it to the object containment
containment <- read_csv("https://github.com/ucrdatacenter/projects/raw/main/SCIBIOM303/2024h1/covid-containment-and-health-index.csv") |> 
  # group the data by country code
  group_by(Code) |> 
  # calculate the mean stringency index per country
  summarise(avg_containment = mean(containment_index, na.rm = TRUE))
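
To see what the grouping step does, here is a minimal sketch on a made-up table. The country codes, the values, and the object names toy_containment and toy_summary are invented for illustration and are not part of the workshop data.

```r
library(dplyr)

# invented daily observations for two hypothetical country codes
toy_containment <- tibble(
  Code = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  containment_index = c(10, 20, NA, 40, 60)
)

# one row per country: the mean of the non-missing daily values
toy_summary <- toy_containment |>
  group_by(Code) |>
  summarise(avg_containment = mean(containment_index, na.rm = TRUE))

toy_summary
```

Because of na.rm = TRUE, the missing value for AAA is dropped before averaging rather than turning the whole country mean into NA.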

The data for the number of COVID cases are also from Our World in Data.

# load the data for the number of COVID cases and assign it to the object covid
covid <- read_csv("https://github.com/ucrdatacenter/projects/raw/main/SCIBIOM303/2024h1/owid-covid-data.csv")

The covid dataset reports the number of new COVID cases per million people (and many other variables) every week, so to get a single number per country, we calculate the mean of the weekly new cases per million people within each country.

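To add the stringency measure for each country next to the COVID cases, we join the data together using the inner_join function. This function takes two dataframes and joins them based on a common column, in this case Code, keeping only the rows whose Code appears in both. The covid data stores the country code in a column called iso_code, so we rename it to Code first.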

# create a new dataframe merged_data that contains both variables
merged_data <- covid |> 
  # rename the column iso_code to Code
  rename(Code = iso_code) |> 
  # group the data by country code
  group_by(Code) |> 
  # calculate the mean number of new cases per million people per country
  summarise(avg_new_cases_per_million = mean(new_cases_per_million, na.rm = TRUE)) |> 
  # join the data with the containment data
  inner_join(containment, by = "Code")
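
The behaviour of inner_join can be checked on two small made-up tables. This is only a sketch: all codes, numbers, and object names (toy_cases, toy_index, toy_merged, toy_unmatched) are hypothetical.

```r
library(dplyr)

# invented per-country averages; all codes and numbers are made up
toy_cases <- tibble(
  Code = c("AAA", "BBB", "CCC"),
  avg_new_cases_per_million = c(120, 300, 45)
)
toy_index <- tibble(
  Code = c("AAA", "BBB", "DDD"),
  avg_containment = c(55, 62, 48)
)

# inner_join keeps only the codes present in both tables
toy_merged <- inner_join(toy_cases, toy_index, by = "Code")

# anti_join lists the rows of the first table that found no match
toy_unmatched <- anti_join(toy_cases, toy_index, by = "Code")

toy_merged
toy_unmatched
```

Running an anti_join like this on the real covid and containment objects is one way to see which country codes are silently dropped by the join.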

Scatterplot and correlation

We can make a scatter plot of the relationship between the stringency index and the number of COVID cases per million people to visualize the relationship. We add a linear regression line to the scatter plot to see the trend more clearly and customize the axis labels and theme.

merged_data |> 
  # define a plot with the containment index on the x-axis and the number of new cases per million on the y-axis
  ggplot(aes(x = avg_containment, y = avg_new_cases_per_million)) +
  # add points for each country
  geom_point() +
  # add a linear regression line
  geom_smooth(method = "lm") +
  # add labels to the axes and a title
  labs(title = "COVID-19 Cases vs Containment Index: 185 countries over 2020-2022",
       y = "Average weekly new cases per million inhabitants",
       x = "Average containment index") +
  # customize the theme
  theme_minimal()
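
The line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit of the y variable on the x variable. As a rough sketch of what happens behind the plot, the same kind of model can be fitted with lm(). The data frame toy below is invented, with values chosen to lie exactly on a line so the coefficients are easy to read.

```r
# invented values that lie exactly on a straight line
toy <- data.frame(
  avg_containment = c(40, 50, 60, 70),
  avg_new_cases_per_million = c(110, 130, 150, 170)
)

# the same kind of model geom_smooth(method = "lm") draws: y regressed on x
fit <- lm(avg_new_cases_per_million ~ avg_containment, data = toy)

# intercept and slope of the fitted line
coef(fit)
```

Replacing toy with merged_data in the lm() call should give the intercept and slope of the line shown in the scatter plot.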

Finally, we calculate the correlation matrix between the variables.

# calculate the correlation matrix
merged_data |> 
  correlate()

## # A tibble: 2 × 3
##   term                      avg_new_cases_per_million avg_containment
##   <chr>                                         <dbl>           <dbl>
## 1 avg_new_cases_per_million                   NA               0.0741
## 2 avg_containment                              0.0741         NA
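
A coefficient of 0.0741 is close to zero, which points to a very weak positive linear relationship between the two averages. The correlation matrix alone does not say how uncertain that estimate is. One possible follow-up, sketched here on invented numbers, is base R's cor.test(), which reports a p-value and a confidence interval alongside the Pearson coefficient. The data frame toy and its values are hypothetical.

```r
# invented per-country averages, for illustration only
toy <- data.frame(
  avg_containment = c(40, 50, 60, 70, 80),
  avg_new_cases_per_million = c(100, 140, 90, 160, 120)
)

# Pearson correlation coefficient between the two columns
r <- cor(toy$avg_containment, toy$avg_new_cases_per_million)

# the same coefficient together with a significance test
test <- cor.test(toy$avg_containment, toy$avg_new_cases_per_million)

r
test$p.value   # p-value for the null hypothesis of zero correlation
test$conf.int  # 95% confidence interval for the correlation
```

The same two calls can be applied to the columns of merged_data, for example merged_data$avg_containment and merged_data$avg_new_cases_per_million.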