Data visualization: patterns over time

This tutorial is not ready yet. Please come back later.

Introduction

Before you proceed, make sure you’re familiar with the logic of ggplot, as explained in our introduction to ggplot tutorial.

We’ll use the population dataset that comes pre-loaded with tidyverse to demonstrate how to plot the evolution of a variable over time, so let’s load tidyverse and have a look at the dataset:

# load tidyverse
library(tidyverse)

# add population to the environment
data(population)

Time series plots

In a time series plot, the x-axis represents time, and the y-axis represents the variable you want to visualize. To create a time series plot, you can use the line geom.

Let’s plot the evolution of the population variable over time.

# create a time series plot 
ggplot(population, aes(x = year, y = population)) +
  geom_line()

The plot does not make sense, because in every year we have observations for multiple countries, and by default geom_line() connects all of the points. Let’s filter the data for a single country before plotting. We use the filter() function – if you’re not familiar with it, have a look at our corresponding data wrangling tutorial.

ggplot(filter(population, country == "Netherlands"), 
       aes(x = year, y = population)) +
  geom_line()

If you’re familiar with the pipe operator %>% (see our tidy workflow tutorial), you can use it to make the code more readable:

population %>%
  filter(country == "Netherlands") %>%
  ggplot(aes(x = year, y = population)) +
  geom_line()

You can compare the evolution of the population by adding multiple lines to the plot and differentiating them by their color. To make it clearer that we have annual data, we can add points to the plot as well, because geoms can be layered.

population %>%
  # keep only data from the Netherlands and Belgium
  filter(country %in% c("Netherlands", "Belgium")) %>%
  # specify the aesthetics for all geoms
  ggplot(aes(x = year, y = population, color = country)) +
  # add a line for each country
  geom_line() +
  # add points for each data point
  geom_point()

Whenever you make a plot, make sure to use clear labels and titles with the labs() function to make your visualization easy to understand.

population %>%
  filter(country == "Netherlands") %>%
  ggplot(aes(x = year, y = population)) +
  geom_line() +
  labs(title = "Population of the Netherlands over time",
       x = "Year",
       y = "Population")

To learn more about other geoms and customization options, have a look at our advanced visualization tutorial and additional resources.

Video tutorial TBA