First steps
This tutorial assumes that you have already installed R and RStudio and are familiar with how to run code in RStudio. If not, please follow this installation tutorial and this introduction to R and RStudio.
Please watch this video (6:01), then read and follow along with the written tutorial below.
Objects in R
One of the most basic types of objects in R is a vector. A vector is a
collection of values of the same type, such as numbers, characters, or
logicals (TRUE/FALSE). You can create a vector with the c()
function,
which stands for concatenate. If you assign a vector to an object with
the assignment operator <-
, your vector will be saved in your
environment so you can work with it within your current R session. Some
examples of creating vectors are:
v1 <- c("A", "B", "C")
v2 <- 25
v3 <- 1:10
To subset or extract elements from a vector, you can use square brackets
[ ]
with an index. For example, v1[1]
returns the first element of
v1
, v3[2:5]
returns the 2nd to 5th elements of v3
, and
v3[-c(2, 4, 6)]
returns all but the 2nd, 4th and 6th elements of v3
.
v1[1]
## [1] "A"
v3[2:5]
## [1] 2 3 4 5
v3[-c(2, 4, 6)]
## [1] 1 3 5 7 8 9 10
A dataframe (or tibble in tidyverse
) is a special type of object that
combines vectors into a rectangular table. Each column of a dataframe is
a vector, and each row is an observation. usually you would load data
from an external source, but you can create a dataframe with the
data.frame()
and a tibble with the tibble()
function. You can also
convert other data types such as matrices to tibbles with the
as_tibble()
function. Both functions take vectors as their arguments.
Tibbles are preferred because they are more modern and have some
convenient features that dataframes don’t, but for the most part,
differences are minor and for the most part it does not matter whether
you work with tibbles or dataframes.
A simple example of creating a tibble is (make sure to load
tidyverse first
):
library(tidyverse)
# define vectors within the tibble() function
tibble(
name = c("Alice", "Bob", "Chris"),
height = c(165, 180, 175)
)
## # A tibble: 3 × 2
## name height
## <chr> <dbl>
## 1 Alice 165
## 2 Bob 180
## 3 Chris 175
# define the vectors first, then combine them into a tibble
name <- c("Alice", "Bob", "Chris")
height <- c(165, 180, 175)
tibble(name, height)
## # A tibble: 3 × 2
## name height
## <chr> <dbl>
## 1 Alice 165
## 2 Bob 180
## 3 Chris 175
Functions in R
Functions are reusable pieces of code that perform a specific task. They take arguments as inputs and return one or more pieces of output. You will mostly work with functions loaded from various packages or from the base R distribution, and in some cases you may write your own functions to avoid repetition or improve the readability of your code. We will cover writing your own functions later in the program.
As with vectors, the output of a function is saved to your environment
only if you assign the result to an object. For example, sum(x)
will
display the sum of the elements of the vector x
, but sum <- sum(x)
will save this result to an object.
x <- c(1, 5, 6, 2, 1, 8)
sum(x)
## [1] 23
sum <- sum(x)
Some important functions on vectors are
mean(x) # return the mean; add the argument na.rm = TRUE if missing values should be excluded
## [1] 3.833333
length(x) # give the length of the vector (number of elements)
## [1] 6
unique(x) # list the unique elements of the vector
## [1] 1 5 6 2 8
To learn more about a function and its arguments, you can use the ?
operator or the help() function, for example by typing ?sum
(or
equivalently, ?sum()
). It is good practice to request help files from
your console and not you R script, since there is no need to save these
queries for the future.