Data Center Apprenticeship: R basics: Introduction to R


June 2024

Objects in R

One of the most basic types of objects in R is a vector. A vector is a collection of values of the same type, such as numbers, characters, or logicals (TRUE/FALSE). You can create a vector with the c() function, which stands for concatenate. If you assign a vector to an object with the assignment operator <-, your vector will be saved in your environment so you can work with it within your current R session. Some examples of creating vectors are:

v1 <- c("A", "B", "C")  # character vector with 3 elements
v2 <- 25                # numeric vector with 1 element
v3 <- 1:10              # numeric vector with 10 elements - integers from 1 to 10

To subset or extract elements from a vector, you can use square brackets [ ] with an index. For example, v1[1] returns the first element of v1, v3[2:5] returns the 2nd to 5th elements of v3, and v3[-c(2, 4, 6)] returns all but the 2nd, 4th and 6th elements of v3.

v1[1]
## [1] "A"
v3[2:5]
## [1] 2 3 4 5
v3[-c(2, 4, 6)]
## [1]  1  3  5  7  8  9 10

A dataframe (or “tibble” in tidyverse) is a special type of object that combines vectors into a rectangular table. Each column of a dataframe is a vector, and each row is an observation. usually you would load data from an external source, but you can create a dataframe with the data.frame() and a tibble with the tibble() function. You can also convert other data types such as matrices to tibbles with the as_tibble() function. Both functions take vectors as their arguments. Tibbles are preferred because they are more modern and have some convenient features that dataframes don’t, but for the most part, differences are minor and for the most part it does not matter whether you work with tibbles or dataframes.

A simple example of creating a tibble is (make sure to load tidyverse first):

library(tidyverse)

# define vectors within the tibble() function
tibble(
  name = c("Alice", "Bob", "Chris"),
  height = c(165, 180, 175)
)
## # A tibble: 3 × 2
##   name  height
##   <chr>  <dbl>
## 1 Alice    165
## 2 Bob      180
## 3 Chris    175
# define the vectors first, then combine them into a tibble
name <- c("Alice", "Bob", "Chris")
height <- c(165, 180, 175)
tibble(name, height)
## # A tibble: 3 × 2
##   name  height
##   <chr>  <dbl>
## 1 Alice    165
## 2 Bob      180
## 3 Chris    175

Functions in R

Functions are reusable pieces of code that perform a specific task. They take arguments as inputs and return one or more pieces of output. You will mostly work with functions loaded from various packages or from the base R distribution, and in some cases you may write your own functions to avoid repetition or improve the readability of your code. We will cover writing your own functions later in the program.

As with vectors, the output of a function is saved to your environment only if you assign the result to an object. For example, sum(x) will display the sum of the elements of the vector x, but sum <- sum(x) will save this result to an object.

x <- c(1, 5, 6, 2, 1, 8)

sum(x)
## [1] 23
sum <- sum(x)

Some important functions on vectors are

mean(x)   # return the mean; add the argument na.rm = TRUE if missing values should be excluded
## [1] 3.833333
length(x) # give the length of the vector (number of elements)
## [1] 6
unique(x) # list the unique elements of the vector
## [1] 1 5 6 2 8

To learn more about a function and its arguments, you can use the ? operator or the help() function, for example by typing ?sum (or equivalently, ?sum()). It is good practice to request help files from your console and not you R script, since there is no need to save these queries for the future.

Go to