Appendix: R Tutorial

R (R Core Team, 2022) is a programming language and software environment for statistical computing and graphics. It is widely used among statisticians and data scientists for data analysis and data visualization. R has a large and active community of users and, as a result, there are many community-made resources available for learning and using R.

In an audit context, R can be used to analyze and visualize large datasets, allowing auditors to identify trends and anomalies in the data. R is particularly useful for statistical analysis and hypothesis testing, which can help verify the accuracy and reliability of financial statements. R can also automate certain audit procedures, reducing the time and effort required to manually review and analyze large amounts of data. Additionally, R allows auditors to share their work through code and reproducible reports, enabling more efficient and collaborative audit processes. Lin (2021) discusses relevant applications of R in auditing.

This chapter provides a short introduction to working with R, but it is not a full programming course. If you want to dive deeper into the nooks and crannies of the language, the author recommends reading Chang (2022), Wickham & Grolemund (2017), Grolemund (2014) and Wilke (2022). These books are free to read online and discuss various best practices for using R, including code examples that can be easily reproduced.

Figure 1: R can be a powerful tool in a business context. Image available under a CC-BY-NC 4.0 license.

Calculations

One of the basic features of R is its ability to perform calculations. In R, basic calculations work by using the standard arithmetic operators such as + for addition, - for subtraction, * for multiplication, and / for division. For example, if you want to calculate 2 + 3, you would type in 2 + 3 and R will return the result of 5.

2 + 3
#> [1] 5

R also allows for more advanced calculations such as exponentiation using the ^ operator, and square roots using the sqrt() function. For example, to calculate the square root of 9, you would type in sqrt(9) and R will return the result of 3.

sqrt(9)
#> [1] 3
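
The ^ operator works in the same way. The following is a minimal sketch; the variable names cube and root are illustrative only and are not used elsewhere in this tutorial.

```r
# Exponentiation with the ^ operator: 2 to the power 3
cube <- 2^3
cube
#> [1] 8

# A fractional exponent gives a root: 9 to the power one half
root <- 9^0.5
root
#> [1] 3
```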

You can also use parentheses to specify the order of operations in your calculations. For example, if you want to calculate (2 + 3) * 4, you would type in (2 + 3) * 4 to get the result of 20.

(2 + 3) * 4
#> [1] 20

Overall, basic calculations in R are similar to those in other programming languages and follow the standard order of operations.
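
As a small illustration of the order of operations, the sketch below shows how multiplication and exponentiation are evaluated before addition and the unary minus. The variable names a, b and d are hypothetical and chosen only for this example.

```r
# Multiplication is evaluated before addition
a <- 2 + 3 * 4
a
#> [1] 14

# Exponentiation is evaluated before the unary minus
b <- -2^2
b
#> [1] -4

# Parentheses make the intended grouping explicit
d <- (-2)^2
d
#> [1] 4
```

When in doubt, adding parentheses costs nothing and makes the calculation easier to review.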

Vectors

In R, vectors are one-dimensional arrays of data that can hold numeric, character, or logical values. Vectors can be created using the c() function, which stands for concatenate. For example, to create a numeric vector, you can use the following code:

x <- c(1, 2, 3, 4, 5)

To create a character vector, you can use quotes around the values:

y <- c("apple", "banana", "orange")

To create a logical vector, you can use the logical values TRUE and FALSE:

z <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
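
If you are unsure which type a vector holds, the class() function can tell you. The sketch below uses hypothetical vectors (num_vec, chr_vec, lgl_vec, mixed) so that it does not interfere with x, y and z above. It also shows that a vector holds a single type: mixing types in c() converts every element to a common type.

```r
num_vec <- c(1, 2, 3)
chr_vec <- c("apple", "banana")
lgl_vec <- c(TRUE, FALSE)

class(num_vec)
#> [1] "numeric"
class(chr_vec)
#> [1] "character"
class(lgl_vec)
#> [1] "logical"

# Mixing types coerces all elements to one common type (here: character)
mixed <- c(1, "apple", TRUE)
class(mixed)
#> [1] "character"
```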

Vectors can be indexed using square brackets and a numeric value. For example, to access the second element of the vector x, you can use the following code:

x[2]
#> [1] 2

Vectors can also be subset using a logical vector. For example, to get all elements of the vector x that are greater than 3, you can use the following code:

x[x > 3]
#> [1] 4 5
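
A few more indexing patterns are sketched below. Note that indexing in R starts at 1, not 0. The vector v is hypothetical and used only here, so the vector x from the running example stays unchanged.

```r
v <- c(10, 20, 30, 40, 50)

# A range of positions selects several elements at once
v[2:4]
#> [1] 20 30 40

# A negative index drops the element at that position
v[-1]
#> [1] 20 30 40 50

# Logical conditions can be combined with & (and) and | (or)
v[v > 15 & v < 45]
#> [1] 20 30 40
```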

Vectors can also be modified using indexing and assignment. For example, to change the third element of the vector x to 6, you can use the following code:

x[3]
#> [1] 3
x[3] <- 6
x[3]
#> [1] 6

R has many built-in functions for performing mathematical operations on vectors. For example, you can use the mean() function to calculate the average of a vector of numbers, or the length() function to calculate the number of elements in a vector:

mean(x)
#> [1] 3.6
length(y)
#> [1] 3
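
Other built-in functions follow the same pattern, and arithmetic operators work element by element on a whole vector. The sketch below uses a hypothetical vector w to illustrate this.

```r
w <- c(1, 2, 6, 4, 5)

# Sum of all elements
sum(w)
#> [1] 18

# Smallest and largest element
range(w)
#> [1] 1 6

# Arithmetic is applied to every element of the vector
w * 2
#> [1]  2  4 12  8 10
```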

Overall, vectors are a useful data structure in R for storing and manipulating data.

Matrices

In R, a matrix is a two-dimensional collection of values that are arranged in rows and columns. You can create a matrix using the matrix() function. For example:

m <- matrix(1:9, nrow = 3, ncol = 3)
m
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9

This creates a 3x3 matrix with the values 1, 2, 3 in the first column, 4, 5, 6 in the second column, and 7, 8, 9 in the third column.
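
By default, matrix() fills the matrix column by column. If you would rather fill it row by row, you can set the byrow argument to TRUE. The name m_row below is illustrative only.

```r
# Fill the matrix row by row instead of column by column
m_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
m_row
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6
#> [3,]    7    8    9
```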

You can also create a matrix by combining several vectors using the cbind() or rbind() functions. For example:

v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
v3 <- c(7, 8, 9)
m <- cbind(v1, v2, v3)
m
#>      v1 v2 v3
#> [1,]  1  4  7
#> [2,]  2  5  8
#> [3,]  3  6  9

This creates a matrix with the same values as before, but the columns are created by binding the vectors together.
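
The rbind() function works the same way but stacks the vectors as rows instead of columns. A minimal sketch, using the hypothetical vectors r1 and r2:

```r
r1 <- c(1, 2, 3)
r2 <- c(4, 5, 6)

# Each vector becomes one row of the matrix
m_rows <- rbind(r1, r2)
m_rows
#>    [,1] [,2] [,3]
#> r1    1    2    3
#> r2    4    5    6
```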

You can access the elements of a matrix using the square bracket notation. For example, to access the element in the second row and third column of m, you would use the following code:

m[2, 3]
#> v3 
#>  8

You can also use the dim() function to get the dimensions of a matrix, and the colnames() and rownames() functions to get the names of the columns and rows, respectively.
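
These functions can be sketched as follows. The matrix mat is a hypothetical stand-in built the same way as m above. Because cbind() only names the columns, rownames() returns NULL until row names are assigned.

```r
mat <- cbind(v1 = c(1, 2, 3), v2 = c(4, 5, 6), v3 = c(7, 8, 9))

# Number of rows and columns
dim(mat)
#> [1] 3 3

# Column names come from the vectors that were bound together
colnames(mat)
#> [1] "v1" "v2" "v3"

# No row names have been set yet
rownames(mat)
#> NULL

# Row names can be assigned with the same function
rownames(mat) <- c("a", "b", "c")
rownames(mat)
#> [1] "a" "b" "c"
```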

There are many other functions and operations available for working with matrices in R, including mathematical operations such as matrix multiplication and inversion.
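
As a brief sketch of these operations, the example below multiplies two small matrices with %*% and inverts one of them with solve(). The matrices A and B are hypothetical and chosen so that the results are easy to check by hand.

```r
A <- matrix(c(2, 0, 0, 2), nrow = 2)
B <- matrix(1:4, nrow = 2)

# %*% is matrix multiplication; * would multiply element by element
A %*% B
#>      [,1] [,2]
#> [1,]    2    6
#> [2,]    4    8

# solve() returns the inverse of a square, invertible matrix
A_inv <- solve(A)

# Multiplying a matrix by its inverse gives the identity matrix
all.equal(A %*% A_inv, diag(2))
#> [1] TRUE
```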

Data Frames

In R, a data frame is a two-dimensional table of data with rows and columns. Each row represents a single observation or record, and each column represents a particular variable or attribute. Data frames are similar to a spreadsheet in Excel or a table in a database. Each column in a data frame can have a different data type, such as numerical, character, or logical. The data in each row must match the data type of the corresponding column.

To create a data frame in R, you can use the data.frame() function and pass in the data you want to include in the data frame as arguments. For example:

df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

This will create a data frame with two columns, x and y, and three rows of data. You can access the data in a data frame using indexing and subsetting. For example, to access the first row of the data frame, you can use the following command:

df[1, ]
#>   x y
#> 1 1 4

To access a specific column, you can use the $ operator (or the index):

df$x
#> [1] 1 2 3
df[, 1]
#> [1] 1 2 3
df[, "x"]
#> [1] 1 2 3
df[["x"]]
#> [1] 1 2 3
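
Rows can be selected with a logical condition, just like elements of a vector, and new columns can be added with the $ operator. The sketch below uses a hypothetical data frame called ledger with made-up values; it also shows that columns can have different types.

```r
ledger <- data.frame(
  id = c(1, 2, 3),
  amount = c(100, 250, 75),
  approved = c(TRUE, FALSE, TRUE)
)

# Keep only the rows where amount exceeds 90
large <- ledger[ledger$amount > 90, ]
large
#>   id amount approved
#> 1  1    100     TRUE
#> 2  2    250    FALSE

# Add a new column computed from an existing one
ledger$double_amount <- ledger$amount * 2
ncol(ledger)
#> [1] 4
```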

You can also use functions like head() and tail() to view the first or last few rows of a data frame. Several other functions let you manipulate and analyze the data. For example, the base R summary() function calculates summary statistics for each column. The dplyr package, which must be installed and loaded separately, provides the group_by() function to group the data by a specific variable and the summarize() function to apply a function to each group.
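
The following is a minimal sketch of inspecting and summarizing a data frame using only base R. The data frame sales and its values are hypothetical, and aggregate() is used here as a base R way of computing a summary per group; it is not the only way to do this.

```r
sales <- data.frame(
  region = c("north", "south", "north", "south"),
  amount = c(10, 20, 30, 40)
)

# View the first two rows
head(sales, 2)
#>   region amount
#> 1  north     10
#> 2  south     20

# Summary statistics (minimum, quartiles, mean, maximum) for one column
summary(sales$amount)

# Mean amount per region
by_region <- aggregate(amount ~ region, data = sales, FUN = mean)
by_region
#>   region amount
#> 1  north     20
#> 2  south     30
```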

Data Sets

When working with data, you will need to load the data file into your R session. How this is done depends on the type of data file that you want to read.

Built-in Data

Data that is included in an R package can be loaded via the data() function. For example, to load the BuildIt data set that is included in the jfa package, you can run the following R code. Note that this requires that the package is loaded in the R session via a call to library().

library(jfa)
data(BuildIt)
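
The same pattern works for data sets that ship with R itself. The sketch below uses mtcars from the datasets package purely as a stand-in, because it requires no additional installation. After loading, dim() and names() give a quick first impression of the data.

```r
# Load a data set that is included with R
data(mtcars)

# Number of rows and columns
dim(mtcars)
#> [1] 32 11

# Names of the first three columns
head(names(mtcars), 3)
#> [1] "mpg"  "cyl"  "disp"
```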

Loading Data from a CSV File

A commonly used file type is the .csv file. You can load this type of file via the read.csv() function. For example, if the file example.csv is in the current working directory, you can load it by running:

read.csv("example.csv")
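
In practice you will usually want to store the result in a variable so that you can work with it afterwards. The sketch below is self-contained: it first writes a small, made-up data frame to a temporary file with write.csv() and then reads it back. The names csv_path and dat are illustrative only.

```r
# Write a small example file to a temporary location
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = c(1, 2, 3), y = c(4, 5, 6)), csv_path, row.names = FALSE)

# Read the file and store the result as a data frame
dat <- read.csv(csv_path)
dat
#>   x y
#> 1 1 4
#> 2 2 5
#> 3 3 6
```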

Loading Data from an Excel File

Another commonly used file type is the Excel file. You can load this type of file via the read_excel() function from the readxl package. For this to work, you should first install the package using the install.packages() command (this only needs to be done once) and load it into the R session using a call to library(). For example, if the file example.xlsx is in the current working directory, and the data you want to load is on the first worksheet, you can load it by running:

install.packages("readxl")  # only needed once
library(readxl)
read_excel("example.xlsx", sheet = 1)

Practical Exercises

  1. Compute the square root of 81 and store the result in a variable called t1.

Assigning a variable can be achieved via the <- operator, while the square root is computed via the sqrt() function.

t1 <- sqrt(81)

  2. Compute 81 to the power of one half and store the result in a variable called t2.

The square root can also be computed using the power operator ^.

t2 <- 81^0.5

  3. Use the == operator to check whether the content of t1 and t2 is the same.

You can check if the contents of t1 and t2 are the same using the == operator.

t1 == t2
#> [1] TRUE

  4. Use the c() function (or :) to create the following vector: -2 -1 0 1 2 3 4 5 6 7 8.

There are two main ways (and many others) of creating this vector:

c(-2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8)
#>  [1] -2 -1  0  1  2  3  4  5  6  7  8

or:

-2:8
#>  [1] -2 -1  0  1  2  3  4  5  6  7  8

  5. Find out the length of the vector created in exercise 4.

You can compute the length of any vector using the length() function.

length(-2:8)
#> [1] 11

  6. Find out the mean of the vector created in exercise 4.

You can compute the average of any vector using the mean() function.

mean(-2:8)
#> [1] 3
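
A side note on the == comparison used in the exercises: comparing the square root of 81 with 81 to the power of one half works because both results are exactly 9. With decimal results, tiny rounding differences can make == return FALSE even when two numbers are equal for all practical purposes. The sketch below illustrates this with a hypothetical variable s and shows all.equal(), which compares numbers up to a small tolerance.

```r
# Squaring the square root of 2 does not give exactly 2
s <- sqrt(2)^2
s == 2
#> [1] FALSE

# all.equal() allows for small rounding differences
isTRUE(all.equal(s, 2))
#> [1] TRUE
```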