Statistical Programming in R

This lecture

  • If-statements

  • For-loops

  • Apply()

  • Writing your own functions

New controls and functions

New control flow constructs

  • if(cond) expr
  • if(cond) cons.expr else alt.expr
  • for(var in seq) expr

New functions

  • rev(): reverse version of an argument
  • apply(): apply a function to margins of a matrix
  • sapply(): apply a function to elements of a list, vector or matrix return
  • lapply(): apply a function to elements of a list, list return
  • print(): print an object to the console
  • cat: outputs an object with less conversion than print()

Conditionals and loops

If-statements

Often, we want to run some code only if something condition is true.

For example:

a <- 2
if (a > 5) {
  print("a is larger than 5.")
}
a <- 8
if (a > 5) {
  print("a is larger than 5.")
}
## [1] "a is larger than 5."

If-else-statements

We can also specify something to be run if the condition is not true.

a <- 2
if (a > 5) {
  print("a is larger than 5.")
} else {
  print("a is smaller than 5.")
}
## [1] "a is smaller than 5."

If-else-statements

a <- 8
if (a > 5) {
  print("a is larger than 5.")
} else {
  print("a is smaller than 5.")
}
## [1] "a is larger than 5."

For-loops

Very often, we want to do something multiple times.

It is tedious, and often impossible, to write this out completely.

For-loops

# Let's print the numbers 1 to 6 one by one. 
print(1)
## [1] 1
print(2)
## [1] 2
print(3)
## [1] 3

For-loops

print(4)
## [1] 4
print(5)
## [1] 5
print(6)
## [1] 6

For-loops

For-loops allow us to automate this!

for (i in 1:6) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6

For-loops

for (some.var.name in 1:6) {
  print(some.var.name)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6

For-loops

for (i in 1:6) {
  print(i < 5)
}
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] FALSE

For-loops

for (i in 1:nrow(cats)) {
  if (cats$Bwt[i] > 2.5) {
    cat(i, "is over 2.5. It is:", cats$Bwt[i], "\n")
  }
}
## 37 is over 2.5. It is: 2.6 
## 38 is over 2.5. It is: 2.6 
## 39 is over 2.5. It is: 2.6 
## 40 is over 2.5. It is: 2.7 
## 41 is over 2.5. It is: 2.7 
## 42 is over 2.5. It is: 2.7 
## 43 is over 2.5. It is: 2.9 
## 44 is over 2.5. It is: 2.9 
## 45 is over 2.5. It is: 2.9 
## 46 is over 2.5. It is: 3 
## 47 is over 2.5. It is: 3 
## 73 is over 2.5. It is: 2.6 
## 74 is over 2.5. It is: 2.6 
## 75 is over 2.5. It is: 2.6 
## 76 is over 2.5. It is: 2.6 
## 77 is over 2.5. It is: 2.6 
## 78 is over 2.5. It is: 2.6 
## 79 is over 2.5. It is: 2.7 
## 80 is over 2.5. It is: 2.7 
## 81 is over 2.5. It is: 2.7 
## 82 is over 2.5. It is: 2.7 
## 83 is over 2.5. It is: 2.7 
## 84 is over 2.5. It is: 2.7 
## 85 is over 2.5. It is: 2.7 
## 86 is over 2.5. It is: 2.7 
## 87 is over 2.5. It is: 2.7 
## 88 is over 2.5. It is: 2.8 
## 89 is over 2.5. It is: 2.8 
## 90 is over 2.5. It is: 2.8 
## 91 is over 2.5. It is: 2.8 
## 92 is over 2.5. It is: 2.8 
## 93 is over 2.5. It is: 2.8 
## 94 is over 2.5. It is: 2.8 
## 95 is over 2.5. It is: 2.9 
## 96 is over 2.5. It is: 2.9 
## 97 is over 2.5. It is: 2.9 
## 98 is over 2.5. It is: 2.9 
## 99 is over 2.5. It is: 2.9 
## 100 is over 2.5. It is: 3 
## 101 is over 2.5. It is: 3 
## 102 is over 2.5. It is: 3 
## 103 is over 2.5. It is: 3 
## 104 is over 2.5. It is: 3 
## 105 is over 2.5. It is: 3 
## 106 is over 2.5. It is: 3 
## 107 is over 2.5. It is: 3 
## 108 is over 2.5. It is: 3 
## 109 is over 2.5. It is: 3.1 
## 110 is over 2.5. It is: 3.1 
## 111 is over 2.5. It is: 3.1 
## 112 is over 2.5. It is: 3.1 
## 113 is over 2.5. It is: 3.1 
## 114 is over 2.5. It is: 3.1 
## 115 is over 2.5. It is: 3.2 
## 116 is over 2.5. It is: 3.2 
## 117 is over 2.5. It is: 3.2 
## 118 is over 2.5. It is: 3.2 
## 119 is over 2.5. It is: 3.2 
## 120 is over 2.5. It is: 3.2 
## 121 is over 2.5. It is: 3.3 
## 122 is over 2.5. It is: 3.3 
## 123 is over 2.5. It is: 3.3 
## 124 is over 2.5. It is: 3.3 
## 125 is over 2.5. It is: 3.3 
## 126 is over 2.5. It is: 3.4 
## 127 is over 2.5. It is: 3.4 
## 128 is over 2.5. It is: 3.4 
## 129 is over 2.5. It is: 3.4 
## 130 is over 2.5. It is: 3.4 
## 131 is over 2.5. It is: 3.5 
## 132 is over 2.5. It is: 3.5 
## 133 is over 2.5. It is: 3.5 
## 134 is over 2.5. It is: 3.5 
## 135 is over 2.5. It is: 3.5 
## 136 is over 2.5. It is: 3.6 
## 137 is over 2.5. It is: 3.6 
## 138 is over 2.5. It is: 3.6 
## 139 is over 2.5. It is: 3.6 
## 140 is over 2.5. It is: 3.7 
## 141 is over 2.5. It is: 3.8 
## 142 is over 2.5. It is: 3.8 
## 143 is over 2.5. It is: 3.9 
## 144 is over 2.5. It is: 3.9

For-loops

for (i in 1:ncol(cats)) {
  print(class(cats[, i]))
}
## [1] "factor"
## [1] "numeric"
## [1] "numeric"

For-loops

empty.cats <- matrix(NA, nrow = nrow(cats), ncol = ncol(cats))

for (i in 1:ncol(cats)) {
  empty.cats[, i] <- rep(i, nrow(cats))
}

head(empty.cats)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    1    2    3
## [3,]    1    2    3
## [4,]    1    2    3
## [5,]    1    2    3
## [6,]    1    2    3

For-loops in for-loops

for (i in 1:3) {
  for (j in 1:3) {
    print(paste(i, "x", j, "=", i*j))
  }
}
## [1] "1 x 1 = 1"
## [1] "1 x 2 = 2"
## [1] "1 x 3 = 3"
## [1] "2 x 1 = 2"
## [1] "2 x 2 = 4"
## [1] "2 x 3 = 6"
## [1] "3 x 1 = 3"
## [1] "3 x 2 = 6"
## [1] "3 x 3 = 9"

Looping over lists or vectors

my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = TRUE)

for (list.item in my.list) {
  cat("One element is", list.item, "\n")
}
## One element is 4 2 1 2 3 
## One element is Hello. 
## One element is TRUE

Looping over lists or vectors

But if we want to change the item, we have to be able to access it, so we are better off using 1:length(list), or 1:ncol(data.frame)/1:nrow(data.frame).

my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = c(TRUE, TRUE, FALSE))

for (iter in 1:length(my.list)) {
  my.list[[iter]] <- rev(my.list[[iter]])
}
my.list
## $A
## [1] 3 2 1 2 4
## 
## $B
## [1] "Hello."
## 
## $C
## [1] FALSE  TRUE  TRUE

The apply() family

apply()

The apply family is a group of very useful functions that allow you to easily execute a function of your choice over a list of objects, such as a list, a data.frame, or matrix.

We will look at three examples:

  • apply

  • sapply

  • lapply

apply()

apply is used for matrices (and sometimes dataframes). It can take a function that takes a vector as input, and apply it to each row or column.

apply()

MARGIN is 1 for rows, 2 for columns.

apply(cats[, -1], MARGIN = 2, mean)
##       Bwt       Hwt 
##  2.723611 10.630556

But we’ve seen this done easier:

colMeans(cats[, -1])
##       Bwt       Hwt 
##  2.723611 10.630556

apply()

However, the power of apply() is that it can use any function we throw at it.

apply()

rand.mat <- matrix(rnorm(21), nrow = 3, ncol = 7)
rand.mat
##             [,1]      [,2]       [,3]      [,4]       [,5]      [,6]
## [1,]  1.02067666  1.591758 -0.4878098  1.073193 -0.9022891 -1.122945
## [2,]  0.08410246 -2.076734 -0.2600368 -2.089625 -0.5825552 -1.998969
## [3,] -0.57307884 -1.229113 -0.6818855  1.164227 -0.9405451 -2.729420
##           [,7]
## [1,] 0.9160751
## [2,] 0.5380085
## [3,] 0.7395754
apply(rand.mat, MARGIN = 1, FUN = max)
## [1] 1.5917584 0.5380085 1.1642267

apply(rand.mat, MARGIN = 2, FUN = max)
## [1]  1.0206767  1.5917584 -0.2600368  1.1642267 -0.5825552 -1.1229453
## [7]  0.9160751

apply()

rand.mat
##             [,1]      [,2]       [,3]      [,4]       [,5]      [,6]
## [1,]  1.02067666  1.591758 -0.4878098  1.073193 -0.9022891 -1.122945
## [2,]  0.08410246 -2.076734 -0.2600368 -2.089625 -0.5825552 -1.998969
## [3,] -0.57307884 -1.229113 -0.6818855  1.164227 -0.9405451 -2.729420
##           [,7]
## [1,] 0.9160751
## [2,] 0.5380085
## [3,] 0.7395754
apply(rand.mat, MARGIN = 1, FUN = sum)
## [1]  2.088659 -6.385809 -4.250240

apply(rand.mat, MARGIN = 2, FUN = var)
## [1] 0.64151918 3.68893622 0.04458370 3.43320919 0.03864169 0.64695626
## [7] 0.03578596

sapply()

sapply() is used on list-objects.

my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = TRUE)
sapply(my.list, class)
##           A           B           C 
##   "numeric" "character"   "logical"

sapply()

my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = TRUE)
sapply(my.list, range)
##      A   B        C  
## [1,] "1" "Hello." "1"
## [2,] "4" "Hello." "1"

It returns a vector or a matrix, depending on the output of the function.

Why is each element a string?

sapply()

Any data.frame is also a list, where each column is one list-element.

class(cats)
## [1] "data.frame"
is.list(cats)
## [1] TRUE

sapply()

print.default(cats[1:10, ])
## $Sex
##  [1] F F F F F F F F F F
## Levels: F M
## 
## $Bwt
##  [1] 2.0 2.0 2.0 2.1 2.1 2.1 2.1 2.1 2.1 2.1
## 
## $Hwt
##  [1] 7.0 7.4 9.5 7.2 7.3 7.6 8.1 8.2 8.3 8.5
## 
## attr(,"class")
## [1] "data.frame"

sapply()

This means we can use sapply on data frames as well, which is often useful.

sapply(cats, class)
##       Sex       Bwt       Hwt 
##  "factor" "numeric" "numeric"

lapply()

lapply() is exactly the same as sapply(), but it returns a list instead of a vector.

lapply(cats, class)
## $Sex
## [1] "factor"
## 
## $Bwt
## [1] "numeric"
## 
## $Hwt
## [1] "numeric"

Writing your own functions

What are functions?

Functions are reusable pieces of code that take an input, do some computation on the input, and return output.

We have been using a lot of functions: code of the form something() is usually a function.

mean(1:6)
## [1] 3.5

Our own function

We can make our own functions as follows:

squared <- function (x) {
  x.square <- x * x
  return(x.square)
}

squared(4)
## [1] 16

x, the input, is called the (formal) argument of the function. x.square is called the return value.

Our own function

If there is no return(), the last line is automatically returned, so we can also just write:

squared <- function (x) {
  x * x
}

squared(-2)
## [1] 4

Our own function

We can also combine this with apply()

rand.mat
##             [,1]      [,2]       [,3]      [,4]       [,5]      [,6]
## [1,]  1.02067666  1.591758 -0.4878098  1.073193 -0.9022891 -1.122945
## [2,]  0.08410246 -2.076734 -0.2600368 -2.089625 -0.5825552 -1.998969
## [3,] -0.57307884 -1.229113 -0.6818855  1.164227 -0.9405451 -2.729420
##           [,7]
## [1,] 0.9160751
## [2,] 0.5380085
## [3,] 0.7395754
apply(rand.mat, 2, squared)
##             [,1]     [,2]       [,3]     [,4]      [,5]     [,6]      [,7]
## [1,] 1.041780838 2.533695 0.23795839 1.151743 0.8141257 1.261006 0.8391936
## [2,] 0.007073224 4.312826 0.06761916 4.366533 0.3393705 3.995876 0.2894531
## [3,] 0.328419359 1.510718 0.46496786 1.355424 0.8846251 7.449734 0.5469717

Our own function

areDivisible <- function (x, y) {
  if (x > y) {
    larger  <- x
    smaller <- y
  } else {
    larger  <- y
    smaller <- x
  }
  remainder <- larger %% smaller 
  remainder == 0
}

Our own function

areDivisible(3, 10)
## [1] FALSE
areDivisible(3, 9)
## [1] TRUE
areDivisible(9, 3)
## [1] TRUE

Default options in functions

  • Default options for some arguments are provided in many functions.

  • They allow us to provide an additional option, but if no choice is provided, we can choose for the user of the function.

areDivisible <- function (x, y, printInput = TRUE) {
  if (printInput) {
    cat("Testing if", x, "and", y, "are divisible: \n")
  }

  ((x %% y) == 0) | ((y %% x) == 0)
}

Default options in functions

areDivisible(3, 10)
## Testing if 3 and 10 are divisible:
## [1] FALSE
areDivisible(3, 9, printInput = TRUE)
## Testing if 3 and 9 are divisible:
## [1] TRUE
areDivisible(9, 3, printInput = FALSE)
## [1] TRUE

Troubleshooting

  • Your first self-written for-loop, or function, will probably not work.

  • Don’t panic! Just go line-by-line, keeping track of what is currently inside each variable.

Troubleshooting

rand.vec <- rnorm(10)

# Add ten to each element.
for (i in 1:rand.vec) {
  i <- i + 10
}
## Warning in 1:rand.vec: numerical expression has 10 elements: only the first
## used
rand.vec
##  [1]  0.79339773 -0.75623507 -1.80975295  1.12497773  0.05506155
##  [6] -0.96945053 -1.86147685 -0.10418630  1.42688703 -0.97705275

Troubleshooting

# Add ten to each element.
for (i in 1:rand.vec) {
    i <- i + 10
}
1:rand.vec
## Warning in 1:rand.vec: numerical expression has 10 elements: only the first
## used
## [1] 1
1:length(rand.vec)
##  [1]  1  2  3  4  5  6  7  8  9 10

Troubleshooting

Let’s try again:

# Add ten to each element.
for (i in 1:length(rand.vec)) {
  i <- i + 10
}
rand.vec
##  [1]  0.79339773 -0.75623507 -1.80975295  1.12497773  0.05506155
##  [6] -0.96945053 -1.86147685 -0.10418630  1.42688703 -0.97705275

Still, no luck.

Troubleshooting

# Add ten to each element.
for (i in 1:length(rand.vec)) {
  i <- i + 10
}
i <- 1
i
## [1] 1

Troubleshooting

# Add ten to each element.
for (i in 1:length(rand.vec)) {
  i <- i + 10
}
i <- i + 10
i
## [1] 11
i <- 2
i
## [1] 2

Troubleshooting

# Add ten to each element.
for (i in 1:length(rand.vec)) {
  rand.vec[i] <- rand.vec[i] + 10
}

rand.vec
##  [1] 10.793398  9.243765  8.190247 11.124978 10.055062  9.030549  8.138523
##  [8]  9.895814 11.426887  9.022947

To conclude

Practical