If-statements
For-loops
Apply()
Writing your own functions
Statistical Programming in R
If-statements
For-loops
Apply()
Writing your own functions
if(cond)
expr
if(cond)
cons.expr
else
alt.expr
for(var in seq)
expr
rev()
: reverse version of an argumentapply()
: apply a function to margins of a matrixsapply()
: apply a function to elements of a list, vector
or matrix
returnlapply()
: apply a function to elements of a list, list
returnprint()
: print an object to the consolecat
: outputs an object with less conversion than print()
Often, we want to run some code only if something condition is true.
For example:
a <- 2 if (a > 5) { print("a is larger than 5.") }
a <- 8 if (a > 5) { print("a is larger than 5.") }
## [1] "a is larger than 5."
We can also specify something to be run if the condition is not true.
a <- 2 if (a > 5) { print("a is larger than 5.") } else { print("a is smaller than 5.") }
## [1] "a is smaller than 5."
a <- 8 if (a > 5) { print("a is larger than 5.") } else { print("a is smaller than 5.") }
## [1] "a is larger than 5."
Very often, we want to do something multiple times.
It is tedious, and often impossible, to write this out completely.
# Let's print the numbers 1 to 6 one by one. print(1)
## [1] 1
print(2)
## [1] 2
print(3)
## [1] 3
print(4)
## [1] 4
print(5)
## [1] 5
print(6)
## [1] 6
For-loops allow us to automate this!
for (i in 1:6) { print(i) }
## [1] 1 ## [1] 2 ## [1] 3 ## [1] 4 ## [1] 5 ## [1] 6
for (some.var.name in 1:6) { print(some.var.name) }
## [1] 1 ## [1] 2 ## [1] 3 ## [1] 4 ## [1] 5 ## [1] 6
for (i in 1:6) { print(i < 5) }
## [1] TRUE ## [1] TRUE ## [1] TRUE ## [1] TRUE ## [1] FALSE ## [1] FALSE
for (i in 1:nrow(cats)) { if (cats$Bwt[i] > 2.5) { cat(i, "is over 2.5. It is:", cats$Bwt[i], "\n") } }
## 37 is over 2.5. It is: 2.6 ## 38 is over 2.5. It is: 2.6 ## 39 is over 2.5. It is: 2.6 ## 40 is over 2.5. It is: 2.7 ## 41 is over 2.5. It is: 2.7 ## 42 is over 2.5. It is: 2.7 ## 43 is over 2.5. It is: 2.9 ## 44 is over 2.5. It is: 2.9 ## 45 is over 2.5. It is: 2.9 ## 46 is over 2.5. It is: 3 ## 47 is over 2.5. It is: 3 ## 73 is over 2.5. It is: 2.6 ## 74 is over 2.5. It is: 2.6 ## 75 is over 2.5. It is: 2.6 ## 76 is over 2.5. It is: 2.6 ## 77 is over 2.5. It is: 2.6 ## 78 is over 2.5. It is: 2.6 ## 79 is over 2.5. It is: 2.7 ## 80 is over 2.5. It is: 2.7 ## 81 is over 2.5. It is: 2.7 ## 82 is over 2.5. It is: 2.7 ## 83 is over 2.5. It is: 2.7 ## 84 is over 2.5. It is: 2.7 ## 85 is over 2.5. It is: 2.7 ## 86 is over 2.5. It is: 2.7 ## 87 is over 2.5. It is: 2.7 ## 88 is over 2.5. It is: 2.8 ## 89 is over 2.5. It is: 2.8 ## 90 is over 2.5. It is: 2.8 ## 91 is over 2.5. It is: 2.8 ## 92 is over 2.5. It is: 2.8 ## 93 is over 2.5. It is: 2.8 ## 94 is over 2.5. It is: 2.8 ## 95 is over 2.5. It is: 2.9 ## 96 is over 2.5. It is: 2.9 ## 97 is over 2.5. It is: 2.9 ## 98 is over 2.5. It is: 2.9 ## 99 is over 2.5. It is: 2.9 ## 100 is over 2.5. It is: 3 ## 101 is over 2.5. It is: 3 ## 102 is over 2.5. It is: 3 ## 103 is over 2.5. It is: 3 ## 104 is over 2.5. It is: 3 ## 105 is over 2.5. It is: 3 ## 106 is over 2.5. It is: 3 ## 107 is over 2.5. It is: 3 ## 108 is over 2.5. It is: 3 ## 109 is over 2.5. It is: 3.1 ## 110 is over 2.5. It is: 3.1 ## 111 is over 2.5. It is: 3.1 ## 112 is over 2.5. It is: 3.1 ## 113 is over 2.5. It is: 3.1 ## 114 is over 2.5. It is: 3.1 ## 115 is over 2.5. It is: 3.2 ## 116 is over 2.5. It is: 3.2 ## 117 is over 2.5. It is: 3.2 ## 118 is over 2.5. It is: 3.2 ## 119 is over 2.5. It is: 3.2 ## 120 is over 2.5. It is: 3.2 ## 121 is over 2.5. It is: 3.3 ## 122 is over 2.5. It is: 3.3 ## 123 is over 2.5. It is: 3.3 ## 124 is over 2.5. It is: 3.3 ## 125 is over 2.5. It is: 3.3 ## 126 is over 2.5. It is: 3.4 ## 127 is over 2.5. It is: 3.4 ## 128 is over 2.5. It is: 3.4 ## 129 is over 2.5. It is: 3.4 ## 130 is over 2.5. It is: 3.4 ## 131 is over 2.5. It is: 3.5 ## 132 is over 2.5. It is: 3.5 ## 133 is over 2.5. It is: 3.5 ## 134 is over 2.5. It is: 3.5 ## 135 is over 2.5. It is: 3.5 ## 136 is over 2.5. It is: 3.6 ## 137 is over 2.5. It is: 3.6 ## 138 is over 2.5. It is: 3.6 ## 139 is over 2.5. It is: 3.6 ## 140 is over 2.5. It is: 3.7 ## 141 is over 2.5. It is: 3.8 ## 142 is over 2.5. It is: 3.8 ## 143 is over 2.5. It is: 3.9 ## 144 is over 2.5. It is: 3.9
for (i in 1:ncol(cats)) { print(class(cats[, i])) }
## [1] "factor" ## [1] "numeric" ## [1] "numeric"
empty.cats <- matrix(NA, nrow = nrow(cats), ncol = ncol(cats)) for (i in 1:ncol(cats)) { empty.cats[, i] <- rep(i, nrow(cats)) } head(empty.cats)
## [,1] [,2] [,3] ## [1,] 1 2 3 ## [2,] 1 2 3 ## [3,] 1 2 3 ## [4,] 1 2 3 ## [5,] 1 2 3 ## [6,] 1 2 3
for (i in 1:3) { for (j in 1:3) { print(paste(i, "x", j, "=", i*j)) } }
## [1] "1 x 1 = 1" ## [1] "1 x 2 = 2" ## [1] "1 x 3 = 3" ## [1] "2 x 1 = 2" ## [1] "2 x 2 = 4" ## [1] "2 x 3 = 6" ## [1] "3 x 1 = 3" ## [1] "3 x 2 = 6" ## [1] "3 x 3 = 9"
my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = TRUE) for (list.item in my.list) { cat("One element is", list.item, "\n") }
## One element is 4 2 1 2 3 ## One element is Hello. ## One element is TRUE
But if we want to change the item, we have to be able to access it, so we are better off using 1:length(list)
, or 1:ncol(data.frame)
/1:nrow(data.frame)
.
my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = c(TRUE, TRUE, FALSE)) for (iter in 1:length(my.list)) { my.list[[iter]] <- rev(my.list[[iter]]) } my.list
## $A ## [1] 3 2 1 2 4 ## ## $B ## [1] "Hello." ## ## $C ## [1] FALSE TRUE TRUE
apply()
familyapply()
The apply
family is a group of very useful functions that allow you to easily execute a function of your choice over a list of objects, such as a list
, a data.frame
, or matrix
.
We will look at three examples:
apply
sapply
lapply
apply()
apply
is used for matrices (and sometimes dataframes). It can take a function that takes a vector as input, and apply it to each row or column.
apply()
MARGIN is 1 for rows, 2 for columns.
apply(cats[, -1], MARGIN = 2, mean)
## Bwt Hwt ## 2.723611 10.630556
But we’ve seen this done easier:
colMeans(cats[, -1])
## Bwt Hwt ## 2.723611 10.630556
apply()
However, the power of apply()
is that it can use any function we throw at it.
apply()
rand.mat <- matrix(rnorm(21), nrow = 3, ncol = 7) rand.mat
## [,1] [,2] [,3] [,4] [,5] [,6] ## [1,] 1.02067666 1.591758 -0.4878098 1.073193 -0.9022891 -1.122945 ## [2,] 0.08410246 -2.076734 -0.2600368 -2.089625 -0.5825552 -1.998969 ## [3,] -0.57307884 -1.229113 -0.6818855 1.164227 -0.9405451 -2.729420 ## [,7] ## [1,] 0.9160751 ## [2,] 0.5380085 ## [3,] 0.7395754
apply(rand.mat, MARGIN = 1, FUN = max)
## [1] 1.5917584 0.5380085 1.1642267
apply(rand.mat, MARGIN = 2, FUN = max)
## [1] 1.0206767 1.5917584 -0.2600368 1.1642267 -0.5825552 -1.1229453 ## [7] 0.9160751
apply()
rand.mat
## [,1] [,2] [,3] [,4] [,5] [,6] ## [1,] 1.02067666 1.591758 -0.4878098 1.073193 -0.9022891 -1.122945 ## [2,] 0.08410246 -2.076734 -0.2600368 -2.089625 -0.5825552 -1.998969 ## [3,] -0.57307884 -1.229113 -0.6818855 1.164227 -0.9405451 -2.729420 ## [,7] ## [1,] 0.9160751 ## [2,] 0.5380085 ## [3,] 0.7395754
apply(rand.mat, MARGIN = 1, FUN = sum)
## [1] 2.088659 -6.385809 -4.250240
apply(rand.mat, MARGIN = 2, FUN = var)
## [1] 0.64151918 3.68893622 0.04458370 3.43320919 0.03864169 0.64695626 ## [7] 0.03578596
sapply()
sapply()
is used on list
-objects.
my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = TRUE) sapply(my.list, class)
## A B C ## "numeric" "character" "logical"
sapply()
my.list <- list(A = c(4, 2, 1:3), B = "Hello.", C = TRUE) sapply(my.list, range)
## A B C ## [1,] "1" "Hello." "1" ## [2,] "4" "Hello." "1"
It returns a vector or a matrix, depending on the output of the function.
Why is each element a string?
sapply()
Any data.frame
is also a list
, where each column is one list
-element.
class(cats)
## [1] "data.frame"
is.list(cats)
## [1] TRUE
sapply()
print.default(cats[1:10, ])
## $Sex ## [1] F F F F F F F F F F ## Levels: F M ## ## $Bwt ## [1] 2.0 2.0 2.0 2.1 2.1 2.1 2.1 2.1 2.1 2.1 ## ## $Hwt ## [1] 7.0 7.4 9.5 7.2 7.3 7.6 8.1 8.2 8.3 8.5 ## ## attr(,"class") ## [1] "data.frame"
sapply()
This means we can use sapply
on data frames as well, which is often useful.
sapply(cats, class)
## Sex Bwt Hwt ## "factor" "numeric" "numeric"
lapply()
lapply()
is exactly the same as sapply()
, but it returns a list instead of a vector.
lapply(cats, class)
## $Sex ## [1] "factor" ## ## $Bwt ## [1] "numeric" ## ## $Hwt ## [1] "numeric"
Functions are reusable pieces of code that take an input, do some computation on the input, and return output.
We have been using a lot of functions: code of the form something()
is usually a function.
mean(1:6)
## [1] 3.5
We can make our own functions as follows:
squared <- function (x) { x.square <- x * x return(x.square) } squared(4)
## [1] 16
x
, the input, is called the (formal) argument of the function. x.square
is called the return value.
If there is no return()
, the last line is automatically returned, so we can also just write:
squared <- function (x) { x * x } squared(-2)
## [1] 4
We can also combine this with apply()
rand.mat
## [,1] [,2] [,3] [,4] [,5] [,6] ## [1,] 1.02067666 1.591758 -0.4878098 1.073193 -0.9022891 -1.122945 ## [2,] 0.08410246 -2.076734 -0.2600368 -2.089625 -0.5825552 -1.998969 ## [3,] -0.57307884 -1.229113 -0.6818855 1.164227 -0.9405451 -2.729420 ## [,7] ## [1,] 0.9160751 ## [2,] 0.5380085 ## [3,] 0.7395754
apply(rand.mat, 2, squared)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] ## [1,] 1.041780838 2.533695 0.23795839 1.151743 0.8141257 1.261006 0.8391936 ## [2,] 0.007073224 4.312826 0.06761916 4.366533 0.3393705 3.995876 0.2894531 ## [3,] 0.328419359 1.510718 0.46496786 1.355424 0.8846251 7.449734 0.5469717
areDivisible <- function (x, y) { if (x > y) { larger <- x smaller <- y } else { larger <- y smaller <- x } remainder <- larger %% smaller remainder == 0 }
areDivisible(3, 10)
## [1] FALSE
areDivisible(3, 9)
## [1] TRUE
areDivisible(9, 3)
## [1] TRUE
Default options for some arguments are provided in many functions.
They allow us to provide an additional option, but if no choice is provided, we can choose for the user of the function.
areDivisible <- function (x, y, printInput = TRUE) { if (printInput) { cat("Testing if", x, "and", y, "are divisible: \n") } ((x %% y) == 0) | ((y %% x) == 0) }
areDivisible(3, 10)
## Testing if 3 and 10 are divisible:
## [1] FALSE
areDivisible(3, 9, printInput = TRUE)
## Testing if 3 and 9 are divisible:
## [1] TRUE
areDivisible(9, 3, printInput = FALSE)
## [1] TRUE
Your first self-written for-loop, or function, will probably not work.
Don’t panic! Just go line-by-line, keeping track of what is currently inside each variable.
rand.vec <- rnorm(10) # Add ten to each element. for (i in 1:rand.vec) { i <- i + 10 }
## Warning in 1:rand.vec: numerical expression has 10 elements: only the first ## used
rand.vec
## [1] 0.79339773 -0.75623507 -1.80975295 1.12497773 0.05506155 ## [6] -0.96945053 -1.86147685 -0.10418630 1.42688703 -0.97705275
# Add ten to each element. for (i in 1:rand.vec) { i <- i + 10 }
1:rand.vec
## Warning in 1:rand.vec: numerical expression has 10 elements: only the first ## used
## [1] 1
1:length(rand.vec)
## [1] 1 2 3 4 5 6 7 8 9 10
Let’s try again:
# Add ten to each element. for (i in 1:length(rand.vec)) { i <- i + 10 }
rand.vec
## [1] 0.79339773 -0.75623507 -1.80975295 1.12497773 0.05506155 ## [6] -0.96945053 -1.86147685 -0.10418630 1.42688703 -0.97705275
Still, no luck.
# Add ten to each element. for (i in 1:length(rand.vec)) { i <- i + 10 }
i <- 1 i
## [1] 1
# Add ten to each element. for (i in 1:length(rand.vec)) { i <- i + 10 }
i <- i + 10 i
## [1] 11
i <- 2 i
## [1] 2
# Add ten to each element. for (i in 1:length(rand.vec)) { rand.vec[i] <- rand.vec[i] + 10 } rand.vec
## [1] 10.793398 9.243765 8.190247 11.124978 10.055062 9.030549 8.138523 ## [8] 9.895814 11.426887 9.022947