Statistical Programming in R
This morning we learned the basics of programming in R: assignment with <-, R-scripts and RStudio.
a <- c(1, 2, 3, 4, 5)
a
## [1] 1 2 3 4 5
b <- 1:5
b
## [1] 1 2 3 4 5
Characters (or character strings) in R are indicated by the double quote identifier.
a.new <- c(a, "A")
a.new
## [1] "1" "2" "3" "4" "5" "A"
Notice the difference with a from the previous slide:
a
## [1] 1 2 3 4 5
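The difference between a and a.new can also be checked directly instead of read from the quotes in the output. The following is a minimal sketch that rebuilds both objects from the slides and asks R for their type; class() and is.numeric() are base R functions.

```r
# Rebuild the two vectors from the slides
a <- c(1, 2, 3, 4, 5)
a.new <- c(a, "A")

class(a)           # "numeric"
class(a.new)       # "character": adding "A" converted every element
is.numeric(a.new)  # FALSE
```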
rep(a, 15)
##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
## [36] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
## [71] 1 2 3 4 5
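The numbers in square brackets at the start of each output line give the position of the first element on that line. If we want just the third element, we type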
a[3]
## [1] 3
Values arranged in rows and columns are what we refer to as a matrix:
c <- matrix(a, nrow = 5, ncol = 2)
c
##      [,1] [,2]
## [1,]    1    1
## [2,]    2    2
## [3,]    3    3
## [4,]    4    4
## [5,]    5    5
c[1, ]
## [1] 1 1
c[, 2]
## [1] 1 2 3 4 5
c[1, 2]
## [1] 1
In short: square brackets [] are used to call elements, rows and columns (and much more that is beyond the scope of this course).
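A few more bracket calls, as a small sketch. The matrix is named m here (an illustrative name, not from the slides) to keep the example self-contained; it has the same content as matrix c.

```r
a <- c(1, 2, 3, 4, 5)
m <- matrix(a, nrow = 5, ncol = 2)  # same content as matrix c on the slides

a[c(1, 3)]            # first and third element of the vector
a[-1]                 # everything except the first element
m[2:3, ]              # rows 2 and 3, all columns
m[1, , drop = FALSE]  # row 1, kept as a 1 x 2 matrix instead of a vector
```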
If we add a character column to matrix c, everything becomes a character:
cbind(c, letters[1:5])
##      [,1] [,2] [,3]
## [1,] "1"  "1"  "a"
## [2,] "2"  "2"  "b"
## [3,] "3"  "3"  "c"
## [4,] "4"  "4"  "d"
## [5,] "5"  "5"  "e"
Alternatively,
cbind(c, c("a", "b", "c", "d", "e"))
##      [,1] [,2] [,3]
## [1,] "1"  "1"  "a"
## [2,] "2"  "2"  "b"
## [3,] "3"  "3"  "c"
## [4,] "4"  "4"  "d"
## [5,] "5"  "5"  "e"
Remember, matrices and vectors are numerical OR character objects. They can never contain both and still be used for numerical calculations.
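To see what this means in practice, the sketch below tries a calculation on a character matrix. The names m and m.chr are illustrative; try() is used only so that the expected error does not stop the script.

```r
m <- matrix(1:5, nrow = 5, ncol = 2)
m.chr <- cbind(m, letters[1:5])   # all three columns are now character

is.numeric(m)        # TRUE
is.character(m.chr)  # TRUE

# Arithmetic on a character column fails
# ("non-numeric argument to binary operator")
res <- try(m.chr[, 1] * 2, silent = TRUE)
inherits(res, "try-error")  # TRUE

# Converting the column back to numeric makes calculation possible again
as.numeric(m.chr[, 1]) * 2
```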
d <- data.frame("V1" = rnorm(5), "V2" = rnorm(5, mean = 5, sd = 2), "V3" = letters[1:5])
d
##            V1       V2 V3
## 1 -0.56047565 8.430130  a
## 2 -0.23017749 5.921832  b
## 3  1.55870831 2.469878  c
## 4  0.07050839 3.626294  d
## 5  0.12928774 4.108676  e
We ‘filled’ a data frame with two randomly generated sets from the normal distribution, where \(V1\) is standard normal and \(V2\) has mean 5 and standard deviation 2, that is \(V2 \sim N(5, 2^2)\), and a character set.
Data frames can contain both numerical and character elements at the same time, although never in the same column.
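The type of each column can be inspected as sketched below. Two assumptions: the slides do not show a seed, so set.seed(123) is added here only to make the draws reproducible, and the class reported for V3 depends on the R version (a factor before R 4.0.0, a character vector from 4.0.0 onwards, unless stringsAsFactors is set explicitly).

```r
set.seed(123)  # assumed seed, not shown on the slides
d <- data.frame(V1 = rnorm(5),
                V2 = rnorm(5, mean = 5, sd = 2),
                V3 = letters[1:5])

sapply(d, class)  # one class per column
str(d)            # compact overview of the whole data frame

# Calculations still work on the numeric columns
colMeans(d[, c("V1", "V2")])
```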
You can name the columns and rows in data frames (just like in matrices)
row.names(d) <- c("row 1", "row 2", "row 3", "row 4", "row 5")
d
##                V1       V2 V3
## row 1 -0.56047565 8.430130  a
## row 2 -0.23017749 5.921832  b
## row 3  1.55870831 2.469878  c
## row 4  0.07050839 3.626294  d
## row 5  0.12928774 4.108676  e
There are two ways to obtain row 3 from data frame d:
d["row 3", ]
##             V1       V2 V3
## row 3 1.558708 2.469878  c
and
d[3, ]
##             V1       V2 V3
## row 3 1.558708 2.469878  c
The intersection between row 2 and column 3 can be obtained by
d[2, 3]
## [1] b
## Levels: a b c d e
Both
d[, "V2"] # and
## [1] 8.430130 5.921832 2.469878 3.626294 4.108676
d[, 2]
## [1] 8.430130 5.921832 2.469878 3.626294 4.108676
yield the second column. But we can also use $ to call variable names in data frame objects
d$V2
## [1] 8.430130 5.921832 2.469878 3.626294 4.108676
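That these calls return the same thing can be verified with identical(). This is a small sketch on a hand-typed data frame (rounded values, so it does not depend on random draws); d[["V2"]] and d["V2"] are two further base R forms not shown on the slides.

```r
d <- data.frame(V1 = 1:5,
                V2 = c(8.4, 5.9, 2.5, 3.6, 4.1),
                V3 = letters[1:5])

identical(d[, "V2"], d$V2)    # TRUE
identical(d[, 2], d[["V2"]])  # TRUE

# A single bracket without a comma keeps a one-column data frame
d["V2"]
```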
If you wish to use numerical objects that have more than two dimensions, an array would be a suitable object. The following code yields a 3-dimensional array (2 rows, 4 columns and 3 matrices):
e <- array(1:24, dim = c(2, 4, 3))
e
## , , 1
##
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
##
## , , 2
##
##      [,1] [,2] [,3] [,4]
## [1,]    9   11   13   15
## [2,]   10   12   14   16
##
## , , 3
##
##      [,1] [,2] [,3] [,4]
## [1,]   17   19   21   23
## [2,]   18   20   22   24
The square bracket identification works similarly to the identification of matrices and data frames, but with the added dimension(s). For example,
e[1, 3, 2]
## [1] 13
yields the element in the first row of the third column in the second matrix. This is exactly the downside to an array: it is a series of matrices.
In other words, characters and numerical elements may not be mixed.
If we replace the third matrix in the array by a character version of that matrix, we obtain
e[, , 3] <- as.character(e[, , 3])
e
## , , 1
##
##      [,1] [,2] [,3] [,4]
## [1,] "1"  "3"  "5"  "7"
## [2,] "2"  "4"  "6"  "8"
##
## , , 2
##
##      [,1] [,2] [,3] [,4]
## [1,] "9"  "11" "13" "15"
## [2,] "10" "12" "14" "16"
##
## , , 3
##
##      [,1] [,2] [,3] [,4]
## [1,] "17" "19" "21" "23"
## [2,] "18" "20" "22" "24"
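The consequence for calculations can be sketched as follows: before the replacement the array is numeric and sums work; afterwards the whole array is character, not only the third matrix. The object total.before is an illustrative name.

```r
e <- array(1:24, dim = c(2, 4, 3))

is.numeric(e)                   # TRUE
total.before <- sum(e[, , 3])   # 164, the sum of 17 to 24

e[, , 3] <- as.character(e[, , 3])

is.character(e)   # TRUE: all three matrices were converted
dim(e)            # dimensions are unchanged: 2 4 3
e[1, 3, 2]        # "13", now a character string
# sum(e[, , 1]) would now stop with an error
```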
Lists are just what their name says they are: lists. You can have a list of everything mixed with everything. For example, a simple list can be created by
f <- list(a)
f
## [[1]]
## [1] 1 2 3 4 5
Elements or objects within lists can be called by using double square brackets [[]]. For example, the first (and only) element in list f is object a
f[[1]]
## [1] 1 2 3 4 5
We can simply add an object or element to an existing list
f[[2]] <- d
f
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
##                V1       V2 V3
## row 1 -0.56047565 8.430130  a
## row 2 -0.23017749 5.921832  b
## row 3  1.55870831 2.469878  c
## row 4  0.07050839 3.626294  d
## row 5  0.12928774 4.108676  e
to obtain a list with a vector and a data frame.
We can add names to the list as follows
names(f) <- c("vector", "data frame")
f
## $vector
## [1] 1 2 3 4 5
##
## $`data frame`
##                V1       V2 V3
## row 1 -0.56047565 8.430130  a
## row 2 -0.23017749 5.921832  b
## row 3  1.55870831 2.469878  c
## row 4  0.07050839 3.626294  d
## row 5  0.12928774 4.108676  e
Calling the vector (a) from the list can be done as follows
f[[1]]
## [1] 1 2 3 4 5
f[["vector"]]
## [1] 1 2 3 4 5
f$vector
## [1] 1 2 3 4 5
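Single square brackets also work on lists, but they return a (smaller) list rather than the object inside it. A minimal sketch of the difference, using a small illustrative list (the element named chars is made up for this example):

```r
f <- list(vector = c(1, 2, 3, 4, 5), chars = c("a", "b"))

class(f[1])    # "list": still a list, with one element
class(f[[1]])  # "numeric": the vector itself

# Brackets can be chained: third element of the vector inside the list
f[[1]][3]
f$vector[3]
```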
Take the following example
g <- list(f, f)
To call the vector from the second list within the list g, use the following code
g[[2]][[1]]
## [1] 1 2 3 4 5
g[[2]]$vector
## [1] 1 2 3 4 5
Logical operators are signs that evaluate a statement, such as ==, <, >, <=, >=, and | (OR) as well as & (AND). Typing ! before a logical operator takes the complement of that action. There are more operations, but these are the most useful.
For example, if we would like elements out of matrix c that are larger than 3, we would type:
c[c > 3]
## [1] 4 5 4 5
c > 3
##       [,1]  [,2]
## [1,] FALSE FALSE
## [2,] FALSE FALSE
## [3,] FALSE FALSE
## [4,]  TRUE  TRUE
## [5,]  TRUE  TRUE
The number of TRUE values may differ between columns, so the selected elements cannot always be arranged as a matrix. A vector as a return is therefore more appropriate.
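In matrix c both columns happen to contain the same number of TRUE values. The sketch below uses an illustrative matrix m in which they differ; which() with arr.ind = TRUE is a base R way to recover the row and column of each selected element.

```r
m <- matrix(c(1, 2, 3, 4, 5,
              5, 5, 5, 1, 1), nrow = 5, ncol = 2)

colSums(m > 3)   # TRUE values per column: 2 in the first, 3 in the second
m[m > 3]         # selected elements, column by column, as one vector

# Row and column position of every element larger than 3
which(m > 3, arr.ind = TRUE)
```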
c[c < 3 | c > 3] #c smaller than 3 or larger than 3
## [1] 1 2 4 5 1 2 4 5
or
c[c != 3] #c not equal to 3
## [1] 1 2 4 5 1 2 4 5
c != 3 returns a logical matrix:
##       [,1]  [,2]
## [1,]  TRUE  TRUE
## [2,]  TRUE  TRUE
## [3,] FALSE FALSE
## [4,]  TRUE  TRUE
## [5,]  TRUE  TRUE
For reference, matrix c itself:
##      [,1] [,2]
## [1,]    1    1
## [2,]    2    2
## [3,]    3    3
## [4,]    4    4
## [5,]    5    5
0 / 0
## [1] NaN
mean(c(1, 2, NA, 4, 5))
## [1] NA
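NaN (not a number) and NA (not available) are distinct values, and both propagate through calculations as shown above. A brief sketch of how to detect them with the base R functions is.na() and is.nan(), using an illustrative vector x:

```r
x <- c(1, 2, NA, 4, 5)

is.na(x)        # TRUE only at the third position
sum(is.na(x))   # number of missing values: 1

is.nan(0 / 0)   # TRUE
is.na(0 / 0)    # TRUE: NaN also counts as missing
is.nan(NA)      # FALSE: NA is not NaN
```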
There are two easy ways to perform “listwise deletion”:
mean(c(1, 2, NA, 4, 5), na.rm = TRUE)
## [1] 3
mean(na.omit(c(1, 2, NA, 4, 5)))
## [1] 3
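For a single vector the two approaches give the same result. With a data frame they differ: na.omit() drops every row that has a missing value in any column, while na.rm = TRUE works per calculation. A sketch on a made-up data frame dat:

```r
dat <- data.frame(x = c(1, 2, NA, 4, 5),
                  y = c(10, NA, 30, 40, 50))

complete.cases(dat)   # TRUE for rows without any missing value
na.omit(dat)          # listwise deletion: rows 2 and 3 are dropped
nrow(na.omit(dat))    # 3

# Per-column means use all available values in each column
colMeans(dat, na.rm = TRUE)
# After listwise deletion only rows 1, 4 and 5 contribute
colMeans(na.omit(dat))
```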
(round(1740 / 600, 0) - 1740 / 600)
## [1] 0.1
(round(1740 / 600, 0) - 1740 / 600) <= 0.1
## [1] FALSE
(round(1740 / 600, 0) - 1740 / 600) <= 0.11
## [1] TRUE
(3 - 2.9)
## [1] 0.1
(3 - 2.9) <= 0.1
## [1] FALSE
(3 - 2.9) - .1
## [1] 8.326673e-17
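The results above come from floating-point representation: 2.9 and 0.1 cannot be stored exactly, so the difference is a tiny bit larger than 0.1. One common way around this, sketched here, is to compare with a tolerance; all.equal() is the base R function for that, and the name diff.val is illustrative.

```r
diff.val <- 3 - 2.9

diff.val == 0.1                    # FALSE: exact comparison fails
print(diff.val, digits = 17)       # shows the extra digits

# Compare with a small tolerance instead
isTRUE(all.equal(diff.val, 0.1))   # TRUE
abs(diff.val - 0.1) < 1e-8         # TRUE
```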
Use comments (#) to clarify what you are doing
R-scripts
RStudio projects
Aim to make the exercises without looking at the answers.
If this does not work out, switch to the answer-based practical.
In any case, ask for help when you feel help is needed.