Practical F

Exercises

The following packages are required for this practical:

library(dplyr)
library(magrittr)
library(mice)

If you would like to reproduce the results shown here, you can fix the random seed:

set.seed(123)

Use a pipe to do the following:

draw 1000 values from a normal distribution with mean = 5 and sd = 1, i.e. $N(5, 1)$,
create a matrix where the first 500 values are the first column and the second 500 values are the second column,
make a scatterplot of these two columns.

rnorm(1000, 5) %>%
  matrix(ncol = 2) %>%
  plot()
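
As a rough check, and assuming the magrittr package loaded above, the piped call can be compared with its nested equivalent. The object names piped, nested and draws are introduced here purely for illustration, and the sketch leaves out the plot so that it runs without a graphics device.

```r
library(magrittr)

# Piped version: draw the values, then reshape them into two columns.
set.seed(123)
piped <- rnorm(1000, 5) %>%
  matrix(ncol = 2)

# Nested version of the same call, using the same seed.
set.seed(123)
nested <- matrix(rnorm(1000, 5), ncol = 2)

# Raw draws, again with the same seed, to check how the matrix is filled.
set.seed(123)
draws <- rnorm(1000, 5)

identical(piped, nested)             # the pipe only changes the notation
dim(piped)                           # 500 rows, 2 columns
identical(piped[, 1], draws[1:500])  # matrix() fills column by column
```

Because matrix() fills its columns one after the other, the first 500 draws form the first column without any further work.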

Use a pipe to assign values 1:5 to object x and verify that the object exists.

Normally, we can assign values to an object directly by running

x <- 1:5

However, when we would like to do this in a pipe, we run into a problem.

"x" %>% assign(1:5)
x

## Error in eval(expr, envir, enclos): object 'x' not found

The pipe creates a separate, temporary environment in which all %>% operations take place (environments were discussed in Lecture C). This environment is different from the Global Environment and disappears once the pipe is finished. In other words, we do assign 1:5 to object x, but x lives in that temporary environment and is discarded as soon as the pipe finishes.

Function assign() belongs to a class of functions that operate on the current environment (the one they are called from). For such functions, we need to be explicit about the environment we would like the function to use:

env <- environment()
"x" %>% assign(1:5, envir = env)
x

## [1] 1 2 3 4 5

We have now explicitly instructed assign() to use the Global Environment, which is the environment that environment() returns when called at the top level:

environment()

## <environment: R_GlobalEnv>
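
The same mechanism can be sketched in base R without a pipe, since a function call also gets its own temporary environment. The function names assign_locally and assign_into, the object name tmp_value and the environment target are hypothetical and serve only as an illustration.

```r
# assign() without envir writes into the environment it is called from,
# here the temporary environment of the function call.
assign_locally <- function() {
  assign("tmp_value", 1:5)
  exists("tmp_value", inherits = FALSE)  # visible while the function runs
}

# With an explicit envir argument the object ends up where we ask for it.
assign_into <- function(env) {
  assign("tmp_value", 1:5, envir = env)
  invisible(NULL)
}

target <- new.env()
found_inside <- assign_locally()
assign_into(target)

found_inside                                           # TRUE
exists("tmp_value", envir = target, inherits = FALSE)  # TRUE
get("tmp_value", envir = target)                       # 1 2 3 4 5
```

The object created by assign_locally() disappears together with the function's environment, whereas the object written into target stays available for as long as target exists.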

We could also create a new environment and assign values to objects in it:

assign.env <- new.env() 
"x" %>% assign(letters[1:5], envir = assign.env)

But then we need to retrieve x from assign.env:

assign.env$x

## [1] "a" "b" "c" "d" "e"

because otherwise we would still get x from R_GlobalEnv:

x

## [1] 1 2 3 4 5

Use a pipe to calculate the correlation matrix on the anscombe data set

anscombe %>%
  cor()

##            x1         x2         x3         x4         y1         y2
## x1  1.0000000  1.0000000  1.0000000 -0.5000000  0.8164205  0.8162365
## x2  1.0000000  1.0000000  1.0000000 -0.5000000  0.8164205  0.8162365
## x3  1.0000000  1.0000000  1.0000000 -0.5000000  0.8164205  0.8162365
## x4 -0.5000000 -0.5000000 -0.5000000  1.0000000 -0.5290927 -0.7184365
## y1  0.8164205  0.8164205  0.8164205 -0.5290927  1.0000000  0.7500054
## y2  0.8162365  0.8162365  0.8162365 -0.7184365  0.7500054  1.0000000
## y3  0.8162867  0.8162867  0.8162867 -0.3446610  0.4687167  0.5879193
## y4 -0.3140467 -0.3140467 -0.3140467  0.8165214 -0.4891162 -0.4780949
##            y3         y4
## x1  0.8162867 -0.3140467
## x2  0.8162867 -0.3140467
## x3  0.8162867 -0.3140467
## x4 -0.3446610  0.8165214
## y1  0.4687167 -0.4891162
## y2  0.5879193 -0.4780949
## y3  1.0000000 -0.1554718
## y4 -0.1554718  1.0000000

Now use a pipe to calculate the correlation for the pair (x4, y4) on the anscombe data set

Using the standard %>% pipe:

anscombe %>%
  subset(select = c(x4, y4)) %>%
  cor()

##           x4        y4
## x4 1.0000000 0.8165214
## y4 0.8165214 1.0000000

Alternatively, we can use the %$% pipe from package magrittr, which lets us refer to the columns directly and skip the subset step.

anscombe %$%
  cor(x4, y4)

## [1] 0.8165214
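
For readers who know base R, %$% behaves much like with(): both evaluate the expression with the columns of the data set available by name. A minimal comparison, assuming magrittr is loaded; the names piped and base_r are illustrative only.

```r
library(magrittr)

# Exposition pipe: the columns of anscombe are available inside cor().
piped <- anscombe %$%
  cor(x4, y4)

# Base R counterpart using with().
base_r <- with(anscombe, cor(x4, y4))

all.equal(piped, base_r)
```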

Use a pipe to calculate the correlation between hgt and wgt in the boys data set from package mice.

Because boys has missing values for almost all variables, we must first select wgt and hgt and then exclude the rows that have missing values before we can calculate the correlation. Using the standard %>% pipe, we can let cor() handle the exclusion through its use argument:

boys %>%
  subset(select = c("wgt", "hgt")) %>%
  cor(use = "pairwise.complete.obs")

##           wgt       hgt
## wgt 1.0000000 0.9428906
## hgt 0.9428906 1.0000000

which, for two variables, is equivalent to

boys %>%
  subset(select = c("wgt", "hgt")) %>%
  na.omit() %>%
  cor()

##           wgt       hgt
## wgt 1.0000000 0.9428906
## hgt 0.9428906 1.0000000
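
The equivalence holds here because only two variables are involved: with a single pair, pairwise deletion and listwise deletion drop the same rows. A small sketch with made-up data (the data frame d and its columns a, b and c are hypothetical) suggests that the two approaches can differ once a third variable with its own missing values is included.

```r
# Made-up data with missing values in different rows of each column.
d <- data.frame(a = c(1, 2, 3, 4, NA, 6),
                b = c(2, 1, 4, NA, 5, 7),
                c = c(NA, 3, 1, 5, 2, 8))

# Two variables: pairwise and listwise deletion use the same rows.
two <- d[, c("a", "b")]
all.equal(cor(two, use = "pairwise.complete.obs"), cor(na.omit(two)))

# Three variables: na.omit() also drops rows where only c is missing,
# so the correlation between a and b is based on fewer rows.
pairwise_ab <- cor(d, use = "pairwise.complete.obs")["a", "b"]
listwise_ab <- cor(na.omit(d))["a", "b"]
c(pairwise = pairwise_ab, listwise = listwise_ab)
```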

Alternatively, we can use the %$% pipe:

boys %$% 
  cor(hgt, wgt, use = "pairwise.complete.obs")

## [1] 0.9428906

The %$% pipe exposes the columns of the boys data set, so that we can refer to them directly by name.

In the boys data set, hgt is recorded in centimeters. Use a pipe to transform hgt in the boys data set to height in meters and verify the transformation.

Using the standard %>% and the %$% pipes:

boys %>%
  transform(hgt = hgt / 100) %$%
  mean(hgt, na.rm = TRUE)

## [1] 1.321518
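
Comparing means is one way to verify the transformation. Another quick check, sketched here on a small made-up data frame (toy and converted are hypothetical names, and the values are invented), is to convert back and compare with the original column.

```r
library(magrittr)

# Invented heights in centimeters, including a missing value.
toy <- data.frame(hgt = c(50.5, 87.2, NA, 132.0, 180.4))

converted <- toy %>%
  transform(hgt = hgt / 100)

# Converting back to centimeters should reproduce the original column.
all.equal(converted$hgt * 100, toy$hgt)

# The mean scales by the same factor of 100.
all.equal(mean(converted$hgt, na.rm = TRUE) * 100,
          mean(toy$hgt, na.rm = TRUE))
```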

Use a pipe to plot the pair (hgt, wgt) two times: once for hgt in meters and once for hgt in centimeters. Make the points in the ‘centimeter’ plot red and in the ‘meter’ plot blue.

This is best done with the %T>% pipe:

boys %>%
  subset(select = c(hgt, wgt)) %T>%
  plot(col = "red", main = "Height in centimeters") %>%
  transform(hgt = hgt / 100) %>%
  plot(col = "blue", main = "Height in meters")

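The %T>% pipe is very useful, because it creates a literal T-junction in the pipe: the left-hand side is passed to the next function for its side effect (here, a plot) and is then passed on unchanged to the rest of the pipe. It is perhaps most informative to graphically represent the above pipe as follows: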

boys %>%
  subset(select = c(hgt, wgt)) %T>%
      plot(col = "red", main = "Height in centimeters") %>%
  transform(hgt = hgt / 100) %>%
  plot(col = "blue", main = "Height in meters")

We can see that there is indeed a literal T-junction. Naturally, we can expand this process with more %T>% pipes. However, once a pipe gets too long or too complicated, it is perhaps more useful to cut the piped problem into smaller, manageable pieces.
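
To see what %T>% adds, it can help to use a side-effect function that returns nothing useful. The sketch below assumes magrittr is loaded and uses str(), which prints a summary and returns NULL; the names values, with_tee and without_tee are illustrative only.

```r
library(magrittr)

values <- c(1, 4, 9)

# With the tee pipe, str() is called for its printed output only and
# the original vector is passed on to sqrt().
with_tee <- values %T>%
  str() %>%
  sqrt()
with_tee  # 1 2 3

# With the standard pipe, sqrt() receives the NULL returned by str(),
# which results in an error (captured here with try()).
without_tee <- try(values %>% str() %>% sqrt(), silent = TRUE)
inherits(without_tee, "try-error")
```

The same reasoning applies to plot() in the exercise above: it is called for the figure it draws, not for a value that the next step could use.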

End of Practical

Gerko Vink

Statistical Programming in R
