We use the following packages in this Practical:

library(MASS)
library(plyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following object is masked from 'package:MASS':
## 
##     select
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(magrittr)
library(ggplot2)

If you need package plyr and package dplyr, always load package plyr first! If you load dplyr before plyr, plyr will produce a warning.


Exercises


In this exercise we will again be using random number generators. When using random numbers, it is wise to always fix the random seed (the starting point of the number generation). This ensures that in the future we can exactly reproduce the chain of executions that led to the randomly generated results.

Start by setting a random seed. If you follow my random seed and reproduce the code below in exactly the same order, you will get the same results. If you do not follow the exact ordering (i.e. if you skip or rerun a question, or have different code), your results may be different due to random sampling. This is not a bad thing! It is just a result of the method and should be that way.

#set random seed, make things reproducible
set.seed(123)

  1. A group of experimenters have 4 experimental conditions they want to run. In each of these four conditions, there are seven manipulations that should be in a random order. Design the experiment such that for every condition, the seven manipulations are randomly ordered.
# Make an empty matrix. 
design <- matrix(NA, 7, 4)

# Name the columns and rows. 
colnames(design) <- paste("condition", 1:4, sep=" ")
rownames(design) <- paste("manipulation",  1:7, sep=" ")

# Put a random ordering in each column.
for(j in 1:ncol(design)) {
    design[, j] <- sample(1:7, replace=FALSE)
}

design
##                condition 1 condition 2 condition 3 condition 4
## manipulation 1           3           7           1           5
## manipulation 2           5           4           6           4
## manipulation 3           7           3           2           7
## manipulation 4           4           6           7           3
## manipulation 5           6           2           4           6
## manipulation 6           1           5           5           2
## manipulation 7           2           1           3           1

  1. Generate a vector of 100 random standard normal numbers.
# Get random numbers.
y <- rnorm(100)

  1. Compute the mean and standard deviation of the vector from question 2.
mean(y)
## [1] 0.005996774
sd(y)
## [1] 0.9135814

  1. Generate a vector of 100 random standard normal numbers (like in Exercise 2) 25 times, and each time store means in object av. Compute the standard deviation of av.
# Repeat the sampling of numbers 25 times, each time getting the mean.
av <- numeric(25)
for(i in 1:25) {
    av[i] <- mean(rnorm(100))
}

# Standard error of sample mean.
sd(av) 
## [1] 0.07753845

or alternatively with function rlply() from package plyr:

samples <- rlply(.n = 25, rnorm(100, mean = 0, sd = 1))

Function plyr::rlply() evaluates an expression \(n\) times and combines the results in a list. So the call rnorm(5000, mean = 0, sd = 1) is evaluated .n = 25 times.

We can then call the following sapply() statement to create object av:

av <- sapply(samples, mean)
sd(av) # Standard error of sample mean.
## [1] 0.09250157

Function sapply() evalutes the expression mean() over each listed element in samples. The result is a vector of length 100 with the means of the 100 samples from the samples object.


  1. Create a function that automatically returns a vector like av.
# Create a function for sd of sample means
mean.av <- function(n = 100, reps = 25) {
    
  # Make an empty vector. 
  av <- numeric(reps)
  
  for(i in 1:reps) {
      
    # Make a random standard normal dataset of size n.
    y     <- rnorm(n)
    
    # Compute and save the mean. 
    av[i] <- mean(y)
  }
  
  # Return the vector of means. 
  av
}

sd(mean.av())
## [1] 0.1089382

or, with rlply():

# Create a function for sd of sample means
mean.av <- function(n = 100, reps = 25) {
  rlply(.n = reps, rnorm(n, mean = 0, sd = 1)) %>%
    sapply(mean)
}

sd(mean.av())
## [1] 0.08850599

  1. Add the option to this function to print a density plot. Set it to TRUE by default.
# Create a function for sd of sample means
mean.av <- function(n = 100, reps = 25, plotDens = TRUE) {
    
  # Make an empty vector. 
  av <- numeric(reps)
  
  for(i in 1:reps) {
      
    # Make a random standard normal dataset of size n.
    y     <- rnorm(n)
    
    # Compute and save the mean. 
    av[i] <- mean(y)
  }
  
  if (plotDens) {
      plot(density(av), main = "Sampling distribution of the mean.")
  }
  
  # Return the vector of means. 
  av
}

The above function does two things:

  1. It returns a vector of 25 means by default (the default argument for reps is reps=25)
  2. It plots a densityplot of those 25 means, if wanted. By default it does this because the argument for plotting is plotDens=TRUE.

Let’s go through the function.

or, with rlply() and a pipe:

# Create a function for sd of sample means
mean.av <- function(n = 100, reps = 25, plotDens = TRUE) {
  av <- 
    rlply(.n = reps, rnorm(n, mean = 0, sd = 1)) %>%
    sapply(mean) 
  if (plotDens) {
    density(av) %>%
      plot(main = "Sampling distribution of the mean.")
  }
  return(av)
}

sd(mean.av())

## [1] 0.09456325

In the next codeblock, we first set the graphical parameters to display a 2 by 2 matrix of plots. Then we run the function mean.av() 4 times; each time generating a plot. After running the function four times, we will have filled the 2 by 2 plot raster. We do not have to specify the arguments of this function because all arguments are set by default. We end by resetting the graphical parameters to its original state by stating par(mfrow = c(1, 1)), such that when we plot a new graph, it will be displayed as a single graph and not in a raster of 4 plots.

par(mfrow = c(2, 2))
mean.av()
##  [1]  0.068470639 -0.077818879 -0.091692048 -0.130617879  0.251717283
##  [6]  0.054858574 -0.036049450  0.104475079 -0.089264959  0.080493654
## [11] -0.024325038 -0.097794614  0.141381646  0.009766486  0.153739037
## [16] -0.105587448 -0.001736498  0.097002198  0.045997872 -0.012079486
## [21] -0.101956204 -0.061177495  0.063383078  0.065626215 -0.069429600
mean.av()
##  [1]  0.1953824109  0.1348506175 -0.0094928504  0.0835959877 -0.0173680343
##  [6] -0.1508615650 -0.0900577592  0.0001168825 -0.0216254082 -0.0420224319
## [11]  0.0207623279 -0.1178076124 -0.1084565496 -0.0379984450  0.0545270409
## [16] -0.0328482522 -0.1625490248 -0.0327697543 -0.0922515961 -0.0779403956
## [21]  0.0294707701  0.0130250264 -0.1876611681  0.0288191266  0.0300587766
mean.av()
##  [1]  0.018066207  0.071066840 -0.115361401  0.061331876 -0.025427969
##  [6] -0.012207763 -0.067216129  0.009949625 -0.093228489 -0.030386433
## [11] -0.092443682 -0.079677253  0.173949802 -0.065719830  0.140577349
## [16] -0.049671329 -0.122333707 -0.048353074  0.052410305 -0.117598169
## [21] -0.095278072  0.008703199 -0.071540490 -0.166771781  0.029540325
mean.av()

##  [1] -0.189430327 -0.066863184  0.226732126  0.154103786 -0.265004735
##  [6]  0.009260854  0.149866929 -0.169644176 -0.047387711  0.105561401
## [11] -0.064871635 -0.187243442  0.192679309  0.037218065  0.093439232
## [16]  0.009343280  0.051302357 -0.068104398 -0.176032151 -0.042986921
## [21] -0.012468267 -0.151543761  0.137205895 -0.031933394  0.096857442

and we return the graphing parameter to its previous state of a single plot (i.e. 1 row and 1 column).

par(mfrow = c(1, 1))

  1. Generate a random sample of size 20 from a normal population with mean 100 and standard deviation 10.
rnorm(20, 100, 10)
##  [1] 118.08541 104.56931 119.38520  88.74277  87.08169 123.24315  95.05002
##  [8]  97.70985 102.79113 100.98217 102.30570 121.45238  94.62936  91.89949
## [15]  92.73861 111.96596 120.77000 104.17306  86.41646  90.04677

  1. Use mfrow to set up the layout for a 3 by 4 array of plots. In the top 4 panels, show normal probability plots (‘QQ-plots’) for 4 separate “random” samples of size 10, all drawn from a normal distribution. In the middle 4 panels, display plots for samples of size 100. In the bottom 4 panels, display plots for samples of size 1000. Comment on how the appearance of the plots changes as the sample size changes.
par(mfrow=c(3, 4))

for (i in 1:3) {
  for (j in 1:4) {
    qqnorm(rnorm(10^i))
  }
}


  1. Repeat exercise 8, but use runif instead of rnorm.
par(mfrow=c(3, 4))

for (i in 1:3) {
  for (j in 1:4) {
    qqnorm(runif(10^i))
  }
}

par(mfrow=c(1, 1))

  1. Use the function rexp() to simulate 100 exponential random numbers with rate 0.2. Do the following on the simulated random numbers
# Simulate numbers 
data <- rexp(100, .2) 

# Plot the data 
data %>%
  density(from = 0) %>%
  plot(main="Exponential with rate = 0.2")

# Comparison.
c("sample.mean" = mean(data), "pop.mean" = 1 / .2)
## sample.mean    pop.mean 
##    5.239311    5.000000

  1. Fit the following linear models on the anscombe data:
fit1 <- anscombe %$%
  lm(y1 ~ x1)
fit2 <- anscombe %$%
  lm(y2 ~ x2)
fit3 <- anscombe %$%
  lm(y3 ~ x3)
fit4 <- anscombe %$%
  lm(y4 ~ x4)

  1. `Create a data frame from the coefficients of the 4 fitted objects from Exercise 11
out <- data.frame(fit1 = coef(fit1),
                  fit2 = coef(fit2),
                  fit3 = coef(fit3),
                  fit4 = coef(fit4))
row.names(out) <- names(coef(fit1))
out
##                  fit1     fit2      fit3      fit4
## (Intercept) 3.0000909 3.000909 3.0024545 3.0017273
## x1          0.5000909 0.500000 0.4997273 0.4999091

These estimates are very similar.


  1. Plot the four fitted models from Exercise 11 in a single plotting window. Make the points in the plots blue, gray, orange and purple, respectively.
plot(y1 ~ x1, col = "blue", data = anscombe)
points(y2 ~ x2, col = "gray", data = anscombe)
points(y3 ~ x3, col = "orange", data = anscombe)
points(y4 ~ x4, col = "purple", data = anscombe)


  1. Now plot all four fitted models from Exercise 11 in a plotting window with 2 rows and 2 columns.
par(mfrow = c(2, 2))
plot(y1 ~ x1, data = anscombe)
plot(y2 ~ x2, data = anscombe)
plot(y3 ~ x3, data = anscombe)
plot(y4 ~ x4, data = anscombe)


End of practical.