We use the following packages:
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Again, just like last time it is wise to start with fixing the random seed.
set.seed(123)
x <- rnorm(10, mean = 0, sd = 2)
t.test(x)
##
## One Sample t-test
##
## data: x
## t = 0.24742, df = 9, p-value = 0.8101
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -1.215341 1.513843
## sample estimates:
## mean of x
## 0.1492513
x <- rnorm(10, 1.5, 2)
t.test(x)
##
## One Sample t-test
##
## data: x
## t = 2.9202, df = 9, p-value = 0.01703
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.432058 3.402430
## sample estimates:
## mean of x
## 1.917244
n
numbers from a normal distribution with a user defined mean (i.e. a mean that you can choose when running the function) and standard deviation 1, and returns the p.value
for the test that the mean is 0.p.value.t <- function (n, mu) {
x <- rnorm(n, mu, 1)
t.test(x)$p.value
}
p.value.t(n = 30, mu = 3)
## [1] 2.912617e-17
qqplot
to compare distribution of the \(p\)-values with a uniform \([0,1]\) variable.y <- numeric(50)
for (i in 1:50) {
y[i] <- p.value.t(n = 10, mu = 0)
}
qqplot(x=qunif(ppoints(50)), y)
The p-values follow a uniform distribution.
In a study that examined the use of acupuncture to treat migraine headaches, consenting patients on a waiting list for treatment for migraine were randomly assigned in a 2:1:1 ratio to acupuncture treatment, a “sham” acupuncture treatment in which needles were inserted at non-acupuncture points, and waiting-list patients whose only treatment was self-administered (Linde et al., 2005). The “sham” acupuncture treatment was described to trial participants as an acupuncture treatment that did not follow the principles of Chinese medicine.
Use the following data
data <- matrix(c(74, 71, 43, 38, 11, 65), nrow = 2, ncol = 3)
colnames(data) <- c("Acupuncture", "Sham", "Waiting list")
rownames(data) <- c("> 50% reduction", "< 50% reduction")
data
## Acupuncture Sham Waiting list
## > 50% reduction 74 43 11
## < 50% reduction 71 38 65
We start with calculating the \(X^2\)-test:
X2test <-
data %>%
chisq.test()
X2test
##
## Pearson's Chi-squared test
##
## data: .
## X-squared = 32.486, df = 2, p-value = 8.825e-08
which is extremely significant. We can then calculate the expected cell frequencies
X2test$expected
## Acupuncture Sham Waiting list
## > 50% reduction 61.45695 34.33113 32.21192
## < 50% reduction 83.54305 46.66887 43.78808
and the raw residual
X2test$observed - X2test$expected
## Acupuncture Sham Waiting list
## > 50% reduction 12.54305 8.668874 -21.21192
## < 50% reduction -12.54305 -8.668874 21.21192
as well as the Pearson residual
X2test$residuals
## Acupuncture Sham Waiting list
## > 50% reduction 1.599991 1.479513 -3.737418
## < 50% reduction -1.372296 -1.268963 3.205546
to infer the difference in observed and expected cell frequencies. Patients on the waiting list experience > 50% reduction
much less than we would expect under independence of treatment and outcome.
data <- matrix(c(82, 17, 30, 30, 26, 16), nrow = 3, ncol = 2)
colnames(data) <- c("Acupuncture", "Sham")
rownames(data) <- c("Chinese", "Other", "Don't know")
data
## Acupuncture Sham
## Chinese 82 30
## Other 17 26
## Don't know 30 16
We again start with calculating the \(X^2\)-test:
X2test <-
data %>%
chisq.test()
X2test
##
## Pearson's Chi-squared test
##
## data: .
## X-squared = 15.358, df = 2, p-value = 0.0004624
which is very significant. We can then calculate the expected cell frequencies
X2test$expected
## Acupuncture Sham
## Chinese 71.88060 40.11940
## Other 27.59701 15.40299
## Don't know 29.52239 16.47761
and the raw residual
X2test$observed - X2test$expected
## Acupuncture Sham
## Chinese 10.1194030 -10.1194030
## Other -10.5970149 10.5970149
## Don't know 0.4776119 -0.4776119
as well as the Pearson residual
X2test$residuals
## Acupuncture Sham
## Chinese 1.19357319 -1.5976353
## Other -2.01721641 2.7001078
## Don't know 0.08790214 -0.1176598
We find that people who are receiving true Acupuncture
are more inclined to believe that they receive Chinese
acupuncture than we would expect under independence, while people wo received Sham
acupuncture are more inclined to believe that they receive Other
type of acupuncture. Don't know
is more or less similarly distributed over the observed and expected frequencies.
In the following simulation experiment we investigate least-squares estimation of the mean.
32083
.set.seed(32083)
x <- rnorm(100, mean=3, sd=7)
mean(x)
## [1] 4.165695
mu = 3
summed <- x-mu
sum.sq <- sum(summed^2)
And a fast way
sum.sq2 <- apply(outer(x, mu, "-")^2, 2, sum)
Both solutions are identical
identical(sum.sq, sum.sq2)
## [1] TRUE
lsfun
because we are going to identify the least squares estimate in exercise 8.lsfun <- function(meanest) apply(outer(x, meanest, "-")^2, 2, sum)
or, 100% equivalently, but easier to spot as a function:
lsfun <- function(meanest){
apply(outer(x, meanest, "-")^2, 2, sum)
}
curve(lsfun, from = 4.16, to = 4.17)
lsfun()
. Let the function plot the curve and print the location where the minimum of the sum of the squares is located each time. Fix the seed to set.seed(123)
:Hint: use the sample mean \(\bar{x}\) as the center of your graph and add/subtract e.g. .5
from this value to plot a range.
First, we fix the random seed
set.seed(123)
Code-wise it is efficient to write a function that does the repetitive experimentation. That way we have to write some lines of code only once.
plotfun <- function(x, meanest, plot = TRUE){
xbar <- mean(x)
lsfun <- function(meanest){ apply(outer(xbar, meanest, "-")^2, 2, sum)}
# lsfun <- function(mu, xbar) {
# summed <- x-mu
# sum.sq <- sum(summed^2)
# return(sum.sq)
# }
if (plot) {
curve(lsfun, from = xbar - .5, to = xbar + .5,
ylab=expression(paste(Sigma," ",e^2))) #nicer y-axis label
}
return(cat("The mean is:", xbar, "\n"))
}
rnorm(100, mean=3, sd=sqrt(7)) %>%
plotfun(meanest = 3, plot = TRUE)
## The mean is: 3.239192
rnorm(100, mean=15, sd=sqrt(12)) %>%
plotfun(meanest = 15, plot = TRUE)
## The mean is: 14.62745
rnorm(100, mean=0, sd=2) %>%
plotfun(meanest = 0, plot = TRUE)
## The mean is: 0.2409302
rnorm(100, mean=0, sd=2)^2 %>%
plotfun(meanest = 0, plot = TRUE)
## The mean is: 4.278351
End of practical.