mice: Imputing multi-level data

This is the fifth vignette in a series of six. In this vignette we focus on multilevel imputation. You need to have the package pan installed; you can install it by running install.packages("pan").
Open R and load the packages mice, lattice and pan:
require(mice)
require(lattice)
require(pan)
set.seed(123)
We choose seed value 123. This is an arbitrary value; any value would be an equally good seed value. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R instance to 123, you will get the exact same results and plots as we present in this document.
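For example (a quick check, not part of the exercises), resetting the seed reproduces random draws exactly:

```r
# Same seed, same stream of random numbers
set.seed(123)
a <- rnorm(3)
set.seed(123)
b <- rnorm(3)
identical(a, b)  # TRUE
```

If you run this check, re-run set.seed(123) afterwards to stay in sync with the random number stream used in this document.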
We are going to work with the popularity data from Joop Hox (2010). The variables in this data set are described as follows:
pupil    | Pupil number within class
class    | Class number
extrav   | Pupil extraversion
sex      | Pupil gender
texp     | Teacher experience (years)
popular  | Pupil popularity
popteach | Teacher popularity
1. Open the popular.RData workspace. A workspace with complete and incomplete versions of the popularity data can be loaded into the Global Environment by running:
con <- url("https://www.gerkovink.com/mimp/popular.RData")
load(con)
This workspace contains several datasets and functions that, once loaded, are available to you in R. If you’d like to see what is inside, run the following code:
ls()
## [1] "con" "icc"
## [3] "mice.impute.2lonly.mean2" "popMCAR"
## [5] "popMCAR2" "popNCR"
## [7] "popNCR2" "popNCR3"
## [9] "popular"
The dataset popNCR is a variation on the Hox (2010) data, where the missingness in the variables is either missing at random (MAR) or missing not at random (MNAR).
2. Check with the functions head(), dim() (alternatively, one could use nrow() and ncol() instead of dim()) and summary() how large the dataset is, which variables the data frame contains and whether there are missing values in the variables.
head(popNCR)
## pupil class extrav sex texp popular popteach
## 1 1 1 5 1 NA 6.3 NA
## 2 2 1 NA 0 24 4.9 NA
## 3 3 1 4 1 NA 5.3 6
## 4 4 1 3 <NA> NA 4.7 5
## 5 5 1 5 1 24 NA 6
## 6 6 1 NA 0 NA 4.7 5
dim(popNCR)
## [1] 2000 7
nrow(popNCR)
## [1] 2000
ncol(popNCR)
## [1] 7
summary(popNCR)
## pupil class extrav sex texp
## Min. : 1.00 17 : 26 Min. : 1.000 0 :661 Min. : 2.0
## 1st Qu.: 6.00 63 : 25 1st Qu.: 4.000 1 :843 1st Qu.: 7.0
## Median :11.00 10 : 24 Median : 5.000 NA's:496 Median :12.0
## Mean :10.65 15 : 24 Mean : 5.313 Mean :11.8
## 3rd Qu.:16.00 4 : 23 3rd Qu.: 6.000 3rd Qu.:16.0
## Max. :26.00 21 : 23 Max. :10.000 Max. :25.0
## (Other):1855 NA's :516 NA's :976
## popular popteach
## Min. :0.000 Min. : 1.000
## 1st Qu.:3.900 1st Qu.: 4.000
## Median :4.800 Median : 5.000
## Mean :4.829 Mean : 4.834
## 3rd Qu.:5.800 3rd Qu.: 6.000
## Max. :9.100 Max. :10.000
## NA's :510 NA's :528
The data set has 2000 rows and 7 columns (variables). The variables extrav, sex, texp, popular and popteach contain missing values. About a quarter of the values is missing in each of these variables, except for texp, where about 50% is missing.
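These proportions can be verified directly (a quick check, not part of the original exercises):

```r
# Fraction of missing values per variable;
# texp is missing in about half the rows, the other incomplete
# variables in about a quarter of the rows
round(colMeans(is.na(popNCR)), 2)
```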
3. As we have seen before, the function md.pattern() is used to display all different missing data patterns. How many different missing data patterns are present in the popNCR dataframe, and which patterns occur most frequently? Also find out how many patterns we would observe if the variable texp (teacher experience) were not considered.
md.pattern(popNCR)
## pupil class sex popular extrav popteach texp
## 308 1 1 1 1 1 1 1 0
## 279 1 1 1 1 1 1 0 1
## 110 1 1 1 1 1 0 1 1
## 115 1 1 1 1 1 0 0 2
## 114 1 1 1 1 0 1 1 1
## 98 1 1 1 1 0 1 0 2
## 33 1 1 1 1 0 0 1 2
## 24 1 1 1 1 0 0 0 3
## 119 1 1 1 0 1 1 1 1
## 113 1 1 1 0 1 1 0 2
## 50 1 1 1 0 1 0 1 2
## 75 1 1 1 0 1 0 0 3
## 29 1 1 1 0 0 1 1 2
## 21 1 1 1 0 0 1 0 3
## 2 1 1 1 0 0 0 1 3
## 14 1 1 1 0 0 0 0 4
## 102 1 1 0 1 1 1 1 1
## 89 1 1 0 1 1 1 0 2
## 25 1 1 0 1 1 0 1 2
## 29 1 1 0 1 1 0 0 3
## 85 1 1 0 1 0 1 1 2
## 56 1 1 0 1 0 1 0 3
## 9 1 1 0 1 0 0 1 3
## 14 1 1 0 1 0 0 0 4
## 19 1 1 0 0 1 1 1 2
## 27 1 1 0 0 1 1 0 3
## 13 1 1 0 0 1 0 1 3
## 11 1 1 0 0 1 0 0 4
## 4 1 1 0 0 0 1 1 3
## 9 1 1 0 0 0 1 0 4
## 2 1 1 0 0 0 0 1 4
## 2 1 1 0 0 0 0 0 5
## 0 0 496 510 516 528 976 3026
There are 32 unique patterns. The pattern where everything is observed and the pattern where only texp is missing occur most frequently.
If we omit texp, we obtain the following pattern matrix:
md.pattern(popNCR[ , -5])
## pupil class sex popular extrav popteach
## 587 1 1 1 1 1 1 0
## 225 1 1 1 1 1 0 1
## 212 1 1 1 1 0 1 1
## 57 1 1 1 1 0 0 2
## 232 1 1 1 0 1 1 1
## 125 1 1 1 0 1 0 2
## 50 1 1 1 0 0 1 2
## 16 1 1 1 0 0 0 3
## 191 1 1 0 1 1 1 1
## 54 1 1 0 1 1 0 2
## 141 1 1 0 1 0 1 2
## 23 1 1 0 1 0 0 3
## 46 1 1 0 0 1 1 2
## 24 1 1 0 0 1 0 3
## 13 1 1 0 0 0 1 3
## 4 1 1 0 0 0 0 4
## 0 0 496 510 516 528 2050
Without texp, there are only 16 patterns.
4. Let’s focus more precisely on the missing data patterns. Does the missingness in popular depend on popteach? One could, for example, check this by making a histogram of popteach separately for the pupils with known popularity and with missing popularity.
In R, the missingness indicator

is.na(popNCR$popular)

## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
## [13] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## ... (output truncated; 2000 values in total)

is a logical vector of the same length as popular, with value FALSE (0) for observed pupil popularity and TRUE (1) for missing pupil popularity. A histogram can be made with the function histogram(). The code for a conditional histogram of popteach given the missingness indicator for popular is
histogram(~ popteach | is.na(popular), data=popNCR)
We see that the histogram for missing popular (TRUE) lies further to the right than the histogram for observed popular (FALSE). This indicates right-tailed MAR missingness. In fact, this is exactly what happens, because we created the missingness in these data ourselves. But we can also observe it by examining the relation between the missingness in popular and the observed data in popteach.
5. Does the missingness in the other incomplete variables depend on popteach? If yes, what is the direction of the relation?
histogram(~ popteach | is.na(sex), data = popNCR)
There seems to be a left-tailed relation between popteach and the missingness in sex.
histogram(~ popteach | is.na(extrav), data = popNCR)
There also seems to be a left-tailed relation between popteach and the missingness in extrav.
histogram(~ popteach | is.na(texp), data = popNCR)
There seems to be no observable relation between popteach and the missingness in texp. It might be MCAR or even MNAR.
6. Find out if the missingness in teacher popularity depends on pupil popularity.
histogram(~ popular | is.na(popteach), data = popNCR)
Yes: there is a dependency. The relation seems to be right-tailed.
7. Have a look at the intraclass correlation (ICC) for the incomplete variables popular, popteach and texp.
icc(aov(popular ~ as.factor(class), data = popNCR))
## [1] 0.328007
icc(aov(popteach ~ class, data = popNCR))
## [1] 0.3138658
icc(aov(texp ~ class, data = popNCR))
## [1] 1
Please note that the function icc() comes from the package multilevel (function ICC1()), but is included in the workspace popular.RData. Write down the ICCs; you’ll need them later.
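Since icc() itself is not shown in this document, here is a minimal sketch of what such a helper could compute (ICC1 from a one-way ANOVA). The actual function in popular.RData may differ in details, e.g. in how the group size k is corrected for unbalanced classes:

```r
# Hypothetical reimplementation of icc():
# ICC1 = (MSB - MSW) / (MSB + (k - 1) * MSW), with k the (average) group size
icc_sketch <- function(fit) {
  tab <- summary(fit)[[1]]
  MSB <- tab[["Mean Sq"]][1]              # between-class mean square
  MSW <- tab[["Mean Sq"]][2]              # within-class mean square
  n   <- nrow(fit$model)                  # observations used in the ANOVA
  g   <- nlevels(factor(fit$model[[2]]))  # number of classes
  k   <- n / g                            # average class size (simplification)
  (MSB - MSW) / (MSB + (k - 1) * MSW)
}
```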
7b. Do you think it is necessary to take the multilevel structure into account?
YES! There is a strong cluster structure. If we ignore the clustering in our imputation model, we may obtain invalid inferences. To stay as close as possible to the true data model, we must take the cluster structure into account during imputation.
8. Impute the popNCR dataset with mice using imputation method norm for popular, popteach, texp and extrav. Exclude class as a predictor for all variables. Call the mids-object imp1.
ini <- mice(popNCR, maxit = 0)
meth <- ini$meth
meth
## pupil class extrav sex texp popular popteach
## "" "" "pmm" "logreg" "pmm" "pmm" "pmm"
meth[c(3, 5, 6, 7)] <- "norm"
meth
## pupil class extrav sex texp popular popteach
## "" "" "norm" "logreg" "norm" "norm" "norm"
pred <- ini$pred
pred
## pupil class extrav sex texp popular popteach
## pupil 0 1 1 1 1 1 1
## class 1 0 1 1 1 1 1
## extrav 1 1 0 1 1 1 1
## sex 1 1 1 0 1 1 1
## texp 1 1 1 1 0 1 1
## popular 1 1 1 1 1 0 1
## popteach 1 1 1 1 1 1 0
pred[, "class"] <- 0
pred[, "pupil"] <- 0
pred
## pupil class extrav sex texp popular popteach
## pupil 0 0 1 1 1 1 1
## class 0 0 1 1 1 1 1
## extrav 0 0 0 1 1 1 1
## sex 0 0 1 0 1 1 1
## texp 0 0 1 1 0 1 1
## popular 0 0 1 1 1 0 1
## popteach 0 0 1 1 1 1 0
imp1 <- mice(popNCR, meth = meth, pred = pred, print = FALSE)
9. Compare the means of the variables in the first imputed dataset and in the incomplete dataset.
summary(complete(imp1))
## pupil class extrav sex texp
## Min. : 1.00 17 : 26 Min. : 1.000 0: 985 Min. :-6.465
## 1st Qu.: 6.00 63 : 25 1st Qu.: 4.139 1:1015 1st Qu.: 8.000
## Median :11.00 10 : 24 Median : 5.000 Median :12.253
## Mean :10.65 15 : 24 Mean : 5.269 Mean :12.509
## 3rd Qu.:16.00 4 : 23 3rd Qu.: 6.000 3rd Qu.:16.698
## Max. :26.00 21 : 23 Max. :10.000 Max. :35.745
## (Other):1855
## popular popteach
## Min. : 0.000 Min. : 1.000
## 1st Qu.: 4.100 1st Qu.: 4.000
## Median : 5.000 Median : 5.000
## Mean : 5.006 Mean : 5.021
## 3rd Qu.: 5.971 3rd Qu.: 6.000
## Max. :10.547 Max. :10.000
##
summary(popNCR)
## pupil class extrav sex texp
## Min. : 1.00 17 : 26 Min. : 1.000 0 :661 Min. : 2.0
## 1st Qu.: 6.00 63 : 25 1st Qu.: 4.000 1 :843 1st Qu.: 7.0
## Median :11.00 10 : 24 Median : 5.000 NA's:496 Median :12.0
## Mean :10.65 15 : 24 Mean : 5.313 Mean :11.8
## 3rd Qu.:16.00 4 : 23 3rd Qu.: 6.000 3rd Qu.:16.0
## Max. :26.00 21 : 23 Max. :10.000 Max. :25.0
## (Other):1855 NA's :516 NA's :976
## popular popteach
## Min. :0.000 Min. : 1.000
## 1st Qu.:3.900 1st Qu.: 4.000
## Median :4.800 Median : 5.000
## Mean :4.829 Mean : 4.834
## 3rd Qu.:5.800 3rd Qu.: 6.000
## Max. :9.100 Max. :10.000
## NA's :510 NA's :528
9b. The missingness in texp is MNAR: higher values of texp have a larger probability of being missing. Can you see this in the imputed data? Do you think this is a problem?
Yes, we can see this in the imputed data: teacher experience increases slightly after imputation. However, texp is the same for all pupils in a class, but not all pupils have this information recorded (as if some pupils did not remember, or were not present during data collection). This is not a problem: as long as at least one pupil in each class has teacher experience recorded, we can deductively impute the correct (i.e. true) value for every pupil in that class.
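A sketch of such a deductive imputation (not part of the exercises; because texp is constant within class, the observed class mean equals the true value wherever at least one pupil has it recorded):

```r
# Fill missing texp with the class's observed value
texp_dedu <- ave(popNCR$texp, popNCR$class,
                 FUN = function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
```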
10. Compare the ICCs of the variables in the first imputed dataset with those in the incomplete dataset (use popular, popteach and texp). Write down the ICCs after imputation.
data.frame(vars = names(popNCR[c(6, 7, 5)]),
observed = c(icc(aov(popular ~ class, popNCR)),
icc(aov(popteach ~ class, popNCR)),
icc(aov(texp ~ class, popNCR))),
norm = c(icc(aov(popular ~ class, complete(imp1))),
icc(aov(popteach ~ class, complete(imp1))),
icc(aov(texp ~ class, complete(imp1)))))
## vars observed norm
## 1 popular 0.3280070 0.2798518
## 2 popteach 0.3138658 0.2639095
## 3 texp 1.0000000 0.4595004
11. Now impute the popNCR dataset again with mice using imputation method norm for popular, popteach, texp and extrav, but now include class as a predictor for all variables. Call the mids-object imp2.
pred <- ini$pred
pred[, "pupil"] <- 0
imp2 <- mice(popNCR, meth = meth, pred = pred, print = FALSE)
## Warning: Number of logged events: 90
12. Compare the ICCs of the variables in the first imputed dataset from imp2 with those of imp1 and the incomplete dataset (use popular, popteach and texp). Write down the ICCs after imputation.
data.frame(vars = names(popNCR[c(6, 7, 5)]),
observed = c(icc(aov(popular ~ class, popNCR)),
icc(aov(popteach ~ class, popNCR)),
icc(aov(texp ~ class, popNCR))),
norm = c(icc(aov(popular ~ class, complete(imp1))),
icc(aov(popteach ~ class, complete(imp1))),
icc(aov(texp ~ class, complete(imp1)))),
normclass = c(icc(aov(popular ~ class, complete(imp2))),
icc(aov(popteach ~ class, complete(imp2))),
icc(aov(texp ~ class, complete(imp2)))))
## vars observed norm normclass
## 1 popular 0.3280070 0.2798518 0.3629046
## 2 popteach 0.3138658 0.2639095 0.3326133
## 3 texp 1.0000000 0.4595004 1.0000000
By simply forcing the algorithm to use the class variable during estimation, we adopt a fixed effects approach. This conforms to formulating separate regression models for each class and imputing within classes from these models.
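Conceptually, including class in the predictor matrix is akin to giving each class its own intercept in the imputation model, e.g. (a sketch for intuition only, not part of the exercises):

```r
# A fixed-effects style model: one intercept per class
fit_fe <- lm(popular ~ popteach + extrav + sex + as.factor(class),
             data = popNCR)
```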
13. Inspect the trace lines for the variables popular, texp and popteach.
plot(imp2, c("popular", "texp", "popteach"))
14. Add another 10 iterations and inspect the trace lines again. What do you observe with respect to the convergence of the sampler?
imp3 <- mice.mids(imp2, maxit = 10)
## 
##  iter imp variable
##   6   1  extrav  sex  texp  popular  popteach
##   6   2  extrav  sex  texp  popular  popteach
##   6   3  extrav  sex  texp  popular  popteach
##   6   4  extrav  sex  texp  popular  popteach
##   6   5  extrav  sex  texp  popular  popteach
## ... (iterations 7 through 15 proceed analogously; output truncated)
plot(imp3, c("popular", "texp", "popteach"))
It seems OK. Adding another 20 iterations confirms this.
imp3b <- mice.mids(imp3, maxit = 20, print = FALSE)
plot(imp3b, c("popular", "texp", "popteach"))
Further inspection

Several plotting methods based on the package lattice for Trellis graphics are implemented in mice for imputed data.
15. Plot the densities of the observed and imputed data (use imp2) with the function densityplot().
To obtain all densities of the different imputed datasets use
densityplot(imp2)
To obtain just the densities for popular one can use
densityplot(imp2, ~ popular)
or
densityplot(imp2, ~ popular | .imp)
The latter case results in a conditional plot (conditional on the different imputed datasets).
16. Have a look at the imputed dataset by asking for the first 15 rows of the first completed dataset from imp2. What do you think of the imputed values?
complete(imp2, 1)[1:15, ]
## pupil class extrav sex texp popular popteach
## 1 1 1 5.000000 1 24 6.300000 6.314991
## 2 2 1 3.616012 0 24 4.900000 4.343070
## 3 3 1 4.000000 1 24 5.300000 6.000000
## 4 4 1 3.000000 0 24 4.700000 5.000000
## 5 5 1 5.000000 1 24 5.656993 6.000000
## 6 6 1 3.564456 0 24 4.700000 5.000000
## 7 7 1 5.000000 0 24 5.900000 5.000000
## 8 8 1 4.000000 0 24 4.484762 4.389721
## 9 9 1 5.000000 0 24 4.686309 5.000000
## 10 10 1 5.000000 0 24 3.900000 3.000000
## 11 11 1 3.217854 1 24 5.700000 5.000000
## 12 12 1 5.000000 0 24 4.800000 5.000000
## 13 13 1 5.000000 0 24 5.000000 5.000000
## 14 14 1 5.000000 1 24 6.157194 6.000000
## 15 15 1 5.000000 1 24 6.000000 5.000000
or, alternatively
head(complete(imp2, 1), n = 15)
## pupil class extrav sex texp popular popteach
## 1 1 1 5.000000 1 24 6.300000 6.314991
## 2 2 1 3.616012 0 24 4.900000 4.343070
## 3 3 1 4.000000 1 24 5.300000 6.000000
## 4 4 1 3.000000 0 24 4.700000 5.000000
## 5 5 1 5.000000 1 24 5.656993 6.000000
## 6 6 1 3.564456 0 24 4.700000 5.000000
## 7 7 1 5.000000 0 24 5.900000 5.000000
## 8 8 1 4.000000 0 24 4.484762 4.389721
## 9 9 1 5.000000 0 24 4.686309 5.000000
## 10 10 1 5.000000 0 24 3.900000 3.000000
## 11 11 1 3.217854 1 24 5.700000 5.000000
## 12 12 1 5.000000 0 24 4.800000 5.000000
## 13 13 1 5.000000 0 24 5.000000 5.000000
## 14 14 1 5.000000 1 24 6.157194 6.000000
## 15 15 1 5.000000 1 24 6.000000 5.000000
17. Impute the popNCR data once more, using predictive mean matching and including all variables as predictors. Name the object imp4.
imp4 <- mice(popNCR)
## 
##  iter imp variable
##   1   1  extrav  sex  texp  popular  popteach
##   1   2  extrav  sex  texp  popular  popteach
##   1   3  extrav  sex  texp  popular  popteach
##   1   4  extrav  sex  texp  popular  popteach
##   1   5  extrav  sex  texp  popular  popteach
## ... (iterations 2 through 5 proceed analogously; output truncated)
## Warning: Number of logged events: 90
18. Plot the densities of the observed and imputed data with the function densityplot() again, but now use imp4. Is there a difference between the imputations obtained with pmm and norm, and can you explain it?
densityplot(imp4)
Yes. pmm samples from the observed values, and this clearly shows: the imputations follow the shape of the observed data.
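One way to verify this property (a quick check, not part of the exercises): every pmm-imputed value should be one of the observed values.

```r
# pmm draws imputations from observed donor values
obs <- popNCR$popular[!is.na(popNCR$popular)]
imp <- complete(imp4)$popular[is.na(popNCR$popular)]
all(imp %in% obs)  # should be TRUE for pmm
```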
19. Compare the ICCs of the variables in the first imputed dataset from imp4 with those of imp1, imp2 and the incomplete dataset (use popular, popteach and texp).
See Exercise 20 for the solution.
20. Finally, compare the ICCs of the imputations to the ICCs in the original data. The original data can be found in dataset popular
. What do you conclude?
data.frame(vars = names(popNCR[c(6, 7, 5)]),
observed = c(icc(aov(popular ~ class, popNCR)),
icc(aov(popteach ~ class, popNCR)),
icc(aov(texp ~ class, popNCR))),
norm = c(icc(aov(popular ~ class, complete(imp1))),
icc(aov(popteach ~ class, complete(imp1))),
icc(aov(texp ~ class, complete(imp1)))),
normclass = c(icc(aov(popular ~ class, complete(imp2))),
icc(aov(popteach ~ class, complete(imp2))),
icc(aov(texp ~ class, complete(imp2)))),
pmm = c(icc(aov(popular ~ class, complete(imp4))),
icc(aov(popteach ~ class, complete(imp4))),
icc(aov(texp ~ class, complete(imp4)))),
orig = c(icc(aov(popular ~ as.factor(class), popular)),
icc(aov(popteach ~ as.factor(class), popular)),
icc(aov(texp ~ as.factor(class), popular))))
## vars observed norm normclass pmm orig
## 1 popular 0.3280070 0.2798518 0.3629046 0.3700960 0.3629933
## 2 popteach 0.3138658 0.2639095 0.3326133 0.3385374 0.3414766
## 3 texp 1.0000000 0.4595004 1.0000000 1.0000000 1.0000000
Note: these display only the first imputed data set.
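The icc() helper used throughout comes from the popular.RData workspace. For readers who cannot load that workspace, here is a minimal base-R sketch of such a function; it is an assumption about its behavior, not the workspace implementation, and assumes a balanced design:

```r
# Intraclass correlation from a one-way ANOVA fit:
# ICC = (MSB - MSW) / (MSB + (n - 1) * MSW), n = common group size
icc_sketch <- function(fit) {
  ms  <- summary(fit)[[1]][["Mean Sq"]]  # MS between, MS within
  grp <- as.factor(fit$model[[2]])       # grouping variable
  n   <- length(grp) / nlevels(grp)      # group size (balanced design)
  (ms[1] - ms[2]) / (ms[1] + (n - 1) * ms[2])
}

# Toy check on balanced simulated data with a clear group effect
set.seed(1)
g <- rep(1:5, each = 10)
y <- rnorm(50) + g / 2
icc_sketch(aov(y ~ as.factor(g)))
```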
Changing the imputation method
The mice package includes several imputation methods for imputing multilevel data:

- 2l.norm: level-1 imputation under a linear mixed model with heterogeneous within-group variances
- 2l.pan: level-1 imputation under a linear mixed model with homogeneous within-group variances (requires pan)
- 2lonly.mean: level-2 imputation by the class mean
- 2lonly.norm: level-2 imputation by Bayesian linear regression
- 2lonly.pmm: level-2 imputation by predictive mean matching

The latter two methods aggregate level-1 variables at level 2 but, in combination with mice.impute.2l.pan, allow switching regression imputation between level 1 and level 2, as described in Yucel (2008) or Gelman and Hill (2007, p. 541). For more information on these imputation methods, see the help.
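To see which multilevel imputation functions your installed version of mice actually provides, you can list them; the output depends on your mice version:

```r
# All exported imputation functions whose name contains "2l"
# (covers both the 2l.* and 2lonly.* families)
ls("package:mice", pattern = "mice.impute.2l")
```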
21. Impute the variable popular
by means of 2l.norm
. Use dataset popNCR2
.
ini <- mice(popNCR2, maxit = 0)
pred <- ini$pred
pred["popular", ] <- c(0, -2, 2, 2, 2, 0, 2)
In the predictor matrix, -2
denotes the class variable, a value 1
indicates a fixed effect and a value 2
indicates a random effect. However, the currently implemented algorithm does not handle predictors that are specified as fixed effects (type = 1
). When using mice.impute.2l.norm()
, the current advice is to specify all predictors as random effects (type = 2).
meth <- ini$meth
meth <- c("", "", "", "", "", "2l.norm", "")
imp5 <- mice(popNCR2, pred = pred, meth = meth, print = FALSE)
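A side note on the method vector: assigning it by position, as above, silently breaks if the column order of the data changes. Assigning by name is a safer equivalent:

```r
# Equivalent to the positional assignment above, but order-independent:
# start from the initialized vector and set only the entry you need
meth <- ini$meth
meth["popular"] <- "2l.norm"
```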
22. Inspect the imputations. Did the algorithm converge?
densityplot(imp5, ~popular, ylim = c(0, 0.35), xlim = c(-1.5, 10))
densityplot(imp4, ~popular, ylim = c(0, 0.35), xlim = c(-1.5, 10))
The imputations generated with 2l.norm
are very similar to the ones obtained by pmm
with class
as a fixed effect. If we plot the first imputed dataset from imp4
and imp5
against the original (true) data:
plot(density(popular$popular)) # true data
lines(density(complete(imp5)$popular), col = "red", lwd = 2) # 2l.norm
lines(density(complete(imp4)$popular), col = "green", lwd = 2) # PMM
We can see that the imputations are very similar. When studying the convergence
plot(imp5)
we conclude that it may be wise to run additional iterations. Convergence is not apparent from this plot.
imp5.b <- mice.mids(imp5, maxit = 10, print = FALSE)
plot(imp5.b)
After running another 10 iterations, convergence is more convincing.
23. In the original data, the group variances for popular
are homogeneous. Use 2l.pan
to impute the variable popular
in dataset popNCR2
. Inspect the imputations. Did the algorithm converge?
ini <- mice(popNCR2, maxit = 0)
pred <- ini$pred
pred["popular", ] <- c(0, -2, 2, 2, 1, 0, 2)
meth <- ini$meth
meth <- c("", "", "", "", "", "2l.pan", "")
imp6 <- mice(popNCR2, pred = pred, meth = meth, print = FALSE)
Let us create the densityplot for imp6
densityplot(imp6, ~popular, ylim = c(0, 0.35), xlim = c(-1.5, 10))
and compare it to the one for imp4
densityplot(imp4, ~popular, ylim = c(0, 0.35), xlim = c(-1.5, 10))
If we plot the first imputed dataset from both objects against the original (true) density, we obtain the following plot:
plot(density(popular$popular), main = "black = truth | green = PMM | red = 2l.pan") # true data
lines(density(complete(imp6)$popular), col = "red", lwd = 2) # 2l.pan
lines(density(complete(imp4)$popular), col = "green", lwd = 2) # PMM
We can see that the imputations are very similar. When studying the convergence
plot(imp6)
we conclude that it may be wise to run additional iterations. Convergence is not apparent from this plot.
imp6.b <- mice.mids(imp6, maxit = 10, print = FALSE)
plot(imp6.b)
Again, after running another 10 iterations, convergence is more convincing.
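The practical difference between the two methods is that 2l.norm allows heterogeneous within-class variances, whereas 2l.pan assumes they are homogeneous. One quick way to eyeball that assumption in the complete data (a sketch, not part of the original exercise):

```r
# Within-class variances of popular in the original (complete) data;
# roughly equal values are consistent with the homogeneity
# assumption behind 2l.pan
round(tapply(popular$popular, popular$class, var), 2)
```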
24. Now inspect dataset popNCR3
and impute the incomplete variables according to the following imputation methods:
Variable | Method |
--- | ---
extrav | 2l.norm |
texp | 2lonly.mean |
sex | logreg |
popular | 2l.pan |
popteach | 2l.pan |
ini <- mice(popNCR3, maxit = 0)
pred <- ini$pred
pred["extrav", ] <- c(0, -2, 0, 2, 2, 2, 2) # 2l.norm
pred["sex", ] <- c(0, 1, 1, 0, 1, 1, 1) # logreg
pred["texp", ] <- c(0, -2, 1, 1, 0, 1, 1) # 2lonly.mean
pred["popular", ] <- c(0, -2, 2, 2, 1, 0, 2) # 2l.pan
pred["popteach", ] <- c(0, -2, 2, 2, 1, 2, 0) # 2l.pan
meth <- ini$meth
meth <- c("", "", "2l.norm", "logreg", "2lonly.mean", "2l.pan", "2l.pan")
imp7 <- mice(popNCR3, pred = pred, meth = meth, print = FALSE)
25. Evaluate the imputations by means of convergence, distributions and plausibility.
densityplot(imp7)
Given what we know about the missingness, the imputed densities look very reasonable.
plot(imp7)
Convergence has not yet been reached; more iterations are advisable.
26. Repeat the same imputations as in the previous step, but now use pmm
for everything.
pmmdata <- popNCR3
pmmdata$class <- as.factor(popNCR3$class)
imp8 <- mice(pmmdata, m = 5, print = FALSE)
## Warning: Number of logged events: 90
With pmm
, the imputations are very similar and conform to the shape of the observed data.
densityplot(imp8)
When looking at the convergence of pmm
, more iterations are advisable:
plot(imp8)
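Since convergence for imp8 is not yet convincing either, the same approach used for imp5 and imp6 applies:

```r
# Continue from the current state of imp8 with 10 extra iterations
imp8.b <- mice.mids(imp8, maxit = 10, print = FALSE)
plot(imp8.b)
```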
Conclusions
There are ways to ensure that imputations are more than mere guesses of unobserved values. Imputations can be checked against a standard of reasonability: we can inspect the differences between observed and imputed values, the differences between their distributions, and the distribution of the completed data as a whole. Doing so shows whether the imputations make sense in the context of the problem being studied.
References
Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.
Hox, J. J., Moerbeek, M., & van de Schoot, R. (2010). Multilevel analysis: Techniques and applications. Routledge.
Yucel, R. M. (2008). Multiple imputation inference for multivariate multilevel continuous data with ignorable non-response. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 366(1874), 2389-2403.
- End of Vignette