This is the fourth vignette in a series of six.

In this vignette we will walk you through the more advanced features of mice, such as post-processing of imputations and passive imputation.


1. Open R and load the packages mice and lattice.

require(mice)
require(lattice)
set.seed(123)

We choose seed value 123. This is an arbitrary value; any value would be an equally good seed value. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R instance to 123, you will get the exact same results and plots as we present in this document.


Passive Imputation

There is often a need for transformed, combined or recoded versions of the data. In the case of incomplete data, one could impute the original and transform the completed original afterwards, or transform the incomplete original and impute the transformed version. If, however, both the original and the transformed version are needed within the imputation algorithm, neither of these approaches works: one cannot be sure that the transformation holds between the imputed values of the original and the transformed version. mice has a built-in approach, called passive imputation, to deal with such situations. The goal of passive imputation is to maintain consistency among different transformations of the same data. As an example, consider the following deterministic relation in the boys data \[\text{BMI} = \frac{\text{Weight (kg)}}{\text{Height (m)}^2}\] or the compositional relation in the mammalsleep data: \[\text{ts} = \text{ps} + \text{sws}\]
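
To make the BMI relation concrete, here is a quick sketch (not part of the exercises): for the observed cases in the boys data, the stored bmi should (nearly) coincide with the value calculated from wgt and hgt.

# Sketch: check the deterministic BMI relation on the observed boys data.
# Differences should be (close to) zero; summary() reports the NA's separately.
with(boys, summary(bmi - wgt / (hgt / 100)^2))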


2. Use passive imputation to preserve the compositional sleep relation ts = sws + ps when imputing the mammalsleep data. Name the new multiply imputed dataset pas.imp.

ini <- mice(mammalsleep[, -1], maxit=0, print=F)
meth<- ini$meth
meth
##    bw   brw   sws    ps    ts   mls    gt    pi   sei   odi 
##    ""    "" "pmm" "pmm" "pmm" "pmm" "pmm"    ""    ""    ""
pred <- ini$pred
pred
##     bw brw sws ps ts mls gt pi sei odi
## bw   0   1   1  1  1   1  1  1   1   1
## brw  1   0   1  1  1   1  1  1   1   1
## sws  1   1   0  1  1   1  1  1   1   1
## ps   1   1   1  0  1   1  1  1   1   1
## ts   1   1   1  1  0   1  1  1   1   1
## mls  1   1   1  1  1   0  1  1   1   1
## gt   1   1   1  1  1   1  0  1   1   1
## pi   1   1   1  1  1   1  1  0   1   1
## sei  1   1   1  1  1   1  1  1   0   1
## odi  1   1   1  1  1   1  1  1   1   0
pred[c("sws", "ps"), "ts"] <- 0
pred
##     bw brw sws ps ts mls gt pi sei odi
## bw   0   1   1  1  1   1  1  1   1   1
## brw  1   0   1  1  1   1  1  1   1   1
## sws  1   1   0  1  0   1  1  1   1   1
## ps   1   1   1  0  0   1  1  1   1   1
## ts   1   1   1  1  0   1  1  1   1   1
## mls  1   1   1  1  1   0  1  1   1   1
## gt   1   1   1  1  1   1  0  1   1   1
## pi   1   1   1  1  1   1  1  0   1   1
## sei  1   1   1  1  1   1  1  1   0   1
## odi  1   1   1  1  1   1  1  1   1   0
meth["ts"]<- "~ I(sws + ps)"
pas.imp <- mice(mammalsleep[, -1], meth=meth, pred=pred, maxit=10, seed=123, print=F)

We used a custom predictor matrix and method vector to tailor our imputation approach to the passive imputation problem. We made sure to exclude ts as a predictor for the imputation of sws and ps to avoid circularity.
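
As a quick check of the passive imputation (a sketch; com and miss.ts are names we introduce here), the imputed ts values should equal the sum of the (possibly imputed) sws and ps in the completed data for the species where ts was originally missing.

# Sketch: passively imputed ts should equal sws + ps for the cases
# where ts was missing in the original data.
com <- complete(pas.imp)
miss.ts <- is.na(mammalsleep$ts)
all.equal(com$ts[miss.ts], com$sws[miss.ts] + com$ps[miss.ts])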

We also gave the imputation algorithm 10 iterations to converge and fixed the seed to 123 for this mice instance. This means that even when people do not fix the overall R seed for a session, exact replication of results can be obtained by simply fixing the seed of the random number generator within mice. Naturally, the same input (data) is required each time to yield the same output (mids-object).
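
To illustrate this (a sketch; imp.rep is a name we introduce here), rerunning mice with the same data, arguments and seed should reproduce the completed data exactly.

# Sketch: identical data, methods, predictor matrix, maxit and seed
# should yield identical imputations.
imp.rep <- mice(mammalsleep[, -1], meth=meth, pred=pred, maxit=10, seed=123, print=F)
all.equal(complete(pas.imp), complete(imp.rep))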


3. Inspect the trace lines for pas.imp.

plot(pas.imp)

We can see that the pathological nonconvergence we experienced before has been properly dealt with: the trace lines for the sleep variables now look well behaved, and convergence can be inferred by studying them.


Post-processing of the imputations

Remember that we imputed the boys data in the previous tutorial with pmm and with norm. One of the problems with the norm imputations of tv is that they contain negative values. We would therefore like to put a constraint on the imputed values of tv.

The mice() function has an argument called post that takes a vector of strings of R commands. These commands are parsed and evaluated after the univariate imputation function returns, and thus provide a way of post-processing the imputed values while using the processed version in the imputation algorithm. In other words, post-processing allows us to manipulate the imputations for a particular variable that are generated within each iteration. Such manipulations directly affect the imputed values of that variable and, through the iterative algorithm, the imputations of the other variables. Naturally, such a procedure should be handled with care.
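
For illustration only (a hypothetical sketch, assuming post <- ini$post has been obtained as in the exercise below): a post entry is plain R code that overwrites imp[[j]][, i], the just-generated imputations for the variable at hand. For example, truncating hypothetical negative imputations for wgt at zero could look like this:

# Hypothetical sketch: set any negative imputations for wgt to zero.
post["wgt"] <- "imp[[j]][, i] <- pmax(imp[[j]][, i], 0)"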


4. Post-process the imputed values of tv to constrain them between 1 and 25, using norm as the imputation method for tv.

ini <- mice(boys, maxit = 0)
meth <- ini$meth
meth["tv"] <- "norm"
post <- ini$post
post["tv"] <- "imp[[j]][, i] <- squeeze(imp[[j]][, i], c(1, 25))"
imp <- mice(boys, meth=meth, post=post, print=FALSE)

In this way the imputed values of tv are constrained (squeezed by function squeeze()) between 1 and 25.
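
The behaviour of squeeze() is easy to verify on a toy vector (a sketch): values outside the bounds are pulled to the nearest bound.

# Sketch: -2 is pulled up to 1 and 30 is pulled down to 25.
squeeze(c(-2, 5, 30), c(1, 25))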


5. Compare the imputed values and histograms of tv for the solution obtained by pmm to the constrained solution (created with norm, constrained between 1 and 25).

First, we recreate the default pmm solution

imp.pmm <- mice(boys, print=FALSE)

and we inspect the imputed values for the norm solution

table(complete(imp)$tv)
## 
##                1 1.21710148525462 1.43588483352455 1.51201882679116 
##              323                1                1                1 
## 1.53292756925877 1.53884163903632 1.58285408167725 1.80802667422845 
##                1                1                1                1 
##                2 2.07663165624588 2.24979678211323 2.92335718014784 
##               26                1                1                1 
## 2.94117422338134 2.95084234692005                3 3.02364296734824 
##                1                1               19                1 
## 3.53970347137679 3.57972881016042 3.74672076785238 3.96039115719309 
##                1                1                1                1 
##                4 4.04754711531009 4.08248223904697 4.14754362271122 
##               17                1                1                1 
## 4.17913738746458  4.5108993044032 4.57883711875945 4.72773944856595 
##                1                1                1                1 
## 4.78039105160729 4.87498648012577 4.95790683812151                5 
##                1                1                1                5 
## 5.22626498342572 5.27699780349692 5.29699542293051  5.5135973716855 
##                1                1                1                1 
## 5.58961218020009 5.69794944880459                6 6.25631547711316 
##                1                1               10                1 
## 6.35152222227269 6.50171770177514 7.15463206998544 7.41565552751217 
##                1                1                1                1 
## 7.91305727094718 7.93871120477867                8 8.05174592128119 
##                1                1               13                1 
## 8.05700045665079 8.16362288564921 8.21549432374612 8.49997598070039 
##                1                1                1                1 
## 8.51370505398192 8.58764399352692 8.61845893470197 8.97364475446134 
##                1                1                1                1 
##                9 9.00919768391507 9.63183636229161 9.63562902483335 
##                1                1                1                1 
## 9.76095833349053 9.84465985091274               10 10.2674945703833 
##                1                1               16                1 
## 10.4662560634783 10.5268192284329 10.5687382671324 10.6082021769037 
##                1                1                1                1 
## 10.6737703082101 10.8772940701329 10.8854812337215  10.930532331179 
##                1                1                1                1 
## 11.0093322524732 11.1652748123109 11.2642535206875 11.5290193322052 
##                1                1                1                1 
## 11.7584784239471 11.7965517201956 11.8064326042326               12 
##                1                1                1               15 
## 12.1696703772424  12.592892417515  12.618204518511 12.6689878607267 
##                1                1                1                1 
## 12.7081972517136 12.8979961443309               13 13.0240980425785 
##                1                1                1                1 
## 13.0587048153365  13.106348108398 13.5405375517151 13.9431160205217 
##                1                1                1                1 
##               14 14.2482185768873 14.2798608007013 14.3315252653246 
##                1                1                1                1 
## 14.4076303004689 14.4338076497289 14.5138128344765 14.6788476362263 
##                1                1                1                1 
## 14.6824007199852 14.7441518164776 14.8078250873427 14.9525753157757 
##                1                1                1                1 
## 14.9981738657718               15 15.1258748162893 15.1835169459907 
##                1               27                1                1 
## 15.4911886491463 15.5120235066289 15.6264815503803 15.6348463939284 
##                1                1                1                1 
## 15.6933134568447 15.7497016188987 15.7977063752737 15.8513524150625 
##                1                1                1                1 
## 15.9198411082574 15.9284148402579               16  16.340697942133 
##                1                1                1                1 
## 16.3716211968936 16.4317068195426 16.4699630810462 16.5178617262188 
##                1                1                1                1 
## 16.6654847715227 16.6721133690251 16.6871350554192 16.7050893382433 
##                1                1                1                1 
## 16.7500877350959  16.919505854755 16.9696935240149               17 
##                1                1                1                1 
## 17.0607955750451  17.061123780778 17.0975117585518 17.1863794919946 
##                1                1                1                1 
## 17.2885587826642 17.3213750660243 17.3288063082604 17.3371972081781 
##                1                1                1                1 
## 17.4139221505318 17.5083623700701  17.525693552751 17.5536295065084 
##                1                1                1                1 
## 17.7355682743341 17.7543354628357 17.7767055311222 17.7892576850834 
##                1                1                1                1 
## 17.8020025078871 17.9058272130569 17.9861341293521               18 
##                1                1                1                1 
## 18.0089953938541 18.0125066085225 18.0780752065915 18.0894548325433 
##                1                1                1                1 
## 18.1085713536776 18.4853383062642 18.4943336555561 18.6076606087712 
##                1                1                1                1 
## 18.6213898628289 18.7328605098401 18.8458099353893 18.8627827329173 
##                1                1                1                1 
## 19.0764930131297 19.2910290679375 19.3288860993901 19.4126039248187 
##                1                1                1                1 
## 19.6064318553331  19.653890171068 19.6670241312546 19.8448125657382 
##                1                1                1                1 
## 19.8516466793385               20  20.006279627252 20.1685051396209 
##                1               38                1                1 
## 20.2483453303049 20.4090716225676 20.5489240439564 20.7075019374611 
##                1                1                1                1 
##  20.848944671593 20.8504396969632 20.9256973033623 21.2111117512718 
##                1                1                1                1 
## 21.2988437259828 21.4887083820786 21.6632008465181 21.8584346902023 
##                1                1                1                1 
## 21.9017081211805 22.0705004778976 22.3764963365167 22.4619460041735 
##                1                1                1                1 
## 22.6563991359074 22.9207287921778 22.9263656773065 23.0539049074043 
##                1                1                1                1 
## 23.0557994938662 23.4243509549263  23.554099918438 23.7245812810879 
##                1                1                1                1 
## 23.7401686860153 23.8808526223058 23.9241225232408 24.4052538064194 
##                1                1                1                1 
## 24.4395463953961 24.6699887395728               25 
##                1                1               44

and for the pmm solution

table(complete(imp.pmm)$tv)
## 
##   1   2   3   4   5   6   8   9  10  12  13  14  15  16  17  18  20  25 
##  73 219  99  29   6  16  26   1  37  29   3   5  47   1   1   4  88  64

It is clear that the norm solution does not give us integer data as imputations. Next, we inspect and compare the density of the incomplete and imputed data for the constrained solution.

densityplot(imp, ~tv)


A nice way of plotting the histograms of both datasets simultaneously is to first create a data frame (here named tvm) that contains the imputed data in one column and the imputation method in another column,

tv <- c(complete(imp.pmm)$tv, complete(imp)$tv)
method <- rep(c("pmm", "norm"), each = nrow(boys))
tvm <- data.frame(tv = tv, method = method)

and then plotting a histogram of tv conditional on method.

histogram( ~tv | method, data = tvm, nint = 25)
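
An alternative way to compare the two methods (a sketch using the same tvm data frame and lattice's densityplot()) is to overlay the two densities in a single panel.

# Sketch: overlay the densities of tv for both imputation methods.
densityplot(~ tv, data = tvm, groups = method, plot.points = FALSE, auto.key = TRUE)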

Is there still a difference in distribution between the two imputation methods? Which imputations do you think are more plausible?
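
As a side note (a sketch; imp.int is a name we introduce here): if integer-valued imputations for tv are preferred, the post-processing command could round the squeezed values before they are returned.

# Sketch: round the squeezed imputations for tv to whole numbers.
post["tv"] <- "imp[[j]][, i] <- round(squeeze(imp[[j]][, i], c(1, 25)))"
imp.int <- mice(boys, meth = meth, post = post, print = FALSE)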


6. Make a missing data indicator (name it miss) for bmi and check the relation between bmi, wgt and hgt in the imputed data. To do so, plot the imputed values of bmi against their respective calculated values.

miss <- is.na(imp$data$bmi)
xyplot(imp, bmi ~ I(wgt / (hgt / 100)^2),
       na.groups = miss, cex = c(0.8, 1.2), pch = c(1, 20),
       ylab = "BMI (kg/m2) Imputed", xlab = "BMI (kg/m2) Calculated")

With this plot we show that the relation between hgt, wgt and bmi is not preserved in the imputed values. In order to preserve this relation, we should use passive imputation.


7. Use passive imputation to preserve the relation between the imputed bmi, wgt and hgt by setting the imputation method for bmi to meth["bmi"]<- "~ I(wgt / (hgt / 100)^2)".

meth<- ini$meth
meth["bmi"]<- "~ I(wgt / (hgt / 100)^2)"
imp <- mice(boys, meth=meth, print=FALSE)

8. Again, plot the imputed values of bmi versus the calculated values and check whether there is convergence for bmi.

To inspect the relation:

xyplot(imp, bmi ~ I(wgt / (hgt / 100)^2), na.groups = miss,
       cex = c(1, 1), pch = c(1, 20),
       ylab = "BMI (kg/m2) Imputed", xlab = "BMI (kg/m2) Calculated")

To study convergence for bmi alone:

plot(imp, c("bmi"))

Although the relation between bmi, wgt and hgt is now preserved in the imputations, we obtain absurd imputed values, and the trace plot clearly shows convergence problems for bmi. The problem is circularity in the imputations: we used passive imputation for bmi, but bmi is also automatically used as a predictor for wgt and hgt. This can be solved by adjusting the predictor matrix.


9. Solve the problem of circularity (if you did not already do so) and plot once again the imputed values of bmi versus the calculated values.

First, we remove bmi as a predictor for hgt and wgt to remove circularity.

pred <- ini$pred
pred
##     age hgt wgt bmi hc gen phb tv reg
## age   0   1   1   1  1   1   1  1   1
## hgt   1   0   1   1  1   1   1  1   1
## wgt   1   1   0   1  1   1   1  1   1
## bmi   1   1   1   0  1   1   1  1   1
## hc    1   1   1   1  0   1   1  1   1
## gen   1   1   1   1  1   0   1  1   1
## phb   1   1   1   1  1   1   0  1   1
## tv    1   1   1   1  1   1   1  0   1
## reg   1   1   1   1  1   1   1  1   0
pred[c("hgt", "wgt"), "bmi"] <- 0
pred
##     age hgt wgt bmi hc gen phb tv reg
## age   0   1   1   1  1   1   1  1   1
## hgt   1   0   1   0  1   1   1  1   1
## wgt   1   1   0   0  1   1   1  1   1
## bmi   1   1   1   0  1   1   1  1   1
## hc    1   1   1   1  0   1   1  1   1
## gen   1   1   1   1  1   0   1  1   1
## phb   1   1   1   1  1   1   0  1   1
## tv    1   1   1   1  1   1   1  0   1
## reg   1   1   1   1  1   1   1  1   0

and we run the mice algorithm again with the new predictor matrix (we still ‘borrow’ the imputation methods object meth from before)

imp <- mice(boys, meth=meth, pred=pred, print=FALSE)

Second, we recreate the plots from Assignment 8. We start with the plot to inspect the relations in the observed and imputed data

xyplot(imp, bmi ~ I(wgt / (hgt / 100)^2), na.groups = miss,
       cex=c(1, 1), pch=c(1, 20),
       ylab="BMI (kg/m2) Imputed", xlab="BMI (kg/m2) Calculated")

and continue with the trace plot to study convergence

plot(imp, c("bmi"))

All is well now!
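
As a final numeric check (a sketch; com is a name we introduce here), the passively imputed bmi should now match the value calculated from the imputed wgt and hgt, up to floating point error.

# Sketch: compare imputed bmi to the calculated value for the incomplete cases.
com <- complete(imp)
all.equal(com$bmi[miss], (com$wgt / (com$hgt / 100)^2)[miss])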


Conclusion

We have seen that the practical execution of multiple imputation and pooling is straightforward with the R package mice. The package is designed to allow you to assess and control the imputations themselves, the convergence of the algorithm and the distributions and multivariate relations of the observed and imputed data.

It is important to ‘gain’ this control as a user. After all, we are imputing values and taking their uncertainty properly into account; remaining uncertain about the process that generated those values is therefore not a valid option.


For fun: what you shouldn’t do with passive imputation

Never define all relations as passive (fixed) transformations of each other. You will be left with the starting values and waste your computer's energy (and your own).

ini <- mice(boys, maxit=0)
meth<- ini$meth
pred <- ini$pred
pred
##     age hgt wgt bmi hc gen phb tv reg
## age   0   1   1   1  1   1   1  1   1
## hgt   1   0   1   1  1   1   1  1   1
## wgt   1   1   0   1  1   1   1  1   1
## bmi   1   1   1   0  1   1   1  1   1
## hc    1   1   1   1  0   1   1  1   1
## gen   1   1   1   1  1   0   1  1   1
## phb   1   1   1   1  1   1   0  1   1
## tv    1   1   1   1  1   1   1  0   1
## reg   1   1   1   1  1   1   1  1   0
meth["bmi"]<- "~ I(wgt/(hgt/100)^2)"
meth["wgt"]<- "~ I(bmi*(hgt/100)^2)"
meth["hgt"]<- "~ I(sqrt(wgt/bmi)*100)"
imp.path <- mice(boys, meth=meth, pred=pred, seed=123)
## 
##  iter imp variable
##   1   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
plot(imp.path, c("hgt", "wgt", "bmi"))

We named the mids-object imp.path, because the nonconvergence is pathological in this example!


- End of Vignette