Error in s$it : $ operator is invalid for atomic vectors when the data or global environment contains a variable named state. The internal logging object is renamed from state to .mice.state to avoid name collisions (#527). Breaking change for extension packages: custom mice.impute.* functions that call the internal mice:::updateLog() must ensure .mice.state is visible in the call stack (it is, when called from within mice()). Direct calls to mice:::updateLog() outside of the mice() Gibbs sampler will fail to find the logging state.
Error in apply(draws, 2, sum) : dim(X) must have a positive length in mice.impute.polr(), mice.impute.lda() and mice.impute.polyreg() occurring when only one missing value was present (#684).
Error in colMeans(as.matrix(imp[[j]])) crash when data contains character variables. The chain statistics loop now skips character columns, consistent with the existing factor handling (#601)
summary.mipo() (#719)
random.effects argument to mice.impute.2l.bin() with options "laplace" (default), "eb", and "marginal", implementing the FCS-GLM distinction between sporadic and systematic missingness as described in Audigier et al. (2018). Also vectorises the imputation loop (#686)
random-effects-2l-bin discussing the three random effects strategies for two-level binary imputation
knitr::kable() and as.data.frame() on mipo objects (#733). The mipo class inherited from data.frame but is structurally a list, causing as.data.frame() to return 0 rows and breaking knitr::kable() and rmarkdown::paged_table(). Remove the spurious data.frame inheritance and add an as.data.frame.mipo() method that returns summary().
mice.impute.midastouch() producing "invalid factor level" warnings when imputing factors with non-default levels (e.g. 0/1 or labelled factors). The function converted y to numeric for internal calculations but returned integer codes instead of the original factor values, causing assignment failures in the completed data. Fix: preserve the original y and return from it (#738)
mice() when the data contain POSIXct or POSIXlt date-time columns. Such variables are stored as large numbers (~1e9–1e10) that can make the predictor matrix near-singular in norm-based methods, causing an opaque solve() crash. The warning names the offending columns and suggests converting them to Date or a standardised numeric before imputing (#746)
estimice() when the ridge-penalised solve() also fails. Previously this crashed silently; it now stops with a message explaining that extreme predictor scales (e.g. POSIX date-time columns) are the likely cause and suggests standardising or removing such variables (#746)
pool() returning dfcom = 1 for clmm models from the ordinal package, which caused incorrect degrees of freedom, p-values and confidence intervals. Root cause: clmm returns empty vectors from both stats::df.residual() and stats::residuals(), so get.dfcom() computed nobs = 0 and silently floored dfcom to 1. Fix: use stats::nobs() as the primary fallback, consistent with broom conventions (#748)
options(mice.printFlag = FALSE)
ampute(run = FALSE) (#732)
Added predict_mi() to generate predictions from models fitted on
multiply imputed datasets. The function pools predictions across
imputations using Rubin’s rules, and can return point predictions
or prediction intervals at a specified confidence level.
Typical workflow:
predict_mi() with the list of models and the corresponding new data (per imputation).
pool = TRUE) or per-imputation predictions (pool = FALSE).
This functionality makes it easier to evaluate predictive performance on test sets while correctly accounting for imputation uncertainty.
Contributed: @fdvanleeuwen, @thomvolker (#720)
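As a rough illustration of what pooling predictions with Rubin's rules involves, the base R sketch below combines per-imputation predictions by hand. It is not the predict_mi() implementation; the matrices `preds` and `se` and their values are made up for the example.

```r
# Hypothetical inputs: one row per new case, one column per imputation (m = 3).
preds <- cbind(c(10, 20), c(12, 19), c(11, 21))  # point predictions
se    <- matrix(0.5, nrow = 2, ncol = 3)         # their standard errors

m     <- ncol(preds)
qbar  <- rowMeans(preds)             # pooled point prediction
ubar  <- rowMeans(se^2)              # within-imputation variance
b     <- apply(preds, 1, var)        # between-imputation variance
total <- ubar + (1 + 1 / m) * b      # total variance (Rubin's rules)

pooled <- data.frame(fit = qbar, se = sqrt(total))
pooled
```

The between-imputation term `b` is what a single-imputation prediction would miss; it widens the pooled standard error whenever the imputations disagree.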
Adds a correction for the Barnard-Rubin degrees of freedom calculation that provides stabler results for small samples and zero within-imputation variance. Contributed: @frederikfabriciusbjerre (#726)
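For context, the classic Barnard-Rubin degrees of freedom can be written as a small base R function. This is a sketch of the textbook formula only, not the corrected calculation added in this release; the function name `barnard_rubin_df` is hypothetical.

```r
# Classic Barnard-Rubin degrees of freedom (textbook form, no correction).
# m: number of imputations, b: between variance, t: total variance,
# dfcom: complete-data degrees of freedom.
barnard_rubin_df <- function(m, b, t, dfcom = Inf) {
  lambda <- (1 + 1 / m) * b / t          # proportion of variance due to missingness
  df_old <- (m - 1) / lambda^2           # large-sample Rubin (1987) value
  if (is.infinite(dfcom)) return(df_old)
  df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  df_old * df_obs / (df_old + df_obs)
}

ubar <- 1; b <- 0.5; m <- 5
t <- ubar + (1 + 1 / m) * b
barnard_rubin_df(m, b, t, dfcom = 100)
```

With zero within-imputation variance, lambda equals 1 and the observed-data term becomes 0, which is the kind of edge case where the uncorrected formula degenerates.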
Adds fallback for lmer objects in pool() without requiring broom.mixed.
Contributed: @anya-decarlo (#728)
Explicitly load toenail data from the mice package to avoid lme4 conflict. Contributed: @bbolker (#730)
Fixed a long-standing issue in the internal augment() function that affected ordered factors (#713).
Previously, augment() would:
The old behavior could degrade imputation quality for ordinal outcomes when using the "polr" method, potentially causing model convergence issues or increased noise in imputations.
The issue did not affect methods for unordered factors ("logreg", "polyreg", "mnar.logreg"), where level order is inconsequential.
Thanks to @mmansolf for identifying the problem and suggesting a fix. The updated augment() now
correctly preserves the ordered class and level order of factor variables.
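The base R sketch below shows why preserving the ordered class and level order matters. It does not reproduce the code of augment(); it only illustrates how an ordered factor can silently lose its ordering when it is rebuilt without its original levels.

```r
f <- factor(c("low", "high", "medium", NA, "low"),
            levels = c("low", "medium", "high"), ordered = TRUE)

# Naive rebuild: levels are re-sorted alphabetically and the ordered class is lost.
naive <- factor(as.character(f))
levels(naive)      # "high" "low" "medium"
is.ordered(naive)  # FALSE

# Rebuild that keeps both the level order and the ordered class.
kept <- factor(as.character(f), levels = levels(f), ordered = is.ordered(f))
levels(kept)       # "low" "medium" "high"
is.ordered(kept)   # TRUE
```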
When passive methods are used without a user-specified visitSequence, mice now automatically moves all passive variables to the end of the visitSequence.
This change in behavior ensures greater consistency at the end of each iteration.
The new behavior works well for simple cases. However, for more complex situations — especially when passive variables depend on other passive variables — it is recommended to manually specify a visitSequence that updates each passive variable immediately after one of its right-hand side predictors changes. (#699)
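A minimal sketch of a manually specified visitSequence for a passive variable is shown below. It assumes the mice package is installed and uses a four-column subset of the boys data; the iteration settings are arbitrary and chosen only to keep the example fast.

```r
if (requireNamespace("mice", quietly = TRUE)) {
  dat <- mice::boys[, c("age", "hgt", "wgt", "bmi")]

  # bmi is derived passively from the current hgt and wgt values.
  meth <- mice::make.method(dat)
  meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"

  # Do not use bmi to predict its own components.
  pred <- mice::make.predictorMatrix(dat)
  pred[c("hgt", "wgt"), "bmi"] <- 0

  # Update bmi right after both of its right-hand side variables.
  imp <- mice::mice(dat, method = meth, predictorMatrix = pred,
                    visitSequence = c("hgt", "wgt", "bmi"),
                    m = 2, maxit = 2, seed = 1, printFlag = FALSE)

  # Check the deterministic relation in the rows where bmi was imputed.
  done <- mice::complete(imp, 1)
  was_missing <- is.na(dat$bmi)
  max_gap <- max(abs(done$bmi - done$wgt / (done$hgt / 100)^2)[was_missing])
}
```

When one passive variable feeds another, the same idea applies: list each derived variable directly after the last of its inputs.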
Adds the calltype argument to mice() for mixing predictorMatrix and formulas specifications per variable-block. The calltype argument allows the user to specify some variables (or blocks of variables) by the formulas argument, and other variables by the predictorMatrix argument. (Note: This argument was called modeltype in version 3.17.1).
calltype is a character vector of length(blocks) elements that indicates how the imputation model is specified. Each entry takes one of two values: "pred" or "formula". If calltype = "pred", the predictors of the imputation model for the block are specified by the corresponding row of the predictorMatrix. If calltype = "formula", the imputation model is specified by the relevant entry in formulas. The default depends on the presence of the formulas argument. If formulas is present, then mice() sets calltype = "formula" for any block for which a formula is specified. Otherwise, calltype = "pred".
Introduces an optimized matchindex C++ function to improve speed of predictive mean matching (#695)
dawidd6/action-download-artifact@v9
pool.r.squared() (#700)
lasso.select.norm() and lasso.norm() into one file test-mice.impute.lasso.norm.R
lasso.select.logreg() and lasso.logreg() into one file test-mice.impute.lasso.logreg.R
mice 3.17.0 - with the dfcom argument of pool(..., dfcom = .., ) (#689, #706, #707)
method and formulas (#698)
Imputing categorical data by predictive mean matching. Predictive mean matching (PMM) is the default method of mice() for imputing numerical variables, but it has long been possible to impute factors. This enhancement introduces better support to work with categorical variables in PMM. The former system translated factors into integers by ynum <- as.integer(f). However, the order of integers in ynum may have no sensible interpretation for an unordered factor. The new system quantifies ynum and could yield better results because of higher $R^2$. The method calculates the canonical correlation between y (as dummy matrix) and a linear combination of imputation model predictors x. The algorithm then replaces each category of y by a single number taken from the first canonical variate. After this step, the imputation model is fitted, and the predicted values from that model are extracted to function as the similarity measure for the matching step.
The method works for both ordered and unordered factors. No special precautions are taken to ensure monotonicity between the category numbers and the quantifications, so the method should be able to preserve quadratic and other non-monotone relations of the predicted metric. It may be beneficial to remove very sparsely filled categories, for which there is a new trim argument. To use the new technique, specify mice(..., method = "pmm", ...). Both numerical and categorical variables will then be imputed by PMM.
Potential advantages are:
Note that we still lack solid evidence for these claims. (#576). Contributed @stefvanbuuren
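The quantification step described above can be sketched in base R with stats::cancor(). This is an illustration under simplifying assumptions (complete simulated data, no trimming of sparse categories), not the code used inside mice; all variable names are made up.

```r
set.seed(42)
n <- 300
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))
latent <- x[, "x1"] - 0.5 * x[, "x2"] + rnorm(n, sd = 0.5)
f <- cut(latent, breaks = quantile(latent, c(0, 1/3, 2/3, 1)),
         labels = c("a", "b", "c"), include.lowest = TRUE)

# Dummy-code the factor, dropping the reference level to avoid collinearity.
ydum <- model.matrix(~ f)[, -1, drop = FALSE]

# Canonical correlation between the dummies and the predictors.
cc <- cancor(x, ydum)

# First canonical variate of y: one number per category.
ynum  <- drop(sweep(ydum, 2, cc$ycenter) %*% cc$ycoef[, 1])
quant <- tapply(ynum, f, mean)
quant
```

Because every row within a category shares the same dummy pattern, `ynum` takes exactly one value per category; these values replace the arbitrary integer codes from as.integer(f) before the imputation model is fitted.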
New system-independent method for pooling: This version introduces a new function pool.table() that takes a tidy table of parameter estimates stemming from m repeated analyses. The input data must consist of three columns (parameter name, estimate, standard error) and a specification of the degrees of freedom of the model fitted to the complete data. The pool.table() function outputs 14 pooled statistics in a tidy form. The primary use of pool.table() is to support parameter pooling for techniques that have no tidy() or glance() methods, either within R or outside R. The pool.table() function also allows for novel workflows that 1) break apart the traditional pool() function into a data-wrangling part and a parameters-reducing part, and 2) do not necessarily depend on classed R objects. (#574). Contributed @stefvanbuuren
literanger: Adds support for the literanger package for rf imputation that is about twice as fast as ranger (#648). Thanks @stephematician for the contribution.
The complete(..., action = "long", ...) command puts the columns named ".imp" and ".id" in the last two positions of the long data (instead of first two positions). In this way, the columns of the imputed data will have the same positions as in the original data, which is more user-friendly and easier to work with. Note that any existing code that assumes that variables ".imp" and ".id" are in columns 1 and 2 will need to be modified. The advice is to modify the code using the variable names ".imp" and ".id". If you want the old behaviour, specify the argument order = "first". (#569). Contributed @stefvanbuuren
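As the advice above suggests, code that refers to ".imp" and ".id" by name works regardless of where the columns sit. The sketch below assumes the mice package is installed; the settings m = 2 and maxit = 1 are arbitrary.

```r
if (requireNamespace("mice", quietly = TRUE)) {
  imp  <- mice::mice(mice::nhanes, m = 2, maxit = 1,
                     printFlag = FALSE, seed = 1)
  long <- mice::complete(imp, action = "long")

  # Select by name instead of by position (columns 1 and 2).
  data_cols <- setdiff(names(long), c(".imp", ".id"))
  by_imp <- split(long[, data_cols], long$.imp)
}
```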
Drops support for S4. Convert S4-related code to S3. Syntax as(df, "mids") is deprecated. Use as.mids(df) instead.
dots argument to ranger::ranger(...) in mice.impute.rf() (#563). Contributed @edbonneville
blocks argument at various places
blocks in initialize_chain()
rbind(), when formulas are concatenated and duplicate names are found, also rename the duplicated variables in formulas by their new name
NEWS.md formatting to get correct version sequence on CRAN and in-package NEWS
make.method() in a more efficient way (resolves #672)
as.mids() from filling the imp object for complete variables
mids, mads, mira and mipo objects
complete() that auto-repeated imputed values into cells that should NOT be imputed (occurred as a special case of rbind(), where the first set of rows was imputed and the second was not).
type by the more informative pred (currently active row of predictorMatrix)
filter.mids() that incorrectly removed empty components in the imp object
ibind() that incorrectly used length(blocks) as the first dimension of the chainMean and chainVar objects
visitSequence, chainMean and chainVar components of the mids object
minpuc argument in quickpred() (#634)
coef() not available on S4 object when using with lavaan (#615, #616)
.github/dependabot.yml configuration to automate daily check (#598)
roxygen2 7.3.1 requirements
Rprofile prints to stdout on Fedora, R version 4.1.3 (#646, #647). Thanks @brookslogan for the fix.
methods and rlang from Depends
ampute() helpers
\link statements that do not pass CRAN checks
Expands futuremice() functionality by allowing for external packages and user-written functions (#550). Contributed @thomvolker
Adds GH issue templates bug_report, feature_request and help_wanted (#560). Contributed @hanneoberman
rbind.mids() and cbind.mids() to conform to CRAN policy
mitml and glmnet to imports so that test code conforms to _R_CHECK_DEPENDS_ONLY=true flag in R CMD check
futuremice() if there is no .Random.seed yet.
predictorMatrix for case F by adding a predictorMatrix argument to make.predictorMatrix()
mice.impute.mpmm() example code
mice.impute.2lonly.pmm() (#555)
tidy(), update(), format() and sum()
R CMD check with _R_CHECK_DEPENDS_ONLY=true
futuremice() that throws an error when the number of cores is not specified, but the number of available cores is greater than the number of imputations.
mice.impute.mpmm() that changed the column order of the data
Adds a function futuremice() with support for parallel imputation using the future package (#504). Contributed @thomvolker, @gerkovink
Adds multivariate predictive mean matching mice.impute.mpmm(). (#460). Contributed @Mingyang-Cai
Adds convergence() for convergence evaluation (#484). Contributed @hanneoberman
Reverts the internal seed behaviour back to mice 3.13.10 (#515). #432 introduced new local seed in response to #426. However, various issues arose with this facility (#459, #492, #502, #505). This version restores the old behaviour using global .Random.seed. Contributed @gerkovink
Adds a custom.t argument to pool() that allows the advanced user to specify a custom rule for calculating the total variance $T$. Contributed @gerkovink
Adds new argument exclude to mice.impute.pmm() that excludes a user-specified vector of values from matching. Excluded values will not appear in the imputations. Since the observed values are not imputed, the user-specified values are still being used to fit the imputation model (#392, #519). Contributed @gerkovink
.R and .Rmd filessampler.R (#511)inherits() to check on class membershipparlmice()prop, patterns and weights matrices for pattern with only 1'sD1() and D2() (#420)mice()make.where()test-mice.impute.rf.R(#448).Random.seed reads from the .GlobalEnv by get(".Random.seed", envir = globalenv(), mode = "integer", inherits = FALSE)lastSeedValue variable namex$lastSeedValue problem in cbind.mids() (#502)ampute()mice() by smarter random seed initialisation (#459)drop = FALSE buglet in mice.impute.rf() (#447, #448)withr package should have version 2.4.0 (published in January 2021) or higher. Versions withr 2.3.0 and before may give Error: object 'local_seed' is not exported by 'namespace:withr'. Either update manually, or install the patched version mice 3.14.1 from GitHub. (#445). NOTE: withr is no longer needed in mice 3.15.0Adds four new univariate functions using the lasso for automatic variable selection. Contributed by @EdoardoCostantini (#438).
mice.impute.lasso.norm() for lasso linear regression
mice.impute.lasso.logreg() for lasso logistic regression
mice.impute.lasso.select.norm() for lasso selector + linear regression
mice.impute.lasso.select.logreg() for lasso selector + logistic regression
Adds Jamshidian and Jalal's non-parametric MCAR test, mice::MCAR() and associated plot method. Contributed by @cjvanlissa (#423).
Adds two new functions pool.syn() and pool.scalar.syn() that specialise pooling estimates from synthetic data. The "reiter2003" pooling rule assumes that synthetic data were created from complete data. Thanks Thom Volker (#436).
By default, mice.impute.rf() now uses the faster ranger package as back-end instead of the randomForest package. If you want the old behaviour, specify the rfPackage = "randomForest" argument to the mice(...) call. Contributed @prockenschaub (#431).
.Random.seed (#426, #432) by implementing withr::local_preserve_seed() and withr::local_seed(). This change provides stabler behavior in complex scripts. The change does not appear to break reproducibility when mice() was run with a seed. Nevertheless, if you run into a reproducibility problem, install mice 3.13.12 or before.mice.impute.quadratic(), adds a parameter quad.outcome containing the name of the outcome variable in the complete-data model. Contributed @Mingyang-Cai, @gerkovink (#408)pool() so that it processes the parameters from all gamlss sub-models. Thanks Marcio Augusto Diniz (#406, #405)pool() can extract robust.se from the object returned by broom::tidy() (#310)pool() cannot take a mids object (#433)mice.impute.2l.lmer() to indicate a problem in fitting the imputation model (#385)post parameter (#326)install.on.demand() broke the standard CRAN workflow. mice 3.14.0 does not call install.on.demand() anymore for recommended packages. Also, install.on.demand() will not run anymore in non-interactive mode.mice:::barnard.rubin() function for infinite dfcom. Thanks @huftis (#441).Xi <- as.matrix(...) in mice.impute.2l.lmer() that occurred when a cluster contains only one observation (#384)predictorMatrix to a monotone pattern if visitSequence = "monotone" and maxit = 1 (#316)md.pattern() (#318, #323)make.formulas() (#305, #324)newdata in mice.mids() (#313, #325)where element created in rbind() (#319)mids2spss() replaces the foreign by haven package. Contributed Gerko Vink (#291)tests\testhat\test-D1.R that failed on mitml 0.4-0with.mids() function to old version because the change in commit 4634094 broke downstream package metafor (#292)mice.impute.rf() in finding candidate donors (#288, #289)Much faster predictive mean matching. The new matchindex C function makes predictive mean matching 50 to 600 times faster.
The speed of pmm is now on par with normal imputation (mice.impute.norm())
and with the miceFast package, without compromising on the statistical quality of
the imputations. Thanks to Polkas https://github.com/Polkas/miceFast/issues/10 and
suggestions by Alexander Robitzsch. See #236 for more details.
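To make the matching step concrete, here is a naive base R matcher. It is a sketch of the general idea behind predictive mean matching (find the closest predicted values among the observed cases, then draw one donor), not the matchindex algorithm itself; the function name `pmm_match` and its arguments are hypothetical.

```r
# For each missing case, take the `donors` observed cases whose predicted
# values are closest to the target prediction, and draw one observed y at random.
pmm_match <- function(yhat_obs, yhat_mis, y_obs, donors = 5L) {
  vapply(yhat_mis, function(target) {
    d <- abs(yhat_obs - target)
    candidates <- order(d)[seq_len(min(donors, length(d)))]
    y_obs[candidates[sample.int(length(candidates), 1L)]]
  }, numeric(1))
}

set.seed(1)
y_obs    <- c(10, 20, 30, 40)
yhat_obs <- c(1, 2, 3, 4)
pmm_match(yhat_obs, yhat_mis = c(1.1, 3.9), y_obs, donors = 2L)
```

This version scans all observed cases for every missing case, which is exactly the kind of cost a sorted, compiled matcher avoids.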
New ignore argument to mice(). This argument is a logical vector
of nrow(data) elements indicating which rows are ignored when creating
the imputation model. We may use the ignore argument to split the data
into a training set (on which the imputation model is built) and a test
set (that does not influence the imputation model estimates). The argument
is based on the suggestion in
https://github.com/amices/mice/issues/32#issuecomment-355600365. See #32 for
more background and techniques. Crafted by Patrick Rockenschaub
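A minimal sketch of the train/test use of ignore follows. It assumes the mice package is installed; which rows form the test set, and the values of m and maxit, are arbitrary choices for the example.

```r
if (requireNamespace("mice", quietly = TRUE)) {
  dat <- mice::nhanes

  # Last 5 rows are the test set: they get imputed, but they do not
  # contribute to estimating the imputation models.
  ignore <- c(rep(FALSE, nrow(dat) - 5), rep(TRUE, 5))

  imp <- mice::mice(dat, ignore = ignore, m = 2, maxit = 2,
                    seed = 1, printFlag = FALSE)
  completed <- mice::complete(imp, 1)
}
```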
New filter() function for mids objects. New filter() method that
subsets a mids object (multiply-imputed data set).
The method accepts a logical vector of length nrow(data), or an expression
to construct such a vector from the incomplete data. (#269).
Crafted by Patrick Rockenschaub.
Breaking change: The matcher algorithm in pmm has changed to matchindex
for speed improvements. If you want the old behavior, specify mice(..., use.matcher = TRUE).
cpp11 package (#286)with.mids() by calling eval_tidy() on a quosure. Does not yet solve #265.pool() and pool.scalar() (#142, #106, #190 and others)tidy.mipo more flexible (#276)nelsonaalen() gets a tibble (#272)NAs can appear in the imputed data (#267)quickpred() documentation (#268)sum.scores()lm.mids(), glm.mids(), pool.compare().pmm.match() and expandcov()return() calls placed just before end-of-functionprintFlag value (#258)amicesdf.residual, which caused problematic behavior in the D1(), D2(), D3(), anova() and pool(). mice now extracts the relevant information from other parts of the objects returned by survival::coxph(), which solves long-standing issues with the integration of the Cox model (#246).Rccp dependency to work with tidyr 1.1.1 (#248).Non-file package-anchored link(s) in documentation object.ampute documentation (#251).suggests.tidy.mipo() and glance.mipo() return standardized output that conforms to broom specifications. Kindly contributed by Vincent Arel Bundock (#240).D3 testing script that produced an error on CRAN (#244).The D3() function in mice gave incorrect results. This version solves a problem in the calculation of the D3-statistic. See #226 and #228 for more details. The documentation explains why results from mice::D3() and mitml::testModels() may differ.
The pool() function is now more forgiving when there is no glance() function (#233)
It is possible to bypass remove.lindep() by setting eps = 0 (#225)
plot.mids() documentation
This version adds two new NARFCS methods for imputing data under the Missing Not at Random (MNAR) assumption. NARFCS is a generalised version of the so-called $\delta$-adjustment method. Margarita Moreno-Betancur and Ian White kindly contributed the functions mice.impute.mnar.norm() and mice.impute.mnar.logreg(). These functions aid in performing sensitivity analysis to investigate the impact of different MNAR assumptions on the conclusion of the study. An alternative for MNAR is the older mice.impute.ri() function.
Installation of mice is faster. External packages needed for imputation and analyses are now installed on demand. The number of dependencies as estimated by rsconnect::appDependencies() decreased from 132 to 83.
The name clash with the complete() function of tidyr should no longer be a problem.
There is now a more flexible pool() function that integrates better with the broom and broom.mixed packages.
pool.compare(). Use D1() instead (#220)utils::globalVariables()tidyr by defining complete.mids() as an S3 method for the tidyr::complete() generic (#212)pool() function to deal with multiple sets of parameters. Currently supported keywords are: term (all broom functions), component (some broom.mixed functions) and y.values (for multinom() model) (#219)install.on.demand() function for lighter installationtoenail2 and remove dependency on HSAUR3ampute in extreme cases (#216)pool with mgcv::gam (#218).gitattributes for consistent line endingspolr() always fail (#206)data.frame (#208)mira-class documentation (#207)CALIBERrfimpute2lonly.norm and 2lonly.pmma2 to elementwise division by a matrix of observations2lonly.norm and 2lonly.pmm2lonly.pmm2lonly.mean now also works with factorsimputationMethod argument in examples by methodcheck.predictorMatrix() (#191)toenail data from orphaned DPpackage packageDPpackage from Suggests field in DESCRIPTIONmd.pattern() (#170, #177)as.mids() (#173)mice.impute.xxx() so that mice::mice() works as expected (#55)mids2spss(), thanks Edgar Schoreit (#149)predictorMatrix.mice 3.3.1 will impute
those variables using the intercept onlynelsonaalen() function for data where variables
time or status have already been defined (#140), thanks matthieu-faronmice 3.0.0 - mice 3.2.0 under
passive imputation.
broom 0.5.0 (#128)
mice.impute.2l.norm() (#129)
mice.impute.2l.norm() (#129)
D1() (#128)
md.pattern (#126)
rbind and cbind (#114)
rbind problem when method is a list (#113)
parlmice (#109)
dfcom argument to pool() (#105, #110)
parlmice + bugfix (#107)
parlmice (#104)
flux (#102)
estimice (#101)
parent.frame (#98)
NEWS.md, index.Rmd and online package documentation
.R instead of .r
updateLog (#8, @alexanderrobitzsch)
md.pattern (#90)
m (#89)
Version 3.0 represents a major update that implements the following features:
blocks: The main algorithm iterates over blocks. A block is
simply a collection of variables. In the common MICE algorithm each
block was equivalent to one variable, which - of course - is
the default; The blocks argument allows mixing univariate
imputation methods with multivariate imputation methods. The blocks
feature bridges two seemingly disparate approaches, joint modeling
and fully conditional specification, into one framework;
where: The where argument is a logical matrix of the same size
as data that specifies which cells should be imputed. This opens
up some new analytic possibilities;
Multivariate tests: There are new functions D1(), D2(), D3()
and anova() that perform multivariate parameter tests on the
repeated analyses of multiply-imputed data;
formulas: The old form argument has been redesigned and is now
renamed to formulas. This provides an alternative way to specify
imputation models that exploits the full power of R's native
formulas.
Better integration with the tidyverse framework, especially
for packages dplyr, tibble and broom;
Improved numerical algorithms for low-level imputation functions. Better handling of duplicate variables.
Last but not least: A brand new edition AND online version of Flexible Imputation of Missing Data. Second Edition.
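To illustrate the where argument from the feature list above, the base R sketch below builds a logical matrix of the same size as the data. The data frame `dat` is made up; passing the result to mice() is shown only as a comment because it requires the package.

```r
dat <- data.frame(age = c(1, 2, NA, 3),
                  bmi = c(NA, 22.1, 25.3, NA))

# Start from the missing-data pattern: TRUE marks cells to impute.
where <- is.na(dat)

# Leave the missing age value unimputed.
where[, "age"] <- FALSE
where

# Assumed usage with the package attached:
# imp <- mice(dat, where = where)
```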
mids object in mice (thanks stephematician) (#61)rbind.mids (thanks stephematician) (#59)pool.compare() in handling factors (#60)rbind.mids in handling where (#59)as.mids(), add as()cart not accepting a matrix (thanks Joerg Drechsler)pool() to list of modelsampute function and vignettes (Rianne Schouten)mice.impute.2l.sys to mice.impute.2l.lmerwhereargument to micewy argument to imputation functionsmice.impute.2l.sys(), author Shahab Jolanicbind() functionmids objectlattice packagexyplot.madsmice.impute.2lonly.pmm()ampute() by Rianne Schoutenmice function (thanks Ben Ogorek)cbind.mids() replaced by calls to cbind()miceVignettes on github (thanks Gerko Vink)README for GitHubccn --> ncc, icn --> niccc(), ncc(), cci(), ic(), nic() and ici() use S3 dispatchmultinom MaxNWts type fix in polyreg and polr #9pool.compare #12as.mids if names not same as all columns #11glmer models #5midastouch: predictive mean matching for small samples (thanks Philip Gaffert, Florian Meinfelder)rpart callridge to 2l.norm().o filesas.mids() bug that crashed miceadds::mice.1chain()impute.polyreg() bug that bombed if there were no predictors (thanks Jan Graffelman)as.mids() bug that gave incorrect $m$ (several users)pool.compare() error for lmer object (thanks Claudio Bustos)mice.impute.2l.norm() if just one NA (thanks Jeroen Hoogland)pool.scalar() now can do Barnard-Rubin adjustmentpool() now handles class lmerMod from the lme4 package.pmm.match() for safetymice.impute.pmm() for increased visibilitymice.impute.rf() from 100 to 10 (thanks Anoop Shah)long2mids() deprecated. Use as.mids() insteadlattice back into DEPENDS to find generic xyplot() and friends2lonly.pmm (thanks Alexander Robitzsch, Gerko Vink, Judith Godin)as.mids() (thanks Tommy Nyberg, Gerko Vink)mdc() in example mice.impute.quadratic()mice.impute.rf() if just one NA (thanks Anoop Shah)summary.mipo() when names(x$qbar) equals NULL (thanks Aiko Kuhn)ncol() in mice.impute.2lonly.mean()