
I am trying to compare imputation methods for a data set of 81 samples x 407 variables with ~17% missing values. Some of the variables are correlated, some highly; that is the nature of the data. I have already filtered variables out of the data set according to their variance in quality control samples and the number of missing values within treatment groups. None of the variables are categorical; all are numeric or integer. My script is:

library(mice)
imp <- mice(data, m = 5, method = "pmm")

And the error is:

Error in solve.default(xtx + diag(pen)) : 
  system is computationally singular: reciprocal condition number = 1.48341e-18

Other imputation methods work just fine, e.g. median, k-nearest neighbors and random forest imputation. Why does mice fail?

Emma
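A note on what the error means: the call `solve(xtx + diag(pen))` appears to come from the linear-regression step inside the imputation model, where X'X (plus a small ridge penalty `pen`) has to be inverted. That matrix becomes numerically singular when some predictors are (near) linear combinations of others, or when there are more predictors than observed rows. By default mice uses every other variable as a predictor, so with 81 rows and 407 variables each model gets up to 406 predictors but at most 81 rows. Median, kNN and random forest imputation do not fit such a regression, which is presumably why they run without this error. The base R sketch below uses simulated data (not the poster's) to show both situations and uses a pivoted QR decomposition to find the redundant column.

```r
# Base R sketch (no mice needed): why X'X can be singular in this setting.
set.seed(1)
n <- 81

# Case 1: an exact linear combination among predictors.
a <- rnorm(n)
b <- rnorm(n)
X1 <- cbind(a = a, b = b, ab = a + b)   # column "ab" is a + b

# solve() on X'X typically stops with a singularity error here;
# tryCatch() returns the message instead of aborting the script.
msg <- tryCatch({ solve(crossprod(X1)); "solved" },
                error = function(e) conditionMessage(e))
print(msg)

# A pivoted QR decomposition reports the rank and moves the dependent
# column to the end of the pivot order.
q1 <- qr(X1)
print(q1$rank)                                      # 2 of 3 columns are independent
dropped <- colnames(X1)[q1$pivot[-seq_len(q1$rank)]]
print(dropped)                                      # "ab"

# Case 2: more predictors than rows (here 120 predictors, 81 rows).
X2 <- matrix(rnorm(n * 120), nrow = n)
print(qr(X2)$rank)                                  # cannot exceed the 81 rows
```

The same `qr()` check can be run on the complete cases of a real data set to list columns that are exact or near-exact linear combinations of others; whether to drop them is a judgment call, since highly correlated variables may be of interest in their own right.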
  • Presumably you are trying to invert a non-invertible matrix. Possibly one of your variables is a linear combination of your other variables – Henry May 24 '16 at 06:56
  • Thanks @Henry, the problem very likely lies with the collinearity, but the highly correlated variables are of interest to me. Does this mean the other imputation methods I tried would produce low-quality data, or is it mainly a problem with predictive mean matching? – Emma May 24 '16 at 20:07
  • You might want to read https://www.jstatsoft.org/article/view/v045i03/v45i03.pdf, especially pages 22, 26 and 42, which mention collinearity – Henry May 24 '16 at 20:19
  • https://www.kaggle.com/c/house-prices-advanced-regression-techniques/discussion/24586 will give you the answer to the problem – user2279481 Mar 27 '18 at 14:25
  • See both answers at https://stats.stackexchange.com/questions/76488/error-system-is-computationally-singular-when-running-a-glm – rolando2 Mar 27 '18 at 20:32
  • mice(mydata[, -1], m = 3, seed = 123, maxit = 500, method = 'cart') – Nikhil Kumar Oct 16 '18 at 08:08
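Two common workarounds can be sketched with the small `nhanes` data set that ships with mice, used here only as a stand-in for the poster's data. The first restricts each imputation model to a few relevant predictors with `quickpred()` and passes the result as `predictorMatrix`. The second switches to a tree-based method, `method = "cart"`, as in the last comment. The `mincor` and `minpuc` thresholds and the `include = "age"` choice are illustrative assumptions; with 407 variables they would need tuning so that each model keeps well under 81 predictors.

```r
library(mice)

# Option 1: limit the predictors of each variable instead of using all others.
# quickpred() keeps a predictor when its absolute correlation with the target
# (or with the target's missingness indicator) exceeds mincor and its
# proportion of usable cases is at least minpuc; include forces "age" in.
pred <- quickpred(nhanes, mincor = 0.1, minpuc = 0.25, include = "age")
print(rowSums(pred))   # number of predictors per variable

imp_pmm <- mice(nhanes, m = 5, method = "pmm", predictorMatrix = pred,
                printFlag = FALSE, seed = 123)

# Option 2: classification and regression trees, which do not invert X'X.
imp_cart <- mice(nhanes, m = 5, method = "cart",
                 printFlag = FALSE, seed = 123)

# First completed data set from the pmm run.
print(head(mice::complete(imp_pmm, 1)))
```

Restricting predictors keeps predictive mean matching and only shrinks each regression, whereas CART changes the imputation model itself, so the two options are not interchangeable when comparing methods.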
