
I have never had problems with R crashing before.

I am using the mice package (mice 2.13) to perform multiple imputation. The code works fine on some subsets of the data, but on other subsets R crashes (not immediately, but after some time). From the output in R just before it crashes, I believe it is using the 2l.pan method of imputation (from the pan package). I have already run update.packages().

How can I diagnose this problem?

Problem signature:
  Problem Event Name:   APPCRASH
  Application Name: Rgui.exe
  Application Version:  2.151.59607.0
  Application Timestamp:    4fe47a63
  Fault Module Name:    R.dll
  Fault Module Version: 2.151.59607.0
  Fault Module Timestamp:   4fe47a4e
  Exception Code:   c0000005
  Exception Offset: 0000000000032ec8
  OS Version:   6.1.7601.2.1.0.256.4
  Locale ID:    2057
  Additional Information 1: 7782
  Additional Information 2: 77823beb5887f451c3dd7ae4fe931995
  Additional Information 3: 4491
  Additional Information 4: 4491b41bf90894717964f5eef2cccd84

Update

I have managed to create a reproducible example, with data:

require(foreign)
require(mice)
require(pan)

dt.fail <- read.csv("http://goo.gl/pg8um")
dt.fail$X <- NULL

dt.fail$out <- as.factor(dt.fail$out)
dt.fail$grp <- as.factor(dt.fail$grp)
dt.fail$v1 <- as.factor(dt.fail$v1)
dt.fail$v2 <- as.factor(dt.fail$v2)
dt.fail$v3 <- as.factor(dt.fail$v3)
dt.fail$v7 <- as.factor(dt.fail$v7)
dt.fail$v8 <- as.factor(dt.fail$v8)
dt.fail$v9 <- as.factor(dt.fail$v9)
dt.fail$v11 <- as.factor(dt.fail$v11)
dt.fail$v12 <- as.factor(dt.fail$v12)


PredMatrix <- quickpred(dt.fail)
PredMatrix['CTP',] <- c(1,-2,0,0,0,0,0,0,0,0,1,0,1,1,0,2)



impute <- mice(
    data = dt.fail,
    m = 1,
    maxit = 1,
    imputationMethod = c(
        "logreg",   # out
        "",         # grp   ----> cluster grouping factor
        "pmm",      # v1
        "polyreg",  # v2
        "logreg",   # v3
        "pmm",      # v4
        "logreg",   # v5
        "logreg",   # v6
        "polyreg",  # v7    ----> auxiliary
        "polyreg",  # v8    ----> auxiliary
        "polyreg",  # v9    ----> auxiliary
        "polyreg",  # v10   ----> auxiliary
        "",         # v11   ----> complete
        "",         # v12   ----> complete
        "2l.pan",   # CTP   ----> multilevel imputation
        ""          # const ----> needed for multilevel imputation
    ),
    predictorMatrix = PredMatrix,
    seed = 101
)

And for completeness, here is the predictor matrix I was using:

    .     out grp v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 CTP const
out     0   0  0  0  0  0  1  1  0  0  0   0   0   0   0     0
grp     0   0  0  0  0  0  0  0  0  0  0   0   0   1   0     0
v1      0   0  0  0  0  0  0  0  0  0  0   0   1   1   0     0
v2      0   0  0  0  0  1  1  1  0  1  0   0   1   1   1     0
v3      0   0  0  0  0  1  1  1  0  1  1   0   1   1   1     0
v4      0   0  0  1  1  0  1  1  0  1  1   0   1   1   1     0
v5      1   1  0  0  0  0  0  1  0  1  0   0   1   0   0     0
v6      1   1  0  1  0  1  1  0  0  1  0   0   1   0   0     0
v7      0   0  0  0  0  0  1  1  0  1  0   0   0   1   0     0
v8      0   0  0  0  0  0  1  1  0  0  0   0   1   1   0     0
v9      0   0  0  0  1  1  1  1  0  1  0   0   1   1   1     0
v10     0   0  0  0  0  0  1  1  0  1  0   0   1   1   0     0
v11     0   0  0  0  0  0  0  0  0  0  0   0   0   0   0     0
v12     0   0  0  0  0  0  0  0  0  0  0   0   0   0   0     0
CTP     1  -2  0  0  0  0  0  0  0  0  1   0   1   1   0     2
const   0   0  0  0  0  0  0  0  0  0  0   0   0   0   0     0
Robert Long
  • The first thing I would do is see if setting `options(error=dump.frames)` gets you anything after the error. R might be able to write the callstack to file before bottoming out. – Matthew Plourde Feb 01 '13 at 14:50
  • I say 'might', but I would be pleasantly surprised if R did in fact manage to do so. Take a look at the 'Just-in-time debugging' section at http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR . The mingw debugger it mentions might be what you need. I haven't used it, but apparently it will display a stack dump at the time of a crash. – Matthew Plourde Feb 01 '13 at 15:15
  • I'd recommend contacting the authors of the package - this sort of a crash is usually an indication something is wrong with their C code – hadley Feb 01 '13 at 17:04
  • If `R` reliably (heh) crashes on certain data subsets and just as reliably doesn't crash on other subsets, then you've got a good start on describing the bug situation. See if you can write any intermediate results to a file, in order to further zoom in on the data and the specific function (or sub-function) call that's blowing up. – Carl Witthoft Feb 01 '13 at 17:32
  • @hadley thanks, I left a message for Stef Van Buuren on CrossValidated in a comment to an answer he gave to me earlier asking him to check here... – Robert Long Feb 01 '13 at 18:42
  • @CarlWitthoft Yes, it does "reliably" crash on some datasets and not on others, but so far it only crashes on quite large ones (~15000 rows with ~35 variables) so making a reproducible example is hard. – Robert Long Feb 01 '13 at 18:44

2 Answers


I occasionally have problems with the 2l methods for large data, but have never seen R itself crash on it. My guess would be that they are related to sparse data (very small clusters). How many predictors do you have relative to cluster size?

Some suggestions:

In your data, you have several covariates that have incomplete data but that are not imputed. Please check whether mice removes them before imputation by setting maxit = 0 and inspecting imp$log. If you want to use these as predictors, you should specify an imputation method for them.
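As a rough pre-check along these lines, base R can list the columns that contain missing values but have an empty imputation method. This is only a sketch: the toy data frame and method vector are made up for illustration, and with the real data you would substitute dt.fail and the method vector passed to mice.

```r
# Toy data (hypothetical) standing in for dt.fail
toy <- data.frame(
  grp = c(1, 1, NA, 2),
  v1  = c(NA, 0, 1, 1),
  CTP = c(2.5, NA, 3.1, 4.0)
)

# Imputation methods in column order; "" means "do not impute"
meth <- c(grp = "", v1 = "pmm", CTP = "2l.pan")

# Number of missing values per column
n_missing <- colSums(is.na(toy))

# Columns that are incomplete but have no imputation method assigned
not_imputed <- names(n_missing)[n_missing > 0 & meth[names(n_missing)] == ""]
print(not_imputed)
```

Any column reported here is incomplete and will stay incomplete, so it is worth deciding deliberately whether to impute it, drop it, or remove the affected rows.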

The mice package does not use any Fortran or C code of its own, but pan may (I don't know). If you are really determined to find the source of the problem, I suggest that you consult the book by Matloff, which contains a chapter on advanced debugging techniques.

The obvious other route is to try to simplify the model. Remove superfluous predictors, use a flat file (e.g. pmm) with cluster allocation as a fixed factor, and check whether the intra-class correlations of the observed and imputed data are similar.
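One rough way to run that last check is a one-way ANOVA estimate of the intra-class correlation, computed once on the observed values and once on a completed data set. The function name icc_oneway and the toy vectors below are made up for illustration, and a mixed-model estimate would be an equally reasonable choice.

```r
# One-way ANOVA estimator of the intra-class correlation (ICC).
# y: numeric outcome, g: cluster labels. Rows with NA in either are dropped.
# Assumes at least two clusters and more observations than clusters.
icc_oneway <- function(y, g) {
  keep <- !is.na(y) & !is.na(g)
  y <- y[keep]
  g <- factor(g[keep])
  n_i <- as.vector(table(g))                  # cluster sizes
  a <- nlevels(g)                             # number of clusters
  N <- length(y)                              # number of observations
  grp_means <- as.vector(tapply(y, g, mean))  # cluster means, in level order
  msb <- sum(n_i * (grp_means - mean(y))^2) / (a - 1)     # between-cluster MS
  msw <- sum((y - grp_means[as.integer(g)])^2) / (N - a)  # within-cluster MS
  n0 <- (N - sum(n_i^2) / N) / (a - 1)        # adjusted cluster size (unbalanced case)
  (msb - msw) / (msb + (n0 - 1) * msw)
}

# Toy example (hypothetical values): observed data with gaps vs. a completed copy
grp <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
ctp_observed <- c(1.0, 1.2, NA, 2.1, NA, 2.0, 3.2, 3.0, 3.1)
ctp_imputed  <- c(1.0, 1.2, 1.1, 2.1, 2.2, 2.0, 3.2, 3.0, 3.1)

icc_obs <- icc_oneway(ctp_observed, grp)
icc_imp <- icc_oneway(ctp_imputed, grp)
print(c(observed = icc_obs, imputed = icc_imp))
```

If the two values differ markedly, the imputation model is probably not preserving the cluster structure.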

The intercept term is automatically added by `mice.impute.2l.pan`, so you do not need that.

Hope this helps.

Stef van Buuren
  • Thanks! In the original data (~18000 obs) the minimum cluster size was 10 and there were 4 predictors for CTP in the predictor matrix. I am now imputing all variables and I have obtained a smaller dataset (2800 obs, 15 vars) which still causes R to crash. I have updated the question with a reproducible example. – Robert Long Feb 02 '13 at 19:33

I found what is causing the crash: there was one missing value in grp (which was not being imputed). Still, it does not seem quite right that this crashes R! After running

dt.fail <- dt.fail[!is.na(dt.fail$grp),]

it no longer crashes, but instead generates the following error:

Error in order(dfr$group) : argument 1 is not a vector

I will post a separate question about that.
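A guard like the following could catch this situation before the data reach the imputation routine. It is a minimal sketch in base R: check_cluster is a made-up helper name and the toy data frame stands in for dt.fail.

```r
# Toy data (hypothetical) standing in for dt.fail
toy <- data.frame(
  grp = factor(c("a", "a", NA, "b")),
  CTP = c(1.2, NA, 3.4, 2.2)
)

# Fail early with a readable message if the cluster variable has missing values
check_cluster <- function(data, cluster) {
  n_na <- sum(is.na(data[[cluster]]))
  if (n_na > 0) {
    stop(sprintf("'%s' has %d missing value(s); remove or impute them first",
                 cluster, n_na))
  }
  invisible(TRUE)
}

# On the raw data the check raises an ordinary R error instead of a crash
msg <- tryCatch(check_cluster(toy, "grp"), error = function(e) conditionMessage(e))
print(msg)

# After dropping rows without a cluster id, the check passes
toy_clean <- toy[!is.na(toy$grp), ]
ok <- check_cluster(toy_clean, "grp")
```

Dropping rows is only one option; whether it is appropriate depends on how many observations lack a cluster id.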

Robert Long