Refers to a general class of methods used to "fill in" missing data. These methods are typically related to interpolation (http://en.wikipedia.org/wiki/Interpolation) and require assumptions about why the data is missing (e.g. "missing at random").
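As a rough illustration of the interpolation-style "fill in" idea, here is a minimal Python sketch. It assumes an evenly spaced numeric series with gaps marked as `None`, and it assumes the gaps are unrelated to the values themselves. The function name `interpolate_missing` and the rule for edge gaps are hypothetical choices made for this example, not a standard method.

```python
from typing import List, Optional


def interpolate_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between observed neighbors.

    Leading and trailing gaps are filled with the nearest observed value
    (an assumption of this sketch, not a general rule).
    """
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("need at least one observed value")
    filled = list(values)
    # Edge gaps: carry the nearest observed value outward.
    for i in range(observed[0]):
        filled[i] = values[observed[0]]
    for i in range(observed[-1] + 1, len(values)):
        filled[i] = values[observed[-1]]
    # Interior gaps: straight line between the bracketing observations.
    for left, right in zip(observed, observed[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


series = [1.0, None, None, 4.0, None, 6.0, None]
print(interpolate_missing(series))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0]
```

A single fill like this treats the imputed values as if they had been observed, so it understates uncertainty; multiple imputation is one way to account for that.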
Questions tagged [data-imputation]
596 questions
30
votes
3 answers
R caret and NAs
I very much prefer caret for its parameter tuning ability and uniform interface, but I have observed that it always requires complete datasets (i.e. without NAs) even if the applied "naked" model allows NAs. That is very bothersome, regarding that…

Fredrik
- 671
- 1
- 5
- 8
28
votes
5 answers
Imputation of missing values for PCA
I used the prcomp() function to perform a PCA (principal component analysis) in R. However, there's a bug in that function such that the na.action parameter does not work. I asked for help on stackoverflow; two users there offered two different ways…

user969113
- 611
- 1
- 5
- 8
24
votes
6 answers
What are the disadvantages of using mean for missing values?
I have an assignment (Data Mining course) and there is a part which asks: "What are the disadvantages of using mean for missing values?" in Missing Value section.
So I searched a little bit and the most common answer was: "Because it reduces the…

ali
- 241
- 2
- 3
21
votes
3 answers
How to combine confidence intervals for a variance component of a mixed-effects model when using multiple imputation
The logic of multiple imputation (MI) is to impute the missing values not once but several (typically M=5) times, resulting in M completed datasets. The M completed datasets are then analyzed with complete-data methods upon which the M estimates and…

Rok
- 331
- 2
- 5
20
votes
1 answer
XGBoost can handle missing data in the forecasting phase
Recently I have reviewed the XGBoost algorithm and I have noticed that this algorithm can handle missing data (without requiring imputation) in the training phase. I was wondering if XGBoost can handle missing data (without requiring imputation) when it…

Ricardo UES
- 461
- 1
- 3
- 8
18
votes
3 answers
Methods to work around the problem of missing data in machine learning
Virtually any database we want to make predictions from using machine learning algorithms will have missing values for some of the characteristics.
There are several approaches to address this problem, to exclude lines that have missing values until…

sn3fru
- 165
- 2
- 14
18
votes
5 answers
A 6th response option ("I don't know") was added to a 5-point Likert scale. Is the data lost?
I need a little bit of help salvaging the data from a questionnaire.
One of my colleagues applied a questionnaire, but inadvertently, instead of using the original 5-point Likert scale (strongly disagree to strongly agree), he inserted a 6th answer…

streamline
- 199
- 1
- 7
16
votes
5 answers
KNN imputation R packages
I am looking for a KNN imputation package. I have been looking at the imputation package (http://cran.r-project.org/web/packages/imputation/imputation.pdf) but for some reason the KNN impute function (even when following the example from the…

Wouter
- 2,102
- 3
- 17
- 26
16
votes
2 answers
How to fill in missing data in time series?
I have a large set of pollution data that has been recorded every 10 minutes over the course of 2 years; however, there are a number of gaps in the data (including some that go on for a few weeks at a time).
The data does seem to be quite seasonal and…

Jamesm131
- 163
- 1
- 1
- 7
16
votes
1 answer
How do the number of imputations & the maximum iterations affect accuracy in multiple imputation?
The help page for MICE defines the function as:
mice(data, m = 5, method = vector("character", length = ncol(data)),
predictorMatrix = (1 - diag(1, ncol(data))),
visitSequence = (1:ncol(data))[apply(is.na(data), 2, any)],
form =…

119631
- 335
- 1
- 2
- 11
15
votes
1 answer
Pooling calibration plots after multiple imputation
I would like advice on pooling the calibration plots/statistics after multiple imputation. In the setting of developing statistical models in order to predict a future event (e.g. using data from hospital records to predict post hospital discharge…

IWS
- 2,554
- 13
- 30
14
votes
2 answers
Using Kalman filters to impute Missing Values in Time Series
I am interested in how Kalman Filters can be used to impute missing values in Time Series Data. Is it also applicable if some consecutive time points are missing? I cannot find much on this topic. Any explanations, comments and links are welcome and…

GS9
- 233
- 1
- 3
- 7
14
votes
2 answers
Imputation of missing data before or after centering and scaling?
I want to impute missing values of a dataset for machine learning (knn imputation). Is it better to scale and center the data before the imputation or afterwards?
Since the scaling and centering might rely on min and max values, in the first case…

aldorado
- 283
- 3
- 9
14
votes
4 answers
How to handle with missing values in order to prepare data for feature selection with LASSO?
My situation:
small sample size: 116
binary outcome variable
long list of explanatory variables: 44
explanatory variables did not come from the top of my head; their choice was based on the literature.
most cases in the sample and most variables…

Puzzled
- 365
- 1
- 2
- 15
13
votes
2 answers
using neighbor information in imputing data or find off-data (in R)
I have a dataset with the assumption that nearest neighbors are the best predictors. Just a perfect example of a two-way gradient visualized-
Suppose we have a case where a few values are missing; we can easily predict based on neighbors and trend.…

rdorlearn
- 3,493
- 6
- 26
- 29