Questions tagged [missing-data]

For questions about data that contain gaps (missing information), i.e., data that are not complete. It is important to account for this when performing an analysis or test.

In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data.

Tag wiki reference: Wikipedia

1427 questions

7 answers

Why doesn't Random Forest handle missing values in predictors?

What are the theoretical reasons for not handling missing values? Gradient boosting machines and regression trees handle missing values. Why doesn't Random Forest do that?

random-forest missing-data boosting

asked May 16 '14 at 13:08

Fedorenko Kristina

3 answers

How does R handle missing values in lm?

I'd like to regress a vector B against each of the columns in a matrix A. This is trivial if there are no missing data, but if matrix A contains missing values, then my regression against A is constrained to include only rows where all values are…

r missing-data linear-model

asked May 19 '11 at 21:03

David Quigley
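
A minimal sketch of the situation described in the question above, written in Python rather than R and not taken from any of the answers: each column is regressed on separately, and only the rows missing in that particular column (or in the response) are dropped. R's `lm` by default drops incomplete rows through its `na.action` setting; the helper names `simple_ols` and `regress_on_each_column` are hypothetical and introduced only for illustration.

```python
import math

def simple_ols(x, y):
    """Closed-form intercept and slope for y ~ a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def regress_on_each_column(columns, b):
    """Fit b against each column separately, using only the rows
    where both that column and b are observed."""
    fits = []
    for col in columns:
        pairs = [(x, y) for x, y in zip(col, b)
                 if not (math.isnan(x) or math.isnan(y))]
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        intercept, slope = simple_ols(xs, ys)
        fits.append((len(pairs), intercept, slope))
    return fits

nan = float("nan")
columns = [[1.0, 2.0, nan, 4.0],   # column 1 of a toy matrix A
           [nan, 1.0, 2.0, 3.0]]   # column 2 of a toy matrix A
b = [2.0, 4.0, 6.0, 8.0]

fits = regress_on_each_column(columns, b)
for n_used, intercept, slope in fits:
    print(n_used, round(intercept, 6), round(slope, 6))
```

Each fit here uses three of the four rows, whereas dropping every row with any missing value would leave only two rows for both fits.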

5 answers

Why do some people use -999 or -9999 to replace missing values?

I have a dataset with lots of missing values. In some columns, the missing values were replaced with -999, but in other columns they were marked as 'NA'. Why would we use -999 to replace a missing value?

missing-data

asked Jul 22 '16 at 19:47

qqqwww
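
As a rough illustration of why sentinel codes such as -999 need care (a sketch under assumptions, not an answer from the thread): if the code is left in place, it is treated as a real measurement and distorts summaries. The set of sentinel values and the helper names `to_missing` and `mean_ignoring_missing` are assumptions made for this example.

```python
import math

# Assumed sentinel codes; a real dataset's codebook should confirm them.
SENTINELS = {-999, -9999}

def to_missing(values, sentinels=SENTINELS):
    """Replace sentinel codes with NaN so they are not read as data."""
    return [float("nan") if v in sentinels else float(v) for v in values]

def mean_ignoring_missing(values):
    """Mean of the observed (non-NaN) values only."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else float("nan")

raw = [12.0, -999, 15.0, 9.0]

naive_mean = sum(raw) / len(raw)                      # sentinel counted as data
clean_mean = mean_ignoring_missing(to_missing(raw))   # sentinel treated as missing

print(naive_mean, clean_mean)
```

The naive mean is dragged far below every observed value, while the cleaned mean reflects only the three real measurements.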

3 answers

Propensity score matching after multiple imputation

I refer to this paper: Hayes JR, Groner JI. "Using multiple imputation and propensity scores to test the effect of car seats and seat belt usage on injury severity from trauma registry data." J Pediatr Surg. 2008 May;43(5):924-7. In this study,…

missing-data propensity-scores

asked Sep 09 '12 at 12:46

Joe King

2 answers

Why is the Expectation Maximization algorithm guaranteed to converge to a local optimum?

I have read a couple of explanations of the EM algorithm (e.g. from Bishop's Pattern Recognition and Machine Learning and from Rogers and Girolami's A First Course in Machine Learning). The derivation of EM is ok, I understand it. I also understand why the…

missing-data convergence expectation-maximization

asked Jan 26 '14 at 14:09

michal

3 answers

R caret and NAs

I very much prefer caret for its parameter tuning ability and uniform interface, but I have observed that it always requires complete datasets (i.e., without NAs) even if the applied "naked" model allows NAs. That is very bothersome, regarding that…

r missing-data data-imputation caret

asked Apr 05 '15 at 15:56

Fredrik

1 answer

How do decision tree learning algorithms deal with missing values (under the hood)

What methods do decision tree learning algorithms use to deal with missing values? Do they simply fill the slot in using a value called "missing"? Thanks.

missing-data cart

asked May 02 '14 at 00:52

user1172468

5 answers

Imputation of missing values for PCA

I used the prcomp() function to perform a PCA (principal component analysis) in R. However, there's a bug in that function such that the na.action parameter does not work. I asked for help on stackoverflow; two users there offered two different ways…

r pca missing-data data-imputation

asked Sep 02 '12 at 10:47

user969113

2 answers

Full information maximum likelihood for missing data in R

Context: Hierarchical regression with some missing data. Question: How do I use full information maximum likelihood (FIML) estimation to address missing data in R? Is there a package you would recommend, and what are typical steps? Online…

r maximum-likelihood missing-data

asked Feb 28 '13 at 01:56

Sootica

5 answers

A statistical approach to determine if data are missing at random

I have a large set of feature vectors which I will use to attack a binary classification problem (using scikit learn in Python). Before I start to think about imputation, I am interested in trying to determine from the remaining parts of the data if…

missing-data randomness

asked Sep 13 '15 at 15:39

graffe

1 answer

Difference between missing data and sparse data in machine learning algorithms

What are the main differences between sparse data and missing data? And how do they influence machine learning? More specifically, what effect do sparse data and missing data have on classification algorithms and regression (predicting numbers) type of…

machine-learning dataset missing-data sparse

asked Mar 14 '17 at 06:45

tired and bored dev

5 answers

Machine learning algorithms to handle missing data

I am trying to develop a predictive model using high-dimensional clinical data including laboratory values. The data space is sparse with 5k samples and 200 variables. The idea is to rank the variables using a feature selection method (IG, RF etc)…

machine-learning missing-data

asked Jun 16 '14 at 00:50

Khader Shameer

4 answers

EM maximum likelihood estimation for Weibull distribution

Note: I am posting a question from a former student of mine unable to post on his own for technical reasons. Given an iid sample $x_1,\ldots,x_n$ from a Weibull distribution with pdf $$ f_k(x) = k x^{k-1} e^{-x^k} \quad x>0 $$ is there a useful…

optimization missing-data expectation-maximization weibull-distribution gumbel-distribution

asked Feb 14 '12 at 10:09

Xi'an

6 answers

What are the disadvantages of using mean for missing values?

I have an assignment (Data Mining course) and there is a part which asks: "What are the disadvantages of using mean for missing values?" in Missing Value section. So I searched a little bit and the most common answer was: "Because it reduces the…

mathematical-statistics missing-data data-mining data-imputation

asked Apr 02 '20 at 20:13

ali
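
One commonly cited drawback of mean imputation, offered here as a general illustration and not as a reconstruction of the truncated quote in the question above, is that filling gaps with the observed mean shrinks the sample variance. A small sketch with made-up numbers, using only Python's standard library:

```python
import statistics

observed = [2.0, 4.0, 6.0, 8.0]   # values actually recorded (toy data)
n_missing = 2                      # number of gaps to fill (assumed)

# Mean imputation: every gap gets the mean of the observed values.
fill = statistics.mean(observed)
imputed = observed + [fill] * n_missing

var_observed = statistics.variance(observed)  # sample variance, observed only
var_imputed = statistics.variance(imputed)    # sample variance after imputation

print(var_observed, var_imputed)
```

The imputed values add nothing to the sum of squared deviations but do increase the sample size, so the estimated variance goes down even though no new information was collected.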

1 answer

How the 'NA' values are treated in glm in R

I have a data table T1 that contains nearly a thousand variables (V1) and around 200 million data points. The data is sparse and most of the entries are NA. Each data point has a unique id and date pair to distinguish it from the others. I have another…

r generalized-linear-model missing-data

asked Dec 29 '12 at 20:52

user1140126
