Questions tagged [cooks-distance]

A measure of the influence of a single observation in regression modelling.

Cook's distance measures the influence that a single observation has on a fitted regression model. It is sometimes referred to as Cook's deleted residual statistic. It is a function of how far the parameter estimates would move if the observation were deleted. Further details can be obtained from the Wikipedia entry or from Cook's original article.
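
For reference, the usual form for ordinary least squares is sketched below. The notation is an assumption of this sketch rather than part of the tag description: $\hat{\beta}_{(i)}$ is the estimate with observation $i$ deleted, $X$ is the design matrix, $p$ is the number of fitted parameters, $s^2$ is the residual mean square, $e_i$ is the ordinary residual, and $h_{ii}$ is the leverage (the $i$-th diagonal element of the hat matrix).

```latex
D_i
  = \frac{(\hat{\beta} - \hat{\beta}_{(i)})^{\top} X^{\top} X \, (\hat{\beta} - \hat{\beta}_{(i)})}{p \, s^2}
  = \frac{e_i^2}{p \, s^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}
```

The second form shows that a large value needs both a sizeable residual and non-trivial leverage, which is why a point can look like an obvious outlier and still have a small Cook's distance.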

61 questions
53
votes
2 answers

How to read Cook's distance plots?

Does anyone know how to work out whether points 7, 16 and 29 are influential points or not? I read somewhere that because Cook's distance is lower than 1, they are not. Am I right?
Platypezid
18
votes
1 answer

Removing outliers based on Cook's distance in R

I have this R code for linear regression: `fit <- lm(target ~ age+sales+income, data = new)`. How can I identify influential observations based on Cook's distance and remove them from the data in R?
user3459010
18
votes
7 answers

Correcting for outliers in a running average

We have a daemon that reads in data from some sensors, and among the things it calculates (besides simply just reporting the state) is the average time it takes for the sensors to change from one value to another. It keeps a running average of 64…
Edward Z. Yang
11
votes
1 answer

Cook's distance cut-off value

I have been reading about Cook's distance to identify outliers that have a high influence on my regression. In Cook's original study he says that a cut-off rate of 1 should be comparable to identify influencers. However, various other studies use…
dissertationhelp
11
votes
2 answers

Generalized Linear Mixed Models: Diagnostics

I have a random intercept logistic regression (due to repeated measurements) and I would like to do some diagnostics, specifically concerning outliers and influential observations. I looked at residuals to see if there are observations that stand…
Emilia
11
votes
3 answers

Residuals for logistic regression and Cook's distance

Are there any particular assumptions regarding the errors for logistic regression such as the constant variance of the error terms and the normality of the residuals? Also typically when you have points that have a Cook's distance larger than 4/n,…
lord12
11
votes
1 answer

What kind of residuals and Cook's distance are used for GLM?

Does anybody know what the formula for Cook's distance is? The original Cook's distance formula uses studentized residuals, so why does R use standardized Pearson residuals when computing the Cook's distance plot for a GLM? I know that studentized…
MarkDollar
9
votes
2 answers

Why doesn't the `cooks.distance()` function detect an obvious outlier?

I have the following plot: I want to detect outliers in order to delete them. I apply the following code to detect and delete them: model <- lm(VeDBA.X16 ~ VeDBA.V13AP, data = data) cooksD <- cooks.distance(model) n <- nrow(data) influential_obs <-…
Dekike
8
votes
2 answers

Cook's distance in detecting outliers

According to my understanding, Cook's distance measures the influence of each observation by excluding points when fitting a model. So I assume it could be a reasonable approach for outlier detection? My questions, assume data are categorized into…
Roy C
7
votes
1 answer

Checking for outliers in a glmer (lme4 package) with 3 random factors

I have a question about checking for outliers and/or influential points in my dataset using a glmer model with 3 random variables. I'm investigating the detection rate (SumDetections) of receivers over increasing distance…
FlyingDutch
6
votes
2 answers

Identifying outliers in the data

Sample data dat <- structure(list(yld.harvest = c(1800, 2400, 2000, 2400, 2160, 2400, 2400, 2250, 2400, 2280, 2400, 3120, 3300, 3300, 3000, 3000, 2400, 2700, 3000), year = c(1996, 1997, 1998,…
89_Simple
6
votes
2 answers

Cook's distance vs. hat values

What exactly does Cook's distance measure? And how is this different from what hat values measure? I know hat values measure how distant a point is from its corresponding fitted point. I also know Cook's distance measures the influence of a point…
Sara
6
votes
1 answer

Outliers in Linear Regression that ONLY revert significance

When doing linear regression, all sorts of influence checks (Cook's distance, leverage, dffits, dfbetas, covratio) can be conducted on the data points. Each of these comes with cut-off levels from the literature, e.g. Cook's D > 4/n. However, I…
anspiess
6
votes
1 answer

Elastic net: dealing with wide data with outliers

Recently I was working on a dataset with ~300 observations and 1500 predictors. I used the glmnet package in R to fit an elastic net model, which gave me a cross-validated (regularised) R-square of 99%. It was suggested by subject matter experts…
6
votes
1 answer

Interpreting case influence statistics (leverage, studentized residuals, and Cook's distance)

I just wanted to clarify some things about leverage, studentized residuals, and Cook's distance: Does a large (in absolute value) studentized residual mean that a case is an outlier? Does a large Cook's distance mean that a case is influential for…
K23