
I am doing some regression diagnostics in R.

I use the plot() function and look at the four diagnostic plots. However, when I reach the Cook's distance plot, I get a warning saying "not plotting observations with leverage one: 2174".

What does this mean, and how do I find the data point that causes trouble?

What's annoying is that R gives me IDs for those data points (e.g. 2174, 4588, etc.), but I can't figure out what these numbers stand for. Any ideas on how to find out?

This is probably "row names", so simply their number. But I can't understand why they're leverage points.

It's probably because they have low values for X (they're 0s) but still have some value for Y. Point 2174 seems to have the highest Y value of all the data points flagged by R.
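For anyone trying to reproduce this, here is a minimal sketch of how the flagged rows could be looked up. The data frame `d`, its values, and its row names are made up for illustration; only `factor(countryname)` comes from my actual model. Reading the names off `hatvalues()` gives the row names of the observations used in the fit, so there is no need to guess whether the number in the warning is a row name or a position.

```r
# Toy data (hypothetical): country "C" has a single row, so its dummy
# coefficient is determined by that row alone.
d <- data.frame(
  y = c(1.2, 2.3, 2.9, 4.1, 5.2, 9.0),
  x = c(1, 2, 3, 4, 5, 0),
  countryname = c("A", "A", "A", "B", "B", "C")
)
rownames(d) <- c("101", "102", "103", "2172", "2173", "2174")

fit <- lm(y ~ x + factor(countryname), data = d)

# Leverages (diagonal of the hat matrix), named by the row names of the data
h <- hatvalues(fit)

# Observations whose leverage is numerically one
ids <- names(h)[h > 1 - 1e-8]
ids

# Look the rows up by row name in the original data
d[ids, ]
```

With real data, replacing `d` and the formula with the actual ones should list the same rows that `plot()` refuses to draw.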

Ken Lee
  • To understand leverage, see https://stats.stackexchange.com/questions/65912/precise-meaning-of-and-comparison-between-influential-point-high-leverage-point. A point with leverage 1 is effectively a data point that has its own parameter in the model. For instance, a categorical variable with a level that has only one observation will generate leverage 1 for that observation. Does something like that occur in your data? – kjetil b halvorsen May 21 '21 at 01:39
  • I think I've figured out that this is due to the fact that I include `factor(countryname)` in my model. However, I am pretty sure that all of my countries have more than one observation (have done a `count()` for that). – Ken Lee May 21 '21 at 08:14
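
One thing that might explain the mismatch, offered as a guess rather than a diagnosis: `lm()` drops rows with missing values by default, so a `count()` on the raw data can report more observations per country than the model actually used. A small sketch with made-up data (the data frame `d` and its values are hypothetical) that counts the levels in the model frame instead:

```r
# Hypothetical data: country "B" has two rows in the raw data,
# but one of them has a missing x and is dropped by lm().
d <- data.frame(
  y = c(1.0, 2.1, 2.9, 4.2, 5.0, 6.1),
  x = c(1, 2, 3, 4, NA, 6),
  countryname = c("A", "A", "A", "B", "B", "A")
)

fit <- lm(y ~ x + factor(countryname), data = d)

# Counts per country in the raw data vs. in the rows the model really used
raw_counts  <- table(d$countryname)
used_counts <- table(model.frame(fit)[["factor(countryname)"]])
raw_counts
used_counts

# Row names of the observations with leverage (numerically) one
h <- hatvalues(fit)
flagged <- names(h)[h > 1 - 1e-8]
flagged
```

If `used_counts` shows a country with a single observation, that row is the one with leverage one, even though the raw data has more rows for that country.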
