I know the definition of the leverage of the $i$th observation in regression: $h_{ii}=x_{i}'(X'X)^{-1}x_{i}$.

Many sources and textbooks say that leverage is a standardized measure of the distance of the $i$th observation from the center of the $x$ space, but I don't understand why.

I would expect that distance to be measured by the Mahalanobis distance, $(x_{i}-\mu)'\Sigma^{-1}(x_{i}-\mu)$. Clearly this is not the same as $h_{ii}$, so I am confused about why they say that.

Thanks in advance for your reply!

Thank you for pointing out that there is a similar question: Prove the relation between Mahalanobis distance and Leverage?

But in that answer, the author assumes that the mean of the regressors is 0, and I don't understand this assumption. What if the mean isn't 0? In that case the subsequent steps do not go through (the matrix $X'X$ is not a diagonal matrix, so its inverse is not easy to obtain).

Because I don't have enough reputation to comment on that answer, I am asking a new question here. I know that proving the relationship between Mahalanobis distance and leverage would settle my question, but maybe there is another way to answer it. Thank you!
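A sketch of why the zero-mean assumption costs nothing, under one assumption that the question does not state explicitly: the design matrix contains an intercept column. The symbols $Z$, $z_i$, $\bar z$, $A$, $S$ and $D_i^2$ are introduced here for illustration. Write $X=[\mathbf 1_N,\ Z]$, where $Z$ holds the $p$ non-constant regressors, and let $X_c=[\mathbf 1_N,\ Z-\mathbf 1_N\bar z']$ be the recentered design. Then $X_c = XA$ for an invertible matrix $A$, so the hat matrix does not change:

$$
A=\begin{pmatrix}1 & -\bar z'\\ 0 & I_p\end{pmatrix},\qquad
H_c = XA\,(A'X'XA)^{-1}A'X' = X(X'X)^{-1}X' = H .
$$

Hence $h_{ii}=h^c_{ii}$. Because the centered columns are orthogonal to $\mathbf 1_N$, the matrix $X_c'X_c$ is block diagonal with blocks $N$ and $Z_c'Z_c$, where $Z_c = Z-\mathbf 1_N\bar z'$. Its inverse is immediate, and

$$
h_{ii} = \frac{1}{N} + (z_i-\bar z)'(Z_c'Z_c)^{-1}(z_i-\bar z)
       = \frac{1}{N} + \frac{D_i^2}{N-1},
\qquad
D_i^2 = (z_i-\bar z)'S^{-1}(z_i-\bar z),\quad S=\frac{Z_c'Z_c}{N-1}.
$$

Here $D_i^2$ is the squared Mahalanobis distance computed with the sample mean and sample covariance in place of $\mu$ and $\Sigma$. Without an intercept column this argument does not apply, and $h_{ii}$ is then a quadratic form measured from the origin rather than from the centroid.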

jzLi
  • The very beginning of the answer you reference contains a reminder that you can always recenter the regressors before doing the regression. – whuber Apr 29 '16 at 16:58
  • Thank you for your answer! Yes, you are right: in practice we can recenter the regressors first. But in theory, can you still prove the equation $D^2=(N-1)(h-1/N)$ if the mean of the regressors is not 0? PS: By simulation, I find that even if you don't recenter the regressors, leverage still reflects the location in $x$ space, so I wonder whether the equation still holds, or whether you can explain why this happens. Thanks! – jzLi Apr 30 '16 at 06:44
  • I did some simulations just now and found that the leverage remains the same whether you recenter the regressors or not. So if we can prove $h_{ii}=h^c_{ii}$, where $h_{ii}$ is the leverage of the raw data and $h^c_{ii}$ is the leverage of the recentered data, then the problem will be solved! Unfortunately, I cannot prove this... – jzLi Apr 30 '16 at 08:25
  • If you read the duplicate *as well as [the thread it references](http://stats.stackexchange.com/a/62147/919),* you will have all the information needed to prove that. – whuber May 01 '16 at 14:53
  • @jzLi I think your question is meaningful. The reason the diagonal elements of the uncentered hat matrix can be used to measure the distance from the centroid is that the centered and uncentered hat matrices are linearly related. In fact, one is just a shift of the other. I will write up a question and answer about this and post the link here. – KDG Aug 24 '18 at 12:26
  • https://stats.stackexchange.com/a/363784/212274 Here is the link for you – KDG Aug 24 '18 at 15:12
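The simulation described in the comments can be sketched as follows. This is a minimal illustration, not the code used in the thread: the helper `leverages`, the sample size, the seed and the regressor means are all made up here, and the design matrix is assumed to include an intercept column. It compares the leverages of the raw and recentered designs and checks the relation $D^2=(N-1)(h-1/N)$ with the sample mean and sample covariance.

```python
import numpy as np


def leverages(X):
    """Return the diagonal of the hat matrix H = X (X'X)^{-1} X'."""
    # Solve (X'X) B = X' rather than forming the inverse explicitly.
    B = np.linalg.solve(X.T @ X, X.T)
    # h_ii = sum_j X[i, j] * B[j, i]
    return np.einsum("ij,ji->i", X, B)


# Hypothetical data: regressors whose means are deliberately far from zero.
rng = np.random.default_rng(0)
N, p = 50, 3
Z = rng.normal(loc=[5.0, -2.0, 10.0], scale=[1.0, 3.0, 0.5], size=(N, p))

# Assumption: the model has an intercept, so X carries a column of ones.
ones = np.ones((N, 1))
X_raw = np.hstack([ones, Z])
X_cen = np.hstack([ones, Z - Z.mean(axis=0)])

h_raw = leverages(X_raw)
h_cen = leverages(X_cen)

# Squared Mahalanobis distance from the sample mean, using the sample
# covariance (np.cov divides by N - 1 by default).
diff = Z - Z.mean(axis=0)
S = np.cov(Z, rowvar=False)
D2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(S), diff)

print("raw and recentered leverages agree:", np.allclose(h_raw, h_cen))
print("D^2 = (N-1)(h - 1/N) holds:", np.allclose(D2, (N - 1) * (h_raw - 1.0 / N)))
```

Dropping the column of ones from both designs is a quick way to see that the agreement depends on the intercept.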
