
I ran into an impasse while attempting to write code for Cook's Distance: when a regression model reaches only a moderate size, I can't derive a Hat Matrix through my normal matrix math routines without receiving intractable locking errors (it's an inherent limitation of the platform). The good news is that I can calculate pretty much every other type of regression stat effortlessly on far larger models, like RSS, Norms of Residuals, MSE, RMSD, MAPE, Explained Sum of Squares, Fraction of Variation Unexplained, $R^2$, Lack of Fit Sum of Squares, Pure Error, Total Sum of Squares, inverse covariance matrices, you name it. Observed, predicted, and residual values are trivial. Is there any alternate route to calculate the Leverage values backwards from these or any other regression stats? Hat Matrices are probably the most mathematically succinct method of reaching these answers, but in this case they're not an option, so I need an equivalent method. I did some searches on CrossValidated over the last few months for the terms "Hat Matrix" and "Leverage," but have yet to find any posts that would address my issue; the only remotely relevant candidate I spotted was an equation posted by Glen_B in "Which of these points in this plot has the highest leverage and why?", but I'm not certain that it applies. As always, my overall goal is to teach myself these concepts and acquire the skills to code them myself, so replies that incorporate some problem-solving on my part are even better than ready-made answers.

SQLServerSteve
  • Hmm... Your residual stats can't help, because they are about the prediction of Y; leverages are about relations among the Xs (predictors) only. Will you be able to compute the squared Mahalanobis distance between each data point of the data cloud X and its centroid? (Note that this operation will call for a matrix inversion.) If you can, then divide the squared distance by `n-1` and you get the leverage value (see the first sketch after the comments). – ttnphns Jun 26 '15 at 09:28
  • Yes - I'm having some technical problems with my Mahalanobis implementation, but the figures aren't far off. Once I get that code fixed, I might be able to take your suggestion and compute it backwards from the squared Mahalanobis. Thanks - I'll give it a try. :) I'd still be interested in hearing about more alternatives, if anyone else can suggest any to complement ttnphns' workaround. – SQLServerSteve Jun 26 '15 at 13:13
  • I'm baffled as to how you cannot get hat matrix elements when you obviously have the coefficient estimates $\hat\beta$ and can make predictions $\hat y = X\hat\beta = H y$. In other words, the coefficients of the linear combinations you are computing when you make predictions (or equivalently, find the residuals $y-\hat y$) are precisely the entries of the hat matrix $H$. Why aren't those coefficients directly available to you? If they somehow aren't, then can you make predictions based on alternative responses $y^\prime$? Using suitable values you can obtain the columns of $H$ (see the second sketch after the comments). – whuber Jun 26 '15 at 14:06
  • That is just what I was looking for. The confusion arose because I'm deriving my slopes, intercepts, etc. from the usual arithmetic operations (translated into a set-based language), whereas the Hat formulas I've seen to date use matrix ops. The standard notation $H = X(X^{T}X)^{-1}X^{T}$ is good if you can use matrices and are calculating from scratch. I sensed the overlap, but couldn't discern which matrix ops matched the figures I'd already derived. It was like climbing a mountain, using a map from the other side; I was where I needed to be, yet still lost. I should be on track now. Thanks. – SQLServerSteve Jun 26 '15 at 15:26
  • The Hat Matrix formula I posted above may need some formatting fixes. This was my first attempt at using TeX. – SQLServerSteve Jun 26 '15 at 15:34
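Below is a minimal sketch of ttnphns' suggestion, written in NumPy (the function and variable names are my own, not from the thread). It assumes a model fitted with an intercept, in which case the leverage works out to $h_i = 1/n + d_i^2/(n-1)$, where $d_i^2$ is the squared Mahalanobis distance of row $i$ of the predictor matrix from the predictors' centroid. The only inversion required is of the small p-by-p covariance matrix of the predictors, not anything of the hat matrix's n-by-n size.

```python
# Sketch (not from the thread): leverages recovered from squared Mahalanobis
# distances, assuming a regression model that includes an intercept.
import numpy as np

def leverages_from_mahalanobis(X):
    """X: n-by-p array of predictors, WITHOUT the intercept column."""
    n = X.shape[0]
    centered = X - X.mean(axis=0)                # deviations from the centroid
    cov = np.cov(X, rowvar=False)                # sample covariance (n-1 denominator)
    cov_inv = np.linalg.inv(np.atleast_2d(cov))  # the one (small) matrix inversion needed
    # squared Mahalanobis distance of each row from the centroid
    d2 = np.einsum('ij,jk,ik->i', centered, cov_inv, centered)
    return 1.0 / n + d2 / (n - 1)                # leverage for a model with an intercept

# Quick check against the diagonal of the hat matrix on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
X1 = np.column_stack([np.ones(20), X])           # design matrix with intercept column
H = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T         # hat matrix, for verification only
print(np.allclose(np.diag(H), leverages_from_mahalanobis(X)))  # True
```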

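And a minimal sketch of whuber's point, again in NumPy with names of my own choosing: because $H$ depends only on $X$, any routine that can fit coefficients and return fitted values will hand back column $j$ of $H$ when the standard basis vector $e_j$ is supplied as the response. The `fit_and_predict` helper below is a hypothetical stand-in for whatever fitting routine is already available.

```python
# Sketch (not from the thread): recover the hat matrix column by column by
# refitting the same X against each standard basis vector as the "response".
import numpy as np

def fit_and_predict(X1, y):
    """Stand-in for an existing fitting routine: least-squares coefficients,
    followed by the fitted values X1 @ beta."""
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return X1 @ beta

def hat_matrix_by_columns(X1):
    n = X1.shape[0]
    H = np.empty((n, n))
    for j in range(n):
        e_j = np.zeros(n)
        e_j[j] = 1.0                         # j-th basis vector as the response
        H[:, j] = fit_and_predict(X1, e_j)   # fitted values = H @ e_j = column j of H
    return H

# Verification against the direct formula on synthetic data:
rng = np.random.default_rng(1)
X1 = np.column_stack([np.ones(15), rng.normal(size=(15, 2))])
H_direct = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
print(np.allclose(H_direct, hat_matrix_by_columns(X1)))  # True
```

If only the leverages are needed, there is no need to store the full n-by-n matrix: entry $j$ of the prediction made against $e_j$ is the leverage $h_{jj}$ itself.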
0 Answers