
I tried to fit a GLM in R; the response values y are given below:

y <- c(2.0875796, 1.0857121, 0.2783329, 0.7866724, 1.7395036, 1.6341974, 0.1919819, -0.9013408, -1.1337154, 0.4611232, 2.1412645, 2.0390984, 1.8198061, 1.9267886, 1.7026195, 1.6506483, 1.8580555, 1.9983037, 1.9643366, -1.0870642, 0.2191361, 1.9835259, 1.4358765, 2.1582919, 0.5332927, 0.5527285, 1.0580714, 0.3343562, -0.1912542, 1.9449530, 0.5128612, -0.1934113, -0.1820204, -0.4954643, -0.1028118, -0.3829145)

The two predictors (call them x1 and x2, respectively) are:

x1 <- c(16, 1, 1, 0, 0, 0, 0, 2, 7, 2, 3, 0, 2, 1, 4, 4, 6, 3, 6, 1, 16, 7, 4, 3, 0, 2, 10, 17, 11, 15, 11, 8, 1, 1, 8, 0)

x2 <- c(2, 19, 18, 19, 18, 19, 20, 17, 12, 16, 15, 18, 16, 17, 14, 14, 12, 15, 12, 20, 0, 11, 14, 15, 19, 16, 8, 1, 7, 3, 7, 10, 18, 17, 10, 19)

When I fit the two variables separately, the results are roughly what I expected. However, when I combine the two variables (which in this case are highly negatively correlated, with a correlation of about -0.992), I expected the two coefficients to be similar in magnitude but opposite in sign. Instead, the fit gave me a large intercept and a negative coefficient for both variables, which seems very strange. Am I missing something here? Thank you in advance.
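For reference, fitting them separately looked something like this (glm() with no family argument defaults to gaussian, i.e. ordinary least squares):

```r
# Single-predictor fits:
coef(glm(y ~ x1))
coef(glm(y ~ x2))
cor(x1, x2)  # about -0.992: the two predictors are almost perfectly
             # negatively correlated
```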

The code I use to build the combined model is something like this:

dat <- data.frame(x1,x2,y)

model <- glm(y ~ x1 + x2, data = dat)
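A short collinearity check that can be run on this fit (a sketch; `car::vif()` assumes the car package is installed):

```r
summary(model)       # note the inflated standard errors on x1 and x2
cor(dat$x1, dat$x2)  # ~ -0.992
# With two predictors, VIF = 1 / (1 - r^2), which is about 63 here;
# car::vif() reports the same quantity:
car::vif(model)
```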

  • Practically yes, since it demonstrated how the parameters are distributed when there is significant multicollinearity among the variables. But I am still not grasping your mathematical explanation. Would you mind elaborating on the math in more detail? – Andreas Adinatha Aug 18 '21 at 13:31
  • What about that explanation is unclear? – Demetri Pananos Aug 18 '21 at 13:34
  • Oh, you said, "Then, one of the eigenvalues should be close to 0 (I think)". I am not sure why that is the case. My second question: is it impossible for the values in the matrix Q and its inverse to somehow counteract the small value in the eigenvalue matrix A? I am sorry if this sounds like a stupid question, but I have not touched this subject for years now. – Andreas Adinatha Aug 18 '21 at 14:05
  • At the moment I can't prove that approximately collinear columns should have a small eigenvalue, but I do know that completely collinear columns will result in an eigenvalue of 0. I should go prove this for my own edification. As to the second point, Q and Q inverse cannot "counteract" anything: this is a decomposition of the Gram matrix, so it is an equality. – Demetri Pananos Aug 18 '21 at 14:36
  • Alright, I got it. I would be very glad if you could share it with me once you manage to prove the first point, but I already understand why you think that is the case. Thanks once again. – Andreas Adinatha Aug 18 '21 at 15:04
  • See here for a proof (I have not verified it yet; I'm busy at the moment): https://math.stackexchange.com/questions/4227454/shrinking-eigenvalues-as-two-columns-become-collinear/4227557#4227557 – Demetri Pananos Aug 18 '21 at 17:12
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/128725/discussion-between-andreas-adinatha-and-demetri-pananos). – Andreas Adinatha Aug 19 '21 at 09:18
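A quick numerical sketch of the eigenvalue point discussed in the thread above (my own illustration for these data, not code from the discussion):

```r
# Eigendecomposition of the Gram matrix X'X for these data. With nearly
# collinear columns, the smallest eigenvalue is close to 0, so X'X is
# nearly singular and the coefficient estimates are unstable.
X  <- model.matrix(~ x1 + x2, data = dat)
ev <- eigen(crossprod(X), symmetric = TRUE)$values
ev                 # the smallest eigenvalue is tiny relative to the rest
max(ev) / min(ev)  # condition number of X'X; a huge value signals the
                   # ill-conditioning behind the odd coefficients
```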
