
I am finding it difficult to differentiate between exact collinearity and multicollinearity in multiple regression models.

Suppose we have a multiple regression model, i.e. $y_i = b_1 + b_2 x_{i2} + b_3 x_{i3} + e_i$, where $y_i$ is the dependent variable and $x_{i2}$ and $x_{i3}$ are the independent variables.

  1. If there is a relationship between $x_{i2}$ and $x_{i3}$ such as $x_{i2} = m x_{i3}$ (they are linearly related), can we still apply the multiple regression model above and predict the value of $y$ precisely?

  2. Does the least-squares procedure still work, or does it fail?

  3. We can estimate $b_2$ as shown in the attached image (provided $x_{i2}$ and $x_{i3}$ are not linearly correlated): Estimated $b_2$

To calculate $b_2$ (the coefficient of $x_{i2}$) we have to hold the other variables constant, i.e. we remove the effect of the other variables from the model, roughly by subtracting the covariation between ($y_i$, $x_{i3}$) and between ($x_{i2}$, $x_{i3}$) from the total effect.
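
As a rough illustration of that partialling-out idea (a minimal sketch in Python/NumPy with invented data, in the Frisch–Waugh style rather than the exact formula from the image), valid when $x_{i2}$ and $x_{i3}$ are not exactly collinear:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x3 = rng.normal(size=n)
x2 = 0.5 * x3 + rng.normal(size=n)        # correlated with x3, but not exactly collinear
y = 1.0 + 0.5 * x2 + 2.0 * x3 + rng.normal(scale=0.1, size=n)

def residualize(v, w):
    """Return the residuals of v after regressing it on an intercept and w."""
    W = np.column_stack([np.ones(len(w)), w])
    return v - W @ np.linalg.lstsq(W, v, rcond=None)[0]

# b2 = slope of (y with x3 partialled out) on (x2 with x3 partialled out)
ry, rx2 = residualize(y, x3), residualize(x2, x3)
print((rx2 @ ry) / (rx2 @ rx2))           # ~0.5

# Same b2 as from the full multiple regression of y on [1, x2, x3]
X = np.column_stack([np.ones(n), x2, x3])
print(np.linalg.lstsq(X, y, rcond=None)[0][1])
```

Both prints give (approximately) the same $b_2$, which is the sense in which the coefficient "holds $x_{i3}$ constant".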

But why can we not do the same calculation when $x_{i2}$ and $x_{i3}$ are linearly related? (If $x_{i2}$ and $x_{i3}$ are, say, exponentially related, we can still apply the model and predict the results.)
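
And a minimal sketch of what breaks when they are exactly linearly related (again Python/NumPy with made-up data; the factor $m = 2$ is arbitrary): the design matrix loses rank, so the normal equations no longer have a unique solution for $b_2$ and $b_3$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 2.0

x3 = rng.normal(size=n)
x2 = m * x3                                  # exact collinearity: x2 is a multiple of x3
y = 1.0 + 0.5 * x2 + 2.0 * x3 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x2, x3])    # design matrix [1, x2, x3]

print(np.linalg.matrix_rank(X))              # 2, not 3: one column is redundant
# X'X is singular, so the textbook OLS formula (X'X)^(-1) X'y breaks down:
# np.linalg.inv(X.T @ X) raises LinAlgError (or is numerically meaningless).

# lstsq still returns *a* solution (the minimum-norm one), but it is not unique:
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)   # any (b2, b3) with b2*m + b3 equal to the same constant fits equally well
```

Any pair $(b_2, b_3)$ with the same value of $b_2 m + b_3$ fits the data equally well, which is exactly the non-uniqueness the answers below describe.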

kjetil b halvorsen
  • You are incorrectly opposing the two terms "exact collinearity" (or complete collinearity) and "multicollinearity". Multicollinearity is simply collinearity found among variables (vectors) that are 3+ in number. Another term for it could be "hypercollinearity", as we say "hyperplane". Whatever the potential dimensionality (equal to the number of vectors), 2 or 3+, the (multi)collinearity can be complete or not complete (near collinearity). You might want to look at [these](https://stats.stackexchange.com/a/70910/3277) pictures about it in regression. – ttnphns Sep 16 '17 at 18:46
  • In my experience, collinearity and multicollinearity are used as synonyms. – Peter Flom Aug 11 '19 at 19:42

2 Answers


A linear regression can only be based on the number of linearly independent predictors. If $x_{i2}=m\,x_{i3}$ and $m$ is known, then you could simply write your model as $$y_i = b_1+(b_2 m+b_3)\,x_{i3}+ e_i=b_1+b_4\,x_{i3}+ e_i,$$ where $b_4=b_2 m+b_3$, and back-solve for $b_2$ and $b_3$ from the value of $b_4$ determined by the regression. If $m$ isn't known, then any combination of $m$ and $b_2$ that keeps their product constant would be possible, so you couldn't get a unique solution for all the coefficients.
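
A small numerical check of this point (a sketch in Python/NumPy; the values $m = 2$, $b_2 = 0.5$, $b_3 = 2$ are invented): regressing $y$ on $x_{i3}$ alone recovers the combination $b_4 = b_2 m + b_3$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, b1, b2, b3 = 200, 2.0, 1.0, 0.5, 2.0

x3 = rng.normal(size=n)
x2 = m * x3                                   # exact collinearity
y = b1 + b2 * x2 + b3 * x3 + rng.normal(scale=0.1, size=n)

# Reduced model: y = b1 + b4*x3 + e, with b4 = b2*m + b3
X = np.column_stack([np.ones(n), x3])
b1_hat, b4_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(b4_hat, b2 * m + b3)   # ~3.0 and 3.0: only the combination b2*m + b3 is identified
```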

With multicollinearity other than such exact linear dependence, you can get estimates for as many linear regression coefficients as you have predictors, as well as for the intercept. The process, however, might be numerically unstable in extreme cases, and different samples from the same population are likely to provide greatly different coefficient value estimates for collinear predictors.
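
To see that instability (a made-up simulation in Python/NumPy; the correlation level and sample size are arbitrary): with nearly collinear predictors the individual coefficients swing widely from sample to sample, even though their average is about right:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 500
coefs = []

for _ in range(reps):
    x3 = rng.normal(size=n)
    x2 = 0.99 * x3 + 0.01 * rng.normal(size=n)    # near (not exact) collinearity
    y = 1.0 + 0.5 * x2 + 2.0 * x3 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    coefs.append(np.linalg.lstsq(X, y, rcond=None)[0])

coefs = np.array(coefs)
print(coefs.mean(axis=0))   # roughly the true values on average
print(coefs.std(axis=0))    # but a very large spread for the collinear pair
```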

I would caution against simply plugging into the formulas that you linked via a picture, particularly if you are in a situation with multicollinearity. Standard statistical software packages use algorithms that might be less sensitive to predictor sets that are close to linear dependence.
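
One concrete reason (a minimal numerical check, not a statement about any particular package): forming $X^\top X$ roughly squares the condition number of the design matrix, so hand-coded normal equations degrade much faster than a QR/SVD-based solver such as `numpy.linalg.lstsq` as predictors approach linear dependence:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x3 = rng.normal(size=n)
x2 = x3 + 1e-6 * rng.normal(size=n)          # nearly collinear predictors
X = np.column_stack([np.ones(n), x2, x3])

print(np.linalg.cond(X))          # large
print(np.linalg.cond(X.T @ X))    # roughly its square: far worse conditioned
```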

EdM

First of all, let's talk about the difference between collinearity and multicollinearity:

Collinearity: if, in a multiple regression analysis, one of the predictors is linearly associated with/dependent on another predictor, the issue is known as collinearity.

For example, let’s consider the linear model Y = α + β1x1 + β2x2 ... (1)

If predictor x1 can be expressed as a linear combination of x2, say x1 = 3*x2, then this is known as collinearity among the predictors.

Note that there will then be perfect (or very high) correlation between the predictors, contrary to the assumption of the linear regression model that the predictors are not linearly dependent on one another.

Essentially it means that one of the independent variables is not really necessary to the model because its effect/impact on the model is already captured by some of the other variables.

This variable contributes nothing extra to the predictions and can be removed. If we have true collinearity (perfect correlation, as in the example above), then some software, such as R, automatically drops one of the predictors, while other packages report an error or warning.

The effects of collinearity are seen in the variances of the parameter estimates, not in the parameter estimates themselves.

Multicollinearity: Unfortunately, not all collinearity problems can be detected by inspection of the correlation matrix. It is possible for collinearity to exist between three or more variables even if no pair of variables has a particularly high correlation. This situation is known as multicollinearity.

E.g., suppose we add to model (1) a third predictor x3 that is a linear combination of the two others, say x3 = x1 + x2, yet the correlations between the pairs (x1, x3) and (x2, x3) are not especially high.

To measure multicollinearity, one can calculate the variance inflation factor (VIF), tolerance, etc.
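
For instance (a sketch using `variance_inflation_factor` from statsmodels; the data are invented, and the exact linear combination is perturbed slightly so the VIFs stay finite): the pairwise correlations look moderate, yet the VIFs are enormous:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.05 * rng.normal(size=n)        # near-exact linear combination

X = np.column_stack([np.ones(n), x1, x2, x3])   # include an intercept column

# Pairwise correlations are only moderate (~0.7 at most), yet the VIFs explode
print(np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False).round(2))
for i in range(1, X.shape[1]):
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.1f}")
```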

To deal with multicollinearity, the simplest solution is to drop one of the problematic variables from the regression. This can usually be done without much compromise to the regression fit, since the presence of collinearity implies that the information that this variable provides about the response is redundant in the presence of the other variables.

For variable selection, automated processes (like stepwise methods) should not be used, as stepwise regressions are controversial and might lead to model misspecification. [Paper by Peter Flom: http://www.lexjansen.com/pnwsug/2008/DavidCassell-StoppingStepwise.pdf]

Alternatively, you can use PCA to reduce the dimensionality of the model, or use regularization/penalization (the L1 or L2 norm, or a combination of both), or least angle regression (LAR).
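
As one concrete option among those (a hedged sketch with scikit-learn's `Ridge`; the penalty `alpha=1.0` is arbitrary, not a recommendation): an L2 penalty shrinks and stabilizes the coefficients of a strongly collinear pair:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)       # strongly collinear pair
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)        # alpha chosen arbitrarily for the sketch

print(ols.coef_)     # individually unstable, may be far from (2, 2)
print(ridge.coef_)   # shrunk toward each other and much more stable
```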

The other solution is to combine the collinear variables together into a single predictor.

I believe this description clarifies all three of your questions.

Read more: I've written several answers to similar queries on ResearchGate; a few links are: https://www.researchgate.net/post/how_to_deal_with_multicolinearity#view=580ef132ed99e1c1046fcf01 https://www.researchgate.net/post/How_to_explain_the_difference_between_collinearity_and_correlation_And_what_is_the_relationship_between_them#view=5cdebfc7d7141b76ae6bb8c6

Dr Nisha Arora