Let us first distinguish between perfect multi-collinearity (the model matrix is not of full rank, so the usual matrix inversion fails; this is usually due to a misspecification of the predictors) and non-perfect multi-collinearity (some of the predictors are correlated, but without causing computational problems). This answer is about the second type, which occurs in almost every multivariable linear model, since the predictors usually have no reason to be uncorrelated.
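For contrast, here is a minimal sketch of the perfect case (the variables W, Z and Y0 are hypothetical, used only for this illustration): the model matrix is rank deficient, and R's lm() reacts by dropping the aliased column and reporting NA for its coefficient rather than failing outright.

set.seed(1)
W  <- rnorm(20)
Z  <- 2 * W              # Z is an exact linear function of W: perfect collinearity
Y0 <- W + rnorm(20)
coef(lm(Y0 ~ W + Z))     # coefficient of Z is NA: lm() drops the aliased column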
A simple example with strong multi-collinearity is a quadratic regression, where the only predictors are $X_1 = X$ and $X_2 = X^2$:
set.seed(60)
X1 <- abs(rnorm(60))
X2 <- X1^2
cor(X1,X2) # Result: 0.967
This example illustrates your questions/claims:
1. Multicollinearity doesn't affect the regression model as a whole.
Let's have a look at an example model:
Y <- 0.5*X1 + X2 + rnorm(60)
fit <- lm(Y~X1+X2)
summary(fit)
# Result
[...]
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.3439     0.3214  -1.070    0.289
X1            1.3235     0.8323   1.590    0.117
X2            0.5861     0.3931   1.491    0.141

Residual standard error: 1.014 on 57 degrees of freedom
Multiple R-squared:  0.7147,  Adjusted R-squared:  0.7047
F-statistic: 71.39 on 2 and 57 DF,  p-value: 2.996e-16
Global statements about the model are just fine:
- R-Squared: $X$ explains about 71% of the variability of $Y$
- Global F-test: At the 5% level, there is really an association between $X$ and $Y$
- Predictions: For a person with $X$-value 2, the best guess for their $Y$-value is
$$
-0.3439 + 1.3235\cdot 2 + 0.5861 \cdot 2^2 = 4.6475
$$
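The same number can be obtained with predict(); note that newdata must contain both predictors, i.e. $X_2 = 2^2$ as well:

predict(fit, newdata = data.frame(X1 = 2, X2 = 2^2))  # about 4.65, matching the hand calculation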
2. But if we start looking at the effects of the individual $X$ variables on the explained variable, then we are going to have inaccurate estimates.
The estimates are accurate; that is not the problem. The problem lies in the standard interpretation of isolated effects, which holds all other predictors fixed. This is a strange thing to do when those other predictors are strongly correlated with the one being varied. In our example it is even plainly wrong to say "the average $Y$ value increases by 1.3235 if we increase $X_1$ by 1 and hold $X_2$ fixed", because $X_2 = X_1^2$ cannot stay fixed while $X_1$ changes.

Since we cannot interpret isolated effects descriptively, all inductive statements about them are useless as well: look at the t-tests in the output above. Both p-values are above the 5% level, although the global test of association gives a p-value far below 5%. The null hypothesis of such a t-test is "the effect of the predictor is zero" or, in other words, "the inclusion of this predictor does not increase the true R-squared in the population". Because $X_1$ and $X_2$ are almost perfectly correlated, the model has almost the same R-squared if we drop one of the two variables:
summary(lm(Y~X1))
# Gives
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.7033     0.2148  -3.274  0.00179 **
X1            2.5232     0.2151  11.733  < 2e-16 ***

Residual standard error: 1.025 on 58 degrees of freedom
Multiple R-squared:  0.7036,  Adjusted R-squared:  0.6985
F-statistic: 137.7 on 1 and 58 DF,  p-value: < 2.2e-16
This already illustrates the first part of the following statement:

> One other thing to keep in mind is that the tests on the individual coefficients each assume that all of the other predictors are in the model. In other words each predictor is not significant as long as all of the other predictors are in the model. There must be some interaction or interdependence between two or more of your predictors.
The last statement here is plainly wrong: correlation between predictors is not the same thing as an interaction. Our example contains no interaction term at all, yet the individual t-tests are non-significant simply because $X_1$ and $X_2$ carry almost the same information.
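To make the first (correct) part of the statement concrete: with a single dropped predictor, the partial F-test from a model comparison is equivalent to the t-test of that predictor in the full model, so the following comparison should reproduce the p-value of about 0.141 seen for $X_2$ above.

anova(lm(Y ~ X1), fit)  # partial F-test for adding X2 to a model that already contains X1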