First of all, let's talk about the difference between collinearity and multicollinearity:
Collinearity: In multiple regression analysis, if one of the predictors is linearly dependent on another predictor, the issue is known as collinearity.
For example, let’s consider the linear model
Y = α + β1*x1 + β2*x2 + β3*x3 ... (1)
If predictor x1 can be expressed as a linear combination of x2, say x1 = 3*x2, then this is known as collinearity among the predictors.
Note that there will be perfect (or very high) correlation between the predictors, which violates the assumption of the linear regression model that the predictors are independent of one another.
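As a minimal sketch (with simulated data, assuming NumPy is available), perfect collinearity of the form x1 = 3*x2 shows up as a correlation of exactly 1 between the two predictors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate one predictor and construct a perfectly collinear one: x1 = 3*x2
x2 = rng.normal(size=100)
x1 = 3 * x2

# The sample correlation between the two predictors is exactly 1
print(np.corrcoef(x1, x2)[0, 1])  # 1.0
```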
Essentially, it means that one of the independent variables is not really necessary to the model, because its effect on the model is already captured by some of the other variables.
This variable is not contributing anything extra to the predictions and can be removed.
If we have perfect collinearity (an exact linear dependence, as in the example above), then some software, such as R, automatically drops one of the predictors, while other packages raise an error or warning.
The effects of collinearity are seen in the variances of the parameter estimates, not in the parameter estimates themselves.
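One way to see this, sketched below on simulated data (statsmodels assumed to be available), is to fit the same response against a well-conditioned design and against a nearly collinear one: the standard error of the x2 coefficient explodes in the collinear case, while the coefficient estimates themselves remain unbiased.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200

x2 = rng.normal(size=n)
z = rng.normal(size=n)                    # independent of x2
x1 = 3 * x2 + 0.01 * rng.normal(size=n)   # nearly collinear with x2

y = 1.0 + 2.0 * x2 + rng.normal(size=n)   # the response depends only on x2

# Same response, two designs: (z, x2) is well conditioned, (x1, x2) is nearly collinear
fit_indep = sm.OLS(y, sm.add_constant(np.column_stack([z, x2]))).fit()
fit_collin = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Standard error of the x2 coefficient: small in the first fit, hugely inflated in the second
print(fit_indep.bse[2], fit_collin.bse[2])
```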
Multicollinearity:
Unfortunately, not all collinearity problems can be detected by inspection of the correlation matrix. It is possible for collinearity to exist between three or more variables even if no pair of variables has a particularly high correlation. This situation is known as multicollinearity.
E.g., if in model (1) x3 is a linear combination of the two other predictors, say x3 = x1 + x2, the pairwise correlations (x1, x3) and (x2, x3) need not be particularly high.
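This is easy to simulate (assuming independent x1 and x2): the pairwise correlations of x3 = x1 + x2 with x1 and with x2 come out around 0.7, nothing alarming on their own, yet the three predictors together are perfectly multicollinear and the design matrix is rank deficient.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2            # exact linear combination of the other two predictors

X = np.column_stack([x1, x2, x3])

# Pairwise correlations involving x3 are only about 0.7 -- not obviously problematic
print(np.corrcoef(X, rowvar=False).round(2))

# ...but the design matrix has rank 2 instead of 3: perfect multicollinearity
print(np.linalg.matrix_rank(X))  # 2
```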
To measure multicollinearity, one can calculate the variance inflation factor (VIF) or the tolerance (its reciprocal) for each predictor.
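A sketch of the VIF/tolerance calculation, assuming statsmodels is available; the predictors x1, x2, x3 below are simulated stand-ins for your own design matrix, and the rule of thumb in the comment (VIF above 5 or 10 is worrying) is only a convention.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)   # nearly a linear combination of x1 and x2

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF for each predictor (skipping the intercept); tolerance is simply 1/VIF.
# A common rule of thumb treats VIF > 5 (or > 10) as a sign of problematic multicollinearity.
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")
```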
To deal with multicollinearity, the simplest solution is to drop one of the problematic variables from the regression. This can usually be done without much compromise to the regression fit, since the presence of collinearity implies that the information that this variable provides about the response is redundant in the presence of the other variables.
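Continuing the simulated example (an illustration, not your data): dropping the nearly redundant x3 leaves the fit essentially unchanged.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)   # nearly redundant given x1 and x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2, x3]))).fit()
reduced = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# R-squared is essentially the same with or without the redundant predictor
print(full.rsquared, reduced.rsquared)
```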
For variable selection, automated procedures such as stepwise regression should not be used; stepwise selection is controversial and can lead to model misspecification. [Paper by Peter Flom and David Cassell: http://www.lexjansen.com/pnwsug/2008/DavidCassell-StoppingStepwise.pdf]
Alternatively, you can use PCA to reduce the dimensionality of the model, use regularization/penalization (the L1 or L2 norm, or a combination of both, i.e. the elastic net), or use least angle regression (LAR).
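A brief sketch of those alternatives using scikit-learn (purely illustrative; the design matrix X and response y are assumed to exist already):

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet, Lars, Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Option 1: project the correlated predictors onto a few principal components,
# then regress on those components (principal components regression)
pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())

# Option 2: penalized regression -- L2 (ridge), L1 (lasso), or both (elastic net)
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
enet = make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5))

# Option 3: least angle regression (LAR)
lar = make_pipeline(StandardScaler(), Lars())

# Each model is then fitted as usual, e.g. pcr.fit(X, y)
```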
Another solution is to combine the collinear variables into a single predictor.
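One simple way to do this, sketched below on two hypothetical collinear predictors, is to standardize them and average them into a single composite that replaces both in the regression.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500

# Two highly collinear predictors, possibly measured on different scales
x1 = rng.normal(size=n)
x2 = 2 * x1 + 0.1 * rng.normal(size=n)

# Standardize each variable, then average them into one composite predictor
z1 = (x1 - x1.mean()) / x1.std()
z2 = (x2 - x2.mean()) / x2.std()
composite = (z1 + z2) / 2

# Use `composite` in the regression in place of x1 and x2
```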
I believe this description clarifies all three of your questions.
Read more: I've written several answers to similar queries on ResearchGate; a few of the links are: https://www.researchgate.net/post/how_to_deal_with_multicolinearity#view=580ef132ed99e1c1046fcf01
https://www.researchgate.net/post/How_to_explain_the_difference_between_collinearity_and_correlation_And_what_is_the_relationship_between_them#view=5cdebfc7d7141b76ae6bb8c6