This is one of those situations where what is theoretically true and what is true in practice can be quite different. I'll try to give an example.
Let's suppose we have centered and standardized both $X$ and $y$ so that:
- The predictor covariance matrix is $\Sigma = X^{t} X$; since the predictors are standardized, its diagonal entries equal $1$.
- The intercept estimate is $\beta_0 = 0$.
I'll focus on the case of linear regression, and try to say something about general GLMs at the end. I'll also assume we have only two predictors, since that case captures all the essential points of the situation.
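To make this concrete, here is a minimal sketch of that setup in Python/NumPy (the sample size, the correlation level 0.8, and the coefficients 1.5 and -0.5 are arbitrary choices for illustration); the later snippets build on it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two correlated predictors and a response that depends on both.
rho = 0.8
X = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# Center and standardize both X and y, as assumed above.
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = (y - y.mean()) / y.std()
```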
The solution parameter estimates for a linear model satisfy the equation
$$ X^t X \beta = X^t y $$
which, under our assumptions, can be written as
$$ \Sigma \beta = X^t y $$
On the right hand side, we are simply taking the dot product of the response vector $y$ with each column of $X$, so we can write
$$ X^t y = \left( \begin{array}{c} cov(X_1, y) \\ cov(X_2, y) \end{array} \right) $$
On the left hand side, we get
$$ \Sigma \beta = \left( \begin{array}{c} \beta_1 + cov(X_1, X_2) \beta_2 \\ cov(X_1, X_2) \beta_1 + \beta_2 \end{array} \right) $$
So the system of equations is
$$ \beta_1 + cov(X_1, X_2) \beta_2 = cov(X_1, y) $$
$$ cov(X_1, X_2) \beta_1 + \beta_2 = cov(X_2, y) $$
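Continuing the sketch above, this $2 \times 2$ system can be set up and solved numerically, and the solution agrees with an off-the-shelf least-squares fit:

```python
# Build the system Sigma * beta = X^t y. With standardized data these entries
# are the sample covariances, and the diagonal of Sigma is exactly 1.
Sigma = X.T @ X / n      # [[1, cov(X_1, X_2)], [cov(X_1, X_2), 1]]
xty = X.T @ y / n        # [cov(X_1, y), cov(X_2, y)]

beta = np.linalg.solve(Sigma, xty)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta, beta_lstsq)  # the two should agree up to floating point
```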
As a sanity check, if the predictors are uncorrelated, we get
$$ \beta_1 = cov(X_1, y) $$
$$ \beta_2 = cov(X_2, y) $$
which is intuitive.
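As a quick numerical check of this special case (a fresh pair of predictors drawn with zero population correlation; the sample correlation is only approximately zero, so the match is approximate):

```python
# Uncorrelated predictors: each coefficient is (approximately) just cov(X_j, y).
Z = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=n)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
v = 1.5 * Z[:, 0] - 0.5 * Z[:, 1] + rng.normal(size=n)
v = (v - v.mean()) / v.std()

beta_uncorr, *_ = np.linalg.lstsq(Z, v, rcond=None)
print(beta_uncorr)   # close to ...
print(Z.T @ v / n)   # ... the plain covariances cov(Z_1, v) and cov(Z_2, v)
```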
So now, what if the predictors are correlated? Then we can solve the system by multiplying the bottom equation through by $cov(X_1, X_2)$ to get
$$cov(X_1, X_2)^2 \beta_1 + cov(X_1, X_2) \beta_2 = cov(X_1, X_2) cov(X_2, y) $$
Then subtracting this from the top equation cancels out the $\beta_2$ terms
$$(1 - cov(X_1, X_2)^2) \beta_1 = cov(X_1, y) - cov(X_1, X_2) cov(X_2, y) $$
which can be solved for $\beta_1$:
$$\beta_1 = \frac{cov(X_1, y) - cov(X_1, X_2) cov(X_2, y)}{(1 - cov(X_1, X_2)^2) } $$
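Continuing the sketch, this closed-form value matches the first coefficient from the least-squares solve above:

```python
# Plug the sample covariances into the closed-form expression for beta_1.
c1y = X[:, 0] @ y / n          # cov(X_1, y)
c2y = X[:, 1] @ y / n          # cov(X_2, y)
c12 = X[:, 0] @ X[:, 1] / n    # cov(X_1, X_2)

beta1_closed = (c1y - c12 * c2y) / (1.0 - c12 ** 2)
print(beta1_closed, beta[0])   # should agree up to floating point
```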
So what do we see here?
- The numerator is what you would get naively if you first regressed $y$ on $X_2$, and then regressed the residuals on $X_1$.
- The denominator is the "correction" to the above procedure. If you stop after the step-by-step procedure in the first bullet point, you under-explain the variance in $y$ due to $X_1$ and $X_2$. This makes sense, because that procedure ignores the additional variance arising from the fact that $X_1$ and $X_2$ tend to move together.
- If $X_1$ and $X_2$ are tightly correlated, then the denominator is close to zero, so any errors in estimating $cov(X_1, X_2)$ get magnified in the final coefficient estimates. This explains why parameter estimates can be so unstable in high-correlation situations; the simulation sketch below illustrates this.
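A small simulation makes the last point vivid. This sketch (sample size, replication count, and true coefficients are again arbitrary choices) compares the spread of the fitted coefficients across repeated samples at low versus high predictor correlation:

```python
def coef_spread(rho, n=200, reps=2000, seed=1):
    """Standard deviation of the fitted coefficients across repeated samples."""
    rng = np.random.default_rng(seed)
    betas = np.empty((reps, 2))
    for i in range(reps):
        X = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
        y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)
        betas[i], *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas.std(axis=0)

print(coef_spread(0.0))   # modest sampling variability
print(coef_spread(0.99))  # much larger variability in both coefficients
```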
An analysis of a general GLM is much harder to work through, but I'll mention one thing. The GLM fitting algorithm reduces to a repeated application of a weighted linear fitting step (using Newton's method; this is usually called iteratively reweighted least squares). The same considerations hold at each step of that procedure, so you can see how the same general phenomena carry over to the final estimates as well.
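To make that a bit more concrete, here is a minimal sketch of IRLS for logistic regression (assuming a binary 0/1 response, no intercept, a fixed number of iterations, and no convergence or separation checks). Each iteration solves a weighted version of the same normal equations, so a nearly singular $X^t W X$ causes the same instability at every step:

```python
def irls_logistic(X, y, n_iter=25):
    """Fit logistic regression coefficients by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))   # fitted probabilities
        w = p * (1.0 - p)                # IRLS weights
        z = eta + (y - p) / w            # working response
        # Weighted normal equations: (X^t W X) beta = X^t W z.
        # With highly correlated predictors, X^t W X is close to singular,
        # just as X^t X was in the linear case.
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta
```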