I have a reasonable understanding of why multicollinearity is a problem in regression models, along the lines of this excellent post.
To summarise my understanding, for a regression model of $y = \alpha + \beta_1x + \beta_2z$ (where $x$ and $z$ are correlated), beta coefficient estimates (as well as being unstable) are difficult to interpret, as a situation where you might increase $z$ without increasing $x$ is unlikely to occur, and not supported by the data.
I understand multicollinearity is less harmful to purely predictive models than to explanatory or descriptive ones.
I'm interested in another interpretation:
If I decided to increase $z$, and let $x$ vary as it pleases in reaction, what would I see happen to $y$, accounting for the fact that $x$ is likely to move with $z$, and also have its own effect?
In other words, accepting the causal interpretation that $x$ and $z$ both cause $y$, and are themselves correlated to some extent (say 0.7), how would all three variables move if $z$ is (linearly) increased by some amount?
I've tried to model this sort of thing before, fitting $y = \alpha + \beta_1x + \beta_2z$ (model 1), and $x = \alpha + \beta_1z$ (model 2). Hypothetical increased $z$ values are produced, and the resulting $x$ values are predicted with model 2. The hypothetical $x$ and $z$ values are then used to predict $y$ using model 1. However, this feels very unsatisfactory: complicated simulations are required to capture uncertainty (I used `sim` in `arm`). Additionally, my gut tells me that, apart from being painfully inelegant, it's a bad idea for other reasons I can't put my finger on.
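To make the two-model procedure concrete, here is a minimal sketch in plain Python. It is not the code I actually ran: the data are simulated with made-up coefficients, the helper names (`centered_sum`, `fit_two_stage`, `total_effect`, `bootstrap_interval`) are hypothetical, and a nonparametric bootstrap stands in for `sim` from `arm` as the way of capturing uncertainty. It also assumes that $x$ really does respond to $z$ as model 2 says.

```python
import random

def centered_sum(u, v):
    """Sum of centered cross-products of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def fit_two_stage(x, z, y):
    """Least-squares slopes for model 1 (y ~ x + z) and model 2 (x ~ z)."""
    sxx, szz, sxz = centered_sum(x, x), centered_sum(z, z), centered_sum(x, z)
    sxy, szy = centered_sum(x, y), centered_sum(z, y)
    det = sxx * szz - sxz ** 2
    b_x = (szz * sxy - sxz * szy) / det   # slope on x in model 1
    b_z = (sxx * szy - sxz * sxy) / det   # slope on z in model 1
    g_z = sxz / szz                       # slope on z in model 2
    return b_x, b_z, g_z

def total_effect(x, z, y, delta=1.0):
    """Predicted change in y when z rises by delta and x follows via model 2."""
    b_x, b_z, g_z = fit_two_stage(x, z, y)
    return delta * (b_z + b_x * g_z)

def bootstrap_interval(x, z, y, n_boot=500, seed=0):
    """Percentile 95% interval for the total effect, resampling rows."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(total_effect([x[i] for i in idx],
                                  [z[i] for i in idx],
                                  [y[i] for i in idx]))
    draws.sort()
    return draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot) - 1]

# Simulated data: hypothetical coefficients, cor(x, z) of roughly 0.7.
rng = random.Random(1)
n = 300
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.7 * zi + rng.gauss(0, (1 - 0.49) ** 0.5) for zi in z]
y = [1.0 + 2.0 * xi + 0.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

estimate = total_effect(x, z, y)
low, high = bootstrap_interval(x, z, y)
slope_y_on_z = centered_sum(z, y) / centered_sum(z, z)  # y ~ z alone
print(f"total effect of z: {estimate:.3f} (95% interval {low:.3f} to {high:.3f})")
print(f"slope of y on z alone: {slope_y_on_z:.3f}")
```

Resampling whole rows and refitting both models on each resample propagates the uncertainty of model 2 into the prediction from model 1, which is the part that made the simulation approach feel complicated.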
- Is such an 'observational'/conditional-when-I-feel-like-it interpretation possible?
- Does anyone know of a better method for this interpretation?
- Can anyone recommend a paper or R package along these lines?
- Is the above multi-model mess at all valid?
I'm aware that a model along the lines of $y = \alpha + \beta_1z$ would yield a similar answer to the two-stage mess above, but would lose information in $x$.
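As far as I can work out, when both models are fitted by ordinary least squares on the same data, the point estimates agree exactly rather than just approximately. Writing $\gamma_1$ for the slope of model 2 (my notation here, to avoid reusing $\beta_1$), and $S_{uv}$ for the centred sum of cross-products of $u$ and $v$, the normal equation for $z$ in model 1 gives:

$$
S_{zy} = \hat\beta_1 S_{zx} + \hat\beta_2 S_{zz}
\quad\Longrightarrow\quad
\frac{S_{zy}}{S_{zz}} = \hat\beta_2 + \hat\beta_1 \frac{S_{zx}}{S_{zz}} = \hat\beta_2 + \hat\beta_1 \hat\gamma_1
$$

The left-hand side is the slope from regressing $y$ on $z$ alone, and the right-hand side is the two-stage effect of a unit increase in $z$. This only covers the linear, least-squares case; I would not expect it to carry over unchanged to models with other link functions.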
I understand that these ideas are similar to structural equation modelling, but apart from having scant knowledge of SEM, I'm yet to find an R package which allows flexibly extending these models with different link functions for proportional odds models, etc.