I have a problem with a multiple regression I performed:
- model without constant term;
- one dependent continuous variable;
- first set of dummies: derived from 2 continuous variables. I used the median of each variable as a threshold to create two binary variables, and from these two binaries I derived 4 dummies, one for each combination (10, 01, 00, 11);
- second set of dummies: 3 dummies derived from one categorical variable;
- two continuous variables.
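To make the construction of the first set of dummies concrete, here is a minimal sketch on made-up values. The names `x1`, `x2`, `median_split` and the "strictly above the median" rule are assumptions for illustration only, not the actual data or coding.

```python
from statistics import median

# Hypothetical toy values standing in for the two continuous variables.
x1 = [2.0, 5.0, 7.0, 1.0, 9.0, 4.0]
x2 = [10.0, 3.0, 8.0, 6.0, 1.0, 12.0]

def median_split(values):
    """Return 1 where the value is strictly above the median, else 0 (assumed rule)."""
    m = median(values)
    return [1 if v > m else 0 for v in values]

b1 = median_split(x1)
b2 = median_split(x2)

# One dummy per combination of the two binaries: 10, 01, 00, 11.
combos = [(1, 0), (0, 1), (0, 0), (1, 1)]
dummies = {
    f"d{a}{b}": [1 if (u, v) == (a, b) else 0 for u, v in zip(b1, b2)]
    for a, b in combos
}

# Each observation falls in exactly one combination,
# so the four dummies add up to 1 in every row.
row_sums = [sum(col[i] for col in dummies.values()) for i in range(len(x1))]
```

Because the four dummies always sum to 1, the full set carries the same information as a constant term, which is why it can only be used in full when the model has no constant.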
This model has an R-squared of 98% (and a similar adjusted R-squared). I think it is too high, but I don't know how to interpret it correctly or how to assess whether it is valid. I know that R-squared tends to increase with the number of explanatory variables, but I don't know whether the number of dummies influences its value and its validity as an indicator of a good regression. Moreover, this model presents high VIF values, indicating collinearity: are the R-squared values still valid in that case or not?
I should add that I have also tested the model with a constant term (and $k-1$ and $n-1$ dummies, i.e. one reference dummy dropped from each set), which has a very low R-squared (around 10%) but no collinearity problems. I would use this model if only I could separate the effects of the two reference dummies from the constant term, but I don't know how to do it.
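For reference, many statistical packages compute R-squared differently when the constant is omitted. Assuming that is what happened here (this is an assumption, not something confirmed for the software used), the two definitions at play would be:

$$R^2_{\text{centered}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \qquad R^2_{\text{uncentered}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i y_i^2}$$

Since $\sum_i y_i^2 = \sum_i (y_i - \bar{y})^2 + n\bar{y}^2$, the uncentered version can be much larger whenever the mean of the dependent variable is far from zero, so the 98% and the 10% may not be directly comparable.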