Problem statement:
I'm running a multiple regression on data from an RCT, to confirm that the treatment is effective and to quantify the effect size. Initially, when I regressed the dependent variable on a set of treatment dummy variables, I got significant results, although my pseudo $R^2$ wasn't impressive (0.3%).
When I added some control variables to 'disentangle effects', the pseudo $R^2$ rose to 1.3%, which is great. The treatment dummies remained significant; two of the control variables were significant and two were not.
When I started adding interaction terms (20 to 30 of them), practically all coefficients, including those for the treatment dummies, lost significance. Yet the pseudo $R^2$ rose to about 2%.
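A small simulation can mirror this pattern. It assumes the reported pseudo $R^2$ is McFadden's, $1 - \ln\hat L_{\text{model}} / \ln\hat L_{\text{null}}$ (for example, the "Pseudo R2" in Stata's `logit` output is McFadden's; the post does not say which variant is used). Because each larger model nests the smaller one, its maximized log-likelihood cannot be lower, so this pseudo $R^2$ can only rise as terms are added, even terms that are pure noise. The data, effect sizes, and the helper names `solve`, `log_lik`, and `fit_logit` are hypothetical, and the fitter is a bare-bones Newton-Raphson sketch in plain Python, not a production routine.

```python
import math
import random


def solve(a, b):
    # Gaussian elimination with partial pivoting for the small system a x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def log_lik(X, y, beta):
    # Bernoulli log-likelihood of a logit model: sum of y*eta - log(1 + e^eta).
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        if eta > 0:
            softplus = eta + math.log1p(math.exp(-eta))
        else:
            softplus = math.log1p(math.exp(eta))
        total += yi * eta - softplus
    return total


def fit_logit(X, y, max_iter=50, tol=1e-10):
    # Newton-Raphson for logistic regression; returns the maximized
    # log-likelihood, which is all that McFadden's pseudo R^2 needs.
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for c in range(k):
                    info[j][c] += w * xi[j] * xi[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return log_lik(X, y, beta)


rng = random.Random(1)
n = 400
treat = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
# Hypothetical true model: only the treatment dummy affects the outcome.
y = [1 if rng.random() < 1 / (1 + math.exp(-(-0.2 + 0.4 * t))) else 0 for t in treat]

ll_null = fit_logit([[1.0] for _ in range(n)], y)  # intercept-only model
X = [[1.0, t] for t in treat]                      # intercept + treatment dummy
r2s = []
for n_noise in range(11):
    if n_noise > 0:
        for row in X:
            row.append(rng.gauss(0.0, 1.0))  # pure-noise term, unrelated to y
    r2s.append(1.0 - fit_logit(X, y) / ll_null)  # McFadden's pseudo R^2

for n_noise, r2 in enumerate(r2s):
    print(f"{n_noise:2d} noise terms: pseudo R^2 = {r2:.4f}")
```

None of the added columns is related to the outcome, yet the printed values never go down, so a rising pseudo $R^2$ on its own cannot show that the added terms matter.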
I'm trying to work out when to stop and decide that I've found the best model, but the sheer number of terms is confusing me, so I want to clear up my own conceptual misunderstandings.
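One way to make the stopping decision concrete is to compare nested models with a likelihood-ratio test or an information criterion rather than by pseudo $R^2$ alone. This assumes the reported pseudo $R^2$ is McFadden's, $R^2_M = 1 - \ln\hat L_M / \ln\hat L_0$, where $\hat L_0$ is the intercept-only likelihood (the post does not say which variant is used). Here $k$ is the number of estimated parameters, and model $B$ has $\Delta k$ more parameters than model $A$, both fitted to the same $n$ observations:

$$\mathrm{AIC} = 2k - 2\ln\hat L, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat L$$

$$\mathrm{LR} = 2\left(\ln\hat L_B - \ln\hat L_A\right) = 2\,\lvert\ln\hat L_0\rvert\left(R^2_B - R^2_A\right) \;\overset{a}{\sim}\; \chi^2_{\Delta k} \quad \text{under } H_0\text{: all extra coefficients are } 0$$

$$\mathrm{AIC}_B < \mathrm{AIC}_A \iff \mathrm{LR} > 2\,\Delta k \iff \lvert\ln\hat L_0\rvert\left(R^2_B - R^2_A\right) > \Delta k$$

Plugging in the post's numbers purely as an illustration, with $R^2_B - R^2_A \approx 0.020 - 0.013 = 0.007$ and a hypothetical $\Delta k = 25$ (the midpoint of 20 to 30 interaction terms), AIC would favor the larger model only if $\lvert\ln\hat L_0\rvert > 25 / 0.007 \approx 3571$. For a binary outcome $\lvert\ln\hat L_0\rvert \le n\ln 2$, so that would need more than roughly 5,150 observations. BIC, which charges $\ln n$ per parameter instead of 2, is stricter still whenever $n > e^2 \approx 7.4$.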
.....
Question(s):
Why does the pseudo $R^2$ keep increasing (and the model as a whole remain significant) as more terms are added, even though practically all the individual coefficients are nonsignificant? What does this mean?
If multiple regression is about 'disentangling effects', why would terms that were originally significant become nonsignificant after more terms are added? Shouldn't they 'hold their ground', since they are significant (or nonsignificant)?
At what point do I stop and decide that a model is best? Is it the one with 40 nonsignificant coefficients but the highest pseudo $R^2$, or the parsimonious one with fewer coefficients (4 significant, 2 nonsignificant) and a lower pseudo $R^2$?
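On the loss of significance, one mechanism worth checking is collinearity between each treatment dummy and the interaction terms built from it. Once an interaction such as treatment × X is in the model, the treatment coefficient (on the log-odds scale, for a logit) refers to the effect at X = 0 rather than to an average effect, and when X sits far from zero the dummy and the interaction are strongly correlated, which inflates standard errors. The sketch below assumes a hypothetical control `age` drawn from N(40, 10) and an independent 50/50 assignment; it is not the poster's data. It compares the correlation of the dummy with a raw and with a mean-centered interaction, plus the factor $1/(1 - r^2)$ by which the coefficient's variance would be inflated if these were the only two predictors in a linear model.

```python
import random
import statistics


def pearson(a, b):
    # Pearson correlation, written out to avoid depending on Python 3.10+.
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(7)
n = 5000
treat = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
age = [rng.gauss(40, 10) for _ in range(n)]  # hypothetical control variable
age_mean = statistics.fmean(age)

raw_inter = [t * a for t, a in zip(treat, age)]
centered_inter = [t * (a - age_mean) for t, a in zip(treat, age)]

r_raw = pearson(treat, raw_inter)
r_centered = pearson(treat, centered_inter)

for label, r in [("raw", r_raw), ("centered", r_centered)]:
    # Two-predictor variance inflation for the treatment coefficient.
    print(f"{label:8s} corr(treat, treat*age) = {r:+.3f}, inflation = {1 / (1 - r * r):.1f}")
```

Centering the control before forming the interaction leaves the fitted model (and the pseudo $R^2$) unchanged, because the design spans the same columns; it only changes what the treatment coefficient refers to and how precisely it is estimated.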