I am researching the association between a binary variable Y and a discrete variable X. When I run a t-test on the levels of Y within X, I get a non-significant association (p-value = 0.3). A friend told me to add more independent variables and run a regression. This would help decrease the variance of the estimated effect of X on Y, and I might get rid of some of the noise this way. The association, controlling for other factors, might therefore become significant.
I have three questions:
- Is this a viable solution?
- Which variables do I need to include? Are they the confounders that I suspect affect both X and Y?
- I have a hard time getting the intuition behind regression. How is the variance reduced this way? How should I interpret the regression results in this case (assuming the p-value drops heavily)?
p.s.: Some context:
- What kind of regression are you trying to perform? I can do whatever fits. At this proof-of-concept stage I am doing a simple multiple linear regression, and I plan to move to more complex ML models over time.
- What actually are Y and X? Context? Let's say X is a self-reported survey value ("How happy are you with our product?") and Y is customer loyalty (whether they come back to our store within the next year). If we make an effort to increase X by 10%, can we expect users to come back to the store more often? Something along those lines.
- How are you fitting your regression? SPSS, Excel, R, Python..? Python mainly. R also works.
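To make my friend's suggestion concrete, here is a minimal sketch of the comparison I have in mind, run on simulated data rather than my real data. Everything in it is an assumption for illustration: the covariate `z` (say, a hypothetical loyalty-programme flag), the effect sizes, the sample size, and the helper `ols`. It fits a linear regression of Y on X alone and then on X plus `z`, and compares the standard error of the X coefficient.

```python
import numpy as np


def ols(y, *columns):
    """Least-squares fit of y on an intercept plus the given columns.

    Returns (coefficients, standard errors), intercept first.
    """
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Unbiased estimate of the residual variance.
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


# Simulated data: all numbers below are made up for illustration.
rng = np.random.default_rng(0)
n = 2000
x = rng.integers(1, 6, size=n)   # survey score on a 1..5 scale
z = rng.integers(0, 2, size=n)   # hypothetical covariate, independent of x
p = 0.14 + 0.03 * x + 0.5 * z    # assumed true probability of returning
y = (rng.random(n) < p).astype(float)

beta_raw, se_raw = ols(y, x)      # Y ~ X
beta_adj, se_adj = ols(y, x, z)   # Y ~ X + Z

print("X coefficient without z:", beta_raw[1], "SE:", se_raw[1])
print("X coefficient with z:   ", beta_adj[1], "SE:", se_adj[1])
```

In this simulated setup `z` is independent of X and explains part of Y, so including it should shrink the residual variance and with it the standard error on X. In practice I would probably fit the same two models with `statsmodels` (for example `smf.ols("y ~ x + z", data=df)`) rather than by hand; numpy is used here only to keep the sketch self-contained.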