I am doing some research on the association between a binary variable Y and a discrete variable X. When I run a t-test on the levels of Y within X, I get a non-significant association (p-value = 0.3). My friend told me to add more independent variables and run a regression. This would help decrease the variance of the estimated effect of X on Y, and I might get rid of some of the noise this way. Therefore the association, controlling for other factors, might become significant.

I have three questions:

  • Is this a viable solution?

  • What are the variables that I need to include? Are they the confounders that I guess can affect both X and Y?

  • I have a hard time getting the intuition behind regression. How is the variance reduced this way? How should I interpret the results of the regression in this case (assuming the p-value drops considerably)?

p.s.: Some context:

  • What kind of regression are you trying to perform? Whatever works at this proof-of-concept stage. For now I am doing a simple multiple linear regression, and I plan to move to more complex ML models as time passes.
  • What actually are Y and X? What is the context? Let's say X is a self-reported survey value ("How happy are you with our product?") and Y is the customer's loyalty (whether they come back to our store in the next year). If we make an effort to increase X by 10%, can we expect users to come back to the store more often? Something along those lines.
  • How are you fitting your regression: SPSS, Excel, R, Python? Mainly Python. R also works.
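
To make the friend's suggestion concrete, here is a minimal sketch on simulated data, not the real survey. Everything in it is an assumption made for illustration: the covariate `z`, the effect sizes, the sample size, and the helper `ols_with_se`. It fits a linear regression to a binary Y (a linear probability model, matching the "simple multiple linear regression" described above) once with X alone and once with X plus a covariate that also predicts Y, then compares the standard error of the X coefficient.

```python
import numpy as np

def ols_with_se(design, y):
    """Ordinary least squares: return coefficients and their standard errors."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = design.shape[0] - design.shape[1]        # n minus number of coefficients
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 2000                                           # hypothetical sample size

# Simulated stand-ins (assumptions, not real data):
x = rng.integers(1, 6, size=n).astype(float)       # satisfaction score, 1 to 5
z = rng.integers(0, 2, size=n).astype(float)       # made-up covariate, independent of x
p = 0.1 + 0.03 * x + 0.5 * z                       # assumed probability of returning
y = (rng.random(n) < p).astype(float)              # binary loyalty outcome

ones = np.ones(n)
beta_simple, se_simple = ols_with_se(np.column_stack([ones, x]), y)
beta_multi, se_multi = ols_with_se(np.column_stack([ones, x, z]), y)

# Index 1 is the coefficient on x in both fits.
print("x only : coef=%.4f  se=%.4f" % (beta_simple[1], se_simple[1]))
print("x and z: coef=%.4f  se=%.4f" % (beta_multi[1], se_multi[1]))
```

In this setup `z` explains part of Y that the first model treats as noise, so the residual variance shrinks and the standard error on X shrinks with it. The X coefficient in the second fit reads as the change in the probability of returning per one-point change in X, holding `z` fixed. Whether the real data would behave this way depends on which covariates are available and how they relate to X, which the simulation does not address.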
aghd
