I am trying to predict whether households use a certain service (TRUE or FALSE) from various variables, using LASSO-penalized logistic regression.

Among many others, I have the variables percentage man and percentage woman, which have a Pearson correlation coefficient of -0.85 with each other. However, when I run the logistic regression, they get beta coefficients of 3.34 and 3.16 respectively, which puts both of them in the top 40 most predictive of the 150 variables I use.

How can they both be positive predictors of the label when they are so negatively correlated with each other?

EDIT: Some extra info that might be of interest: percentage man has a Pearson correlation of 0.041 with the outcome variable, and percentage woman has a correlation of -0.045.

Tom
    In an answer at https://stats.stackexchange.com/a/46508/919 I provide a dataset with very nearly these properties: two variables, `x1` and `x2`, have a correlation coefficient of $-0.78.$ That example is constructed to make the coefficients $\beta$ equal to $5$ and $-1,$ but by changing them both to positive numbers (try $5$ for both and generate 40 points) you can create a dataset with the properties you describe. *Ergo,* the explanations of this phenomenon in that thread will answer your question. (There is no essential difference between logistic regression and OLS for this purpose.) – whuber Jun 03 '19 at 15:27
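The construction described in the comment can be sketched with a small simulation. This is only an illustration under stated assumptions: the data are simulated rather than taken from the question, the correlation of -0.85 and the true coefficients of 3 are arbitrary choices, the helper `fit_logistic` is a hypothetical name, and the fit is a plain unpenalized logistic regression via Newton's method rather than LASSO.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two simulated predictors with a strongly negative correlation (about -0.85).
cov = np.array([[1.0, -0.85], [-0.85, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# Assumed true model: both coefficients are positive.
beta_true = np.array([3.0, 3.0])
p = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = rng.binomial(1, p)


def fit_logistic(X, y, n_iter=25):
    """Unpenalized logistic regression fitted by Newton-Raphson steps."""
    A = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(A @ beta)))  # fitted probabilities
        w = mu * (1.0 - mu)                     # logistic variance weights
        grad = A.T @ (y - mu)                   # gradient of the log-likelihood
        hess = A.T @ (A * w[:, None])           # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta


beta_hat = fit_logistic(X, y)  # [intercept, coef for x1, coef for x2]
corr_x = np.corrcoef(X[:, 0], X[:, 1])[0, 1]

print("correlation between predictors:", round(corr_x, 3))
print("fitted coefficients (x1, x2):", np.round(beta_hat[1:], 2))
```

Each fitted coefficient measures the association of one predictor with the outcome while the other is held fixed, so the negative correlation between the two predictors does not prevent both coefficients from being positive.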
