
I have data with one continuous dependent variable ($Y$) and 20 continuous independent variables ($X$s). I discretized $Y$ into two categories ($low$ and $high$). Then I applied independent-samples t tests, taking the discretized $Y$ as the grouping variable. According to the test results, of the 20 continuous variables, 5 variables had different means Y. Finally, I fitted a multivariate logistic regression model with the discretized $Y$ as the dependent variable. This time, none of the 20 variables had a significant p value.

How should I interpret this?

Thanks

Günal

    It's not at all clear to us what you mean by "5 variables had different means Y." Which five variables? Which different means of Y? Do you mean that only 5 of the 20 variables you tested appeared to be "statistically significant" when you performed t-tests one by one? If so, did you control for Type I error inflation when you did so? Also, why do you assume there should be consistency between a logistic regression and a t-test? One uses the logistic distribution, the other a t-distribution. They are entirely different tests on arbitrarily grouped data. – StatsStudent Apr 21 '19 at 15:15

1 Answer

  1. What exactly do you expect to be consistent? A t test gives you t and p values, while a logistic regression gives you odds ratios, Wald statistics and p values. So it is unclear to me which statistic you expect to be the same. The p values?

  2. Why do you expect the results to be the same? In a t test you compare the means of two groups, while a logistic regression estimates a function that maps the covariate values to p, the success probability that parameterizes the Bernoulli distribution of the outcome.

  3. As long as the predictors are not orthogonal (= uncorrelated), it can always happen that a predictor is significant in a bivariate analysis but no longer significant in a multivariate analysis. So the situation you describe doesn't seem unusual to me in this regard. The opposite can happen as well: in the case of suppression, a predictor that was not significant in a bivariate analysis can become significant in a multivariate analysis.
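
To make point 3 concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (the latent factor `z`, the sample size, the noise levels, and the helper names `welch_t` and `logit_wald_z`); it is not the asker's data. Five predictors all track the same latent factor, so each one separates the low/high groups clearly on its own, while in the joint logistic model their Wald statistics tend to be much smaller because the shared information is split among collinear predictors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 400, 5

# Hypothetical data: five predictors that all track one latent factor z,
# so they are strongly correlated with each other.
z = rng.normal(size=n)
X = z[:, None] + 0.1 * rng.normal(size=(n, k))
y_cont = z + rng.normal(size=n)
y = (y_cont > np.median(y_cont)).astype(float)  # median split into low/high


def welch_t(a, b):
    """Welch two-sample t statistic for the difference in means."""
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se


def fisher_info(A, beta):
    """Fitted probabilities and Fisher information of a logistic model."""
    p = 1.0 / (1.0 + np.exp(-A @ beta))
    return p, A.T @ (A * (p * (1.0 - p))[:, None])


def logit_wald_z(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; return Wald z per slope."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p, info = fisher_info(A, beta)
        beta = beta + np.linalg.solve(info, A.T @ (y - p))
    _, info = fisher_info(A, beta)  # information at the final estimate
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return (beta / se)[1:]  # drop the intercept


# One predictor at a time: compare its mean between the high and low groups.
t_uni = np.array([welch_t(X[y == 1, j], X[y == 0, j]) for j in range(k)])
# All predictors together in a single logistic regression.
z_multi = logit_wald_z(X, y)

print("one-at-a-time |t|:   ", np.round(np.abs(t_uni), 2))
print("joint model Wald |z|:", np.round(np.abs(z_multi), 2))
```

Comparing the two printed rows against the usual threshold of roughly 2 shows how far the one-at-a-time and joint results can diverge. How many joint coefficients stay below that threshold depends on the seed and on how strongly the predictors are correlated.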

@ "How should I interpret this?": You should choose the analysis that answers your research question. If you are interested in the mean difference between two groups, you can use the t test. If you want to predict a binary variable, you can use logistic regression. Although, as I understand it, you broke up a continuous variable into two groups, which can be a bad idea (see here or here). If your main problem is deciding which variables to include in the model because the bivariate and multivariate p values differ, you should read about variable selection methods (without ignoring the theoretical background).
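
As a rough illustration of why splitting a continuous variable can be a bad idea, the sketch below compares the correlation of a predictor with a continuous outcome to its correlation with the median-split version of the same outcome. The data are simulated and every name and number is an assumption. For bivariate normal data, a median split is known to shrink the correlation by a factor of about $\sqrt{2/\pi} \approx 0.80$, which means information is thrown away.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical data: one predictor with a linear effect on a continuous outcome.
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)
y_bin = (y > np.median(y)).astype(float)  # median split into low/high

r_cont = np.corrcoef(x, y)[0, 1]      # correlation with the continuous outcome
r_bin = np.corrcoef(x, y_bin)[0, 1]   # correlation with the dichotomized outcome
ratio = r_bin / r_cont

print(f"r with continuous Y:   {r_cont:.3f}")
print(f"r with dichotomized Y: {r_bin:.3f}")
print(f"ratio:                 {ratio:.3f}")
```

The weaker association after the split translates into lower power for any test based on the dichotomized outcome, which is one reason to model $Y$ as continuous if the research question allows it.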