Normalize discrete variables in logistic regression?

Question

I am running a logistic regression model to test the effects of task variables on choice (left/right). I set up a logistic regression model per subject and later test the regression coefficients against zero across subjects. One predictor is continuous, and I normalize it to account for the different possible values across subjects. One regressor is binary, and I don't normalize it. One regressor can take on four different values (10, 20, 30, 40); their order and distances are meaningful, but it is still a discrete variable. Would you normalize this regressor? The results differ depending on whether I do or not, and I wanted to hear your opinion.

I use MATLAB's `glmfit` to regress `y` on the design matrix `x` with the following options: `betas = glmfit(x, y, 'binomial', 'link', 'logit')`. When I normalize all variables, the regression weights for one example subject are (-7.14, 4.283, -0.47, -0.49; intercept included). When I normalize only the continuous variable `x1`, the weights are (-5.51, 4.283, -0.088, -1.01).

The t-values against zero across all participants are [41.52, -3.985, -0.032] if I normalize all variables. If I normalize only the continuous variable, they are [20.14, -3.89, -0.48].
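To make concrete what z-scoring does to the four-valued regressor, here is a minimal Python sketch (Python rather than MATLAB; the trial sequence is hypothetical, and the sample standard deviation with n - 1 is assumed to match MATLAB's `zscore` default). It shows that the regressor still takes exactly four values after z-scoring, that their order is kept, and that the equal spacing of 10, 20, 30, 40 becomes an equal spacing of 10 / sd.

```python
import statistics

def zscore(values):
    """z-score a list: subtract the mean, divide by the sample SD (n - 1).
    Assumed to match MATLAB's zscore default."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values], mu, sd

# Hypothetical trials for one subject: each level of the four-valued
# regressor appears five times.
x3 = [10, 20, 30, 40] * 5
z3, mu, sd = zscore(x3)

levels = sorted(set(z3))                            # still exactly four distinct values
gaps = [b - a for a, b in zip(levels, levels[1:])]  # spacing between adjacent levels
print(f"mean={mu}, sd={sd:.4f}")
print("z-scored levels:", [round(v, 4) for v in levels])
print("gaps between levels:", [round(g, 4) for g in gaps])
```

Because z-scoring is an affine map, it neither makes the variable continuous nor distorts its ordering or relative distances; it only changes the units in which its coefficient is expressed.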

Can you specify **exactly** what you did to *normalize* the data? Normally (pun intended) it should not matter for logistic regression, *unless you are using regularization or something else you didn't tell us*. Please tell us more of the context. — kjetil b halvorsen, Aug 05 '19 at 09:04
Sure! I z-score the data and I do not use any kind of regularization. — Laurie, Aug 05 '19 at 09:36
Then, can you explain in which sense the results differ? They should not ... Edit your post to include some computer output — kjetil b halvorsen, Aug 05 '19 at 09:39

score 1 · Answer 1 · answered Aug 05 '19 at 10:48


From your latest edit we can see that the estimated coefficients (which you call weights) have changed. They must, since their role is to be multiplied with the $x$'s, which were changed by the normalization (which I would have called standardization). But the models are equivalent, in the sense that the fitted probabilities (logistic regression is a regression for probabilities) will be the same.
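For a single regressor $x$ with sample mean $\bar{x}$ and standard deviation $s_x$, the equivalence can be written out explicitly (a short derivation added for illustration). Substituting $x_i = \bar{x} + s_x z_i$ into the linear predictor gives

$$
\operatorname{logit} p_i = \beta_0 + \beta_1 x_i = \underbrace{(\beta_0 + \beta_1 \bar{x})}_{\gamma_0} + \underbrace{(\beta_1 s_x)}_{\gamma_1}\, z_i, \qquad z_i = \frac{x_i - \bar{x}}{s_x}.
$$

So the slope on the z-scored regressor is $\gamma_1 = \beta_1 s_x$, the intercept absorbs $\beta_1 \bar{x}$ (it becomes the log-odds at the mean of $x$ rather than at $x = 0$), and the linear predictor, hence every fitted probability, is unchanged. With several z-scored regressors the intercept absorbs $\sum_j \beta_j \bar{x}_j$.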

To check that, ask your software for the fitted probabilities and compare them. A simple way is to get the two sets of fitted probabilities and plot them against each other. I don't know how you do that in MATLAB, but it should be simple.
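A minimal sketch of that check, written in Python rather than MATLAB. Assumptions: a hand-written Newton-Raphson fit for an intercept plus one regressor stands in for `glmfit(x, y, 'binomial', 'link', 'logit')`, and the choices `y` are invented. The sketch fits the raw and the z-scored regressor separately and compares the fitted probabilities and coefficients.

```python
import math
import statistics

def fit_logit(x, y, iters=50):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x; returns (b0, b1).
    Assumes the MLE exists (no perfect separation)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Fisher information X'WX (2x2, symmetric)
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * delta = g for the 2x2 case
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

def fitted(b0, b1, x):
    """Fitted probabilities for a given intercept and slope."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

# Hypothetical single-subject data: four-level regressor and left/right choices
x = [10, 10, 10, 10, 20, 20, 20, 20, 30, 30, 30, 30, 40, 40, 40, 40]
y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0]

mu, sd = statistics.mean(x), statistics.stdev(x)
z = [(xi - mu) / sd for xi in x]

b_raw = fit_logit(x, y)
b_z = fit_logit(z, y)

p_raw = fitted(*b_raw, x)
p_z = fitted(*b_z, z)
max_diff = max(abs(a - b) for a, b in zip(p_raw, p_z))
print("raw:", b_raw, "z-scored:", b_z, "max |p_raw - p_z|:", max_diff)
```

The coefficients differ (the slope is multiplied by `sd` and the intercept shifts by `slope * mu`), while the fitted probabilities agree up to floating-point error.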


kjetil b halvorsen


Thanks for your answer. I get why the regression weights are different. What I don't understand is why the t-values across participants differ that much. Sure, the weights should be different across participants, but shouldn't the variance of the distribution of weights change accordingly, too? – Laurie Aug 06 '19 at 07:34
Can you please post complete output? The $t$-values should not differ, but I cannot guess more at what has happened without looking at the output. – kjetil b halvorsen Aug 06 '19 at 07:51
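One mechanism that can make across-subject $t$-values differ, sketched in Python under stated assumptions: if each subject's regressor is z-scored with that subject's own standard deviation $s$, each subject's slope is multiplied by its own $s$. A one-sample $t$-statistic is unchanged by a common rescaling but not by subject-specific ones. The slopes and standard deviations below are invented for illustration, not taken from the question.

```python
import math
import statistics

def one_sample_t(values):
    """t statistic of the mean against zero: mean / (sd / sqrt(n))."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))

# Hypothetical per-subject slopes on the raw (unnormalized) regressor
raw_slopes = [-0.02, -0.05, -0.01, -0.04, -0.03, 0.01]
# Hypothetical per-subject SDs of that regressor
common_sd = [11.2] * 6                        # same SD for every subject
varying_sd = [5.0, 15.0, 8.0, 12.0, 20.0, 6.0]  # SD differs across subjects

t_raw = one_sample_t(raw_slopes)
# Per-subject z-scoring multiplies each subject's slope by its own SD
t_common = one_sample_t([b * s for b, s in zip(raw_slopes, common_sd)])
t_varying = one_sample_t([b * s for b, s in zip(raw_slopes, varying_sd)])
print(f"raw t={t_raw:.3f}, common-SD t={t_common:.3f}, varying-SD t={t_varying:.3f}")
```

If every subject saw the four levels in the same proportions, their standard deviations would match and the slope $t$-values would agree. The intercept's $t$-value can still change, because z-scoring moves the intercept to the log-odds at the regressors' means.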

