Logistic regression interpretation

Question

I have trouble interpreting logistic regressions, and reading through several sources has only confused me more. Perhaps an example, along with my line of thinking, will clarify what exactly I'm getting wrong.

Say we have a data set of sales for some generic product; let's make it cars for the sake of example (the data is completely made up, so the results may well lack any logic). We have a sample of car owners, some of whom own a particular car while the rest own some other car. The goal is to estimate the probability of selling this particular car to a potential client with a given set of parameters or, put another way, to segment the market artificially and rank the segments by sales probability. We'll define 2 parameters: age of client and age of car. And let's make it all categorical: 3 client age groups and 3 car age groups.

car: 1 0
client_age: [under 30] [31 - 40] [over 41]
car_age: [under 2] [2 - 5] [over 5]

So, regression would look like car ~ client_age + car_age

And the output:

Intercept: -0.8
client_age[31 - 40]: 0.5
client_age[over 41]: -0.6
car_age[2 - 5]: 0.2
car_age[over 5]: -0.9

With all coefficients being significant at 95%.

So, now my course of thinking. The overall probability of buying the particular car would simply be the ratio of car == 1 to the sample size. Let's make it 3%. And as I understand it, logistic regression with categorical variables shows the change in odds (the exponent of the coefficient) relative to the baseline. E.g. client_age[31 - 40] would have exp(0.5), or 1.649, i.e. about 65% higher odds of owning this particular car than client_age[under 30], with everything else held constant. Similarly, client_age[over 41] would have exp(-0.6), or 0.549, i.e. about 45% lower odds of owning this car than client_age[under 30]. The same applies to car age.
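The exponentiation step can be checked with a short sketch. It uses only the made-up coefficients quoted in the output; the names `coefs` and `odds_ratios` are illustrative assumptions, not part of any fitted model object.

```python
import math

# Made-up coefficients from the output above (log-odds scale).
coefs = {
    "client_age[31 - 40]": 0.5,
    "client_age[over 41]": -0.6,
    "car_age[2 - 5]": 0.2,
    "car_age[over 5]": -0.9,
}

# Exponentiating a coefficient gives the odds ratio relative to the
# reference level of that variable (under 30 / under 2 here).
odds_ratios = {name: math.exp(beta) for name, beta in coefs.items()}

for name, ratio in odds_ratios.items():
    change = (ratio - 1) * 100
    print(f"{name}: odds ratio {ratio:.3f} ({change:+.0f}% change in odds)")
```

Note that these are ratios of odds, not of probabilities, so a ratio of 1.649 does not by itself translate into a fixed percentage change in probability.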

So, my list of questions:

What is the role/interpretation of intercept?
Would it be possible to get all combinations of categorical values and rank them by probability relative to average 3% probability of buying the car?
Is there a point in using logistic regression in this particular example? By that I mean - estimating logistic regression only with categorical variables when I can simply calculate probability for every particular subset of general data set? How would these results compare?

Have you searched our site? See e.g [Interpretation of reference category in logistic regression](http://stats.stackexchange.com/q/33240/17230), [Intercept term in logistic regression](http://stats.stackexchange.com/q/92903/17230), or [Interpreting Intercept when doing logistic regression with categorical data in R](http://stats.stackexchange.com/q/63494/17230) for answers to your first question. — Scortchi - Reinstate Monica, Nov 16 '15 at 13:40

score 2 · Accepted Answer · answered Nov 16 '15 at 15:33

First off, you need to understand that, because the "regression function" is S-shaped, its first derivative is not constant: the effect of a coefficient on the probability depends on where on the curve you start. You therefore need to calculate the marginal effect at a certain point. By contrast, in a linear regression we do not have to care about that, since the slope of a line is the same everywhere on the line. You have several options: calculate the marginal effect at the mean (often the default option), at the median, or at particular data points.
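As a rough illustration of why the evaluation point matters, the sketch below applies the same coefficient (0.5, taken from the question) at three baseline log-odds values. The baselines -3.0 and 0.0 are hypothetical values chosen for contrast; only -0.8 comes from the question.

```python
import math

def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

beta = 0.5  # coefficient of client_age[31 - 40] from the question

# -0.8 is the question's intercept; -3.0 and 0.0 are hypothetical baselines.
for baseline in (-3.0, -0.8, 0.0):
    p0 = inv_logit(baseline)
    p1 = inv_logit(baseline + beta)
    # Derivative of the logistic curve at the baseline, scaled by beta.
    slope = beta * p0 * (1 - p0)
    print(f"baseline p={p0:.3f}  shifted p={p1:.3f}  "
          f"change={p1 - p0:+.3f}  local slope={slope:.3f}")
```

The same shift in log-odds produces a much smaller change in probability when the baseline probability is close to 0 than when it is nearer 0.5.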

That being said:

The intercept is the log-odds of owning the car when all the other terms are zero, which for categorical predictors means the reference categories (here client_age[under 30] and car_age[under 2]). Applying the inverse logit turns it into a probability: 1 / (1 + exp(0.8)) ≈ 0.31. Make sure that zero is in the definition domain of your data. The coefficients are differences in log-odds relative to the reference class; evaluated at a certain point, they give the usual interpretation: "if I face a client in the category 31-40, my probability of selling him the car increases by XX% with respect to my reference class, all else being equal."
What do you mean?
Yes, in my opinion it makes sense to use a logistic regression in this context. You can use categorical data, but pay attention to your reference class.
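One possible reading of the second question is "list every combination of categories with its predicted probability and sort them". A minimal sketch of that, assuming the made-up coefficients from the question and treating [under 30] and [under 2] as the reference levels (coefficient 0), could look like this. The names `client_age`, `car_age` and `segments` are illustrative only.

```python
import math
from itertools import product

# Made-up coefficients from the question; reference levels get 0.0.
intercept = -0.8
client_age = {"under 30": 0.0, "31 - 40": 0.5, "over 41": -0.6}
car_age = {"under 2": 0.0, "2 - 5": 0.2, "over 5": -0.9}

def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

segments = []
for (c_label, c_beta), (a_label, a_beta) in product(client_age.items(),
                                                    car_age.items()):
    # In the additive model, log-odds are the intercept plus one term per variable.
    log_odds = intercept + c_beta + a_beta
    segments.append((c_label, a_label, inv_logit(log_odds)))

# Rank segments from highest to lowest predicted probability.
segments.sort(key=lambda s: s[2], reverse=True)

for c_label, a_label, p in segments:
    print(f"client {c_label:>8} | car {a_label:>7} | p = {p:.3f}")
```

With these invented numbers the predicted probabilities are nowhere near the 3% mentioned in the question, which is expected since the data is made up. Also note that this additive model has 5 parameters for 9 cells, so its predictions will generally differ from the raw proportion in each cell unless interaction terms are added.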
