
My case is quite simple.

Customers face a series of choices, each among a set of unique alternatives. For example:

  • Customer 1 chooses one of 1,2,3,4,5,6
  • Customer 1 chooses one of 7,8,9
  • Customer 2 chooses one of 10,11,12,13
  • Customer 2 chooses one of 14,15,16 etc

All the alternatives 1,2,... are unique and can't be naturally categorized. Every alternative has the same set of attributes. I'm interested in the effects of the alternatives' attributes on the customer's choice.

For example, a person travels through a country and chooses a hotel to stay in. Every day she is in a new city and the hotels to choose from are completely new. The hotels differ by price and rating. One can expect that price has a negative effect and rating a positive effect on the customer's choice.

It seems that multinomial logit is not applicable here as it needs a categorical dependent variable.

My understanding is that there should be a simple statistical method for the problem because mathematics behind it seems to be quite clear. Indeed, assume the alternatives have 2 continuous attributes.

Let $U= \alpha X+\beta Y$ be the utility function of the attributes $X$ and $Y$. Every observation can be represented by several points on the $(X,Y)$-plane. Assume there are four alternatives $A, B, C, D$ and $A$ is chosen. For every vector with its tail at $A$, calculate the fraction of points in the left half-plane defined by this vector. Thus one gets a circle partitioned into a union of arcs with corresponding numbers, such that the numbers corresponding to two opposite arcs sum to 1. [picture] After aggregation over observations, a similar partitioned circle appears. Then the tangents of the vectors from the arc with the maximal number are the estimators of $\alpha/\beta$.

8k14
  • I think you mean *categorical* instead of categorised. And from your description, it sounds like A1 - A4 are categorical, which means that a multinomial model should work. – mkt Jul 07 '17 at 20:57
  • Is the customer's choice binary (e.g. buy vs. not buy)? If so you don't need multinomial logistic regression. – Will Jul 07 '17 at 20:58
  • @Will, thanks. The customer chooses one of the alternatives. – 8k14 Jul 07 '17 at 21:07
  • Right, sorry that was reasonably clear I just didn't read the question properly. Your picture isn't showing for me, btw. Also some context might help, i.e. what are these alternatives, why are they different for each customer, and what are the attributes you're hoping to consider? – Will Jul 07 '17 at 21:14
  • @8k14 I'm not sure you got my point: are A1 - A4 (which you now call 1 - 4, I guess) distinct categories or not? https://en.wikipedia.org/wiki/Categorical_variable . Your question seems to contradict itself about this point. – mkt Jul 07 '17 at 21:18
  • @Will For example, a person travels through a country and chooses a hotel to stay. Every day she is in a new city and the hotels to choose are completely new. – 8k14 Jul 07 '17 at 21:25
  • @mkt 1-4 are not categories. What is the contradiction? – 8k14 Jul 07 '17 at 21:27
  • I'm getting a 404 on the picture file, too. The business about lines and circular arcs makes no sense to me: I do not see how these are an adequate metaphor for any statistical model. – whuber Jul 07 '17 at 21:39
  • @8k14 These statements are confusing to me: "Customers are faced with series of choices of unique alternatives." "All the alternatives 1,2,... are unique and can't be naturally categorized" None of this suggests an ordinal or continuous variable. Also, I second whuber ; the second half of this question is incomprehensible. – mkt Jul 07 '17 at 21:50
  • @mkt Alternatives have attributes which are continuous variables. For example, a hotel has a price and a distance from downtown. – 8k14 Jul 07 '17 at 21:58
  • My understanding is that a multinomial logit requires each actor to have the same choice set, so there is a problem there. You could have each person choosing from all choices (even the NA ones), but that could potentially violate IIA. My suggestion is to either look at a nested logit, if you have any variables that can predict what choice set each player will end up with, or check out [this paper](http://web.mit.edu/teppei/www/research/dchoice.pdf) on Random Utility Models with varying choice sets. – Yannis Vassiliadis Jul 07 '17 at 22:11
  • @Yannis Vassiliadis, thanks. The varying choice set is not the only issue in my problem. Even if the number of alternatives is constant, there is no reason why given alternatives from different observations should belong to the same category. – 8k14 Jul 07 '17 at 22:16
  • There *are* simple statistical models. They can be extremely difficult to fit, because they have to extract very partial information and in practice it can be impossible to assess goodness of fit. The problem is that the combination of each consumer's attributes with the attributes of the options needs to produce a "utility" to the consumer. However, we don't get to observe the utilities: all we know, from the choice the consumer made, is which option had the largest utility. Often we don't even know what option set was actually considered by the consumer! – whuber Jul 12 '17 at 22:46
  • @whuber Thanks. We can assume that the customer knows the option set. I understand that this data may not be sufficient to draw conclusions about the relative impact of the attributes. For example, in some cases one can get a well-fit estimator which belongs to a large interval, so that even the sign of $\alpha/\beta$ cannot be determined. – 8k14 Jul 13 '17 at 05:33

1 Answer


Take a simple case: choosing between three hotels $H_1,H_2,H_3$ depending on their prices $(x_1,x_2,x_3)$.

This sounds like multinomial logistic regression but is not (as far as I can see). As in multinomial regression, the feature vector is $X=(x_1,x_2,x_3)$ and there are three "fixed" categories (it does not matter that they are not the same hotels each time). But in multinomial logistic regression the linear predictor associated with each category depends on the whole of $X$. In terms of utility, it is as if the utility of each hotel depended on all of $(x_1,x_2,x_3)$. In your model, by contrast, the utility of $H_1$ depends only on $x_1$: $\beta x_1$. Moreover, the utility of $H_2$ depends on $x_2$ with the same coefficient (there is no reason to treat the "second" hotel differently from the first): $\beta x_2$.

One solution is to take inspiration from multinomial logistic regression and build your own model. I use the same notation as https://en.wikipedia.org/wiki/Multinomial_logistic_regression. The intercept can be dropped because it is redundant with the normalization constant $Z$. Call $h$ the choice variable (outcome), with values in $\{1,2,3\}$.

The model can be:

$P(h=k)=\frac{1}{Z}e^{\beta x_k}$, with $Z$ being the normalization constant $Z=\displaystyle\sum_{j=1}^3e^{\beta x_j}$

When the number of hotels varies, nothing changes in principle. Call $n$ the number of hotels in a given choice set:

$P(h=k|n\text{ choices})=\frac{e^{\beta x_k}}{\displaystyle\sum_{j=1}^n e^{\beta x_j}}$

Then you can find $\beta$ with MLE.
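As a minimal sketch of that MLE step (not a definitive implementation), the model above can be fitted with Newton's method, since the log-likelihood is concave in $\beta$. Everything named here is an assumption for illustration: the helper functions `choice_probs`, `log_likelihood` and `fit_beta`, and the small made-up data set `observations`, in which each entry is a list of prices plus the index of the chosen hotel. The choice sets deliberately have different sizes.

```python
import math

def choice_probs(beta, prices):
    """P(h=k) = exp(beta*x_k) / sum_j exp(beta*x_j) for one choice set."""
    m = max(beta * x for x in prices)  # subtract the max for numerical stability
    w = [math.exp(beta * x - m) for x in prices]
    z = sum(w)
    return [wi / z for wi in w]

def log_likelihood(beta, observations):
    """Sum of log P(chosen alternative) over all choice situations."""
    return sum(math.log(choice_probs(beta, xs)[k]) for xs, k in observations)

def fit_beta(observations, steps=50, tol=1e-10):
    """Newton's method for a single coefficient beta."""
    beta = 0.0
    for _ in range(steps):
        grad, info = 0.0, 0.0
        for xs, k in observations:
            p = choice_probs(beta, xs)
            mean = sum(pi * x for pi, x in zip(p, xs))
            var = sum(pi * (x - mean) ** 2 for pi, x in zip(p, xs))
            grad += xs[k] - mean   # score: chosen price minus expected price
            info += var            # negative second derivative
        if info == 0.0:
            break
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data: (prices in the choice set, index of the chosen hotel).
observations = [
    ([1.0, 2.0, 3.0], 0),
    ([2.0, 1.5, 2.5, 3.0], 1),
    ([1.0, 1.2], 1),
    ([3.0, 2.0, 1.0], 2),
    ([2.0, 2.5, 1.8, 2.2, 3.0], 0),
]

beta_hat = fit_beta(observations)
print("estimated beta:", beta_hat)
```

With these made-up choices the cheapest hotel is picked most of the time, so the estimate comes out negative. Note that a finite estimate only exists if the customer does not always pick the cheapest (or always the most expensive) option; otherwise the likelihood keeps increasing as $\beta$ goes to infinity. With several attributes, $\beta x_k$ becomes a dot product and the same idea applies with a gradient vector and Hessian matrix.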

NOTE: this question may be strongly related to your problem: Alternatives to the multinomial logit model. What I described seems to be exactly the conditional logit model.

Benoit Sanchez