
I have elicited 10 attributes from each of $N$ subjects. Each subject rank-ordered their own 10 attributes from the most to the least important one. I am interested in the relation between the order of elicitation (i.e. whether an attribute was the 1st, 2nd, etc. one elicited) and the importance ranking (1, ..., 10). The hypothesis is that attributes elicited early are ranked as more important, i.e. receive smaller rank numbers (1 = most important).

As the importance rankings are nested within subjects, I am not sure which models would be suitable to test this hypothesis. Any ideas?

Update 1: Sample data

Below I create some sample data. The sample has N=60 individuals. For each individual, 10 attributes were elicited and ranked with regard to importance (no ties).

library(reshape2)
library(dplyr)

set.seed(1)
N <- 60
d <- data.frame(id=rep(1:N, each=10),     # subject ID
                position = 1:10)          # position of attribute
d <- d %>% 
  group_by(id) %>%                        # generate rank based on position + noise
  mutate(importance_rank = rank(position + rnorm(n(), sd=2))) %>%
  ungroup() %>%
  as.data.frame()                         # plain data.frame (some packages, e.g. mlogit, may expect one)
head(d)

     id position importance_rank
1     1        1               1
2     1        2               3
3     1        3               2
4     1        4               6
5     1        5               5
6     1        6               4

Tabulating the data shows the dependency which I want to model/test.

dcast(d, position ~ importance_rank, value.var = "id", fun.aggregate = length)   # count per cell

   position  1  2  3  4  5  6  7  8  9 10
1         1 30 16  7  3  3  1  0  0  0  0
2         2 18 11 14 11  2  3  1  0  0  0
3         3  5 14 16 13  7  5  0  0  0  0
4         4  4 13 11 11 14  5  0  2  0  0
5         5  1  4  7 13  9 10  8  4  2  2
6         6  2  0  2  5 10 10 17  5  8  1
7         7  0  2  3  3  5 17 15  4  8  3
8         8  0  0  0  1  5  4 10 18 16  6
9         9  0  0  0  0  4  5  7 17 13 14
10       10  0  0  0  0  1  0  2 10 13 34

Update 2: A model suggestion

This (mostly mathematical) book covers a variety of models for rank data. It appears that the rank-ordered logit (ROL) model, also known as the exploded logit model, is one option for such scenarios. A gentler article on ROL can be found here, and a nice blog post here. The model can be estimated with the mlogit R package, whose vignette also has an ROL example (p. 25ff.). What I tried:

library(mlogit)
md <- mlogit.data(d, shape = "long", choice = "importance_rank", 
                 alt.var = "position", ranked = TRUE)
summary(mlogit(importance_rank ~ position | 0 , md,
               reflevel = "1"))

Coefficients :
           Estimate Std. Error  t-value  Pr(>|t|)    
position2   0.19272    0.24995   0.7711   0.44068    
position3  -0.15338    0.24391  -0.6288   0.52945    
position4  -0.41315    0.24921  -1.6578   0.09735 .  
position5  -1.29994    0.25856  -5.0276 4.966e-07 ***
position6  -1.78532    0.26232  -6.8058 1.005e-11 ***
position7  -1.92077    0.26615  -7.2168 5.320e-13 ***
position8  -2.49722    0.27249  -9.1645 < 2.2e-16 ***
position9  -2.76654    0.27952  -9.8974 < 2.2e-16 ***
position10 -3.53177    0.27635 -12.7801 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -662.47
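One way to read the estimates above, assuming the usual rank-ordered logit interpretation: each coefficient is the log-worth of a position relative to position 1 (fixed at 0), so exp(coefficient) is the odds that an attribute at that position is ranked above one at position 1. The sketch below only re-types the printed estimates (it does not refit anything) and converts them into worths and into the implied probability of each position being ranked most important.

```r
# Estimates re-typed from the mlogit output above; position 1 is the reference (0).
coefs <- c(position1 = 0,
           position2 = 0.19272,  position3 = -0.15338, position4 = -0.41315,
           position5 = -1.29994, position6 = -1.78532, position7 = -1.92077,
           position8 = -2.49722, position9 = -2.76654, position10 = -3.53177)

# Worth relative to position 1. Under the rank-ordered logit, worth[j] / worth[k]
# is the odds that the attribute at position j is ranked above the one at position k.
worth <- exp(coefs)
round(worth, 3)

# Implied probability that each position is ranked most important (rank 1),
# i.e. the first-stage choice probability with all 10 positions available.
p_first <- worth / sum(worth)
round(p_first, 3)
```

With these numbers, position 2 (not position 1) gets the largest worth, although the table in Update 1 shows position 1 ranked first by 30 of the 60 subjects versus 18 for position 2. That mismatch may be worth checking before interpreting the estimates, for example by passing a plain (ungrouped) data.frame to mlogit.data and stating the choice situation explicitly with chid.var = "id".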

If I read this correctly, each coefficient is tested against position 1 as the reference. The remaining questions are:

  1. Does this answer my question?
  2. Do I need to respecify the model?
  3. How do I properly interpret the results?

What I feel is missing is a single estimate for the effect of position. My understanding of the model is still very rudimentary, so any suggestions on how to model this better or correctly are very welcome :)
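A single estimate can be obtained by giving each attribute the worth $\exp(\beta x)$, where $x$ is its elicitation position, instead of a separate constant per position. For subject $i$, let $x_{i(1)}, \dots, x_{i(10)}$ be the positions of the attributes ordered from most to least important; the exploded-logit log-likelihood is then $\ell(\beta) = \sum_i \sum_{k=1}^{10} \left[\beta x_{i(k)} - \log \sum_{m=k}^{10} \exp(\beta x_{i(m)})\right]$, and $\beta < 0$ means later-elicited attributes tend to be ranked as less important. The block below is a minimal base-R sketch of that model (maximum likelihood via optim, standard error from the numerical Hessian), assuming long-format data with columns id, position and importance_rank; it re-creates the simulation from Update 1 in base R so that it runs without dplyr. The function names rol_loglik and negll are hypothetical helpers, not part of any package.

```r
set.seed(1)
N <- 60
d <- data.frame(id = rep(1:N, each = 10), position = rep(1:10, times = N))
# rank of position + noise within each subject (1 = most important)
d$importance_rank <- ave(d$position + rnorm(nrow(d), sd = 2), d$id, FUN = rank)

# Exploded-logit log-likelihood with one linear effect of position:
# attributes are "chosen" one after another from most to least important, and at
# each stage the probability is exp(beta * x) over the sum for the attributes left.
rol_loglik <- function(beta, data) {
  ll <- 0
  for (subj in split(data, data$id)) {
    x <- subj$position[order(subj$importance_rank)]  # positions, most important first
    eta <- beta * x
    # rev(cumsum(rev(exp(eta))))[k] = sum of exp(eta) over stages k, ..., 10
    ll <- ll + sum(eta - log(rev(cumsum(rev(exp(eta))))))
  }
  ll
}

negll <- function(beta, data) -rol_loglik(beta, data)
fit <- optim(par = 0, fn = negll, data = d, method = "BFGS", hessian = TRUE)

beta_hat <- fit$par
se <- sqrt(1 / fit$hessian[1, 1])
z <- beta_hat / se
p_one_sided <- pnorm(z)  # H1: beta < 0 (earlier attributes ranked as more important)
c(beta = beta_hat, se = se, z = z, p = p_one_sided, logLik = -fit$value)
```

Because the position dummies in the mlogit call above contain this linear model as a special case, the two log-likelihoods could in principle be compared with a likelihood-ratio test (8 degrees of freedom). With mlogit itself, the analogous model would presumably use a numeric copy of position (e.g. d$pos_num <- d$position, added before mlogit.data) in a formula such as importance_rank ~ pos_num | 0; that variant has not been run here.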

Mark Heckmann
  • Is the mere existence of such an effect actually interesting to you? I doubt it. Merely knowing that elicitation order affects ratings *somehow* tells you basically nothing. So perhaps you have an idea of a more specific way that such an effect might work; e.g., traits asked about earlier might be rated higher (a monotonic relationship). – Kodiologist Jul 22 '16 at 19:19
  • @Kodiologist. Yes, the hypothesized relationship is that early attributes are rated higher. I modified the question to make it clearer. – Mark Heckmann Jul 22 '16 at 22:42
  • Hmm, I'm at a loss as to how you could make a regression model that properly accounted for the constraint of each subject using the importance ranks 1 through 10 exactly once each. It's a good question. – Kodiologist Jul 23 '16 at 20:17
  • Are the 10 attributes different for each subject? For example, did subjects endorse 10 of their characteristic attributes from a long checklist? – ttnphns Jul 24 '16 at 19:02
  • @ttnphns. The attributes were obtained by a free elicitation technique, i.e. they are different for each subject. – Mark Heckmann Jul 25 '16 at 19:51
  • Can you please share the data frame structure? As I see it, you could create a variable that captures the order of the attributes (an ID for each combination), while the value of each attribute remains a separate column. You could also create extra features, such as whether a particular attribute is in the top 3. – wololo Jul 26 '16 at 10:37
  • @Nishad I added some sample data. – Mark Heckmann Jul 26 '16 at 11:50
  • I stumbled across the rank-ordered logit model, which seems to be applied in such situations. See UPDATE 2 above. However, I am struggling to implement it. Any ideas? :) – Mark Heckmann Jul 28 '16 at 08:34

1 Answer


There are two approaches:

  1. Treat it as a normal multivariate model (which I thought would work reliably with some feature engineering).

  2. Multilevel modelling. This method is designed specifically for nested data. The data in this case have a hierarchical/nested relationship, which in a way violates the independence assumption of the multivariate approach.

See these lecture slides for a more detailed explanation and its comparison with the multivariate approach: http://www.biostat.jhsph.edu/~fdominic/teaching/bio656/lectures/1.intro.pdf

For more mathematical details, see the following paper: http://www.stat.columbia.edu/~gelman/research/published/multi2.pdf

Finally, here is an R implementation that also compares the results of the multivariate and multilevel approaches for nested variables: http://jaredknowles.com/journal/2013/11/25/getting-started-with-mixed-effect-models-in-r

Please take this as just a 'reference for googling further :)', as I have not implemented this (and so have not tried to summarize any of the links). I realized that one of my survey models could use this approach, so thank you for the question!
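A minimal base-R sketch for the sample data in the question, swapping in a simple two-stage ("summary statistics") approach rather than a full multilevel model: it respects the nesting by computing one Spearman correlation between elicitation position and importance rank per subject, and then tests those N subject-level values against 0, so that subjects rather than individual ratings are the independent units. Because each subject's ranks are a permutation of 1 to 10, the statistic also respects the rank nature of the data. The simulation mirrors Update 1, written in base R.

```r
set.seed(1)
N <- 60
d <- data.frame(id = rep(1:N, each = 10), position = rep(1:10, times = N))
d$importance_rank <- ave(d$position + rnorm(nrow(d), sd = 2), d$id, FUN = rank)

# Stage 1: within-subject Spearman correlation between elicitation position and
# importance rank (positive = later attributes get larger rank numbers, i.e. are
# ranked as less important).
rho <- sapply(split(d, d$id), function(s)
  cor(s$position, s$importance_rank, method = "spearman"))

# Stage 2: one value per subject, tested against 0 across subjects.
summary(rho)
t.test(rho, mu = 0, alternative = "greater")
```

A clearly positive mean correlation supports the hypothesis. Unlike a multilevel model, this weights every subject equally and ignores how precisely each correlation is estimated, which probably matters little here because every subject ranks exactly 10 attributes.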

wololo
  • 1) What do you mean by normal multivariate model? A multivariate regression approach, treating the ranks as interval data? 2) The mixed model link you supplied is a general one, not taking into account the ranked nature of the dependent variable. All of this is fine for interval data, but please note that the question is about rank data. – Mark Heckmann Jul 28 '16 at 09:04
  • By a normal multivariate model I meant: create a variable that captures the order in which the attributes were elicited, keep the ranks of the attributes as columns, and run the model with 11 columns. – wololo Jul 28 '16 at 10:36
  • The subjects' rankings are not interval, but each unique elicitation order is. On a side note: is this by any chance survey-like data, where users rank products and what you want to test is whether the order creates response bias (https://en.wikipedia.org/wiki/Response_bias)? I found a number of suggestions for avoiding it when designing a survey, but could not find any statistical test to demonstrate it (I'm still looking), which I think could be the answer to your question. – wololo Jul 28 '16 at 11:14
  • To make your answer more complete, how would you run the suggested model, given my sample data above? :) – Mark Heckmann Jul 28 '16 at 12:30