I have elicited 10 attributes from each of $N$ subjects. Each subject rank-ordered their own 10 attributes from the most to the least important. I am interested in the relation between the order of elicitation (i.e. whether an attribute was elicited 1st, 2nd, etc.) and the importance ranking (1, ..., 10, with 1 being the most important). The hypothesis is that attributes elicited early are ranked as more important, i.e. receive importance ranks closer to 1.
As the importance rankings are nested within subjects, I am not sure which models would be suitable to test this hypothesis. Any ideas?
Update 1: Sample data
Below I create some sample data. The sample has N=60 individuals. For each individual, 10 attributes were elicited and ranked with regard to importance (no ties).
library(reshape2)
library(dplyr)
set.seed(1)
N <- 60
d <- data.frame(id = rep(1:N, each = 10),  # subject ID
                position = 1:10)           # position of attribute
d <- d %>%
  group_by(id) %>%  # generate rank based on position + noise
  mutate(importance_rank = rank(position + rnorm(n(), sd = 2)))
head(d)
  id position importance_rank
1  1        1               1
2  1        2               3
3  1        3               2
4  1        4               6
5  1        5               5
6  1        6               4
Tabulating the data shows the dependency which I want to model/test.
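# cross-tabulate position by importance rank (cell = number of subjects)
dcast(d, position ~ importance_rank, value.var = "importance_rank", fun.aggregate = length)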
   position  1  2  3  4  5  6  7  8  9 10
1         1 30 16  7  3  3  1  0  0  0  0
2         2 18 11 14 11  2  3  1  0  0  0
3         3  5 14 16 13  7  5  0  0  0  0
4         4  4 13 11 11 14  5  0  2  0  0
5         5  1  4  7 13  9 10  8  4  2  2
6         6  2  0  2  5 10 10 17  5  8  1
7         7  0  2  3  3  5 17 15  4  8  3
8         8  0  0  0  1  5  4 10 18 16  6
9         9  0  0  0  0  4  5  7 17 13 14
10       10  0  0  0  0  1  0  2 10 13 34
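As a purely descriptive complement to the table, the dependency can also be summarised per subject. The sketch below is not part of the analysis above. It assumes that a within-subject Spearman correlation between position and importance rank is an acceptable summary, it re-simulates comparable data in base R (so the numbers need not match the table exactly), and `sim` and `rho` are illustrative names.

```r
# Simulate data in the same spirit as above, using base R only
set.seed(1)
N <- 60
sim <- data.frame(id = rep(1:N, each = 10), position = rep(1:10, times = N))
sim$importance_rank <- ave(sim$position + rnorm(nrow(sim), sd = 2),
                           sim$id, FUN = rank)

# One Spearman correlation per subject: position vs. importance rank
rho <- sapply(split(sim, sim$id), function(s)
  cor(s$position, s$importance_rank, method = "spearman"))

summary(rho)              # distribution of the within-subject correlations
wilcox.test(rho, mu = 0)  # assumption: subjects are independent of each other
```

A positive typical correlation would point in the direction of the hypothesis, but this summary ignores the structure of the rankings and is no substitute for a proper model of rank data.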
Update 2: A model suggestion
This (mostly mathematical) book covers a variety of models for rank data. It appears that the rank-ordered logit (ROL) model, also known as the exploded logit model, is one option for such scenarios. A gentler article on ROL can be found here, and a nice blog post here. The model can be estimated using the mlogit R package. The vignette also has an ROL example (p. 25 ff.). What I tried:
library(mlogit)
md <- mlogit.data(d, shape = "long", choice = "importance_rank",
                  alt.var = "position", ranked = TRUE)
summary(mlogit(importance_rank ~ position | 0, md,
               reflevel = "1"))
Coefficients :
            Estimate Std. Error  t-value  Pr(>|t|)
position2    0.19272    0.24995   0.7711   0.44068
position3   -0.15338    0.24391  -0.6288   0.52945
position4   -0.41315    0.24921  -1.6578   0.09735 .
position5   -1.29994    0.25856  -5.0276 4.966e-07 ***
position6   -1.78532    0.26232  -6.8058 1.005e-11 ***
position7   -1.92077    0.26615  -7.2168 5.320e-13 ***
position8   -2.49722    0.27249  -9.1645 < 2.2e-16 ***
position9   -2.76654    0.27952  -9.8974 < 2.2e-16 ***
position10  -3.53177    0.27635 -12.7801 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -662.47
If I read this correctly, each coefficient is a test against position 1 as the reference level. The remaining questions are:
- Does this answer my question?
- Do I need to respecify the model?
- How do I properly interpret the results?
What I feel is missing is a single estimate for the effect of position. My understanding of the model is still very rudimentary, so any suggestions on how to model this better or more correctly are very welcome :)
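One possible route to a single position effect, sketched here with its assumptions stated up front: the rank-ordered logit likelihood has the same form as the partial likelihood of a Cox model stratified by subject, so `coxph` from the survival package can be used with position entered as a numeric covariate. This assumes that the effect of position is linear on the log-odds scale, which the factor coefficients above may or may not support. The data are re-simulated in base R, and `sim`, `event`, and `fit` are illustrative names.

```r
library(survival)

# Simulate data in the same spirit as the sample data (base R only)
set.seed(1)
N <- 60
sim <- data.frame(id = rep(1:N, each = 10), position = rep(1:10, times = N))
sim$importance_rank <- ave(sim$position + rnorm(nrow(sim), sd = 2),
                           sim$id, FUN = rank)

# The importance rank plays the role of the "event time": the attribute
# ranked 1 is "chosen" first. Every attribute is ranked, so all rows are events.
sim$event <- 1
fit <- coxph(Surv(importance_rank, event) ~ position + strata(id), data = sim)

summary(fit)    # one slope for position instead of nine contrasts
exp(coef(fit))  # multiplicative change in the odds of being ranked ahead
                # of the remaining attributes, per one-step later position
```

A negative coefficient would mean that later-elicited attributes are less likely to be ranked ahead of the remaining ones, which is the direction of the hypothesis. Comparing this fit with one that uses `factor(position)` in a likelihood-ratio test would be one way to check the linearity assumption.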