Background: In some areas of cognitive psychology research, N-alternative forced choice (NAFC) tasks are common. The most common of these is the two-alternative forced choice (2AFC) task. This usually takes the form of participants being given a stimulus and asked to make one of two judgements, e.g. the target stimulus is present/absent, the stimulus on the left is the same as/different from the one on the right, etc. Designs in which the experimental data come from a 2AFC task but there is only one data point per subject are rare, but do exist, e.g. some eyewitness identification research. Since the dependent variable (correct/incorrect) is binary, these experiments are reasonable places to use logistic regression.
My question is this: since chance performance is 50% in a 2AFC trial, is it still reasonable to use the standard logistic link function? Specifically, the logistic function has a lower asymptote of 0% correct, but in practice participants in a 2AFC task should be correct at least 50% of the time due to chance. I imagine the following case in which it may present a problem: an independent variable indexes the difficulty of the discrimination (e.g. difficulty from 1, easy, to 5, hard; please note this is introduced in ordinal terms only for ease of comprehension; for the sake of this discussion consider this variable as being interval), and participants got 50% correct at 5 and 4, 75% correct at 3, 85% correct at 2, and 99% correct at 1. Would using a standard logistic link function cause us to underestimate the slope? [I think so, but please correct me if I'm wrong, see below]
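To make the concern about the lower asymptote concrete, here is a minimal base-R sketch comparing the inverse of the standard logit link with an inverse link that has a guessing floor of 1/m. The helper names `inv_logit` and `inv_mafc_logit` are hypothetical and introduced only for illustration; the floor-adjusted form is my assumption about what a chance-corrected link for an m-alternative task should look like.

```r
# Inverse of the standard logit link: probabilities range over (0, 1).
inv_logit <- function(eta) plogis(eta)

# Hypothetical helper: inverse link with a guessing floor of 1/m for an
# m-alternative forced choice task, so probabilities range over (1/m, 1).
inv_mafc_logit <- function(eta, m = 2) 1 / m + (1 - 1 / m) * plogis(eta)

# Evaluate both on a few values of the linear predictor.
eta <- c(-10, -2, 0, 2, 10)
floor_demo <- data.frame(
  eta      = eta,
  standard = inv_logit(eta),
  afc2     = inv_mafc_logit(eta, m = 2)
)
print(round(floor_demo, 3))
```

For very negative values of the linear predictor the standard curve heads toward 0, while the floor-adjusted curve levels off at 0.5, which is the behaviour one would expect from guessing in a 2AFC task.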
Edit: Those who have answered my question so far have expressed that the way in which I set up the problem was unclear. I'm providing the sample below to help clear things up.
library(psyphy)  # provides mafc.logit(), a logit link for m-alternative forced choice
# Build a vector of `zero` incorrect (0) and `one` correct (1) responses
make.data <- function(zero, one) {
  c(rep(0, zero), rep(1, one))
}
# Mean-center a numeric vector
center <- function(x) as.numeric(scale(x, scale = FALSE))
# 100 trials per difficulty level; 5 is the hardest, 1 the easiest
logit.data <- data.frame(
  Score = c(make.data(50, 50), make.data(50, 50), make.data(25, 75),
            make.data(15, 85), make.data(1, 99)),
  Difficulty = rep(5:1, each = 100)
)
logit.data$Difficulty2 <- center(logit.data$Difficulty)^2
# Standard logit link
standard <- glm(Score ~ center(Difficulty), data = logit.data, family = binomial)
# Standard logit link, but better with a quadratic term
standard.2 <- glm(Score ~ center(Difficulty) + Difficulty2, data = logit.data, family = binomial)
# Logit link adjusted for 2AFC chance performance
revised.link <- glm(Score ~ center(Difficulty), data = logit.data, family = binomial(mafc.logit(2)))
AIC(standard)
AIC(standard.2)
AIC(revised.link)
coef(standard)
coef(standard.2)
coef(revised.link)
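# plot the fitted curves against the observed proportions
diffs <- seq(1, 5, by = 0.1)  # assumed plotting grid; `diffs` was not defined above
diffs.c <- as.numeric(center(diffs))  # centered grid (mean of the grid is 3, as in the data)
p2afc <- mafc.logit(2)$linkinv  # assumed definition: inverse of the 2AFC logit link
plot(diffs, plogis(coef(standard)[1] + coef(standard)[2] * diffs.c), xlab = "Difficulty", ylab = "Pr(Correct)", ylim = c(0, 1), col = "blue", type = "l")
abline(.5, 0, col = "Orange")  # chance performance
lines(diffs, plogis(coef(standard.2)[1] + coef(standard.2)[2] * diffs.c + coef(standard.2)[3] * diffs.c^2), col = "Cyan")
lines(diffs, p2afc(coef(revised.link)[1] + coef(revised.link)[2] * diffs.c), col = "Green")
lines(5:1, c(.50, .50, .75, .85, .99), col = "Black")  # observed proportions correct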
In the plot produced by the code above, the orange horizontal line marks 50% correct responses. The jagged black line represents the data supplied to the models (note that the values for 4 and 5 disappear behind the orange 50% marker). The blue line is the curve produced by the standard logistic link. Note that it estimates below 50% accuracy when discrimination is most difficult (5). The cyan line is the standard logistic link with a quadratic term. The green line is a non-standard link that takes into account that the data come from a 2AFC experiment, where performance is very unlikely to fall below 50%. Note that the AIC for the model fit using the non-standard link function is superior to that of the standard logistic link function. Also note that the slope for the standard equation is smaller than the slope for the standard equation with the quadratic term (which more accurately reflects the real data). Thus, using a logistic function blindly on 2AFC data does (at least) appear to underestimate the slope.
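As a cross-check that does not depend on psyphy, the slope comparison can also be sketched in base R by maximizing the binomial likelihood directly with a 0.5 guessing floor. This is only a sketch under my own assumptions: the data are the same counts as above in aggregated form, `nll_2afc` is a hypothetical helper, and the optimizer settings are R's defaults rather than anything tuned.

```r
# Aggregated version of the example data: correct responses out of 100
# at each difficulty level, with difficulty centered at its mean (3).
difficulty <- 5:1
n_trials   <- rep(100, 5)
n_correct  <- c(50, 50, 75, 85, 99)
x          <- difficulty - mean(difficulty)

# Standard logistic regression on the aggregated counts.
fit_std <- glm(cbind(n_correct, n_trials - n_correct) ~ x, family = binomial)

# Hypothetical helper: negative log-likelihood with a 0.5 guessing floor,
# i.e. Pr(correct) = 0.5 + 0.5 * plogis(intercept + slope * x).
nll_2afc <- function(par) {
  p <- 0.5 + 0.5 * plogis(par[1] + par[2] * x)
  p <- pmin(p, 1 - 1e-12)  # guard against log(0) if p rounds to 1
  -sum(dbinom(n_correct, n_trials, p, log = TRUE))
}
fit_2afc <- optim(c(0, 0), nll_2afc)  # Nelder-Mead by default

# Slopes on the linear-predictor scale of each model.
slopes <- c(standard = unname(coef(fit_std)[2]), afc2 = fit_2afc$par[2])
print(slopes)
```

The two slopes live on different linear-predictor scales, so they are not strictly like-for-like, but comparing their magnitudes gives a rough sense of how much the choice of lower asymptote changes the estimated steepness.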
Is there a problem with my demonstration that means that I am not seeing what I think I am seeing? If I'm correct, then what other consequences (if any) are there of using the generic logistic function with 2AFC data [presumably extensible to NAFC cases]?