
I have a dataset with four within-subject conditions. Participants gave binary answers to 8 questions in each condition, and these were averaged to form a proportion for each participant in each condition. I would like to compare the mean proportions of people choosing a particular answer between conditions. For example, 80% chose a particular answer in the first condition, 60% in the second, 40% in the third, and 20% in the fourth.

Here's a graph of the results: [plot]. The mean probability of guessing RED drops as the number of GREEN increases. I want to see whether the drop is significant.

  • How can I test and summarize whether this drop is significant?
Jeromy Anglim
upabove
  • It sounds like you have one sample of people who answered a question 4 times in different circumstances. Why then did you mention "unequal sample size"? What did you mean? – ttnphns Aug 18 '11 at 09:39
  • @Yes I was trying to mention something one step further, so I edited it out. Yes, basically it's one sample with 4 within-subject conditions. I would like to see if participants chose a particular answer significantly less in each of the 4 conditions. – upabove Aug 18 '11 at 09:55
  • So in each condition, each participant completed 8 binary questions and then you use those 8 questions to form a proportion for each participant, and then you are comparing mean proportions across conditions of the design? – Jeromy Anglim Aug 18 '11 at 11:51
  • @Jeromy Anglim yes – upabove Aug 18 '11 at 11:54
  • @Daniel Okay. I've tried to update your question to make this clearer; feel free to make it even clearer. At first I thought you were comparing individual items; it's a very different question if you are comparing mean proportions. – Jeromy Anglim Aug 18 '11 at 12:00

2 Answers


Here's a simple rule: when you can compute a proportion within an individual, don't. Dixon (2008) and Jaeger (2008) both demonstrate that this can lead to erroneous inferences. The proper way to analyze repeated binary data is to use an inferential approach that treats the data as binary. Here is code (for R) to grab the latest version of the ez package and compute likelihood ratios for your design's effects. It also treats your numeric variables as continuous but possibly non-linear via gam, thereby enhancing power:

#install CRAN ez
install.packages('ez')
library(ez)

#get ready to retrieve Dev version of ez
install.packages('RCurl')

#retrieve ezDev
source('https://raw.github.com/mike-lawrence/ez/master/R/ezDev.R')

#load Dev version of ez's functions into memory
ezDev()

#now run the model
my_mix = ezMixed(
    data = my_data
    , dv = .(choice_is_red)
    , random = .(participant)
    , fixed = .(num_green,num_red,message)
)
print(my_mix$summary)
#In the summary, the bits column represents the computed evidence 
#associated with each effect, on the log-base-2 (aka "bits") scale.
#The absolute value represents the strength of evidence while the sign 
#represents whether the effect (+) or its null (-) is supported.

#visualize the 3-way with CIs that eliminate between-participants variance
preds = ezPredict(
    fit = my_mix$models$'num_green:num_red:message'$unrestricted
)
p = ezPlot2(
    predictions = preds
    , x = .(num_green)
    , split = .(num_red)
    , row = .(message)
    , x_lab = 'No. green'
    , split_lab = 'No. red'
    , y_lab = 'Likelihood of choosing red (log-odds)'
)
print(p$plot)

This code assumes that your data is stored in the object my_data, which has the following structure (order of columns is unimportant, just that they're all there and that it is the raw question-by-question info for each participant):

participant question num_red num_green message choice_is_red
sub1        1        3       2         red     0
sub1        2        4       4         red     1
sub1        3        1       3         blue    0
...
sub2        1        2       1         blue    0
sub2        2        1       2         red     1
...
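As an illustration only, here is a minimal sketch of how a table in this layout might be assembled when the responses were recorded as RED / BLUE strings, as mentioned in the comments. The data frame `raw`, its `response` column, and all the values are hypothetical; only the target column names come from the table above.

```r
# Hypothetical raw responses: one row per participant per question.
raw <- data.frame(
  participant = rep(c("sub1", "sub2"), each = 4),
  question    = rep(1:4, times = 2),
  num_red     = c(5, 7, 9, 5, 7, 9, 5, 7),
  num_green   = c(1, 2, 3, 4, 1, 2, 3, 4),
  message     = c("red", "red", "blue", "blue", "blue", "red", "red", "blue"),
  response    = c("RED", "RED", "BLUE", "BLUE", "RED", "BLUE", "RED", "BLUE"),
  stringsAsFactors = FALSE
)

# Recode the string response into the 0/1 dependent variable (1 = RED).
raw$choice_is_red <- as.integer(raw$response == "RED")

# Keep the columns in the layout shown above, one row per question,
# without averaging within participants.
my_data <- raw[, c("participant", "question", "num_red", "num_green",
                   "message", "choice_is_red")]
my_data$participant <- factor(my_data$participant)
my_data$message     <- factor(my_data$message)

str(my_data)
```

The point of the sketch is that the dependent variable stays at the level of individual answers (0 or 1) rather than being collapsed into a per-participant proportion.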
Mike Lawrence
  • Thanks! So to get this clear: dv = number of participants choosing RED for the particular question (but how do I do this for all 8 questions?), random = subject id, fixed = number of green (1, 2, 3, 4), number of red (5, 7, 9), message received (RED or BLUE). The data looks like the following: participants answer 8 questions, 2 questions for each of the 4 conditions. So no.GREEN = 1 for questions 1 and 2, no.GREEN = 2 for questions 3 and 4, etc. Here I want to see whether people choose RED significantly less as no.GREEN increases. Then a second question would be whether there's a difference between no.RED 5, 7, 9. – upabove Aug 18 '11 at 13:40
  • See the updated answer that describes the expected data format. – Mike Lawrence Aug 18 '11 at 15:43
  • I've updated the datafile: http://dl.dropbox.com/u/22681355/datafile2.xls Basically what differs is that num_red can only be 5, 7 and 9, and participants gave responses as RED / BLUE, but in the graph I've made it the mean choice from 0 (blue) to 1 (red). How does this modify the code above? – upabove Aug 20 '11 at 07:56
  • Also, the command under "#retrieve ezDev", source('https://raw.github.com/mike-lawrence/ez/master/R/ezDev.R'), doesn't seem to work: Error in file(file, "r", encoding = encoding) : cannot open the connection In addition: Warning message: In file(file, "r", encoding = encoding) – upabove Aug 20 '11 at 08:04
  • You have to run the source() command while connected to the internet. – Mike Lawrence Aug 20 '11 at 14:01
  • I was connected to the internet since I managed to install the package :) – upabove Aug 20 '11 at 17:44

[Note: The following reply pertains to an older draft of the question, "How to test for significant differences in proportions across four within subject conditions?" The question was later clarified: the proportions are actually averaged across several variables.]

You might use McNemar's test, which is a repeated-measures comparison of proportions. The classic form of the test is for a binary response and therefore suits you. The test is a pairwise comparison: only two conditions at a time. A 2x2 frequency table (Yes, No in one condition vs Yes, No in another condition) is formed, and the H0 that the table is symmetric about the diagonal is tested.

When the variables are dichotomous (like yours), McNemar's test is equivalent to the sign test, so you could apply either.
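A minimal sketch of that pairwise comparison in base R, assuming one binary response per participant in each of two conditions. The vectors `cond1` and `cond2` are made-up values for illustration. `mcnemar.test` gives the chi-squared form of the test; the exact form is a binomial (sign) test on the discordant pairs.

```r
# Hypothetical paired binary responses (1 = chose RED) for 20 participants
# in two of the conditions; the values are invented for illustration.
cond1 <- c(1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 0,0, 0,0)
cond2 <- c(1,1,1,1,1,1,1,1, 0,0,0,0,0,0,0,0, 1,1, 0,0)

# 2x2 table of paired outcomes: rows = condition 1, columns = condition 2.
tab <- table(cond1 = factor(cond1, levels = c(1, 0)),
             cond2 = factor(cond2, levels = c(1, 0)))
print(tab)

# McNemar's test of symmetry about the diagonal (chi-squared approximation,
# here without the continuity correction).
mc <- mcnemar.test(tab, correct = FALSE)
print(mc)

# Only the discordant pairs carry information about a change.
b  <- tab["1", "0"]  # RED in condition 1 but not in condition 2
c_ <- tab["0", "1"]  # RED in condition 2 but not in condition 1

# Exact version: a sign (binomial) test on the discordant pairs.
sg <- binom.test(b, b + c_, p = 0.5)
print(sg)
```

With four conditions there are six such pairwise tests, so some correction for multiple comparisons would normally be considered.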

Check also Cochran's Q test, an extension of McNemar's test from pairwise to omnibus comparison.
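Base R has no built-in function for Cochran's Q, so the following is a hand-rolled sketch of the textbook formula, Q = (k - 1)(k ΣC_j² - N²) / (kN - ΣR_i²), where C_j are the column (condition) totals, R_i the row (participant) totals, N the grand total and k the number of conditions. The function name `cochran_q` and the data matrix are hypothetical.

```r
# Hypothetical 0/1 responses: rows = participants, columns = 4 conditions.
x <- matrix(c(
  1, 1, 1, 0,
  1, 1, 0, 0,
  1, 0, 0, 0,
  1, 1, 1, 1,
  1, 1, 0, 0,
  0, 0, 0, 0,
  1, 0, 1, 0,
  1, 1, 0, 0
), ncol = 4, byrow = TRUE)

# Cochran's Q for a participants-by-conditions matrix of 0/1 values.
cochran_q <- function(x) {
  k  <- ncol(x)     # number of conditions
  Cj <- colSums(x)  # successes per condition
  Ri <- rowSums(x)  # successes per participant
  N  <- sum(x)      # total number of successes
  Q  <- (k - 1) * (k * sum(Cj^2) - N^2) / (k * N - sum(Ri^2))
  df <- k - 1
  # Q is referred to a chi-squared distribution with k - 1 df.
  list(statistic = Q, df = df,
       p.value = pchisq(Q, df, lower.tail = FALSE))
}

res <- cochran_q(x)
print(res)
```

Participants who give the same answer in every condition (all 0s or all 1s) do not affect Q, which mirrors the way McNemar's test uses only the discordant pairs.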

ttnphns