In a between-subjects experiment, we have 2x2 conditions and several 7-point Likert dependent variables.

What would be the best way to visualize this data?

I am a statistics newbie, so please forgive my simple questions. I have, of course, searched the web and this forum. But most posts I found seem to be dealing with binary dependent variables. In contrast, we have, I think, binary independent variables.

Is it OK to treat the Likert answers as a continuous variable? I seem to recall some discussion (I forget where - maybe a textbook, maybe a paper) about whether or not this is OK. It does not seem to be a categorical variable to me. Maybe it's an ordered categorical variable?

But, more importantly, how would you suggest plotting the data?

Simple box-plots (with whiskers) don't seem to work really well, because for almost all dependent variables we had a few participants answering with the lowest scores and some with the highest.

Also, it would be nice if one could show the answers for one dependent variable for all 4 conditions in one single plot in an intuitive way.

chl
Gabriel
  • There are several threads related to Likert scales on this site, in particular on how they can be treated or analysed (as continuous or discrete ordered variables), depending on the question at hand. However, you should definitely think about what you want to display: values averaged (in some meaningful way) across conditions, or the distribution of individual responses? – chl Oct 03 '20 at 18:02
  • How many data points do you have in each of your four conditions? Can you post an example of your data (or random data of the size and structure you have)? – Stephan Kolassa Oct 03 '20 at 18:41
  • The thread just cited is more relevant than its title implies. Each pair of binary predictors defines a composite variable with 4 conditions and each 7-point scale is another variable, so many two-way displays (bar charts, mosaic plots, heat maps, etc.) are immediately available. – Nick Cox Oct 03 '20 at 19:05
  • Thanks a lot for the comments! – Gabriel Oct 05 '20 at 16:02
  • Thanks a lot for the comments! We've got around 30 data points per condition. I was thinking of some kind of density or distribution plot per condition, unless you think other plots might provide more insights. I would like to spread the two binary independent variables along the x and y axes. So, maybe the tab plot will help. The heat map seems to aggregate a bit too much: AFAIK, it can show only the mean (or median). And, yes, thanks a lot, the link is helpful! It does give me some ideas. – Gabriel Oct 05 '20 at 17:04

1 Answer

As Nick Cox writes, it makes the most sense to facet your plot by the four conditions. Within each facet, you can do many things. Boxplots are one possibility, but they really compress your data too much, as you write.

Likert scales are indeed ordered categorical scales. You have only seven possible outcomes. This to me suggests a simple histogram or barplot with seven bars, so you would have a plot with four such histograms. Just make sure the vertical axes use the same scale so the plots are comparable. You could also add a vertical line to indicate some measure of central tendency - the mean if it makes sense (i.e., if you can meaningfully add your responses), or the median. Possibly add horizontal lines to indicate spread, like the range from the first to the third quartile - or standard errors/confidence intervals for the central tendency.
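The summary lines mentioned above (mean or median, first and third quartile) can be computed per condition before any plotting. The following is only a minimal sketch in base R on simulated data: the data frame `likert_data`, its column names, the helper `summarise_responses`, and the cell size of 30 (taken from the comments) are assumptions for illustration.

```r
# Simulated 2x2 between-subjects data, 30 participants per cell (assumed)
set.seed(42)
n_per_cell <- 30
likert_data <- data.frame(
    factor_1 = rep(c("A", "B"), each = 2 * n_per_cell),
    factor_2 = rep(rep(c("X", "Y"), each = n_per_cell), times = 2),
    response = sample(1:7, 4 * n_per_cell, replace = TRUE)
)

# Hypothetical helper: central tendency and spread for one cell
summarise_responses <- function(x) {
    c(n = length(x), mean = mean(x), median = median(x),
        q1 = unname(quantile(x, 0.25)), q3 = unname(quantile(x, 0.75)))
}

# One row per combination of the two binary factors;
# the summaries end up in a matrix column named "response"
summaries <- aggregate(response ~ factor_1 + factor_2,
    data = likert_data, FUN = summarise_responses)
print(summaries)
```

Whether the mean column is meaningful depends on whether the responses can sensibly be added; the median and quartiles only rely on the ordering.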

Here is an example using simulated data in R. The vertical dashed lines indicate the means, the horizontal red lines span from the first to the third quartile.

[Plot: four bar charts in a 2x2 grid, one per condition: (A,X), (B,X), (A,Y), (B,Y)]

R code:

# Simulate 100 responses per condition on a 7-point scale
n_per_condition <- 100
set.seed(1)
dataset <- data.frame(condition_1=rep(c("A","B"),each=2*n_per_condition),
    condition_2=rep(c("X","Y"),times=2*n_per_condition),
    # fixing the levels keeps all seven categories even if one is never chosen
    response=factor(sample(1:7,4*n_per_condition,replace=TRUE),levels=1:7))

# Draw the response counts for one condition as seven bars, plus the mean
# (dashed vertical line) and the first to third quartile (red horizontal line)
plot_data <- function(condition_1,condition_2,ylim=c(0,30)) {
    index <- dataset$condition_1==condition_1 & dataset$condition_2==condition_2
    # convert the factor back to the numeric scores 1..7
    responses <- as.numeric(as.character(dataset[index,"response"]))
    plot(c(0.5,7.5),ylim,type="n",xlab="",ylab="",xaxt="n",
        main=paste0("(",condition_1,",",condition_2,")"))
    axis(1,1:7)
    rect(xleft=-0.4+(1:7),
        ybottom=rep(0,7),
        xright=0.4+(1:7),
        ytop=as.numeric(table(dataset[index,"response"])),
        col="grey")
    abline(v=mean(responses),lty=2,lwd=2)
    lines(quantile(responses,c(0.25,0.75)),
        rep(ylim[2],2),col="red",lwd=2)
}

# 2x2 grid of panels; the default ylim is the same in every panel
opar <- par(mfrow=c(2,2),las=1,mai=c(.5,.5,.5,.1))
plot_data("A","X")
plot_data("B","X")
plot_data("A","Y")
plot_data("B","Y")
par(opar)
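
The fixed `ylim=c(0,30)` suits the simulated counts; with real data, one option is to derive a common upper limit from the largest count in any panel. Here is a rough sketch of that idea, assuming about 30 responses per condition as mentioned in the comments. The names `sim`, `cell`, `counts` and `shared_ylim` are made up for this illustration, and `barplot` is swapped in for the hand-drawn rectangles.

```r
# Simulated data with an assumed 30 responses per condition
set.seed(1)
n_per_condition <- 30
sim <- data.frame(
    condition_1 = rep(c("A", "B"), each = 2 * n_per_condition),
    condition_2 = rep(c("X", "Y"), times = 2 * n_per_condition),
    response = factor(sample(1:7, 4 * n_per_condition, replace = TRUE), levels = 1:7)
)

# Counts per response category (rows) within each of the four cells (columns)
cell <- interaction(sim$condition_1, sim$condition_2, sep = ",")
counts <- table(sim$response, cell)

# One shared upper limit, with 10% headroom, so all panels are comparable
shared_ylim <- c(0, 1.1 * max(counts))

# One bar chart per cell, all drawn with the same vertical axis
opar <- par(mfrow = c(2, 2), las = 1, mai = c(.5, .5, .5, .1))
for (cell_name in colnames(counts)) {
    barplot(counts[, cell_name], ylim = shared_ylim, col = "grey",
        main = paste0("(", cell_name, ")"))
}
par(opar)
```

Using proportions instead of counts (for example via `prop.table(counts, margin = 2)`) would be another way to keep panels comparable if the cell sizes differ.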
Stephan Kolassa