I have a dataset with 4025 participants across two time points. I have scored them on a three-point categorical variable (Unlikely, Possible, Probable) at each time point. I would like to visualize the various patterns of change (e.g. going from Unlikely at T1 to Possible at T2, or going from Possible at T1 to Unlikely at T2). I would also like some way of representing the number of participants in each of these clusters on the graph (somehow weighting by N and representing this by line thickness, number of lines, etc.).
The data is currently in the form:
id1, id2, variable_t1, variable_t2
1 500 0 1
2 501 1 0
...
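For reference, the nine possible T1 to T2 transitions can be counted with a simple cross-tabulation before any plotting. The sketch below uses made-up toy rows, and it assumes the codes 0/1/2 correspond to Unlikely/Possible/Probable, which is my guess at the coding rather than something confirmed above.

```r
# Toy data in the same shape as above. The values are invented for
# illustration, and the 0/1/2 = Unlikely/Possible/Probable coding is an assumption.
d <- data.frame(
  id1 = 1:6,
  id2 = 500:505,
  variable_t1 = c(0, 1, 0, 2, 1, 0),
  variable_t2 = c(1, 0, 1, 2, 2, 0)
)

lev <- c("Unlikely", "Possible", "Probable")
d$t1 <- factor(d$variable_t1, levels = 0:2, labels = lev)
d$t2 <- factor(d$variable_t2, levels = 0:2, labels = lev)

# 3 x 3 table: rows are the T1 category, columns are the T2 category
trans <- table(T1 = d$t1, T2 = d$t2)
print(trans)

# Long format: one row per transition, with its count in column n
trans_long <- as.data.frame(trans, responseName = "n")
print(trans_long)
```

The long-format counts (one row per transition with its N) are what I would like to map to line thickness.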
Any suggestions for how to do this? I have tried using ggplot2 and geom_line and grouping by id, but this graph looks very messy. I am looking for something more along the lines of a clustergram, but am open to suggestions.
Update: I recently discovered Parallel Sets, which is very close to what I would like to achieve. The only downside is that the program allows very little customization of the resulting plot (e.g. rotating the plot, adding titles and axis labels, manually adjusting the size), although this is possible with a bit of post-processing of the PNG file that the program can export. Is there a way of achieving the same with R and ggplot?
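To make concrete what I am after, here is a rough ggplot2 sketch of the idea: one straight segment per transition, with thickness mapped to N. It is not a true parallel-sets plot (segments overlap instead of stacking as ribbons), the counts are hypothetical placeholders rather than my real data, and it assumes ggplot2 3.4.0 or later for the linewidth aesthetic (older versions would use size instead).

```r
library(ggplot2)

lev <- c("Unlikely", "Possible", "Probable")

# One row per T1 -> T2 transition; categories coded 1..3 in the order of lev.
# The counts are HYPOTHETICAL placeholders, not real results.
trans_long <- expand.grid(t1 = 1:3, t2 = 1:3)
trans_long$n <- c(1200, 300, 50, 400, 900, 200, 75, 250, 650)

p <- ggplot(trans_long) +
  # One segment per transition, from the T1 category (x = 1) to the T2 category (x = 2)
  geom_segment(
    aes(x = 1, xend = 2, y = t1, yend = t2, linewidth = n),
    colour = "steelblue", alpha = 0.6, lineend = "round"
  ) +
  scale_x_continuous(breaks = 1:2, labels = c("T1", "T2"), limits = c(0.8, 2.2)) +
  scale_y_continuous(breaks = 1:3, labels = lev) +
  scale_linewidth(range = c(0.5, 12), name = "N") +
  labs(x = NULL, y = NULL, title = "Transitions between T1 and T2") +
  theme_minimal()

print(p)
```

This at least allows titles, axis labels and sizing to be controlled from R, but the overlapping segments are harder to read than the ribbons in Parallel Sets.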
Solution: thanks to a reposting of my question by Tal (of R-bloggers fame) here, there is now a solution to this question using R and lattice.