
I have many customers, each of whom generates their own empirical distribution of shopping times.

If I want to get a sense of the average within-person variation (the distribution, really) in trip times, how might I go about it? I don't want to just look at the aggregate distribution of trip times, since that could be driven entirely by between-person heterogeneity in shopping times. For example, if half the people always shop at one time and the other half always shop at another time, there is no within-person variation at all, so the aggregate distribution would be misleading. Also, the distribution of trip times might vary systematically with how often people shop, so the aggregate would overweight frequent shoppers, which I don't want in this case.
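To make the concern concrete, here is a minimal sketch of the "half shop at one time, half at another" example. The data are simulated and the names (`trips_by_customer`, `pooled_and_within_variance`) are hypothetical; variance is used only to show the gap between the pooled and within-person views, not as the richer summary being asked for.

```python
import random
import statistics


def pooled_and_within_variance(trips_by_customer):
    """Return (variance of all trips pooled, equal-weight mean of per-customer variances)."""
    all_times = [t for times in trips_by_customer.values() for t in times]
    pooled = statistics.pvariance(all_times)
    # Each customer counts once here, no matter how many trips they made.
    per_customer = [statistics.pvariance(times) for times in trips_by_customer.values()]
    within = statistics.fmean(per_customer)
    return pooled, within


# Simulated (made-up) data: trip time is the hour of day.
# Half the customers always shop at 9:00, the other half always at 18:00,
# and the number of trips differs from customer to customer.
rng = random.Random(0)
trips = {}
for customer_id in range(100):
    usual_hour = 9.0 if customer_id % 2 == 0 else 18.0
    n_trips = rng.randint(5, 50)
    trips[customer_id] = [usual_hour] * n_trips

pooled, within = pooled_and_within_variance(trips)
print(f"pooled variance: {pooled:.2f}, average within-person variance: {within:.2f}")
```

The pooled variance is large even though no individual customer ever changes their shopping time, which is exactly why the aggregate distribution cannot be read as within-person variation.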

Additionally, I don't want to compute a summary statistic such as the variance for each customer and then plot the distribution of those. Is there something richer that can be done in these panel-type situations?

wolfsatthedoor
  • Can you tell us what the purpose is? Then maybe we can propose something! – kjetil b halvorsen Dec 04 '14 at 15:32
  • I'm trying to demonstrate the existence of meaningful within-person variation in trip times. I also would like to be able to show, "on average" in some sense, what that variation looks like. Hence my problem with taking everyone's within-person variation and plotting the distribution of those. – wolfsatthedoor Dec 04 '14 at 15:35
  • The solution at http://stats.stackexchange.com/questions/13875/boxplot-for-several-distributions (perhaps sorted by median shopping time) would work for up to a few hundred customers. A slight variation of it, where the individual plots are not shown but their key points are connected into curves, might fit the bill. How many customers do you want to visualize at once? – whuber Dec 04 '14 at 15:45
  • Ah, that's a great idea, whuber. I could randomly draw maybe a few hundred. I actually have about 150,000 unique customers, though. – wolfsatthedoor Dec 04 '14 at 15:50
  • @whuber If the data are categorical (day of the week or hour of the day), this method doesn't work that well: someone with a lot of Sunday (7) and Monday (1) trips would appear to have the most variation, even though someone with only Monday (1) and Wednesday (3) trips actually has more. – wolfsatthedoor Dec 05 '14 at 21:05
  • I'm looking into ways to visualize periodicity now, but I appreciate any knowledge specific to this particular question. – wolfsatthedoor Dec 05 '14 at 21:10
  • You are referring to complications (the weekdays) that you have neither mentioned nor explained in your question. Perhaps you should edit the question if those things matter. – whuber Dec 05 '14 at 21:47
  • I ended up asking the question differently on Stack Overflow: http://stackoverflow.com/questions/27325044/visualizing-time-series-in-spirals-using-r-or-python – wolfsatthedoor Dec 05 '14 at 21:49
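The suggestion in the comments (per-customer box plots sorted by median, with the key points connected into curves instead of drawing each box) could be sketched roughly as follows. This is only an illustration under assumptions: the function name `quartile_curves`, the `min_trips` cutoff, and the toy data are hypothetical, and it treats trip time as an ordinary number, so it does not address the cyclical weekday issue raised above.

```python
import statistics


def quartile_curves(trips_by_customer, min_trips=2):
    """Return one (customer, q1, median, q3) row per customer, sorted by median.

    Plotting q1, median and q3 against the row index gives three curves;
    the vertical gap between the q1 and q3 curves shows within-person spread.
    """
    rows = []
    for customer, times in trips_by_customer.items():
        if len(times) < min_trips:
            continue  # quartiles are not meaningful for a single trip
        q1, median, q3 = statistics.quantiles(times, n=4, method="inclusive")
        rows.append((customer, q1, median, q3))
    rows.sort(key=lambda row: row[2])  # order customers by median trip time
    return rows


# Toy (made-up) data: trip times in hours of the day.
trips = {
    "a": [10, 11, 12, 13, 14],
    "b": [8, 8, 9, 9],
    "c": [17, 18, 19],
}
rows = quartile_curves(trips)
for customer, q1, median, q3 in rows:
    print(customer, q1, median, q3)
```

With 150,000 customers, the same function could be applied to a random sample of a few hundred, as discussed in the comments, since every customer contributes one row regardless of how often they shop.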

0 Answers