
Let's say I am analyzing behavioral patterns over the course of an hour. I have recorded three different behaviors and the timestamps (start-end) at which they occurred. Something like:

yawning       stretching    whispering
2:21-2:22     3:31-3:33     1:21-1:30
3:42-3:45     8:23-8:59     9:27-9:33
9:20-9:25     9:34-9:44     14:04-14:07
14:45-14:32   15:01-15:06   18:00-18:22
.
.
.
45:40-45:43   45:23-45:30   44:19-44:44

Is there a statistical method for determining whether certain behaviors correlate with each other or cluster around certain time periods? For instance, I may want to know whether these three (or just two) behaviors occur in close proximity to one another, or which behaviors do not occur close to each other. Which of the three behaviors tend to cluster together?

I don't even know what field of stats I'm looking at with this.

chl
Tyler Rinker

2 Answers


I'm presuming the rows in the way you've presented the data don't mean anything in themselves, i.e., there is no necessary link between the third yawn, the third whisper, and the third stretch. What you are interested in for the third yawn is "how close is this in time to any whisper, not just the third whisper?"

For each yawn I would calculate the time to the nearest whisper and the time to the nearest stretch. Do the same for each whisper (time to the nearest stretch and to the nearest yawn) and for each stretch. Then I would calculate an indicator statistic of the proximity of each behaviour to each of the other two, something like the trimmed mean time to the nearest event of the other type. (There will be six of these indicators, not just three, because the average time from a yawn to its nearest stretch is not the same as the average time from a stretch to its nearest yawn.)
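A minimal Python sketch of these six indicators, under assumptions the answer doesn't spell out: each event is represented by its start time; a stamp `a:b` is converted by the hypothetical helper `parse_stamp` to a*60 + b (seconds, if the stamps are mm:ss within the hour); and "trimmed mean" means dropping 10% of the gaps from each end. All function names are illustrative, not from any library.

```python
import bisect
from statistics import mean

def parse_stamp(stamp):
    """Hypothetical helper: 'a:b' -> a*60 + b (seconds if stamps are mm:ss)."""
    a, b = stamp.split(":")
    return int(a) * 60 + int(b)

def nearest_gap(t, others):
    """Absolute time from t to the closest value in the sorted list `others`."""
    i = bisect.bisect_left(others, t)
    # The nearest neighbour is either just before or just after t.
    return min(abs(others[j] - t) for j in (i - 1, i) if 0 <= j < len(others))

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping int(n * proportion) values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    vals = sorted(values)
    k = int(len(vals) * proportion)
    return mean(vals[k:len(vals) - k])

def proximity_indicators(events, proportion=0.1):
    """Trimmed mean time from each event of type A to the nearest event of
    type B, for every ordered pair (A, B); the pairs are not symmetric."""
    ordered = {name: sorted(times) for name, times in events.items()}
    return {
        (a, b): trimmed_mean([nearest_gap(t, ordered[b]) for t in ordered[a]], proportion)
        for a in ordered
        for b in ordered
        if a != b
    }

# Start times of the first four rows of the question's table only.
events = {
    "yawning": [parse_stamp(s) for s in ["2:21", "3:42", "9:20", "14:45"]],
    "stretching": [parse_stamp(s) for s in ["3:31", "8:23", "9:34", "15:01"]],
    "whispering": [parse_stamp(s) for s in ["1:21", "9:27", "14:04", "18:00"]],
}
indicators = proximity_indicators(events)
print(indicators[("yawning", "stretching")])  # 27.75
print(indicators[("stretching", "yawning")])  # 24.5
```

Even on these four rows, yawn-to-stretch and stretch-to-yawn come out different, which is why there are six indicators rather than three. With so few events per behaviour the 10% trim removes nothing; it only starts to matter on the full hour of data.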

This alone will give you some sense of which behaviours cluster together, but you should also check that the clustering isn't plausibly due to chance.

To check that, I would create simulated data generated by a model under the null hypothesis of no relation. This requires generating event times for each behaviour from a plausible null model, probably by resampling the times between consecutive events (e.g., between successive yawns) to create a new set of time stamps for hypothetical null-model events. Then calculate the same indicator statistic for the simulated data and compare it with the indicator from your genuine data. By repeating this simulation many times, you can find out whether the indicator from your data is sufficiently different from the null model's (for example, a smaller average time from each yawn to the nearest stretch) to count as statistically significant evidence against the null hypothesis.
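A rough Python sketch of this Monte Carlo check, under stated assumptions: events are represented by their start times (in seconds); the null model randomly permutes each behaviour's inter-event gaps, with the first gap measured from time 0 (resampling the gaps with replacement would be an equally reasonable reading of "resampling"); and the p-value is one-sided with the usual +1 correction. `shuffle_gaps` and `null_test` are hypothetical names.

```python
import bisect
import random
from itertools import accumulate
from statistics import mean

def nearest_gap(t, others):
    """Absolute time from t to the closest value in the sorted list `others`."""
    i = bisect.bisect_left(others, t)
    return min(abs(others[j] - t) for j in (i - 1, i) if 0 <= j < len(others))

def indicator(times_a, times_b, proportion=0.1):
    """Trimmed mean time from each A event to the nearest B event."""
    ordered_b = sorted(times_b)
    gaps = sorted(nearest_gap(t, ordered_b) for t in times_a)
    k = int(len(gaps) * proportion)
    return mean(gaps[k:len(gaps) - k])

def shuffle_gaps(times, rng):
    """Null-model event times: randomly permute the gaps between consecutive
    events (the first gap is measured from time 0) and rebuild the stamps."""
    ordered = sorted(times)
    gaps = [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]
    rng.shuffle(gaps)
    return list(accumulate(gaps))

def null_test(times_a, times_b, n_sims=999, seed=0):
    """Observed indicator and a one-sided Monte Carlo p-value: the share of
    null simulations whose indicator is at least as small as the observed one."""
    rng = random.Random(seed)
    observed = indicator(times_a, times_b)
    hits = sum(
        indicator(shuffle_gaps(times_a, rng), shuffle_gaps(times_b, rng)) <= observed
        for _ in range(n_sims)
    )
    return observed, (hits + 1) / (n_sims + 1)

# Start times (seconds) of the first four yawns and stretches in the question.
yawns = [141, 222, 560, 885]
stretches = [211, 503, 574, 901]
observed, p_value = null_test(yawns, stretches)
print(observed, p_value)
```

With only four events per behaviour the p-value is very noisy; the sketch is meant to show the mechanics, and the full set of time stamps is what would actually go in.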

Peter Ellis

I had a similar problem. My solution was naive: create a new variable for each minute of the day and, if a given activity took place during that minute, mark it with 1 (otherwise 0):

yawning     ->  yawning  
...             ...
2:21-2:22       2:21 1
3:42-3:45       2:22 1
9:20-9:25       2:23 0
14:45-14:32     . 
.               3:42 1
.               3:43 1
.               .
45:40-45:43     .
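A small Python sketch of this encoding. It assumes each stamp `a:b` maps to time unit a*60 + b (a minute of the day, as this answer reads it; equally a second within the hour if the stamps are mm:ss), that both interval ends are inclusive as in the table above, and that an interval whose end precedes its start (such as `14:45-14:32`) is a data error to flag rather than guess at. `parse_stamp` and `to_indicator` are hypothetical names.

```python
def parse_stamp(stamp):
    """Hypothetical helper: 'a:b' -> a*60 + b (minutes if h:mm, seconds if mm:ss)."""
    a, b = stamp.split(":")
    return int(a) * 60 + int(b)

def to_indicator(intervals, length):
    """0/1 series with `length` time units: unit t is 1 if some 'start-end'
    interval covers it (both ends inclusive), otherwise 0."""
    series = [0] * length
    for interval in intervals:
        start_s, end_s = interval.split("-")
        start, end = parse_stamp(start_s), parse_stamp(end_s)
        if end < start:
            raise ValueError(f"interval ends before it starts: {interval!r}")
        # Clip to the observation window so late stamps don't index past the end.
        for t in range(start, min(end, length - 1) + 1):
            series[t] = 1
    return series

yawning = to_indicator(["2:21-2:22", "3:42-3:45", "9:20-9:25"], length=24 * 60)
print(yawning[141:144])  # [1, 1, 0], i.e. 2:21 -> 1, 2:22 -> 1, 2:23 -> 0
```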

So we now have a new time series that can be analysed with more standard methods. This worked really well: I tested it on simulated data from the logit model below, where $x$ is a 0-1 variable and $z$ is a "driving" variable:

$$P(x_{t+1}=1 \mid x_t, z_t) = \frac{\exp(x_t + \beta_1 z_t)}{1 + \exp(x_t + \beta_1 z_t)},$$

and analogously for $y$ with coefficient $\beta_2$ in place of $\beta_1$. The closer $\beta_2$ was to $\beta_1$, the stronger the dependence between $x$ and $y$, as measured by the Hamming distance between the two series.
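A Python sketch of that simulation, with assumptions the answer doesn't state: the driver z is i.i.d. standard normal, each series starts at 0, there is one step per minute of a day, and beta1 = 3. The function names are hypothetical; the sketch only shows how one would check that the Hamming distance between x and y tends to shrink as beta2 approaches beta1.

```python
import math
import random

def logistic(u):
    """exp(u) / (1 + exp(u)), written in the equivalent form 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def simulate(z, beta, rng, x0=0):
    """0/1 series with P(x[t+1] = 1 | x[t], z[t]) = logistic(x[t] + beta * z[t])."""
    x = [x0]
    for t in range(len(z) - 1):
        x.append(1 if rng.random() < logistic(x[t] + beta * z[t]) else 0)
    return x

def hamming(a, b):
    """Number of time units at which two equal-length 0/1 series differ."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return sum(u != v for u, v in zip(a, b))

rng = random.Random(42)
z = [rng.gauss(0.0, 1.0) for _ in range(24 * 60)]  # assumed N(0, 1) driver
beta1 = 3.0
x = simulate(z, beta1, rng)
for beta2 in (3.0, 1.0, 0.0, -3.0):
    y = simulate(z, beta2, rng)
    print(beta2, hamming(x, y))
```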

Methodological problem: what to do if the total time of activity_1 during the day is 10 times higher than that of activity_2? Sometimes it doesn't matter; sometimes a weighted distance is needed, e.g., when we want to build a distance matrix.
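One possible weighting, sketched in Python; this particular normalization is an assumption, not something the answer specifies. The raw Hamming distance between two 0/1 series can never be smaller than the difference in their active time, so a rare activity looks "far" from a common one even if it only ever happens during it. Dividing by the expected Hamming distance when one series' time units are randomly shuffled compares each pair against its own chance level (Jaccard distance over active units would be another common choice). Function names are hypothetical.

```python
def hamming(a, b):
    """Number of time units at which two equal-length 0/1 series differ."""
    return sum(u != v for u, v in zip(a, b))

def expected_hamming(a, b):
    """Expected Hamming distance if b's time units were randomly permuted,
    keeping each series' number of active units fixed."""
    n = len(a)
    n_a, n_b = sum(a), sum(b)
    return (n_a * (n - n_b) + (n - n_a) * n_b) / n

def relative_hamming(a, b):
    """Observed / expected Hamming distance. Values below 1 mean the two
    activities coincide more often than their base rates alone would give."""
    expected = expected_hamming(a, b)
    return hamming(a, b) / expected if expected > 0 else float("nan")

# Hypothetical example: activity_1 is active 10 times as long as activity_2,
# and activity_2 only ever happens while activity_1 is going on.
activity_1 = [1] * 50 + [0] * 50
activity_2 = [1] * 5 + [0] * 95
print(hamming(activity_1, activity_2))           # 45
print(expected_hamming(activity_1, activity_2))  # 50.0
print(relative_hamming(activity_1, activity_2))  # 0.9
```

Here the raw distance of 45 is already the smallest possible for these two durations, which the relative version makes visible.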

Qbik