I'm presuming that the rows in the data as you've presented it don't necessarily mean anything, i.e. there is no necessary link between the third yawn, the third whisper, and the third stretch. What you are interested in for the third yawn is how close it is in time to any whisper, not just the third whisper.
For each yawn I would calculate the time to the nearest whisper and the time to the nearest stretch. I would do the same for each whisper (time to the nearest stretch and time to the nearest yawn) and for each stretch. Then I would calculate an indicator statistic for the proximity of each behaviour to each of the other two, something like the trimmed mean of the time to the nearest event of the other type. (There will be six of these indicators, not just three, because the average time from a yawn to its nearest stretch is not the same as the average time from a stretch to its nearest yawn.)
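The calculation above can be sketched in Python using only the standard library. This is an illustration rather than a definitive implementation: the function names, the 10% default trim, and the timestamps are all made up, and it assumes each behaviour is recorded as a non-empty list of event times on a common clock.

```python
from bisect import bisect_left
from itertools import permutations


def nearest_gaps(events, others):
    """For each time in `events`, the absolute gap to the closest time in `others`."""
    others = sorted(others)
    gaps = []
    for t in events:
        i = bisect_left(others, t)
        # Only the neighbour just before and just after t can be the closest.
        candidates = others[max(i - 1, 0):i + 1]
        gaps.append(min(abs(t - o) for o in candidates))
    return gaps


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction (trim < 0.5)."""
    values = sorted(values)
    k = int(len(values) * trim)
    kept = values[k:len(values) - k]
    return sum(kept) / len(kept)


def proximity_indicators(behaviours, trim=0.1):
    """One indicator per ordered pair of behaviours (six for three behaviours)."""
    return {
        (a, b): trimmed_mean(nearest_gaps(behaviours[a], behaviours[b]), trim)
        for a, b in permutations(behaviours, 2)
    }


# Hypothetical timestamps in seconds, purely for illustration.
behaviours = {
    "yawn": [10, 50, 200],
    "whisper": [12, 48, 400],
    "stretch": [100, 300],
}
indicators = proximity_indicators(behaviours, trim=0.0)
print(indicators[("yawn", "whisper")])   # 52.0
print(indicators[("whisper", "yawn")])   # 68.0
```

The two printed values differ, which shows the asymmetry described above: the last whisper is far from any yawn, so the whisper-to-yawn indicator is larger than the yawn-to-whisper one.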
This already will give you some sense of which behaviours are clustered together, but you also should check that this isn't plausibly due just to chance.
To check that, I would create simulated data generated by a model under the null hypothesis of no relation. Doing this would require generating each behaviour's event times from a plausible null model, probably based on resampling the times between successive events (e.g. between successive yawns) to create a new set of time stamps for hypothetical null model events. Then calculate the same indicator statistic for this null model and compare it to the indicator from your genuine data. By repeating this simulation many times, you can see where the indicator from your data falls within the distribution of simulated indicators. If it is more extreme than nearly all of them (a smaller average time from each yawn to the nearest stretch, for example), that counts as statistically significant evidence against your null hypothesis.
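One way the simulation might look in Python, again using only the standard library. Everything here is an assumption made for illustration: the function names, the choice to resample gaps with replacement while keeping each series' first time stamp fixed, the plain (untrimmed) mean as the indicator, the one-sided comparison, and the example timestamps. A different null model may suit your data better.

```python
import random
from bisect import bisect_left


def mean_nearest_gap(events, others):
    """Average time from each event to the closest event in `others`."""
    others = sorted(others)
    total = 0.0
    for t in events:
        i = bisect_left(others, t)
        total += min(abs(t - o) for o in others[max(i - 1, 0):i + 1])
    return total / len(events)


def resample_times(times, rng):
    """Null-model time stamps built by resampling the gaps between events."""
    times = sorted(times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    new_gaps = rng.choices(gaps, k=len(gaps))  # sample with replacement
    out = [times[0]]  # assumption: keep the original starting time
    for g in new_gaps:
        out.append(out[-1] + g)
    return out


def simulated_p_value(events, others, n_sims=2000, seed=0):
    """Observed indicator and a one-sided simulation p-value."""
    rng = random.Random(seed)
    observed = mean_nearest_gap(events, others)
    null_stats = [
        mean_nearest_gap(resample_times(events, rng), resample_times(others, rng))
        for _ in range(n_sims)
    ]
    # How often is the null model at least as tightly clustered as the data?
    hits = sum(s <= observed for s in null_stats)
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_sims + 1)


# Hypothetical timestamps in seconds, purely for illustration.
yawns = [10, 50, 200, 260]
stretches = [12, 48, 205, 400]
observed, p = simulated_p_value(yawns, stretches)
print(observed, p)
```

A small p-value here would mean that yawns sit closer to stretches than the null model usually produces. Each of the six indicators needs its own comparison, so some adjustment for multiple testing is worth considering.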