
I have 2 groups of subjects, namely:

  • Subjects that are younger than 67
  • Subjects that are older than 67

Each subject in each group wears a sensor that estimates the metabolic equivalent of task (METs) over the course of a day (this measure reflects how active the subject is during the day; it is similar to energy expenditure).

For each patient I compute the average METs for each hour, which means that each patient is represented by a time series with 24 data points.

In the picture you can see the hourly average METs for each group of subjects, together with the 95% confidence intervals. This picture represents the hourly pattern of the two groups.

Is there a statistical test to compare (and highlight differences between) the two hourly patterns?

The question is: how can I show (using a statistical test) that there are significant differences between the hourly patterns of people older than 67 and people younger than 67?

[Figure: hourly average METs for each group, with 95% confidence intervals]

gabboshow
  • Start with forming a question. What interests you? – Aksakal Nov 03 '15 at 16:31
  • Question is now in the text – gabboshow Nov 04 '15 at 08:31
  • In general, you cannot test differences between averages without information on variation around averages. In this case, you want to compare families of curves, which is quite an advanced problem, studied under headings like functional data analysis. There are many other methods, such as fitting sinusoids to each individual and looking for patterns in the coefficients. This problem starts easily (you can plot the data) and gets much more difficult very quickly. You're the researcher, but a split at 67 seems quite arbitrary to me: I would want to know the ages of all patients. – Nick Cox Nov 04 '15 at 08:59
  • Hi @NickCox, I do have information on variation around the averages: it is represented by the 95% confidence intervals you can see in the picture. I know that significant hourly differences between >67 and <67 can be read off as hours with non-overlapping confidence bands, but I was looking for a rigorous statistical test (with a p-value, etc.). Regarding the split at 67, it is fine for my problem. – gabboshow Nov 04 '15 at 09:03
  • Unless you tell us otherwise, the hourly confidence intervals are a dead end here, if only because they don't take account of the dependence structure of the time series: successive hours are not independent, and neither are the beginning and end of the 24-hour cycle. Other commenters may differ, but this is not a textbook problem with a short solution, in my view. – Nick Cox Nov 04 '15 at 09:07
  • Couldn't you perform a profile analysis using multivariate methods like MANOVA? These also take into account the repeated measures dependencies that Nick mentions. – StatsStudent Nov 04 '15 at 10:15
  • Can you point me to an example or reference? – gabboshow Nov 04 '15 at 10:24
  • Sure. Try Applied Multivariate Statistical Analysis 6th ed. by Johnson and Wichern. – StatsStudent Nov 04 '15 at 10:40
  • Not trying to be negative, but how would MANOVA cope with cycles? It's fundamental that 1 am follows midnight. – Nick Cox Nov 04 '15 at 10:50
  • Or I could assess statistical differences hour by hour separately. Of course it won't be the same thing. I was looking for a test that could tell me curve A and curve B are significantly different (p < 0.05). – gabboshow Nov 04 '15 at 10:56
  • Profile analysis can handle the time components just fine. Check the reference. – StatsStudent Nov 04 '15 at 11:01

2 Answers


This is an old question, but has no accepted answer, so let me offer my own.

Here are some data that, while not exactly like yours, are close enough for our purposes.

[Figure: example data for the two groups]
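For concreteness, here is a minimal sketch of how data with roughly this shape could be simulated; the object names t, g, y and d are my own choices, made to match the model calls below, and the numbers are purely illustrative, not the data behind the plot:

library(mgcv)   # provides gam(), used in the models below

set.seed(1)
n_per_group <- 50                         # hypothetical number of subjects per group
hours <- 0:23

# One noisy 24-hour activity curve per subject; the older group is shifted down a bit.
d <- expand.grid(t = hours, id = seq_len(2 * n_per_group))
d$g <- factor(ifelse(d$id <= n_per_group, "young", "old"), levels = c("young", "old"))
baseline <- 1.2 + 0.8 * sin(2 * pi * (d$t - 6) / 24)   # daytime peak, night-time trough
d$y <- baseline - 0.3 * (d$g == "old") + rnorm(nrow(d), sd = 0.3)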

Because the data are non-linear, I think a GAM might work well here. I'll use the mgcv library, first fitting a simple GAM with a smooth for time and an additive effect for age group (here labelled g).

Model Code:

model = gam(y ~ s(t) + g, data = d)   # one shared smooth of hour, plus an additive group shift

Let's take a look at the predictions.

[Figure: predictions from the model with a shared smooth and an additive group effect]
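The prediction plots shown here and below can be reproduced along these lines (a sketch using base predict() and plot(); the colour choices are arbitrary):

# Predict on a fine hourly grid for each group and overlay the fits on the raw data.
newd <- expand.grid(t = seq(0, 23, length.out = 200), g = levels(d$g))
newd$fit <- predict(model, newdata = newd)

plot(d$t, d$y, pch = 16, cex = 0.4,
     col = ifelse(d$g == "old", "firebrick", "steelblue"),
     xlab = "hour", ylab = "METs")
for (lev in levels(d$g)) {
  sel <- newd$g == lev
  lines(newd$t[sel], newd$fit[sel], lwd = 2,
        col = ifelse(lev == "old", "firebrick", "steelblue"))
}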

The model looks OK, though the tail ends could be problematic. Let's fit a smooth that varies by group:

model = gam(y ~ s(t, by = g), data = d)   # a separate smooth of hour for each group

Let's take a look at the predictions:

[Figure: predictions from the model with group-specific smooths only]

Ehhh... maybe we did need that additive effect. Finally, let's fit a model whose smooth varies by group but which also has an additive group effect:

model = gam(y ~ s(t, by = g) + g, data = d)   # group-specific smooths plus an additive group shift

[Figure: predictions from the model with group-specific smooths and an additive group effect]

I think that is the best fit we are going to get. I should add that since these data are cyclical, we should pass bs = 'cc' to s() so that a cyclic cubic regression spline is used. The model has a summary method that looks a lot like the lm summary, complete with hypothesis tests. The tests for the parametric effects are similar to those in a linear model, but the null hypothesis for the smooths is a bit more involved. Gavin Simpson, who is The GAM Guy as far as I am concerned, has an excellent rundown of the gam summary table here.
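Putting those pieces together, here is a minimal sketch of the final specification and of how the group difference could be tested; the knots argument is one reasonable way to close the 24-hour cycle, and the null-model comparison is my own suggestion rather than something stated above:

# Cyclic cubic smooth of hour for each group, plus an additive group effect.
model_full <- gam(y ~ s(t, by = g, bs = "cc") + g,
                  data = d, knots = list(t = c(0, 24)))

# Wald-type tests for the parametric terms and approximate tests for the smooths.
summary(model_full)

# One way to test the group difference as a whole: compare against a model with a
# single shared smooth and no group term (anova.gam gives an approximate F test).
model_null <- gam(y ~ s(t, bs = "cc"),
                  data = d, knots = list(t = c(0, 24)))
anova(model_null, model_full, test = "F")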

Demetri Pananos
  • Interesting, thanks. I don't want to criticize but what can we do now with that? – Ben Jan 23 '20 at 06:17
  • @Ben I mean, having a model and asking "what can we do with it" is kind of putting the cart before the horse. We should always ask "what do I want to accomplish" and then select the model to help achieve that. In this case, the OP wanted to know the effect of being older than 67, so this model could be used to estimate that effect. – Demetri Pananos Jan 23 '20 at 15:23
  • Yes, sure, but how? I mean there are two curves now; how can I use them to get the effect, or its significance? I guess it's not just their difference? – Ben Jan 24 '20 at 07:33
  • @Ben the models can be summarized, much like a linear model object. I mention that in the last paragraph. – Demetri Pananos Jan 24 '20 at 09:01
  • Alright, I didn't see that. Thanks! So, put simply, I look at the summaries and note any differences in significance (or maybe something else, whatever I'm interested in)? – Ben Jan 24 '20 at 09:14
  • That's more or less correct. The link to Gavin Simpson's answer on another post explains it better than I ever could. – Demetri Pananos Jan 24 '20 at 09:28

Why don't you split the day into a few time bands of 2 h duration (00:00–02:00, 02:00–04:00, etc.) and apply a two-sample t-test within each band? You didn't mention how many patients are in each group, but even if there are only a few, a t-test should do the job. You will then get a p-value and a confidence interval for each time band. Reject the overall null hypothesis only if you can reject it in every band, and use the largest band p-value as the p-value for the whole comparison.
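A minimal sketch of that procedure in R, assuming a long-format data frame like the d sketched in the first answer (one row per subject and hour, with columns id, g, t and y; the band boundaries and labels are my own choices):

# Average each subject's METs within 2-hour bands, then run a Welch t-test per band.
d$band <- cut(d$t, breaks = seq(0, 24, by = 2), right = FALSE,
              labels = paste(seq(0, 22, 2), seq(2, 24, 2), sep = "-"))
subj_band <- aggregate(y ~ id + g + band, data = d, FUN = mean)

p_values <- sapply(split(subj_band, subj_band$band),
                   function(b) t.test(y ~ g, data = b)$p.value)

p_values       # one p-value per 2-hour band
max(p_values)  # the largest band p-value, used here as the overall p-value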

Davide C