How to analyze the relationship between two variables in a time sequence

Question

I have a question about how to analyze the relationship between two variables in a time sequence.

It is an eye-tracking experiment. I recruited two separate groups of Mandarin speakers to describe pictures showing two characters: one group used their native language, Mandarin (L1), and the other used their second language, German (L2). Language is the independent variable. I would like to measure the fixation distribution (proportion) on the two characters over time while participants plan their descriptions, so the fixation proportion is the dependent variable. I have also prepared a dataset that includes time information, shown below (the time from picture onset was divided into consecutive 40 ms bins).

Stimulus  Participant  Areas       time_bin  Language
1         M1           character1  1         1
1         M1           character1  2         1
1         M1           character1  3         1
1         M1           character1  4         1
1         M1           character2  5         1
1         M1           character2  6         1
1         M1           character2  7         1
1         M1           character2  8         1

1         G1           character1  1         2
1         G1           character1  2         2
1         G1           character1  3         2
1         G1           character2  4         2
...

My questions are: (1) within each language group, how is the fixation distribution related to time bin; and (2) between groups, how does language affect the fixation distribution over time? In addition, it would be good to treat stimulus as a random effect, because both groups described the same stimuli.

I do not know how to analyse time-course data like this. I have searched quite a lot and found that there are different methods, such as growth curve analysis or fixed-time effect testing, but I am not sure which method would be appropriate.

Do you have some ideas or suggestions about how to solve the problem?

score 0 · Accepted Answer · 2019-06-23T16:58:08.550

I'm going to be careful with this answer, since I think some things in the question are not entirely clear. Also, bear in mind that there are multiple perspectives on data analysis, so it is perfectly normal for there to be more than one reasonable approach to analysing your data.

I will assume the following:

That the pictures are the stimuli.
That each participant is shown several pictures.
Since you referred to growth curve analysis (GCA), that time is treated as a continuous variable. GCA is just a multilevel model with higher-order polynomial terms for time (see Mirman's book for an accessible overview: http://www.danmirman.org/gca).
That participants are nested within language group, since each participant belongs to only one language group (for more on nesting, see https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#nested-or-crossed and https://www.theanalysisfactor.com/the-difference-between-crossed-and-nested-factors/).

Considering these assumptions, I will follow the GCA approach. To answer 1 and 2, I think one plausible model could be:

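Proportion ~ time_bin*Language + (1|Participant) + (1|Stimulus)

Because each participant ID (M1, G1, ...) is unique and belongs to only one language group, (1|Participant) already encodes the nesting; Language itself enters as a fixed effect.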
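A small base-R check of the nesting point, using made-up IDs in the question's format. In lme4 formula syntax, (1|Participant/Language) expands to (1|Participant) + (1|Participant:Language). If every participant belongs to only one language group, Participant:Language defines exactly the same groups as Participant, so that second random intercept would duplicate the first. (If IDs were reused across groups, e.g. both groups had a participant "1", you would instead write (1|Language:Participant) or recode the IDs.)

```r
# Toy version of the question's design (hypothetical IDs): participant IDs are
# unique across groups, M* in Language 1 and G* in Language 2.
d <- data.frame(
  Participant = c("M1", "M1", "M2", "M2", "G1", "G1", "G2", "G2"),
  Language    = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# Number of groups defined by Participant alone
n_participant <- nlevels(factor(d$Participant))

# Number of groups defined by the extra Participant:Language term
n_interaction <- nlevels(interaction(d$Participant, d$Language, drop = TRUE))

# Equal counts: each participant maps to exactly one Participant:Language
# cell, so the extra random intercept would duplicate the participant one.
c(participant = n_participant, participant_by_language = n_interaction)
```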

However, the above assumes that the relationship between time and proportion is linear. People using GCA typically assume that time has a non-linear relationship with the outcome. If you assume a non-linear relationship, for example quadratic or cubic, you will need to include these higher-order terms in your model. An added complication is that natural polynomial terms (time, time^2, time^3) are strongly correlated with one another, so the estimated parameters will be interdependent. You may want to use orthogonal polynomials to avoid collinearity between the different time terms. The code below creates independent (orthogonal) linear, quadratic, and cubic time terms.

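time_vals <- sort(unique(data$time_bin))   # distinct time bins, in ascending order
poly_t <- poly(time_vals, 3)               # orthogonal linear, quadratic and cubic terms
data[, paste0("ot", 1:3)] <- poly_t[match(data$time_bin, time_vals), 1:3]   # look up each row's own time bin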

I do not know what kind of relationship is justified by theory or by your research questions. To change the number of orthogonal terms, change every 3 in the code (the degree passed to poly() and the two 1:3 ranges) to a larger or smaller number. For example, 2 creates two columns (linear and quadratic terms of time), whereas 4 creates four columns, adding a quartic term to the linear, quadratic, and cubic ones. Be careful of over-fitting, though.
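A generalised sketch of the same idea, written as a hypothetical helper add_orth_time() (the name is made up, not from any package) whose degree argument replaces the hard-coded 3. It matches rows on the time value itself, so it also works when time is stored in milliseconds or when bins do not start at 1; indexing the rows of the poly() result directly by such values is one plausible cause of a "subscript out of bounds" error like the one reported in the comments. The last lines check that raw powers of time are strongly correlated while the orthogonal terms are not.

```r
# Hypothetical helper: adds orthogonal polynomial time terms ot1..ot<degree>
# to a data frame, matching each row on its time value.
add_orth_time <- function(data, time_col, degree = 3) {
  time_vals <- sort(unique(data[[time_col]]))
  # poly() needs more distinct time points than the requested degree
  stopifnot(length(time_vals) > degree)
  basis <- poly(time_vals, degree)
  idx <- match(data[[time_col]], time_vals)
  for (k in seq_len(degree)) {
    data[[paste0("ot", k)]] <- basis[idx, k]
  }
  data
}

# Toy data: two trials, time stored in ms (40, 80, ..., 400) rather than 1..10.
# Indexing poly() rows by these values directly would fail, because the
# basis has only 10 rows.
toy <- data.frame(time_ms = rep(seq(40, 400, by = 40), times = 2))

toy3 <- add_orth_time(toy, "time_ms", degree = 3)  # adds ot1, ot2, ot3
toy4 <- add_orth_time(toy, "time_ms", degree = 4)  # also adds ot4 (quartic)
names(toy4)

# Raw powers of time are strongly correlated; the orthogonal terms are not.
raw <- cbind(toy$time_ms, toy$time_ms^2, toy$time_ms^3)
round(cor(raw), 2)
round(cor(toy3[, c("ot1", "ot2", "ot3")]), 10)
```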

After creating these polynomial terms, your model may look something like this:

Proportion ~ (ot1+ot2+ot3)*Language + (1|Participant) + (1|Stimulus)

I have ignored one thing so far: your outcome is a proportion (if I understood your question correctly). If that is the case, I would suggest that you consider a binomial generalised linear mixed-effects model with the number of observations as weights, or a beta regression (see: Fitting a binomial GLMM (glmer) to a response variable that is a proportion or fraction and How to apply binomial GLMM (glmer) to percentages rather than yes-no counts?).
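A minimal base-R sketch of the binomial options, on simulated data in the question's sample-level layout (the participants, stimuli and fixation probabilities are all made up). It shows that (a) a proportion outcome with the number of samples as weights, (b) a cbind(successes, failures) outcome, and (c) the raw binary coding of Areas give the same fixed-effect estimates. It uses plain glm() without random effects so that it runs without extra packages; lme4::glmer() accepts the same response forms together with terms such as (1|Participant). Note that once samples are aggregated over stimuli, (1|Stimulus) can no longer be estimated, which is one reason to keep the sample-level binary coding.

```r
set.seed(1)

# Hypothetical sample-level data: one row per participant x stimulus x
# 40 ms time bin, recording which character was fixated.
samples <- expand.grid(
  time_bin = 1:10,
  Stimulus = 1:6,
  Participant = c(paste0("M", 1:4), paste0("G", 1:4)),
  stringsAsFactors = FALSE
)
samples$Language <- ifelse(startsWith(samples$Participant, "M"), 1, 2)

# Made-up probabilities of fixating character1, changing over time by language
p_c1 <- plogis(1 - 0.2 * samples$time_bin + 0.5 * (samples$Language == 2))
samples$Areas <- ifelse(runif(nrow(samples)) < p_c1, "character1", "character2")

# Binary coding: 1 = fixating character1, 0 = fixating character2
samples$on_c1 <- as.integer(samples$Areas == "character1")
samples$n <- 1

# Aggregate over stimuli: on_c1 becomes the count of character1 samples and
# n the total number of samples, per participant x time bin
agg <- aggregate(cbind(on_c1, n) ~ Participant + Language + time_bin,
                 data = samples, FUN = sum)
agg$prop <- agg$on_c1 / agg$n

# Three equivalent binomial specifications (fixed effects only)
fit_prop   <- glm(prop ~ time_bin * factor(Language), family = binomial,
                  weights = n, data = agg)
fit_counts <- glm(cbind(on_c1, n - on_c1) ~ time_bin * factor(Language),
                  family = binomial, data = agg)
fit_binary <- glm(on_c1 ~ time_bin * factor(Language), family = binomial,
                  data = samples)

# Same coefficient estimates, up to convergence tolerance
round(cbind(prop = coef(fit_prop), counts = coef(fit_counts),
            binary = coef(fit_binary)), 4)
```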

Another thing to consider is the random-effects structure. The candidate models above only allow the intercepts to vary, but you may also want to allow the slopes to vary (random slopes). Once again, be careful of over-fitting (I would suggest Bates et al. https://arxiv.org/abs/1506.04967 and Matuschek et al. 2017 https://www.sciencedirect.com/science/article/pii/S0749596X17300013).
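A sketch of how intercept-only and random-slope structures might be written in lme4 formula syntax, reusing the Proportion, ot1 to ot3, Language, Participant and Stimulus names from the models above. The formulas are plain R objects, so the block runs without lme4; the fitting calls are left as comments and assume lme4 is installed and that `data` is your data frame with the ot columns (for a binomial outcome you would use lme4::glmer() with family = binomial and weights instead). Richer structures may fail to converge or be singular, which is where the papers above come in.

```r
# Random intercepts only, as in the models above
f_intercepts <- Proportion ~ (ot1 + ot2 + ot3) * Language +
  (1 | Participant) + (1 | Stimulus)

# Adds by-participant random slopes for the time terms. Language gets no
# by-participant slope because each participant is in only one group.
f_participant_slopes <- Proportion ~ (ot1 + ot2 + ot3) * Language +
  (ot1 + ot2 + ot3 | Participant) + (1 | Stimulus)

# Closer to "maximal": both groups saw the same stimuli, so the time terms,
# Language and their interactions can also vary by stimulus.
f_maximal <- Proportion ~ (ot1 + ot2 + ot3) * Language +
  (ot1 + ot2 + ot3 | Participant) + ((ot1 + ot2 + ot3) * Language | Stimulus)

# Assuming lme4 is installed and `data` holds the ot1..ot3 columns:
# m1 <- lme4::lmer(f_intercepts, data = data, REML = FALSE)
# m2 <- lme4::lmer(f_participant_slopes, data = data, REML = FALSE)
# anova(m1, m2)  # likelihood-ratio comparison of the two structures
```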

Hello @Michael R! Thanks for your answer! It's very helpful! I still have two questions: 1. I just tried the following code and received a "subscript out of bounds" error: Gaze_600ms_proportion_compare_Poly — 内卡厨房Neckar Kitchen, Jun 24 '19 at 07:24
And my second question is about the type of dependent variable in GCA. You mentioned that "proportion" is the dependent variable, which is analysed with a linear model. Is it possible to use "Areas" from the sample data directly as a categorical variable in a GCA and run a logistic regression instead? Thanks again for your generous help! — 内卡厨房Neckar Kitchen, Jun 24 '19 at 07:32
Hi @内卡厨房NeckarKitchen. The code should work: copy and paste what I wrote, changing only "data" and "time_bin". Where I wrote data, use the name of your data frame; replace time_bin with the name of the time variable in your data frame. If your outcome is continuous and approximately normal, you can run a linear mixed-effects model; if it is a proportion or binary, you can run a generalised linear mixed-effects model. GCA is just a regression with polynomial terms, so both outcome types are fine. In your question you mentioned a proportion; see the links provided in my answer. — , Jun 24 '19 at 09:01
