I have a database with N firms, and observations for several variables (both numeric and binary) for each firm per year.
As an example:
df <- data.frame(
  "Year" = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012),
  "Firm" = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  "Holding" = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  # data.frame() renames "Male CEO" to Male.CEO unless check.names = FALSE
  "Male CEO" = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)
I would like to test for an association between explanatory variables at time T (e.g. Revenue 2010, Holding 2010) and a dependent variable at time T+1 (e.g. Male CEO 2011), not just for one pair of years but for all the years in my sample. I plan to use a logistic regression.
My problem is not which model to use, but how to tell the model to take the dependent variable from the year after that of the explanatory variables. As currently specified, it uses all the observations regardless of year.
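To make the structure I am after concrete, here is a minimal base R sketch of one way the pairing might be built. It is not something I have verified to be the right or idiomatic approach. It assumes `Year` is an integer-valued number, so that "next year" is `Year + 1`. The names `lead`, `panel` and `Male.CEO.next` are made up for illustration, and `Holding` stands in for the full set of explanatory variables.

```r
df <- data.frame(
  Year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012),
  Firm = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  Holding = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  Male.CEO = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Copy of the outcome, relabelled so that the value observed in year T+1
# is keyed to year T (hypothetical helper table)
lead <- data.frame(
  Firm = df$Firm,
  Year = df$Year - 1,
  Male.CEO.next = df$Male.CEO
)

# Inner join on firm and year: keeps only firm-years for which the
# following year's outcome exists (the last year of each firm drops out)
panel <- merge(df, lead, by = c("Firm", "Year"))

# Logistic regression of next year's outcome on this year's predictor
fit <- glm(Male.CEO.next ~ Holding, data = panel, family = binomial)
```

With the example data, `panel` would hold the 2010 and 2011 rows only, each carrying the following year's `Male CEO` value in `Male.CEO.next`.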
Do you have any suggestions on how to approach this problem?
Thanks