I have a dataset that contains 100 different patients followed over a 5-year period. Every patient is examined each month for a particular illness and marked as healthy or ill (0 or 1), so every patient appears 60 times in my sample (5 * 12 = 60).
Every month, each patient also provides A = average blood pressure in that month, B = average daily exercise hours, and C = average number of cigarettes smoked in that month.
The layout of the dataset is as follows:
ID (Unique Patient Identifier)
Month (1 to 60)
A (Average blood pressure in that month)
B (Average daily exercise hours)
C (Average number of Cigarettes smoked in that month)
Ill (1 = Yes, 0 = No)
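To make the layout concrete, here is a minimal sketch of the long format described above (one row per patient per month). All values are random placeholders rather than my real data, and the name `layout` and the distribution parameters are only assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_patients, n_months = 100, 60

# One row per (patient, month): 100 * 60 = 6000 rows.
layout = pd.DataFrame({
    "ID": np.repeat(np.arange(1, n_patients + 1), n_months),
    "Month": np.tile(np.arange(1, n_months + 1), n_patients),
})

# Placeholder values only; the distributions are made up for illustration.
layout["A"] = rng.normal(125, 10, len(layout))    # average blood pressure
layout["B"] = rng.gamma(2.0, 0.5, len(layout))    # average daily exercise hours
layout["C"] = rng.poisson(5, len(layout))         # average cigarettes smoked
layout["Ill"] = rng.binomial(1, 0.1, len(layout)) # 1 = ill, 0 = healthy

print(layout.head())
```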
I was thinking of using logistic regression that takes information from the last three months and gives the probability that a patient will be flagged as ill in the next 2 months.
My problem is that logistic regression assumes that the observations are independent, whereas in my case they are obviously not, because each patient contributes repeated measurements.
What should I do? Should I use something like GEE (generalized estimating equations) or a GLMM (generalized linear mixed model), or something else?
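As a rough sketch of what I mean by that setup, the code below builds one row per patient-month with predictors from a three-month window and a target that looks two months ahead. It rests on assumptions of mine: "last three months" is read as the current month plus the two before it, and the target is 1 if the patient is ill in at least one of the next 2 months. The names `add_window_features`, `*_lag1`, `*_lag2` and `Ill_next` are hypothetical.

```python
import pandas as pd

def add_window_features(df, n_lags=3, horizon=2):
    """Add lagged predictors and a forward-looking target per patient."""
    df = df.sort_values(["ID", "Month"]).reset_index(drop=True)

    # The current month's A, B, C act as lag 0; add lags 1..n_lags-1.
    lag_cols = []
    for col in ["A", "B", "C"]:
        for k in range(1, n_lags):
            name = f"{col}_lag{k}"
            df[name] = df.groupby("ID")[col].shift(k)
            lag_cols.append(name)

    # Illness status in each of the next `horizon` months, within patient.
    future = pd.concat(
        [df.groupby("ID")["Ill"].shift(-h) for h in range(1, horizon + 1)],
        axis=1,
    )
    df["Ill_next"] = future.max(axis=1)  # 1 if ill in any future month

    # Keep only rows with a complete look-back and look-ahead window.
    keep = df[lag_cols].notna().all(axis=1) & future.notna().all(axis=1)
    out = df.loc[keep].copy()
    out["Ill_next"] = out["Ill_next"].astype(int)
    return out

# Tiny made-up example: one patient, six months, ill only in month 4.
demo = pd.DataFrame({
    "ID": [1] * 6,
    "Month": [1, 2, 3, 4, 5, 6],
    "A": [120.0, 122.0, 125.0, 130.0, 128.0, 126.0],
    "B": [1.0, 0.5, 0.5, 0.0, 1.0, 1.0],
    "C": [3.0, 4.0, 6.0, 8.0, 5.0, 4.0],
    "Ill": [0, 0, 0, 1, 0, 0],
})
windowed = add_window_features(demo)
print(windowed[["Month", "A", "A_lag1", "A_lag2", "Ill_next"]])
```

Only months 3 and 4 survive in this toy example, since the other months lack either a full look-back or a full look-ahead window. Consecutive rows for the same patient share most of their window, so the rows remain correlated within a patient.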