I suspect this is a common, if not trivial, question that betrays me as a newbie. Anyhow, here goes...
I have data on the behaviour of different demographic groups. There are 10 groups; they are predetermined, and individuals are allocated to them.
The groups are strongly correlated with age: each group's mean age is roughly 5 years greater than the previous group's, but the gap ranges from about 0.5 to 13 years, i.e. some neighbouring groups have similar mean ages.
I am trying to determine which demographic group has the highest survival rate. My problem is that I only know the group each individual is in now, not the group they were in when their survival time started. The analysis shows that the older groups have much longer expected survival. But this is obvious: people who have survived longer have had more time to age, so they are more likely to be in an older group now!
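A small simulation can illustrate the mechanism described above. It is a hypothetical sketch, not the poster's data: starting ages are drawn uniformly from 25 to 65 (assumed), everyone's survival comes from the same exponential distribution with an assumed mean of 8 years, and the group is assigned from current age using made-up 6-year bands (`group_from_age` is a hypothetical rule). Because survival does not depend on starting age at all, any trend in median survival across groups comes purely from recording the group at the end of follow-up.

```python
import random
import statistics

# Hypothetical simulation, not the poster's data. Survival is drawn from ONE
# exponential distribution for everybody, so starting age has no effect on it.
random.seed(1)
N = 20_000
MEAN_SURVIVAL = 8.0        # years (assumed)
START_AGES = (25.0, 65.0)  # uniform range of starting ages (assumed)


def group_from_age(age, lowest=25.0, width=6.0, n_groups=10):
    """Hypothetical rule: 6-year age bands starting at 25, capped at 10 groups."""
    band = int((age - lowest) // width)
    return min(max(band, 0), n_groups - 1) + 1  # 1-based group number


survival_by_group = {g: [] for g in range(1, 11)}
for _ in range(N):
    start_age = random.uniform(*START_AGES)
    survival = random.expovariate(1.0 / MEAN_SURVIVAL)  # rate = 1 / mean
    current_age = start_age + survival                  # group is recorded only "now"
    survival_by_group[group_from_age(current_age)].append(survival)

median_survival = {g: statistics.median(s) for g, s in survival_by_group.items()}
for g, m in median_survival.items():
    print(f"group {g:2d}  n={len(survival_by_group[g]):5d}  median survival {m:5.2f}")
```

Under these assumptions the gradient is built in by construction: group 1 (current age below 31, starting age at least 25) can only contain survival times under 6 years, while group 10 (current age 79 or more, starting age at most 65) can only contain survival times of at least 14 years.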
So, throwing statistical rigour to the wind, I estimated each group's starting age as its mean current age minus its median survival. This gave estimated starting ages of roughly 48 to 63 for the 7 older groups (groups 4 to 10), which seems a reasonable interpretation.
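Written as a formula, the back-of-envelope estimate above is the following, where $\bar a_g$ is group $g$'s mean current age and $\tilde s_g$ its median survival (symbols introduced here only for illustration):

$$
\hat a^{\,\mathrm{start}}_g = \bar a_g - \tilde s_g, \qquad \text{e.g. group 7: } 70.78 - 15.28 = 55.50 .
$$

Assuming each person's survival is measured up to the same date as their current age, the group's actual mean starting age would be $\bar a_g - \bar s_g$ (mean age minus *mean* survival), so substituting the median makes this a rough group-level summary rather than an exact average.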
How can I apply a bit more rigour to show whether these people are likely to have come from groups 4, 5 and 6 (or not)?
$$
\begin{array}{|r|r|r|r|r|}
\hline
\text{group} & \text{mean age (yr)} & \text{diff (yr)} & \text{median survival (yr)} & \text{est. start age (yr)} \\
\hline
1 & 30.43 & & 2.08 & 28.34 \\
\hline
2 & 38.58 & 8.15 & 2.36 & 36.21 \\
\hline
3 & 43.57 & 4.99 & 2.36 & 41.21 \\
\hline
4 & 52.86 & 9.29 & 4.32 & 48.54 \\
\hline
5 & 53.34 & 0.48 & 5.38 & 47.96 \\
\hline
6 & 57.94 & 4.60 & 6.14 & 51.80 \\
\hline
7 & 70.78 & 12.84 & 15.28 & 55.50 \\
\hline
8 & 76.27 & 5.49 & 17.36 & 58.91 \\
\hline
9 & 79.71 & 3.44 & 21.49 & 58.22 \\
\hline
10 & 80.23 & 0.51 & 17.03 & 63.20 \\
\hline
\end{array}
$$
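The two derived columns of the table can be recomputed from the mean age and median survival with a few lines of Python. This is only a consistency check; both columns are assumed to be in years, since one is subtracted from the other, and the recomputed values can differ in the last decimal place (e.g. 0.52 rather than 0.51 for group 10), presumably because the table's inputs are already rounded.

```python
# Values copied from the table: group -> (mean age, median survival).
# Years are assumed for both columns, since one is subtracted from the other.
TABLE = {
    1: (30.43, 2.08),
    2: (38.58, 2.36),
    3: (43.57, 2.36),
    4: (52.86, 4.32),
    5: (53.34, 5.38),
    6: (57.94, 6.14),
    7: (70.78, 15.28),
    8: (76.27, 17.36),
    9: (79.71, 21.49),
    10: (80.23, 17.03),
}


def derived_columns(table):
    """Recompute the 'diff' and 'est. start age' columns.

    diff           = this group's mean age minus the previous group's (None for group 1)
    est. start age = mean age minus median survival
    """
    cols = {}
    prev_mean_age = None
    for group in sorted(table):
        mean_age, median_survival = table[group]
        diff = None if prev_mean_age is None else mean_age - prev_mean_age
        cols[group] = (diff, mean_age - median_survival)
        prev_mean_age = mean_age
    return cols


cols = derived_columns(TABLE)
for group, (diff, start_age) in cols.items():
    diff_text = "" if diff is None else f"{diff:.2f}"
    print(f"{group:2d}  diff={diff_text:>6}  est. start age={start_age:6.2f}")
```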
PS: I threw statistical rigour to the wind because I was out of my depth. I've come here to try to right that wrong. :)
PPS: for anyone who chances across this question, I think there are clues in the question "Can an independent variable change during survival analysis?". The key search term seems to be "time-dependent covariates".
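With time-dependent covariates, each person's follow-up is usually split into (start, stop] intervals, one per period during which the covariate (here, the demographic group) is constant, and only the last interval can carry the event. The sketch below shows that splitting under a loud assumption: group membership is treated as a step function of age with invented cut points (`GROUP_BOUNDARIES`, `group_at` and `split_follow_up` are all hypothetical). In reality you would need whatever rule or record actually moves people between groups over time.

```python
import bisect

# Hypothetical cut points, invented for illustration only: a person aged `a`
# is in group 1 + (number of boundaries <= a). Nine boundaries give ten groups.
GROUP_BOUNDARIES = [35, 40, 48, 53, 55, 65, 74, 78, 80]


def group_at(age, boundaries=GROUP_BOUNDARIES):
    """1-based group number held at a given age under the hypothetical boundaries."""
    return bisect.bisect_right(boundaries, age) + 1


def split_follow_up(entry_age, follow_up, event, boundaries=GROUP_BOUNDARIES):
    """Split one person's follow-up into (start, stop, group, event) rows.

    Times are measured from entry. Only the last row can carry the event,
    because the person is still at risk at the end of every earlier row.
    """
    exit_age = entry_age + follow_up
    # Interval edges: entry, every group change strictly inside follow-up, exit.
    edges = [entry_age] + [b for b in boundaries if entry_age < b < exit_age] + [exit_age]
    rows = []
    for i in range(len(edges) - 1):
        is_last = i == len(edges) - 2
        rows.append((
            edges[i] - entry_age,             # start
            edges[i + 1] - entry_age,         # stop
            group_at(edges[i], boundaries),   # group held during the interval
            event and is_last,                # event flag only on the final row
        ))
    return rows


for row in split_follow_up(entry_age=50, follow_up=10, event=True):
    print(row)
```

Rows in this start/stop layout are what survival software expects for time-varying covariates; for example, R's `survival` package takes them as `Surv(start, stop, event)`, and Python's lifelines provides `CoxTimeVaryingFitter` for the same purpose.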