I am doing a secondary analysis of data from a local trial (N=450, mean follow-up longer than 10 years). I am specifically looking at a secondary outcome (diagnosis of hypertension) 10 years after the start of the trial.
My problem is that the trial itself lasted only 2 years and, for my analysis (which uses a 10-year follow-up), I have complete data on only about 50% of the sample (i.e. there are no clinical records for the other 50% of the original sample, for reasons such as dropout, death, etc.). Given the high proportion of missing data (and without even having tested the MAR assumption), I am not considering multiple imputation as an option. Therefore, the two possibilities that seem worth exploring are i) running a complete-case analysis anyway, or ii) applying inverse probability weighting.
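To make option ii) concrete, this is how I currently understand inverse probability weighting, as a minimal sketch in plain Python. The records, the "young"/"old" baseline stratum and the function names are all made up for illustration (they are not from the trial), and the probability of being observed is simply estimated as the observed fraction within each stratum rather than from a fitted model.

```python
from collections import defaultdict

# Hypothetical toy records (NOT trial data):
# (baseline stratum, observed at 10 years?, hypertension 0/1 or None if missing)
records = [
    ("young", True, 0), ("young", True, 0), ("young", True, 1), ("young", False, None),
    ("old", True, 1), ("old", True, 1), ("old", False, None), ("old", False, None),
]

def complete_case_prevalence(records):
    """Unweighted outcome prevalence among participants with follow-up data."""
    outcomes = [y for _, observed, y in records if observed]
    return sum(outcomes) / len(outcomes)

def ipw_prevalence(records):
    """Weighted prevalence among completers, weight = 1 / P(observed | stratum)."""
    n_total = defaultdict(int)
    n_observed = defaultdict(int)
    for stratum, observed, _ in records:
        n_total[stratum] += 1
        if observed:
            n_observed[stratum] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, observed, outcome in records:
        if not observed:
            continue  # only completers contribute, but with unequal weights
        # Inverse of the estimated probability of being observed in this stratum
        weight = n_total[stratum] / n_observed[stratum]
        weighted_sum += weight * outcome
        weight_total += weight
    return weighted_sum / weight_total

print(round(complete_case_prevalence(records), 3))
print(round(ipw_prevalence(records), 3))
```

In this toy data the "old" stratum is lost more often, so the complete-case estimate (0.60) under-represents it, while the weighted estimate (about 0.67) restores each stratum to its share of the original sample. With real data I assume the stratum fractions would be replaced by predicted probabilities from a model of being observed.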
Because I suspect that loss to follow-up has no systematic cause (the trial was about the use of a coenzyme, which is unlikely to be associated with the likelihood of dropping out), I would be inclined to use complete-case analysis. What I initially did was test for differences in the distribution of study covariates between the participants with data at follow-up and those with missing data. I found no difference between the two groups. I came across this approach in old papers, but I am not sure it is robust enough (also, I remember reading these papers but cannot find them now for a closer look at the issue). In addition, I am not sure whether the comparison should have been conducted the way I did it (participants with complete data vs participants with missing data) or should instead have been between the initial sample and the participants with complete data, under the assumption that those with complete data are still representative. Can you help me clarify this?
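For reference, the kind of comparison I ran looks roughly like the sketch below. It uses invented baseline ages (not trial data) and a standardized mean difference instead of a significance test, purely as an illustration. The function name and the choice of pooled standard deviation are my own assumptions.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Hypothetical baseline ages (made up for illustration)
age_with_followup = [50, 52, 54, 56, 58]
age_lost_to_followup = [51, 53, 55, 57, 59]

smd = standardized_mean_difference(age_with_followup, age_lost_to_followup)
print(round(smd, 3))
```

The same function could be applied to the other comparison I mention (complete cases vs the full initial sample) by passing the full-sample values as the second argument, although the two groups then overlap.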
However, I suspect this approach is not sufficient. I read about Little's MCAR test (which doesn't seem to be used much anyway, as it is not a gold standard), but I also read that you can test this assumption by creating dummy variables for missingness and fitting probit/logistic regression models with one covariate at a time as the independent variable. However, I am not sure I understood it correctly, as I could not find a practical example.
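To check my understanding, this is what I think the dummy-variable approach amounts to, as a minimal sketch in plain Python. The data are invented (a binary treatment-arm covariate and a missingness indicator, 10 participants per arm), the function names are hypothetical, and the logistic model is fitted with a hand-written Newton-Raphson loop only so that the example is self-contained. In practice I assume one would use the logistic regression routine of a statistics package.

```python
from math import exp, sqrt

def _score_and_information(x, r, b0, b1):
    """Gradient and information matrix of the log-likelihood at (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, ri in zip(x, r):
        p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))  # P(missing | x)
        w = p * (1.0 - p)
        g0 += ri - p
        g1 += (ri - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11

def fit_missingness_model(x, r, n_iter=25):
    """Fit logit P(r = 1) = b0 + b1 * x; return (b0, b1, se_b1, z_b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0, g1, h00, h01, h11 = _score_and_information(x, r, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g for the 2x2 information matrix
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of the slope from the inverse information at the estimate
    _, _, h00, h01, h11 = _score_and_information(x, r, b0, b1)
    se_b1 = sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1, b1 / se_b1

# Hypothetical data: x = treatment arm, r = 1 if the 10-year outcome is missing
arm = [0] * 10 + [1] * 10
missing = [1] * 4 + [0] * 6 + [1] * 6 + [0] * 4

b0, b1, se_b1, z_b1 = fit_missingness_model(arm, missing)
print(round(b1, 3), round(se_b1, 3), round(z_b1, 3))
```

Here `b1` is the log odds ratio of missingness between arms and `z_b1` is its Wald statistic. The same fit would be repeated with each baseline covariate in turn as `x`. As far as I understand, this can only show whether missingness depends on observed covariates, so a null result would not by itself prove that the data are missing completely at random.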
Any help would be appreciated. Also, references would be very welcome.
Thanks