
I am trying to assess whether a visit to a doctor being "on time", "too early" or "too late" has an impact on the total days of sickness.

The time elapsed between the onset of symptoms and the visit to the doctor (expressed in weeks) could be a confounding factor. In particular, the more time passes between the onset of symptoms and the visit, the more likely it is that the visit is classified as "too late" rather than "on time".

My assumption is that "on time" visits lead to a shorter duration of the sickness and therefore it is important to decide when to see a patient (as early as possible is different from "on time").

How would you approach this problem? I could run a survival analysis estimating the probability of recovering over time for "on time" visits and "too late" visits, but I would need to stratify by the number of weeks between the onset of symptoms and the visit, and this would obscure the effect of being "on time".
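The per-group recovery curve described above is what a Kaplan-Meier estimator produces: the probability of having recovered by day t is 1 - S(t). A minimal stdlib sketch, assuming each patient contributes the number of days from symptom onset to recovery, or to the end of follow-up (censored) if recovery was not observed; the function and argument names are hypothetical, and in practice a survival library would be used instead.

```python
from collections import Counter


def cumulative_recovery(times, recovered):
    """Kaplan-Meier estimate of P(recovered by day t) = 1 - S(t).

    times: days from symptom onset to recovery or to end of follow-up
    recovered: True if recovery was observed, False if censored
    Returns (day, cumulative probability of recovery) for each day on
    which at least one recovery was observed.
    """
    events = Counter(t for t, r in zip(times, recovered) if r)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # censored patients tied at t still count as at risk at t
            surv *= 1 - d / at_risk
            curve.append((t, 1 - surv))
        at_risk -= exits[t]
    return curve


print(cumulative_recovery([3, 5, 5, 8, 10], [True, True, False, True, False]))
```

Running it once per timing group gives the curves to compare, which is exactly where the need to also stratify by elapsed weeks multiplies the number of groups.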

Another approach would be to fit a Cox proportional-hazards model and look at the coefficients... Do you have any tips on how to address this problem correctly?

Regards and thanks

gabboshow
  • Is there some reason why you can't just use the actual number of days between developing symptoms and seeing the doctor as a continuous predictor in a Cox model? Binning into a few groups is generally [not a good idea](https://stats.stackexchange.com/q/68834/28500). – EdM Sep 24 '20 at 19:57
  • Hi @EdM, the reason is mostly practical: appointments are usually planned in terms of weeks... but yes, for analysis purposes it is better to use days. Thanks for the suggestion! – gabboshow Sep 27 '20 at 20:48

1 Answer

It's best to try to use all the information available in a continuous predictor. So if you know the actual number of days between developing symptoms and seeing the doctor, use that actual number of days rather than arbitrarily breaking that time down into categories like "on time", "too early" or "too late."

Particularly because you don't expect a simple linear relationship between that continuous predictor and the outcome, perform a flexible fit that lets the data determine the shape of that relationship. Restricted cubic splines are a particularly useful way to incorporate a continuous predictor into a model when you don't have a theoretical basis for any particular shape. I use the `rcs()` function from the `rms` package in R for such modeling. Then you can determine whether there is a substantial non-linearity in the relationship and plot its shape. If there is a clear optimum, the data will tell you what "on time" means in practice.
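As a rough illustration of what an `rcs()` term adds to a model, this plain-Python sketch computes a restricted cubic spline basis in the truncated-power form described by Harrell. It is not the `rms` code: the knot locations (7, 14, 28 and 56 days) are hypothetical, and dividing each nonlinear term by the squared knot range is a scaling choice that may not match `rms` exactly. The spline is cubic between knots and constrained to be linear beyond the outer knots, which keeps the fitted curve stable in the tails where data are sparse.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis [x, X_2, ..., X_{k-1}] for k knots.

    Truncated-power form: zero below the first knot for the nonlinear
    terms, linear beyond the last knot. Each nonlinear term is divided by
    (t_k - t_1)**2 (a scaling convention; other software may differ).
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        # truncated cube (u)_+^3
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2
    last, penult = t[k - 1], t[k - 2]
    basis = [float(x)]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - penult) * (last - t[j]) / (last - penult)
                + pos3(x - last) * (penult - t[j]) / (last - penult))
        basis.append(term / scale)
    return basis


# hypothetical knots, in days from symptom onset to visit
for days in (3, 20, 60):
    print(days, rcs_basis(days, [7, 14, 28, 56]))
```

In a Cox model each column gets its own coefficient, and the fitted log relative hazard as a function of elapsed days is the weighted sum of the columns; plotting that sum over a grid of days is roughly what plotting with `Predict` in `rms` shows.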

Two more things to consider with your colleagues as you pursue this study. First, if you perform a survival analysis with recovery as the end point, the date of recovery will have to be carefully defined. (I assume that time = 0 for each patient will be the date of symptom onset.) Second, when interpreting the results: what about patients who never visit the doctor despite having symptoms?
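A minimal sketch of that time scale, assuming each patient's symptom onset, visit and (if observed) recovery dates are recorded, together with a common end-of-follow-up date: everything is measured in days from onset, and patients whose recovery was not observed are censored. The function name and arguments are hypothetical; how "recovery" itself is defined remains the open question raised above.

```python
from datetime import date


def survival_record(onset, visit, recovery, end_of_followup):
    """Return (days_to_visit, time, recovered) for one patient.

    time is counted from symptom onset (time = 0). If recovery was
    observed by the end of follow-up, recovered is True and time is the
    days to recovery; otherwise the patient is censored at end of follow-up.
    """
    days_to_visit = (visit - onset).days  # continuous predictor
    if recovery is not None and recovery <= end_of_followup:
        return days_to_visit, (recovery - onset).days, True
    return days_to_visit, (end_of_followup - onset).days, False


print(survival_record(date(2020, 9, 1), date(2020, 9, 10),
                      date(2020, 10, 1), date(2020, 12, 31)))
```

Here `days_to_visit` is the continuous predictor to model flexibly, and `(time, recovered)` is the survival outcome.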

EdM
  • Thanks for your answer and your points for reflection! Much appreciated! About the patients who never visit the doctor: these patients are usually sick for a shorter time, as their problems resolve without the need for a doctor (e.g. flu). However, they are not taken into account, because the study aims to analyze the differences between patients seen on time and those who are not (I simplified the problem by merging the "too early" and "too late" categories into "not on time"). – gabboshow Oct 05 '20 at 20:33
  • Regarding `rcs()`... do you mean something like this: fit – gabboshow Oct 05 '20 at 20:39
  • @gabboshow `lrm` is for logistic models; use `cph` for survival models in `rms`. You don't use a categorical `timing` variable as a predictor at all, but let the `rcs` term handle how outcome is affected by elapsed time between symptoms and visit, as a continuous predictor. I assume that "recovery" is the event for the survival model. When you plot that relationship (e.g. with the Predict function in rms), your hypothesis is that at very short and long elapsed times the "hazard" of recovery is low, but at intermediate times it is "high." That will _show directly_ what's too early or late. – EdM Oct 05 '20 at 21:40