
I'm trying to run a regression analysis on a dataset that includes a pair of continuous variables collected at a certain time (measured in days). While the data should be collected at a specific time, various constraints mean there can be a considerable gap between the intended collection time and the actual collection time, which in turn means I have less trust in the data (later collection means less efficacy).

What are some good ways to account for this variability, given that it differs on an observation-by-observation basis?

Off the top of my head, I was thinking of scaling the variable by an appropriate distribution that relates to the efficacy change, or simply by the number of days itself, but I'm no statistician and could do with some ideas!

Thanks, Dan

Dan Adams
  • Less trust in a data point means more uncertainty in the value of that point. If you have software that can perform weighted regression, the weight for each data point would be inversely proportional to actual collection time - that is, a small value for collection time would mean a larger weight in the regression. – James Phillips Dec 03 '19 at 15:32
  • That's kind of what I was thinking by transforming the data myself, as I don't think the software that I'm using can weight individual instances of the same column differently. – Dan Adams Dec 03 '19 at 15:33
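The weighting idea from the comments can be sketched in plain NumPy, without software support for per-observation weights: scale each row of the design matrix and the response by the square root of its weight, then solve an ordinary least-squares problem. This is a minimal illustration with made-up data and a hypothetical `1 / (1 + delay)` weighting; any decreasing function of the delay could be substituted.

```python
import numpy as np

# Hypothetical data: y depends linearly on x, and each observation was
# collected some number of days past the intended collection time.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
delay = rng.integers(0, 15, n)                 # days late; larger = less trust
y = 2.0 * x + 1.0 + rng.normal(0, 1 + 0.2 * delay, n)

# Weight each observation inversely to its delay; the "+ 1" avoids
# division by zero for on-time observations. (Assumed weighting scheme.)
w = 1.0 / (1.0 + delay)

# Weighted least squares by hand: multiply rows of the design matrix
# and the response by sqrt(w), then solve the scaled OLS problem.
X = np.column_stack([x, np.ones(n)])           # slope + intercept columns
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
slope, intercept = coef
```

The same fit can be obtained directly from `statsmodels.api.WLS(y, X, weights=w)` if that library is available; the manual scaling above is just the equivalent computation for software that only does ordinary regression.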
