I have a dataset in which I'm trying to predict when an individual will develop a particular disease from a set of biomarkers. I've found a model that fits fairly well, but its residuals are strongly heteroskedastic. This heteroskedasticity is expected, though: it makes sense that the residuals get smaller as an individual nears diagnosis. I started thinking about various "fixes," but I'm not sure whether it needs fixing at all. Any thoughts?

dfife
  • There are a number of ways to address heteroscedasticity. I demonstrate many of them in my answer here: [Alternatives to one-way ANOVA for heteroscedastic data](http://stats.stackexchange.com/a/91881/7290). – gung - Reinstate Monica May 21 '14 at 16:24
  • Is your dataset cross-sectional? – Sergio May 21 '14 at 16:24
  • @gung: these are great suggestions, but I'm wondering whether the heteroskedasticity needs to be fixed at all. – dfife May 21 '14 at 16:28
  • @Sergio--these data are actually longitudinal. – dfife May 21 '14 at 16:29
  • @gung (again)...I'm rethinking the weighted least squares. It seems to make intuitive sense: we weight more heavily those observations closer to diagnosis than those further away. And if it fixes heteroskedasticity, that's a bonus. – dfife May 21 '14 at 16:46
  • If your data are longitudinal (multiple measures on multiple patients), we're playing a different ballgame. You need to differentiate between changing residual variance within each individual's data and diverging individual trends over time (which is common). You can fit a mixed effects model with random intercepts, random slopes, and a correlation between them, then get the predicted trend for each patient and their individual residuals. The key question is: does the residual variance change over time? – gung - Reinstate Monica May 21 '14 at 17:21
  • Ahh...good point @gung. I hadn't thought of that. I'll take a look at it. Thanks! – dfife May 21 '14 at 19:33
  • Okay...finally getting to the point of looking at it. Although the data are longitudinal, I'm actually only analyzing them cross-sectionally (for a good reason). I think I like the idea of weighted least squares: it makes sense to give more weight to observations closer to diagnosis. – dfife May 23 '14 at 13:57
  • Look into gamlss – kjetil b halvorsen May 17 '21 at 02:40
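
For readers who want to see what the weighted least squares idea from the comments might look like in the cross-sectional case, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration, not the asker's actual data or model: the variable names (`biomarker`, `years_to_dx`), the single predictor, the true coefficients, and the choice to let the noise standard deviation grow in proportion to the expected time to diagnosis. It fits ordinary least squares first, compares residual spread for cases predicted to be near versus far from diagnosis, and then refits with weights of 1 / fitted² (a simple two-stage, "feasible" WLS).

```python
import random
import statistics


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Simulated (hypothetical) data: one biomarker, outcome = years until diagnosis.
# Assumption: noise SD is proportional to the expected time to diagnosis,
# so residuals shrink as diagnosis approaches.
random.seed(0)
n = 400
biomarker = [random.uniform(1.0, 10.0) for _ in range(n)]
true_mean = [0.5 + 1.0 * x for x in biomarker]
years_to_dx = [m + random.gauss(0.0, 0.3 * m) for m in true_mean]

# Stage 1: ordinary least squares (all weights equal).
a_ols, b_ols = weighted_linear_fit(biomarker, years_to_dx, [1.0] * n)
fitted = [a_ols + b_ols * x for x in biomarker]
resid = [y - f for y, f in zip(years_to_dx, fitted)]

# Diagnostic: residual spread for the half predicted nearest to diagnosis
# versus the half predicted furthest from it.
pairs = sorted(zip(fitted, resid))
sd_near = statistics.pstdev([r for _, r in pairs[: n // 2]])
sd_far = statistics.pstdev([r for _, r in pairs[n // 2:]])

# Stage 2: refit with weights 1 / fitted^2, i.e. inverse of the assumed variance.
# The max() guard only protects against a non-positive fitted value.
weights = [1.0 / max(f, 1e-6) ** 2 for f in fitted]
a_wls, b_wls = weighted_linear_fit(biomarker, years_to_dx, weights)

print(f"residual SD near diagnosis: {sd_near:.2f}, far from diagnosis: {sd_far:.2f}")
print(f"OLS: intercept={a_ols:.2f}, slope={b_ols:.2f}")
print(f"WLS: intercept={a_wls:.2f}, slope={b_wls:.2f}")
```

Under these assumptions the weights match the asker's intuition: observations predicted to be closer to diagnosis have smaller variance and so count for more. Note that heteroskedasticity by itself does not bias OLS coefficient estimates; what it affects is their efficiency and the usual standard errors, so whether to "fix" it depends on what the model is for. If the longitudinal structure were used, the mixed-model point raised in the comments would apply instead of this sketch.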

0 Answers