I have a binary classification model that predicts sports results, with features covering 10 years' worth of matches. How would I feed in new tracking data that is only available for the last 3 years? Should I assign NaN or zero for matches that do not have this data? Is this likely to create overfitting problems? I'm only using XGBoost for now.

kjetil b halvorsen
Mike
  • Possible duplicate of [How do you deal with "nested" variables in a regression model?](https://stats.stackexchange.com/questions/372257/how-do-you-deal-with-nested-variables-in-a-regression-model) – kjetil b halvorsen Jan 10 '19 at 09:31
  • Is this a 'nested' variable in the sense of the proposed duplicate? This sounds more like a missing data situation, where the tracking data (whatever that is) could have existed more than 3 years ago too but simply was not measured. The duplicate would apply if, e.g., a rule change had introduced a new element to the game 3 years ago. – Juho Kokkala Jan 10 '19 at 18:12
  • Could you give us some more detail? Which "binary classification model" are you using? Logistic regression? Something else? How big is the sample? Was that tracking variable (what is it measuring?) defined more than 3 years ago, just not measured? If so, maybe you can build a prediction model for that variable using data from the last 3 years, and use those predictions in the model? – kjetil b halvorsen Jan 10 '19 at 22:45
  • @kjetilbhalvorsen XGBoost for now; logit doesn't give high performance unless it is heavily bagged and properly tuned (expensive). XGBoost is just way faster with barely any tuning. The sample is 12,500 rows, and the tracking variable measures lots of aspects of the game (e.g. accurate speed), recorded with newly installed IoT. If I use it only for the last 3 years (about 3,900 rows), will it create any overfitting issues? Also, let's say I augment my sample size to 50,000 rows and still have only 3,900 rows with that tracking variable. Will this increase overfitting even more? – Mike Jan 11 '19 at 17:35
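
On the NaN-versus-zero part of the question, here is a minimal sketch of one possible encoding, not a recommendation from the thread. It marks unmeasured tracking values as NaN and adds a separate indicator column, on the reasoning that a zero would be read as a real measurement (e.g. a speed of 0), whereas XGBoost by default treats NaN as missing and learns a default branch for it at each split. The function name `encode_tracking`, the dictionary keys, and the sample values are all hypothetical; the example uses only the standard library so that it runs without XGBoost installed.

```python
import math


def encode_tracking(matches, tracking):
    """Append a tracking value (NaN if unmeasured) and a 0/1 indicator.

    matches:  list of dicts with 'match_id' and 'base_features' (hypothetical layout)
    tracking: dict mapping match_id -> measured tracking value (hypothetical)
    """
    rows = []
    for match in matches:
        # Matches without tracking data get NaN, not zero.
        value = tracking.get(match["match_id"], math.nan)
        has_tracking = 0.0 if math.isnan(value) else 1.0
        rows.append(match["base_features"] + [value, has_tracking])
    return rows


# Made-up example: match 1 predates the tracking system, match 2 does not.
matches = [
    {"match_id": 1, "base_features": [0.4, 1.0]},
    {"match_id": 2, "base_features": [0.7, 0.0]},
]
tracking = {2: 8.3}

rows = encode_tracking(matches, tracking)
```

The resulting rows could then be passed to XGBoost as the feature matrix. Whether the extra columns lead to overfitting is an empirical question; one way to check would be to compare validation scores with and without them.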

0 Answers