I have seen this, this, and this question, but perhaps a new alternative has emerged since then.

I need to model a weekly time series that counts the number of enrolled students in a given school and grade. I have 5 years of historical data. The problem is that, in general, 70% of my data is 0. I also need to apply the model to 20 schools and 17 grades, so the procedure should ideally find the optimal number of parameters automatically.

Is there a way to do this? Here is what I have tested or considered so far:

  • Prophet and ARIMA produce negative predictions, and setting y = 0 whenever y < 0 as a 'hack' does not help at all, because it increases the total number of enrolled students.

  • Croston's method is so smooth that it does not detect any seasonality.

  • The tscount package in R is a possible approach, but my solution needs to be in Python, so I haven't tested it.

  • A Poisson autoregressive (or a zero-inflated Poisson, ZIP, autoregressive) model is an option, and the one I am leaning towards. I haven't found a fully implemented package in Python, but I can reproduce this blog post.

  • An LSTM with a ReLU activation function is an option, but I haven't tested it so far. The problem with this approach is that, although it will not produce negative values, it will forecast real values (not integers).

Do you guys have any suggestions?

  • I need a methodology that is fast to estimate because I have 340 scenarios in total.
  • I will implement it in Python.
  • The forecasts should be integer values.
  • I am planning to forecast 4 steps ahead (or 1 step ahead as a last resort).
  • It would be great if I could include covariates too.

This is my time series:

[Image: plot of the time series]

Note that there are 'long' periods with 0 enrollments.

Two other pieces of information:

  • When I checked my ACF plot, lags 1, 2, 5, and 6 (early lags) and the later lags 51 and 52 showed up as important, so there is seasonality.

  • My data is not white noise.

My data itself:

import pandas as pd
from pandas import Timestamp
dd = pd.DataFrame.from_dict({'y': {0: 20, 1: 2, 2: 0, 3: 0, 4: 0, 5: 13, 6: 15, 7: 0, 8: 1, 9: 1, 10: 0, 11: 9, 12: 2, 13: 4, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 0, 51: 25, 52: 0, 53: 1, 54: 5, 55: 4, 56: 9, 57: 3, 58: 9, 59: 1, 60: 4, 61: 1, 62: 6, 63: 1, 64: 8, 65: 3, 66: 4, 67: 2, 68: 1, 69: 2, 70: 0, 71: 0, 72: 0, 73: 0, 74: 0, 75: 0, 76: 0, 77: 0, 78: 0, 79: 0, 80: 0, 81: 0, 82: 0, 83: 0, 84: 0, 85: 0, 86: 0, 87: 0, 88: 0, 89: 0, 90: 0, 91: 0, 92: 0, 93: 0, 94: 0, 95: 0, 96: 0, 97: 0, 98: 0, 99: 11, 100: 1, 101: 1, 102: 2, 103: 0, 104: 4, 105: 0, 106: 1, 107: 3, 108: 3, 109: 3, 110: 1, 111: 0, 112: 0, 113: 2, 114: 14, 115: 6, 116: 3, 117: 3, 118: 1, 119: 0, 120: 0, 121: 0, 122: 0, 123: 0, 124: 0, 125: 0, 126: 0, 127: 0, 128: 0, 129: 0, 130: 0, 131: 0, 132: 0, 133: 0, 134: 0, 135: 0, 136: 0, 137: 0, 138: 0, 139: 0, 140: 0, 141: 0, 142: 0, 143: 0, 144: 0, 145: 0, 146: 0, 147: 0, 148: 0, 149: 0, 150: 0, 151: 0, 152: 0, 153: 0, 154: 12, 155: 0, 156: 1, 157: 2, 158: 2, 159: 2, 160: 2, 161: 1, 162: 10, 163: 0, 164: 2, 165: 4, 166: 11, 167: 5, 168: 9, 169: 5, 170: 3, 171: 0, 172: 0, 173: 2, 174: 0, 175: 0, 176: 1, 177: 0, 178: 0, 179: 0, 180: 0, 181: 0, 182: 0, 183: 0, 184: 0, 185: 0, 186: 0, 187: 0, 188: 0, 189: 0, 190: 0, 191: 0, 192: 0, 193: 0, 194: 0, 195: 0, 196: 0, 197: 0, 198: 0, 199: 0, 200: 0, 201: 0, 202: 0, 203: 0, 204: 0, 205: 0, 206: 0, 207: 4, 208: 0, 209: 0, 210: 0, 211: 0, 212: 0, 213: 0, 214: 12, 215: 2, 216: 2, 217: 2, 218: 5, 219: 4, 220: 7, 221: 3, 222: 2, 223: 1}, 'Covariate2': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 0, 8: 1, 9: 1, 10: 0, 11: 1, 12: 1, 13: 1, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 
38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 0, 51: 1, 52: 0, 53: 1, 54: 1, 55: 1, 56: 1, 57: 1, 58: 1, 59: 1, 60: 1, 61: 1, 62: 1, 63: 1, 64: 1, 65: 1, 66: 1, 67: 1, 68: 1, 69: 1, 70: 0, 71: 0, 72: 0, 73: 0, 74: 0, 75: 0, 76: 0, 77: 0, 78: 0, 79: 0, 80: 0, 81: 0, 82: 0, 83: 0, 84: 0, 85: 0, 86: 0, 87: 0, 88: 0, 89: 0, 90: 0, 91: 0, 92: 0, 93: 0, 94: 0, 95: 0, 96: 0, 97: 0, 98: 0, 99: 1, 100: 1, 101: 1, 102: 1, 103: 0, 104: 1, 105: 0, 106: 1, 107: 1, 108: 1, 109: 1, 110: 1, 111: 0, 112: 0, 113: 1, 114: 1, 115: 1, 116: 1, 117: 1, 118: 1, 119: 0, 120: 0, 121: 0, 122: 0, 123: 0, 124: 0, 125: 0, 126: 0, 127: 0, 128: 0, 129: 0, 130: 0, 131: 0, 132: 0, 133: 0, 134: 0, 135: 0, 136: 0, 137: 0, 138: 0, 139: 0, 140: 0, 141: 0, 142: 0, 143: 0, 144: 0, 145: 0, 146: 0, 147: 0, 148: 0, 149: 0, 150: 0, 151: 0, 152: 0, 153: 0, 154: 1, 155: 0, 156: 1, 157: 1, 158: 1, 159: 1, 160: 1, 161: 1, 162: 1, 163: 0, 164: 1, 165: 1, 166: 1, 167: 1, 168: 1, 169: 1, 170: 1, 171: 0, 172: 0, 173: 1, 174: 0, 175: 0, 176: 1, 177: 0, 178: 0, 179: 0, 180: 0, 181: 0, 182: 0, 183: 0, 184: 0, 185: 0, 186: 0, 187: 0, 188: 0, 189: 0, 190: 0, 191: 0, 192: 0, 193: 0, 194: 0, 195: 0, 196: 0, 197: 0, 198: 0, 199: 0, 200: 0, 201: 0, 202: 0, 203: 0, 204: 0, 205: 0, 206: 0, 207: 1, 208: 0, 209: 0, 210: 0, 211: 0, 212: 0, 213: 0, 214: 1, 215: 1, 216: 1, 217: 1, 218: 1, 219: 1, 220: 1, 221: 1, 222: 1, 223: 1}, 'Covariate1': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 1, 9: 1, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 0, 51: 0, 52: 0, 53: 0, 54: 0, 55: 0, 56: 0, 57: 0, 58: 0, 59: 0, 60: 1, 61: 1, 62: 0, 63: 0, 64: 0, 65: 0, 66: 0, 67: 0, 68: 0, 69: 0, 70: 0, 71: 0, 72: 0, 73: 0, 74: 0, 75: 0, 76: 0, 77: 0, 78: 0, 79: 0, 80: 0, 
81: 0, 82: 0, 83: 0, 84: 0, 85: 0, 86: 0, 87: 0, 88: 0, 89: 0, 90: 0, 91: 0, 92: 0, 93: 0, 94: 0, 95: 0, 96: 0, 97: 0, 98: 0, 99: 0, 100: 0, 101: 0, 102: 0, 103: 0, 104: 0, 105: 0, 106: 0, 107: 0, 108: 1, 109: 1, 110: 1, 111: 0, 112: 0, 113: 0, 114: 0, 115: 0, 116: 0, 117: 0, 118: 0, 119: 0, 120: 0, 121: 0, 122: 0, 123: 0, 124: 0, 125: 0, 126: 0, 127: 0, 128: 0, 129: 0, 130: 0, 131: 0, 132: 0, 133: 0, 134: 0, 135: 0, 136: 0, 137: 0, 138: 0, 139: 0, 140: 0, 141: 0, 142: 0, 143: 0, 144: 0, 145: 0, 146: 0, 147: 0, 148: 0, 149: 0, 150: 0, 151: 0, 152: 0, 153: 0, 154: 0, 155: 0, 156: 0, 157: 0, 158: 0, 159: 0, 160: 0, 161: 1, 162: 1, 163: 0, 164: 0, 165: 0, 166: 0, 167: 0, 168: 0, 169: 0, 170: 0, 171: 0, 172: 0, 173: 0, 174: 0, 175: 0, 176: 0, 177: 0, 178: 0, 179: 0, 180: 0, 181: 0, 182: 0, 183: 0, 184: 0, 185: 0, 186: 0, 187: 0, 188: 0, 189: 0, 190: 0, 191: 0, 192: 0, 193: 0, 194: 0, 195: 0, 196: 0, 197: 0, 198: 0, 199: 0, 200: 0, 201: 0, 202: 0, 203: 0, 204: 0, 205: 0, 206: 0, 207: 0, 208: 0, 209: 0, 210: 0, 211: 0, 212: 0, 213: 0, 214: 1, 215: 0, 216: 0, 217: 0, 218: 0, 219: 0, 220: 0, 221: 0, 222: 0, 223: 0}, 'Date': {0: Timestamp('2016-11-14 00:00:00'), 1: Timestamp('2016-11-21 00:00:00'), 2: Timestamp('2016-11-28 00:00:00'), 3: Timestamp('2016-12-05 00:00:00'), 4: Timestamp('2016-12-12 00:00:00'), 5: Timestamp('2016-12-19 00:00:00'), 6: Timestamp('2016-12-26 00:00:00'), 7: Timestamp('2017-01-02 00:00:00'), 8: Timestamp('2017-01-09 00:00:00'), 9: Timestamp('2017-01-16 00:00:00'), 10: Timestamp('2017-01-23 00:00:00'), 11: Timestamp('2017-01-30 00:00:00'), 12: Timestamp('2017-02-06 00:00:00'), 13: Timestamp('2017-02-13 00:00:00'), 14: Timestamp('2017-02-20 00:00:00'), 15: Timestamp('2017-02-27 00:00:00'), 16: Timestamp('2017-03-06 00:00:00'), 17: Timestamp('2017-03-13 00:00:00'), 18: Timestamp('2017-03-20 00:00:00'), 19: Timestamp('2017-03-27 00:00:00'), 20: Timestamp('2017-04-03 00:00:00'), 21: Timestamp('2017-04-10 00:00:00'), 22: Timestamp('2017-04-17 00:00:00'), 
23: Timestamp('2017-04-24 00:00:00'), 24: Timestamp('2017-05-01 00:00:00'), 25: Timestamp('2017-05-08 00:00:00'), 26: Timestamp('2017-05-15 00:00:00'), 27: Timestamp('2017-05-22 00:00:00'), 28: Timestamp('2017-05-29 00:00:00'), 29: Timestamp('2017-06-05 00:00:00'), 30: Timestamp('2017-06-12 00:00:00'), 31: Timestamp('2017-06-19 00:00:00'), 32: Timestamp('2017-06-26 00:00:00'), 33: Timestamp('2017-07-03 00:00:00'), 34: Timestamp('2017-07-10 00:00:00'), 35: Timestamp('2017-07-17 00:00:00'), 36: Timestamp('2017-07-24 00:00:00'), 37: Timestamp('2017-07-31 00:00:00'), 38: Timestamp('2017-08-07 00:00:00'), 39: Timestamp('2017-08-14 00:00:00'), 40: Timestamp('2017-08-21 00:00:00'), 41: Timestamp('2017-08-28 00:00:00'), 42: Timestamp('2017-09-04 00:00:00'), 43: Timestamp('2017-09-11 00:00:00'), 44: Timestamp('2017-09-18 00:00:00'), 45: Timestamp('2017-09-25 00:00:00'), 46: Timestamp('2017-10-02 00:00:00'), 47: Timestamp('2017-10-09 00:00:00'), 48: Timestamp('2017-10-16 00:00:00'), 49: Timestamp('2017-10-23 00:00:00'), 50: Timestamp('2017-10-30 00:00:00'), 51: Timestamp('2017-11-06 00:00:00'), 52: Timestamp('2017-11-13 00:00:00'), 53: Timestamp('2017-11-20 00:00:00'), 54: Timestamp('2017-11-27 00:00:00'), 55: Timestamp('2017-12-04 00:00:00'), 56: Timestamp('2017-12-11 00:00:00'), 57: Timestamp('2017-12-18 00:00:00'), 58: Timestamp('2017-12-25 00:00:00'), 59: Timestamp('2018-01-01 00:00:00'), 60: Timestamp('2018-01-08 00:00:00'), 61: Timestamp('2018-01-15 00:00:00'), 62: Timestamp('2018-01-22 00:00:00'), 63: Timestamp('2018-01-29 00:00:00'), 64: Timestamp('2018-02-05 00:00:00'), 65: Timestamp('2018-02-12 00:00:00'), 66: Timestamp('2018-02-19 00:00:00'), 67: Timestamp('2018-02-26 00:00:00'), 68: Timestamp('2018-03-05 00:00:00'), 69: Timestamp('2018-03-12 00:00:00'), 70: Timestamp('2018-03-19 00:00:00'), 71: Timestamp('2018-03-26 00:00:00'), 72: Timestamp('2018-04-02 00:00:00'), 73: Timestamp('2018-04-09 00:00:00'), 74: Timestamp('2018-04-16 00:00:00'), 75: 
Timestamp('2018-04-23 00:00:00'), 76: Timestamp('2018-04-30 00:00:00'), 77: Timestamp('2018-05-07 00:00:00'), 78: Timestamp('2018-05-14 00:00:00'), 79: Timestamp('2018-05-21 00:00:00'), 80: Timestamp('2018-05-28 00:00:00'), 81: Timestamp('2018-06-04 00:00:00'), 82: Timestamp('2018-06-11 00:00:00'), 83: Timestamp('2018-06-18 00:00:00'), 84: Timestamp('2018-06-25 00:00:00'), 85: Timestamp('2018-07-02 00:00:00'), 86: Timestamp('2018-07-09 00:00:00'), 87: Timestamp('2018-07-16 00:00:00'), 88: Timestamp('2018-07-23 00:00:00'), 89: Timestamp('2018-07-30 00:00:00'), 90: Timestamp('2018-08-06 00:00:00'), 91: Timestamp('2018-08-13 00:00:00'), 92: Timestamp('2018-08-20 00:00:00'), 93: Timestamp('2018-08-27 00:00:00'), 94: Timestamp('2018-09-03 00:00:00'), 95: Timestamp('2018-09-10 00:00:00'), 96: Timestamp('2018-09-17 00:00:00'), 97: Timestamp('2018-09-24 00:00:00'), 98: Timestamp('2018-10-01 00:00:00'), 99: Timestamp('2018-10-08 00:00:00'), 100: Timestamp('2018-10-15 00:00:00'), 101: Timestamp('2018-10-22 00:00:00'), 102: Timestamp('2018-10-29 00:00:00'), 103: Timestamp('2018-11-05 00:00:00'), 104: Timestamp('2018-11-12 00:00:00'), 105: Timestamp('2018-11-19 00:00:00'), 106: Timestamp('2018-11-26 00:00:00'), 107: Timestamp('2018-12-03 00:00:00'), 108: Timestamp('2018-12-10 00:00:00'), 109: Timestamp('2018-12-17 00:00:00'), 110: Timestamp('2018-12-24 00:00:00'), 111: Timestamp('2018-12-31 00:00:00'), 112: Timestamp('2019-01-07 00:00:00'), 113: Timestamp('2019-01-14 00:00:00'), 114: Timestamp('2019-01-21 00:00:00'), 115: Timestamp('2019-01-28 00:00:00'), 116: Timestamp('2019-02-04 00:00:00'), 117: Timestamp('2019-02-11 00:00:00'), 118: Timestamp('2019-02-18 00:00:00'), 119: Timestamp('2019-02-25 00:00:00'), 120: Timestamp('2019-03-04 00:00:00'), 121: Timestamp('2019-03-11 00:00:00'), 122: Timestamp('2019-03-18 00:00:00'), 123: Timestamp('2019-03-25 00:00:00'), 124: Timestamp('2019-04-01 00:00:00'), 125: Timestamp('2019-04-08 00:00:00'), 126: Timestamp('2019-04-15 00:00:00'), 
127: Timestamp('2019-04-22 00:00:00'), 128: Timestamp('2019-04-29 00:00:00'), 129: Timestamp('2019-05-06 00:00:00'), 130: Timestamp('2019-05-13 00:00:00'), 131: Timestamp('2019-05-20 00:00:00'), 132: Timestamp('2019-05-27 00:00:00'), 133: Timestamp('2019-06-03 00:00:00'), 134: Timestamp('2019-06-10 00:00:00'), 135: Timestamp('2019-06-17 00:00:00'), 136: Timestamp('2019-06-24 00:00:00'), 137: Timestamp('2019-07-01 00:00:00'), 138: Timestamp('2019-07-08 00:00:00'), 139: Timestamp('2019-07-15 00:00:00'), 140: Timestamp('2019-07-22 00:00:00'), 141: Timestamp('2019-07-29 00:00:00'), 142: Timestamp('2019-08-05 00:00:00'), 143: Timestamp('2019-08-12 00:00:00'), 144: Timestamp('2019-08-19 00:00:00'), 145: Timestamp('2019-08-26 00:00:00'), 146: Timestamp('2019-09-02 00:00:00'), 147: Timestamp('2019-09-09 00:00:00'), 148: Timestamp('2019-09-16 00:00:00'), 149: Timestamp('2019-09-23 00:00:00'), 150: Timestamp('2019-09-30 00:00:00'), 151: Timestamp('2019-10-07 00:00:00'), 152: Timestamp('2019-10-14 00:00:00'), 153: Timestamp('2019-10-21 00:00:00'), 154: Timestamp('2019-10-28 00:00:00'), 155: Timestamp('2019-11-04 00:00:00'), 156: Timestamp('2019-11-11 00:00:00'), 157: Timestamp('2019-11-18 00:00:00'), 158: Timestamp('2019-11-25 00:00:00'), 159: Timestamp('2019-12-02 00:00:00'), 160: Timestamp('2019-12-09 00:00:00'), 161: Timestamp('2019-12-16 00:00:00'), 162: Timestamp('2019-12-23 00:00:00'), 163: Timestamp('2019-12-30 00:00:00'), 164: Timestamp('2020-01-06 00:00:00'), 165: Timestamp('2020-01-13 00:00:00'), 166: Timestamp('2020-01-20 00:00:00'), 167: Timestamp('2020-01-27 00:00:00'), 168: Timestamp('2020-02-03 00:00:00'), 169: Timestamp('2020-02-10 00:00:00'), 170: Timestamp('2020-02-17 00:00:00'), 171: Timestamp('2020-02-24 00:00:00'), 172: Timestamp('2020-03-02 00:00:00'), 173: Timestamp('2020-03-09 00:00:00'), 174: Timestamp('2020-03-16 00:00:00'), 175: Timestamp('2020-03-23 00:00:00'), 176: Timestamp('2020-03-30 00:00:00'), 177: Timestamp('2020-04-06 00:00:00'), 178: 
Timestamp('2020-04-13 00:00:00'), 179: Timestamp('2020-04-20 00:00:00'), 180: Timestamp('2020-04-27 00:00:00'), 181: Timestamp('2020-05-04 00:00:00'), 182: Timestamp('2020-05-11 00:00:00'), 183: Timestamp('2020-05-18 00:00:00'), 184: Timestamp('2020-05-25 00:00:00'), 185: Timestamp('2020-06-01 00:00:00'), 186: Timestamp('2020-06-08 00:00:00'), 187: Timestamp('2020-06-15 00:00:00'), 188: Timestamp('2020-06-22 00:00:00'), 189: Timestamp('2020-06-29 00:00:00'), 190: Timestamp('2020-07-06 00:00:00'), 191: Timestamp('2020-07-13 00:00:00'), 192: Timestamp('2020-07-20 00:00:00'), 193: Timestamp('2020-07-27 00:00:00'), 194: Timestamp('2020-08-03 00:00:00'), 195: Timestamp('2020-08-10 00:00:00'), 196: Timestamp('2020-08-17 00:00:00'), 197: Timestamp('2020-08-24 00:00:00'), 198: Timestamp('2020-08-31 00:00:00'), 199: Timestamp('2020-09-07 00:00:00'), 200: Timestamp('2020-09-14 00:00:00'), 201: Timestamp('2020-09-21 00:00:00'), 202: Timestamp('2020-09-28 00:00:00'), 203: Timestamp('2020-10-05 00:00:00'), 204: Timestamp('2020-10-12 00:00:00'), 205: Timestamp('2020-10-19 00:00:00'), 206: Timestamp('2020-10-26 00:00:00'), 207: Timestamp('2020-11-02 00:00:00'), 208: Timestamp('2020-11-09 00:00:00'), 209: Timestamp('2020-11-16 00:00:00'), 210: Timestamp('2020-11-23 00:00:00'), 211: Timestamp('2020-11-30 00:00:00'), 212: Timestamp('2020-12-07 00:00:00'), 213: Timestamp('2020-12-14 00:00:00'), 214: Timestamp('2020-12-21 00:00:00'), 215: Timestamp('2020-12-28 00:00:00'), 216: Timestamp('2021-01-04 00:00:00'), 217: Timestamp('2021-01-11 00:00:00'), 218: Timestamp('2021-01-18 00:00:00'), 219: Timestamp('2021-01-25 00:00:00'), 220: Timestamp('2021-02-01 00:00:00'), 221: Timestamp('2021-02-08 00:00:00'), 222: Timestamp('2021-02-15 00:00:00'), 223: Timestamp('2021-02-22 00:00:00')}})
  • Maybe dups: https://stats.stackexchange.com/questions/198887/time-series-with-a-sequence-of-zeros, https://stats.stackexchange.com/questions/372125/forecasting-daily-time-series-sales-revenue-with-many-zero-entries – kjetil b halvorsen Oct 25 '21 at 18:31
  • Why do you require integer forecasts? Put differently, which functional of the unknown future distribution do you want to elicit? An unbiased expectation forecast will not be integer, but a quantile (e.g., the median) will be. Essentially, this question boils down to what you later want to use the forecast for. – Stephan Kolassa Oct 26 '21 at 05:40
  • I require an integer forecast because I am forecasting the number of enrolled students (an integer), which is why I thought of it this way. Also, because I have a lot of 0s, any procedure like ARIMA or Prophet will produce negative values. (If I am wrong, I can do it differently.) I want a forecast that captures the trend and seasonality (basically), plus covariates if possible. – Guilherme Parreira Oct 26 '21 at 12:01

1 Answer

I would stick to something simple like the average of each period. The series is clearly seasonal and, as you noted, some models do not cope well with so many zeros.
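As a rough illustration of the per-period average idea (a minimal sketch, not the ThymeBoost implementation), the forecast for each future week can be taken as the historical mean of the same ISO week. The function name `seasonal_average_forecast`, the ISO-week grouping, the fallback to the overall mean, and the toy series are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def seasonal_average_forecast(y: pd.Series, horizon: int = 4) -> pd.Series:
    """Forecast each future week as the mean of the same ISO week in the history.

    Assumes `y` has a weekly DatetimeIndex. Hypothetical helper for illustration.
    """
    weeks = y.index.isocalendar().week.astype(int).to_numpy()
    weekly_mean = y.groupby(weeks).mean()

    future_index = pd.date_range(
        y.index[-1] + pd.Timedelta(weeks=1), periods=horizon, freq="7D"
    )
    future_weeks = future_index.isocalendar().week.astype(int).to_numpy()

    # An ISO week never seen in the history (e.g. week 53) falls back
    # to the overall mean of the series.
    values = weekly_mean.reindex(future_weeks).fillna(y.mean()).to_numpy()
    return pd.Series(values, index=future_index)


# Toy data (not the series from the question): 3 years of weekly counts,
# with enrollments only during the first 8 ISO weeks of each year.
idx = pd.date_range("2018-01-01", periods=156, freq="7D")
iso_week = idx.isocalendar().week.astype(int).to_numpy()
y_demo = pd.Series(np.where(iso_week <= 8, 10, 0), index=idx)

forecast = seasonal_average_forecast(y_demo, horizon=4)
print(forecast)
```

Because each forecast is an average of non-negative counts, it cannot go below 0, but it is a float rather than an integer.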

You could try a package I maintain, ThymeBoost, which will fit a simple average for a given set of parameters.

We can define a simple model such as a median 'trend' plus classic seasonality (the average of each period), and it seems to give reasonable results:

import pandas as pd
from pandas import Timestamp
from ThymeBoost import ThymeBoost as tb
dd = pd.DataFrame.from_dict({'date': {0: Timestamp('2016-11-14 00:00:00'), 1: Timestamp('2016-11-21 00:00:00'), 2: Timestamp('2016-11-28 00:00:00'), 3: Timestamp('2016-12-05 00:00:00'), 4: Timestamp('2016-12-12 00:00:00'), 5: Timestamp('2016-12-19 00:00:00'), 6: Timestamp('2016-12-26 00:00:00'), 7: Timestamp('2017-01-02 00:00:00'), 8: Timestamp('2017-01-09 00:00:00'), 9: Timestamp('2017-01-16 00:00:00'), 10: Timestamp('2017-01-23 00:00:00'), 11: Timestamp('2017-01-30 00:00:00'), 12: Timestamp('2017-02-06 00:00:00'), 13: Timestamp('2017-02-13 00:00:00'), 14: Timestamp('2017-02-20 00:00:00'), 15: Timestamp('2017-02-27 00:00:00'), 16: Timestamp('2017-03-06 00:00:00'), 17: Timestamp('2017-03-13 00:00:00'), 18: Timestamp('2017-03-20 00:00:00'), 19: Timestamp('2017-03-27 00:00:00'), 20: Timestamp('2017-04-03 00:00:00'), 21: Timestamp('2017-04-10 00:00:00'), 22: Timestamp('2017-04-17 00:00:00'), 23: Timestamp('2017-04-24 00:00:00'), 24: Timestamp('2017-05-01 00:00:00'), 25: Timestamp('2017-05-08 00:00:00'), 26: Timestamp('2017-05-15 00:00:00'), 27: Timestamp('2017-05-22 00:00:00'), 28: Timestamp('2017-05-29 00:00:00'), 29: Timestamp('2017-06-05 00:00:00'), 30: Timestamp('2017-06-12 00:00:00'), 31: Timestamp('2017-06-19 00:00:00'), 32: Timestamp('2017-06-26 00:00:00'), 33: Timestamp('2017-07-03 00:00:00'), 34: Timestamp('2017-07-10 00:00:00'), 35: Timestamp('2017-07-17 00:00:00'), 36: Timestamp('2017-07-24 00:00:00'), 37: Timestamp('2017-07-31 00:00:00'), 38: Timestamp('2017-08-07 00:00:00'), 39: Timestamp('2017-08-14 00:00:00'), 40: Timestamp('2017-08-21 00:00:00'), 41: Timestamp('2017-08-28 00:00:00'), 42: Timestamp('2017-09-04 00:00:00'), 43: Timestamp('2017-09-11 00:00:00'), 44: Timestamp('2017-09-18 00:00:00'), 45: Timestamp('2017-09-25 00:00:00'), 46: Timestamp('2017-10-02 00:00:00'), 47: Timestamp('2017-10-09 00:00:00'), 48: Timestamp('2017-10-16 00:00:00'), 49: Timestamp('2017-10-23 00:00:00'), 50: Timestamp('2017-10-30 00:00:00'), 51: Timestamp('2017-11-06 
00:00:00'), 52: Timestamp('2017-11-13 00:00:00'), 53: Timestamp('2017-11-20 00:00:00'), 54: Timestamp('2017-11-27 00:00:00'), 55: Timestamp('2017-12-04 00:00:00'), 56: Timestamp('2017-12-11 00:00:00'), 57: Timestamp('2017-12-18 00:00:00'), 58: Timestamp('2017-12-25 00:00:00'), 59: Timestamp('2018-01-01 00:00:00'), 60: Timestamp('2018-01-08 00:00:00'), 61: Timestamp('2018-01-15 00:00:00'), 62: Timestamp('2018-01-22 00:00:00'), 63: Timestamp('2018-01-29 00:00:00'), 64: Timestamp('2018-02-05 00:00:00'), 65: Timestamp('2018-02-12 00:00:00'), 66: Timestamp('2018-02-19 00:00:00'), 67: Timestamp('2018-02-26 00:00:00'), 68: Timestamp('2018-03-05 00:00:00'), 69: Timestamp('2018-03-12 00:00:00'), 70: Timestamp('2018-03-19 00:00:00'), 71: Timestamp('2018-03-26 00:00:00'), 72: Timestamp('2018-04-02 00:00:00'), 73: Timestamp('2018-04-09 00:00:00'), 74: Timestamp('2018-04-16 00:00:00'), 75: Timestamp('2018-04-23 00:00:00'), 76: Timestamp('2018-04-30 00:00:00'), 77: Timestamp('2018-05-07 00:00:00'), 78: Timestamp('2018-05-14 00:00:00'), 79: Timestamp('2018-05-21 00:00:00'), 80: Timestamp('2018-05-28 00:00:00'), 81: Timestamp('2018-06-04 00:00:00'), 82: Timestamp('2018-06-11 00:00:00'), 83: Timestamp('2018-06-18 00:00:00'), 84: Timestamp('2018-06-25 00:00:00'), 85: Timestamp('2018-07-02 00:00:00'), 86: Timestamp('2018-07-09 00:00:00'), 87: Timestamp('2018-07-16 00:00:00'), 88: Timestamp('2018-07-23 00:00:00'), 89: Timestamp('2018-07-30 00:00:00'), 90: Timestamp('2018-08-06 00:00:00'), 91: Timestamp('2018-08-13 00:00:00'), 92: Timestamp('2018-08-20 00:00:00'), 93: Timestamp('2018-08-27 00:00:00'), 94: Timestamp('2018-09-03 00:00:00'), 95: Timestamp('2018-09-10 00:00:00'), 96: Timestamp('2018-09-17 00:00:00'), 97: Timestamp('2018-09-24 00:00:00'), 98: Timestamp('2018-10-01 00:00:00'), 99: Timestamp('2018-10-08 00:00:00'), 100: Timestamp('2018-10-15 00:00:00'), 101: Timestamp('2018-10-22 00:00:00'), 102: Timestamp('2018-10-29 00:00:00'), 103: Timestamp('2018-11-05 00:00:00'), 104: 
Timestamp('2018-11-12 00:00:00'), 105: Timestamp('2018-11-19 00:00:00'), 106: Timestamp('2018-11-26 00:00:00'), 107: Timestamp('2018-12-03 00:00:00'), 108: Timestamp('2018-12-10 00:00:00'), 109: Timestamp('2018-12-17 00:00:00'), 110: Timestamp('2018-12-24 00:00:00'), 111: Timestamp('2018-12-31 00:00:00'), 112: Timestamp('2019-01-07 00:00:00'), 113: Timestamp('2019-01-14 00:00:00'), 114: Timestamp('2019-01-21 00:00:00'), 115: Timestamp('2019-01-28 00:00:00'), 116: Timestamp('2019-02-04 00:00:00'), 117: Timestamp('2019-02-11 00:00:00'), 118: Timestamp('2019-02-18 00:00:00'), 119: Timestamp('2019-02-25 00:00:00'), 120: Timestamp('2019-03-04 00:00:00'), 121: Timestamp('2019-03-11 00:00:00'), 122: Timestamp('2019-03-18 00:00:00'), 123: Timestamp('2019-03-25 00:00:00'), 124: Timestamp('2019-04-01 00:00:00'), 125: Timestamp('2019-04-08 00:00:00'), 126: Timestamp('2019-04-15 00:00:00'), 127: Timestamp('2019-04-22 00:00:00'), 128: Timestamp('2019-04-29 00:00:00'), 129: Timestamp('2019-05-06 00:00:00'), 130: Timestamp('2019-05-13 00:00:00'), 131: Timestamp('2019-05-20 00:00:00'), 132: Timestamp('2019-05-27 00:00:00'), 133: Timestamp('2019-06-03 00:00:00'), 134: Timestamp('2019-06-10 00:00:00'), 135: Timestamp('2019-06-17 00:00:00'), 136: Timestamp('2019-06-24 00:00:00'), 137: Timestamp('2019-07-01 00:00:00'), 138: Timestamp('2019-07-08 00:00:00'), 139: Timestamp('2019-07-15 00:00:00'), 140: Timestamp('2019-07-22 00:00:00'), 141: Timestamp('2019-07-29 00:00:00'), 142: Timestamp('2019-08-05 00:00:00'), 143: Timestamp('2019-08-12 00:00:00'), 144: Timestamp('2019-08-19 00:00:00'), 145: Timestamp('2019-08-26 00:00:00'), 146: Timestamp('2019-09-02 00:00:00'), 147: Timestamp('2019-09-09 00:00:00'), 148: Timestamp('2019-09-16 00:00:00'), 149: Timestamp('2019-09-23 00:00:00'), 150: Timestamp('2019-09-30 00:00:00'), 151: Timestamp('2019-10-07 00:00:00'), 152: Timestamp('2019-10-14 00:00:00'), 153: Timestamp('2019-10-21 00:00:00'), 154: Timestamp('2019-10-28 00:00:00'), 155: 
Timestamp('2019-11-04 00:00:00'), 156: Timestamp('2019-11-11 00:00:00'), 157: Timestamp('2019-11-18 00:00:00'), 158: Timestamp('2019-11-25 00:00:00'), 159: Timestamp('2019-12-02 00:00:00'), 160: Timestamp('2019-12-09 00:00:00'), 161: Timestamp('2019-12-16 00:00:00'), 162: Timestamp('2019-12-23 00:00:00'), 163: Timestamp('2019-12-30 00:00:00'), 164: Timestamp('2020-01-06 00:00:00'), 165: Timestamp('2020-01-13 00:00:00'), 166: Timestamp('2020-01-20 00:00:00'), 167: Timestamp('2020-01-27 00:00:00'), 168: Timestamp('2020-02-03 00:00:00'), 169: Timestamp('2020-02-10 00:00:00'), 170: Timestamp('2020-02-17 00:00:00'), 171: Timestamp('2020-02-24 00:00:00'), 172: Timestamp('2020-03-02 00:00:00'), 173: Timestamp('2020-03-09 00:00:00'), 174: Timestamp('2020-03-16 00:00:00'), 175: Timestamp('2020-03-23 00:00:00'), 176: Timestamp('2020-03-30 00:00:00'), 177: Timestamp('2020-04-06 00:00:00'), 178: Timestamp('2020-04-13 00:00:00'), 179: Timestamp('2020-04-20 00:00:00'), 180: Timestamp('2020-04-27 00:00:00'), 181: Timestamp('2020-05-04 00:00:00'), 182: Timestamp('2020-05-11 00:00:00'), 183: Timestamp('2020-05-18 00:00:00'), 184: Timestamp('2020-05-25 00:00:00'), 185: Timestamp('2020-06-01 00:00:00'), 186: Timestamp('2020-06-08 00:00:00'), 187: Timestamp('2020-06-15 00:00:00'), 188: Timestamp('2020-06-22 00:00:00'), 189: Timestamp('2020-06-29 00:00:00'), 190: Timestamp('2020-07-06 00:00:00'), 191: Timestamp('2020-07-13 00:00:00'), 192: Timestamp('2020-07-20 00:00:00'), 193: Timestamp('2020-07-27 00:00:00'), 194: Timestamp('2020-08-03 00:00:00'), 195: Timestamp('2020-08-10 00:00:00'), 196: Timestamp('2020-08-17 00:00:00'), 197: Timestamp('2020-08-24 00:00:00'), 198: Timestamp('2020-08-31 00:00:00'), 199: Timestamp('2020-09-07 00:00:00'), 200: Timestamp('2020-09-14 00:00:00'), 201: Timestamp('2020-09-21 00:00:00'), 202: Timestamp('2020-09-28 00:00:00'), 203: Timestamp('2020-10-05 00:00:00'), 204: Timestamp('2020-10-12 00:00:00'), 205: Timestamp('2020-10-19 00:00:00'), 206: 
Timestamp('2020-10-26 00:00:00'), 207: Timestamp('2020-11-02 00:00:00'), 208: Timestamp('2020-11-09 00:00:00'), 209: Timestamp('2020-11-16 00:00:00'), 210: Timestamp('2020-11-23 00:00:00'), 211: Timestamp('2020-11-30 00:00:00'), 212: Timestamp('2020-12-07 00:00:00'), 213: Timestamp('2020-12-14 00:00:00'), 214: Timestamp('2020-12-21 00:00:00'), 215: Timestamp('2020-12-28 00:00:00'), 216: Timestamp('2021-01-04 00:00:00'), 217: Timestamp('2021-01-11 00:00:00'), 218: Timestamp('2021-01-18 00:00:00'), 219: Timestamp('2021-01-25 00:00:00'), 220: Timestamp('2021-02-01 00:00:00'), 221: Timestamp('2021-02-08 00:00:00'), 222: Timestamp('2021-02-15 00:00:00'), 223: Timestamp('2021-02-22 00:00:00')}, 'y': {0: 20, 1: 2, 2: 0, 3: 0, 4: 0, 5: 13, 6: 15, 7: 0, 8: 1, 9: 1, 10: 0, 11: 9, 12: 2, 13: 4, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 0, 51: 25, 52: 0, 53: 1, 54: 5, 55: 4, 56: 9, 57: 3, 58: 9, 59: 1, 60: 4, 61: 1, 62: 6, 63: 1, 64: 8, 65: 3, 66: 4, 67: 2, 68: 1, 69: 2, 70: 0, 71: 0, 72: 0, 73: 0, 74: 0, 75: 0, 76: 0, 77: 0, 78: 0, 79: 0, 80: 0, 81: 0, 82: 0, 83: 0, 84: 0, 85: 0, 86: 0, 87: 0, 88: 0, 89: 0, 90: 0, 91: 0, 92: 0, 93: 0, 94: 0, 95: 0, 96: 0, 97: 0, 98: 0, 99: 11, 100: 1, 101: 1, 102: 2, 103: 0, 104: 4, 105: 0, 106: 1, 107: 3, 108: 3, 109: 3, 110: 1, 111: 0, 112: 0, 113: 2, 114: 14, 115: 6, 116: 3, 117: 3, 118: 1, 119: 0, 120: 0, 121: 0, 122: 0, 123: 0, 124: 0, 125: 0, 126: 0, 127: 0, 128: 0, 129: 0, 130: 0, 131: 0, 132: 0, 133: 0, 134: 0, 135: 0, 136: 0, 137: 0, 138: 0, 139: 0, 140: 0, 141: 0, 142: 0, 143: 0, 144: 0, 145: 0, 146: 0, 147: 0, 148: 0, 149: 0, 150: 0, 151: 0, 152: 0, 153: 0, 154: 12, 155: 0, 156: 1, 157: 2, 158: 2, 159: 2, 160: 2, 161: 1, 162: 10, 163: 0, 164: 2, 165: 4, 166: 11, 167: 5, 168: 9, 169: 5, 170: 3, 171: 0, 172: 0, 173: 2, 174: 
0, 175: 0, 176: 1, 177: 0, 178: 0, 179: 0, 180: 0, 181: 0, 182: 0, 183: 0, 184: 0, 185: 0, 186: 0, 187: 0, 188: 0, 189: 0, 190: 0, 191: 0, 192: 0, 193: 0, 194: 0, 195: 0, 196: 0, 197: 0, 198: 0, 199: 0, 200: 0, 201: 0, 202: 0, 203: 0, 204: 0, 205: 0, 206: 0, 207: 4, 208: 0, 209: 0, 210: 0, 211: 0, 212: 0, 213: 0, 214: 12, 215: 2, 216: 2, 217: 2, 218: 5, 219: 4, 220: 7, 221: 3, 222: 2, 223: 1}})
y = dd['y']
y.index = dd['date']
boosted_model = tb.ThymeBoost(verbose=1)

output = boosted_model.fit(y,
                           trend_estimator='median',
                           seasonal_estimator='classic',
                           seasonal_period=52,
                           global_cost='maicc',
                           fit_type='global',
                           )
predicted_output = boosted_model.predict(output, forecast_horizon=100)
boosted_model.plot_results(output, predicted_output)

[Image: plot of the ThymeBoost fit and forecast]

The error bounds are obviously off, but the actual predictions have a minimum of 0.

One issue is that this approach isn't made for counts, so you will get floats rather than integers and will need to round. But since it is just a simple average, it will never dip below 0 here.
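A minimal sketch of the rounding step, assuming the float forecasts are available as an array. The helper name `to_integer_forecast` is hypothetical, and the clip at 0 is only a defensive assumption for models that might go negative.

```python
import numpy as np


def to_integer_forecast(float_forecast) -> np.ndarray:
    """Round float forecasts to non-negative integers (hypothetical helper)."""
    values = np.asarray(float_forecast, dtype=float)
    # Defensive clip: a plain seasonal average of counts is already >= 0.
    values = np.clip(values, 0.0, None)
    # np.rint rounds to the nearest integer (ties go to the even integer).
    return np.rint(values).astype(int)


int_forecast = to_integer_forecast([0.0, 0.4, 2.6, 7.2, -0.3])
print(int_forecast)
```

Rounding each week separately can change the total over the forecast horizon, so it is worth comparing the rounded and unrounded sums.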

EDIT: with exogenous variables

Yes, we can add exogenous variables, but doing so will mess with the bounds, since the fit is no longer a simple average but an average that takes the extra features into account.

Feeding in exogenous variables is similar to Prophet, except that you don't have to create a dataframe with the time series features, just the future exogenous values.

Here I split the data into a basic train and test set. The exogenous estimator is a decision tree with a depth of 1 (you could also use 'ols', but the tree looked better here), and I switched global_cost to mse because trees are iteration hungry in this setup.

import pandas as pd
from pandas import Timestamp
import matplotlib.pyplot as plt
from ThymeBoost import ThymeBoost as tb

dd = pd.DataFrame.from_dict({'y': {0: 20, 1: 2, 2: 0, 3: 0, 4: 0, 5: 13, 6: 15, 7: 0, 8: 1, 9: 1, 10: 0, 11: 9, 12: 2, 13: 4, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 0, 51: 25, 52: 0, 53: 1, 54: 5, 55: 4, 56: 9, 57: 3, 58: 9, 59: 1, 60: 4, 61: 1, 62: 6, 63: 1, 64: 8, 65: 3, 66: 4, 67: 2, 68: 1, 69: 2, 70: 0, 71: 0, 72: 0, 73: 0, 74: 0, 75: 0, 76: 0, 77: 0, 78: 0, 79: 0, 80: 0, 81: 0, 82: 0, 83: 0, 84: 0, 85: 0, 86: 0, 87: 0, 88: 0, 89: 0, 90: 0, 91: 0, 92: 0, 93: 0, 94: 0, 95: 0, 96: 0, 97: 0, 98: 0, 99: 11, 100: 1, 101: 1, 102: 2, 103: 0, 104: 4, 105: 0, 106: 1, 107: 3, 108: 3, 109: 3, 110: 1, 111: 0, 112: 0, 113: 2, 114: 14, 115: 6, 116: 3, 117: 3, 118: 1, 119: 0, 120: 0, 121: 0, 122: 0, 123: 0, 124: 0, 125: 0, 126: 0, 127: 0, 128: 0, 129: 0, 130: 0, 131: 0, 132: 0, 133: 0, 134: 0, 135: 0, 136: 0, 137: 0, 138: 0, 139: 0, 140: 0, 141: 0, 142: 0, 143: 0, 144: 0, 145: 0, 146: 0, 147: 0, 148: 0, 149: 0, 150: 0, 151: 0, 152: 0, 153: 0, 154: 12, 155: 0, 156: 1, 157: 2, 158: 2, 159: 2, 160: 2, 161: 1, 162: 10, 163: 0, 164: 2, 165: 4, 166: 11, 167: 5, 168: 9, 169: 5, 170: 3, 171: 0, 172: 0, 173: 2, 174: 0, 175: 0, 176: 1, 177: 0, 178: 0, 179: 0, 180: 0, 181: 0, 182: 0, 183: 0, 184: 0, 185: 0, 186: 0, 187: 0, 188: 0, 189: 0, 190: 0, 191: 0, 192: 0, 193: 0, 194: 0, 195: 0, 196: 0, 197: 0, 198: 0, 199: 0, 200: 0, 201: 0, 202: 0, 203: 0, 204: 0, 205: 0, 206: 0, 207: 4, 208: 0, 209: 0, 210: 0, 211: 0, 212: 0, 213: 0, 214: 12, 215: 2, 216: 2, 217: 2, 218: 5, 219: 4, 220: 7, 221: 3, 222: 2, 223: 1}, 'Covariate2': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 0, 8: 1, 9: 1, 10: 0, 11: 1, 12: 1, 13: 1, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 
38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 0, 51: 1, 52: 0, 53: 1, 54: 1, 55: 1, 56: 1, 57: 1, 58: 1, 59: 1, 60: 1, 61: 1, 62: 1, 63: 1, 64: 1, 65: 1, 66: 1, 67: 1, 68: 1, 69: 1, 70: 0, 71: 0, 72: 0, 73: 0, 74: 0, 75: 0, 76: 0, 77: 0, 78: 0, 79: 0, 80: 0, 81: 0, 82: 0, 83: 0, 84: 0, 85: 0, 86: 0, 87: 0, 88: 0, 89: 0, 90: 0, 91: 0, 92: 0, 93: 0, 94: 0, 95: 0, 96: 0, 97: 0, 98: 0, 99: 1, 100: 1, 101: 1, 102: 1, 103: 0, 104: 1, 105: 0, 106: 1, 107: 1, 108: 1, 109: 1, 110: 1, 111: 0, 112: 0, 113: 1, 114: 1, 115: 1, 116: 1, 117: 1, 118: 1, 119: 0, 120: 0, 121: 0, 122: 0, 123: 0, 124: 0, 125: 0, 126: 0, 127: 0, 128: 0, 129: 0, 130: 0, 131: 0, 132: 0, 133: 0, 134: 0, 135: 0, 136: 0, 137: 0, 138: 0, 139: 0, 140: 0, 141: 0, 142: 0, 143: 0, 144: 0, 145: 0, 146: 0, 147: 0, 148: 0, 149: 0, 150: 0, 151: 0, 152: 0, 153: 0, 154: 1, 155: 0, 156: 1, 157: 1, 158: 1, 159: 1, 160: 1, 161: 1, 162: 1, 163: 0, 164: 1, 165: 1, 166: 1, 167: 1, 168: 1, 169: 1, 170: 1, 171: 0, 172: 0, 173: 1, 174: 0, 175: 0, 176: 1, 177: 0, 178: 0, 179: 0, 180: 0, 181: 0, 182: 0, 183: 0, 184: 0, 185: 0, 186: 0, 187: 0, 188: 0, 189: 0, 190: 0, 191: 0, 192: 0, 193: 0, 194: 0, 195: 0, 196: 0, 197: 0, 198: 0, 199: 0, 200: 0, 201: 0, 202: 0, 203: 0, 204: 0, 205: 0, 206: 0, 207: 1, 208: 0, 209: 0, 210: 0, 211: 0, 212: 0, 213: 0, 214: 1, 215: 1, 216: 1, 217: 1, 218: 1, 219: 1, 220: 1, 221: 1, 222: 1, 223: 1}, 'Covariate1': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 1, 9: 1, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 0, 51: 0, 52: 0, 53: 0, 54: 0, 55: 0, 56: 0, 57: 0, 58: 0, 59: 0, 60: 1, 61: 1, 62: 0, 63: 0, 64: 0, 65: 0, 66: 0, 67: 0, 68: 0, 69: 0, 70: 0, 71: 0, 72: 0, 73: 0, 74: 0, 75: 0, 76: 0, 77: 0, 78: 0, 79: 0, 80: 0, 
81: 0, 82: 0, 83: 0, 84: 0, 85: 0, 86: 0, 87: 0, 88: 0, 89: 0, 90: 0, 91: 0, 92: 0, 93: 0, 94: 0, 95: 0, 96: 0, 97: 0, 98: 0, 99: 0, 100: 0, 101: 0, 102: 0, 103: 0, 104: 0, 105: 0, 106: 0, 107: 0, 108: 1, 109: 1, 110: 1, 111: 0, 112: 0, 113: 0, 114: 0, 115: 0, 116: 0, 117: 0, 118: 0, 119: 0, 120: 0, 121: 0, 122: 0, 123: 0, 124: 0, 125: 0, 126: 0, 127: 0, 128: 0, 129: 0, 130: 0, 131: 0, 132: 0, 133: 0, 134: 0, 135: 0, 136: 0, 137: 0, 138: 0, 139: 0, 140: 0, 141: 0, 142: 0, 143: 0, 144: 0, 145: 0, 146: 0, 147: 0, 148: 0, 149: 0, 150: 0, 151: 0, 152: 0, 153: 0, 154: 0, 155: 0, 156: 0, 157: 0, 158: 0, 159: 0, 160: 0, 161: 1, 162: 1, 163: 0, 164: 0, 165: 0, 166: 0, 167: 0, 168: 0, 169: 0, 170: 0, 171: 0, 172: 0, 173: 0, 174: 0, 175: 0, 176: 0, 177: 0, 178: 0, 179: 0, 180: 0, 181: 0, 182: 0, 183: 0, 184: 0, 185: 0, 186: 0, 187: 0, 188: 0, 189: 0, 190: 0, 191: 0, 192: 0, 193: 0, 194: 0, 195: 0, 196: 0, 197: 0, 198: 0, 199: 0, 200: 0, 201: 0, 202: 0, 203: 0, 204: 0, 205: 0, 206: 0, 207: 0, 208: 0, 209: 0, 210: 0, 211: 0, 212: 0, 213: 0, 214: 1, 215: 0, 216: 0, 217: 0, 218: 0, 219: 0, 220: 0, 221: 0, 222: 0, 223: 0}, 'Date': {0: Timestamp('2016-11-14 00:00:00'), 1: Timestamp('2016-11-21 00:00:00'), 2: Timestamp('2016-11-28 00:00:00'), 3: Timestamp('2016-12-05 00:00:00'), 4: Timestamp('2016-12-12 00:00:00'), 5: Timestamp('2016-12-19 00:00:00'), 6: Timestamp('2016-12-26 00:00:00'), 7: Timestamp('2017-01-02 00:00:00'), 8: Timestamp('2017-01-09 00:00:00'), 9: Timestamp('2017-01-16 00:00:00'), 10: Timestamp('2017-01-23 00:00:00'), 11: Timestamp('2017-01-30 00:00:00'), 12: Timestamp('2017-02-06 00:00:00'), 13: Timestamp('2017-02-13 00:00:00'), 14: Timestamp('2017-02-20 00:00:00'), 15: Timestamp('2017-02-27 00:00:00'), 16: Timestamp('2017-03-06 00:00:00'), 17: Timestamp('2017-03-13 00:00:00'), 18: Timestamp('2017-03-20 00:00:00'), 19: Timestamp('2017-03-27 00:00:00'), 20: Timestamp('2017-04-03 00:00:00'), 21: Timestamp('2017-04-10 00:00:00'), 22: Timestamp('2017-04-17 00:00:00'), 
23: Timestamp('2017-04-24 00:00:00'), 24: Timestamp('2017-05-01 00:00:00'), 25: Timestamp('2017-05-08 00:00:00'), 26: Timestamp('2017-05-15 00:00:00'), 27: Timestamp('2017-05-22 00:00:00'), 28: Timestamp('2017-05-29 00:00:00'), 29: Timestamp('2017-06-05 00:00:00'), 30: Timestamp('2017-06-12 00:00:00'), 31: Timestamp('2017-06-19 00:00:00'), 32: Timestamp('2017-06-26 00:00:00'), 33: Timestamp('2017-07-03 00:00:00'), 34: Timestamp('2017-07-10 00:00:00'), 35: Timestamp('2017-07-17 00:00:00'), 36: Timestamp('2017-07-24 00:00:00'), 37: Timestamp('2017-07-31 00:00:00'), 38: Timestamp('2017-08-07 00:00:00'), 39: Timestamp('2017-08-14 00:00:00'), 40: Timestamp('2017-08-21 00:00:00'), 41: Timestamp('2017-08-28 00:00:00'), 42: Timestamp('2017-09-04 00:00:00'), 43: Timestamp('2017-09-11 00:00:00'), 44: Timestamp('2017-09-18 00:00:00'), 45: Timestamp('2017-09-25 00:00:00'), 46: Timestamp('2017-10-02 00:00:00'), 47: Timestamp('2017-10-09 00:00:00'), 48: Timestamp('2017-10-16 00:00:00'), 49: Timestamp('2017-10-23 00:00:00'), 50: Timestamp('2017-10-30 00:00:00'), 51: Timestamp('2017-11-06 00:00:00'), 52: Timestamp('2017-11-13 00:00:00'), 53: Timestamp('2017-11-20 00:00:00'), 54: Timestamp('2017-11-27 00:00:00'), 55: Timestamp('2017-12-04 00:00:00'), 56: Timestamp('2017-12-11 00:00:00'), 57: Timestamp('2017-12-18 00:00:00'), 58: Timestamp('2017-12-25 00:00:00'), 59: Timestamp('2018-01-01 00:00:00'), 60: Timestamp('2018-01-08 00:00:00'), 61: Timestamp('2018-01-15 00:00:00'), 62: Timestamp('2018-01-22 00:00:00'), 63: Timestamp('2018-01-29 00:00:00'), 64: Timestamp('2018-02-05 00:00:00'), 65: Timestamp('2018-02-12 00:00:00'), 66: Timestamp('2018-02-19 00:00:00'), 67: Timestamp('2018-02-26 00:00:00'), 68: Timestamp('2018-03-05 00:00:00'), 69: Timestamp('2018-03-12 00:00:00'), 70: Timestamp('2018-03-19 00:00:00'), 71: Timestamp('2018-03-26 00:00:00'), 72: Timestamp('2018-04-02 00:00:00'), 73: Timestamp('2018-04-09 00:00:00'), 74: Timestamp('2018-04-16 00:00:00'), 75: 
Timestamp('2018-04-23 00:00:00'), 76: Timestamp('2018-04-30 00:00:00'), 77: Timestamp('2018-05-07 00:00:00'), 78: Timestamp('2018-05-14 00:00:00'), 79: Timestamp('2018-05-21 00:00:00'), 80: Timestamp('2018-05-28 00:00:00'), 81: Timestamp('2018-06-04 00:00:00'), 82: Timestamp('2018-06-11 00:00:00'), 83: Timestamp('2018-06-18 00:00:00'), 84: Timestamp('2018-06-25 00:00:00'), 85: Timestamp('2018-07-02 00:00:00'), 86: Timestamp('2018-07-09 00:00:00'), 87: Timestamp('2018-07-16 00:00:00'), 88: Timestamp('2018-07-23 00:00:00'), 89: Timestamp('2018-07-30 00:00:00'), 90: Timestamp('2018-08-06 00:00:00'), 91: Timestamp('2018-08-13 00:00:00'), 92: Timestamp('2018-08-20 00:00:00'), 93: Timestamp('2018-08-27 00:00:00'), 94: Timestamp('2018-09-03 00:00:00'), 95: Timestamp('2018-09-10 00:00:00'), 96: Timestamp('2018-09-17 00:00:00'), 97: Timestamp('2018-09-24 00:00:00'), 98: Timestamp('2018-10-01 00:00:00'), 99: Timestamp('2018-10-08 00:00:00'), 100: Timestamp('2018-10-15 00:00:00'), 101: Timestamp('2018-10-22 00:00:00'), 102: Timestamp('2018-10-29 00:00:00'), 103: Timestamp('2018-11-05 00:00:00'), 104: Timestamp('2018-11-12 00:00:00'), 105: Timestamp('2018-11-19 00:00:00'), 106: Timestamp('2018-11-26 00:00:00'), 107: Timestamp('2018-12-03 00:00:00'), 108: Timestamp('2018-12-10 00:00:00'), 109: Timestamp('2018-12-17 00:00:00'), 110: Timestamp('2018-12-24 00:00:00'), 111: Timestamp('2018-12-31 00:00:00'), 112: Timestamp('2019-01-07 00:00:00'), 113: Timestamp('2019-01-14 00:00:00'), 114: Timestamp('2019-01-21 00:00:00'), 115: Timestamp('2019-01-28 00:00:00'), 116: Timestamp('2019-02-04 00:00:00'), 117: Timestamp('2019-02-11 00:00:00'), 118: Timestamp('2019-02-18 00:00:00'), 119: Timestamp('2019-02-25 00:00:00'), 120: Timestamp('2019-03-04 00:00:00'), 121: Timestamp('2019-03-11 00:00:00'), 122: Timestamp('2019-03-18 00:00:00'), 123: Timestamp('2019-03-25 00:00:00'), 124: Timestamp('2019-04-01 00:00:00'), 125: Timestamp('2019-04-08 00:00:00'), 126: Timestamp('2019-04-15 00:00:00'), 
127: Timestamp('2019-04-22 00:00:00'), 128: Timestamp('2019-04-29 00:00:00'), 129: Timestamp('2019-05-06 00:00:00'), 130: Timestamp('2019-05-13 00:00:00'), 131: Timestamp('2019-05-20 00:00:00'), 132: Timestamp('2019-05-27 00:00:00'), 133: Timestamp('2019-06-03 00:00:00'), 134: Timestamp('2019-06-10 00:00:00'), 135: Timestamp('2019-06-17 00:00:00'), 136: Timestamp('2019-06-24 00:00:00'), 137: Timestamp('2019-07-01 00:00:00'), 138: Timestamp('2019-07-08 00:00:00'), 139: Timestamp('2019-07-15 00:00:00'), 140: Timestamp('2019-07-22 00:00:00'), 141: Timestamp('2019-07-29 00:00:00'), 142: Timestamp('2019-08-05 00:00:00'), 143: Timestamp('2019-08-12 00:00:00'), 144: Timestamp('2019-08-19 00:00:00'), 145: Timestamp('2019-08-26 00:00:00'), 146: Timestamp('2019-09-02 00:00:00'), 147: Timestamp('2019-09-09 00:00:00'), 148: Timestamp('2019-09-16 00:00:00'), 149: Timestamp('2019-09-23 00:00:00'), 150: Timestamp('2019-09-30 00:00:00'), 151: Timestamp('2019-10-07 00:00:00'), 152: Timestamp('2019-10-14 00:00:00'), 153: Timestamp('2019-10-21 00:00:00'), 154: Timestamp('2019-10-28 00:00:00'), 155: Timestamp('2019-11-04 00:00:00'), 156: Timestamp('2019-11-11 00:00:00'), 157: Timestamp('2019-11-18 00:00:00'), 158: Timestamp('2019-11-25 00:00:00'), 159: Timestamp('2019-12-02 00:00:00'), 160: Timestamp('2019-12-09 00:00:00'), 161: Timestamp('2019-12-16 00:00:00'), 162: Timestamp('2019-12-23 00:00:00'), 163: Timestamp('2019-12-30 00:00:00'), 164: Timestamp('2020-01-06 00:00:00'), 165: Timestamp('2020-01-13 00:00:00'), 166: Timestamp('2020-01-20 00:00:00'), 167: Timestamp('2020-01-27 00:00:00'), 168: Timestamp('2020-02-03 00:00:00'), 169: Timestamp('2020-02-10 00:00:00'), 170: Timestamp('2020-02-17 00:00:00'), 171: Timestamp('2020-02-24 00:00:00'), 172: Timestamp('2020-03-02 00:00:00'), 173: Timestamp('2020-03-09 00:00:00'), 174: Timestamp('2020-03-16 00:00:00'), 175: Timestamp('2020-03-23 00:00:00'), 176: Timestamp('2020-03-30 00:00:00'), 177: Timestamp('2020-04-06 00:00:00'), 178: 
Timestamp('2020-04-13 00:00:00'), 179: Timestamp('2020-04-20 00:00:00'), 180: Timestamp('2020-04-27 00:00:00'), 181: Timestamp('2020-05-04 00:00:00'), 182: Timestamp('2020-05-11 00:00:00'), 183: Timestamp('2020-05-18 00:00:00'), 184: Timestamp('2020-05-25 00:00:00'), 185: Timestamp('2020-06-01 00:00:00'), 186: Timestamp('2020-06-08 00:00:00'), 187: Timestamp('2020-06-15 00:00:00'), 188: Timestamp('2020-06-22 00:00:00'), 189: Timestamp('2020-06-29 00:00:00'), 190: Timestamp('2020-07-06 00:00:00'), 191: Timestamp('2020-07-13 00:00:00'), 192: Timestamp('2020-07-20 00:00:00'), 193: Timestamp('2020-07-27 00:00:00'), 194: Timestamp('2020-08-03 00:00:00'), 195: Timestamp('2020-08-10 00:00:00'), 196: Timestamp('2020-08-17 00:00:00'), 197: Timestamp('2020-08-24 00:00:00'), 198: Timestamp('2020-08-31 00:00:00'), 199: Timestamp('2020-09-07 00:00:00'), 200: Timestamp('2020-09-14 00:00:00'), 201: Timestamp('2020-09-21 00:00:00'), 202: Timestamp('2020-09-28 00:00:00'), 203: Timestamp('2020-10-05 00:00:00'), 204: Timestamp('2020-10-12 00:00:00'), 205: Timestamp('2020-10-19 00:00:00'), 206: Timestamp('2020-10-26 00:00:00'), 207: Timestamp('2020-11-02 00:00:00'), 208: Timestamp('2020-11-09 00:00:00'), 209: Timestamp('2020-11-16 00:00:00'), 210: Timestamp('2020-11-23 00:00:00'), 211: Timestamp('2020-11-30 00:00:00'), 212: Timestamp('2020-12-07 00:00:00'), 213: Timestamp('2020-12-14 00:00:00'), 214: Timestamp('2020-12-21 00:00:00'), 215: Timestamp('2020-12-28 00:00:00'), 216: Timestamp('2021-01-04 00:00:00'), 217: Timestamp('2021-01-11 00:00:00'), 218: Timestamp('2021-01-18 00:00:00'), 219: Timestamp('2021-01-25 00:00:00'), 220: Timestamp('2021-02-01 00:00:00'), 221: Timestamp('2021-02-08 00:00:00'), 222: Timestamp('2021-02-15 00:00:00'), 223: Timestamp('2021-02-22 00:00:00')}})
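dd.index = dd['Date']
# Hold out everything after the first 156 weeks (3 years) for testing
dd_train = dd.iloc[:156, :]
dd_test = dd.iloc[156:, :]

y = dd_train['y']
y_test = dd_test['y']
# Covariates for the training window and for the forecast horizon
exo = dd_train[['Covariate1', 'Covariate2']]
future_exo = dd_test[['Covariate1', 'Covariate2']]

# Imports needed below, if not already done earlier
from ThymeBoost import ThymeBoost as tb
import matplotlib.pyplot as plt

boosted_model = tb.ThymeBoost(verbose=1)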
output = boosted_model.fit(y,
                           trend_estimator='median',
                           seasonal_estimator='classic',
                           exogenous_estimator='decision_tree',
                           tree_depth=1,
                           exogenous=exo,
                           seasonal_period=52,
                           global_cost='mse',
                           fit_type='global',
                           )
predicted_output = boosted_model.predict(output,
                                         forecast_horizon=len(y_test),
                                         future_exogenous=future_exo)
boosted_model.plot_results(output, predicted_output)

[Plot: output of plot_results for the fitted model and the forecast]

This looks OK, but some of the predictions are now negative. To get exactly what you want (non-negative integer forecasts), put in a floor at zero and round:

# Floor the negative predictions at zero, then round to whole numbers
predictions = predicted_output['predictions'].clip(lower=0)
predictions = predictions.round()
plt.plot(y_test, label='actuals')
plt.plot(predictions, label='predicted')
plt.legend()
plt.show()

[Plot: actuals vs. floored and rounded predictions]
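Since the question worried that flooring at zero inflates the total number of enrolled students, it may be worth checking the totals before and after the post-processing. Below is a minimal sketch of that check in plain Python. The numbers in `raw` are made up for illustration (they are not ThymeBoost output), and `to_counts` is a hypothetical helper that mirrors the `clip(lower=0)` plus `round()` step above.

```python
# Made-up raw forecasts for illustration only (not ThymeBoost output).
raw = [-1.4, -0.3, 0.2, 2.6, 7.1, 3.4]


def to_counts(values):
    """Hypothetical helper: floor each value at zero, then round to an integer."""
    return [int(round(max(0.0, v))) for v in values]


counts = to_counts(raw)
print("post-processed:", counts)

# Flooring discards the negative part of the forecast, so the floored total
# can never be smaller than the raw total. Rounding can then move the total
# slightly in either direction.
print("raw total:", round(sum(raw), 2))
print("post-processed total:", sum(counts))
```

If the gap between the two totals is large on your real forecasts, that suggests the model is spending a lot of mass below zero during the long zero period, and a count model or a different trend/seasonality setup may be a better fix than the floor alone.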

Tylerr
  • Thanks, Tylerr! I confess I was thinking about using something like this (an average of the recent data and the data from one year ago). Can I also include covariates with your method? Do you have an example to show? I ask because the long period of zeros is deterministic (and so is the one with the spikes, although it can change every year). – Guilherme Parreira Oct 25 '21 at 19:26
  • I just included two covariates in the dataset example – Guilherme Parreira Oct 25 '21 at 19:37
  • Yep, edited my answer. TL;DR: yes, we can add covariates and fit them with either a decision_tree estimator (for any rounds >1 this becomes a boosted tree) or an ols estimator. This will mess with our bounds, though. – Tylerr Oct 25 '21 at 20:26