
I am using a random forest regression to model a count of species from a number of different survey areas. Each survey area has a different size.

My question is how to model the response variable so that it accounts for the differences in survey area. Should I divide Count by the survey area before fitting the model?

Count <- Count / Survey_area_km2
model <- fit(Count ~ pred1 + pred2 + ...)

Then, when predicting over a new set of survey areas, should I multiply each predicted response by the size of the survey area it refers to? Would this give a count that is in proportion to each survey area?

Count_predictions <- predict(model, prediction_areas)
Rescaled_Counts <- Count_predictions * prediction_areas$Survey_area_km2
kjetil b halvorsen
Anthony W
  • The usual model in your situation is a Poisson rate regression (search this site!), that is, log link function and log area as offset. See also https://stats.stackexchange.com/questions/50786/predicting-count-data-with-random-forest – kjetil b halvorsen Jul 18 '20 at 18:24
  • ... and [this stored google search](https://www.google.com/search?safe=off&sxsrf=ALeKk03I0EFwUALRm_NiGOzFDDKMW4qK1w%3A1595096460416&ei=jD0TX9KBGfqj5OUPtLqk-AM&q=poisson+regression+with+random+forrest&oq=poisson+regression+with+random+forrest&gs_lcp=CgZwc3ktYWIQAzIHCCMQsAIQJzIGCAAQFhAeUABYAGD7hAFoAHAAeACAAWGIAWGSAQExmAEAqgEHZ3dzLXdpeg&sclient=psy-ab&ved=0ahUKEwjS956vtdfqAhX6EbkGHTQdCT8Q4dUDCAs&uact=5) – kjetil b halvorsen Jul 18 '20 at 18:25
  • Thanks. This is helpful. I hadn't realised there was a function `gbm.fit` that allows specification for an offset - it doesn't seem feasible if just using `gbm`. So when I use this the model fit is calculated on the log link scale? Therefore I should define it as `model – Anthony W Jul 19 '20 at 10:50
  • You can find more information [here](https://www.google.com/search?q=R+gbm+with+poisson+regression+and+log+link+site:stats.stackexchange.com&safe=off&sxsrf=ALeKk03Nk-i0zCJ7zSVluj_PqiVY8PEvcg:1595177168373&sa=X&ved=2ahUKEwiL3uWD4tnqAhXbDrkGHblVDJIQrQIoBHoECAYQBQ&biw=1279&bih=615). – kjetil b halvorsen Jul 19 '20 at 16:54
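
For illustration, the Poisson rate regression suggested in the first comment could be sketched in base R roughly as follows. This is a minimal sketch with a plain `glm`, not the random forest or `gbm` fit discussed above. The data are simulated, and `surveys`, `true_density`, `rate_model` and the coefficient values are made-up names and numbers; only `Count`, `Survey_area_km2`, `pred1`, `pred2` and `prediction_areas` come from the question. With a log link, the offset gives log E[Count] = log(area) + linear predictor, so the expected count scales in proportion to area without dividing the response beforehand.

```r
# Simulated data for illustration only; none of these values come from the question.
set.seed(1)
n <- 200
surveys <- data.frame(
  Survey_area_km2 = runif(n, min = 1, max = 50),
  pred1 = rnorm(n),
  pred2 = rnorm(n)
)
# Hypothetical "true" density (count per km2) used to generate the counts.
true_density <- exp(-1 + 0.5 * surveys$pred1 - 0.3 * surveys$pred2)
surveys$Count <- rpois(n, lambda = surveys$Survey_area_km2 * true_density)

# Poisson regression with log link and log(area) as offset:
# the raw counts stay as the response; area enters with a fixed coefficient of 1.
rate_model <- glm(
  Count ~ pred1 + pred2 + offset(log(Survey_area_km2)),
  family = poisson(link = "log"),
  data = surveys
)

# New areas to predict for; the two rows differ only in area.
prediction_areas <- data.frame(
  Survey_area_km2 = c(5, 10),
  pred1 = c(0.2, 0.2),
  pred2 = c(-0.1, -0.1)
)

# type = "response" returns expected counts; the offset is evaluated from
# newdata, so the predictions are already scaled by each area.
predicted_counts <- predict(rate_model, newdata = prediction_areas, type = "response")
print(predicted_counts)
```

Because the offset term is part of the formula, no manual rescaling step is needed after `predict`: in this sketch, doubling the area while holding the predictors fixed doubles the predicted count.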

0 Answers