I'm trying to run a logistic regression in R to determine which independent variables affect whether or not a sea turtle becomes entangled in a fishing net. My independent variables differ considerably from each other in both scale and class, e.g. mesh size (7 mm to 1500 mm), twine diameter (0.33 mm to 4 mm), colour (red, blue, green, etc.) and construction (multi or mono). Must I first convert all independent variables to a similar scale before running the glm command? If so, how do I standardise factors such as colour and construction? Also, is it necessary to split the data into a training dataset and a testing dataset? I see some people do this, while others use the entire dataset in the glm command.

kjetil b halvorsen

Martin
- Your main question is largely a duplicate of [Can I use multiple regression when I have mixed categorical and continuous predictors?](http://stats.stackexchange.com/q/6353/22228) (although you are performing logistic regression, the answer is the same). – Silverfish Sep 17 '16 at 14:10
- The answer is probably no and no. But read [this post](http://stats.stackexchange.com/a/29782/67822) and [this post](http://stats.stackexchange.com/a/124477/67822). Note Prof. Harrell's [comment at the bottom of the question](http://stats.stackexchange.com/q/48360/67822). My understanding is that separating testing and training data is done in the context of machine learning: you want to estimate your out-of-sample error, with the aim not of identifying relationships between the independent and dependent variables, but rather of making predictions on future data. – Antoni Parellada Sep 17 '16 at 14:11
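
To make the "no" on rescaling concrete, here is a minimal sketch in R, not a definitive analysis. The data are simulated and every name in it (`nets`, `mesh_mm`, `twine_mm`, `colour`, `construction`, `entangled`) is a hypothetical stand-in for the real dataset. It assumes an ordinary unpenalised `glm` fit. Factors are passed in as they are, and R dummy-codes them internally. Standardising the numeric predictors with `scale()` changes the units of their coefficients but should leave the fitted probabilities essentially unchanged.

```r
# Simulated stand-in for the real data; all column names are hypothetical.
set.seed(1)
n <- 200
nets <- data.frame(
  mesh_mm      = runif(n, 7, 1500),    # mesh size in mm
  twine_mm     = runif(n, 0.33, 4),    # twine diameter in mm
  colour       = factor(sample(c("red", "blue", "green"), n, replace = TRUE)),
  construction = factor(sample(c("Multi", "Mono"), n, replace = TRUE))
)
# Made-up outcome, only so that the model has something to fit.
nets$entangled <- rbinom(n, 1, plogis(-1 + 0.001 * nets$mesh_mm))

# Factors go in unchanged: glm() builds treatment-coded dummy variables
# (one per non-reference level) behind the scenes.
fit <- glm(entangled ~ mesh_mm + twine_mm + colour + construction,
           family = binomial, data = nets)
summary(fit)

# Same model with the numeric predictors standardised.
fit_z <- glm(entangled ~ scale(mesh_mm) + scale(twine_mm) + colour + construction,
             family = binomial, data = nets)

# Compare fitted probabilities from the two parameterisations.
all.equal(fitted(fit), fitted(fit_z), tolerance = 1e-6)
```

The reference level of each factor (by default the first in alphabetical order) is absorbed into the intercept, so a three-level colour factor contributes two coefficients. Standardising can still make coefficients easier to compare, but it is a matter of interpretation rather than a requirement for fitting.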