
I'm using caret to run a cross-validated random forest on a dataset. The Y variable is a factor. There are no NaNs, Infs, or NAs in my dataset. However, when running the random forest, I get:

Error in randomForest.default(m, y, ...) : 
  NA/NaN/Inf in foreign function call (arg 1)
In addition: There were 28 warnings (use warnings() to see them)
Warning messages:
1: In data.matrix(x) : NAs introduced by coercion
2: In data.matrix(x) : NAs introduced by coercion
3: In data.matrix(x) : NAs introduced by coercion
4: In data.matrix(x) : NAs introduced by coercion

Could this error be caused by the NAs introduced by coercion? If so, how can I prevent that coercion?

Info5ek

3 Answers


There must be some features in your training set with class "character".

For example, coercing a character vector that contains letters to numeric produces NAs and the same warning:

> a <- c("1", "2",letters[1:5], "3")
> as.numeric(a)
[1]  1  2 NA NA NA NA NA  3
Warning message:
NAs introduced by coercion 
Pankaj Sharma
    Just to add: if the feature is actually categorical, it can still be included by converting it to a factor, e.g. blah – P.Windridge Aug 17 '17 at 09:52
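To see which columns are responsible, one option is to apply the same `as.numeric()` coercion column by column. The sketch below is a minimal base R helper, assuming the predictors sit in a data frame; the names `find_non_numeric` and `predictors`, and the toy data, are hypothetical. For each character column it returns how many non-missing values would become NA.

```r
# For each character column, count the non-missing values that as.numeric()
# would turn into NA. `find_non_numeric` and `predictors` are hypothetical names.
find_non_numeric <- function(predictors) {
  is_chr <- vapply(predictors, is.character, logical(1))
  vapply(predictors[is_chr], function(col) {
    # suppressWarnings() hides "NAs introduced by coercion"; the NAs are counted instead
    converted <- suppressWarnings(as.numeric(col))
    sum(is.na(converted) & !is.na(col))
  }, integer(1))
}

predictors <- data.frame(
  x1 = c(1.5, 2.0, 3.1),
  x2 = c("1", "2", "a"),
  x3 = c("red", "blue", "red"),
  stringsAsFactors = FALSE
)

find_non_numeric(predictors)  # x2: 1, x3: 3
```

Columns with a nonzero count are the ones to convert to factors (if categorical) or to clean up (if they were meant to be numeric).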

The cause is probably that your data frame contains some character variables.

You can convert all character variables to factors in one line:

library(dplyr)
data_fac <- data_char %>% mutate_if(is.character, as.factor)  # character -> factor

Pablo Casas
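If you would rather not load dplyr, here is a base R sketch of the same conversion. The toy contents of `data_char` are made up for illustration; only the idea (turn character columns into factors, leave the others alone) comes from the answer above.

```r
# Base R alternative to mutate_if(is.character, as.factor); no packages needed.
# The sample data below is hypothetical.
data_char <- data.frame(
  num  = c(10, 20, 30),
  city = c("Oslo", "Lima", "Oslo"),
  stringsAsFactors = FALSE
)

data_fac <- data_char
is_chr <- vapply(data_fac, is.character, logical(1))
# Replace only the character columns; numeric columns are left untouched
data_fac[is_chr] <- lapply(data_fac[is_chr], as.factor)

str(data_fac)
```

In dplyr 1.0.0 and later, `mutate_if()` is marked superseded; the equivalent there is `data_char %>% mutate(across(where(is.character), as.factor))`.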

As the output shows, there were 28 warnings, which happened to match the number of columns with the character data type ("chr"). Converting these columns to factors allowed the run to proceed.

Info5ek
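The one-warning-per-column pattern can be checked directly. The following is a hedged sketch that counts "NAs introduced by coercion" warnings when each column of a data frame is passed to `as.numeric()`. It does not replicate what caret or randomForest do internally, and `count_coercion_warnings` and the toy `predictors` data are hypothetical. A character column whose values all parse as numbers produces no warning, so the count matches the number of character columns only when each of them contains non-numeric text.

```r
# Count "NAs introduced by coercion" warnings raised when each column is coerced
# with as.numeric(). Hypothetical helper; not what caret/randomForest run internally.
count_coercion_warnings <- function(predictors) {
  n_warn <- 0L
  withCallingHandlers(
    for (col in predictors) as.numeric(col),
    warning = function(w) {
      n_warn <<- n_warn + 1L          # tally the warning
      invokeRestart("muffleWarning")  # and stop it from being printed
    }
  )
  n_warn
}

predictors <- data.frame(
  a = 1:3,                  # integer: no warning
  b = c("x", "y", "z"),     # character, non-numeric text: 1 warning
  c = c("1", "2", "3"),     # character, but parses as numbers: no warning
  d = c("p", "1", "q"),     # character, partly non-numeric: 1 warning
  stringsAsFactors = FALSE
)

count_coercion_warnings(predictors)               # 2
sum(vapply(predictors, is.character, logical(1))) # 3
```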