How to deal with a mix of binary and continuous inputs in neural networks?

Question

I'm using the nnet package in R to attempt to build an ANN to predict real estate prices for condos (personal project). I am new to this and don't have a math background, so please bear with me.

I have input variables that are both binary and continuous. For example some binary variables which were originally yes/no were converted to 1/0 for the neural net. Other variables are continuous like Sqft.

Sample of input data

I have normalized all values to be on a 0-1 scale. Maybe Bedrooms and Bathrooms shouldn't be normalized since their range is only 0-4?

Do these mixed inputs present a problem for the ANN? I've gotten okay results, but upon closer examination the weights the ANN has chosen for certain variables don't seem to make sense. My code is below, any suggestions?

ANN <- nnet(Price ~ Sqft + Bedrooms + Bathrooms + Parking2 + Elevator + 
            Central.AC + Terrace + Washer.Dryer + Doorman + Exercise.Room + 
            New.York.View,data[1:700,], size=3, maxit=5000, linout=TRUE, decay=.0001)

UPDATE: Based on the comments below regarding breaking out the binary inputs into separate fields for each value class, my code now looks like:

ANN <- nnet(Price ~ Sqft + Studio + X1BR + X2BR + X3BR + X4BR + X1Bath
        + X2Bath + X3Bath + X4bath + Parking.Yes + Parking.No + Elevator.Yes + Elevator.No 
        + Central.AC.Yes + Central.AC.No + Terrace.Yes + Terrace.No + Washer.Dryer.Yes 
        + Washer.Dryer.No + Doorman.Yes + Doorman.No + Exercise.Room.Yes + Exercise.Room.No 
        + New.York.View.Yes + New.York.View.No + Healtch.Club.Yes + Health.Club.No,
    data[1:700,], size=12, maxit=50000, decay=.0001)

The code above uses 12 hidden nodes, but I've tried a range of hidden nodes from 3 to 25, and all give worse results than the parameters in the original code posted above. I've also tried it with both linout=TRUE and linout=FALSE.

My guess is that I need to feed the data to nnet in a different way because it's not interpreting the binary input properly. Either that, or I need to give it different parameters.

Any ideas?

The standard way of using binary or categorical data as neural network inputs is to expand the field to indicator vectors. For instance, if you had a field that could take values 1, 2, or 3, then a 1 would be expanded to [1,0,0], 2->[0,1,0], and 3->[0,0,1]. Real valued input is generally kept as-is. — user1149913, Jul 26 '12 at 03:36
Now that you mention this, I do seem to recall reading this somewhere during my search for an answer. So since the information source is a CSV file, I actually need to add columns to accommodate the new fields for each binary input? For instance, if the bedroom input ranges from 0-4, using your example above I'd create 4 additional columns (total of 5, since '0' bedrooms means studio) and a 3BR condo would be expressed as 0,0,0,1,0? — ChrisArmstrong, Jul 26 '12 at 13:37
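
The indicator expansion described in these comments does not have to be done by hand in the CSV. A minimal sketch in base R, assuming a hypothetical data frame `condos` with a `Bedrooms` column coded 0-4 (the data and names are made up for illustration):

```r
# Hypothetical toy data; column names and values are illustrative only.
condos <- data.frame(
  Sqft     = c(550, 900, 1400),
  Bedrooms = c(0, 3, 1)          # 0 means studio
)

# Treat the bedroom count as a category with levels 0..4.
condos$Bedrooms <- factor(condos$Bedrooms, levels = 0:4)

# "~ Bedrooms - 1" drops the intercept, so each level gets its own 0/1 column.
indicators <- model.matrix(~ Bedrooms - 1, data = condos)
colnames(indicators)      # one column per level: Bedrooms0 ... Bedrooms4

# The 3BR condo (row 2) is expressed as 0,0,0,1,0.
unname(indicators[2, ])
```

Each row of `indicators` contains exactly one 1, which matches the [1,0,0] style expansion described above.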

shadowtalker · Answer 1 · 2015-02-19T16:32:07.557

One way to handle this situation is to rescale the inputs so that their variances are on roughly the same scale. This is needed because the variance of a binary variable is often quite different from the variance of a continuous variable. The advice is usually given for regression modeling, but it applies to any modeling situation that involves variables measured on different scales. Gelman and Hill (2006) recommend rescaling continuous inputs by dividing by two standard deviations to obtain parity with (un-scaled) binary inputs. This recommendation is also reflected in a paper and blog post.
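
As a rough sketch of the two-standard-deviation rescaling, assuming a hypothetical vector of square footages (the values are made up, and centering on the mean first is part of this sketch rather than something stated above):

```r
# Hypothetical square footage values, for illustration only.
sqft <- c(450, 600, 750, 900, 1200, 1500, 2100)

# Center on the mean, then divide by two standard deviations.
sqft_scaled <- (sqft - mean(sqft)) / (2 * sd(sqft))

# By construction the rescaled input has a standard deviation of 0.5.
sd(sqft_scaled)

# A binary input split roughly 50/50 also has a standard deviation near 0.5,
# which is why the continuous and binary inputs end up on comparable scales.
doorman <- c(0, 1, 0, 1, 1, 0, 1, 0)
sd(doorman)
```

The binary inputs themselves are left as 0/1 under this scheme; only the continuous columns are transformed.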

A more specific recommendation for neural networks is to use "effect coding" for binary inputs (that is, -1 and 1) instead of "dummy coding" (0 and 1), and to take the additional step of centering continuous variables. These recommendations come from an extensive FAQ by Warren Sarle, in particular the sections "Why not code binary inputs as 0 and 1?" and "Should I standardize the input variables?" The gist, though, is the same:

The contribution of an input will depend heavily on its variability relative to other inputs.
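
For illustration, effect coding and centering might look like this in base R. The vectors `elevator` and `sqft` are hypothetical, and dividing by the standard deviation after centering is an extra assumption here (plain centering would subtract the mean only):

```r
# Hypothetical toy inputs; names and values are illustrative only.
elevator <- c(1, 0, 0, 1, 1)             # dummy coded: 0 = no, 1 = yes
sqft     <- c(500, 800, 950, 1300, 2000)

# Effect coding: map 0 -> -1 and 1 -> 1.
elevator_effect <- 2 * elevator - 1

# Center and standardize the continuous input (mean 0, standard deviation 1).
# scale() returns a one-column matrix, so convert it back to a plain vector.
sqft_std <- as.numeric(scale(sqft))
```

The transformed columns would then replace the originals in the data frame passed to `nnet`.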

As for unordered categorical variables -- you must break them out into binary indicators. They simply are not meaningful otherwise.

But see also https://stats.stackexchange.com/questions/398779/linear-regression-and-high-dimensional-categorical-data/414917#414917 and https://stats.stackexchange.com/questions/231285/dropping-one-of-the-columns-when-using-one-hot-encoding/329281#329281 — kjetil b halvorsen, Nov 24 '19 at 17:45
