Binary representation: -1 vs 0

Question

I commonly see {0,1} used in representing pass/fail data, but I recently found a case where I was given a data set with {-1,1} to perform some pre-processing. So, I'm wondering whether I should keep {-1,1} for some downstream purposes.

Question: When dealing with binary data, what are some advantages/disadvantages of using {0,1} vs {-1,1}? Will this affect my results when computing correlations or applying machine learning algorithms (e.g., with scipy, scikit-learn, or tensorflow)?

Demetri Pananos · Accepted Answer · 2020-02-18T05:54:41.053

If you have a binary variable coded as -1/1, this has the benefit of making its coefficient a contrast: in a regression with that single predictor, the intercept is the midpoint of the two group means and the coefficient is half the difference between them. That may be nice for statistical models, but for machine learning my opinion is that 0/1 is better.
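To make the contrast point concrete, here is a minimal sketch using plain NumPy. The toy data and the helper name `ols` are made up for illustration and are not from the question; the sketch just fits least squares with one binary predictor under both codings.

```python
import numpy as np

# Hypothetical toy data: group 0 has mean 2.0, group 1 has mean 5.0.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
group = np.array([0, 0, 0, 1, 1, 1])  # 0/1 coding


def ols(x, y):
    """Return (intercept, slope) of the least-squares fit y ~ a + b * x."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


# 0/1 coding: intercept is the mean of group 0, slope is the full difference.
a01, b01 = ols(group, y)

# -1/1 coding: intercept is the midpoint of the two group means,
# slope is half the difference between them.
apm, bpm = ols(2 * group - 1, y)

print(a01, b01)
print(apm, bpm)
```

Both fits give identical predictions; only the meaning of the intercept and slope changes.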

Not only does 0/1 make means and variances easier to interpret (the mean is simply the proportion of ones), it also makes conditions in the data easier to read (e.g., an ad was present or absent). It is also already min/max scaled, which some methods (like neural nets) seem to benefit from.
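A quick sketch of these points, assuming simulated data (the variable `other` is a hypothetical second column, not something from the question). It also checks the correlation part of the question: Pearson correlation does not change when a variable is recoded by a positive linear map such as 2x - 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x01 = rng.integers(0, 2, size=1000)    # 0/1 coding
xpm = 2 * x01 - 1                      # the same data in -1/1 coding
other = x01 + rng.normal(size=1000)    # hypothetical second variable

p = x01.mean()        # mean of 0/1 data is the proportion of ones
var01 = x01.var()     # population variance, equals p * (1 - p)
mean_pm = xpm.mean()  # equals 2 * p - 1
var_pm = xpm.var()    # equals 4 * p * (1 - p)

# Min/max scaling the -1/1 column simply recovers the 0/1 column.
scaled = (xpm - xpm.min()) / (xpm.max() - xpm.min())

# Pearson correlation with another variable is the same under both codings.
r01 = np.corrcoef(x01, other)[0, 1]
rpm = np.corrcoef(xpm, other)[0, 1]
```

So the choice of coding changes how summaries read, not how strongly the variable is correlated with anything else.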

All in all, I would prefer 0/1 over -1/1.

Haitao Du · Answer 2 · 2020-02-18T07:27:50.720

I would recommend reading this post (if this is not a duplicate question):

Why there are two different logistic loss formulation / notations?

The highlights are:

Using 0, 1 representation is more natural and can be easily related to probability.
Using -1, +1 is more concise in some cases (such as hinge loss or zero-one loss). It is concise because the product of the ground truth and the predicted value is positive (+1) when we make a right prediction and negative (-1) when we make a wrong one.
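As a rough illustration of both points, the sketch below uses made-up scores and labels (all values are assumptions). It computes the logistic loss in the 0/1 form and in the -1/+1 form, which agree, and then uses the product of label and score for the hinge and zero-one losses.

```python
import numpy as np

score = np.array([2.0, -0.5, 0.3, -1.2])  # hypothetical raw model outputs
y01 = np.array([1, 0, 0, 1])              # labels in 0/1 coding
ypm = 2 * y01 - 1                         # same labels in -1/+1 coding

# 0/1 form: cross-entropy on the predicted probability p = sigmoid(score).
p = 1.0 / (1.0 + np.exp(-score))
loss01 = -(y01 * np.log(p) + (1 - y01) * np.log(1 - p))

# -1/+1 form: log(1 + exp(-y * score)), a single expression with no branches.
losspm = np.log1p(np.exp(-ypm * score))

# The product y * score is positive for right predictions, negative for wrong.
margin = ypm * score
hinge = np.maximum(0.0, 1.0 - margin)
zero_one = (margin <= 0).astype(int)      # 1 marks a wrong prediction

print(np.allclose(loss01, losspm))
print(hinge, zero_one)
```

The two logistic losses are the same numbers; the -1/+1 coding only makes the formula shorter.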
