Questions tagged [binary-data]

A binary variable takes one of two values, typically coded as "0" and "1".

In a broader sense, "binary variable" is a synonym of "dichotomous variable": any variable that can take only one of two values. In a narrower sense, it refers to dichotomous data coded as "1" or "0". Sometimes "1" means "is present" and "0" means "is absent", which may require handling the two values asymmetrically in some statistical analyses (see, e.g., Jaccard indices).
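To make the asymmetry point concrete, here is a minimal sketch comparing the Jaccard index, which ignores joint absences (0/0 pairs), with the simple matching coefficient, which counts them as agreement. The function names and the two example vectors are illustrative assumptions, not part of the tag description.

```python
def match_counts(x, y):
    """Return (a, b, c, d): both 1, only x is 1, only y is 1, both 0."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard(x, y):
    """Asymmetric similarity: joint absences (d) are ignored."""
    a, b, c, _ = match_counts(x, y)
    total = a + b + c
    return a / total if total else float("nan")


def simple_matching(x, y):
    """Symmetric similarity: joint absences count as agreement."""
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)


# Hypothetical presence/absence vectors with many shared zeros.
x = [1, 0, 0, 0, 1, 0, 0, 0]
y = [1, 0, 0, 0, 0, 1, 0, 0]

print(jaccard(x, y))          # shared presences / positions where either is present
print(simple_matching(x, y))  # all agreements / all positions
```

When "0" genuinely means "absent", the many shared zeros inflate the symmetric coefficient, which is one reason an asymmetric measure may be preferred for such data.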

A binary response variable occurs as a result of Bernoulli trials, whose analysis commonly involves contingency tables or logistic/probit regression.

The term 'binary' also refers to data stored as machine-readable binary numbers rather than numbers recorded in strings of ASCII (or Unicode, or other human-readable) numerals.

In econometrics binary variables are also called dummy variables.

1249 questions

4 answers

Reduce Classification Probability Threshold

I have a question regarding classification in general. Let $f$ be a classifier, which outputs a set of probabilities given some data D. Normally, one would say: well, if $P(c|D) > 0.5$, we will assign class 1, otherwise 0 (let this be a binary…

asked Nov 06 '17 at 07:10

sdgaw erzswer

5 answers

Is it meaningful to calculate Pearson or Spearman correlation between two Boolean vectors?

There are two Boolean vectors, which contain only 0 and 1. If I calculate the Pearson or Spearman correlation between them, is the result meaningful or reasonable?

correlation binary-data pearson-r spearman-rho

asked Jun 18 '14 at 07:52

Zhilong Jia

7 answers

Binary classification with strongly unbalanced classes

I have a data set in the form of (features, binary output 0 or 1), but 1 happens pretty rarely, so just by always predicting 0, I get accuracy between 70% and 90% (depending on the particular data I look at). The ML methods give me about the same…

machine-learning classification binary-data unbalanced-classes

asked Sep 19 '16 at 18:39

LazyCat

10 answers

Measuring entropy/ information/ patterns of a 2d binary matrix

I want to measure the entropy/ information density/ pattern-likeness of a two-dimensional binary matrix. Let me show some pictures for clarification: This display should have a rather high entropy: A) This should have medium entropy: B) These…

algorithms binary-data entropy pattern-recognition information-theory

asked Oct 17 '11 at 12:39

Felix S

4 answers

Would PCA work for boolean (binary) data types?

I want to reduce the dimensionality of higher order systems and capture most of the covariance on a preferably 2 dimensional or 1 dimensional field. I understand this can be done via principal component analysis, and I have used PCA in many…

pca data-visualization binary-data dimensionality-reduction correspondence-analysis

asked Jul 02 '15 at 21:20

Alvin Nunez

5 answers

Should you ever standardise binary variables?

I have a data set with a set of features. Some of them are binary ($1$ = active or fired, $0$ = inactive or dormant), and the rest are real valued, e.g. $4564.342$. I want to feed this data to a machine learning algorithm, so I $z$-score all the…

machine-learning normalization binary-data

asked May 18 '13 at 16:57

siamii

1 answer

Doing principal component analysis or factor analysis on binary data

I have a dataset with a large number of Yes/No responses. Can I use principal components (PCA) or any other data reduction analyses (such as factor analysis) for this type of data? Please advise how I go about doing this using SPSS.

spss categorical-data pca factor-analysis binary-data

asked Oct 01 '11 at 18:39

Cathy

2 answers

How to use both binary and continuous variables together in clustering?

I need to use binary variables (values 0 & 1) in k-means. But k-means only works with continuous variables. I know some people still use these binary variables in k-means ignoring the fact that k-means is only designed for continuous variables. This…

r clustering binary-data k-means mixed-type-data

asked Jan 02 '15 at 14:55

GeorgeOfTheRF

1 answer

Is there Factor analysis or PCA for ordinal or binary data?

I have completed the principal component analysis (PCA), exploratory factor analysis (EFA), and confirmatory factor analysis (CFA), treating data on a Likert scale (5-level responses: none, a little, some,..) as a continuous variable. Then, using…

pca factor-analysis ordinal-data binary-data likert

asked May 30 '16 at 15:41

user116948

2 answers

Clustering a binary matrix

I have a semi-small matrix of binary features of dimension 250k x 100. Each row is a user and the columns are binary "tags" of some user behavior e.g. "likes_cats".

user  1 2 3 4 5 ...
-------------------------
A     1 0 1 0 1
B     …

r clustering binary-data

asked Feb 12 '14 at 09:48

wije

7 answers

Why is gender typically coded 0/1 rather than 1/2, for example?

I understand the logic of coding for data analysis. My question below is on the use of a specific code. Is there a reason why gender is often coded as 0 for female and 1 for male? Why is this coding considered 'standard'? Compare this with Female…

data-transformation binary-data categorical-encoding units

asked Oct 07 '11 at 19:46

Adhesh Josh

2 answers

Similarity Coefficients for binary data: Why choose Jaccard over Russell and Rao?

From Encyclopedia of Statistical Sciences I understand that given $p$ dichotomous (binary: 1=present; 0=absent) attributes (variables), we can form a contingency table for any two objects i and j of a sample: j 1 0 ------- …

binary-data similarities association-measure

asked Jun 13 '13 at 21:24

wflynny

3 answers

Visualizing the calibration of predicted probability of a model

Suppose I have a predictive model that produces, for each instance, a probability for each class. Now I recognize that there are many ways to evaluate such a model if I want to use those probabilities for classification (precision, recall, etc.). …

data-visualization classification predictive-models binary-data calibration

asked Mar 29 '12 at 14:52

Michael McGowan

3 answers

Generate random correlated data between a binary and a continuous variable

I want to generate two variables. One is binary outcome variable (say success / failure) and the other is age in years. I want age to be positively correlated with success. For example there should be more successes in the higher age segments than…

correlation random-variable random-generation binary-data

asked Jul 10 '11 at 08:25

user333

2 answers

optimizing auc vs logloss in binary classification problems

I am performing a binary classification task where the outcome probability is fairly low (around 3%). I am trying to decide whether to optimize by AUC or log-loss. As far as I have understood, AUC maximizes the model's ability to discriminate between…

classification binary-data auc log-loss

asked Sep 15 '16 at 07:49

Giorgio Spedicato
