Questions tagged [classification]

Statistical classification is the problem of identifying the sub-population to which new observations belong, where the identity of the sub-population is unknown, on the basis of a training set of data containing observations whose sub-population is known. Therefore these classifications will show a variable behavior which can be studied by statistics.

-- Wikipedia at https://en.wikipedia.org/wiki/Statistical_classification

6303 questions
275 votes
6 answers

What does AUC stand for and what is it?

I have searched high and low but have not been able to find out what AUC, as it relates to prediction, stands for or means.
josh
173 votes
4 answers

Choice of K in K-fold cross-validation

I've been using the $K$-fold cross-validation a few times now to evaluate performance of some learning algorithms, but I've always been puzzled as to how I should choose the value of $K$. I've often seen and used a value of $K = 10$, but this seems…
Charles Menguy
169 votes
4 answers

Cohen's kappa in plain English

I am reading a data mining book and it mentioned the Kappa statistic as a means for evaluating the prediction performance of classifiers. However, I just can't understand this. I also checked Wikipedia but it didn't help either:…
Jack Twain
128 votes
5 answers

How does a Support Vector Machine (SVM) work?

How does a Support Vector Machine (SVM) work, and what differentiates it from other linear classifiers, such as the Linear Perceptron, Linear Discriminant Analysis, or Logistic Regression? * (* I'm thinking in terms of the underlying motivations for…
tdc
120 votes
6 answers

Why are neural networks becoming deeper, but not wider?

In recent years, convolutional neural networks (or perhaps deep neural networks in general) have become deeper and deeper, with state-of-the-art networks going from 7 layers (AlexNet) to 1000 layers (Residual Nets) in the space of 4 years. The…
110 votes
4 answers

How do you calculate precision and recall for multiclass classification using confusion matrix?

I wonder how to compute precision and recall using a confusion matrix for a multi-class classification problem. Specifically, an observation can only be assigned to its most probable class / label. I would like to compute: Precision = TP / (TP+FP)…
daiyue
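
For readers skimming this listing, here is a minimal sketch of the per-class calculation the excerpt above asks about. It is not taken from the thread's answers. It assumes a square confusion matrix in which rows are true classes and columns are predicted classes (some libraries use the transpose), and the function name `per_class_precision_recall` and the example matrix are made up for illustration.

```python
def per_class_precision_recall(cm):
    """Return a list of (precision, recall) tuples, one per class.

    Assumption: cm[i][j] is the number of observations whose true class
    is i and whose predicted class is j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        # Column k holds everything predicted as class k.
        fp = sum(cm[i][k] for i in range(n)) - tp
        # Row k holds everything whose true class is k.
        fn = sum(cm[k]) - tp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        results.append((precision, recall))
    return results


# Hypothetical 3-class confusion matrix (illustrative numbers only).
cm = [
    [5, 1, 0],
    [2, 6, 2],
    [0, 1, 8],
]
per_class = per_class_precision_recall(cm)
for k, (p, r) in enumerate(per_class):
    print(f"class {k}: precision={p:.3f} recall={r:.3f}")
```

Each class is treated one-vs-rest: the diagonal cell is TP, the rest of its column is FP, and the rest of its row is FN. Averaging the per-class values (macro-averaging) is one common way to get a single number, though other averaging schemes exist.
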
109 votes
4 answers

Softmax vs Sigmoid function in Logistic classifier?

What decides the choice of function (Softmax vs Sigmoid) in a Logistic classifier? Suppose there are 4 output classes. Each of the above functions gives the probabilities of each class being the correct output. So which one to take for a…
mach
102 votes
4 answers

Why isn't Logistic Regression called Logistic Classification?

Since Logistic Regression is a statistical classification model dealing with categorical dependent variables, why isn't it called Logistic Classification? Shouldn't the "Regression" name be reserved to models dealing with continuous dependent…
95 votes
5 answers

How to calculate Area Under the Curve (AUC), or the c-statistic, by hand

I am interested in calculating area under the curve (AUC), or the c-statistic, by hand for a binary logistic regression model. For example, in the validation dataset, I have the true value for the dependent variable, retention (1 = retained; 0 = not…
Matt Reichenbach
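
As a rough illustration of the "by hand" calculation the excerpt above asks about, the sketch below uses the pairwise (concordance) interpretation of AUC: the fraction of (positive, negative) pairs in which the positive case receives the higher predicted score, with ties counted as one half. It is not taken from the thread's answers, and the function name `auc_by_pairs` and the example data are assumptions made for illustration.

```python
def auc_by_pairs(y_true, scores):
    """AUC as the share of concordant (positive, negative) pairs.

    y_true: 0/1 labels; scores: predicted scores or probabilities.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical validation data (illustrative only): 1 = retained, 0 = not retained.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = auc_by_pairs(y_true, scores)
print(round(auc, 4))
```

The double loop is quadratic in the number of observations, which is fine for checking a small example by hand; rank-based formulas give the same value more efficiently on large datasets.
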
95 votes
6 answers

What is the difference between Multiclass and Multilabel Problem?

What is the difference between a multiclass problem and a multilabel problem?
Learner
92 votes
8 answers

How to compute precision/recall for multiclass-multilabel classification?

I'm wondering how to calculate precision and recall measures for multiclass multilabel classification, i.e. classification where there are more than two labels, and where each instance can have multiple labels?
Vam
90 votes
6 answers

Feature selection for "final" model when performing cross-validation in machine learning

I am getting a bit confused about feature selection and machine learning and I was wondering if you could help me out. I have a microarray dataset that is classified into two groups and has 1000s of features. My aim is to get a small number of…
89 votes
4 answers

How to produce a pretty plot of the results of k-means cluster analysis?

I'm using R to do K-means clustering. I'm using 14 variables to run K-means. What is a pretty way to plot the results of K-means? Are there any existing implementations? Does having 14 variables complicate plotting the results? I found something…
89 votes
5 answers

How to plot ROC curves in multiclass classification?

In other words, instead of having a two class problem I am dealing with 4 classes and still would like to assess performance using AUC.
CLOCK
88 votes
8 answers

When is unbalanced data really a problem in Machine Learning?

We already had multiple questions about unbalanced data when using logistic regression, SVM, decision trees, bagging and a number of other similar questions, which makes it a very popular topic! Unfortunately, each of the questions seems to be…
Tim