Questions tagged [model-evaluation]

On evaluating models, either in-sample or out-of-sample.

In-sample model evaluation techniques can be based on measures of fit or likelihood, but note that in-sample fit will typically increase spuriously as the model becomes more complex, which is called overfitting. For this reason, in-sample fit is typically penalized based on model complexity, as with adjusted $R^2$, AIC or BIC. AIC and BIC are examples of information criteria, which can also be used in-sample.
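For reference, the usual forms of these penalized criteria are (writing $\hat L$ for the maximized likelihood, $k$ for the number of estimated parameters, $n$ for the sample size, and $p$ for the number of regressors):

$$\mathrm{AIC} = 2k - 2\ln\hat L, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat L, \qquad R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}.$$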

Out-of-sample model evaluation usually relies on predictive accuracy, again measured with suitable accuracy or error metrics on held-out data. Distributional predictions can be evaluated using proper scoring rules.
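As a minimal sketch of that out-of-sample workflow (the simulated dataset and the logistic-regression model are placeholders, not a recommendation): hold out a test set, then compare a hard-label metric (accuracy) with a proper scoring rule (the Brier score) on it.

```python
# Minimal sketch: out-of-sample evaluation on a held-out test set.
# The simulated data and the logistic model are placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, brier_score_loss

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hard-label accuracy (0.5 threshold) vs. Brier score on predicted probabilities.
acc = accuracy_score(y_test, model.predict(X_test))
brier = brier_score_loss(y_test, model.predict_proba(X_test)[:, 1])
print(f"accuracy={acc:.3f}  Brier score={brier:.3f}")
```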

922 questions
190
votes
10 answers

Why is accuracy not the best measure for assessing classification models?

This is a general question that was asked indirectly multiple times on here, but it lacks a single authoritative answer. It would be great to have a detailed answer to this for reference. Accuracy, the proportion of correct classifications among…
Tim
  • 108,699
  • 20
  • 212
  • 390
55
votes
7 answers

Best PCA algorithm for huge number of features (>10K)?

I previously asked this on StackOverflow, but it seems like it might be more appropriate here, given that it didn't get any answers on SO. It's kind of at the intersection between statistics and programming. I need to write some code to do PCA…
dsimcha
  • 7,375
  • 7
  • 32
  • 29
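One commonly used approach for data with many more features than samples (a sketch only, not necessarily what the answers to this question recommend) is randomized PCA, which estimates just the top components without forming the full covariance matrix; the shapes below are placeholders.

```python
# Sketch: randomized PCA on a wide matrix (placeholder random data).
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(500, 10_000))  # n_samples x n_features

pca = PCA(n_components=50, svd_solver="randomized", random_state=0)
scores = pca.fit_transform(X)          # (500, 50) projection onto top components
print(scores.shape, pca.explained_variance_ratio_[:3])
```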
55
votes
3 answers

How to select a clustering method? How to validate a cluster solution (to warrant the method choice)?

One of the biggest issues with cluster analysis is that we may end up drawing different conclusions depending on the clustering method used (including different linkage methods in hierarchical clustering). I would like to know your…
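One common internal-validation approach (a sketch assuming toy blob data and an arbitrary choice of methods and number of clusters, not a general recommendation) is to compare candidate clusterings with an index such as the silhouette score:

```python
# Sketch: compare a few clusterings of placeholder data via silhouette score.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)  # toy data

candidates = [
    ("k-means", KMeans(n_clusters=4, n_init=10, random_state=0)),
    ("ward linkage", AgglomerativeClustering(n_clusters=4, linkage="ward")),
    ("average linkage", AgglomerativeClustering(n_clusters=4, linkage="average")),
]
for name, model in candidates:
    labels = model.fit_predict(X)
    print(name, silhouette_score(X, labels))
```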
47
votes
5 answers

Optimized implementations of the Random Forest algorithm

I have noticed that there are a few implementations of random forest such as ALGLIB, Waffles and some R packages like randomForest. Can anybody tell me whether these libraries are highly optimized? Are they basically equivalent to the random…
Henry B.
  • 1,479
  • 1
  • 14
  • 19
36
votes
1 answer

Cross-validation misuse (reporting performance for the best hyperparameter value)

Recently I have come across a paper that proposes using a k-NN classifier on a specific dataset. The authors used all the data samples available to perform k-fold cross-validation for different k values and report the cross-validation results of the…
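The standard remedy for this optimistic bias is nested cross-validation: tune the hyperparameter in an inner loop and estimate the performance of the whole tuning procedure in an outer loop. A minimal sketch, assuming a built-in scikit-learn dataset and an arbitrary grid of neighbour counts:

```python
# Sketch: nested cross-validation (inner loop tunes k, outer loop estimates
# the performance of the tuned procedure). Dataset and grid are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

inner = GridSearchCV(KNeighborsClassifier(),
                     param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
                     cv=5)                         # inner loop: pick k
outer_scores = cross_val_score(inner, X, y, cv=5)  # outer loop: honest estimate
print(outer_scores.mean())
```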
35
votes
3 answers

Classification/evaluation metrics for highly imbalanced data

I deal with a fraud detection (credit-scoring-like) problem. As such there is a highly imbalanced relation between fraudulent and non-fraudulent observations. http://blog.revolutionanalytics.com/2016/03/com_class_eval_metrics_r.html provides a great…
30
votes
3 answers

Can AUC-ROC be between 0-0.5?

Can AUC-ROC values be between 0 and 0.5? Does the model ever output values between 0 and 0.5?
Aman
  • 533
  • 1
  • 6
  • 10
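A quick numeric illustration (with made-up labels and scores, not a substitute for the answers): AUC falls below 0.5 whenever the scores rank the classes the wrong way round, and flipping the scores reflects the AUC about 0.5.

```python
# Illustration: anti-correlated scores give AUC below 0.5 (made-up data).
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.9, 0.8, 0.7, 0.2, 0.3, 0.4])  # rank the classes backwards

print(roc_auc_score(y, scores))      # 0.0 -- worse than random guessing
print(roc_auc_score(y, 1 - scores))  # 1.0 -- same ranking, reversed
```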
27
votes
3 answers

Evaluating logistic regression and interpretation of Hosmer-Lemeshow Goodness of Fit

As we all know, there are two ways to evaluate a logistic regression model, and they test very different things. Predictive power: get a statistic that measures how well you can predict the dependent variable based on the independent…
22
votes
2 answers

Proper scoring rule when there is a decision to make (e.g. spam vs ham email)

Among others on here, Frank Harrell is adamant about using proper scoring rules to assess classifiers. This makes sense. If we have 500 $0$s with $P(1)\in[0.45, 0.49]$ and 500 $1$s with $P(1)\in[0.51, 0.55]$, we can get a perfect classifier by…
Dave
  • 28,473
  • 4
  • 52
  • 104
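A minimal numeric sketch of the scenario in the excerpt above (the probability values are simulated as described there): thresholding at 0.5 gives perfect accuracy, yet a proper scoring rule such as the Brier score still reveals that the probabilities are barely informative.

```python
# Sketch of the 500/500 example: perfect thresholded accuracy, poor Brier score.
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss

rng = np.random.default_rng(0)
p0 = rng.uniform(0.45, 0.49, 500)   # predicted P(1) for the 500 true 0s
p1 = rng.uniform(0.51, 0.55, 500)   # predicted P(1) for the 500 true 1s
y = np.r_[np.zeros(500, dtype=int), np.ones(500, dtype=int)]
p = np.r_[p0, p1]

print(accuracy_score(y, (p > 0.5).astype(int)))  # 1.0: a "perfect" classifier
print(brier_score_loss(y, p))                    # ~0.22, near the 0.25 of always saying 0.5
```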
20
votes
3 answers

AUC and class imbalance in training/test dataset

I have just started to learn about the area under the ROC curve (AUC). I am told that AUC is not affected by data imbalance. I think this means that AUC is insensitive to imbalance in the test data, rather than imbalance in the training data. In other words, only…
Munichong
  • 1,645
  • 3
  • 15
  • 26
17
votes
2 answers

Why use Normalized Gini Score instead of AUC as evaluation?

Kaggle's competition Porto Seguro's Safe Driver Prediction uses Normalized Gini Score as evaluation metric and this got me curious about the reasons for this choice. What are the advantages of using normalized gini score instead of the most usual…
xboard
  • 1,008
  • 11
  • 17
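A brief numeric check of the relationship usually cited here (assuming binary labels and tie-free predictions; the `gini` helper below is an illustrative re-implementation, not the competition's official code): the normalized Gini score is simply $2 \cdot \mathrm{AUC} - 1$, i.e. a rescaling of AUC.

```python
# Check: competition-style normalized Gini vs. 2*AUC - 1 on made-up data.
import numpy as np
from sklearn.metrics import roc_auc_score

def gini(actual, pred):
    # Sort by descending prediction, then accumulate the actual outcomes.
    order = np.argsort(-pred, kind="mergesort")
    a = np.asarray(actual, dtype=float)[order]
    cum = np.cumsum(a) / a.sum()
    return cum.sum() / len(a) - (len(a) + 1) / (2 * len(a))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
scores = y + rng.normal(scale=2.0, size=1000)      # noisy, tie-free scores

print(gini(y, scores) / gini(y, y.astype(float)))  # normalized Gini
print(2 * roc_auc_score(y, scores) - 1)            # essentially the same number
```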
16
votes
2 answers

What is the difference between $R^2$ and variance score in Scikit-learn?

I was reading about regression metrics in the Python scikit-learn manual, and even though each one of them has its own formula, I cannot tell intuitively what the difference is between $R^2$ and the variance score, and therefore when to use one or the other…
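A small numeric illustration of the practical difference (with made-up numbers): scikit-learn's explained variance score looks only at the variance of the residuals, so a constant bias in the predictions is invisible to it, whereas $R^2$ penalizes that bias.

```python
# Illustration: a constant prediction bias lowers R^2 but not explained variance.
import numpy as np
from sklearn.metrics import r2_score, explained_variance_score

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = y_true + 1.0                     # systematically off by a constant

print(r2_score(y_true, y_pred))                  # 0.5 -- the bias hurts R^2
print(explained_variance_score(y_true, y_pred))  # 1.0 -- the bias is ignored
```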
16
votes
4 answers

Why isn't the holdout method (splitting data into training and testing) used in classical statistics?

In my classroom exposure to data mining, the holdout method was introduced as a way of assessing model performance. However, when I took my first class on linear models, this was not introduced as a means of model validation or assessment. My online…
15
votes
3 answers

Relationship between the phi, Matthews and Pearson correlation coefficients

Are the phi and Matthews correlation coefficients the same concept? How are they related or equivalent to the Pearson correlation coefficient for two binary variables? I assume the binary values are 0 and 1. The Pearson correlation between two…
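A quick numeric check of that equivalence (on arbitrary simulated 0/1 data): for two binary variables coded as 0 and 1, the Matthews correlation coefficient is the same number as the phi coefficient, i.e. the Pearson correlation computed on the 0/1 values.

```python
# Check: MCC equals the Pearson correlation of two 0/1 variables (made-up data).
import numpy as np
from sklearn.metrics import matthews_corrcoef

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)
y_pred = np.where(rng.random(200) < 0.8, y_true, 1 - y_true)  # noisy copy of y_true

print(matthews_corrcoef(y_true, y_pred))
print(np.corrcoef(y_true, y_pred)[0, 1])   # same value up to floating-point error
```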
15
votes
1 answer

How to Compute the Brier Score for more than Two Classes

tl;dr How do I correctly compute the Brier score for more than two classes? I got confusing results with different approaches. Details below. As suggested to me in a comment to this question, I would like to evaluate the quality of a set of…
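A minimal sketch of the most common convention (Brier's original multi-class definition: average over cases of the summed squared differences between the one-hot outcome and the forecast probabilities). Note that for two classes this gives twice the single-probability version used by, e.g., scikit-learn's `brier_score_loss`, which is one source of confusing discrepancies between approaches. The helper name and the example forecasts below are made up.

```python
# Sketch: multi-class Brier score, following Brier's original definition.
import numpy as np

def multiclass_brier(y_true, prob):
    """Mean over cases of sum_k (p_k - o_k)^2, with o the one-hot outcome."""
    prob = np.asarray(prob, dtype=float)
    onehot = np.zeros_like(prob)
    onehot[np.arange(len(y_true)), np.asarray(y_true)] = 1.0
    return np.mean(np.sum((prob - onehot) ** 2, axis=1))

# Hypothetical forecasts for a 3-class problem.
y_true = [0, 1, 2, 1]
prob = [[0.7, 0.2, 0.1],
        [0.2, 0.5, 0.3],
        [0.1, 0.3, 0.6],
        [0.6, 0.3, 0.1]]
print(multiclass_brier(y_true, prob))
```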