Questions tagged [predictive-models]

Predictive models are statistical models whose primary purpose is to predict other observations of a system optimally, as opposed to models whose purpose is to test a particular hypothesis or explain a phenomenon mechanistically. As such, predictive models place less emphasis on interpretability and more emphasis on performance.

Wikipedia has articles on predictive modelling (https://en.wikipedia.org/wiki/Predictive_modelling) and predictive analytics (https://en.wikipedia.org/wiki/Predictive_analytics) with further references.

2756 questions
130 votes · 4 answers

Differences between cross validation and bootstrapping to estimate the prediction error

I would like your thoughts on the differences between cross-validation and bootstrapping for estimating the prediction error. Does one work better for small datasets or for large ones?
grant
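
As a rough illustration of the two approaches being compared, here is a minimal sketch on synthetic data (plain out-of-bag bootstrap rather than the .632 variants; the data, model, and number of replicates are assumptions, not from the question):

```python
# Minimal sketch: K-fold CV and a simple bootstrap estimate of prediction MSE
# on the same synthetic data; all settings below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(size=200)

# K-fold cross-validation estimate of the prediction MSE.
cv_mse = -cross_val_score(LinearRegression(), X, y,
                          scoring="neg_mean_squared_error", cv=10).mean()

# Simple bootstrap estimate: fit on a resample, evaluate on the
# observations left out of that resample ("out-of-bag" error).
boot_mse = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))       # bootstrap sample indices
    oob = np.setdiff1d(np.arange(len(y)), idx)  # left-out observations
    model = LinearRegression().fit(X[idx], y[idx])
    boot_mse.append(np.mean((y[oob] - model.predict(X[oob])) ** 2))

print(f"10-fold CV MSE:    {cv_mse:.3f}")
print(f"bootstrap OOB MSE: {np.mean(boot_mse):.3f}")
```
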
119 votes · 6 answers

Difference between confidence intervals and prediction intervals

For a prediction interval in linear regression you still use $\hat{E}[Y \mid x_0] = \hat{\beta}_0+\hat{\beta}_1 x_0$ to generate the interval. You also use this to generate a confidence interval of $E[Y \mid x_0]$. What's the difference between the two?
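
For simple linear regression, the standard textbook intervals differ only by an extra variance term; a sketch of the two formulas (with $s$ the residual standard error and $\bar{x}$ the mean of the training $x_i$):

$$\text{CI for } E[Y \mid x_0]:\quad \hat{\beta}_0+\hat{\beta}_1 x_0 \;\pm\; t_{n-2,\,1-\alpha/2}\, s\sqrt{\tfrac{1}{n}+\tfrac{(x_0-\bar{x})^2}{\sum_i (x_i-\bar{x})^2}}$$

$$\text{PI for a new } Y \mid x_0:\quad \hat{\beta}_0+\hat{\beta}_1 x_0 \;\pm\; t_{n-2,\,1-\alpha/2}\, s\sqrt{1+\tfrac{1}{n}+\tfrac{(x_0-\bar{x})^2}{\sum_i (x_i-\bar{x})^2}}$$

The extra "1" under the square root accounts for the irreducible noise in a single new observation, which is why the prediction interval is always wider.
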
107 votes · 15 answers

US Election results 2016: What went wrong with prediction models?

First it was Brexit, now the US election. Many model predictions were off by a wide margin; are there lessons to be learned here? As late as 4 pm PST yesterday, the betting markets were still favoring Hillary 4 to 1. I take it that the betting…
horaceT
88 votes · 8 answers

When is unbalanced data really a problem in Machine Learning?

We have already had multiple questions about unbalanced data when using logistic regression, SVMs, decision trees, and bagging, as well as a number of other similar questions, which makes it a very popular topic! Unfortunately, each of the questions seems to be…
Tim
74 votes · 16 answers

Practical thoughts on explanatory vs. predictive modeling

Back in April, I attended a talk at the UMD Math Department Statistics group seminar series called "To Explain or To Predict?". The talk was given by Prof. Galit Shmueli, who teaches at UMD's Smith Business School. Her talk was based on research she…
wahalulu
68 votes · 9 answers

How can I help ensure testing data does not leak into training data?

Suppose we have someone building a predictive model, but that someone is not necessarily well-versed in proper statistical or machine learning principles. Maybe we are helping that person as they are learning, or maybe that person is using some…
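
A common way to reduce this risk in practice is to bundle every data-dependent preprocessing step into a pipeline so that it is re-fit inside each training fold; a minimal scikit-learn sketch (the data and model choices are placeholders, not from the question):

```python
# Minimal sketch: keep preprocessing inside the cross-validation loop so the
# held-out folds never influence the scaler the model is trained with.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + rng.normal(size=300) > 0).astype(int)

# Leaky pattern: scaler fit on all rows, including future test folds.
# X_scaled = StandardScaler().fit_transform(X)

# Safer pattern: the scaler is re-fit on the training part of every fold.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```
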
67 votes · 3 answers

Variables are often adjusted (e.g. standardised) before making a model - when is this a good idea, and when is it a bad one?

In what circumstances would you want to, or not want to, scale or standardize a variable prior to model fitting? And what are the advantages/disadvantages of scaling a variable?
60 votes · 5 answers

Is adjusting p-values in a multiple regression for multiple comparisons a good idea?

Let's assume you are a social science researcher/econometrician trying to find relevant predictors of demand for a service. You have 2 outcome/dependent variables describing the demand (using the service yes/no, and the number of occasions). You have…
57 votes · 6 answers

Alternatives to logistic regression in R

I would like as many algorithms as possible that perform the same task as logistic regression, that is, algorithms/models that can predict a binary response (Y) from some explanatory variables (X). I would be glad if, after you name the algorithm,…
Tal Galili
50 votes · 3 answers

What is the root cause of the class imbalance problem?

I've been thinking a lot about the "class imbalance problem" in machine/statistical learning lately, and I am being drawn ever deeper into a feeling that I just don't understand what is going on. First let me define (or attempt to define) my terms: The…
45 votes · 3 answers

Whether to rescale indicator/binary/dummy predictors for LASSO

For the LASSO (and other model-selection procedures) it is crucial to rescale the predictors. The general recommendation I follow is simply to use a 0 mean, 1 standard deviation normalization for continuous variables. But what is there to do with…
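
One common practical arrangement (a sketch under assumptions, not the only possible answer) is to standardize the continuous columns while passing the 0/1 dummies through unchanged, for example:

```python
# Minimal sketch: standardize continuous predictors only, leave dummies as 0/1,
# then fit a LASSO; column names and the penalty strength are assumptions.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 10, 500),
    "income": rng.normal(50_000, 15_000, 500),
    "is_member": rng.integers(0, 2, 500),   # binary dummy
})
y = 0.02 * df["age"] + 1.5 * df["is_member"] + rng.normal(size=500)

pre = ColumnTransformer(
    [("scale", StandardScaler(), ["age", "income"])],
    remainder="passthrough",                # dummies pass through untouched
)
model = make_pipeline(pre, Lasso(alpha=0.05)).fit(df, y)
print(model.named_steps["lasso"].coef_)
```
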
43 votes · 1 answer

Manually calculated $R^2$ doesn't match up with randomForest() $R^2$ for testing new data

I know this is a fairly specific R question, but I may be thinking about the proportion of variance explained, $R^2$, incorrectly. Here goes. I'm trying to use the R package randomForest. I have some training data and testing data. When I fit a random…
Stephen Turner
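
One frequent source of this kind of mismatch (a general reminder, not necessarily the asker's exact issue) is that "proportion of variance explained" has two common definitions that agree for OLS on the training data but not for other models or for held-out data:

$$R^2_{\text{SS}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\qquad\text{vs.}\qquad
R^2_{\text{corr}} = \operatorname{corr}(y, \hat{y})^2$$

In addition, the "% Var explained" printed by randomForest() is, as far as I know, computed from out-of-bag predictions on the training data, which is a third quantity again.
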
43 votes · 2 answers

Mean absolute percentage error (MAPE) in Scikit-learn

How can we calculate the mean absolute percentage error (MAPE) of our predictions using Python and scikit-learn? From the docs, we have only these 4 metric functions for regression: metrics.explained_variance_score(y_true,…
Nyxynyx
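
MAPE is easy to compute directly with NumPy, and newer scikit-learn versions (0.24+, if I recall correctly) also ship sklearn.metrics.mean_absolute_percentage_error; a minimal sketch with made-up arrays:

```python
# Minimal sketch: MAPE by hand with NumPy; y_true / y_pred are made-up arrays.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

# Definition: mean absolute relative error, usually reported as a percentage.
# Note that MAPE is undefined whenever y_true contains zeros.
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
print(f"MAPE: {mape:.1f}%")

# Newer scikit-learn versions (0.24+) provide the same quantity as a fraction:
# from sklearn.metrics import mean_absolute_percentage_error
# mean_absolute_percentage_error(y_true, y_pred)   # == mape / 100
```
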
43 votes · 3 answers

Variance of $K$-fold cross-validation estimates as $f(K)$: what is the role of "stability"?

TL;DR: It appears that, contrary to oft-repeated advice, leave-one-out cross-validation (LOO-CV) -- that is, $K$-fold CV with $K$ (the number of folds) equal to $N$ (the number of training observations) -- yields estimates of the generalization…
42 votes · 1 answer

When and how to use standardized explanatory variables in linear regression

I have 2 simple questions about linear regression: When is it advised to standardize the explanatory variables? Once estimation is carried out with standardized values, how can one predict with new values (how should one standardize the new…
teucer
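
On the second part of the question, the key point is that any new observation must be standardized with the training data's means and standard deviations, not its own; a minimal sketch (all numbers are placeholders):

```python
# Minimal sketch: standardize with statistics computed on the training data,
# then apply those same statistics to any new observation before predicting.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.normal(loc=10.0, scale=3.0, size=(100, 2))
y_train = 2.0 * X_train[:, 0] - X_train[:, 1] + rng.normal(size=100)

mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)   # training statistics
model = LinearRegression().fit((X_train - mu) / sigma, y_train)

X_new = np.array([[11.0, 9.5]])                          # new observation
y_hat = model.predict((X_new - mu) / sigma)              # reuse mu and sigma
print(y_hat)
```
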