Questions tagged [modeling]

This tag describes the process of creating a statistical or machine learning model. Always add a more specific tag.

A statistical model is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. The model is statistical because the variables are related stochastically rather than deterministically.

2392 questions
265
votes
13 answers

Is there any reason to prefer the AIC or BIC over the other?

The AIC and BIC are both methods of assessing model fit penalized for the number of estimated parameters. As I understand it, BIC penalizes models more for free parameters than does AIC. Beyond a preference based on the stringency of the criteria,…
russellpierce
  • 17,079
  • 16
  • 67
  • 98
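
The claim in the excerpt above, that BIC penalizes free parameters more than AIC does, can be checked with a minimal sketch. It assumes the usual textbook definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The function names and the numbers in the usage lines are hypothetical and do not come from the question.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (textbook definition)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L (textbook definition)."""
    return k * math.log(n) - 2 * log_likelihood


def bic_minus_aic(k: int, n: int) -> float:
    """Extra penalty BIC applies relative to AIC: k * (ln(n) - 2)."""
    return k * (math.log(n) - 2)


# Hypothetical fit: log-likelihood -100, 3 estimated parameters, 100 observations.
print(aic(-100.0, 3))
print(bic(-100.0, 3, 100))
print(bic_minus_aic(3, 100))
```

Under these definitions the per-parameter penalty is 2 for AIC and ln(n) for BIC, so BIC is the stricter criterion whenever ln(n) > 2, that is, for n of 8 or more.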
254
votes
3 answers

How to know that your machine learning problem is hopeless?

Imagine a standard machine-learning scenario: You are confronted with a large multivariate dataset and you have a pretty blurry understanding of it. What you need to do is to make predictions about some variable based on what you have. As…
Tim
  • 108,699
  • 20
  • 212
  • 390
111
votes
5 answers

Using k-fold cross-validation for time-series model selection

Question: I want to be sure of something: is the use of k-fold cross-validation with time series straightforward, or does one need to pay special attention before using it? Background: I'm modeling a time series of 6 years (with semi-markov…
Mickaël S
  • 1,258
  • 3
  • 10
  • 6
101
votes
18 answers

Including the interaction but not the main effects in a model

Is it ever valid to include a two-way interaction in a model without including the main effects? What if your hypothesis is only about the interaction, do you still need to include the main effects?
Glen
  • 6,320
  • 4
  • 37
  • 59
98
votes
3 answers

Can someone explain Gibbs sampling in very simple words?

I'm doing some reading on topic modeling (with Latent Dirichlet Allocation), which makes use of Gibbs sampling. As a newbie in statistics (well, I know things like binomials, multinomials, priors, etc.), I find it difficult to grasp how Gibbs sampling…
Thea
  • 983
  • 1
  • 7
  • 4
88
votes
24 answers

Rules of thumb for "modern" statistics

I like G van Belle's book on Statistical Rules of Thumb, and to a lesser extent Common Errors in Statistics (and How to Avoid Them) from Phillip I Good and James W. Hardin. They address common pitfalls when interpreting results from experimental and…
chl
  • 50,972
  • 18
  • 205
  • 364
87
votes
11 answers

Why should I be Bayesian when my model is wrong?

Edits: I have added a simple example: inference of the mean of the $X_i$. I have also slightly clarified why the credible intervals not matching confidence intervals is bad. I, a fairly devout Bayesian, am in the middle of a crisis of faith of…
Guillaume Dehaene
  • 2,137
  • 1
  • 10
  • 18
85
votes
14 answers

What is the meaning of "All models are wrong, but some are useful"

"Essentially, all models are wrong, but some are useful." --- Box, George E. P.; Norman R. Draper (1987). Empirical Model-Building and Response Surfaces, p. 424, Wiley. ISBN 0471810339. What exactly is the meaning of the above phrase?
gpuguy
  • 1,063
  • 3
  • 10
  • 10
77
votes
6 answers

Variable selection for predictive modeling really needed in 2016?

This question was asked on CV some years ago; it seems worth a repost in light of 1) order-of-magnitude better computing technology (e.g. parallel computing, HPC, etc.) and 2) newer techniques, e.g. [3]. First, some context. Let's assume the goal…
76
votes
4 answers

Why does including latitude and longitude in a GAM account for spatial autocorrelation?

I have produced generalized additive models for deforestation. To account for spatial-autocorrelation, I have included latitude and longitude as a smoothed, interaction term (i.e. s(x,y)). I've based this on reading many papers where the authors say…
gisol
  • 943
  • 1
  • 8
  • 10
74
votes
6 answers

Model for predicting number of Youtube views of Gangnam Style

PSY's music video "Gangnam Style" is popular; after a little more than 2 months it has about 540 million viewers. I learned this from my preteen children at dinner last week, and soon the discussion turned to whether it was possible to do…
FredrikD
  • 843
  • 7
  • 15
72
votes
7 answers

Do all interaction terms need their individual terms in a regression model?

I am actually reviewing a manuscript where the authors compare 5-6 logit regression models with AIC. However, some of the models have interaction terms without including the individual covariate terms. Does it ever make sense to do this? For example…
djhocking
  • 1,701
  • 3
  • 17
  • 21
69
votes
7 answers

What is a "saturated" model?

What is meant when we say we have a saturated model?
Graham Cookson
  • 7,543
  • 6
  • 41
  • 35
67
votes
3 answers

Variables are often adjusted (e.g. standardised) before making a model - when is this a good idea, and when is it a bad one?

In what circumstances would you want to, or not want to scale or standardize a variable prior to model fitting? And what are the advantages / disadvantages of scaling a variable?
64
votes
4 answers

What is so cool about de Finetti's representation theorem?

From Theory of Statistics by Mark J. Schervish (page 12): "Although DeFinetti's representation theorem 1.49 is central to motivating parametric models, it is not actually used in their implementation." How is the theorem central to parametric…
gui11aume
  • 13,383
  • 2
  • 44
  • 89