Most Popular

1500 questions
41
votes
4 answers

What is an instrumental variable?

Instrumental variables are becoming increasingly common in applied economics and statistics. For the uninitiated, can we have some non-technical answers to the following questions: What is an instrumental variable? When would one want to employ an…
Graham Cookson
  • 7,543
  • 6
  • 41
  • 35
41
votes
9 answers

How can I efficiently model the sum of Bernoulli random variables?

I am modeling a random variable ($Y$) which is the sum of some ~15-40k independent Bernoulli random variables ($X_i$), each with a different success probability ($p_i$). Formally, $Y=\sum X_i$ where $\Pr(X_i=1)=p_i$ and $\Pr(X_i=0)=1-p_i$. I am…
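A minimal sketch of the setup in the excerpt above, assuming only what it states: $Y=\sum X_i$ with independent $X_i$ and $\Pr(X_i=1)=p_i$. The function names and the three example probabilities are hypothetical, chosen for illustration. The exact distribution is built by repeated convolution, and the mean and variance follow from independence as $\sum p_i$ and $\sum p_i(1-p_i)$.

```python
def poisson_binomial_pmf(probs):
    """Exact PMF of Y = sum of independent Bernoulli(p_i), by repeated convolution."""
    pmf = [1.0]  # empty sum: Y = 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this X_i is 0, the count stays at k
            nxt[k + 1] += mass * p      # this X_i is 1, the count moves to k + 1
        pmf = nxt
    return pmf


def mean_and_variance(probs):
    """Moments of Y, using independence of the X_i."""
    mean = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    return mean, var


# Hypothetical success probabilities, for illustration only.
probs = [0.1, 0.5, 0.9]
pmf = poisson_binomial_pmf(probs)
mean, var = mean_and_variance(probs)
```

The convolution costs on the order of $n^2$ operations, so for the tens of thousands of terms mentioned in the question this pure-Python loop would likely be slow; a normal approximation based on the two moments above may be a more practical starting point.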
41
votes
5 answers

K-fold vs. Monte Carlo cross-validation

I am trying to learn various cross-validation methods, primarily with the intention of applying them to supervised multivariate analysis techniques. Two I have come across are K-fold and Monte Carlo cross-validation techniques. I have read that K-fold is a…
Liam
  • 553
  • 1
  • 5
  • 6
41
votes
2 answers

Estimate quantile of value in a vector

I have a set of real numbers. I need to estimate the quantile of a new number. Is there any clean way to do this in R, or in general? I hope this is not ultra-trivial ;-) Much appreciated for your response. PK
polarise
  • 543
  • 1
  • 4
  • 7
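One common reading of the quantile question above is the empirical cumulative distribution function: the fraction of the existing values that are less than or equal to the new number. The sketch below assumes that reading; the function name `empirical_quantile` is hypothetical and the sample values are made up for illustration.

```python
from bisect import bisect_right


def empirical_quantile(values, x):
    """Fraction of `values` that are <= x (the empirical CDF evaluated at x)."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # bisect_right counts how many sorted entries are <= x.
    return bisect_right(ordered, x) / len(ordered)


# Made-up sample, for illustration only.
sample = [1.0, 2.0, 3.0, 4.0]
q = empirical_quantile(sample, 2.5)
```

In R, the built-in `ecdf` function gives the same kind of estimate, for example `ecdf(v)(x)`.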
41
votes
3 answers

Interpretation of p-value in hypothesis testing

I recently came across the paper "The Insignificance of Null Hypothesis Significance Testing", Jeff Gill (1999). The author raised a few common misconceptions regarding hypothesis testing and p-values, about which I have two specific questions: The…
user13587
41
votes
7 answers

Relationship between Binomial and Beta distributions

I'm more of a programmer than a statistician, so I hope this question isn't too naive. It happens in sampling program executions at random times. If I take N=10 random-time samples of the program's state, I could see function Foo being executed on,…
41
votes
1 answer

Quantile regression: Which standard errors?

The summary.rq function from the quantreg vignette provides a multitude of choices for standard error estimates of quantile regression coefficients. What are the special scenarios where each of these becomes optimal/desirable? "rank" which produces…
Jase
  • 1,904
  • 3
  • 20
  • 33
41
votes
1 answer

Intuition behind tensor product interactions in GAMs (MGCV package in R)

Generalized additive models are those where $$ y_i = \alpha + f_1(x_{1i}) + f_2(x_{2i}) + e_i $$ for example. The functions are smooth and are to be estimated, usually by penalized splines. MGCV is a package in R that does so, and the author (Simon Wood)…
generic_user
  • 11,981
  • 8
  • 40
  • 63
41
votes
11 answers

Are there any good popular science books about statistics or machine learning?

There are a bunch of really good popular science books around that deal with real science, as well as the history and reasons behind current theories, while remaining extremely enjoyable to read. For example, "Chaos" by James Gleick (chaos, fractals,…
naught101
  • 4,973
  • 1
  • 51
  • 85
41
votes
2 answers

Evidence for man-made global warming hits 'gold standard': how did they do this?

This message in a Reuters article from 25.02.2019 is currently all over the news: Evidence for man-made global warming hits 'gold standard' [Scientists] said confidence that human activities were raising the heat at the Earth’s surface had reached…
Sextus Empiricus
  • 43,080
  • 1
  • 72
  • 161
41
votes
3 answers

How to present results of a Lasso using glmnet?

I would like to find predictors for a continuous dependent variable out of a set of 30 independent variables. I am using Lasso regression as implemented in the glmnet package in R. Here is some dummy code: # generate a dummy dataset with 30…
jokel
  • 2,403
  • 4
  • 32
  • 40
41
votes
3 answers

Why is t-SNE not used as a dimensionality reduction technique for clustering or classification?

In a recent assignment, we were told to use PCA on the MNIST digits to reduce the dimensions from 64 (8 x 8 images) to 2. We then had to cluster the digits using a Gaussian Mixture Model. PCA using only 2 principal components does not yield distinct…
willk
  • 583
  • 1
  • 7
  • 12
41
votes
1 answer

How to determine significant principal components using bootstrapping or Monte Carlo approach?

I am interested in determining the number of significant patterns coming out of a Principal Component Analysis (PCA) or Empirical Orthogonal Function (EOF) Analysis. I am particularly interested in applying this method to climate data. The data…
Marc in the box
  • 3,532
  • 3
  • 33
  • 47
41
votes
1 answer

Existence of the moment generating function and variance

Can a distribution with finite mean and infinite variance have a moment generating function? What about a distribution with finite mean and finite variance but infinite higher moments?
Mgf
  • 411
  • 1
  • 5
  • 3
41
votes
1 answer

What is the difference between "coefficient of determination" and "mean squared error"?

For regression problems, I have seen people use the "coefficient of determination" (a.k.a. R squared) to perform model selection, e.g., finding the appropriate penalty coefficient for regularization. However, it is also common to use "mean squared…
dolaameng
  • 513
  • 1
  • 5
  • 5