Most Popular
1500 questions
69
votes
8 answers
What is a good, convincing example in which p-values are useful?
My question in the title is self-explanatory, but I would like to give it some context.
The ASA released a statement earlier this week “on p-values: context, process, and purpose”, outlining various common misconceptions of the p-value, and urging…

Tal Galili
- 19,935
- 32
- 133
- 195
69
votes
5 answers
Why bother with the dual problem when fitting SVM?
Given the data points $x_1, \ldots, x_n \in \mathbb{R}^d$ and labels $y_1, \ldots, y_n \in \left \{-1, 1 \right\}$, the hard margin SVM primal problem is
$$ \text{minimize}_{w, w_0} \quad \frac{1}{2} w^T w $$
$$ \text{s.t.} \quad \forall i: y_i…

blubb
- 2,458
- 2
- 19
- 28
69
votes
5 answers
Why is ANOVA equivalent to linear regression?
I read that ANOVA and linear regression are the same thing. How can that be, considering that the output of ANOVA is some $F$ value and some $p$-value based on which you conclude whether the sample means across the different samples are the same or…

Victor
- 5,925
- 13
- 43
- 67
69
votes
2 answers
How can an artificial neural network (ANN) be used for unsupervised clustering?
I understand how an artificial neural network (ANN) can be trained in a supervised manner using backpropagation to improve the fit by decreasing the error in the predictions. I have heard that an ANN can be used for unsupervised learning but…

Vass
- 1,425
- 2
- 14
- 20
69
votes
4 answers
Assumptions regarding bootstrap estimates of uncertainty
I appreciate the usefulness of the bootstrap in obtaining uncertainty estimates, but one thing that's always bothered me about it is that the distribution corresponding to those estimates is the distribution defined by the sample. In general, it…

user4733
- 2,494
- 2
- 20
- 31
68
votes
1 answer
KL divergence between two multivariate Gaussians
I'm having trouble deriving the KL divergence formula assuming two multivariate normal distributions. I've done the univariate case fairly easily. However, it's been quite a while since I took math stats, so I'm having some trouble extending it to…

dmartin
- 3,010
- 3
- 22
- 27
68
votes
4 answers
Look and you shall find (a correlation)
I have several hundred measurements. Now, I am considering utilizing some kind of software to correlate every measure with every measure. This means that there are thousands of correlations. Among these there should (statistically) be a high…

David
- 855
- 1
- 8
- 7
68
votes
18 answers
Statistics interview questions
I am looking for some statistics (and probability, I guess) interview questions, from the most basic through the more advanced. Answers are not necessary (although links to specific questions on this site would do well).

shabbychef
- 10,388
- 7
- 50
- 93
68
votes
8 answers
Which pseudo-$R^2$ measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)?
I have SPSS output for a logistic regression model. The output reports two measures for the model fit, Cox & Snell and Nagelkerke.
So as a rule of thumb, which of these $R^2$ measures would you report as the model fit?
Or, which of these fit indices…

Henrik
- 13,314
- 9
- 63
- 123
68
votes
8 answers
How to simulate data that satisfy specific constraints such as having specific mean and standard deviation?
This question is motivated by my question on meta-analysis. But I imagine that it would also be useful in teaching contexts where you want to create a dataset that exactly mirrors an existing published dataset.
I know how to generate random data…

Jeromy Anglim
- 42,044
- 23
- 146
- 250
68
votes
4 answers
Why is it that natural log changes are percentage changes? What is it about logs that makes this so?
Can somebody explain how the properties of logs make it possible to fit log-linear regressions where the coefficients are interpreted as percentage changes?

thewhitetie
- 837
- 1
- 7
- 6
68
votes
9 answers
How can I help ensure testing data does not leak into training data?
Suppose we have someone building a predictive model, but that someone is not necessarily well-versed in proper statistical or machine learning principles. Maybe we are helping that person as they are learning, or maybe that person is using some…

Michael McGowan
- 4,561
- 3
- 31
- 46
68
votes
4 answers
Why is sample standard deviation a biased estimator of $\sigma$?
According to the Wikipedia article on unbiased estimation of standard deviation, the sample SD
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2}$$
is a biased estimator of the SD of the population. It states that $E(\sqrt{s^2}) \neq…

Dav Weps
- 689
- 1
- 6
- 3
67
votes
6 answers
Standard errors for lasso prediction using R
I'm trying to use a LASSO model for prediction, and I need to estimate standard errors. Surely someone has already written a package to do this. But as far as I can see, none of the packages on CRAN that do predictions using a LASSO will return…

Rob Hyndman
- 51,928
- 23
- 126
- 178
67
votes
1 answer
Variance of product of multiple independent random variables
We know the answer for two independent variables:
$$ {\rm Var}(XY) = E(X^2Y^2) - (E(XY))^2 = {\rm Var}(X){\rm Var}(Y) + {\rm Var}(X)(E(Y))^2 + {\rm Var}(Y)(E(X))^2 $$
However, if we take the product of more than two variables, ${\rm Var}(X_1X_2 \cdots…

damla
- 791
- 1
- 7
- 5