Highest Voted Questions - Statistical Analysis Stack Exchange

69

votes

8 answers

What is a good, convincing example in which p-values are useful?

My question in the title is self-explanatory, but I would like to give it some context. The ASA released a statement earlier this week “on p-values: context, process, and purpose”, outlining various common misconceptions of the p-value, and urging…

hypothesis-testing bayesian p-value inference frequentist

asked Mar 11 '16 at 11:44

Tal Galili

19,935
32
133
195

69

votes

5 answers

Why bother with the dual problem when fitting SVM?

Given the data points $x_1, \ldots, x_n \in \mathbb{R}^d$ and labels $y_1, \ldots, y_n \in \left \{-1, 1 \right\}$, the hard margin SVM primal problem is $$ \text{minimize}_{w, w_0} \quad \frac{1}{2} w^T w $$ $$ \text{s.t.} \quad \forall i: y_i…

svm

asked Nov 30 '11 at 19:48

blubb

2,458
2
19
28

69

votes

5 answers

Why is ANOVA equivalent to linear regression?

I read that ANOVA and linear regression are the same thing. How can that be, considering that the output of ANOVA is some $F$ value and some $p$-value based on which you conclude if the sample means across the different samples are the same or…

regression anova

asked Oct 02 '15 at 18:40

Victor

5,925
13
43
67

69

votes

2 answers

How can an artificial neural network (ANN) be used for unsupervised clustering?

I understand how an artificial neural network (ANN) can be trained in a supervised manner using backpropagation to improve the fitting by decreasing the error in the predictions. I have heard that an ANN can be used for unsupervised learning but…

clustering neural-networks unsupervised-learning self-organizing-maps

asked Mar 03 '15 at 16:21

Vass

1,425
2
14
20

69

votes

4 answers

Assumptions regarding bootstrap estimates of uncertainty

I appreciate the usefulness of the bootstrap in obtaining uncertainty estimates, but one thing that's always bothered me about it is that the distribution corresponding to those estimates is the distribution defined by the sample. In general, it…

bootstrap uncertainty

asked May 24 '11 at 19:53

user4733

2,494
2
20
31

68

votes

1 answer

KL divergence between two multivariate Gaussians

I'm having trouble deriving the KL divergence formula assuming two multivariate normal distributions. I've done the univariate case fairly easily. However, it's been quite a while since I took math stats, so I'm having some trouble extending it to…

mathematical-statistics normal-distribution multivariate-normal-distribution kullback-leibler

asked Jun 02 '13 at 20:50

dmartin

3,010
3
22
27

68

votes

4 answers

Look and you shall find (a correlation)

I have several hundred measurements. Now, I am considering utilizing some kind of software to correlate every measure with every measure. This means that there are thousands of correlations. Among these there should (statistically) be a high…

correlation multiple-comparisons permutation-test

asked Dec 25 '10 at 22:16

David

855
1
8
7
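
For the "Look and you shall find (a correlation)" question above, a small simulation may make the concern concrete. This is only a sketch under stated assumptions: the variables are independent Gaussian noise, the sizes `n_vars` and `n_obs` are arbitrary illustrative choices, and `threshold` is roughly the two-sided 5% critical value of Pearson's r for 20 observations. With no true relationships at all, about 5% of the pairs would still be expected to exceed it.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


rng = random.Random(0)   # fixed seed so the run is repeatable
n_vars, n_obs = 30, 20   # illustrative sizes (assumption, not from the question)

# Every variable is pure noise, independent of all the others.
data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

threshold = 0.444        # approx. two-sided 5% critical |r| for n_obs = 20
pairs = list(combinations(range(n_vars), 2))
flagged = [(i, j) for i, j in pairs
           if abs(pearson(data[i], data[j])) > threshold]

print(f"{len(pairs)} pairs tested, {len(flagged)} 'significant' by chance alone")
```

The point of the sketch is that the number of flagged pairs grows with the number of pairs tested, which is why the question is tagged multiple-comparisons.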

68

votes

18 answers

Statistics interview questions

I am looking for some statistics (and probability, I guess) interview questions, from the most basic through the more advanced. Answers are not necessary (although links to specific questions on this site would do well).

intuition careers

asked Dec 14 '10 at 06:20

shabbychef

10,388
7
50
93

68

votes

8 answers

Which pseudo-$R^2$ measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)?

I have SPSS output for a logistic regression model. The output reports two measures for the model fit, Cox & Snell and Nagelkerke. So as a rule of thumb, which of these $R^2$ measures would you report as the model fit? Or, which of these fit indices…

logistic goodness-of-fit r-squared

asked Oct 13 '10 at 16:12

Henrik

13,314
9
63
123

68

votes

8 answers

How to simulate data that satisfy specific constraints such as having specific mean and standard deviation?

This question is motivated by my question on meta-analysis. But I imagine that it would also be useful in teaching contexts where you want to create a dataset that exactly mirrors an existing published dataset. I know how to generate random data…

r dataset simulation random-generation

asked Jun 12 '12 at 11:03

Jeromy Anglim

42,044
23
146
250

68

votes

4 answers

Why is it that natural log changes are percentage changes? What is it about logs that makes this so?

Can somebody explain how the properties of logs make it so you can do log linear regressions where the coefficients are interpreted as percentage changes?

regression logarithm mathematical-statistics

asked Nov 04 '16 at 15:07

thewhitetie

837
1
7
6
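
As a rough numerical illustration of the log-changes question above: the first-order Taylor expansion gives log(1 + r) ≈ r for small r, so a difference in natural logs is close to the proportional change only when that change is small. The helper names `log_change` and `pct_change` below are hypothetical and used only for this sketch.

```python
from math import log


def log_change(old, new):
    """Difference in natural logs, log(new) - log(old)."""
    return log(new) - log(old)


def pct_change(old, new):
    """Proportional change, (new - old) / old."""
    return (new - old) / old


# Compare the two measures as the change gets larger.
for new in (101, 105, 110, 150):
    print(f"100 -> {new}: pct = {pct_change(100, new):.4f}, "
          f"log diff = {log_change(100, new):.4f}")
```

The two columns agree closely for a 1% change and drift apart as the change grows, which is why the percentage reading of log-linear coefficients is an approximation.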

68

votes

9 answers

How can I help ensure testing data does not leak into training data?

Suppose we have someone building a predictive model, but that someone is not necessarily well-versed in proper statistical or machine learning principles. Maybe we are helping that person as they are learning, or maybe that person is using some…

machine-learning classification predictive-models cross-validation out-of-sample

asked Dec 19 '11 at 22:49

Michael McGowan

4,561
3
31
46

68

votes

4 answers

Why is sample standard deviation a biased estimator of $\sigma$?

According to the Wikipedia article on unbiased estimation of standard deviation the sample SD $$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2}$$ is a biased estimator of the SD of the population. It states that $E(\sqrt{s^2}) \neq…

estimation standard-deviation

asked Jun 08 '11 at 12:28

Dav Weps

689
1
6
3
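
The bias asked about in the sample standard deviation question above can be checked with a short simulation. This is a sketch assuming normally distributed data with $\sigma = 1$ and samples of size 5 (both arbitrary choices). For normal data the expected value of $s$ is $c_4(n)\,\sigma$ with $c_4(n) = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$, which is below 1, while $s^2$ remains unbiased for $\sigma^2$.

```python
import random
import statistics
from math import gamma, sqrt

rng = random.Random(1)            # fixed seed for repeatability
sigma, n, reps = 1.0, 5, 20000    # illustrative settings (assumptions)

sds = []
for _ in range(reps):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    sds.append(statistics.stdev(sample))   # uses the n - 1 denominator

mean_s = statistics.fmean(sds)                      # average of s
mean_s2 = statistics.fmean([s * s for s in sds])    # average of s^2

# Theoretical E[s] / sigma for normal data.
c4 = sqrt(2.0 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

print(f"mean of s   = {mean_s:.4f} (theory: {c4 * sigma:.4f})")
print(f"mean of s^2 = {mean_s2:.4f} (theory: {sigma ** 2:.4f})")
```

The gap arises because the square root is concave, so by Jensen's inequality $E(\sqrt{s^2}) < \sqrt{E(s^2)} = \sigma$ whenever $s^2$ is not constant.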

67

votes

6 answers

Standard errors for lasso prediction using R

I'm trying to use a LASSO model for prediction, and I need to estimate standard errors. Surely someone has already written a package to do this. But as far as I can see, none of the packages on CRAN that do predictions using a LASSO will return…

r standard-error prediction lasso

asked Mar 26 '14 at 02:20

Rob Hyndman

51,928
23
126
178

67

votes

1 answer

Variance of product of multiple independent random variables

We know the answer for two independent variables: $$ {\rm Var}(XY) = E(X^2Y^2) - (E(XY))^2={\rm Var}(X){\rm Var}(Y)+{\rm Var}(X)(E(Y))^2+{\rm Var}(Y)(E(X))^2$$ However, if we take the product of more than two variables, ${\rm Var}(X_1X_2 \cdots…

variance random-variable independence

asked Mar 18 '13 at 23:41

damla

791
1
7
5
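
The two-variable identity quoted in the variance-of-product question above can be verified exactly on a toy example. The sketch below assumes two small, independent discrete distributions (the values and probabilities are made up for illustration) and uses exact rational arithmetic, so both sides can be compared without rounding error.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(p * v for v, p in dist.items())


def var(dist):
    """Variance of a discrete distribution {value: probability}."""
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist.items())


def product_dist(dx, dy):
    """Distribution of X*Y when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        out[x * y] = out.get(x * y, 0) + px * py
    return out


# Hypothetical example distributions (not from the question).
X = {1: Fraction(1, 2), 3: Fraction(1, 2)}
Y = {0: Fraction(1, 4), 2: Fraction(3, 4)}

lhs = var(product_dist(X, Y))
rhs = var(X) * var(Y) + var(X) * mean(Y) ** 2 + var(Y) * mean(X) ** 2

print(lhs, rhs)
```

Here $E(X) = 2$, ${\rm Var}(X) = 1$, $E(Y) = 3/2$ and ${\rm Var}(Y) = 3/4$, so the right-hand side is $3/4 + 9/4 + 3 = 6$, matching the variance computed directly from the distribution of $XY$.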
