Highest Voted Questions - Statistical Analysis Stack Exchange

65

votes

9 answers

Clustering with a distance matrix

I have a (symmetric) matrix M that represents the distance between each pair of nodes. For example,

    A   B   C   D   E   F   G   H   I   J   K   L
A   0  20  20  20  40  60  60  60 100 120 120 120
B  20   0  20  20  60  80  80  80 120 140 140…

clustering

asked Sep 16 '10 at 11:47

yassin

753
1
6
6

65

votes

9 answers

Is this chart showing the likelihood of a terrorist attack statistically useful?

I'm seeing this image passed around a lot. I have a gut feeling that the information presented this way is somehow incomplete or even erroneous, but I'm not well versed enough in statistics to respond. It makes me think of this xkcd comic, that even…

probability interpretation prediction

asked Feb 01 '17 at 17:09

LCIII

753
5
7

65

votes

7 answers

Why is the validation accuracy fluctuating?

I have a four-layer CNN to predict response to cancer using MRI data. I use ReLU activations to introduce nonlinearities. The training accuracy and loss monotonically increase and decrease, respectively, but my test accuracy starts to fluctuate wildly.…

machine-learning python deep-learning

asked Jan 08 '17 at 02:37

Raghuram

763
1
6
10

65

votes

4 answers

How do you calculate the probability density function of the maximum of a sample of IID uniform random variables?

Given the random variable $$Y = \max(X_1, X_2, \ldots, X_n)$$ where $X_i$ are IID uniform variables, how do I calculate the PDF of $Y$?

density-function extreme-value

asked Nov 15 '11 at 19:34

Mascarpone

793
1
6
7
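A minimal sketch of the question above, under one loud assumption: the excerpt only says "uniform", so the code takes the $X_i$ to be Uniform(0, 1). In that case $P(Y \le y) = y^n$ on $[0, 1]$, and differentiating gives the density $n y^{n-1}$. The function names here are hypothetical and are not taken from any answer on the page.

```python
import random


def cdf_of_max(y, n):
    """CDF of Y = max(X_1, ..., X_n), ASSUMING X_i are IID Uniform(0, 1)."""
    if y < 0.0:
        return 0.0
    if y > 1.0:
        return 1.0
    # All n independent draws must fall at or below y.
    return y ** n


def pdf_of_max(y, n):
    """Density of Y under the same Uniform(0, 1) assumption: d/dy of y**n."""
    if 0.0 <= y <= 1.0:
        return n * y ** (n - 1)
    return 0.0


def empirical_cdf_of_max(y, n, trials=20000, seed=0):
    """Monte Carlo estimate of P(Y <= y), used only as a sanity check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample_max = max(rng.random() for _ in range(n))
        if sample_max <= y:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, y = 5, 0.9
    print("closed-form CDF:", cdf_of_max(y, n))
    print("simulated CDF:  ", empirical_cdf_of_max(y, n))
    print("closed-form PDF:", pdf_of_max(y, n))
```

For a uniform distribution on another interval, the same argument applies after rescaling; the simulation is only there to check the closed form, not to replace it.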

64

votes

2 answers

Do we need a global test before post hoc tests?

I often hear that post hoc tests after an ANOVA can only be used if the ANOVA itself was significant. However, post hoc tests adjust $p$-values to keep the global type I error rate at 5%, don't they? So why do we need the global test first? If…

anova statistical-significance post-hoc

asked Apr 19 '11 at 16:51

even

2,147
6
18
13

64

votes

2 answers

Is there a difference between 'controlling for' and 'ignoring' other variables in multiple regression?

The coefficient of an explanatory variable in a multiple regression tells us the relationship of that explanatory variable with the dependent variable, all while 'controlling' for the other explanatory variables. How I have viewed it so…

regression multiple-regression

asked Dec 07 '13 at 02:14

Siddharth Gopi

1,395
1
12
22

64

votes

6 answers

Efficient online linear regression

I'm analysing some data where I would like to perform ordinary linear regression; however, this is not possible because I am dealing with an online setting with a continuous stream of input data (which will quickly get too large for memory) and need to…

time-series regression algorithms real-time

asked Feb 05 '11 at 18:25

mikera

975
1
8
12

64

votes

5 answers

What is the difference between a "nested" and a "non-nested" model?

In the literature on hierarchical/multilevel models I have often read about "nested models" and "non-nested models", but what does this mean? Could anyone maybe give me some examples or tell me about the mathematical implications of this phrasing?

hypothesis-testing terminology nested-models nested-data

asked Nov 19 '10 at 11:32

llama

791
1
5
6

64

votes

9 answers

List of situations where a Bayesian approach is simpler, more practical, or more convenient

There have been many debates within statistics between Bayesians and frequentists. I generally find these rather off-putting (although I think they have died down). On the other hand, I've met several people who take an entirely pragmatic view of the…

bayesian frequentist

asked Oct 29 '12 at 03:39

gung - Reinstate Monica

132,789
81
357
650

64

votes

8 answers

Are Bayesians slaves of the likelihood function?

In his book "All of Statistics", Prof. Larry Wasserman presents the following Example (11.10, page 188). Suppose that we have a density $f$ such that $f(x)=c\,g(x)$, where $g$ is a known (nonnegative, integrable) function, and the normalization…

bayesian mathematical-statistics

asked Oct 01 '12 at 21:01

Zen

21,786
3
72
114

64

votes

4 answers

What is so cool about de Finetti's representation theorem?

From Theory of Statistics by Mark J. Schervish (page 12): Although DeFinetti's representation theorem 1.49 is central to motivating parametric models, it is not actually used in their implementation. How is the theorem central to parametric…

probability mathematical-statistics modeling exchangeability

asked Aug 16 '12 at 17:40

gui11aume

13,383
2
44
89

64

votes

5 answers

Why is tanh almost always better than sigmoid as an activation function?

In Andrew Ng's Neural Networks and Deep Learning course on Coursera he says that using $\tanh$ is almost always preferable to using $\text{sigmoid}$. The reason he gives is that the outputs using $\tanh$ centre around 0 rather than $\text{sigmoid}$'s 0.5, and this…

machine-learning neural-networks backpropagation sigmoid-curve

asked Feb 26 '18 at 08:45

Tom Hale

2,231
3
13
31

64

votes

6 answers

Is ridge regression useless in high dimensions ($n \ll p$)? How can OLS fail to overfit?

Consider a good old regression problem with $p$ predictors and sample size $n$. The usual wisdom is that the OLS estimator will overfit and will generally be outperformed by the ridge regression estimator: $$\hat\beta = (X^\top X + \lambda I)^{-1}X^\top…

cross-validation overfitting ridge-regression regularization

asked Feb 14 '18 at 16:31

amoeba

93,463
28
275
317

64

votes

4 answers

Are there cases where PCA is more suitable than t-SNE?

I want to see how 7 measures of text correction behaviour (time spent correcting the text, number of keystrokes, etc.) relate to each other. The measures are correlated. I ran a PCA to see how the measures projected onto PC1 and PC2, which avoided…

pca tsne

asked Oct 05 '16 at 08:22

user3744206

807
1
8
10

64

votes

2 answers

Derivation of closed form lasso solution

For the lasso problem $\min_\beta (Y-X\beta)^T(Y-X\beta)$ subject to $\|\beta\|_1 \leq t$, I often see the soft-thresholding result $$ \beta_j^{\text{lasso}}= \mathrm{sgn}(\beta^{\text{LS}}_j)(|\beta_j^{\text{LS}}|-\gamma)^+ $$ for the orthonormal…

lasso

asked Nov 01 '11 at 00:03

Gary

1,469
1
13
9
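The soft-thresholding formula quoted in the lasso question above can be written out as a short sketch. It only evaluates the stated expression $\mathrm{sgn}(\beta^{\text{LS}}_j)(|\beta_j^{\text{LS}}|-\gamma)^+$ coordinate by coordinate; it does not derive it, and the function names and the example values are hypothetical, chosen purely for illustration.

```python
import math


def soft_threshold(beta_ls, gamma):
    """Evaluate sgn(beta_ls) * (|beta_ls| - gamma)^+ for one coefficient.

    beta_ls: a least-squares coefficient (illustrative input).
    gamma:   a non-negative threshold.
    """
    shrunk = max(abs(beta_ls) - gamma, 0.0)  # the positive part (.)^+
    if shrunk == 0.0:
        return 0.0  # coefficients within gamma of zero are set exactly to zero
    return math.copysign(shrunk, beta_ls)  # restore the original sign


def soft_threshold_all(betas_ls, gamma):
    """Apply the rule independently to each coordinate."""
    return [soft_threshold(b, gamma) for b in betas_ls]


if __name__ == "__main__":
    # Made-up coefficients, not data from the question.
    print(soft_threshold_all([3.0, -3.0, 0.5, -0.5], gamma=1.0))
```

Large coefficients are pulled toward zero by $\gamma$ and small ones are zeroed out, which is the behaviour the formula in the excerpt describes.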
