Most Popular

1500 questions
65
votes
9 answers

Clustering with a distance matrix

I have a (symmetric) matrix M that represents the distance between each pair of nodes. For example,

         A    B    C    D    E    F    G    H    I    J    K    L
    A    0   20   20   20   40   60   60   60  100  120  120  120
    B   20    0   20   20   60   80   80   80  120  140  140…
yassin
  • 753
  • 1
  • 6
  • 6
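One way to cluster directly from a distance matrix is agglomerative (single-linkage) clustering. The sketch below is a minimal pure-Python illustration, not the approach from the answers to this question. The 4-node matrix `M_small`, the `threshold` parameter, and the function name `single_linkage` are all made up for the example and are not taken from the question's data.

```python
def single_linkage(dist, threshold):
    """Greedy single-linkage clustering on a symmetric distance matrix.

    Repeatedly merges the two closest clusters, where the distance between
    clusters is the smallest pairwise distance between their members.
    Stops once the closest pair is farther apart than `threshold`.
    """
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > 1:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] |= clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical 4-node distance matrix (illustration only).
M_small = [
    [0, 20, 60, 60],
    [20, 0, 60, 60],
    [60, 60, 0, 20],
    [60, 60, 20, 0],
]
groups = single_linkage(M_small, threshold=30)
```

The pairwise search is cubic or worse in the number of nodes, so this is only suitable for small matrices; a library implementation would be the usual choice for real data.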
65
votes
9 answers

Is this chart showing the likelihood of a terrorist attack statistically useful?

I'm seeing this image passed around a lot. I have a gut feeling that the information provided this way is somehow incomplete or even erroneous, but I'm not well versed enough in statistics to respond. It makes me think of this xkcd comic, that even…
LCIII
  • 753
  • 5
  • 7
65
votes
7 answers

Why is the validation accuracy fluctuating?

I have a four-layer CNN to predict response to cancer using MRI data. I use ReLU activations to introduce nonlinearities. The train accuracy and loss monotonically increase and decrease, respectively. However, my test accuracy starts to fluctuate wildly.…
Raghuram
  • 763
  • 1
  • 6
  • 10
65
votes
4 answers

How do you calculate the probability density function of the maximum of a sample of IID uniform random variables?

Given the random variable $$Y = \max(X_1, X_2, \ldots, X_n)$$ where $X_i$ are IID uniform variables, how do I calculate the PDF of $Y$?
Mascarpone
  • 793
  • 1
  • 6
  • 7
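For the maximum of IID variables, independence gives $P(Y \le y) = P(X_1 \le y)^n$. The sketch below assumes the $X_i$ are Uniform(0, 1), which the question does not state explicitly; under that assumption the CDF is $y^n$ and the PDF is $n y^{n-1}$ on $[0, 1]$. The function names and the seeded simulation check are illustrative.

```python
import random


def max_uniform_cdf(y, n):
    """CDF of the maximum of n IID Uniform(0, 1) variables (assumed range)."""
    if y <= 0:
        return 0.0
    if y >= 1:
        return 1.0
    return y ** n


def max_uniform_pdf(y, n):
    """PDF of the maximum: the derivative of y**n on [0, 1], zero elsewhere."""
    if y < 0 or y > 1:
        return 0.0
    return n * y ** (n - 1)


def simulate_cdf(y, n, trials=20000, seed=0):
    """Monte Carlo estimate of P(max <= y), used as a sanity check."""
    rng = random.Random(seed)
    hits = sum(
        max(rng.random() for _ in range(n)) <= y for _ in range(trials)
    )
    return hits / trials
```

For a general Uniform($a$, $b$) sample the same argument applies after rescaling $y$ to $(y - a)/(b - a)$ and dividing the density by $b - a$.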
64
votes
2 answers

Do we need a global test before post hoc tests?

I often hear that post hoc tests after an ANOVA can only be used if the ANOVA itself was significant. However, post hoc tests adjust $p$-values to keep the global type I error rate at 5%, don't they? So why do we need the global test first? If…
even
  • 2,147
  • 6
  • 18
  • 13
64
votes
2 answers

Is there a difference between 'controlling for' and 'ignoring' other variables in multiple regression?

The coefficient of an explanatory variable in a multiple regression tells us the relationship of that explanatory variable with the dependent variable. All this, while 'controlling' for the other explanatory variables. How I have viewed it so…
Siddharth Gopi
  • 1,395
  • 1
  • 12
  • 22
64
votes
6 answers

Efficient online linear regression

I'm analysing some data where I would like to perform ordinary linear regression. However, this is not possible because I am dealing with an online setting with a continuous stream of input data (which will quickly get too large for memory) and need to…
mikera
  • 975
  • 1
  • 8
  • 12
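A streaming fit is possible because least squares only needs a few running summaries, not the full data set. The sketch below is a simplified illustration for a single predictor, using Welford-style updates of the means and co-moments; it is not taken from the answers, and the class name `OnlineSimpleRegression` is hypothetical. The multi-predictor case would keep running versions of $X^\top X$ and $X^\top y$ instead.

```python
class OnlineSimpleRegression:
    """Least-squares fit of y = intercept + slope * x in O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0  # running sum of squared deviations of x
        self.sxy = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        """Fold one observation into the running summaries."""
        self.n += 1
        dx = x - self.mean_x  # deviation from the OLD mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.sxx += dx * (x - self.mean_x)  # old deviation times new deviation
        self.sxy += dx * (y - self.mean_y)

    def coefficients(self):
        """Return (intercept, slope) for the data seen so far."""
        if self.n < 2 or self.sxx == 0:
            raise ValueError("need at least two distinct x values")
        slope = self.sxy / self.sxx
        return self.mean_y - slope * self.mean_x, slope


model = OnlineSimpleRegression()
for x in range(5):
    model.update(x, 2 * x + 1)  # noiseless toy stream: y = 2x + 1
intercept, slope = model.coefficients()
```

Each observation can be discarded after `update`, so memory use stays constant regardless of the length of the stream.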
64
votes
5 answers

What is the difference between a "nested" and a "non-nested" model?

In the literature on hierarchical/multilevel models I have often read about "nested models" and "non-nested models", but what does this mean? Could anyone maybe give me some examples or tell me about the mathematical implications of this phrasing?
llama
  • 791
  • 1
  • 5
  • 6
64
votes
9 answers

List of situations where a Bayesian approach is simpler, more practical, or more convenient

There have been many debates within statistics between Bayesians and frequentists. I generally find these rather off-putting (although I think they have died down). On the other hand, I've met several people who take an entirely pragmatic view of the…
gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
64
votes
8 answers

Are Bayesians slaves of the likelihood function?

In his book "All of Statistics", Prof. Larry Wasserman presents the following Example (11.10, page 188). Suppose that we have a density $f$ such that $f(x)=c\,g(x)$, where $g$ is a known (nonnegative, integrable) function, and the normalization…
Zen
  • 21,786
  • 3
  • 72
  • 114
64
votes
4 answers

What is so cool about de Finetti's representation theorem?

From Theory of Statistics by Mark J. Schervish (page 12): "Although DeFinetti's representation theorem 1.49 is central to motivating parametric models, it is not actually used in their implementation." How is the theorem central to parametric…
gui11aume
  • 13,383
  • 2
  • 44
  • 89
64
votes
5 answers

Why is tanh almost always better than sigmoid as an activation function?

In Andrew Ng's Neural Networks and Deep Learning course on Coursera he says that using $\tanh$ is almost always preferable to using the sigmoid. The reason he gives is that the outputs using $\tanh$ centre around 0 rather than the sigmoid's 0.5, and this…
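The centring difference can be made concrete: $\tanh$ is a rescaled and shifted sigmoid, $\tanh(x) = 2\,\sigma(2x) - 1$, so it maps 0 to 0 while the sigmoid maps 0 to 0.5. A minimal sketch of that identity, with illustrative function names:

```python
import math


def sigmoid(x):
    """Logistic sigmoid; outputs lie in (0, 1) and sigmoid(0) == 0.5."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    """tanh written as a rescaled sigmoid; outputs lie in (-1, 1)."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


# Largest gap between the identity and math.tanh over a few sample points.
max_gap = max(
    abs(tanh_via_sigmoid(x) - math.tanh(x)) for x in (-2.0, -0.5, 0.0, 0.5, 3.0)
)
```

The identity shows that the two activations differ only by an affine change of input and output, which is why the comparison comes down to output centring and gradient scale.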
64
votes
6 answers

Is ridge regression useless in high dimensions ($n \ll p$)? How can OLS fail to overfit?

Consider a good old regression problem with $p$ predictors and sample size $n$. The usual wisdom is that the OLS estimator will overfit and will generally be outperformed by the ridge regression estimator: $$\hat\beta = (X^\top X + \lambda I)^{-1}X^\top…
amoeba
  • 93,463
  • 28
  • 275
  • 317
64
votes
4 answers

Are there cases where PCA is more suitable than t-SNE?

I want to see how 7 measures of text correction behaviour (time spent correcting the text, number of keystrokes, etc.) relate to each other. The measures are correlated. I ran a PCA to see how the measures projected onto PC1 and PC2, which avoided…
user3744206
  • 807
  • 1
  • 8
  • 10
64
votes
2 answers

Derivation of closed form lasso solution

For the lasso problem $\min_\beta (Y-X\beta)^T(Y-X\beta)$ such that $\|\beta\|_1 \leq t$, I often see the soft-thresholding result $$ \beta_j^{\text{lasso}}= \mathrm{sgn}(\beta^{\text{LS}}_j)(|\beta_j^{\text{LS}}|-\gamma)^+ $$ for the orthonormal…
Gary
  • 1,469
  • 1
  • 13
  • 9
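The soft-thresholding formula quoted in the question can be evaluated directly. The sketch below only implements that expression for a single least-squares coefficient; it does not derive it, and the function name and sample values are illustrative.

```python
def soft_threshold(beta_ls, gamma):
    """Return sgn(beta_ls) * max(|beta_ls| - gamma, 0).

    Coefficients with |beta_ls| <= gamma are set to zero; larger ones are
    shrunk towards zero by gamma.
    """
    sign = (beta_ls > 0) - (beta_ls < 0)  # -1, 0, or 1
    return sign * max(abs(beta_ls) - gamma, 0.0)


# Hypothetical least-squares coefficients, thresholded with gamma = 1.
shrunk = [soft_threshold(b, 1.0) for b in (3.0, -0.5, -2.5)]
```

The middle coefficient is zeroed out because its magnitude is below the threshold, which is the mechanism behind the lasso's sparsity.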