Most Popular
1500 questions
58
votes
6 answers
Introduction to statistics for mathematicians
What is a good introduction to statistics for a mathematician who is already well-versed in probability? I have two distinct motivations for asking, which may well lead to different suggestions:
I'd like to better understand the statistics…

Mark Meckes
58
votes
4 answers
Does the optimal number of trees in a random forest depend on the number of predictors?
Can someone explain why we need a large number of trees in a random forest when the number of predictors is large? How can we determine the optimal number of trees?

Z Khan
58
votes
12 answers
Resources for learning Markov chain and hidden Markov models
I am looking for resources (tutorials, textbooks, webcasts, etc.) to learn about Markov chains and HMMs. My background is as a biologist, and I'm currently involved in a bioinformatics-related project.
Also, what are the necessary mathematical…

bow
58
votes
4 answers
Cross Entropy vs. Sparse Cross Entropy: When to use one over the other
I am playing with convolutional neural networks using Keras+TensorFlow to classify categorical data. I have a choice of two loss functions: categorical_crossentropy and sparse_categorical_crossentropy.
I have a good intuition about the…

kedarps
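A minimal sketch of how the two losses named in the question above relate, assuming the usual convention that the categorical form takes one-hot labels and the sparse form takes integer class indices. The functions below are hand-written illustrations with hypothetical names, not the Keras implementations, and the probabilities are made up for the example.

```python
import math

def categorical_crossentropy(one_hot_labels, probs):
    """Mean cross-entropy when each label is a one-hot row."""
    total = 0.0
    for label_row, prob_row in zip(one_hot_labels, probs):
        total -= sum(y * math.log(p) for y, p in zip(label_row, prob_row))
    return total / len(probs)

def sparse_categorical_crossentropy(int_labels, probs):
    """Mean cross-entropy when each label is an integer class index."""
    total = 0.0
    for class_index, prob_row in zip(int_labels, probs):
        total -= math.log(prob_row[class_index])
    return total / len(probs)

# Illustrative predicted class probabilities for two samples, three classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
int_labels = [0, 1]                      # sparse encoding
one_hot_labels = [[1, 0, 0], [0, 1, 0]]  # the same labels, one-hot encoded

dense_loss = categorical_crossentropy(one_hot_labels, probs)
sparse_loss = sparse_categorical_crossentropy(int_labels, probs)
print(dense_loss, sparse_loss)
```

Under these assumptions both encodings give the same loss value, so the choice mainly depends on how the labels are stored.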
58
votes
5 answers
Neural networks vs support vector machines: are the second definitely superior?
Many authors of papers I read affirm that SVMs are the superior technique for their regression/classification problem, noting that they couldn't get similar results with NNs. Often the comparison states that
SVMs, instead of NNs,
Have a strong founding…

stackovergio
58
votes
3 answers
Statistics and causal inference?
In his 1984 paper "Statistics and Causal Inference", Paul Holland raised one of the most fundamental questions in statistics:
What can a statistical model say about
causation?
This led to his motto:
NO CAUSATION WITHOUT MANIPULATION
which…

Shane
58
votes
2 answers
What is the difference between a particle filter (sequential Monte Carlo) and a Kalman filter?
A particle filter and Kalman filter are both recursive Bayesian estimators. I often encounter Kalman filters in my field, but very rarely see the usage of a particle filter.
When would one be used over the other?

Shane
58
votes
6 answers
Adam optimizer with exponential decay
In most TensorFlow code I have seen, the Adam optimizer is used with a constant learning rate of 1e-4 (i.e. 0.0001). The code usually looks like the following:
...build the model...
# Add the optimizer
train_op =…

MarvMind
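For the exponential-decay question above, a rough sketch of what such a schedule computes, written in plain Python rather than TensorFlow. It assumes the common form lr = initial_lr * decay_rate ** (step / decay_steps); the function name and the values of `decay_steps` and `decay_rate` are illustrative assumptions, and the sketch does not reconstruct the truncated `train_op` code from the question.

```python
def exponentially_decayed_lr(initial_lr, step, decay_steps, decay_rate, staircase=False):
    """Learning rate after `step` updates under exponential decay.

    With staircase=True the rate drops in discrete jumps every `decay_steps`
    updates instead of shrinking continuously.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent

# Start from the constant rate mentioned in the question (1e-4).
for step in (0, 1000, 5000, 10000):
    lr = exponentially_decayed_lr(1e-4, step, decay_steps=1000, decay_rate=0.96)
    print(step, lr)
```

The value produced at each step would then be passed to the optimizer as its learning rate in place of the constant.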
58
votes
3 answers
Why does shrinkage work?
In order to solve problems of model selection, a number of methods (LASSO, ridge regression, etc.) will shrink the coefficients of predictor variables towards zero. I am looking for an intuitive explanation of why this improves predictive ability.…

aspiringstatistician
58
votes
5 answers
What is the difference between N and N-1 in calculating population variance?
I do not understand why both N and N-1 appear when calculating population variance. When do we use N and when do we use N-1?
It says that when the population is very large there is no difference between N and N-1, but it does not…

ilhan
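To make the N versus N-1 distinction in the question above concrete, here is a small sketch on made-up data. It assumes the standard convention: divide by N when the data are the whole population, and by N-1 when they are a sample used to estimate the population variance.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
n = len(data)
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)

var_n = sum_sq_dev / n                # divide by N: data are the full population
var_n_minus_1 = sum_sq_dev / (n - 1)  # divide by N-1: data are a sample

# The standard library exposes both conventions.
print(var_n, statistics.pvariance(data))
print(var_n_minus_1, statistics.variance(data))
```

The two results differ by a factor of N/(N-1), which approaches 1 as N grows.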
58
votes
3 answers
Why do Convolutional Neural Networks not use a Support Vector Machine to classify?
In recent years, Convolutional Neural Networks (CNNs) have become the state-of-the-art for object recognition in computer vision. Typically, a CNN consists of several convolutional layers, followed by two fully-connected layers. An intuition behind…

Karnivaurus
58
votes
3 answers
What does standard deviation tell us in non-normal distribution
In a normal distribution, the 68-95-99.7 rule gives the standard deviation a lot of meaning, but what would the standard deviation mean in a non-normal distribution (multimodal or skewed)? Would all data values still fall within 3 standard deviations? Do…

Zuhaib Ali
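One way to explore the question above empirically is to draw skewed data and count how much of it lies within k standard deviations of the mean. The sketch below uses an exponential sample purely as an assumed example of a skewed distribution, and compares the observed shares with the distribution-free floor 1 - 1/k^2 from Chebyshev's inequality, which is not mentioned in the question itself.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable
sample = [random.expovariate(1.0) for _ in range(10_000)]  # right-skewed data

mean = statistics.fmean(sample)
sd = statistics.pstdev(sample)  # population form (divide by N)

def fraction_within(k):
    """Share of observations strictly closer than k standard deviations to the mean."""
    return sum(abs(x - mean) < k * sd for x in sample) / len(sample)

for k in (1, 2, 3):
    chebyshev_floor = 1 - 1 / k ** 2
    print(k, round(fraction_within(k), 4), round(chebyshev_floor, 4))
```

In this setup some values fall outside 3 standard deviations, while the observed shares stay at or above the Chebyshev floor.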
58
votes
5 answers
Cost function of neural network is non-convex?
The cost function of a neural network is $J(W,b)$, and it is claimed to be non-convex. I don't quite understand why that is, since as far as I can see it's quite similar to the cost function of logistic regression, right?
If it is non-convex, so the…

avocado
58
votes
5 answers
Correlations between continuous and categorical (nominal) variables
I would like to find the correlation between a continuous (dependent) variable and a categorical (nominal: gender, independent) variable. The continuous data are not normally distributed. Previously, I had computed it using Spearman's $\rho$.…

Md. Ferdous Wahid
57
votes
3 answers
When combining p-values, why not just averaging?
I recently learned about Fisher's method to combine p-values. This is based on the fact that the p-value under the null follows a uniform distribution, and that $$-2\sum_{i=1}^n \log X_i \sim \chi^2(2n), \text{ given } X_i \sim \text{Unif}(0,1)$$
which I…

Alby
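The chi-square relation quoted in the question above can be turned into a short sketch of Fisher's method. It assumes independent p-values and uses the closed-form upper tail of a chi-square distribution with an even number of degrees of freedom, so no statistics library is needed. The function name and the input p-values are illustrative.

```python
import math

def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(log p_i) is chi-square with 2n degrees of freedom
    under the null. For even degrees of freedom the upper tail at x is
    exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!.
    """
    n = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # statistic divided by 2
    tail_sum = sum(half_stat ** k / math.factorial(k) for k in range(n))
    return math.exp(-half_stat) * tail_sum

p_values = [0.05, 0.05]  # illustrative inputs
combined = fisher_combined_pvalue(p_values)
average = sum(p_values) / len(p_values)
print(combined, average)
```

With these inputs the combined p-value is smaller than the plain average, since two independent small p-values are stronger evidence together than either one alone.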