Most Popular

1500 questions
58
votes
6 answers

Introduction to statistics for mathematicians

What is a good introduction to statistics for a mathematician who is already well-versed in probability? I have two distinct motivations for asking, which may well lead to different suggestions: I'd like to better understand the statistics…
Mark Meckes
58
votes
4 answers

Does the optimal number of trees in a random forest depend on the number of predictors?

Can someone explain why we need a large number of trees in a random forest when the number of predictors is large? How can we determine the optimal number of trees?
Z Khan
58
votes
12 answers

Resources for learning Markov chain and hidden Markov models

I am looking for resources (tutorials, textbooks, webcasts, etc.) to learn about Markov chains and HMMs. My background is as a biologist, and I'm currently involved in a bioinformatics-related project. Also, what are the necessary mathematical…
bow
58
votes
4 answers

Cross Entropy vs. Sparse Cross Entropy: When to use one over the other

I am playing with convolutional neural networks using Keras+TensorFlow to classify categorical data. I have a choice of two loss functions: categorical_crossentropy and sparse_categorical_crossentropy. I have a good intuition about the…
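As a rough illustration of how the two losses named in this question relate, the sketch below computes both in plain Python rather than Keras. It assumes the usual distinction: the categorical form takes a one-hot target vector, while the sparse form takes an integer class index. The function bodies, the `probs` values, and the three-class setup are illustrative assumptions, not the Keras implementation.

```python
import math

def categorical_crossentropy(one_hot, probs):
    """Cross-entropy for a target given as a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(label, probs):
    """Cross-entropy for a target given as an integer class index."""
    return -math.log(probs[label])

probs = [0.1, 0.7, 0.2]      # hypothetical softmax output for three classes
label = 1                    # integer-encoded target
one_hot = [0.0, 1.0, 0.0]    # the same target, one-hot encoded

dense_loss = categorical_crossentropy(one_hot, probs)
sparse_loss = sparse_categorical_crossentropy(label, probs)
```

Under these assumptions the two values agree, so the choice is mainly about how the labels are stored: integer labels avoid materializing a one-hot matrix when there are many classes.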
58
votes
5 answers

Neural networks vs support vector machines: are the second definitely superior?

Many authors of papers I read claim that SVMs are the superior technique for their regression/classification problem, noting that they could not get similar results with NNs. Often the comparison states that SVMs, unlike NNs, have a strong founding…
stackovergio
58
votes
3 answers

Statistics and causal inference?

In his 1984 paper "Statistics and Causal Inference", Paul Holland raised one of the most fundamental questions in statistics: What can a statistical model say about causation? This led to his motto: NO CAUSATION WITHOUT MANIPULATION which…
Shane
58
votes
2 answers

What is the difference between a particle filter (sequential Monte Carlo) and a Kalman filter?

A particle filter and Kalman filter are both recursive Bayesian estimators. I often encounter Kalman filters in my field, but very rarely see the usage of a particle filter. When would one be used over the other?
Shane
58
votes
6 answers

Adam optimizer with exponential decay

In most TensorFlow code I have seen, the Adam optimizer is used with a constant learning rate of 1e-4 (i.e. 0.0001). The code usually looks like the following: ...build the model... # Add the optimizer train_op =…
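For readers unfamiliar with the schedule named in the title, here is a minimal sketch of exponential learning-rate decay in plain Python. It assumes the common continuous form `initial_lr * decay_rate ** (step / decay_steps)`. The function name `exponential_decay` and the values `decay_steps=1000` and `decay_rate=0.96` are hypothetical choices for illustration and do not come from the question.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate):
    """Learning rate after `step` updates under continuous exponential decay."""
    return initial_lr * decay_rate ** (step / decay_steps)

# Illustrative schedule starting from the constant rate quoted in the question (1e-4).
schedule = [exponential_decay(1e-4, s, 1000, 0.96) for s in (0, 1000, 2000)]
```

The rate starts at the initial value and is multiplied by `decay_rate` once every `decay_steps` updates. Whether this helps on top of Adam's own per-parameter scaling is exactly what the question asks.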
58
votes
3 answers

Why does shrinkage work?

In order to solve problems of model selection, a number of methods (LASSO, ridge regression, etc.) will shrink the coefficients of predictor variables towards zero. I am looking for an intuitive explanation of why this improves predictive ability.…
58
votes
5 answers

What is the difference between N and N-1 in calculating population variance?

I do not understand why both N and N-1 appear when calculating population variance. When do we use N, and when do we use N-1? It says that when the population is very big there is no difference between N and N-1, but it does not…
ilhan
58
votes
3 answers

Why do Convolutional Neural Networks not use a Support Vector Machine to classify?

In recent years, Convolutional Neural Networks (CNNs) have become the state-of-the-art for object recognition in computer vision. Typically, a CNN consists of several convolutional layers, followed by two fully-connected layers. An intuition behind…
58
votes
3 answers

What does standard deviation tell us in non-normal distribution

In a normal distribution, the 68-95-99.7 rule gives the standard deviation a lot of meaning, but what would the standard deviation mean in a non-normal distribution (multimodal or skewed)? Would all data values still fall within 3 standard deviations? Do…
Zuhaib Ali
58
votes
5 answers

Cost function of neural network is non-convex?

The cost function of a neural network is $J(W,b)$, and it is claimed to be non-convex. I don't quite understand why, since as far as I can see it is quite similar to the cost function of logistic regression, right? If it is non-convex, so the…
avocado
58
votes
5 answers

Correlations between continuous and categorical (nominal) variables

I would like to find the correlation between a continuous (dependent) variable and a categorical (nominal: gender, independent) variable. The continuous data is not normally distributed. Previously, I computed it using Spearman's $\rho$.…
57
votes
3 answers

When combining p-values, why not just averaging?

I recently learned about Fisher's method to combine p-values. This is based on the fact that a p-value under the null follows a uniform distribution, and that $$-2\sum_{i=1}^n \log X_i \sim \chi^2(2n), \text{ given independent } X_i \sim \text{Unif}(0,1)$$ which I…
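The statistic quoted in this question can be sketched in a few lines of standard-library Python. The sketch assumes independent p-values and uses the closed-form survival function of a chi-square distribution with an even number of degrees of freedom, $e^{-x/2}\sum_{k=0}^{n-1}(x/2)^k/k!$, so no external library is needed. The function names and the example p-values are hypothetical.

```python
import math

def chi2_sf_even_df(x, n):
    """Survival function of a chi-square variable with 2n degrees of freedom."""
    half = x / 2.0
    term = 1.0    # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return math.exp(-half) * total

def fisher_combine(p_values):
    """Return Fisher's statistic -2 * sum(log p_i) and the combined p-value."""
    n = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return statistic, chi2_sf_even_df(statistic, n)

p_values = [0.04, 0.10, 0.20]    # illustrative inputs, not from the question
statistic, combined_p = fisher_combine(p_values)
mean_p = sum(p_values) / len(p_values)
```

Comparing `combined_p` with `mean_p` on inputs like these shows that the two summaries can differ, which is the contrast the question raises. With a single p-value the method returns that p-value unchanged.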