Highest Voted Questions - Statistical Analysis Stack Exchange

58

votes

6 answers

Introduction to statistics for mathematicians

What is a good introduction to statistics for a mathematician who is already well-versed in probability? I have two distinct motivations for asking, which may well lead to different suggestions: I'd like to better understand the statistics…

references

asked Jul 21 '10 at 13:50

Mark Meckes

2,916
3
19
18

58

votes

4 answers

Does the optimal number of trees in a random forest depend on the number of predictors?

Can someone explain why we need a large number of trees in a random forest when the number of predictors is large? How can we determine the optimal number of trees?

machine-learning random-forest

asked Sep 12 '12 at 14:07

Z Khan

583
1
5
4

58

votes

12 answers

Resources for learning Markov chain and hidden Markov models

I am looking for resources (tutorials, textbooks, webcasts, etc.) to learn about Markov chains and HMMs. My background is as a biologist, and I'm currently involved in a bioinformatics-related project. Also, what are the necessary mathematical…

references markov-process hidden-markov-model bioinformatics

asked Oct 04 '10 at 08:33

bow

121
1
3
4

58

votes

4 answers

Cross Entropy vs. Sparse Cross Entropy: When to use one over the other

I am playing with convolutional neural networks using Keras+Tensorflow to classify categorical data. I have a choice of two loss functions: categorical_crossentropy and sparse_categorical_crossentropy. I have a good intuition about the…

machine-learning conv-neural-network loss-functions information-theory cross-entropy

asked Jan 31 '18 at 15:03

kedarps

2,902
2
19
30
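
For the cross-entropy question above, a minimal pure-Python sketch (not the Keras implementation) of how the two losses relate. It assumes a single example with a hypothetical predicted distribution `probs`; the function names mirror the Keras loss names only for readability. The dense version expects a one-hot target, the sparse version expects an integer class index, and both reduce to the negative log-probability of the true class.

```python
import math

def categorical_crossentropy(one_hot, probs):
    # Dense form: the target is a one-hot vector over the classes.
    # Assumes every entry of probs is strictly positive.
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(label, probs):
    # Sparse form: the target is the integer index of the true class.
    return -math.log(probs[label])

# Hypothetical model output for one example with three classes.
probs = [0.1, 0.7, 0.2]
label = 1                    # integer-encoded target
one_hot = [0.0, 1.0, 0.0]    # the same target, one-hot encoded

loss_dense = categorical_crossentropy(one_hot, probs)
loss_sparse = sparse_categorical_crossentropy(label, probs)
print(loss_dense, loss_sparse)
```

Under these assumptions the two values agree, so the choice between the losses comes down to how the labels are encoded rather than to a different objective.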

58

votes

5 answers

Neural networks vs support vector machines: are the second definitely superior?

Many authors of papers I read affirm that SVMs are a superior technique for their regression/classification problems, aware that they couldn't get similar results through NNs. Often the comparison states that SVMs, unlike NNs, have a strong founding…

machine-learning svm neural-networks

asked Jun 08 '12 at 02:59

stackovergio

1,025
1
8
11

58

votes

3 answers

Statistics and causal inference?

In his 1984 paper "Statistics and Causal Inference", Paul Holland raised one of the most fundamental questions in statistics: What can a statistical model say about causation? This led to his motto: NO CAUSATION WITHOUT MANIPULATION which…

causality

asked Aug 31 '10 at 19:13

Shane

11,961
17
71
89

58

votes

2 answers

What is the difference between a particle filter (sequential Monte Carlo) and a Kalman filter?

A particle filter and Kalman filter are both recursive Bayesian estimators. I often encounter Kalman filters in my field, but very rarely see the usage of a particle filter. When would one be used over the other?

bayesian particle-filter kalman-filter

asked Aug 26 '10 at 21:00

Shane

11,961
17
71
89

58

votes

6 answers

Adam optimizer with exponential decay

In most TensorFlow code I have seen, the Adam optimizer is used with a constant learning rate of 1e-4 (i.e. 0.0001). The code usually looks like the following: ...build the model... # Add the optimizer train_op =…

neural-networks deep-learning gradient-descent tensorflow adam

asked Mar 05 '16 at 08:22

MarvMind

683
1
6
5
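
For the Adam question above, a minimal pure-Python sketch of what combining Adam with an exponentially decayed learning rate can look like. It is not the TensorFlow code from the question: the function name, the 1-D quadratic objective, and all hyperparameter values (`lr0`, `decay_rate`, `decay_steps`) are assumptions chosen only for illustration.

```python
import math

def adam_with_exponential_decay(grad, x0, steps=500, lr0=0.1,
                                decay_rate=0.96, decay_steps=100,
                                beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a 1-D function with Adam and a decaying base learning rate."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        # Exponential decay schedule for the base learning rate.
        lr = lr0 * decay_rate ** (t / decay_steps)
        g = grad(x)
        # Exponential moving averages of the gradient and squared gradient.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction for the zero-initialised averages.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Hypothetical objective f(x) = (x - 3)^2, with gradient 2 * (x - 3).
x_final = adam_with_exponential_decay(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_final)
```

Adam already rescales each step by the running gradient statistics; the schedule here only shrinks the base step size on top of that.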

58

votes

3 answers

Why does shrinkage work?

In order to solve problems of model selection, a number of methods (LASSO, ridge regression, etc.) will shrink the coefficients of predictor variables towards zero. I am looking for an intuitive explanation of why this improves predictive ability.…

lasso ridge-regression intuition regularization

asked Nov 02 '15 at 20:29

aspiringstatistician

581
1
5
3

58

votes

5 answers

What is the difference between N and N-1 in calculating population variance?

I did not understand why both N and N-1 appear when calculating population variance. When do we use N, and when do we use N-1? It says that when the population is very big there is no difference between N and N-1 but it does not…

variance population

asked Nov 03 '11 at 15:02

ilhan

932
3
11
19
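
For the N versus N-1 question above, a short sketch using Python's standard `statistics` module. The data values are made up for illustration. Dividing the sum of squared deviations by N gives the population variance; dividing by N-1 gives the usual sample variance (Bessel's correction).

```python
import statistics

# Hypothetical data, used only to illustrate the two divisors.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)

pop_var_manual = sum_sq / n           # divide by N
samp_var_manual = sum_sq / (n - 1)    # divide by N-1

# The standard library exposes both conventions directly.
pop_var = statistics.pvariance(data)  # population variance (N)
samp_var = statistics.variance(data)  # sample variance (N-1)

print(pop_var_manual, pop_var)
print(samp_var_manual, samp_var)
```

The ratio of the two results is (N-1)/N, which is why the distinction matters for small samples and fades as N grows.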

58

votes

3 answers

Why do Convolutional Neural Networks not use a Support Vector Machine to classify?

In recent years, Convolutional Neural Networks (CNNs) have become the state-of-the-art for object recognition in computer vision. Typically, a CNN consists of several convolutional layers, followed by two fully-connected layers. An intuition behind…

machine-learning neural-networks svm deep-learning conv-neural-network

asked Aug 20 '15 at 14:43

Karnivaurus

5,909
10
36
52

58

votes

3 answers

What does standard deviation tell us in non-normal distribution

In a normal distribution, the 68-95-99.7 rule gives the standard deviation a lot of meaning, but what would standard deviation mean in a non-normal distribution (multimodal or skewed)? Would all data values still fall within 3 standard deviations? Do…

normal-distribution standard-deviation skewness

asked Jul 20 '14 at 07:54

Zuhaib Ali

681
1
5
5
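
For the standard deviation question above, a small simulation sketch. It assumes an exponential distribution as the example of skewed data and uses an arbitrary seed and sample size. For each k it prints the fraction of values within k standard deviations of the mean next to the Chebyshev lower bound 1 - 1/k², which holds for any distribution with finite variance.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, for reproducibility only

# Skewed, non-normal sample: exponential with rate 1 (an assumed example).
sample = [random.expovariate(1.0) for _ in range(10_000)]
mean = statistics.fmean(sample)
sd = statistics.pstdev(sample)

def fraction_within(k):
    """Fraction of the sample lying strictly within k SDs of the mean."""
    return sum(abs(x - mean) < k * sd for x in sample) / len(sample)

for k in (1, 2, 3):
    chebyshev_bound = max(0.0, 1 - 1 / k**2)
    print(k, round(fraction_within(k), 3), round(chebyshev_bound, 3))
```

The observed fractions need not match 68-95-99.7, and some values can fall beyond 3 standard deviations, but they never drop below the Chebyshev bound.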

58

votes

5 answers

Cost function of neural network is non-convex?

The cost function of a neural network is $J(W,b)$, and it is claimed to be non-convex. I don't quite understand why that is so, since as I see it, it's quite similar to the cost function of logistic regression, right? If it is non-convex, so the…

machine-learning neural-networks loss-functions

asked Jul 09 '14 at 13:59

avocado

3,045
5
32
45

58

votes

5 answers

Correlations between continuous and categorical (nominal) variables

I would like to find the correlation between a continuous (dependent variable) and a categorical (nominal: gender, independent variable) variable. Continuous data is not normally distributed. Before, I had computed it using Spearman's $\rho$.…

correlation categorical-data descriptive-statistics biostatistics spearman-rho

asked Jun 10 '14 at 08:13

Md. Ferdous Wahid

805
1
7
11

57

votes

3 answers

When combining p-values, why not just averaging?

I recently learned about Fisher's method to combine p-values. This is based on the fact that a p-value under the null follows a uniform distribution, and that $$-2\sum_{i=1}^n \log X_i \sim \chi^2(2n), \text{ given } X_i \sim \text{Unif}(0,1)$$ which I…

hypothesis-testing p-value multiple-comparisons central-limit-theorem combining-p-values

asked Dec 04 '13 at 23:11

Alby

2,103
3
19
22
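
For the p-value question above, a minimal standard-library sketch of Fisher's method. It assumes independent p-values that are uniform under the null; the function name and the input values are hypothetical. Because the reference distribution has an even number of degrees of freedom (2n), its upper tail has the closed form exp(-x/2) · Σ_{k=0}^{n-1} (x/2)^k / k!, so no external library is needed.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Return (statistic, combined p-value) for independent p-values."""
    n = len(pvalues)
    # Fisher's statistic: -2 * sum(log p_i), chi-square with 2n dof under H0.
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    # Upper tail of chi-square with 2n dof, using the even-dof closed form.
    term = 1.0
    total = 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return stat, math.exp(-half) * total

# Hypothetical p-values from two independent tests.
stat, combined = fisher_combined_pvalue([0.5, 0.5])
print(stat, combined)
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the tail formula.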
