Highest Voted Questions - Statistical Analysis Stack Exchange

55

votes

8 answers

Book for reading before Elements of Statistical Learning?

Based on this post, I want to digest Elements of Statistical Learning. Fortunately it is available for free and I started reading it. I don't have enough knowledge to understand it. Can you recommend a book that is a better introduction to the…

machine-learning references

asked Nov 26 '11 at 03:12

B Seven

2,873
4
24
29

55

votes

4 answers

How does LSTM prevent the vanishing gradient problem?

LSTM was invented specifically to avoid the vanishing gradient problem. It is supposed to do that with the Constant Error Carousel (CEC), which on the diagram below (from Greff et al.) corresponds to the loop around the cell. (source:…

neural-networks lstm

asked Dec 08 '15 at 09:01

TheWalkingCube

653
1
6
6

55

votes

2 answers

Intuitive explanations of differences between Gradient Boosting Trees (GBM) & Adaboost

I'm trying to understand the differences between GBM & Adaboost. This is what I've understood so far: They are both boosting algorithms, which learn from the previous model's errors and finally make a weighted sum of the models. GBM and Adaboost…

boosting adaboost

asked Aug 01 '15 at 07:50

Hee Kyung Yoon

687
1
6
9

55

votes

4 answers

Why sigmoid function instead of anything else?

Why is the de facto standard sigmoid function, $\frac{1}{1+e^{-x}}$, so popular in (non-deep) neural networks and logistic regression? Why don't we use many of the other differentiable functions, with faster computation time or slower decay (so…

logistic neural-networks least-squares

asked Jul 24 '15 at 11:14

Mark Horvath

795
1
8
9

55

votes

2 answers

Prediction interval for lmer() mixed effects model in R

I want to get a prediction interval around a prediction from a lmer() model. I have found some discussion about this: http://rstudio-pubs-static.s3.amazonaws.com/24365_2803ab8299934e888a60e7b16113f619.html http://glmm.wikidot.com/faq but they seem…

r mixed-model prediction prediction-interval lme4-nlme

asked Apr 22 '15 at 21:42

hossibley

797
2
8
10

55

votes

2 answers

What is maxout in neural network?

Can anyone explain what maxout units in a neural network do? How do they perform and how do they differ from conventional units? I tried to read the 2013 "Maxout Network" paper by Goodfellow et al. (from Professor Yoshua Bengio's group), but I don't…

machine-learning neural-networks

asked Dec 19 '14 at 04:46

RockTheStar

11,277
31
63
89

54

votes

4 answers

Why do statisticians say a non-significant result means "you can't reject the null" as opposed to accepting the null hypothesis?

Traditional statistical tests, like the two sample t-test, focus on trying to eliminate the hypothesis that there is no difference between a function of two independent samples. Then, we choose a confidence level and say that if the difference of…

hypothesis-testing statistical-significance confidence-interval equivalence tost

asked Feb 08 '14 at 20:55

ryu576

2,220
1
16
25

54

votes

5 answers

What is the difference between NaN and NA?

I would like to know why some languages like R have both NA and NaN. What are the differences, or are they the same? Is NA really needed?

r

asked Dec 22 '10 at 06:52

user2479

641
1
5
3

54

votes

13 answers

Visually interesting statistics concepts that are easy to explain

I noticed on Math Stack Exchange a terrific thread which highlighted a number of very visually interesting math concepts. I would be curious to see graphics/gifs which anyone has that very clearly illustrate a statistics concept (particularly those…

self-study data-visualization

asked Mar 02 '20 at 01:00

David Veitch

947
6
12

54

votes

3 answers

Multivariate linear regression vs neural network?

It seems that it is possible to get similar results to a neural network with a multivariate linear regression in some cases, and multivariate linear regression is super fast and easy. Under what circumstances can neural networks give better results…

regression multiple-regression neural-networks

asked Oct 27 '12 at 08:06

Hugh Perkins

4,279
1
23
38

54

votes

10 answers

What is a good algorithm for estimating the median of a huge read-once data set?

I'm looking for a good algorithm (meaning minimal computation, minimal storage requirements) to estimate the median of a data set that is too large to store, such that each value can only be read once (unless you explicitly store that value). There…

algorithms median large-data online-algorithms

asked Jul 20 '10 at 19:21

PeterR

1,712
1
16
13

54

votes

3 answers

Is it possible to do time-series clustering based on curve shape?

I have sales data for a series of outlets, and want to categorise them based on the shape of their curves over time. The data looks roughly like this (but obviously isn't random, and has some missing data): n.quarters <- 100 n.stores <- 20 if…

r time-series clustering

asked Oct 05 '10 at 07:45

fmark

4,666
5
35
51

54

votes

10 answers

Why is the sum of two random variables a convolution?

For a long time I did not understand why the "sum" of two random variables is their convolution, whereas a mixture density function sum of $f(x)$ and $g(x)$ is $p\,f(x)+(1-p)g(x)$; the arithmetic sum and not their convolution. The exact phrase "the…

density-function terminology cumulative-distribution-function mixture-distribution convolution

asked Mar 06 '18 at 09:46

Carl

11,532
7
45
102

54

votes

8 answers

Modern successor to Exploratory Data Analysis by Tukey?

I've been reading Tukey's book "Exploratory Data Analysis". Having been written in 1977, the book emphasizes paper/pencil methods. Is there a more 'modern' successor which takes into account that we can now instantaneously plot large data sets?

data-visualization references descriptive-statistics exploratory-data-analysis

asked Feb 08 '12 at 08:18

biofreezer

255
4
11

54

votes

7 answers

Is it a good practice to always scale/normalize data for machine learning?

My understanding is that when some features have different ranges in their values (for example, imagine one feature being the age of a person and another one being their salary in USD), this will negatively affect algorithms because the feature with…

machine-learning data-transformation normalization

asked Jan 07 '16 at 04:09

Juan Antonio Gomez Moriano

1,171
1
12
16
