Most Popular

1500 questions
49 votes, 5 answers

Generic sum of Gamma random variables

I have read that the sum of Gamma random variables with the same scale parameter is another Gamma random variable. I've also seen the paper by Moschopoulos describing a method for the summation of a general set of Gamma random variables. I have…
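The closure property the question cites is easy to check empirically. A minimal simulation sketch in Python with NumPy (the shape and scale values are arbitrary): the sum of independent Gamma variables sharing a scale should be Gamma with the shape parameters added.

```python
import numpy as np

rng = np.random.default_rng(0)
k1, k2, theta = 2.0, 3.0, 1.5   # two shapes and the shared scale (arbitrary)
n = 200_000

# Sum of Gamma(k1, theta) and Gamma(k2, theta) draws
s = rng.gamma(k1, theta, n) + rng.gamma(k2, theta, n)

# Gamma(k1 + k2, theta) has mean (k1 + k2)*theta = 7.5
# and variance (k1 + k2)*theta**2 = 11.25
print(s.mean(), s.var())
```

Moschopoulos's method is needed precisely when the scales differ, in which case the sum is no longer Gamma.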
49 votes, 3 answers

What is the difference between posterior and posterior predictive distribution?

I understand what a posterior is, but I'm not sure what the latter means. How are the two different? Kevin P. Murphy indicated in his textbook, Machine Learning: A Probabilistic Perspective, that it is "an internal belief state". What does that really…
A.D
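The short answer to the question above: the posterior is a distribution over parameters; the posterior predictive is a distribution over future data, obtained by averaging the likelihood over the posterior. A sketch with the conjugate Beta-Binomial model (all numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 2.0                      # Beta prior on the success probability p
y, n = 7, 10                         # data: 7 successes in 10 trials
a_post, b_post = a + y, b + n - y    # posterior: Beta(a + y, b + n - y)

m = 10                               # future trials to predict
p_hat = a_post / (a_post + b_post)   # posterior mean of p

# Posterior predictive: average Binomial(m, p) over posterior draws of p
p_draws = rng.beta(a_post, b_post, 100_000)
x_pred = rng.binomial(m, p_draws)

# A plug-in Binomial(m, p_hat) ignores parameter uncertainty,
# so the posterior predictive is strictly more dispersed
plug_in_var = m * p_hat * (1 - p_hat)
print(x_pred.var(), plug_in_var)
```

That extra variance is the practical difference: the posterior predictive propagates uncertainty about the parameter into predictions, rather than conditioning on a single point estimate.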
49 votes, 1 answer

Logistic regression: anova chi-square test vs. significance of coefficients (anova() vs summary() in R)

I have a logistic GLM model with 8 variables. I ran a chi-square test in R, anova(glm.model, test='Chisq'), and two of the variables turn out to be predictive when ordered at the top of the test and not so much when ordered at the bottom. The…
49 votes, 6 answers

What do "endogeneity" and "exogeneity" mean substantively?

I understand that the basic definition of endogeneity is that $X'\epsilon = 0$ is not satisfied, but what does this mean in a real-world sense? I read the Wikipedia article, with the supply and demand example, trying to make sense of it, but it…
user25901
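Substantively, endogeneity means a regressor is correlated with the error term, which makes OLS biased and inconsistent no matter how large the sample. A simulation sketch (the coefficient and correlation values are invented):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

u = rng.normal(size=n)            # structural error term
x = 0.8 * u + rng.normal(size=n)  # x is correlated with u: endogenous regressor
y = 1.0 * x + u                   # the true coefficient on x is 1.0

beta = (x @ y) / (x @ x)          # OLS slope (no intercept)
print(beta)                       # noticeably above 1.0: OLS is biased
```

The probability limit here is 1 + Cov(x, u)/Var(x) = 1 + 0.8/1.64 ≈ 1.49, so the bias does not shrink with more data; that is why instruments or structural assumptions are needed.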
49 votes, 5 answers

What is residual standard error?

When running a multiple regression model in R, one of the outputs is a residual standard error of 0.0589 on 95,161 degrees of freedom. I know that the 95,161 degrees of freedom is given by the difference between the number of observations in my…
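The quantity in the question above is simply the square root of the residual sum of squares divided by the residual degrees of freedom: observations minus estimated coefficients. A minimal sketch with simulated data (the design and noise level are arbitrary; the true error SD is 0.3, which the RSE should recover):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS fit
resid = y - X @ beta

df = n - X.shape[1]                 # degrees of freedom: n minus coefficients
rse = np.sqrt(resid @ resid / df)   # residual standard error
print(rse)                          # close to the true error SD of 0.3
```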
ustroetz
49 votes, 2 answers

Are splines overfitting the data?

My problem: I recently met a statistician who informed me that splines are only useful for exploring data and are subject to overfitting, and thus not useful in prediction. He preferred exploring with simple polynomials ... As I'm a big fan of…
Max Gordon
49 votes, 4 answers

Does correlation = 0.2 mean that there is an association "in only 1 in 5 people"?

In The Idiot Brain: A Neuroscientist Explains What Your Head is Really Up To, Dean Burnett wrote: The correlation between height and intelligence is usually cited as being about $0.2$, meaning height and intelligence seem to be associated in only…
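The quoted reading is a common misinterpretation: $r = 0.2$ is not a proportion of people. One conventional gloss is that $r^2 = 0.04$, so about 4% of the variance in one variable is shared with the other, across everyone. A sketch generating data whose population correlation is 0.2 by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

x = rng.normal(size=n)
# Mix x into y so that Corr(x, y) = 0.2 exactly in the population
y = 0.2 * x + np.sqrt(1 - 0.2**2) * rng.normal(size=n)

r = np.corrcoef(x, y)[0, 1]
print(r, r**2)   # r near 0.2; shared variance r**2 near 0.04
```

Every individual in this sample is drawn from the same weakly associated joint distribution; there is no subgroup of "1 in 5" for whom the association holds.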
Sitak
49 votes, 6 answers

Is Amazon's "average rating" misleading?

If I understand correctly, book ratings on a 1-5 scale are Likert scores. That is, a 3 for me may not necessarily be a 3 for someone else. It's an ordinal scale IMO. One shouldn't really average ordinal scales but can definitely take the mode,…
PhD
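The concern in the question above is easy to see with a toy example (ratings invented): on an ordinal scale, the mean can land on a value nobody chose, while the median and mode stay on the scale.

```python
import numpy as np

ratings = np.array([1, 1, 1, 1, 5, 5, 5])    # polarised reviews (invented data)

mean = ratings.mean()                        # ~2.71 -- a score no one actually gave
median = np.median(ratings)                  # 1.0
mode = np.bincount(ratings).argmax()         # 1, the most common rating
print(mean, median, mode)
```

Whether the "average rating" is misleading then depends on whether a 2.71 summary of a bimodal 1-vs-5 split is more or less informative than reporting the full distribution of stars.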
49 votes, 8 answers

Danger of setting all initial weights to zero in Backpropagation

Why is it dangerous to initialize weights with zeros? Is there any simple example that demonstrates it?
user8078
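A simple demonstration of the danger asked about above: in a two-layer tanh network with no biases, all-zero weights make every gradient exactly zero, so gradient descent never moves; more generally, any identical initialisation keeps hidden units identical to each other forever (the symmetry problem). A sketch with a hand-coded forward and backward pass (architecture and data are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(8, 3))                  # batch of 8 inputs, 3 features
y = rng.normal(size=(8, 1))                  # regression targets

W1 = np.zeros((3, 4))                        # all-zero initialisation
W2 = np.zeros((4, 1))

h = np.tanh(x @ W1)                          # hidden layer: tanh(0) = 0 everywhere
err = h @ W2 - y                             # output error

gW2 = h.T @ err                              # zero, because h is all zeros
gW1 = x.T @ ((err @ W2.T) * (1 - h**2))      # zero, because W2 is all zeros

print(np.abs(gW1).max(), np.abs(gW2).max())  # both exactly 0: no learning signal
```

Random initialisation breaks the symmetry, giving each hidden unit a distinct gradient from the first step.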
49 votes, 6 answers

Why do we use ReLU in neural networks and how do we use it?

Why do we use rectified linear units (ReLU) with neural networks? How does that improve a neural network? Why do we say that ReLU is an activation function? Isn't softmax the activation function for neural networks? I am guessing that we use both, ReLU…
user2896492634
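On the ReLU-versus-softmax point in the question above: they are not competing choices. ReLU is applied elementwise in hidden layers; softmax is typically used only at the output layer of a classifier, to turn scores into probabilities. A minimal sketch of both:

```python
import numpy as np

def relu(z):
    # Hidden-layer activation: elementwise max(0, z)
    return np.maximum(0.0, z)

def softmax(z):
    # Output-layer activation: maps scores to probabilities summing to 1
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

h = relu(np.array([-1.0, 0.5, 2.0]))     # negative inputs clipped to 0
p = softmax(np.array([1.0, 2.0, 3.0]))   # a valid probability vector
print(h, p, p.sum())
```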
49 votes, 1 answer

How does the Adam method of stochastic gradient descent work?

I'm familiar with basic gradient descent algorithms for training neural networks. I've read the paper proposing Adam: "Adam: A Method for Stochastic Optimization". While I've definitely got some insights (at least), the paper seems to be too high…
daniel451
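The core of Adam fits in a few lines: keep exponential moving averages of the gradient ($m$) and squared gradient ($v$), correct their initialisation bias, and scale each step by $\hat m / (\sqrt{\hat v} + \epsilon)$. A sketch minimising the invented toy objective $f(x) = x^2$, using the paper's default decay rates:

```python
import numpy as np

lr, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8   # b1, b2, eps are the paper's defaults
x = 3.0                                    # start away from the minimum of f(x) = x^2
m = v = 0.0

for t in range(1, 1001):
    g = 2 * x                              # gradient of f(x) = x^2
    m = b1 * m + (1 - b1) * g              # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g**2           # second-moment (uncentred variance) estimate
    m_hat = m / (1 - b1**t)                # bias corrections for zero initialisation
    v_hat = v / (1 - b2**t)
    x -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(x)   # close to 0, the minimiser
```

The division by $\sqrt{\hat v}$ gives each parameter its own effective step size, which is what distinguishes Adam from plain SGD with momentum.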
49 votes, 5 answers

How to make a time series stationary?

Besides taking differences, what are other techniques for making a non-stationary time series stationary? Ordinarily one refers to a series as "integrated of order $p$" if it can be made stationary by applying the difference operator, $(1-L)^p X_t$.
Shane
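Common alternatives to differencing include detrending (regressing out a deterministic trend), log or Box-Cox transforms to stabilise variance, and seasonal differencing. As a baseline, here is the differencing case itself: a first difference turns a linear trend into a constant mean (the series is invented):

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(500)
y = 0.5 * t + rng.normal(size=500)   # linear trend: mean grows without bound

d = np.diff(y)                       # first difference, (1 - L) y_t

# The differenced series has constant mean equal to the slope, 0.5
print(d.mean())
```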
49 votes, 3 answers

How does saddlepoint approximation work?

How does saddlepoint approximation work? What sort of problem is it good for? (Feel free to use a particular example or examples by way of illustration) Are there any drawbacks, difficulties, things to watch out for, or traps for the unwary?
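The basic recipe: given the cumulant generating function $K(s)$, solve the saddlepoint equation $K'(\hat s) = x$, then approximate the density by $\hat f(x) = \exp\{K(\hat s) - \hat s x\} / \sqrt{2\pi K''(\hat s)}$. A sketch for a Gamma$(n, 1)$ variable (a sum of $n$ unit exponentials), where the approximation can be checked against the exact density; the values of $n$ and $x$ are arbitrary:

```python
import numpy as np
from math import factorial

n, x = 10, 12.0   # evaluate the Gamma(n, 1) density at x (arbitrary choices)

# CGF of Gamma(n, 1): K(s) = -n*log(1 - s), so K'(s) = n/(1 - s), K''(s) = n/(1 - s)**2
s_hat = 1 - n / x                 # solves the saddlepoint equation K'(s_hat) = x
K = -n * np.log(1 - s_hat)
K2 = n / (1 - s_hat) ** 2

approx = np.exp(K - s_hat * x) / np.sqrt(2 * np.pi * K2)
exact = x ** (n - 1) * np.exp(-x) / factorial(n - 1)
print(approx, exact, approx / exact)   # relative error under 1% here
```

For the Gamma the approximation is exact up to a Stirling-series normalising constant, which is why renormalised saddlepoint densities are often remarkably accurate even far into the tails.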
49 votes, 1 answer

Difference between GradientDescentOptimizer and AdamOptimizer (TensorFlow)?

I've written a simple MLP in TensorFlow which models an XOR gate. So for: input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] it should produce the following: output_data = [[0.], [1.], [1.], [0.]] The network has an input layer, a hidden…
49 votes, 5 answers

How does rectilinear activation function solve the vanishing gradient problem in neural networks?

I found the rectified linear unit (ReLU) praised in several places as a solution to the vanishing gradient problem for neural networks. That is, one uses max(0, x) as the activation function. When the activation is positive, it is obvious that this is better…
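The arithmetic behind the claim: the logistic sigmoid's derivative never exceeds 0.25, and backpropagation multiplies one such factor per layer, so even in the best case the gradient shrinks geometrically with depth. The ReLU derivative is exactly 1 wherever the unit is active, so those factors do not shrink. A sketch (the depth is arbitrary):

```python
import numpy as np

def dsigmoid(z):
    # Derivative of the logistic sigmoid; its maximum is 0.25, at z = 0
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

depth = 10

# Best case for sigmoid: every layer contributes its maximal factor of 0.25
sigmoid_gain = dsigmoid(0.0) ** depth   # 0.25**10, already under 1e-6
relu_gain = 1.0 ** depth                # ReLU contributes 1 for active units

print(sigmoid_gain, relu_gain)
```

The caveat, and the subject of the question's "When the activation is positive", is that inactive ReLU units contribute a factor of 0 instead, which is the dying-ReLU problem rather than a vanishing gradient.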