Highest Voted Questions - Statistical Analysis Stack Exchange

66

votes

7 answers

Which activation function for output layer?

While the choice of activation functions for the hidden layer is quite clear (mostly sigmoid or tanh), I wonder how to decide on the activation function for the output layer. Common choices are linear functions, sigmoid functions and softmax…

neural-networks

asked Jun 12 '16 at 14:42

Funkwecker

2,432
5
24
43

66

votes

7 answers

How much to pay? A practical problem

This is not a homework question but a real problem faced by our company. Very recently (2 days ago) we placed an order with a dealer for the manufacture of 10000 product labels. The dealer is an independent person. He gets the labels manufactured from outside and…

probability bayesian model decision-theory

asked Jan 21 '16 at 15:06

Neeraj

2,150
18
28

66

votes

5 answers

How to statistically compare two time series?

I have two time series, shown in the plot below: The plot is showing the full detail of both time series, but I can easily reduce it to just the coincident observations if needed. My question is: What statistical methods can I use to assess the…

r time-series

asked Nov 29 '11 at 15:28

robintw

1,977
4
24
23

66

votes

3 answers

Maximum likelihood method vs. least squares method

What is the main difference between maximum likelihood estimation (MLE) and least squares estimation (LSE)? Why can't we use MLE for predicting $y$ values in linear regression and vice versa? Any help on this topic will be greatly appreciated.

regression estimation maximum-likelihood least-squares

asked Mar 27 '15 at 14:54

evros

751
2
7
6

66

votes

5 answers

Why do we minimize the negative likelihood if it is equivalent to maximization of the likelihood?

This question has puzzled me for a long time. I understand the use of 'log' in maximizing the likelihood so I am not asking about 'log'. My question is, since maximizing log likelihood is equivalent to minimizing "negative log likelihood" (NLL), why…

maximum-likelihood likelihood

asked Mar 10 '15 at 05:05

Tony

1,583
4
15
20

66

votes

12 answers

What does orthogonal mean in the context of statistics?

In other contexts, orthogonal means "at right angles" or "perpendicular". What does orthogonal mean in a statistical context? Thanks for any clarifications.

descriptive-statistics

asked Jun 20 '11 at 12:38

pmgjones

5,543
8
36
36

66

votes

3 answers

Why does ridge estimate become better than OLS by adding a constant to the diagonal?

I understand that the ridge regression estimate is the $\beta$ that minimizes residual sum of square and a penalty on the size of $\beta$ $$\beta_\mathrm{ridge} = (\lambda I_D + X'X)^{-1}X'y = \operatorname{argmin}\big[ \text{RSS} + \lambda…
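
For readers skimming the list, the closed form quoted in this excerpt can be tried numerically. The following is only a minimal sketch, assuming NumPy and synthetic data; the function name `ridge_estimate`, the simulated `X` and `y`, and the value `lam=10.0` are illustrative choices, not part of the original question.

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Closed-form estimate (lam * I_D + X'X)^{-1} X'y from the excerpt above."""
    D = X.shape[1]
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(lam * np.eye(D) + X.T @ X, X.T @ y)

# Synthetic data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=50)

beta_ols = ridge_estimate(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_estimate(X, y, lam=10.0)  # lam > 0 shrinks the coefficients

print("OLS  :", beta_ols)
print("ridge:", beta_ridge)
```

With `lam=0.0` the function reproduces the OLS solution, and any positive `lam` yields a coefficient vector with a smaller Euclidean norm, which is the shrinkage the question asks about.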

regression least-squares ridge-regression regularization

asked Oct 11 '14 at 18:52

Heisenberg

4,239
3
23
54

66

votes

4 answers

Random Forest - How to handle overfitting

I have a computer science background but am trying to teach myself data science by solving problems on the internet. I have been working on this problem for the last couple of weeks (approx 900 rows and 10 features). I was initially using logistic…

random-forest overfitting

asked Aug 15 '14 at 04:39

Abhi

1,269
3
13
17

66

votes

4 answers

Intuitive explanation of Fisher Information and Cramer-Rao bound

I am not comfortable with Fisher information, what it measures, and how it is helpful. Also, its relationship with the Cramer-Rao bound is not apparent to me. Can someone please give an intuitive explanation of these concepts?

estimation intuition fisher-information

asked May 09 '11 at 20:43

Infinity

893
1
8
7

65

votes

2 answers

Why only three partitions? (training, validation, test)

When you are trying to fit models to a large dataset, the common advice is to partition the data into three parts: the training, validation, and test dataset. This is because the models usually have three "levels" of parameters: the first…

machine-learning model-selection data-mining

asked Apr 08 '11 at 14:45

charles.y.zheng

7,346
2
28
32

65

votes

2 answers

What is the difference between a partial likelihood, profile likelihood and marginal likelihood?

I see these terms being used and I keep getting them mixed up. Is there a simple explanation of the differences between them?

estimation maximum-likelihood

asked Jul 26 '10 at 09:12

Rob Hyndman

51,928
23
126
178

65

votes

6 answers

Real-life examples of moving average processes

Can you give some real-life examples of time series for which a moving average process of order $q$, i.e. $$ y_t = \sum_{i=1}^q \theta_i \varepsilon_{t-i} + \varepsilon_t, \text{ where } \varepsilon_t \sim \mathcal{N}(0, \sigma^2) $$ has some a…
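
The MA($q$) definition quoted in this excerpt can be simulated directly. This is a rough sketch, assuming NumPy; the function name `simulate_ma`, the coefficients `[0.6, 0.3]`, and the series length are hypothetical values chosen only to illustrate the formula.

```python
import numpy as np

def simulate_ma(theta, n, sigma=1.0, seed=0):
    """Simulate y_t = sum_{i=1}^q theta_i * eps_{t-i} + eps_t with Gaussian shocks."""
    theta = np.asarray(theta, dtype=float)
    q = len(theta)
    rng = np.random.default_rng(seed)
    # Draw q extra shocks so the first observation has a full set of lagged terms.
    eps = rng.normal(0.0, sigma, size=n + q)
    # Weight 1 on eps_t, then theta_1, ..., theta_q on the lagged shocks.
    weights = np.concatenate(([1.0], theta))
    # 'valid' mode returns exactly n values: y[k] = sum_j weights[j] * eps[k + q - j].
    return np.convolve(eps, weights, mode="valid")

y = simulate_ma(theta=[0.6, 0.3], n=500)
print(len(y), y[:5])
```

Because each value depends only on the current shock and the previous $q$ shocks, the theoretical autocorrelation of such a series is zero beyond lag $q$.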

time-series arima interpretation moving-average

asked Dec 03 '12 at 19:02

weez13

1,127
2
9
12

65

votes

14 answers

What is the most surprising characterization of the Gaussian (normal) distribution?

A standardized Gaussian distribution on $\mathbb{R}$ can be defined by giving explicitly its density: $$ \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$$ or its characteristic function. As recalled in this question it is also the only distribution for which the…

probability normal-distribution mathematical-statistics characteristic-function

asked Nov 09 '10 at 20:19

robin girard

6,335
6
46
60

65

votes

12 answers

Why do neural networks need so many training examples to perform?

A human child at age 2 needs around 5 instances of a car to be able to identify it with reasonable accuracy regardless of color, make, etc. When my son was 2, he was able to identify trams and trains, even though he had seen just a few. Since he was…

neural-networks neuroscience

asked Feb 24 '19 at 14:07

Marcin

917
1
7
11

65

votes

4 answers

Does it make sense to add a quadratic term but not the linear term to a model?

I have a (mixed) model in which one of my predictors should a priori only be quadratically related to the response (due to the experimental manipulation). Hence, I would like to add only the quadratic term to the model. Two things keep me from…

regression polynomial

asked May 18 '12 at 13:34

Henrik

13,314
9
63
123
