Most Popular
1500 questions
50
votes
2 answers
What is the distribution of the sum of non i.i.d. gaussian variates?
If $X$ is distributed $N(\mu_X, \sigma^2_X)$,
$Y$ is distributed $N(\mu_Y, \sigma^2_Y)$
and $Z = X + Y$, I know that $Z$ is distributed $N(\mu_X + \mu_Y, \sigma^2_X + \sigma^2_Y)$ if $X$ and $Y$ are independent.
But what would happen if $X$ and $Y$ were not…

JCWong
- 1,392
- 1
- 15
- 29
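
The independent case stated in the question above can be checked numerically. The following is a minimal simulation sketch, not part of the original question; the function name `simulate_sum` and the parameter values are illustrative assumptions, and it covers only independent $X$ and $Y$.

```python
import random
import statistics

def simulate_sum(mu_x, sigma_x, mu_y, sigma_y, n=100_000, seed=0):
    """Draw independent X and Y; return the sample mean and variance of Z = X + Y."""
    rng = random.Random(seed)
    # Two separate gauss() calls per draw, so X and Y are independent by construction.
    z = [rng.gauss(mu_x, sigma_x) + rng.gauss(mu_y, sigma_y) for _ in range(n)]
    return statistics.fmean(z), statistics.pvariance(z)

# Illustrative parameters: X ~ N(1, 2^2), Y ~ N(-3, 1.5^2)
mean_z, var_z = simulate_sum(1.0, 2.0, -3.0, 1.5)
# Theory for the independent case: mean = 1 + (-3) = -2, variance = 4 + 2.25 = 6.25
print(mean_z, var_z)
```

The sample mean and variance should land close to the theoretical values; the simulation says nothing about the dependent case the question goes on to ask about.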
50
votes
6 answers
What algorithm is used in linear regression?
I usually hear about "ordinary least squares". Is that the most widely used algorithm for linear regression? Are there reasons to use a different one?

Belmont
- 1,273
- 3
- 12
- 16
50
votes
5 answers
Is minimizing squared error equivalent to minimizing absolute error? Why is squared error more popular than the latter?
When we conduct linear regression $y=ax+b$ to fit a bunch of data points $(x_1,y_1),(x_2,y_2),...,(x_n,y_n)$, the classic approach minimizes the squared error. I have long been puzzled by the question of whether minimizing the squared error will yield the…

Tony
- 1,583
- 4
- 15
- 20
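
To make the two criteria in the question above concrete, here is a small sketch that scores candidate lines $y=ax+b$ under both. The data points and the two candidate lines are made up for illustration and do not come from the question.

```python
def squared_error(a, b, points):
    """Sum of squared residuals of the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)

def absolute_error(a, b, points):
    """Sum of absolute residuals of the line y = a*x + b."""
    return sum(abs(y - (a * x + b)) for x, y in points)

# Hypothetical data: four points on y = x plus one outlier.
points = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0), (4, 12.0)]

line_a = (1.0, 0.0)    # passes through the four inliers
line_b = (2.6, -1.6)   # tilted towards the outlier

for name, (a, b) in [("A", line_a), ("B", line_b)]:
    print(name, squared_error(a, b, points), absolute_error(a, b, points))
```

On this data, line A has the smaller absolute error while line B has the smaller squared error, so the two criteria can rank the same pair of lines differently.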
50
votes
5 answers
Dynamic Time Warping Clustering
What would be the approach to use Dynamic Time Warping (DTW) to perform clustering of time series?
I have read about DTW as a way to find similarity between two time series even when they are shifted in time. Can I use this method as a similarity…

Kobe-Wan Kenobi
- 2,437
- 3
- 20
- 33
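
As background for the question above, the following is a minimal sketch of the textbook dynamic-programming form of DTW, using absolute difference as the local cost. The function name `dtw_distance` and the example series are assumptions for illustration; this is not a clustering method in itself, only the pairwise dissimilarity that such a method would need.

```python
def dtw_distance(s, t):
    """DTW distance between two numeric sequences (absolute-difference local cost)."""
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning s[:i] with t[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]  # same shape as a, shifted by one step

print(dtw_distance(a, b))                      # 0.0: warping absorbs the shift
print(sum(abs(x - y) for x, y in zip(a, b)))   # 4: pointwise comparison does not
```

Computing `dtw_distance` for every pair of series gives a dissimilarity matrix; whether a given clustering algorithm behaves well with such a matrix is a separate question that this sketch does not address.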
50
votes
2 answers
Is it unusual for the MEAN to outperform ARIMA?
I recently applied a range of forecasting methods (MEAN, RWF, ETS, ARIMA and MLPs) and found that MEAN did surprisingly well. (MEAN: where all future predictions are predicted as being equal to the arithmetic mean of the observed values.) MEAN even…

Andy T
- 1,014
- 3
- 10
- 16
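
The MEAN method as described in the question above amounts to very little code. A minimal sketch, with the function name `mean_forecast` and the sample values chosen purely for illustration:

```python
from statistics import fmean

def mean_forecast(observed, horizon):
    """Forecast every future step as the arithmetic mean of the observed values."""
    level = fmean(observed)
    return [level] * horizon

# Hypothetical series of four observations, forecast three steps ahead.
print(mean_forecast([3.0, 5.0, 4.0, 8.0], horizon=3))  # [5.0, 5.0, 5.0]
```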
50
votes
6 answers
Percentage of overlapping regions of two normal distributions
I was wondering, given two normal distributions with $\sigma_1,\ \mu_1$ and $\sigma_2,\ \mu_2$, how can I calculate the percentage of overlapping regions of the two distributions?
I suppose this problem has a specific name; are you aware of any…

Ali Salehi
- 603
- 1
- 6
- 5
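
One possible reading of "overlap" in the question above is the area under the pointwise minimum of the two densities (often called the overlapping coefficient). That reading is an assumption, not something the question states. Under it, Python's standard library offers `statistics.NormalDist.overlap`; the wrapper name `overlap_percent` below is hypothetical.

```python
from statistics import NormalDist

def overlap_percent(mu1, sigma1, mu2, sigma2):
    """Overlapping area of two normal densities, as a percentage (0 to 100)."""
    return 100.0 * NormalDist(mu1, sigma1).overlap(NormalDist(mu2, sigma2))

print(overlap_percent(0.0, 1.0, 0.0, 1.0))  # identical distributions: 100.0
print(overlap_percent(0.0, 1.0, 1.0, 1.0))  # unit variances, means one apart
```

For equal standard deviations the second value can be checked by hand as $2\,\Phi(-|\mu_1-\mu_2|/(2\sigma))$, which is about 61.7% here.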
50
votes
17 answers
What is your favorite data visualization blog?
What is the best blog on data visualization?
I'm making this question a community wiki since it is highly subjective. Please limit each answer to one link.
Please note the following criteria for proposed answers:
[A]cceptable answers to…

Shane
- 11,961
- 17
- 71
- 89
49
votes
4 answers
How to interpret coefficients from a polynomial model fit?
I'm trying to create a second order polynomial fit to some data I have. Let's say I plot this fit with ggplot():
    ggplot(data, aes(foo, bar)) + geom_point() +
      geom_smooth(method="lm", formula=y~poly(x, 2))
I get:
So, a second order fit…

user13907
- 687
- 1
- 6
- 7
49
votes
3 answers
What is the distribution of the Euclidean distance between two normally distributed random variables?
Assume you are given two objects whose exact locations are unknown, but are distributed according to normal distributions with known parameters (e.g. $a \sim N(m, s)$ and $b \sim N(v, t)$). We can assume these are both bivariate normals, such that…

Nick
- 3,327
- 6
- 28
- 24
49
votes
4 answers
Approximate order statistics for normal random variables
Are there well-known formulas for the order statistics of certain random distributions? Particularly the first and last order statistics of a normal random variable, but a more general answer would also be appreciated.
Edit: To clarify, I am…

Chris Taylor
- 3,432
- 1
- 25
- 29
49
votes
3 answers
Derive Variance of regression coefficient in simple linear regression
In simple linear regression, we have $y = \beta_0 + \beta_1 x + u$, where $u \sim iid\;\mathcal N(0,\sigma^2)$. I derived the estimator:
$$
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}\ ,
$$
where $\bar{x}$…

mynameisJEFF
- 1,583
- 4
- 24
- 29
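
The slope estimator quoted in the question above translates directly into code. A minimal sketch, with the function name `slope_estimate` and the sample data as illustrative assumptions; it computes only the point estimate, not the variance the question asks about.

```python
from statistics import fmean

def slope_estimate(x, y):
    """OLS slope: sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)."""
    x_bar, y_bar = fmean(x), fmean(y)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator

# Hypothetical noise-free data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
print(slope_estimate(x, y))  # 2.0
```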
49
votes
3 answers
AIC,BIC,CIC,DIC,EIC,FIC,GIC,HIC,IIC --- Can I use them interchangeably?
On p. 34 of his PRNN Brian Ripley comments that "The AIC was named by Akaike (1974) as 'An Information Criterion' although it seems commonly believed that the A stands for Akaike". Indeed, when introducing the AIC statistic, Akaike (1974, p.719)…

Hibernating
- 3,723
- 2
- 21
- 34
49
votes
4 answers
How to calculate relative error when the true value is zero?
How do I calculate relative error when the true value is zero?
Say I have $x_{true} = 0$ and $x_{test}$. If I define relative error as:
$$\text{relative error} = \frac{x_{true}-x_{test}}{x_{true}}$$
Then the relative error is always undefined. If…

okj
- 613
- 1
- 6
- 5
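
The problem described in the question above shows up immediately when the definition is coded as written. A minimal sketch; the function name `relative_error` and the choice to return `None` for the undefined case are assumptions made for illustration, not a recommended fix.

```python
def relative_error(x_true, x_test):
    """Relative error (x_true - x_test) / x_true, or None when x_true is zero."""
    if x_true == 0:
        return None  # the ratio is undefined: division by zero
    return (x_true - x_test) / x_true

print(relative_error(2.0, 1.5))  # 0.25
print(relative_error(0.0, 0.1))  # None, whatever x_test is
```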
49
votes
2 answers
Interpretation of R's output for binomial regression
I'm quite new to tests with binomial data, but needed to do one and now I'm not sure how to interpret the outcome. The y-variable, the response variable, is binomial and the explanatory factors are continuous. This is what I got when…

user40116
- 501
- 1
- 5
- 4
49
votes
3 answers
Empirical justification for the one standard error rule when using cross-validation
Are there any empirical studies justifying the use of the one standard error rule in favour of parsimony? Obviously it depends on the data-generating process, but anything which analyses a large corpus of datasets would be a very…

DavidShor
- 1,281
- 1
- 11
- 18
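
For readers unfamiliar with the rule named in the question above, here is a minimal sketch of how it is commonly stated: among the candidate models, pick the simplest one whose cross-validation error is within one standard error of the lowest error. The function name `one_se_choice`, the tuple layout, and all numbers are hypothetical, and the sketch takes no position on the empirical justification the question asks for.

```python
def one_se_choice(candidates):
    """candidates: list of (complexity, cv_error, cv_standard_error) tuples.

    Returns the complexity of the simplest candidate whose CV error is at most
    the best CV error plus that best candidate's standard error.
    """
    best = min(candidates, key=lambda c: c[1])
    threshold = best[1] + best[2]
    eligible = [c for c in candidates if c[1] <= threshold]
    return min(eligible, key=lambda c: c[0])[0]

# Hypothetical CV results: (complexity, mean CV error, standard error)
candidates = [(1, 0.40, 0.03), (2, 0.31, 0.03), (3, 0.29, 0.03), (4, 0.30, 0.03)]
print(one_se_choice(candidates))  # 2: simpler than the minimum-error model (3)
```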