Most Popular

1500 questions
50 votes, 2 answers

What is the distribution of the sum of non i.i.d. gaussian variates?

If $X$ is distributed $N(\mu_X, \sigma^2_X)$, $Y$ is distributed $N(\mu_Y, \sigma^2_Y)$ and $Z = X + Y$, I know that $Z$ is distributed $N(\mu_X + \mu_Y, \sigma^2_X + \sigma^2_Y)$ if $X$ and $Y$ are independent. But what would happen if $X$ and $Y$ were not…
JCWong
  • 1,392
  • 1
  • 15
  • 29
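The independent case stated in the question can be checked numerically. The sketch below is illustrative only: the parameter values, the sample size, and the helper name `simulate_sum` are assumptions, not taken from the question. It uses only Python's standard library and does not address the dependent case the question goes on to ask about.

```python
import random
import statistics

def simulate_sum(mu_x, sigma_x, mu_y, sigma_y, n=200_000, seed=0):
    """Draw n independent pairs (X, Y) and summarize Z = X + Y.

    Returns the sample mean and (population) variance of Z.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z = [rng.gauss(mu_x, sigma_x) + rng.gauss(mu_y, sigma_y) for _ in range(n)]
    return statistics.fmean(z), statistics.pvariance(z)

# Hypothetical parameters: X ~ N(1, 2^2), Y ~ N(-3, 1.5^2)
mean_z, var_z = simulate_sum(1.0, 2.0, -3.0, 1.5)
print(mean_z, var_z)  # expected to be near -2.0 and 2.0**2 + 1.5**2 = 6.25
```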
50 votes, 6 answers

What algorithm is used in linear regression?

I usually hear about "ordinary least squares". Is that the most widely used algorithm for linear regression? Are there reasons to use a different one?
50 votes, 5 answers

Is minimizing squared error equivalent to minimizing absolute error? Why is squared error more popular than the latter?

When we conduct linear regression $y=ax+b$ to fit a bunch of data points $(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)$, the classic approach minimizes the squared error. I have long been puzzled by one question: will minimizing the squared error yield the…
Tony
  • 1,583
  • 4
  • 15
  • 20
50 votes, 5 answers

Dynamic Time Warping Clustering

What would be the approach for using Dynamic Time Warping (DTW) to cluster time series? I have read about DTW as a way to find similarity between two time series, even when they are shifted in time. Can I use this method as a similarity…
Kobe-Wan Kenobi
  • 2,437
  • 3
  • 20
  • 33
50 votes, 2 answers

Is it unusual for the MEAN to outperform ARIMA?

I recently applied a range of forecasting methods (MEAN, RWF, ETS, ARIMA and MLPs) and found that MEAN did surprisingly well. (MEAN: where all future predictions are predicted as being equal to the arithmetic mean of the observed values.) MEAN even…
Andy T
  • 1,014
  • 3
  • 10
  • 16
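The MEAN method as described in the question (every future value predicted as the arithmetic mean of the observed values) can be sketched in a few lines. The function name `mean_forecast`, its parameters, and the sample data are hypothetical and chosen for illustration; this is not the asker's code.

```python
from statistics import fmean

def mean_forecast(observed, horizon):
    """Predict each of the next `horizon` steps as the mean of `observed`."""
    if not observed:
        raise ValueError("need at least one observed value")
    level = fmean(observed)   # arithmetic mean of the history
    return [level] * horizon  # the same flat prediction for every future step

# Hypothetical series, forecast three steps ahead
print(mean_forecast([3.0, 5.0, 4.0], horizon=3))  # [4.0, 4.0, 4.0]
```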
50 votes, 6 answers

Percentage of overlapping regions of two normal distributions

I was wondering, given two normal distributions with $\sigma_1,\ \mu_1$ and $\sigma_2,\ \mu_2$, how can I calculate the percentage of overlapping regions of the two distributions? I suppose this problem has a specific name, are you aware of any…
Ali Salehi
  • 603
  • 1
  • 6
  • 5
50 votes, 17 answers

What is your favorite data visualization blog?

What is the best blog on data visualization? I'm making this question a community wiki since it is highly subjective. Please limit each answer to one link. Please note the following criteria for proposed answers: [A]cceptable answers to…
Shane
  • 11,961
  • 17
  • 71
  • 89
49 votes, 4 answers

How to interpret coefficients from a polynomial model fit?

I'm trying to create a second order polynomial fit to some data I have. Let's say I plot this fit with ggplot(): ggplot(data, aes(foo, bar)) + geom_point() + geom_smooth(method="lm", formula=y~poly(x, 2)) I get: So, a second order fit…
user13907
  • 687
  • 1
  • 6
  • 7
49 votes, 3 answers

What is the distribution of the Euclidean distance between two normally distributed random variables?

Assume you are given two objects whose exact locations are unknown, but are distributed according to normal distributions with known parameters (e.g. $a \sim N(m, s)$ and $b \sim N(v, t)$). We can assume these are both bivariate normals, such that…
Nick
  • 3,327
  • 6
  • 28
  • 24
49 votes, 4 answers

Approximate order statistics for normal random variables

Are there well known formulas for the order statistics of certain random distributions? Particularly the first and last order statistics of a normal random variable, but a more general answer would also be appreciated. Edit: To clarify, I am…
49 votes, 3 answers

Derive Variance of regression coefficient in simple linear regression

In simple linear regression, we have $y = \beta_0 + \beta_1 x + u$, where $u \sim iid\;\mathcal N(0,\sigma^2)$. I derived the estimator: $$ \hat{\beta_1} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}\ , $$ where $\bar{x}$…
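The slope estimator $\hat{\beta_1}$ quoted in the question translates directly into code. The following is a minimal sketch: the function name `slope_estimate` and the toy data are assumptions for illustration, and it computes only the point estimate, not the variance the question asks to derive.

```python
from statistics import fmean

def slope_estimate(xs, ys):
    """Slope estimate: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("all x values are identical; slope is undefined")
    return numerator / denominator

# Hypothetical noise-free data lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(slope_estimate(xs, ys))  # 2.0
```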
49 votes, 3 answers

AIC,BIC,CIC,DIC,EIC,FIC,GIC,HIC,IIC --- Can I use them interchangeably?

On p. 34 of his PRNN Brian Ripley comments that "The AIC was named by Akaike (1974) as 'An Information Criterion' although it seems commonly believed that the A stands for Akaike". Indeed, when introducing the AIC statistic, Akaike (1974, p.719)…
Hibernating
  • 3,723
  • 2
  • 21
  • 34
49 votes, 4 answers

How to calculate relative error when the true value is zero?

How do I calculate relative error when the true value is zero? Say I have $x_{true} = 0$ and $x_{test}$. If I define relative error as: $$\text{relative error} = \frac{x_{true}-x_{test}}{x_{true}}$$ Then the relative error is always undefined. If…
okj
  • 613
  • 1
  • 6
  • 5
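The definition of relative error given in the question, and the reason it breaks down at $x_{true} = 0$, can be made concrete with a small sketch. The function name `relative_error` and the choice to return `None` for the undefined case are assumptions made here for illustration; the sketch does not propose an alternative error measure.

```python
def relative_error(x_true, x_test):
    """Relative error as defined in the question: (x_true - x_test) / x_true.

    Returns None when x_true is zero, where the ratio is undefined.
    """
    if x_true == 0:
        return None  # division by zero: the definition does not apply
    return (x_true - x_test) / x_true

print(relative_error(2.0, 1.5))  # 0.25
print(relative_error(0.0, 0.1))  # None
```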
49 votes, 2 answers

Interpretation of R's output for binomial regression

I'm quite new to binomial data tests, but needed to do one and now I'm not sure how to interpret the outcome. The y-variable, the response variable, is binomial and the explanatory factors are continuous. This is what I got when…
49 votes, 3 answers

Empirical justification for the one standard error rule when using cross-validation

Are there any empirical studies justifying the use of the one standard error rule in favour of parsimony? Obviously it depends on the data-generating process, but anything which analyses a large corpus of datasets would be a very…
DavidShor
  • 1,281
  • 1
  • 11
  • 18