I'm confused by the widely used approach to computing the normalized cross-correlation, which goes something like this:
- Standardize the argument vectors (of equal length $n$) over their full length.
- Slide one over the other, computing the dot product of the overlapping parts at each lag (correct at least for real-valued vectors).
- Optionally, draw a 95% confidence interval at $\pm\frac{2}{\sqrt{n}}$.
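To make this concrete, here is a minimal NumPy sketch of what I mean by the common approach (this is my own reconstruction; the function name and details are assumptions, not the gist's actual code):

```python
import numpy as np

def xcorr_common_sketch(x, y):
    """Common approach: standardize each full vector once, then slide
    and take the dot product of the overlapping parts at each lag."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    lags = np.arange(-(n - 1), n)
    # For lag k, pair x_t with y_{t+k}; dividing by n (not by the
    # overlap length) is what makes r(0) = 1 for autocorrelation.
    r = np.array([np.dot(xs[max(0, -k):n - max(0, k)],
                         ys[max(0, k):n - max(0, -k)]) / n
                  for k in lags])
    ci = 2 / np.sqrt(n)  # approximate 95% band, same width at every lag
    return lags, r, ci
```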
The autocorrelation of an $\mathcal{N}(0,1)$ sample computed by this method looks like this:
I can reproduce such a result using the `xcorr_common` function from this gist:
But this method seems confusing to me, because the arguments are standardized over their full length and not over each overlap. Moreover, the confidence interval derived from the full length is too narrow for the shorter overlaps: with $n = 100$ and lag $k = 90$, only $10$ points actually overlap, so the appropriate band would be $\pm 2/\sqrt{10} \approx \pm 0.63$, not $\pm 2/\sqrt{100} = \pm 0.2$.
I think the correct (or at least not-less-correct) result should look like this:
This example is generated using the `xcorr` function from the same gist.
The confidence interval here is computed as $\pm\frac{2}{\sqrt{n-k}}$, where $k$ is the lag and $n-k$ is the length of the available overlap; the overlapping segments are standardized independently at each lag.
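Here is a matching sketch of the per-lag variant I have in mind (again my own reconstruction, not the actual `xcorr` from the gist):

```python
import numpy as np

def xcorr_per_lag(x, y):
    """Alternative: at each lag, standardize only the overlapping
    segments and widen the confidence band as the overlap shrinks."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    lags = np.arange(-(n - 2), n - 1)  # keep overlaps of length >= 2
    r, ci = [], []
    for k in lags:
        xo = x[max(0, -k):n - max(0, k)]  # overlapping part of x
        yo = y[max(0, k):n - max(0, -k)]  # overlapping part of y
        m = len(xo)                       # overlap length n - |k|
        xz = (xo - xo.mean()) / xo.std()
        yz = (yo - yo.mean()) / yo.std()
        r.append(np.dot(xz, yz) / m)      # plain Pearson correlation of the overlap
        ci.append(2 / np.sqrt(m))         # per-lag 95% band, ±2/sqrt(n - |k|)
    return lags, np.array(r), np.array(ci)

# Quick check on white noise: roughly 5% of lags should fall outside the band.
rng = np.random.default_rng(0)
lags, r, ci = xcorr_per_lag(rng.standard_normal(500), rng.standard_normal(500))
```

Note that at lag $k$ this is just the plain Pearson correlation of the two overlapping segments, which is why each overlap gets its own standardization and its own band width.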
After this introduction I can ask my question: am I dead wrong with this strange "proper" approach, or am I just reinventing the wheel? :) Is it safe to work with cross-correlation calculated by the latter approach? Is it really more "proper" than the commonly used one? If so, why does even R use the former method?
Thanks in advance.
* For basic info on confidence intervals for cross-correlation, refer to:
- Stats StackExchange answer by Rob Hyndman: https://stats.stackexchange.com/a/3128/43304
- Guidelines for testing the autocorrelation or cross correlation
As whuber points out in his comment, this question is closely related to a number of already existing ones:
...and probably others with pretty good answers.