
Working in Python, I receive data at regular intervals. The data contains some features, $X_1,\dots,X_p$, and I am trying to find an online algorithm to build correlation matrices. The naive approach of keeping track of the last $n$ instances of $X$ works, but it is rather slow.

I have found ways to get the variance and covariance of the whole sample in one or two passes, but that doesn't seem to be what I am looking for. I am looking for a way to update a current calculation with mostly the last value (and removing the oldest value), in a running manner. Example: for a running sum, you can keep track of a cumulative sum, only adding and removing one value at each time step.
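Something like this minimal sketch of the running-sum idea (the circular buffer and names are just for illustration):

```
import numpy as np

n = 5
buf = np.zeros(n)   # circular buffer of the last n values
total = buf.sum()   # running sum over the window
i = 0               # index of the oldest value in the buffer

def push(x_new):
    """O(1) update: add the new value, drop the oldest."""
    global total, i
    total += x_new - buf[i]
    buf[i] = x_new
    i = (i + 1) % n
```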

Any idea how to do this? And any idea how to do it in a vectorized way, so as to get matrices as output?

Richard Hardy
lcrmorin
  • Is $n$ small enough to store the past $n$ observations in RAM? – whuber Dec 16 '21 at 20:07
  • Is there some reason why you can't just compute an online covariance matrix (which you say you know how to do), and then compute the correlations as $cov(x,y)/\sqrt{var(x)var(y)}$? – Mike Hawk Dec 17 '21 at 02:04
  • @whuber: yes it is. I am able to keep track of the last n observations in an array and perform the whole correlation calculation at each time step. However, it is rather slow, and I am looking for something faster, where I could just add an observation (and remove the oldest one), along the lines of the running-sum example I added. – lcrmorin Dec 17 '21 at 08:41
  • @MikeHawk: it seems that the 'online' algorithms mostly discussed online are ways to calculate the correlation by passing through a sample of n instances, which is not what I am looking for. I am looking for a way to update a correlation matrix given a new observation (and removing the oldest observation). – lcrmorin Dec 17 '21 at 08:44
  • There are standard formulas, as @MikeHawk alludes, for efficiently updating a covariance matrix with one new observation. They can be adapted to *removing* an observation, too. Apply them both to obtain your online algorithm. – whuber Dec 17 '21 at 15:04
  • You might think about an updating Woodbury method applied to the problem: https://mathworld.wolfram.com/WoodburyFormula.html. There is also a formula for running correlations that works with a running mean. – EngrStudent Dec 17 '21 at 16:05
  • In addition to implementing a numerically more stable algorithm as offered by @Onyambu here, consider periodically recomputing the window statistics directly from the data in the buffer, thereby restarting the update process. If you don't, eventually your statistics will follow a random walk away from their true values and ultimately become useless (you can even obtain negative variances if you're not careful). The vector version of this question is not a complication: just update each entry in the variance-covariance matrix separately. – whuber Dec 20 '21 at 18:55

3 Answers


If I understand correctly, you have a sequence of observations $\{(X^t_1,\dots, X^t_p)\}_t$ and you want to compute the matrix

$$Corr(t)_{ij}=\sum_{t'=t-n+1}^t (X^{t'}_i-\mu_i^{t})(X^{t'}_j-\mu_j^{t})\left(\sum_{t'=t-n+1}^t (X_i^{t'}-\mu_i^t)^2\sum_{t'=t-n+1}^t (X^{t'}_j-\mu_j^t)^2\right)^{-1/2}$$ where $\mu_i^t=n^{-1}\sum_{t'=t-n+1}^t X^{t'}_i$ is the $n$-lag mean. As I mentioned above, you can write this in terms of the lag-$n$ covariance matrix in the usual way, so it is enough to compute the covariance $Cov(t)_{ij}=n^{-1}\sum_{t'=t-n+1}^t (X^{t'}_i-\mu_i^{t})(X^{t'}_j-\mu_j^{t})$, from which you can read off the correlations.

I think the easiest way to do it is to keep track of both the mean vector $\mu^t_i$ and the "uncentered covariances" $CovU(t)_{ij}=n^{-1}\sum_{t'=t-n+1}^t X^{t'}_iX^{t'}_j$. Each of these can be updated in response to a new observation simply by adding and removing one value at a time.

Then once you have updated these values, you can read off the covariance by the formula $Cov(t)_{ij}=CovU(t)_{ij}-\mu^t_i\mu^t_j$ and likewise the correlations by $Corr(t)_{ij}=Cov(t)_{ij}/\sqrt{Cov(t)_{ii}Cov(t)_{jj}}$.
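A minimal sketch of this scheme (the class structure, buffer handling, and names here are illustrative assumptions, not part of the original post):

```
import numpy as np
from collections import deque

class RollingCorrelation:
    """Sliding-window correlation via a running mean and uncentered covariance."""

    def __init__(self, X):
        # X: (n, p) array holding the initial window of observations.
        self.n = X.shape[0]
        self.window = deque(X.astype(float))   # buffer of the last n rows
        self.mu = X.mean(axis=0)               # running mean vector
        self.uncentered = X.T @ X / self.n     # running E[x x^T]

    def update(self, x_new):
        # O(p^2) step: add the newest observation, drop the oldest.
        x_old = self.window.popleft()
        self.window.append(x_new)
        self.mu += (x_new - x_old) / self.n
        self.uncentered += (np.outer(x_new, x_new) - np.outer(x_old, x_old)) / self.n

    def correlation(self):
        # Read off Cov = CovU - mu mu^T, then normalize by the standard deviations.
        cov = self.uncentered - np.outer(self.mu, self.mu)
        s = np.sqrt(np.diag(cov))
        return cov / np.outer(s, s)
```

Each update then costs $O(p^2)$ regardless of the window size $n$, versus $O(np^2)$ for recomputing from scratch.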

Also, depending on what you ultimately want to do, it may be worth trying an exponentially weighted moving average rather than a hard cutoff. This has the advantage that you only have to store the previous estimate (rather than the previous estimate plus the previous $n$ observations), and the update equation is considerably simpler.
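For instance, a sketch of that EWMA variant (the smoothing parameter `alpha` and the warm-up initialization are assumptions to be tuned, not prescriptions):

```
import numpy as np

class EWMACorrelation:
    """Exponentially weighted running estimates of E[x] and E[x x^T]."""

    def __init__(self, X0, alpha=0.05):
        # X0: (n0, p) warm-up window used only to initialize the estimates.
        self.alpha = alpha
        self.mu = X0.mean(axis=0)
        self.uncentered = X0.T @ X0 / X0.shape[0]

    def update(self, x):
        # Single O(p^2) step; no buffer of past observations is needed.
        a = self.alpha
        self.mu = a * x + (1 - a) * self.mu
        self.uncentered = a * np.outer(x, x) + (1 - a) * self.uncentered

    def correlation(self):
        cov = self.uncentered - np.outer(self.mu, self.mu)
        s = np.sqrt(np.diag(cov))
        return cov / np.outer(s, s)
```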

Mike Hawk
  • This is what I am currently doing, after noticing the identity cov(x,y) = E(xy) - E(x)E(y). So I end up computing mostly running means and running stds. I am a bit concerned about Wikipedia calling the formula susceptible to 'catastrophic cancellation'. I think I can build the EWMA pretty easily, but I am not sure about the std. – lcrmorin Dec 17 '21 at 16:35
  • @lcrmorin re: catastrophic cancellation, I am not sure what you mean; could you link the wiki article? To use the EWMA for the std, you would just use the same exponential update rule as for the running means, $E(xy)\to \alpha xy+(1-\alpha) E(xy)$, and then compute the covariance from $E(xy)$, $E(x)$ and $E(y)$. – Mike Hawk Dec 17 '21 at 16:48
  • Okay, I figured out that I don't need to keep track of cov_ii (and cov_jj) independently and can just reuse the diagonal of cov_ij... Working with matrices, I just keep track of E(XtX) and E(X). I'll accept your answer as it put me on the right track. I already got a 6x speed improvement over a numpy corrcoef baseline. The remaining speed will more likely come from code refactoring than algorithm design. – lcrmorin Dec 17 '21 at 17:26
  • (with the bottleneck coming from other parts of the code). Any ideas or resources about online versions of more advanced calculations (quantiles? eigenvalues of a correlation matrix? correlation-based clustering?). – lcrmorin Dec 17 '21 at 17:28
  • Quantiles are difficult to update, [but not impossible](https://stats.stackexchange.com/questions/7959). For monitoring the eigenvalues, look into [online methods to update the SVD](https://stats.stackexchange.com/questions/177007). – whuber Dec 17 '21 at 21:57
  • In some cases this will fail badly due to cancellation of numerical roundoff errors when subtracting and adding squares of values to sums of squares of other values. This is especially a concern when repeating the calculation, potentially a huge number of times. That's why the more detailed approach of https://stats.stackexchange.com/a/557530/919 is necessary. – whuber Dec 20 '21 at 18:51

Yes, you can compute the correlation without using the whole dataset, as follows.

Let $$ \begin{aligned} X = (x_{ij}),&\quad i=1,\dots,n;~j = 1,\dots,p \quad\text{the whole dataset}\\ X_{1j} &\implies \text{the oldest observation, to be removed, for feature }X_j\\ X_{(+)j} &\implies \text{the new observation, to be added, for feature }X_j\\ \mu^{(t)}_j &\implies \text{the current mean of feature }X_j\\ \sigma^{(t)}_{jk} &\implies \text{the current covariance between }X_j \text{ and } X_k \end{aligned} $$ More definitions: suppose $$ \begin{aligned} D_j &= X_{(+)j} - X_{1j} \quad\text{the difference between the new observation and the old observation}\\ E_j &= X_{(+)j} - \mu_{j}^{(t+1)} \quad\text{the difference between the new observation and the updated mean}\\ F_j &= X_{1j} - \mu_{j}^{(t+1)} \quad\text{the difference between the old observation and the updated mean} \end{aligned} $$

Then:

$$ \begin{aligned} \mu_j^{(t+1)} &= \frac{1}{n}\sum_{i=1}^n X_{ij} + \frac{X_{(+)j} - X_{1j}}{n}=\mu_{j}^{(t)} + \frac{D_j}{n} \end{aligned} $$ And $$ \begin{aligned} \sigma_{jk}^{(t+1)} &= \frac{1}{n-1}\sum_{i=1}^n \left(X_{ij}-\mu_{j}^{(t+1)}\right)\left(X_{ik}-\mu_{k}^{(t+1)}\right) + \frac{\left(X_{(+)j} - \mu_{j}^{(t+1)}\right)\left(X_{(+)k} - \mu_{k}^{(t+1)}\right)}{n-1}- \frac{\left(X_{1j} - \mu_{j}^{(t+1)}\right)\left(X_{1k} - \mu_{k}^{(t+1)}\right)}{n-1}\\ &= \frac{1}{n-1}\sum_{i=1}^n \left(X_{ij}-\mu_{j}^{(t)} - \frac{D_j}{n}\right)\left(X_{ik}-\mu_{k}^{(t)} - \frac{D_k}{n}\right) + \frac{E_jE_k-F_jF_k}{n-1}\\ &=\sigma_{jk}^{(t)} + \frac{D_jD_k}{n(n-1)} + \frac{E_jE_k-F_jF_k}{n-1} \end{aligned} $$

Thus the updating formulas, in the matrix notation to be used in Python, are:

$$ \begin{aligned} D &= X_{(+)} - X_{1} \quad\text{a vector of all the feature differences} \\ \mu^{(t+1)} &= \mu^{(t)} + \frac{D}{n}\\ E &= X_{(+)} - \mu^{(t+1)}\\ F &= X_{1} - \mu^{(t+1)}\\ \Sigma^{(t+1)} &= \Sigma^{(t)} + \frac{DD^\top}{n(n-1)} + \frac{EE^\top - FF^\top}{n-1}\\ S &= \sqrt{\operatorname{Diag}\left(\Sigma^{(t+1)}\right)}\\ R^{(t+1)} &= \Sigma^{(t+1)} \oslash SS^\top \quad\text{where }\oslash\text{ is elementwise division.} \end{aligned} $$

Python code:

Note that we already have the data, the previous mean, the previous covariance matrix, and the number of observations:

```
import numpy as np

# Create fake data
np.random.seed(48)
X = np.random.normal(np.random.uniform(-20, 100, 4),
                     np.random.uniform(2, 5, 4), (20, 4))

n = X.shape[0]  # will never change: we add one observation and remove one
mu = X.mean(0)
covar = np.cov(X.T)
corr = np.corrcoef(X.T)  # current correlation (not updated directly)

# Convert covariance matrix to correlation
def cov2cor(Sigma):
    S = np.sqrt(np.diag(Sigma))[:, None]
    return Sigma / S.dot(S.T)
   
# Function to update the mean and covariance. 
def update(Xnew, Xold, mu, cov, n):
    D = (Xnew - Xold).ravel()[:, None]
    mu_new = mu.ravel()[:, None] + D/n
    E = Xnew.ravel()[:, None] - mu_new
    F = Xold.ravel()[:, None] - mu_new
    cov_new = cov + (D @ D.T/n + E @ E.T - F @ F.T)/(n-1)
    return {'mu':mu_new.ravel(), 'covar':cov_new}


new_obs = np.array([-10, 100, 50, 30])

new = update(new_obs, X[0], mu, covar, n=X.shape[0])  # X[0] is the one to be removed

updated_cor = cov2cor(new['covar'])
print(updated_cor)
#compare to

## Updated Data:
Y = np.r_[X[1:], new_obs[None, :]]  # using Y instead of X, just to keep the two distinct
print(np.corrcoef(Y.T))

[[ 1.          0.30645714  0.06955658 -0.22865679]
 [ 0.30645714  1.          0.51049261  0.45552015]
 [ 0.06955658  0.51049261  1.          0.6746911 ]
 [-0.22865679  0.45552015  0.6746911   1.        ]]

[[ 1.          0.30645714  0.06955658 -0.22865679]
 [ 0.30645714  1.          0.51049261  0.45552015]
 [ 0.06955658  0.51049261  1.          0.6746911 ]
 [-0.22865679  0.45552015  0.6746911   1.        ]]
```
KU99

Disclaimer: this is the notation taught to me by an engineering professor, not a statistician.

Answer:
This is the mean over $N$ elements:

$$ \mu_{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

We can split out the last element like this: $$ \mu_{N} = \frac{1}{N}\sum_{i=1}^{N-1} x_i + \frac{1}{N} x_N$$

We can re-contrive the sum term as the mean of $N-1$ elements: $$ \mu_{N} = \left(\frac{N-1}{N}\right)\left(\frac{1}{N-1}\sum_{i=1}^{N-1} x_i \right) + \frac{1}{N} x_N$$

With substitution this becomes: $$ \mu_{N} = \left(\frac{N-1}{N}\right)\left(\mu_{N-1} \right) + \frac{1}{N} x_N$$

So we have a two-term update for the running mean, similar in form to an EWMA.

There is a similar definition for the variance: $$ \sigma_{N}^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu_{N}\right)^2 $$

We split out the last element as: $$ \sigma_{N}^2 = \frac{1}{N}\sum_{i=1}^{N-1} \left(x_i - \mu_{N}\right)^2 + \frac{1}{N} \left(x_N - \mu_{N}\right)^2 $$

We can then recontrive the sum over $N-1$ elements: $$ \sigma_{N}^2 = \frac{N-1}{N} \left( \sigma_{N-1}^2 \right) + \frac{1}{N} \left(x_N - \mu_{N}\right)^2 $$

You could extend this to multiple variables with some algebra, and it should be relatively quick to update. You will need a measure of central tendency, so you might want to use the running mean to estimate the true mean.
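A minimal transcription of these two update rules into Python (a sketch; as the comment below points out, it glosses over the fact that $\sigma_{N-1}^2$ is centered on $\mu_{N-1}$ rather than $\mu_N$):

```
def update_mean_var(mu_prev, var_prev, x_new, N):
    """Recursive running mean and (approximate) running variance."""
    mu = (N - 1) / N * mu_prev + x_new / N               # two-term mean update
    var = (N - 1) / N * var_prev + (x_new - mu)**2 / N   # variance update with the new mean
    return mu, var
```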

EngrStudent
  • Note that $\mu$ is changing. In the update of $\sigma^2_N$ you never took this into consideration: you just used $\mu$, but we have $\mu_N$, not $\mu$. – KU99 Dec 17 '21 at 17:21
  • Yup. There is a lot of baggage and hidden nuance in this approach. You are correct. – EngrStudent Dec 17 '21 at 18:10