I need to average a set of correlation coefficients $\rho_i$, with $i=1,\ldots,m$. I follow the standard prescription:

  1. apply the Fisher z-transformation to the $\rho_i$ (the $z_i$ are the z-transformed correlations),
  2. compute the mean $\bar{z}$ of the $z_i$,
  3. apply the inverse transformation to $\bar{z}$ to obtain the mean correlation $\bar{\rho}$.
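
For concreteness, the three steps above can be sketched as follows. This is a minimal illustration, not code from my actual setup; the function name `fisher_mean` is hypothetical, and it uses the fact that the Fisher z-transformation is `arctanh` and its inverse is `tanh`.

```python
import numpy as np

def fisher_mean(rhos):
    """Unweighted average of correlations via the Fisher z-transformation."""
    rhos = np.asarray(rhos, dtype=float)
    z = np.arctanh(rhos)    # step 1: z_i = arctanh(rho_i)
    z_bar = z.mean()        # step 2: plain mean of the z_i
    return np.tanh(z_bar)   # step 3: back-transform to a correlation

# Example with made-up correlations
print(fisher_mean([0.10, 0.25, 0.40]))
```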

My question is about the mean in step 2. It is known that the $z_i$ approximately follow a Gaussian distribution with standard deviation $\sigma_{z_i}=1/\sqrt{n_i-3}$, where $n_i$ is the size of the sample used to estimate $\rho_i$. I am working in a setup (that I cannot avoid) in which the $n_i$ differ widely ($n_{max}/n_{min}\approx 20$), and this translates into significantly different standard deviations of the $z_i$.
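
As a rough check of how large the effect is, assuming $n_i \gg 3$ so that the $-3$ can be neglected:

$$\frac{\sigma_{z,max}}{\sigma_{z,min}} = \sqrt{\frac{n_{max}-3}{n_{min}-3}} \approx \sqrt{\frac{n_{max}}{n_{min}}} \approx \sqrt{20} \approx 4.5$$

So the least precise $z_i$ has about 4.5 times the standard deviation, i.e. about 20 times the variance, of the most precise one.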

Should I take a simple mean of the $z_i$, or is it preferable in such a scenario to take a weighted mean of the $z_i$ with weights equal to the inverse of $\sigma_{z_i}$, namely $\bar{z} = \sum_i z_i\sigma_{z_i}^{-1}/\sum_i\sigma_{z_i}^{-1}$?
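
To make the alternatives comparable, here is a minimal sketch that computes the weighted mean for a family of weights $w_i = \sigma_{z_i}^{-p}$, together with the standard error of $\bar{z}$ implied by the Gaussian approximation. The function name `weighted_fisher_mean` and the parameter `power` are hypothetical. It assumes the $z_i$ are independent with known $\sigma_{z_i}=1/\sqrt{n_i-3}$. Under exactly those assumptions, the minimum-variance weighted mean is the inverse-variance one ($p=2$, i.e. $w_i = n_i-3$), while $p=1$ is the weighting written above and $p=0$ is the plain mean.

```python
import numpy as np

def weighted_fisher_mean(rhos, ns, power=2):
    """Weighted Fisher-z average with weights w_i = sigma_i**(-power).

    power=0: plain mean; power=1: inverse standard deviation;
    power=2: inverse variance (w_i = n_i - 3).
    Returns the back-transformed mean correlation and the standard
    error of z_bar, assuming independent z_i with sigma_i = 1/sqrt(n_i - 3).
    """
    rhos = np.asarray(rhos, dtype=float)
    ns = np.asarray(ns, dtype=float)
    sigma = 1.0 / np.sqrt(ns - 3.0)
    w = sigma ** (-power)
    z_bar = np.sum(w * np.arctanh(rhos)) / np.sum(w)
    # Var(z_bar) = sum(w_i^2 sigma_i^2) / (sum w_i)^2 for independent z_i
    se_z_bar = np.sqrt(np.sum(w**2 * sigma**2)) / np.sum(w)
    return np.tanh(z_bar), se_z_bar

# Made-up example with n_max / n_min = 20
rhos = [0.05, 0.12, 0.08]
ns = [100, 500, 2000]
for p in (0, 1, 2):
    print(p, weighted_fisher_mean(rhos, ns, power=p))
```

Comparing the returned standard errors for `power=0`, `1` and `2` shows how much precision each weighting gives up relative to the inverse-variance choice for a given set of $n_i$.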

I went through the literature on the Fisher transform but could not find any guidance on this specific point.

Another way to phrase my question: are there any known results on an extra variance-stabilization step, in which one takes a weighted mean of the $z_i$ using the inverse of the $\sigma_{z_i}$ as weights? Unlike the $\sigma_{\rho_i}$, the $\sigma_{z_i}$ have the nice feature that they do not depend on $\rho_i$.

user1234383
  • The idea behind the Fisher transformation is that it stabilizes variance ... – kjetil b halvorsen Dec 27 '18 at 13:09
  • Yes, definitely, but I was wondering whether there are results on extra stabilization steps when the transformed variables still have a significant residual dependence on $n$ (NB: it is not dramatic, since the max/min ratio of the standard deviations of the z variables goes as $\sqrt{20}$ in my case). – user1234383 Dec 27 '18 at 13:17
  • Possible duplicate of [How can I pool correlations?](https://stats.stackexchange.com/questions/313088/how-can-i-pool-correlations) – kjetil b halvorsen Dec 27 '18 at 13:25
  • Thanks for the link, but they propose to stabilize the correlation by taking a weighted mean of the $\rho_i$ with inverse-variance weights, and that variance depends on the correlation itself. I would prefer to avoid the suggested iterative procedure. My question is related, but I do not think it is a duplicate. It is simply whether there are any known results about taking this weighted mean after the Fisher transformation. Anyway, the link is useful because it provides the proper search term for this subject, i.e. 'pool correlation'. If I find any results, I'll update the question. – user1234383 Dec 27 '18 at 13:47
  • Could you explain what quantity this average is intended to represent or estimate? That would help determine what approaches are appropriate and valid. – whuber Dec 27 '18 at 14:06
  • I need to average the autocorrelation functions (ACF) of $m$ time series of stock returns (NB: I am working with extremely noisy series). I therefore need to compute $T$ averages, one for each lag of the ACF, where $T$ is the maximum lag I estimate. Unfortunately, the sizes of the samples I use to estimate the ACF are extremely heterogeneous (essentially, for some stocks I have shorter time series), and I was looking for the best way to pool them to get a result that is as clean as possible. – user1234383 Dec 27 '18 at 15:29
  • OK, but what is this average intended to represent? Your description of extreme heterogeneity suggests an average might be meaningless. It's not at all evident that it would reflect any kind of autocorrelation (and if so, of what time series exactly?). – whuber Dec 27 '18 at 16:17
  • [1/2] As already said in my previous comment, the time series are time series of stock returns (stock returns = price changes of stocks, and a stock = Apple, Amazon, GE, JPMorgan, etc., i.e. listed firms). Stock return series are extremely noisy, and from a single autocorrelation function it is almost impossible to extract any information about the correlation structure of the series. Hence my need to average over several stocks (the $m$ of my previous notation) in order to increase the signal-to-noise ratio of the pattern I want to extract and measure. – user1234383 Dec 28 '18 at 09:07
  • [2/2] I know how to average correlations (Fisher transform, etc.) and have done it several times, but this time I am working with time series of significantly different lengths ($n_i$ in the notation of the question), and I was only wondering whether it is useful or common practice to do an extra stabilization of the variance of the Fisher-transformed variables. Anyway, I am running a little toy model (where I know the true correlation pattern) that mimics the regime of heterogeneous $n_i$ I work with, to check whether a plain mean and a weighted mean of the $z_i$ give the same results. – user1234383 Dec 28 '18 at 09:17
  • @whuber the heterogeneity is simply in the lengths of the series, not in the underlying correlation structure I want to extract. In other words, I am assuming that all stocks are realizations of the same process with the same ACF. It follows that pooling the correlations is meaningful; I am only asking whether I should do an extra step on top of the variance stabilization produced by the Fisher transformation. – user1234383 Dec 28 '18 at 09:28
  • Thank you. Your final comment is key, I think, because it points the way to a simple, principled solution. That solution would be equivalent to stringing all the time series out into a single series, with each component separated by a long list of NA values, and computing its ACF as usual. Of course you wouldn't actually implement the calculation this way, but it shows what the correct formula ought to be under your (extremely strong) assumptions about these returns. – whuber Dec 28 '18 at 14:51
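
One possible reading of the last comment, sketched in code: pool the lagged products within each series (never across series boundaries, which is what the NA separators would enforce) and normalize by the pooled sum of squares. The function name `pooled_acf` is hypothetical, and the choice of a single common mean and of this particular normalization are assumptions; software differs in how it normalizes an ACF with missing values. The sketch relies on the strong assumption stated in the comments that all series share the same mean, variance and ACF.

```python
import numpy as np

def pooled_acf(series_list, max_lag):
    """Pooled autocorrelation for lags 0..max_lag.

    Assumes every series is a realization of the same stationary
    process, so one common mean and variance are used for all of them.
    """
    series_list = [np.asarray(x, dtype=float) for x in series_list]
    all_obs = np.concatenate(series_list)
    mu = all_obs.mean()                      # common mean over all observations
    denom = np.sum((all_obs - mu) ** 2)      # pooled sum of squares
    acf = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        num = 0.0
        for x in series_list:
            d = x - mu
            if len(d) > k:
                # lagged products stay inside one series
                num += np.dot(d[: len(d) - k], d[k:])
        acf[k] = num / denom
    return acf

# Made-up example: one long and one short white-noise series
rng = np.random.default_rng(0)
print(pooled_acf([rng.normal(size=2000), rng.normal(size=100)], max_lag=5))
```

With this construction, longer series automatically contribute more lagged products at each lag, so no separate weighting step is needed.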

0 Answers