
I am interested in the relationship between two time series variables: $Y$ and $X$. The two variables are related to each other, and it's not clear from theory which one causes the other.

Given this, I have no good reason to prefer the linear regression $ Y = \alpha + \beta X$ over $ X = \kappa + \gamma Y $.

Clearly there is some relationship between $\beta$ and $\gamma$, though I recall enough statistics to understand that $\beta = 1/ \gamma$ is not true. Or perhaps it's not even close? I'm a bit hazy.

The problem is to decide how much of $X$ one ought to hold against $Y$.

I'm considering taking the average of $\beta$ and $1/ \gamma$ and using that as the hedge ratio.

Is the average of $\beta$ and $1/ \gamma$ a meaningful concept?

And as a secondary question (perhaps this should be another post), what is the appropriate way to deal with the fact that the two variables are related to each other -- meaning that there really isn't an independent and dependent variable?

ricardo
  • The problem is not causality but rather the errors of measurement (it is just that often the dependent variable Y is the one with the large measurement error, making "Y = a + B x + error" the common expression). Do you have an idea about the errors in the measurement of X and Y? – Sextus Empiricus Jan 06 '19 at 12:04
  • To determine causality you need a controlled experiment. An experiment where you are able to change some variable independently from the others. (or a very unique situation where two populations can be considered/assumed equal except for one or more particular variables that are to be considered as "independent" variables) – Sextus Empiricus Jan 06 '19 at 12:18
  • The exact values of $\beta$ and $\gamma$ can be found in [this answer of mine](https://stats.stackexchange.com/a/20556/6633) to [Effect of switching responses and explanatory variables...](https://stats.stackexchange.com/q/20553/6633), and, as you suspect, $\beta$ is not the reciprocal of $\gamma$, and averaging $\beta$ and $1/\gamma$ is not the right way to go. A pictorial view of what $\beta$ and $\gamma$ are minimizing is given in [Elvis's answer](https://stats.stackexchange.com/a/20560/6633) to the same question, and he introduces a "least rectangles" regression that you might want ..... – Dilip Sarwate Jan 06 '19 at 15:43
  • ..... (continued) to consider as an alternative to averaging $\beta$ and $1/\gamma$ or coming up with some other function of $\beta$ and $1/\gamma$ to measure what you are looking for. Be sure to read the comments by Moderator cardinal on Elvis's answer; they relate "least rectangles" regression to other, previously used, statistical methods. – Dilip Sarwate Jan 06 '19 at 15:46
  • You are in the ideal scenario where the choice of technique has a direct, physically measurable impact; you can simply measure the out-of-sample hedging error for each estimate, and compare them. Also, typically optimal hedging is better handled by using a VECM model (see for example Gatarek & Johansen, 2014, *Optimal hedging with the cointegrated vector autoregressive model*), which does not require choosing to model Y as a function of X or vice-versa. – Chris Haug Jan 06 '19 at 16:32
  • You might want to look at the geometric mean $\sqrt{\dfrac{\beta}{\gamma}}$ as a possibility (if they are both negative you might take the negative square root). Then look at $\dfrac{s_y}{s_x}$, which should be very similar – Henry Jan 06 '19 at 18:37
  • @MartijnWeterings X and Y are both yields of bonds, and i am using official closes, so i would think that both are measured with about the same amount of error. Clearly that error is non-zero, as there is noise in any price even if it is measured precisely at a point in time. – ricardo Jan 06 '19 at 22:15
  • @DilipSarwate Having read your links it seems that i'm looking for [total least squares](https://en.wikipedia.org/wiki/Total_least_squares#Scale_invariant_methods). If you'd like to post an answer i'll accept. – ricardo Jan 06 '19 at 22:20
  • @ChrisHaug I was cautious about just picking the best fit as i feared that what's best might change over time. I was unaware of the VECM approach, I'll check out that paper. – ricardo Jan 06 '19 at 22:22
  • @ricardo, then there may be easily *no* causal relationship between the two. Even when you observe a strong correlation. What is the point of your regression? You wish to test/find some underlying model? What are you gonna do with the $\beta$ that you obtain? – Sextus Empiricus Jan 06 '19 at 23:21
  • @ricardo Note that I specified *out-of-sample* error, so not the (in-sample) *fit* of the model. And it is entirely possible for the optimal hedge ratio to change over time (especially if the relationship is not actually linear), but that doesn't change the fact that figuring out the best hedging strategy can be most directly done by backtesting the model and observing the results. – Chris Haug Jan 06 '19 at 23:53
  • @MartijnWeterings the aim is to figure out the appropriate amount of asset $Y$ to sell against a unit of asset $X$ (or vice versa). My problem is that the answer varies depending on if i begin with a unit of $X$ and ask how much $Y$ i should sell; or if i begin with a unit of $Y$ and ask how much $X$ i should sell. Clearly they can't both be *good* answers! – ricardo Jan 07 '19 at 03:01
  • @Henry Now that i've had a think about it i can see that the geometric mean is **much better** than the arithmetic mean -- but it wasn't obvious until i did the math. I didn't see that the mean would vary depending on if i took the average of $\beta$ and $1/ \gamma$ or $\gamma$ and $1/ \beta$ – ricardo Jan 07 '19 at 04:34
  • @ricardo It is unclear to me how you wish to use $\beta$ or $\gamma$ for the purpose of obtaining an optimal hedge ratio. Note in Xi'an's answer that these regression lines relate to the means of the conditional distributions of X given Y and Y given X (this is independent from whatever causal relation there may be). These lines are also different, even when you have perfect information about the joint distribution of X and Y.... – Sextus Empiricus Jan 07 '19 at 08:23
  • .... Can you describe more exactly how the process goes? E.g. would you collect a sample of pairs of X and Y, then estimate a joint distribution for the two, then compute a probability density of profit for different ratio's of the two, then use some metric or loss function to compute which ratio is the best? What do you base your decisions on? (and why don't you consider time as well?) – Sextus Empiricus Jan 07 '19 at 08:24
  • @MartijnWeterings I'm trying to solve for a hedge ratio. Say we have two stocks $X$ and $Y$ that are related. I wish to hold one unit of stock $X$ and would like to reduce the risk of owning that stock by selling $1 / \beta$ units of stock $Y$. I had hoped that the regression $Y = \alpha + \beta X$ would tell me home many units of $Y$ i needed to sell ... I was surprised to find that the answer changed when i swapped $X$ and $Y$. – ricardo Jan 07 '19 at 08:49
  • @Henry your tips were perfect for a short term fix. I have thought about the math and compared the geometric means to the vol ratios in my data -- and i get the exact same thing. Is there some edge case where they are not the same thing? – ricardo Jan 09 '19 at 22:09
  • ricardo: There are three issues with the geometric mean: (a) the sign; (b) what happens when $Y$ and $X$ are uncorrelated leading to $\beta=\gamma=0$; and (c) a philosophical issue that the estimated relationship between $X$ and $Y$ would actually take no account of the actual relationship between $X$ and $Y$ (apart from the sign) since you can draw the resulting regression line just knowing each of their means and the ratio of each of their standard deviations – Henry Jan 09 '19 at 22:49

4 Answers


To see the connection between both representations, take a bivariate Normal vector: $$ \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} , \begin{pmatrix} \sigma^2_1 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma^2_2 \end{pmatrix} \right) $$ with conditionals $$X_1 \mid X_2=x_2 \sim \mathcal{N} \left( \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(x_2 - \mu_2),(1-\rho^2)\sigma^2_1 \right)$$ and $$X_2 \mid X_1=x_1 \sim \mathcal{N} \left( \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1),(1-\rho^2)\sigma^2_2 \right)$$ This means that $$X_1=\underbrace{\left(\mu_1-\rho \frac{\sigma_1}{\sigma_2}\mu_2\right)}_\alpha+\underbrace{\rho \frac{\sigma_1}{\sigma_2}}_\beta X_2+\sqrt{1-\rho^2}\sigma_1\epsilon_1$$ and $$X_2=\underbrace{\left(\mu_2-\rho \frac{\sigma_2}{\sigma_1}\mu_1\right)}_\kappa+\underbrace{\rho \frac{\sigma_2}{\sigma_1}}_\gamma X_1+\sqrt{1-\rho^2}\sigma_2\epsilon_2$$ which means (a) $\gamma$ is not $1/\beta$ and (b) the connection between the two regressions depends on the joint distribution of $(X_1,X_2)$.
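A short simulation makes this concrete (a Python/numpy sketch; the parameter values are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters (any values work).
mu1, mu2 = 1.0, 2.0
s1, s2, rho = 1.5, 0.5, 0.7

cov = np.array([[s1**2, rho * s1 * s2],
                [rho * s1 * s2, s2**2]])
x1, x2 = rng.multivariate_normal([mu1, mu2], cov, size=200_000).T

c = np.cov(x1, x2)         # 2x2 sample covariance matrix
beta = c[0, 1] / c[1, 1]   # slope from regressing x1 on x2
gamma = c[0, 1] / c[0, 0]  # slope from regressing x2 on x1

print(beta, rho * s1 / s2)    # beta  ~ rho * s1 / s2
print(gamma, rho * s2 / s1)   # gamma ~ rho * s2 / s1
print(beta * gamma, rho**2)   # product ~ rho^2, not 1
```

The product of the two slopes recovers $\rho^2$, so the slopes are reciprocals only under perfect correlation.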

Xi'an

Converted from a comment.....

The exact values of $\beta$ and $\gamma$ can be found in this answer of mine to Effect of switching responses and explanatory variables in simple linear regression, and, as you suspect, $\beta$ is not the reciprocal of $\gamma$, and averaging $\beta$ and $\gamma$ (or averaging $\beta$ and $1/\gamma$) is not the right way to go. A pictorial view of what $\beta$ and $\gamma$ are minimizing is given in Elvis's answer to the same question, and in the answer, he introduces a "least rectangles" regression that might be what you are looking for. The comments following Elvis's answer should not be neglected; they relate this "least rectangles" regression to other, previously studied, techniques. In particular, note that Moderator chl points out that this method is of interest when it is not clear which is the predictor variable and which the response variable.

Dilip Sarwate

$\beta$ and $\gamma$

As Xi'an noted in his answer, $\beta$ and $\gamma$ relate to the conditional means $Y \mid X$ and $X \mid Y$ (which in turn derive from a single joint distribution), and they are not symmetric in the sense that $\beta \neq 1/\gamma$. This is the case even if you would 'know' the true $\sigma$ and $\rho$ instead of using estimates. You have $$\beta = \rho_{XY} \frac{\sigma_Y}{\sigma_X}$$ and $$\gamma = \rho_{XY} \frac{\sigma_X}{\sigma_Y}$$

or you could say

$$\beta \gamma = \rho_{XY}^2 \leq 1$$

See also the Wikipedia article on simple linear regression for the computation of $\beta$ and $\gamma$.

It is this correlation term that disturbs the symmetry. If $\beta$ and $\gamma$ were simply the ratios of the standard deviations, $\sigma_Y/\sigma_X$ and $\sigma_X/\sigma_Y$, then they would indeed be each other's inverses. The $\rho_{XY}$ term can be seen as modifying this, as a sort of regression to the mean.

  • With perfect correlation, $\rho_{XY} = 1$, you can fully predict $X$ based on $Y$ or vice versa. The slopes will be each other's reciprocals: $$\beta \gamma = 1$$
  • But with less than perfect correlation, $\rho_{XY} < 1$, you cannot make those perfect predictions, and the conditional mean will be somewhat closer to the unconditional mean in comparison to a simple scaling by $\sigma_Y/\sigma_X$ or $\sigma_X/\sigma_Y$. The slopes of the regression lines will be less steep. The slopes will not be each other's reciprocals, and their product will be smaller than one: $$\beta \gamma < 1$$
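A tiny numerical sketch of the two bullets above (the standard deviations and correlations are made-up illustrative values):

```python
# beta * gamma = rho^2 follows directly from the closed forms;
# a quick check with illustrative numbers (not from the post):
sx, sy = 2.0, 5.0
for rho in (1.0, 0.8, 0.3):
    beta = rho * sy / sx    # slope of Y = a + beta * X
    gamma = rho * sx / sy   # slope of X = k + gamma * Y
    print(rho, beta * gamma)  # rho^2: equals 1 only under perfect correlation
```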

Is a regression line the right method?

You may wonder whether these conditional probabilities and regression lines are what you need to determine your ratios of $X$ and $Y$. It is unclear to me how you would wish to use a regression line in the computation of an optimal ratio.

Below is an alternative way to compute the ratio. This method does have symmetry (i.e., if you switch $X$ and $Y$ then you will get the same ratio).


Alternative

Say the yields of bonds $X$ and $Y$ are distributed according to a multivariate normal distribution$^\dagger$ with correlation $\rho_{XY}$ and standard deviations $\sigma_X$ and $\sigma_Y$. Then the yield of a hedge that is a weighted sum of $X$ and $Y$ will be normally distributed:

$$H = \alpha X + (1-\alpha) Y \sim N(\mu_H,\sigma_H^2)$$

where $0 \leq \alpha \leq 1$ and with

$$\begin{array}{rcl} \mu_H &=& \alpha \mu_X+(1-\alpha) \mu_Y \\ \sigma_H^2 &=& \alpha^2 \sigma_X^2 + (1-\alpha)^2 \sigma_Y^2 + 2 \alpha (1-\alpha) \rho_{XY} \sigma_X \sigma_Y \\ & =& \alpha^2(\sigma_X^2+\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y) + \alpha (-2 \sigma_Y^2+2\rho_{XY}\sigma_X\sigma_Y) +\sigma_Y^2 \end{array} $$

The maximum of the mean $\mu_H$ will be at $$\alpha = 0 \text{ or } \alpha=1$$ (or it will not exist, when $\mu_X=\mu_Y$ and every $\alpha$ gives the same mean).

The minimum of the variance $\sigma_H^2$ will be at $$\alpha = 1 - \frac{\sigma_X^2 -\rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2 +\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y} = \frac{\sigma_Y^2-\rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2+\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y} $$

The optimum will be somewhere in between those two extremes and depends on how you wish to compare losses and gains.

Note that now there is a symmetry between $\alpha$ and $1-\alpha$. It does not matter whether you use the hedge $H=\alpha_1 X+(1-\alpha_1)Y$ or the hedge $H=\alpha_2 Y + (1-\alpha_2) X$. You will get the same ratios in terms of $\alpha_1 = 1-\alpha_2$.

Minimal variance case and relation with principal components

In the minimal variance case (here you actually do not need to assume a multivariate normal distribution) you get the following hedge ratio as optimum $$\frac{\alpha}{1-\alpha} = \frac{var(Y) - cov(X,Y)}{var(X)-cov(X,Y)}$$ which can be expressed in terms of the regression coefficients $\beta = cov(X,Y)/var(X)$ and $\gamma = cov(X,Y)/var(Y)$ as $$\frac{\alpha}{1-\alpha} = \frac{1/\gamma - 1}{1/\beta - 1} = \frac{\beta(1-\gamma)}{\gamma(1-\beta)}$$
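A quick sanity check of the minimum-variance weight (the variances and covariance below are made-up numbers; the closed-form $\alpha$ is compared against a brute-force grid search):

```python
import numpy as np

# Illustrative inputs (not from the post): var(X), var(Y), cov(X, Y).
vx, vy, c = 4.0, 9.0, 2.4

# Closed-form minimum-variance weight for H = alpha*X + (1 - alpha)*Y.
alpha = (vy - c) / (vx + vy - 2 * c)

# Brute-force grid search over the hedge variance as a check.
a = np.linspace(0, 1, 100_001)
var_h = a**2 * vx + (1 - a)**2 * vy + 2 * a * (1 - a) * c
alpha_grid = a[np.argmin(var_h)]

print(alpha, alpha_grid)  # should agree to grid resolution

# Symmetry: swapping X and Y just swaps alpha and 1 - alpha.
alpha_swapped = (vx - c) / (vx + vy - 2 * c)
print(alpha + alpha_swapped)  # 1.0
```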

In a situation with more than two variables/stocks/bonds you might generalize this to the last principal component (the one with the smallest eigenvalue).


Variants

Improvements of the model can be made by using distributions other than the multivariate normal. You could also incorporate time into a more sophisticated model to make better predictions of future values/distributions of the pair $X,Y$.


$\dagger$ This is a simplification but it suits the purpose of explaining how one can, and should, perform the analysis to find an optimal ratio without a regression line.

Sextus Empiricus
  • I am sorry, but as a physicist, I know too little about the language (long, short, holdings, etc.) related to stocks, bonds and finance. If you could cast it in simpler language I might be able to understand it and work with it. My answer is just a very simple expression that is unaware of the details and possibilities of how to express hedging and stocks, but it shows the basic principle of how you can get away from the use of a regression line (go back to first principles and express the model for profit, which is at the core, instead of using regression lines whose relevance is not directly clear). – Sextus Empiricus Jan 07 '19 at 11:41
  • I think i understand. The problem is that $1/\beta \ne \gamma$; indeed, the coefficient often changes quite a bit when we take the inverse. Your alternative is close to the case I am thinking about, but i do want to check one thing: does this allow negative holdings? Adopting your terminology, i'd have a unit holding of bond X, and a negative holding of Y. Say long one unit of bond X and short (say) 1.2 units of bond Y ... but it could be 0.2 units or 5 units, depending on the math. – ricardo Jan 07 '19 at 11:42
  • long means that i make 1% on a bond if the price increases by ~1%; short means that i lose ~1% on a bond if the price increases by ~1%. So the idea is that i am long one unit of one bond (so i benefit from an appreciation) and am short some amount of the other bond (so i lose from an appreciation). – ricardo Jan 07 '19 at 11:46
  • *"The problem is to decide how much of X one ought to hold against Y."* My problem with this is that there is no explanation/model/expression how you decide about this. How do you define losses and gains and how much do you value them? – Sextus Empiricus Jan 07 '19 at 11:46
  • Are there costs associated with being short and long? I imagine that you have a given amount to invest and this limits how much you can be short/long in those bonds. Then based on your previous knowledge you can estimate/determine the distribution of losses/gains for whatever combination on that limit. Finally, based on some function that determines how you value losses and gains (this expresses why/how you hedge) you can decide which combination to choose. – Sextus Empiricus Jan 07 '19 at 12:04
  • I value gains and losses the same and am neutral w.r.t both assets. You can think of series $X$ and $Y$ as being composed of prices at points in time $x_t$ and $y_t$. If $x_t$ changes, we make a profit or loss of $\Delta x_t$. The problem is to find the $\beta$ such that in most cases the gain in $\Delta x_t$ is matched by the loss due to $\Delta y_t$; of course we would not hold the spread if we didn't expect the spread to move and to make money over time -- but the point is to have a pure spread view. Holding costs are a complication for later. – ricardo Jan 07 '19 at 12:10
  • *"I value gains and losses the same"* why then do you hedge? *"The problem is to find the β such that in most cases the gain in Δx_t is matched by the loss due to Δy_t"* this is still a bit ambiguous, there are multiple ways to match gains and losses. But say you wish to minimize P(hedge gives a loss), then you could use the expression that I give for $H$ (where you have to flip some signs for the difference in being short or long) and compute, using estimates for $\rho$ and the $\sigma$, which ratio gives the lowest probability of a loss. – Sextus Empiricus Jan 07 '19 at 12:24
  • @MartijnWeterings The classical definition of "optimal hedge ratio" is the one in which the hedged portfolio (what you've called $H$) has minimal variance. It's quite possible that ricardo is looking for some other loss function, but that is where the regression approach comes from. – Chris Haug Jan 07 '19 at 13:12
  • In that minimal variance case mentioned by Chris, the expression for the variance at the end of the answer (which does not really require a multivariate normal distribution and works more generally) is minimal for the following hedge ratio $$\frac{\alpha}{1-\alpha} = \frac{var(Y) - cov(X,Y)}{var(X)-cov(X,Y)}$$ which, in terms of the coefficients $\beta = cov(X,Y)/var(X)$ and $\gamma = cov(X,Y)/var(Y)$, can be expressed as $$\frac{\alpha}{1-\alpha} = \frac{1/\gamma - 1}{1/\beta - 1}$$ – Sextus Empiricus Jan 07 '19 at 13:22
  • I don't know if this adds anything useful (because it follows directly from your equations above) but I think it's interesting to point out that the geometric mean of the two slope coefficients is equal to the absolute value of the correlation coefficient: $\sqrt{\beta \cdot \gamma} = |{\rho}_{xy}|$ and thus the absolute value of the arithmetic mean of $\beta$ and $\gamma$ is greater than (or equal to) $|{\rho}_{xy}|$ – statmerkur Jan 11 '19 at 08:15

Perhaps the approach of "Granger causality" might help. It would help you assess whether X is a good predictor of Y or whether Y is a better predictor of X. In other words, it tells you whether beta or gamma is the thing to take more seriously. Also, considering that you are dealing with time series data, it tells you how much of the history of X counts towards the prediction of Y (or vice versa).

Wikipedia gives a simple explanation: A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y.

What you do is the following:

  • regress Y(t) on X(t-1) and Y(t-1)
  • regress Y(t) on X(t-1), X(t-2), Y(t-1), Y(t-2)
  • regress Y(t) on X(t-1), X(t-2), X(t-3), Y(t-1), Y(t-2), Y(t-3)

Continue for whatever history length might be reasonable. Check the significance of the F-statistic for each regression. Then do the same in reverse (so, now regress X(t) on the past values of X and Y) and see which regressions have significant F-values.
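The lag-by-lag procedure above can be sketched in Python with plain numpy (in practice `statsmodels.tsa.stattools.grangercausalitytests` automates it); the two series here are simulated so that X leads Y, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated series in which X leads Y by one step (illustrative only).
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

def ols_rss(X, z):
    """Residual sum of squares of regressing z on X (with intercept)."""
    X = np.column_stack([np.ones(len(z)), X])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    return resid @ resid

def granger_f(cause, effect, p):
    """F-statistic: do lags 1..p of `cause` help predict `effect`?"""
    zt = effect[p:]
    lags_e = np.column_stack([effect[p - k:-k] for k in range(1, p + 1)])
    lags_c = np.column_stack([cause[p - k:-k] for k in range(1, p + 1)])
    rss_r = ols_rss(lags_e, zt)                             # restricted model
    rss_f = ols_rss(np.column_stack([lags_e, lags_c]), zt)  # full model
    df2 = len(zt) - 2 * p - 1
    return (rss_r - rss_f) / p / (rss_f / df2)

for p in (1, 2, 3):
    print(p, granger_f(x, y, p), granger_f(y, x, p))
# By construction, the X -> Y statistics come out large here,
# while the Y -> X statistics stay near null-distribution values.
```

Comparing the F-statistics in both directions is exactly the "do the same in reverse" step described above.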

A very straightforward example, with R code, is found here. Granger causality has been critiqued for not actually establishing causality (in some cases). But it seems that your application is really about "predictive causality," which is exactly what the Granger causality approach is meant for.

The point is that the approach will tell you whether X predicts Y or whether Y predicts X (so you would no longer be tempted to artificially -- and incorrectly -- compound the two regression coefficients), and it gives you a better prediction (as you will know how much history of X and Y you need to predict Y), which is useful for hedging purposes, right?

Steve G. Jones
  • I have a strong theoretical reason to believe that neither is truly a cause, and that even if one became a cause it would not remain true over time. So i don't think that Granger causality is the answer in this case. I've upvoted the answer in any case, as it is useful -- esp. the R code. – ricardo Jan 07 '19 at 03:04
  • That is why I explicitly mention that "Granger causality has been critiqued for not actually establishing causality (in some cases)." It seems to me that your question is more about establishing "predictive causality," which is what Granger causality is meant for. In addition, Granger's approach uses the information in your time series data, which are a waste not to use if you have them. Of course, you can (should?) re-estimate the effects over time. I expect that the Granger effects are more stable than cross-sectional OLS (you can test this beforehand, using historical data). HTH – Steve G. Jones Jan 07 '19 at 07:04