
NOTE: This question was originally posted on MSE, but it did not generate any interest. It was posted there first because the question itself is a pure matrix-algebra question.
Nevertheless, since the motivation has to do with statistics and econometrics, I am also posting it on Cross Validated, in the hope that some statistics/matrix-algebra-savvy brain will have something to contribute.

The framework is as follows: We have a cross-sectional i.i.d. sample $\{\mathbf y, \mathbf X\}$, where $\mathbf y$ is an $N \times 1$ column vector and $\mathbf X$ is an $N\times K$ matrix. We postulate a linear relationship between $\mathbf y$ and $\mathbf X$,

$$\mathbf y = \mathbf X \beta + \mathbf u $$ where $\mathbf u$ is white noise with variance $\sigma^2$ and exogenous to the regressors in the $\mathbf X$ matrix, and $\beta$ is a $K\times 1$ column vector of unknown constant coefficients. Under these assumptions, the OLS estimator is unbiased and consistent. Now assume $N$ is "large", say $O\left(10^{4}\right)$ or more (samples that large have started to appear in the econometrics field as well). Then a researcher could conceivably entertain the following two options:

$A$) Run one OLS regression using the whole sample. This tactic can be thought of as appealing to the consistency property of the OLS estimator. Call this estimator $\hat \beta$.

$B$) Divide the sample into $m$ disjoint sub-samples (for simplicity assumed to be of equal size; note that their union equals the whole sample), run $m$ regressions, and calculate the average of the $m$ coefficient estimates she will thus obtain. This tactic can be thought of as appealing to the unbiasedness property of the OLS estimator. Call this averaging estimator $\bar b_m$.

(Note that tactic $B$ does not fall under any re-sampling approach like bootstrap, subsampling, or jackknife; to be exact, it has been considered as a marginal case in applying the jackknife to time series, but it is not truly a jackknife method.)
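To fix ideas, here is a minimal simulation sketch of the two tactics in Python; the data-generating process, the dimensions, and the seed are my own illustrative choices, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) data-generating process: y = X beta + u with white-noise u.
N, K, m = 10_000, 3, 20                 # sample size, number of regressors, number of sub-samples
beta = np.array([1.0, -2.0, 0.5])       # "true" coefficients, chosen arbitrarily
sigma = 2.0

X = rng.normal(size=(N, K))
y = X @ beta + sigma * rng.normal(size=N)

# Tactic A: one OLS regression on the whole sample.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Tactic B: m disjoint, equal-sized sub-samples; average the m OLS estimates.
sub_estimates = []
for X_l, y_l in zip(np.array_split(X, m), np.array_split(y, m)):
    b_l, *_ = np.linalg.lstsq(X_l, y_l, rcond=None)
    sub_estimates.append(b_l)
b_bar = np.mean(sub_estimates, axis=0)

print("whole-sample estimator:", beta_hat)
print("averaging estimator:   ", b_bar)
```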

I have derived a nice-looking (to me) result that shows that the variance of the whole-sample estimator is always smaller than the variance of the averaging estimator:

$$\text{Var}\left(\bar b_m\right) > \text{Var}\left(\hat \beta\right) $$ I say it is nice-looking because the result uses the Arithmetic-Harmonic mean inequality for PD matrices, proven here. Specifically, writing $Z_l= \left(X_l'X_l\right)^{-1}$ for the inverse moment matrix of the regressors from the $l$-th sub-sample, $l=1,...,m$, and denoting by $A_m$ the arithmetic mean and by $H_m$ the harmonic mean of these $Z$ matrices, it is not hard to arrive at the following:

$$\text{Var}\left(\bar b_m\right) = \frac1m\sigma^2A_m > \frac1m\sigma^2H_m =\text{Var}\left(\hat \beta\right) $$

...the inequality holding in the matrix sense. Note that $H_m$ is the harmonic mean of $\left(X_1'X_1\right)^{-1},...,\left(X_m'X_m\right)^{-1}$ in the true matrix sense, not a matrix containing the harmonic means of the corresponding elements of the matrices it averages.
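Spelling out the step: with $A_m=\frac1m\sum_{l=1}^m Z_l$, $H_m=m\left(\sum_{l=1}^m Z_l^{-1}\right)^{-1}$, and $\mathbf X'\mathbf X=\sum_{l=1}^m X_l'X_l$, we have

$$\text{Var}\left(\bar b_m\right)=\frac{1}{m^2}\sum_{l=1}^m \sigma^2 Z_l=\frac1m\sigma^2A_m, \qquad \text{Var}\left(\hat \beta\right)=\sigma^2\left(\mathbf X'\mathbf X\right)^{-1}=\sigma^2\left(\sum_{l=1}^m Z_l^{-1}\right)^{-1}=\frac1m\sigma^2H_m.$$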

So the averaging estimator $\bar b_m$ is always less efficient than the whole sample estimator $\hat \beta$.
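For the record, here is a quick numerical check of the matrix ordering; the regressor draws and dimensions are again illustrative assumptions, and the check simply tests whether the eigenvalues of $A_m-H_m$ are non-negative (i.e. whether the difference is positive semi-definite). np.linalg.eigvalsh is appropriate here because the difference is symmetric.

```python
import numpy as np

rng = np.random.default_rng(1)
n_l, K, m = 500, 3, 20  # illustrative sub-sample size, regressors, number of sub-samples

# Inverse moment matrices Z_l = (X_l' X_l)^{-1} from simulated sub-samples.
Z = []
for _ in range(m):
    X_l = rng.normal(size=(n_l, K))
    Z.append(np.linalg.inv(X_l.T @ X_l))

A_m = sum(Z) / m                                              # arithmetic mean of the Z_l
H_m = m * np.linalg.inv(sum(np.linalg.inv(Zl) for Zl in Z))   # harmonic mean of the Z_l

# A_m - H_m should be positive semi-definite, i.e. all eigenvalues >= 0.
eigvals = np.linalg.eigvalsh(A_m - H_m)
print("eigenvalues of A_m - H_m:", eigvals)
print("A_m >= H_m in the matrix sense:", bool(np.all(eigvals >= -1e-12)))
```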

My question: Are there any known bounds for the difference between the arithmetic and harmonic means of positive-definite matrices?

For real numbers such bounds are known (see the wiki article and the original resources here and here).

Why? Because it will be helpful for moving to the next step: comparing estimators that may be neither unbiased nor consistent, so that one is left with a criterion such as Mean Squared Error to compare them.

Any suggestion, link or reference will be really appreciated.

Alecos Papadopoulos
  • For future reference: It is preferable to flag your post at the original site and ask for it to be *migrated* rather than reposting. – cardinal Aug 22 '13 at 01:37
  • @cardinal Thanks. I have seen some posts labeled "migrated", but I didn't know the procedure. – Alecos Papadopoulos Aug 22 '13 at 01:44
  • (+1) No problem. I would be interested to know what norms you want to consider for the inequalities you're interested in. The one you've derived regarding the covariance matrices is actually a special case of a more general result. Your characterizations of (A) and (B) are also a little unusual insofar as the estimator in (A) is also unbiased and minimum variance in the sense that any linear combination of $\hat\beta$ is UMVU for the corresponding linear combination of the coefficients $\beta$. – cardinal Aug 22 '13 at 01:58
  • I am not sure I understand the second half of the comment. Both estimators are unbiased and consistent, and the (A) "tactic" gives the whole-sample estimator which indeed is shown to be minimum variance. Each approach attempts to actually endow its specific output, the estimate, with a different desirable property of the estimator function, and the result just verifies that appealing to consistency is preferred compared to appealing to unbiasedness, because the former is better in efficiency. – Alecos Papadopoulos Aug 22 '13 at 02:11
  • Dear Alecos, while this may be how you are thinking of (A) and (B), it strikes me as a little heuristic and not altogether natural. For one, *consistency* is a limiting statement and this will require stronger conditions on $\mathbf X$ than stated here, so "justifying" (A) in this way is rather tenuous. For (B), one might ask why the natural $U$-statistic version is ignored versus the one given. – cardinal Aug 22 '13 at 02:34
  • I think this question has great potential; I will try to give it some thought myself and would like to see what norms are of interest to you here. There are known AM-GM matrix inequalities that might interest you, too. – cardinal Aug 22 '13 at 02:35
  • @cardinal I am really glad my question created interest and I do appreciate the intent to contribute. I will also think about and respond to the issues you bring up (norms, AM-GM ineq., U-statistics). Naturally, consistency requires additional assumptions, but I was really assuming them here. (CONT'D) – Alecos Papadopoulos Aug 22 '13 at 09:45
  • @cardinal My practical starting point for all this was: "since we cannot tell "how large" a sample must really become in order for consistency to "make its mark" on the estimates, maybe we can exploit a "practically large" sample to, at last, construct the sampling distribution from _real data_ and exploit finite-sample properties?" – Alecos Papadopoulos Aug 22 '13 at 09:46
  • @AlecosPapadopoulos, here is an arxiv paper that might be what you are looking for? (disclaimer I only skimmed the paper) https://arxiv.org/pdf/1501.04823.pdf – Lucas Roberts Jan 02 '18 at 20:19
  • I must be missing some essential idea, so let me ask at the risk of displaying my ignorance: given that the "whole sample" estimator is the minimum variance unbiased estimator, and given (as I assume) that the "averaging estimator" uses a *linear* average such as the arithmetic mean or weighted arithmetic mean (you don't specifically say, and the discussion of the harmonic mean makes one wonder which average it really is), can't we *immediately* conclude that the variance of the averaging estimator cannot be any smaller? – whuber Aug 21 '19 at 13:30
  • Second question: since there are no universal bounds for the difference between the AM and HM (even for positive numbers), evidently the bounds you seek must be some kind of function of the matrices. If that's so, the question is awfully broad and potentially has many different answers. What guidance can you supply concerning the kind of answer you're seeking? – whuber Aug 21 '19 at 13:32
  • Answer: As LS regression encompasses the univariate case, consider a string of numbers with an outlier. Dividing into segments and averaging the groups maintains the bad group mean, and just increases estimator variance by shrinking the data into a collection of fewer groups. Forget LS, which is not robust, but do look at LAD (Least Absolute Deviations), which in the univariate case results in the median as the estimator. So now explore bounded relationships between a population's mean and median. Extend beyond the univariate case. – AJKOER Apr 26 '20 at 11:08
  • I found two papers: http://www.zu.edu.jo/UploadFile/PaperFiles/PaperFile_42_35.pdf (Theorem 3.2.3) and https://projecteuclid.org/euclid.afa/1429286042. Regards, shibamouli – Shibamouli Lahiri Sep 27 '20 at 10:15
  • @AlecosPapadopoulos - "... the motive has to do with statistics and econometrics ..."; "... a researcher could conceivably entertain the following two options ..." The researcher has a third option, forming a weighted combination of the subsample estimates, where the (optimal) weights are the precision (inverse variance) of the subsamples. The problem with method (B) is that the equal weights are suboptimal relative to the precision weights. – krkeane Jul 14 '21 at 11:52

1 Answer

Yes, indeed there is. Please see the work by Mond and Pečarić here. They established a mixed arithmetic-mean-harmonic-mean inequality for positive semi-definite matrices. Here is a link to the paper that contains the proof:

https://www.sciencedirect.com/science/article/pii/0024379595002693

Once you have downloaded the paper, you will find the proof on pages 450-452, in the Main Result section.

Here is a citation in case you need it:

Mond, B., and Pečarić, J. E. (1996), “A mixed arithmetic-mean-harmonic-mean matrix inequality,” Linear Algebra and its Applications (special issue Linear Algebra and Statistics: In Celebration of C. R. Rao’s 75th Birthday, September 10, 1995), 237–238, 449–454. https://doi.org/10.1016/0024-3795(95)00269-3.

I hope this helps you.

Best, =K=

kls