First of all, I don't know anything about the Stein-Haff estimator beyond what a few seconds of Googling turned up at https://stat.duke.edu/~berger/papers/yang.pdf , which contains the quote: "This estimator has two problems. First, the intuitively compatible ordering $\phi_1 \geq \phi_2 \geq \dots \geq \phi_p$ is frequently violated. Second, and more serious, some of the $\phi_i$ may even be negative. Stein suggests an isotonizing algorithm to avoid these problems. ... The details of this isotonizing algorithm can be found in Lin and Perlman (1985)".
That reference is: Lin, S. P. and Perlman, M. D. (1985). A Monte Carlo comparison of four estimators for a covariance matrix. In Multivariate Analysis 6 (P. R. Krishnaiah, ed.), 411-429. North-Holland, Amsterdam.
However, I do know about optimization. Isotonizing constraints can be placed on a least squares problem, turning it into a (linearly constrained) convex Quadratic Programming (QP) problem, which is easy to formulate and solve numerically using off-the-shelf software. If an $L^1$ norm is used for the regression, or even if an $L^1$ penalty is added to an $L^2$ objective, the result is still a convex QP. If the objective is solely $L^1$, the problem is actually a Linear Programming (LP) problem, which is a special case of a convex QP.
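For concreteness, here is a minimal CVX (MATLAB) sketch of the isotonic least squares formulation. I'm assuming `phi0` holds the $p$ raw (possibly disordered) eigenvalue estimates; the names `phi0` and `phi` are mine, not anything from the papers above:

```matlab
% Minimal sketch: isotonic least squares on raw eigenvalue estimates.
% phi0 is assumed to be a p-vector of raw (possibly disordered) estimates.
p = numel(phi0);
cvx_begin quiet
    variable phi(p)
    % L^2 objective: stay as close as possible to the raw estimates
    minimize( sum_square(phi - phi0) )
    subject to
        % isotonizing (linear) constraints: phi_1 >= phi_2 >= ... >= phi_p
        phi(1:p-1) >= phi(2:p);
cvx_end
```

Replacing `sum_square(phi - phi0)` with `norm(phi - phi0, 1)` gives the pure $L^1$ case, which CVX will happily hand off to an LP solver.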
As for the negative eigenvalues, presuming those are still possible after adding the isotonizing constraints, they can be dealt with by imposing a semidefinite constraint on the covariance matrix, i.e., the constraint that the minimum eigenvalue is $\geq 0$. You could actually set a minimum eigenvalue other than 0 if you so desire, and you would need to do so if you want to ensure the covariance matrix is nonsingular, as you seem to suggest is desired or required. Adding this semidefinite constraint turns the whole optimization problem into a convex semidefinite program (SDP), or technically, something convertible into one.
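As a sketch of what that semidefinite constraint looks like in CVX, here is the nearest-covariance-matrix problem in Frobenius norm with an eigenvalue floor. `Shat` (the raw $p \times p$ estimate) and `delta` (the floor; any `delta > 0` forces nonsingularity) are illustrative names I'm assuming, not anything from the question:

```matlab
% Minimal sketch: nearest covariance matrix with a minimum-eigenvalue floor.
% Shat is assumed to be the raw p-by-p covariance estimate.
delta = 1e-6;          % eigenvalue floor; delta > 0 ensures nonsingularity
p = size(Shat, 1);
cvx_begin sdp quiet
    variable S(p,p) symmetric
    minimize( norm(S - Shat, 'fro') )
    subject to
        % semidefinite constraint: every eigenvalue of S is >= delta
        S >= delta * eye(p);
cvx_end
```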
Formulating and numerically solving such a convex SDP, i.e., an objective in your choice of norm ($L^p$ for any $p \geq 1$), plus an optional objective penalty in your choice of norm (any $p \geq 1$, which need not be the same as the other norm), plus isotonizing (linear) constraints, plus the semidefinite constraint, is VERY easy and straightforward using a tool such as CVX http://cvxr.com/cvx/ . This should execute very quickly unless the dimension of the covariance matrix (what you called $p$, not what I called $p$) is in the thousands or greater. YALMIP http://users.isy.liu.se/johanl/yalmip/ could be used instead of CVX (which only allows the formulation and solution of convex optimization problems, apart from the optional specification of integer constraints). YALMIP allows a greater choice of optimization solvers and a greater range of problems (including non-convex ones) than CVX, but has a steeper learning curve.
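Putting all the pieces together, here is a sketch of the full problem in CVX. Since (as I understand it) the Stein-Haff approach keeps the sample eigenvectors and only adjusts the eigenvalues, you can work directly in eigenvalue space, where the semidefinite constraint collapses to the linear bound $\phi_i \geq \delta$; this particular instance is therefore solvable without invoking a full SDP solver. `phi0`, `Q` (the sample eigenvectors), `gam`, and `delta` are assumed inputs with names of my choosing:

```matlab
% Minimal sketch of the full problem: norm objective, penalty in a
% (possibly different) norm, isotonizing constraints, eigenvalue floor.
% phi0 = raw eigenvalue estimates, Q = orthogonal matrix of sample
% eigenvectors; gam and delta are illustrative values.
p     = numel(phi0);
gam   = 0.1;           % penalty weight (illustrative)
delta = 1e-6;          % eigenvalue floor; delta > 0 ensures nonsingularity
cvx_begin quiet
    variable phi(p)
    % L^2 fit plus an L^1 penalty; any mix of norms with p >= 1 stays convex
    minimize( norm(phi - phi0, 2) + gam * norm(phi, 1) )
    subject to
        phi(1:p-1) >= phi(2:p);   % isotonizing constraints
        phi >= delta;             % every eigenvalue >= delta > 0
cvx_end
% Reassemble the covariance estimate from the sample eigenvectors:
Sigma_hat = Q * diag(phi) * Q';
```

Because `Q` is orthogonal, `Sigma_hat` has exactly the eigenvalues `phi`, so the constraints above guarantee it is symmetric positive definite with correctly ordered eigenvalues.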