
Consider an $N$-by-$N$ covariance matrix:

\begin{equation} \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} & \dots & \Sigma_{1N}\\ \Sigma_{12} & \Sigma_{22} & \dots & \Sigma_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ \Sigma_{1N} & \Sigma_{2N} & \dots & \Sigma_{NN} \end{bmatrix} \end{equation}

and its inverse, a precision matrix:

\begin{equation} \Lambda = \Sigma^{-1} = \begin{bmatrix} \Lambda_{11} & \Lambda_{12} & \dots & \Lambda_{1N}\\ \Lambda_{12} & \Lambda_{22} & \dots & \Lambda_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ \Lambda_{1N} & \Lambda_{2N} & \dots & \Lambda_{NN} \end{bmatrix} \end{equation}

Question 1: Suppose we have the covariance diagonal terms, $\Sigma_{ii}$, and the precision off-diagonal terms, $\Lambda_{i\neq j}$. How can we recover a covariance matrix consistent with these constraints? Edit: This question has been answered here.

Remark: Starting from $\Sigma\Lambda = I$ gives us $N^2$ equations in $N(N+1)/2$ unknowns, where all but $N$ of the equations contain a single quadratic term.
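For concreteness, with $N = 2$ the unknowns are $\Sigma_{12}$, $\Lambda_{11}$, $\Lambda_{22}$, and $\Sigma\Lambda = I$ reads:

$$\begin{aligned}
\Sigma_{11}\Lambda_{11} + \Sigma_{12}\Lambda_{12} &= 1\\
\Sigma_{11}\Lambda_{12} + \Sigma_{12}\Lambda_{22} &= 0\\
\Sigma_{12}\Lambda_{11} + \Sigma_{22}\Lambda_{12} &= 0\\
\Sigma_{12}\Lambda_{12} + \Sigma_{22}\Lambda_{22} &= 1
\end{aligned}$$

The two diagonal equations are linear in the unknowns, while each off-diagonal equation contains exactly one quadratic term ($\Sigma_{12}\Lambda_{22}$ and $\Sigma_{12}\Lambda_{11}$, respectively).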

Question 2 (special case): Consider the case where the precision off-diagonals have a rank-1 structure: $\Lambda_{i\neq j} = (R R^T)_{ij}$, where $R$ is $N$-by-$1$. Edit: This case was answered here.
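To see why this structure helps: writing $\Lambda = D + R R^T$ with $D$ diagonal (which reproduces $\Lambda_{i\neq j} = (R R^T)_{ij}$), the Sherman–Morrison formula gives

$$\Sigma = \Lambda^{-1} = D^{-1} - \frac{D^{-1} R R^T D^{-1}}{1 + R^T D^{-1} R},$$

so matching the known diagonal of $\Sigma$ reduces the problem to $N$ equations in the $N$ unknown entries of $D$.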


Motivation: Suppose we have two multivariate Gaussian distributions, $p(\mathbf{x}) \sim \mathcal{N}(\mu_p, \Sigma_p)$ and $q(\mathbf{x}) \sim \mathcal{N}(\mu_q, \Sigma_q)$, where $\mathbf{x} \in \mathbb{R}^N$. We know $\mu_p$, $\Sigma_p$, $\mu_q$, and $\textrm{diag}(\Sigma_q)$, and our task is to find the off-diagonal elements of $\Sigma_q$ that minimize the KL divergence $\mathrm{KL}(Q \,\|\, P)$.

This situation can arise when we have prior beliefs about $\mathbf{x}$ given by $p(\mathbf{x})$ and we want to infer a (Gaussian-restricted variational) posterior $q(\mathbf{x})$ given access to many observations, each of which contains only an individual component of $\mathbf{x} = (x^{(1)}, x^{(2)}, \dots, x^{(N)})$. In this case, we can form precise beliefs about the distribution of each component ($\mathcal{N}(\mu_{q,i}, \Sigma_{q,ii})$), but must rely entirely on the prior to form beliefs about their correlations.

Define $\Lambda_p = \Sigma_p^{-1}$ and $\Lambda_q = \Sigma_q^{-1}$. Minimizing $\mathrm{KL}(Q \,\|\, P)$ is equivalent to minimizing (dropping the factor of $1/2$ and the terms that do not depend on $\Sigma_q$):

$$L \triangleq \textrm{tr}(\Lambda_p \Sigma_q) - \log |\Sigma_q|$$ and some calculation gives (up to a factor of $2$ from the symmetry of $\Sigma_q$, which does not affect the stationary point):

$$\frac{\partial L}{\partial (\Sigma_q)_{i \neq j}} = (\Lambda_p)_{ij} - (\Lambda_q)_{ij}$$

Setting the partial derivatives to $0$ yields $(\Lambda_q)_{ij} = (\Lambda_p)_{ij}$ for all $i \neq j$. In other words, the posterior distribution $Q$ maintains the off-diagonal precision structure of the prior. But this leaves open the question of how to determine the full $\Sigma_q$ (equivalently $\Lambda_q$).
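This stationarity condition is easy to verify numerically. Below is a minimal R sketch (the $3\times 3$ prior, the known diagonal, and the helper names make_Sigma and L_obj are arbitrary choices for illustration): minimize $L$ over the off-diagonals of $\Sigma_q$, then check that the off-diagonals of $\Lambda_q$ match those of $\Lambda_p$.

set.seed(1)
## An arbitrary 3x3 SPD prior covariance, for illustration:
Sigma_p <- crossprod(matrix(rnorm(9), 3, 3)) + diag(3)
Lambda_p <- solve(Sigma_p)
d_q <- c(1, 2, 3)    # the known diagonal of Sigma_q

## Assemble a symmetric Sigma_q from its free off-diagonal entries:
make_Sigma <- function(off) {
    S <- diag(d_q)
    S[lower.tri(S)] <- off
    S[upper.tri(S)] <- t(S)[upper.tri(S)]
    S
}

## The objective L = tr(Lambda_p Sigma_q) - log|Sigma_q|:
L_obj <- function(off) {
    S <- make_Sigma(off)
    if (any(eigen(S, symmetric = TRUE, only.values = TRUE)$values <= 0))
        return(Inf)    # stay inside the positive-definite cone
    sum(diag(Lambda_p %*% S)) - as.numeric(determinant(S)$modulus)
}

opt <- optim(rep(0, 3), L_obj)    # Nelder-Mead tolerates the Inf barrier
Lambda_q <- solve(make_Sigma(opt$par))
round(Lambda_q - Lambda_p, 4)     # off-diagonal entries should be ~0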

Jack G.

1 Answer


This can be set up as a non-linear system of equations. Because of the non-linearity, it is not obvious that the system has a unique solution, but I checked the $2\times 2$ case, where the solution is unique as far as I can see.

But the equations become unwieldy, so in practice it seems easier to formulate the problem as a minimization. Below I do this for the $2\times 2$ case as a proof of concept, and it seems to work well.

Let $$ \Sigma =\begin{pmatrix} a & x \\ x & b \end{pmatrix}, \quad \Lambda = \begin{pmatrix} y & c \\ c & z \end{pmatrix} $$ where $a$, $b$, $c$ are known and $x$, $y$, $z$ are unknown. Multiply the two matrices, subtract $I$, square each element, and sum them. At the solution this sum-of-squares criterion should be zero. An R implementation:

## Sum-of-squares criterion: the squared elements of Sigma %*% Lambda - I.
## par = (a, b, c) are the knowns; returns a function of the unknowns xx = (x, y, z).
crit <- function(par) {
    a <- par[1]; b <- par[2]; c <- par[3]
    function(xx) {
        x <- xx[1]; y <- xx[2]; z <- xx[3]
        (a*y + c*x - 1)^2 + (a*c + x*z)^2 +
        (x*y + b*c)^2 + (x*c + b*z - 1)^2
    }
}

## Starting values: guess x = -c/2, then fill in y, z from the 2x2 inverse formula.
init <- function(aa) {
    a <- aa[1]; b <- aa[2]; c <- aa[3]
    x <- -c/2
    c(x, b/(a*b - x*x), a/(a*b - x*x))
}

completeSigma <- function(dsigma, offd_lambda) {
    obj <- optim(init(c(dsigma, offd_lambda)),
                 crit(c(dsigma, offd_lambda)), method = "BFGS")
### Return the completed sigma:
    par <- obj$par
    matrix(c(dsigma[1], par[1], par[1], dsigma[2]), 2, 2)
}

completeSigma(c(4, 1), -1/3)
     [,1] [,2]
[1,]    4    1
[2,]    1    1
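And indeed, inverting the completed matrix recovers the requested off-diagonal precision:

solve(matrix(c(4, 1, 1, 1), 2, 2))
##            [,1]       [,2]
## [1,]  0.3333333 -0.3333333
## [2,] -0.3333333  1.3333333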

(For a more mathematical treatment, I asked at https://mathoverflow.net/questions/406753/finding-a-matrix-from-its-diagonal-and-the-off-diagonal-elements-of-its-inverse)

kjetil b halvorsen
  • An exact formula is available. Define $\lambda = (1 + \sqrt{1 + 4b^2AD})/(2AD)$ where $(A,D)$ is the diagonal of $\Sigma$ and $b$ is the term common to the off-diagonal of $\mathbb A.$ Then the diagonal of $\mathbb A$ is the vector $\lambda(\Sigma_{22}, \Sigma_{11}).$ $\mathbb A$ can now be recovered by inverting $\Sigma.$ – whuber Oct 22 '21 at 21:04
  • @whuber: Yes, but can you generalize to $N\times N$ case? The optimization solution is easy to generalize! – kjetil b halvorsen Oct 22 '21 at 23:04
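For what it's worth, a quick R check of the closed form from the first comment, using the numbers from the completeSigma example above ($A = 4$, $D = 1$, $b = -1/3$):

lambda <- (1 + sqrt(1 + 4 * (-1/3)^2 * 4 * 1)) / (2 * 4 * 1)
lambda                       # 1/3
Lambda <- matrix(c(lambda * 1, -1/3, -1/3, lambda * 4), 2, 2)    # diag is lambda * (Sigma_22, Sigma_11)
solve(Lambda)                # recovers matrix(c(4, 1, 1, 1), 2, 2)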