
I have a set of $n$ measurements of $p$ variables $\xi_i$. I am interested in the inverse covariance or precision matrix $P$ of these variables, but because $p \gg n$ and because of limited storage ($p$ can be on the order of several hundred thousand), I would like to constrain the precision matrix to a given sparsity pattern.

In particular, the variables are associated with positions $(x_i, y_i, z_i)$, and I would like the precision matrix to have non-zero entries only for pairs of variables $(i, j)$ with $$ \sqrt{ (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2 } \le r. $$
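For concreteness, such a distance-based sparsity pattern could be built as follows (a minimal numpy sketch; `coords` is an assumed $p \times 3$ array of positions):

```python
import numpy as np

def sparsity_mask(coords, r):
    """Boolean (p, p) matrix: entry (i, j) is True iff the Euclidean
    distance between positions i and j is at most r, i.e. the entries
    of the precision matrix allowed to be non-zero."""
    diff = coords[:, None, :] - coords[None, :, :]   # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))         # pairwise distances
    return dist <= r
```

Note this dense construction needs $O(p^2)$ memory; for $p$ in the hundreds of thousands one would instead use a sparse neighbor search (e.g. a k-d tree) to enumerate only the pairs within radius $r$.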

Is there a method to estimate such a precision matrix with a given sparsity pattern? I'm aware of the graphical lasso and other methods of sparse precision estimation, but as far as I know they infer the sparsity pattern from the data.

As a second part of the question, would it be possible to apply the same sparsity constraint instead to a matrix $W$ of the same size as $P$, such that $P = W' W$? This matrix would be used to "whiten" or decorrelate the variables.

A. Donda
  • Dempster's 'Covariance Selection' comes to mind: https://wiki.ucl.ac.uk/download/attachments/36288228/Dempster1972.pdf?version=1&modificationDate=1382695167000&api=v2 – steveo'america Mar 14 '19 at 18:46
  • Please could you explain "variables are associated with positions"? – Yves Mar 20 '19 at 16:23
  • @Yves, concretely it means I'm working with fMRI, and each variable is the BOLD signal picked up from a voxel in the brain. I tried to formulate it generically because I would expect similar spatial constraints to also apply in other fields like geostatistics. – A. Donda Mar 20 '19 at 19:13
  • @A. Donda Thank you. So an observation is a sampling time, with every voxel sampled at the same time. Ideally, each variable could be a (continuous?) function of time. Indeed, similar constraints are met in geostatistics. – Yves Mar 20 '19 at 19:33
  • @Yves, yes, you can characterize each observation by a point on a four-dimensional grid $(t, x, y, z)$, i.e. constant step size for each dimension. However, the set of spatial positions can be irregular (the brain is not a cuboid). – A. Donda Mar 20 '19 at 19:38
  • @A. Donda So I would say that you are looking for a spatio-temporal covariance in relation with a Gaussian (or trans-Gaussian) Process with index $\mathbf{s} := [x, \, y,\, z]$ (space) and $t$ (time). The sparsity in $\mathbf{s}$ relates to a Markov (spatial) property but you may have to consider some (time) autocorrelation. – Yves Mar 20 '19 at 19:59
  • @Yves, for the purpose of this question I'd assume that there is no temporal autocorrelation. Spatially, I guess you could call it Markov because of finite dependency, but I wouldn't assume stationarity. – A. Donda Mar 20 '19 at 20:01
  • @A. Donda Thank you. The spatial Markov property is tightly related to the sparsity of the covariance matrix. You may find useful to read about the [R inla](http://www.r-inla.org) project. I think I better understand the problem now. – Yves Mar 20 '19 at 20:08

1 Answer


You can replace the penalty in the graphical lasso with a constraint encoding your desired sparsity pattern. The graphical lasso is given by

$\hat{\Theta} = \text{argmin}_{\Theta \ge 0} \left(\text{Tr}(S \Theta) - \log \det(\Theta) + \lambda \sum_{j \ne k} |\Theta_{jk}| \right)$

where the constraint $\Theta\ge 0$ means that $\Theta$ is positive semidefinite (psd). Instead, you could solve the optimization problem

$\hat{\Theta} = \text{argmin}_{\Theta \ge 0, \Theta_{ij}=0 \text{ for }(i,j)\in A} \left(\text{Tr}(S \Theta) - \log \det(\Theta) \right)$

where $A$ is the subset of $\{1,\dots,p\}\times\{1,\dots,p\}$ of entries you would like to force to 0. For instance, you may use the `penalizeMatrix` argument of `EBICglasso` in the R package qgraph: http://sachaepskamp.com/qgraph/reference/EBICglasso.html
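This zero-constrained problem can also be attacked directly. Below is a minimal numpy sketch using projected gradient descent, alternating projections onto the psd cone and the sparsity pattern; this is illustrative and only approximate, not an exact solver (the function name, step size, and iteration count are all ad hoc choices):

```python
import numpy as np

def constrained_precision(S, mask, n_iter=500, lr=0.02, eps=1e-3):
    """Approximate solution of
        min_{Theta psd, Theta[~mask] = 0}  Tr(S Theta) - log det(Theta)
    by projected gradient descent.  `mask` must be a symmetric boolean
    matrix with True on the diagonal.  Alternating the psd and sparsity
    projections is approximate: the iterate satisfies the zero pattern
    exactly but may be only approximately stationary."""
    p = S.shape[0]
    Theta = np.eye(p)
    for _ in range(n_iter):
        grad = S - np.linalg.inv(Theta)          # gradient of the objective
        Theta = Theta - lr * grad
        Theta = (Theta + Theta.T) / 2            # enforce symmetry
        w, V = np.linalg.eigh(Theta)             # project onto the psd cone
        Theta = (V * np.clip(w, eps, None)) @ V.T
        Theta[~mask] = 0.0                       # project onto the zero pattern
    return Theta
```

Because the sparsity projection is applied last, the returned matrix satisfies the pattern exactly; for a production setting one would use a dedicated covariance-selection solver instead.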

Another option is to use an entrywise penalty matrix $\Lambda$, with very large entries on $A$ and zeros elsewhere:

$\hat{\Theta} = \text{argmin}_{\Theta \ge 0} \left(\text{Tr}(S \Theta) - \log \det(\Theta) + \sum_{i,j} \Lambda_{i,j}|\Theta_{ij}| \right)$

so that only the entries in $A$ are penalized and hence driven to zero. This last form is implemented in skggm in Python (https://skggm.github.io/skggm/tour), where the argument `lam=...` can be given as a $p\times p$ matrix, with large elements for the entries you want to penalize and zeros where you do not.
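To illustrate, here is a small numpy helper for building such a $\Lambda$ from a boolean allowed-entries mask (the name `penalty_matrix` and the value `1e6` are illustrative, not part of any library):

```python
import numpy as np

def penalty_matrix(mask, lam_large=1e6):
    """Entrywise penalty weights Lambda for the weighted graphical lasso:
    zero penalty on allowed entries (mask True), a very large penalty on
    the entries in A, which the solver then drives to (numerically) zero."""
    Lam = np.where(mask, 0.0, lam_large)
    np.fill_diagonal(Lam, 0.0)   # the diagonal is conventionally unpenalized
    return Lam
```

The result could then be passed to skggm's quic-based estimator, e.g. `QuicGraphicalLasso(lam=penalty_matrix(mask)).fit(X)` (class name as shown in the skggm tour; check the API of your installed version).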

jlewk
  • Thanks! Your second equation is exactly what I'd want, and I see that the third could provide a good approximation. The extension to the second part of my question, writing the precision in terms of a sparse whitening matrix, is formally obvious. But would this also be supported by the software packages you mention? – A. Donda Mar 20 '19 at 19:22
  • If not, is there some generic term for this kind of optimization problem that I could use to search for it? I guess it's related to "quadratic programming", but I'm not sure. – A. Donda Mar 20 '19 at 19:24
  • I am not sure I understand what you mean by whitening. It's not clear from your question what the dimensions of $W$ are, or which rows/columns/cells of $W$ you would like to be 0. – jlewk Mar 21 '19 at 04:26
  • "Whitening" is a linear transform of the original variables into new variables that are of variance 1 and correlation 0. The corresponding matrix is of the same size as $P$. The sparsity constraint would be the same for $W$ as it was for $P$. I've edited the question to include this information. – A. Donda Mar 21 '19 at 16:59