
These might be dumb questions, but I am having trouble wrapping my head around a particular problem. I have a sparse $N \times p$ count matrix $G$ that I want to optimize. I also have an $N \times N$ correlation matrix $C$, with entries $C_{ij}$, that I want $G$ to be optimized for. So, without any constraints, I have the following loss function:

$$ \text{loss} = \sum_i\sum_j (\operatorname{corr}(G_i, G_j) - C_{ij})^2 $$

where $G_i$ is the $i$-th row of $G$ (a $1 \times p$ vector) and $i, j = 1, \ldots, N$.
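For concreteness, here is how I evaluate this loss in NumPy on toy data (note that `np.corrcoef` returns NaN for rows with zero variance, which can happen with very sparse counts):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 5, 20
G = rng.poisson(1.0, size=(N, p)).astype(float)  # toy sparse count matrix
C = np.eye(N)                                    # toy target correlation matrix

corr = np.corrcoef(G)            # N x N matrix of corr(G_i, G_j)
loss = np.sum((corr - C) ** 2)   # sum_ij (corr(G_i, G_j) - C_ij)^2
```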

So, my first question is: how can I solve/implement this particular problem?

My second question is related to my lack of knowledge regarding defining optimization problems :)

First of all, as I mentioned above, $G$ is a count matrix and it is very sparse. I also want to keep the distribution of each $G_i$ while optimizing, not just change it arbitrarily as described above, because the initial values are important. So, my question is: is there a way for me to add this information into the problem design?

– eonurk (edited by Michael Hardy)
  • I feel like some clarification is needed: 1) Does "count matrix" mean a matrix whose elements are only 1s and 0s? 2) What do you mean by keeping the distribution of each $G_i$? So we start with some matrix $G$ and we want to modify it. What are we allowed to do? Can we change each element of $G$ independently? Or do we have to, for example, keep the sum of each row fixed? – Aleksejs Fomins Dec 30 '20 at 13:02
  • These are good questions. 1) A count matrix consists of non-negative integers. In my case, I want to optimize the distributions of single-cell RNA-seq data, which is mostly zeros, but occasionally each element goes up to ~10. Though for simplicity I can probably start with bulk RNA-seq, where $G_i$ is generally modeled via a Poisson or negative binomial distribution. I am not quite sure how single cells are modeled; I need to read more about it. – eonurk Dec 30 '20 at 15:27
  • I am confused. Is $G_i$ a fixed vector or a random vector? If it is random and has a known distribution, it has a fixed correlation matrix. I repeat my question: what part of $G_i$ are we allowed to change to optimize the loss? – Aleksejs Fomins Dec 30 '20 at 15:52

2 Answers


As far as I can see, this problem has similarities with Non-Negative Matrix Factorization (NMF), as the data matrix $G$ has non-negative entries with many zeros:

$$G_{i,j} \approx \sum_{k} W_{i,k} H_{k,j}$$

where $W$ and $H$ are $N \times k$ and $k \times p$ matrices and $G$ is approximated/represented by $WH$. The major caveat here is that $W$ and $H$ have to have only non-negative entries too. If your end goal is a kind of "transformed" non-negative correlation matrix $C$ factorized into two non-negative matrices $W, H$, then NMF might help.
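For example, a minimal sketch with scikit-learn's `NMF` (the data and the rank `k` below are placeholders you would replace with your own):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
G = rng.poisson(0.5, size=(100, 40)).astype(float)  # sparse non-negative counts

k = 10                                    # factorization rank, chosen by you
model = NMF(n_components=k, init="nndsvd", max_iter=500)
W = model.fit_transform(G)                # N x k, non-negative
H = model.components_                     # k x p, non-negative
G_hat = W @ H                             # low-rank non-negative approximation
```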

The Elements of Statistical Learning, page 553, section 14.6, presents an optimization setup based on minimizing a divergence objective function where $G_{i,j}$ has a Poisson distribution.
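A minimal sketch of the multiplicative updates for that divergence objective, in the style of the Lee–Seung rules (the `eps` guard against division by zero is my addition):

```python
import numpy as np

def nmf_divergence(G, k, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative updates minimizing the (Poisson) divergence D(G || WH)."""
    rng = np.random.default_rng(seed)
    N, p = G.shape
    W = rng.random((N, k)) + eps
    H = rng.random((k, p)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= (G / WH) @ H.T / (H.sum(axis=1) + eps)           # update W
        WH = W @ H + eps
        H *= W.T @ (G / WH) / (W.sum(axis=0)[:, None] + eps)  # update H
    return W, H
```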

– Deno (edited by gunes)
  • I'm sorry, I don't exactly follow. How does NMF help perform the optimization? Is the minimization of the factorized matrix somehow faster than the original? – Aleksejs Fomins Dec 30 '20 at 13:40

Analysis:

  1. Sample correlation is a highly nonlinear function, so it is unlikely we will be able to solve the optimization problem analytically. So we will need to look for a numerical solution.
  2. The unconstrained problem clearly has multiple extrema, because there are multiple ways to achieve the same correlation value. Further, since you require an integer solution for your matrix coefficients, some extrema may be better than others, because they are closer to integer values and introduce less loss when rounded off. It must be noted that exact solutions to optimization problems over integers are often NP-hard, so only an approximate solution may be feasible.

So, if you don't want to delve too deep into the mathematics and only need an approximate solution, you might try some standard optimization techniques (e.g. SciPy). The strategy would be to optimize the loss function over non-negative floating-point numbers, then round the result to the nearest integers. Since there are multiple extrema, it would be beneficial to run the algorithm from multiple random initial conditions. Finally, if performance is an issue, you may consider computing the Jacobian and/or Hessian of your loss function analytically, as some methods can perform significantly faster when such information is provided.
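Here is a minimal sketch of that strategy with `scipy.optimize.minimize` (synthetic data; L-BFGS-B with numerical gradients is just one reasonable choice, and note that `np.corrcoef` returns NaN if a row becomes constant, e.g. all zeros):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, p = 8, 30
G0 = rng.poisson(1.0, size=(N, p)).astype(float)  # initial count matrix
C = np.corrcoef(rng.normal(size=(N, p)))          # placeholder target correlations

def loss(x):
    G = x.reshape(N, p)
    return np.sum((np.corrcoef(G) - C) ** 2)

best = None
for _ in range(5):  # multiple random restarts, since the problem is multimodal
    x0 = (G0 + 0.1 * rng.random(G0.shape)).ravel()  # jittered initial condition
    res = minimize(loss, x0, method="L-BFGS-B", bounds=[(0, None)] * x0.size)
    if best is None or res.fun < best.fun:
        best = res

G_opt = np.rint(best.x.reshape(N, p))  # round to nearest non-negative integers
```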

You also mentioned very sparse solutions. Sparsity of a numerical solution is typically controlled by adding an L1-norm penalty to the loss function and regulating the desired accuracy/sparsity trade-off by changing the prefactor in front of the L1 norm. In your case, if you require independent sparsity constraints on each row, you may consider adding an independent penalty for each row.
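For instance, reusing the names from the sketch above, a per-row L1 penalty could look like this (`lam` is a hypothetical tuning parameter):

```python
def penalized_loss(x, lam):
    G = x.reshape(N, p)
    fit = np.sum((np.corrcoef(G) - C) ** 2)
    # lam may be a scalar or a length-N vector of per-row penalty weights;
    # since G >= 0 under the bounds, |G| is just G itself.
    l1 = np.sum(lam * np.abs(G).sum(axis=1))
    return fit + l1
```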

– Aleksejs Fomins