
I have $N$ correlated random variables. I assume that these random variables are given by the following expression:

$ \tilde{x}_i = \alpha_i + \beta_i \cdot \tilde{m} + \gamma_i \cdot \tilde{\varepsilon}_i, $

where $\tilde{m}$ is a "global" random variable and the $\tilde{\varepsilon}_i$ are "variable-specific" random variables (as indicated by the absence and presence of the index $i$, respectively). Both $\tilde{m}$ and the $\tilde{\varepsilon}_i$ are assumed to have mean zero and standard deviation one. The $\tilde{\varepsilon}_i$ are also assumed to be independent. As a consequence, the covariance matrix should be given by the following expression:

$ C_{ij} = \beta_i \cdot \beta_j + \delta_{ij} \cdot \gamma_i \cdot \gamma_j, $

where $\delta_{ij}$ is the Kronecker delta.
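For concreteness, here is a minimal numpy sketch of this covariance structure (the loadings below are placeholder numbers, not values from my data) that also checks the formula against simulated samples:

```python
import numpy as np

# Placeholder loadings; in my setting these would come from evaluating
# beta(f_i) and gamma(f_i) at each variable's feature (see below).
alpha = np.array([0.0, 0.0, 0.0])
beta = np.array([0.8, 1.2, 0.5])
gamma = np.array([0.3, 0.6, 0.9])

# C_ij = beta_i * beta_j + delta_ij * gamma_i * gamma_j
C = np.outer(beta, beta) + np.diag(gamma ** 2)

# Sanity check by simulation: x_i = alpha_i + beta_i * m + gamma_i * eps_i
rng = np.random.default_rng(0)
T = 200_000
m = rng.standard_normal(T)            # one "global" draw per sample
eps = rng.standard_normal((T, 3))     # independent variable-specific draws
x = alpha + beta * m[:, None] + gamma * eps
print(np.allclose(np.cov(x, rowvar=False), C, atol=0.02))  # ~True
```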

Now I say that each random variable comes with a single number (a feature $f_i$) that determines the values of $\alpha_i$, $\beta_i$ and $\gamma_i$:

$ \alpha_i = \alpha (f_i), $

$ \beta_i = \beta (f_i), $

$ \gamma_i = \gamma (f_i), $

where $\alpha$, $\beta$ and $\gamma$ are "universal" functions (the same for all $N$ random variables).

Using the available observations of $x_i$, I can estimate the covariance matrix $C_{ij}$ and try to find functions $\beta$ and $\gamma$ that approximate it well:

$ C_{ij} = C(f_i, f_j) = \beta(f_i) \cdot \beta(f_j) + \delta_{ij} \cdot \gamma(f_i) \cdot \gamma(f_j). $
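As an illustration, here is a small sketch with one hypothetical choice of the universal functions (the linear and affine forms below are arbitrary assumptions, picked only to make the structure concrete):

```python
import numpy as np

# One hypothetical pair of "universal" functions of the feature f;
# choosing their functional form is exactly the open question.
def beta_fn(f):
    return 0.5 * f

def gamma_fn(f):
    return 1.0 + 0.1 * np.abs(f)

def cov_from_features(f):
    """C(f_i, f_j) = beta(f_i)*beta(f_j) + delta_ij*gamma(f_i)*gamma(f_j)."""
    f = np.asarray(f, dtype=float)
    b, g = beta_fn(f), gamma_fn(f)
    return np.outer(b, b) + np.diag(g ** 2)

print(cov_from_features([1.3, 4.5, 0.3]))  # three example feature values
```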

So far, no problems. The problem comes from the fact that neither the features $f_i$ nor the number of random variables are constant over time.

For example, on the first time step I might have 3 random variables with the following values of the features: $f_1 = 1.3, f_2 = 4.5, f_3 = 0.3$, and I also have the corresponding observations of the random variables: $x_1 = 1.0, x_2 = -0.5, x_3 = 4.0$. On the second step I might have 5 random variables coming with 5 new values of the features $f_i$ and 5 new observations $x_i$. How can I find the functions $\beta(f)$ and $\gamma(f)$ in this case? Or, in other words, I can assume one pair of functions ($\beta_1(f)$, $\gamma_1(f)$) and another pair ($\beta_2(f)$, $\gamma_2(f)$). How can I determine which pair of functions approximates my data set better?

ADDED (to cover questions from the comments):

  1. What is the difference between factor analysis and my problem? In factor analysis we have a (covariance) matrix that we want to factorise. In my case I do not have such a matrix. I would have a covariance matrix if the number of random variables were constant and if the statistical properties of these variables (i.e. the correlations between them) were constant.
  2. What do I mean by a "pair of functions"? A pair of functions is my hypothesis about how $\beta$ and $\gamma$ depend on the feature $f$. Given a set of observations, I would like to check which hypothesis is more plausible (accurate).

Once again, my setup is as follows:

  1. On each time step $t$ I have $n_t$ observations ($n_t$ random numbers): $y_1, y_2, \dots, y_{n_t}$
  2. On each time step $t$, for each random number, I have a corresponding feature: $f_1, f_2, \dots, f_{n_t}$
  3. I assume that $\beta$ and $\gamma$ are functions of features and I want to find out what functions describe my data in the best way.

One could also say that my random variables, instead of being indexed by an integer $i$, are "indexed" by a real-valued feature $f$.

ADDED 2:

Here is an example of my data set:

   time  feature    y
0     1      1.0 -4.0
1     1     -0.5  2.0
2     1     -3.7  3.2
3     2      2.2  5.6
4     2      1.3  0.3
5     2      0.2  0.7
6     2     -4.5  2.2
7     3      7.2  4.5
8     3      0.3  5.9
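For reference, the same data set as a pandas DataFrame (a sketch; only the layout matters, note that $n_t$ varies per time step):

```python
import pandas as pd

# The example data set above: the number of observations per time step varies.
df = pd.DataFrame({
    "time":    [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "feature": [1.0, -0.5, -3.7, 2.2, 1.3, 0.2, -4.5, 7.2, 0.3],
    "y":       [-4.0, 2.0, 3.2, 5.6, 0.3, 0.7, 2.2, 4.5, 5.9],
})
print(df.groupby("time").size())  # n_1 = 3, n_2 = 4, n_3 = 2
```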
Roman
  • What do you know about the unknown functions $\alpha, \beta, \gamma$? – kjetil b halvorsen Oct 03 '20 at 17:36
  • I do not know anything about these functions. I just assume different dependencies and want to know what assumptions work better. In other words, I need a measure of accuracy. – Roman Oct 03 '20 at 20:30
  • Please add all new info as an edit to the post! And, can you add some context: how will you apply this? As of now it reads as very abstract and difficult to relate to. – kjetil b halvorsen Oct 03 '20 at 21:41
  • In the end, when you mention pairs of functions, it's confusing: what do you mean? – carlo Oct 07 '20 at 17:36
  • Because your first two equations show the classic factor analysis model with 1 factor, it would be nice to hear how your task is different from usual FA. I.e. if you have a covariance matrix (do you?) what prevents you from simply factor-analysing it? – ttnphns Oct 07 '20 at 19:49
  • @ttnphns, I do not have a covariance matrix. I have extended my question to provide more details. – Roman Oct 08 '20 at 08:20
  • You write *"Where $\tilde{m}$ is a "global" random variable"*. What does that mean? How does this $\tilde{m}$ vary if it is "global"? If it is the same for all samples/items/experiments, then you can effectively treat it as a constant (and it won't be in the covariance matrix which is now a simple diagonal matrix). – Sextus Empiricus Oct 08 '20 at 10:09
  • Did you mean the following? $$\tilde{x}_{ij} = \alpha_i + \beta_i \cdot \tilde{m}_{j} + \gamma_i \cdot \tilde{\varepsilon_i},$$ You have measurements $x_{1,1}, x_{2,1}, x_{3,1}, x_{4,2}, x_{5,2}, x_{6,2}, x_{7,2}, x_{8,2}$ where the three measurements with $i=1,2,3,$ come from the first time step and the five measurements with $i=4,5,6,7,8$ come from the second time step. – Sextus Empiricus Oct 08 '20 at 10:19
  • @SextusEmpiricus, "global" in my context means that at a given time step its value is the same for all random numbers $x$. If we use your notation, then index $i$ would denote random variables and $j$ would denote time. So, your formula is correct with the exception that $\varepsilon$ should have two indices ($i$ and $j$). – Roman Oct 08 '20 at 10:56
  • In your question "How can I determine which pair of functions approximate my data set better?" I am not so sure what is the problem. Is it about the comparison and what sort of method to use to do the comparison, like here: [likelihood based model selection](https://stats.stackexchange.com/questions/30835). Or is it about how to compute the covariance matrix for the error distribution, which will have some block structure? – Sextus Empiricus Oct 08 '20 at 14:37

1 Answer


If your model is like

$$\tilde{x}_{it} = \alpha_i + \beta_i \cdot \tilde{m}_{t} + \gamma_i \cdot \tilde{\varepsilon}_{it}$$

where $\tilde{m}_{t}, \tilde{\varepsilon}_{it} \sim \mathcal{N}(0,1)$, then we can rewrite it as a multivariate normal distribution

$$\textbf{x} \sim N(\boldsymbol{\alpha}, \boldsymbol{\Sigma})$$

  • Where $\textbf{x}$ is the vector of all observations $\lbrace x_{it} \rbrace$.

    For instance, the vector $\lbrace x_{1,1}, x_{2,1}, x_{3,1}, x_{1,2}, x_{2,2}, x_{3,2}, x_{4,2}, x_{5,2} \rbrace$ corresponds to three measurements in the first time step and five measurements in the second time step. The index $i$ is repeated, and so the parameter $\beta_i$ will be the same for all measurements with the same $i$ at different times $t$ (I am not sure whether this is what you want?).

  • Where $\boldsymbol{\alpha}$ is the vector of the corresponding means.

  • Where $\boldsymbol{\Sigma}$ is the covariance matrix which will have a block form

    $$\Sigma = \begin{bmatrix} C_{1ij} & 0 & \dots &0 \\ 0 & C_{2ij} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & C_{nij} \\ \end{bmatrix}$$

    with the number of blocks $n$ equal to the number of time steps, and each block $C_{tij}$ structured like your original $C_{ij}$ (see the sketch after this list).
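As a minimal sketch (the loadings below are hypothetical placeholders; in practice they would come from $\beta(f_{it})$ and $\gamma(f_{it})$), the full $\boldsymbol{\Sigma}$ can be assembled with `scipy.linalg.block_diag`:

```python
import numpy as np
from scipy.linalg import block_diag

def block_for_step(beta_t, gamma_t):
    # One block: C_t = beta_t beta_t^T + diag(gamma_t^2), as in the question.
    beta_t, gamma_t = np.asarray(beta_t), np.asarray(gamma_t)
    return np.outer(beta_t, beta_t) + np.diag(gamma_t ** 2)

# Hypothetical loadings for two time steps with 3 and 5 variables.
blocks = [
    block_for_step([0.8, 1.2, 0.5], [0.3, 0.6, 0.9]),
    block_for_step([1.0, 0.2, 0.7, 1.1, 0.4], [0.5, 0.5, 0.5, 0.5, 0.5]),
]
Sigma = block_diag(*blocks)  # covariance of the stacked observation vector
print(Sigma.shape)           # (8, 8)
```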

This is similar to the mixed-effects models explained here: Intuition about parameter estimation in mixed models (variance parameters vs. conditional modes). A code example that manually constructs those blocks (instead of using built-in mixed-model functions) is here: https://stats.stackexchange.com/a/337348

So for given features $f_{it}$ (one for each $x_{it}$?) and a given model to compute the $\alpha_i,\beta_i,\gamma_i$, the model is fully specified, and this allows you to compute the likelihood and compare models based on their likelihoods. Or, if $\tilde{m}_{t}, \tilde{\varepsilon}_{it}$ are not really normally distributed, the covariance matrix still holds and you may see the result as an approximation of the true likelihood, i.e. a quasi-likelihood.
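For example, a sketch of that comparison, assuming normality and the data frame `df` built from the question's example above (the two competing function pairs at the bottom are made up for illustration). Because the blocks are independent, the total log-likelihood is a sum over time steps:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(df, alpha_fn, beta_fn, gamma_fn):
    """Total Gaussian log-likelihood of one hypothesis (alpha, beta, gamma).

    Assumes df has columns 'time', 'feature', 'y' as in the question.
    Independence across time steps lets us sum per-block log-densities.
    """
    total = 0.0
    for _, grp in df.groupby("time"):
        f = grp["feature"].to_numpy()
        y = grp["y"].to_numpy()
        b, g = beta_fn(f), gamma_fn(f)
        C = np.outer(b, b) + np.diag(g ** 2)   # the block for this time step
        total += multivariate_normal.logpdf(y, mean=alpha_fn(f), cov=C)
    return total

# Two hypothetical pairs; the one with the higher log-likelihood
# (or the better AIC/BIC if parameter counts differ) fits better.
ll1 = log_likelihood(df, lambda f: 0.0 * f, lambda f: 0.5 * f,
                     lambda f: 1.0 + 0.0 * f)
ll2 = log_likelihood(df, lambda f: 0.0 * f, lambda f: 0.2 * f,
                     lambda f: 1.0 + 0.1 * np.abs(f))
print(ll1, ll2)
```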

(Or potentially you want to optimize the parameters of the model by maximizing the likelihood? I am not sure whether that is what you want, because you explicitly ask about comparing two models. Doing so might be possible, but it is not easy to fit a non-linear mixed model where the variance also depends on the mean. You could try to just put it into some optimizer, but maybe, depending on the problem, simplifications can be made to ease the convergence. Finding those simplifications is a bit of an art, and there is no general straightforward method.)
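If you do want to fit parameters rather than only compare fixed hypotheses, a crude sketch (reusing `df` and the `log_likelihood` helper from the sketches above, with a made-up parametric family) is to hand the negative log-likelihood to a generic optimizer:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical parametric family: beta(f) = a*f, gamma(f) = exp(b + c*f).
# The exp keeps gamma positive, so every block stays positive definite.
def neg_ll(theta):
    a, b, c = theta
    return -log_likelihood(df,
                           lambda f: 0.0 * f,
                           lambda f: a * f,
                           lambda f: np.exp(b + c * f))

res = minimize(neg_ll, x0=[0.1, 0.0, 0.0], method="Nelder-Mead")
print(res.x)  # fitted (a, b, c); compare families via AIC = 2k - 2*max log-lik
```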

Sextus Empiricus
  • Hi. You provide a solution to the problem that is described before the "So far no problems" sentence in my original question. The problem is that the index $i$ is **not** repeated. If, on the first time step, we see some 3 observations associated with some 3 values of the features, then on the second time step we see 5 observations associated with completely different values of the features. So, the third observation on the first time step has nothing to do with the third observation on the second time step. – Roman Oct 09 '20 at 07:42
  • @Roman, I am not completely sure whether I understand why the different values of the features are important. Your blocks $C_{tij}$ will be built from the features $f$, which result in some $\alpha, \beta, \gamma$ that relate specifically to the measurements $x_{it}$ of a particular observation $i$ in time step $t$. If the features are not the same for the same $i$ but different $t$, and hence the $\alpha_i, \beta_i, \gamma_i$ should not be considered the same for the same $i$, then don't make them the same. You should label them as $\alpha_{i,t}, \beta_{i,t}, \gamma_{i,t}$ instead. – Sextus Empiricus Oct 09 '20 at 08:38
  • Your $\alpha_i, \beta_i, \gamma_i$ should not be seen as regression coefficients that are *fixed* for each $i$. They are instead functions of the features. The regression coefficients that you estimate, or the coefficients/functions that you compare, are inside those functions. – Sextus Empiricus Oct 09 '20 at 08:47
  • Simplified example: Say in a mixed effects model you can have a variable intercept per group (time step) and the formula will be like $$y_{it} = \alpha_t + \beta \cdot f_{it} + \epsilon_{it}$$ with $\alpha_t \sim N(\mu_\alpha, \tau^2)$ and $\epsilon_{it} \sim N(0,\sigma^2)$. You seem to be absorbing the function $ \beta \cdot f_{it} = g(\beta, f_{it}) $ (which is in this case a linear function and in your case non-linear) into a coefficient $\beta_{it} = g(\beta, f_{it})$ $$y_{it} = \alpha_t + \beta_{it} + \epsilon_{it}$$ but you wrongly make this coefficient independent of $t$. – Sextus Empiricus Oct 09 '20 at 08:57
  • So you should use: $$\tilde{x}_{it} = \alpha(f_{kit},\theta) + \beta(f_{kit},\theta) \tilde{m}_t + \gamma(f_{kit},\theta) \tilde{\varepsilon}_{it}$$ where $f_{kit}$ is the $k-th$ feature in the $i$-th measurement of the $t$-th time step, $\tilde{m}_t \sim N(0,1)$ is noise that varies per time step and $\varepsilon_{it} \sim N(0,1)$ is noise that varies per measurement. The $\theta$ are the parameters to be fitted or compared. This $\theta$ can also be a vector of multiple parameters. Or instead of using parameters you compare two different functions. – Sextus Empiricus Oct 09 '20 at 09:05
  • How exactly am I supposed to calculate $C_{tij}$? The only thing that I have is $y_{it}$ and $y_{jt}$. So, I just have two numbers. How should I combine these two numbers to calculate $C_{tij}$? – Roman Oct 09 '20 at 11:10
  • @Roman You use your own formula $$C_{tij} = \beta_{it} \cdot \beta_{jt} + \delta_{ij} \cdot \gamma_{it} \cdot \gamma_{jt}$$ where you compute the $\alpha_{it},\beta_{it},\gamma_{it}$ according to the $n_t$ features $f_{kit}$ for each observation $i$ in time step $t$. (Or maybe you only have one feature per time step and measurement, in which case you can drop the $k$ subscript.) – Sextus Empiricus Oct 09 '20 at 11:20
  • Maybe you can show the first part of your data frame to clarify the situation? – Sextus Empiricus Oct 09 '20 at 11:28
  • Okay, I can use my formula to calculate the "expected" (or "theoretical") value of the covariance. But what do I compare it with? I need to compare it with the corresponding "real" or "observed" values of the covariance. So, where do I get those from? My question was about how to calculate the observed covariances. – Roman Oct 09 '20 at 13:12
  • I will extend my question to provide an example of data set. – Roman Oct 09 '20 at 13:13
  • @Roman I am not sure what you mean by "what do I compare with". I was under the impression that you already know two models to compare with each other. *"Or, in other words, I can assume one pair of functions ($\beta_1(f)$, $\gamma_1(f)$) and another pair ($\beta_2(f)$, $\gamma_2(f)$). How can I determine which pair of functions approximate my data set better?"* So you compare the likelihood of ($\beta_1(f)$, $\gamma_1(f)$) with the likelihood of ($\beta_2(f)$, $\gamma_2(f)$) – Sextus Empiricus Oct 09 '20 at 14:05