5

Consider the sum of three normal random variables:

$R_{i,j} = A_{i} + B_{j} + C_{i,j}$

where $A_{i}\sim N(\mu_{A},\sigma_{A})$, $B_{j}\sim N(\mu_{B},\sigma_{B})$ and $C_{i,j}\sim N(\mu_{C},\sigma_{C})$, with $\sigma$ denoting the standard deviation. Assuming $A$, $B$ and $C$ are independent, the distribution of $R$ is still normal, with mean $\mu_{A}+\mu_{B}+\mu_{C}$ and variance $\sigma_{A}^{2}+\sigma_{B}^{2}+\sigma_{C}^{2}$.
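
For instance, a quick simulation sketch of this fact (the parameter values below are arbitrary, purely for illustration):

# Check the mean and variance of a sum of independent normals by simulation
# (arbitrary example values for the means and SDs)
set.seed(123)
r <- rnorm(1e5, mean = 1, sd = 1) + rnorm(1e5, mean = 2, sd = 2) + rnorm(1e5, mean = 3, sd = 0.5)
mean(r)   # close to 1 + 2 + 3 = 6
var(r)    # close to 1^2 + 2^2 + 0.5^2 = 5.25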

Suppose I observe a list of realizations of $R$ (about 500 observations) as well as the $i$ and $j$ of each realization (so I know if two realizations of $R$ share the same $j$ for example).

How can I recover the distribution (mean and variance) of $A$, $B$ and $C$?

Ferdi
Rick
  • Could you explain the subscripting? The double subscripting of $C$ is especially mysterious. It looks like you are able to observe various sums $A+C$ and $B+C$ as well as an indicator of whether $A$ or $B$ is the first component, but that's only a guess. – whuber Jun 04 '19 at 14:46
  • Whuber, say R is the outcome, i a given person and j a given place. – Rick Jun 04 '19 at 15:41
  • Your title says you're summing *distributions* while your body text says you're summing *random variables*. $X+Y$ is an *entirely* different kind of thing to $t(z) = F_X(z)+G_Y(z)$ (i.e. the 'distribution of sum' is not the same as 'sum of distributions'). Please make your title and question-body consistent. (Note also that you're using the symbol $\sigma$ in a non-conventional fashion, one likely to lead people into confusion or errors.) – Glen_b Jun 05 '19 at 02:51

2 Answers

2

You can estimate the three variances by fitting a mixed model with $R_{i,j}$ as the response and the factors $i$ and $j$ included as random effects. All three variances are identifiable, but of the $\mu$'s only the sum $\mu_A+\mu_B+\mu_C$ is, not the individual means.

The following R code simulates data and fits the mixed model:

# Simulate 500 observations R_ij with 20 levels for each of i and j
i <- factor(rep(1:20, each = 25))
j <- factor(rep(1:20, 25))
set.seed(1)
A <- rnorm(nlevels(i), mean = 1, sd = 1)    # A_i, one draw per level of i
B <- rnorm(nlevels(j), mean = 2, sd = 2)    # B_j, one draw per level of j
C <- rnorm(length(i), mean = 3, sd = 0.5)   # C_ij, one draw per observation
R <- A[i] + B[j] + C                        # indexing by factor picks out the right A_i and B_j

# Fit the mixed model with random intercepts for i and j
library(lme4)
summary(lmer(R ~ (1|i) + (1|j)))

This gives the following estimates:

Linear mixed model fit by REML ['lmerMod']
Formula: R ~ (1 | i) + (1 | j)

REML criterion at convergence: 944

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.9493 -0.6611 -0.0223  0.6345  3.5898 

Random effects:
 Groups   Name        Variance Std.Dev.
 i        (Intercept) 0.8373   0.9150  
 j        (Intercept) 3.0804   1.7551  
 Residual             0.2612   0.5111  
Number of obs: 500, groups:  i, 20; j, 20

Fixed effects:
            Estimate Std. Error t value
(Intercept)   6.1824     0.4432   13.95

These are pretty close to the true values: variances $1$, $4$ and $0.25$ for $i$, $j$ and the residual, and intercept $\mu_A+\mu_B+\mu_C=6$.
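
If the fit is stored as an object, the variance components and the intercept (the only estimable function of the means) can be extracted directly; a small sketch using the same fit as above:

# Store the fit and pull out the estimates
fit <- lmer(R ~ (1|i) + (1|j))
as.data.frame(VarCorr(fit))   # variance/SD estimates for i, j and the residual (C)
fixef(fit)                    # intercept: estimate of mu_A + mu_B + mu_C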

Jarle Tufto
  • Thanks Jarle. I guess in Stata this would be a standard linear regression with random effects. Is there no way to obtain the means? The mean of $C$ is what I'm interested in... – Rick Jun 04 '19 at 15:39
  • I don't know Stata but I guess it can do this too. Yes, there is no way of getting the separate means because the multivariate normal distribution of $\mathbf{R}$ depends on the $\mu$'s only through $\mu_A + \mu_B + \mu_C$. – Jarle Tufto Jun 04 '19 at 15:43
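
To see concretely why the separate means cannot be recovered: shifting the component means while keeping their sum fixed leaves the data unchanged. A sketch reusing the simulated `A`, `B`, `C`, `i`, `j` from the answer above:

# Shift the means (1, 2, 3) -> (0, 0, 6); the sum stays 6 and R is unchanged
A2 <- A - 1
B2 <- B - 2
C2 <- C + 3
all.equal(A2[i] + B2[j] + C2, A[i] + B[j] + C)   # TRUE
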
1

Since $$\overbrace{\mathbf R}^{\text{vector of}\atop\text{$R_{ij}$'s}}=\overbrace{\mathbf X}^{\text{matrix of}\atop\text{$1$'s and $0$'s}}\underbrace{(\mathbf A^\top \mathbf B^\top)^\top}_{\text{vector of}\atop\text{$A_i$'s and $B_j$'s}} + \overbrace{\mathbf C}^{\text{vector of}\atop\text{$C_{ij}$'s}}$$ the problem can be rewritten as a standard Normal regression problem with Normal priors on $\mathbf A$ and $\mathbf B$.
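
As a concrete illustration, the indicator matrix $\mathbf X$ can be built explicitly; a sketch reusing the simulated `i`, `j`, `A`, `B`, `C`, `R` from the other answer:

# Build the 0/1 design matrix mapping each observation to its A_i and B_j
X <- cbind(model.matrix(~ 0 + i), model.matrix(~ 0 + j))   # 500 x 40 indicator matrix
max(abs(drop(X %*% c(A, B)) + C - R))                      # effectively 0: R = X (A', B')' + C

With normal priors on $\mathbf A$ and $\mathbf B$ this is a ridge-type regression; for given variance components, the posterior means of the coefficients coincide with the random-effect predictions (BLUPs) produced by the mixed-model fit in the other answer.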

Xi'an