
I have a number of studies describing families tested for a genetic condition. For each study, the following counts are reported:

  • $n_p$, number of probands (the proband is the first person in a family to be diagnosed with the genetic condition, so this is generally equal to the number of families)
  • $n_r$, number of relatives identified (this is the total number of people identified across families minus the number of probands)
  • $n_c$, number of relatives given genetic counseling
  • $n_t$, number of relatives given genetic testing

Relatives must be identified before they can be given genetic counseling, and they must be given genetic counseling before they can be given genetic testing.

If I assume that $n_r \sim r \times n_p$, $n_c \sim p_c \times n_r$ and $n_t \sim p_t \times n_c$, how can I infer $r$, $p_c$ and $p_t$ given that some values of $n_p$, $n_r$, $n_c$ and $n_t$ are missing (not reported)?

I am happy to work in WinBUGS, Stata, R and maybe others!

Thanks

tristan
  • I think it would matter as to what tends to be missing the most and why. – Michael R. Chernick Aug 08 '12 at 15:18
  • Would this information be available on a family level or just overall? – Aniko Aug 08 '12 at 21:49
  • @MichaelChernick {p,r,c,t} are provided for 2 studies, {r,c,t} for a further 1, {p,r,t} for 9 studies and {p,t} for 4. Counselling is most often missing, but the reasons are not always stated and are unlikely to be uniform. – tristan Aug 09 '12 at 05:52
  • @Aniko Each study has the information at the overall level, so for each study there are up to four data. – tristan Aug 09 '12 at 05:53

1 Answer


Here is my own current solution:

Let $N_{p,i}$ be the random variable denoting the number of probands in study $i$, and let $N_{r,i},N_{c,i},N_{t,i}$ respectively denote the numbers of relatives, those counseled and those tested in study $i$.

I do not specify a distribution for $N_{p,i}$. I model $N_{r,i}$ as:

$N_{r,i} \mid N_{p,i} = n_{p,i} \sim Poisson(r \times n_{p,i})$

This may not be the right distribution to use; please comment with any other suggestions! I then model $N_{c,i}$ as a binomial, written with the probability first to match the argument order of WinBUGS's dbin:

$N_{c,i} \mid N_{r,i} = n_{r,i} \sim Bin(p_c, n_{r,i})$

If $N_{r,i}$ is not observed but $N_{p,i}$ is, then, because a binomially thinned Poisson count is itself Poisson, $N_{c,i} \mid N_{p,i} = n_{p,i} \sim Poisson(r \times p_c \times n_{p,i})$. I then model $N_{t,i}$ as:

$N_{t,i} \mid N_{c,i} = n_{c,i} \sim Bin(p_t, n_{c,i})$

If $N_{c,i}$ is not observed but $N_{r,i}$ is, then $N_{t,i} \mid N_{r,i} = n_{r,i} \sim Bin(p_c \times p_t, n_{r,i})$. If neither $N_{c,i}$ nor $N_{r,i}$ is observed but $N_{p,i}$ is, then $N_{t,i} \mid N_{p,i} = n_{p,i} \sim Poisson(r \times p_c \times p_t \times n_{p,i})$.
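The marginal distributions above rest on the fact that summing a binomial over a Poisson-distributed number of trials gives a Poisson with the rate multiplied by the success probability. As a sanity check, here is a minimal Python sketch that sums out $N_{r,i}$ numerically and compares the result with the claimed Poisson. The helper names (`poisson_pmf`, `binom_pmf`, `thinned_pmf`) and the values of `r`, `p_c` and `n_p` are illustrative assumptions, not estimates from the studies.

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam), computed on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n trials, success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def thinned_pmf(k, lam, p, n_max=200):
    """P(N_c = k) found by summing out N_r ~ Poisson(lam), where
    N_c | N_r = n ~ Binomial(n, p). The infinite sum is truncated at
    n_max, which is ample for the small rate used below."""
    return sum(poisson_pmf(n, lam) * binom_pmf(k, n, p)
               for n in range(k, n_max + 1))

# Illustrative values only (assumptions, not fitted to the data)
r, p_c, n_p = 12.0, 0.6, 4
lam = r * n_p  # Poisson mean of N_r given n_p probands

# Largest gap between the summed-out pmf and Poisson(r * p_c * n_p)
max_abs_diff = max(
    abs(thinned_pmf(k, lam, p_c) - poisson_pmf(k, lam * p_c))
    for k in range(80)
)
print("largest absolute difference:", max_abs_diff)
```

The difference should be at the level of floating-point rounding. Applying the same argument twice gives the $Poisson(r \times p_c \times p_t \times n_{p,i})$ case, and composing two binomials gives $Bin(p_c \times p_t, n_{r,i})$.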

I then divide the studies according to which variables are observed and apply the relevant marginal distribution. Here is my WinBUGS code:

model {
  # Where all four variables are observed
  for (i in 1:N_PRCT) {
    # N_{r,i} | N_{p,i} = n_{p,i} ~ Poisson(r * n_{p,i})
    lam[Q_PRCT[i]] <- r * N_p[Q_PRCT[i]]
    N_r[Q_PRCT[i]] ~ dpois(lam[Q_PRCT[i]])
    # N_{c,i} | N_{r,i} = n_{r,i} ~ Bin(p_c, n_{r,i})
    N_c[Q_PRCT[i]] ~ dbin(p_c, N_r[Q_PRCT[i]])
    # N_{t,i} | N_{c,i} = n_{c,i} ~ Bin(p_t, n_{c,i})
    N_t[Q_PRCT[i]] ~ dbin(p_t, N_c[Q_PRCT[i]])
  }

  # Where the number being counseled is not observed
  for (i in 1:N_PRT) {
    # N_{r,i} | N_{p,i} = n_{p,i} ~ Poisson(r * n_{p,i})
    lam[Q_PRT[i]] <- r * N_p[Q_PRT[i]]
    N_r[Q_PRT[i]] ~ dpois(lam[Q_PRT[i]])
    # N_{t,i} | N_{r,i} = n_{r,i} ~ Bin(p_c * p_t, n_{r,i})
    N_t[Q_PRT[i]] ~ dbin(p_cp_t, N_r[Q_PRT[i]])
  }

  # Where only the number of probands and the number of
  # relatives tested are observed
  for (i in 1:N_PT) {
    # N_{t,i} | N_{p,i} = n_{p,i} ~ Poisson(r * p_c * p_t * n_{p,i})
    lam[Q_PT[i]] <- rp_cp_t * N_p[Q_PT[i]]
    N_t[Q_PT[i]] ~ dpois(lam[Q_PT[i]])
  }

  # Where the number of probands is not observed
  for (i in 1:N_RCT) {
    # N_{c,i} | N_{r,i} = n_{r,i} ~ Bin(p_c, n_{r,i})
    N_c[Q_RCT[i]] ~ dbin(p_c, N_r[Q_RCT[i]])
    # N_{t,i} | N_{c,i} = n_{c,i} ~ Bin(p_t, n_{c,i})
    N_t[Q_RCT[i]] ~ dbin(p_t, N_c[Q_RCT[i]])
  }

  # Vague priors on r, p_c and p_t
  r ~ dgamma(1, 0.001)
  p_c ~ dbeta(1, 1)
  p_t ~ dbeta(1, 1)

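  # Derived products used by the marginal likelihoods above
  # (rp_c does not enter any likelihood term in this model)
  rp_c <- r * p_c
  rp_cp_t <- r * p_c * p_t
  p_cp_t <- p_c * p_t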
}

# DATA
list(
  N_PRCT=2, Q_PRCT=c(6,8),
  N_PRT=9,  Q_PRT=c(1,2,3,7,12,13,14,15,16),
  N_PT=4,   Q_PT=c(5,9,10,11),
  N_RCT=1,  Q_RCT=c(4),
  N_p=c(537,    18,     84,     NA,
        10,     36,     1,      4,
        44,     111,    6,      39,
        32,     113,    147,    17),
  N_r=c(10283,  167,    2309,   286,
        NA,     446,    96,     208,
        NA,     NA,     NA,     643,
        620,    3104,   6195,   405),
  N_c=c(NA,     NA,     NA,     113,
        NA,     347,    NA,     92,
        NA,     NA,     NA,     NA,
        NA,     NA,     NA,     NA),
  N_t=c(2622,   68,     694,    112,
        21,     334,    39,     84,
        249,    1359,   156,    38,
        127,    525,    432,    157)
)
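To double-check the index vectors and the way each study picks up its marginal distribution, here is a rough Python sketch that groups the studies by which counts are reported and evaluates the corresponding fixed-effects log-likelihood. It is a cross-check under the model described above, not a replacement for the WinBUGS run; `NA = None`, `pattern`, `groups` and `log_lik` are names assumed for this illustration.

```python
import math

NA = None  # stands in for the NA entries of the BUGS data list

N_p = [537, 18, 84, NA, 10, 36, 1, 4, 44, 111, 6, 39, 32, 113, 147, 17]
N_r = [10283, 167, 2309, 286, NA, 446, 96, 208,
       NA, NA, NA, 643, 620, 3104, 6195, 405]
N_c = [NA, NA, NA, 113, NA, 347, NA, 92, NA, NA, NA, NA, NA, NA, NA, NA]
N_t = [2622, 68, 694, 112, 21, 334, 39, 84,
       249, 1359, 156, 38, 127, 525, 432, 157]

def pattern(p, r, c, t):
    """Letters of the counts a study reports, e.g. 'PRT' if N_c is missing."""
    return "".join(s for s, v in zip("PRCT", (p, r, c, t)) if v is not None)

# Group 1-based study numbers by reporting pattern
groups = {}
for i, row in enumerate(zip(N_p, N_r, N_c, N_t), start=1):
    groups.setdefault(pattern(*row), []).append(i)

def log_pois(k, mu):
    """Log pmf of Poisson(mu) at k."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def log_binom(k, n, p):
    """Log pmf of Binomial(n, p) at k; assumes 0 < p < 1 and k <= n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_lik(r, p_c, p_t):
    """Fixed-effects log-likelihood, one marginal model per pattern."""
    total = 0.0
    for n_p, n_r, n_c, n_t in zip(N_p, N_r, N_c, N_t):
        key = pattern(n_p, n_r, n_c, n_t)
        if key == "PRCT":
            total += (log_pois(n_r, r * n_p) + log_binom(n_c, n_r, p_c)
                      + log_binom(n_t, n_c, p_t))
        elif key == "PRT":
            total += log_pois(n_r, r * n_p) + log_binom(n_t, n_r, p_c * p_t)
        elif key == "PT":
            total += log_pois(n_t, r * p_c * p_t * n_p)
        elif key == "RCT":
            total += log_binom(n_c, n_r, p_c) + log_binom(n_t, n_c, p_t)
        else:
            raise ValueError("no marginal model for pattern " + key)
    return total

for name in sorted(groups):
    print(name, groups[name])
print("log-likelihood at r=20, p_c=0.5, p_t=0.5:", log_lik(20.0, 0.5, 0.5))
```

The printed groups should match Q_PRCT, Q_PRT, Q_PT and Q_RCT in the data list. Maximising `log_lik` over its three arguments with any general-purpose optimiser would give a rough non-Bayesian comparison for the posterior summaries.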

The next steps are to move from a fixed-effects meta-analysis to a random-effects one and, hopefully, on to a meta-regression.

tristan