
Let $T\in\mathbb{R}_+$ and $R\in\{0,1\}$ be random variables.

If $$P[T]=\sum_RP\big[T,R\big]=\sum_RP\big[T\!\mid{\!R}\big]P\big[R\big],$$

does it imply that $T$ and $R$ are dependent?

user 31466
  • P(Ri=0) P(Ti<=t|Ri=0) = P(Ti<=t and Ri=0). Similarly when Ri=1. Since Ri can only be 0 or 1 the sum is just P(Ti<=t). This does not require Ti to be uncorrelated with Ri. But maybe for the purpose of your problem they are. – Michael R. Chernick Dec 20 '16 at 03:57
  • More fundamentally what does survival mean here? Is it the time until a positive response to treatment? Ri doesn't seem to be related to time. It looks more like a censoring variable. But I am guessing. – Michael R. Chernick Dec 20 '16 at 04:05
  • @MichaelChernick Actually, this is a two-stage randomized design. Say we initially give an induction therapy to a patient. If he responds to the induction therapy and consents to the second-stage randomization, then we give the patient another treatment randomly at the second stage. So here survival time is the time from initial randomization to death (initial randomization to response to the induction therapy + second-stage randomization to death). And you're right. $R_i$ is a censoring variable. – user 31466 Dec 20 '16 at 04:13
  • Beware -- your question conflates "not correlated" with "independent". They're not the same thing in general. – Glen_b Dec 20 '16 at 04:21
  • Now it sounds like time to death is the outcome and you also have dropouts after the first stage. Shouldn't dropouts be included as censored? It seems for this to make sense you have to make a decision as to whether or not the patient responded to treatment. Doesn't it take time to determine if the patient responds? If that is the case and it is variable, I think you need to pick a fixed time to wait in order to determine whether or not the patient responded. – Michael R. Chernick Dec 20 '16 at 04:44
  • It seems like you also have confounding since some patients get only one treatment and others get two. I don't see how to interpret the survival curve. – Michael R. Chernick Dec 20 '16 at 04:50
  • @MichaelChernick This problem can be tackled using the concept of "treatment policy". Instead of waiting until the second-stage therapy is to be administered, we use the pre-specified design: as a patient enters the study, a combination will be assigned. Later, if he achieves remission/consents, a second-stage therapy will be assigned as pre-specified. In such a case, those who drop out after the first stage only represent themselves, and hence get a weight of one. – user 31466 Dec 20 '16 at 05:02
  • @MichaelChernick And you're right. It takes time to determine if the patient responds. In that case we pick a fixed time to wait in order to determine whether or not the patient responded. – user 31466 Dec 20 '16 at 05:09
  • If there is an underlying question about the experimental design/interpretation, you should probably update the question to reflect this. As it reads now, it is more a generic probability question. (For that question I believe all that is required for continuous $T$ to be dependent on but *uncorrelated* with binary $R$ is that the conditional *means* are equal while the conditional *distributions* differ, i.e. $\langle{T|R=0}\rangle=\langle{T|R=1}\rangle$ but $P[T|R=0]\neq{P}[T|R=1]$.) – GeoMatt22 Dec 20 '16 at 05:20
  • @GeoMatt22 But if $P(T\le t|R=0)=P(T\le t|R=1)$, does it imply continuous $T$ is uncorrelated with binary $R$ ? – user 31466 Dec 20 '16 at 05:30
  • Yes, see my answer. In general: independent implies uncorrelated, but uncorrelated does not imply independent. (See e.g. [here](http://stats.stackexchange.com/questions/85363/simple-examples-of-uncorrelated-but-not-independent-x-and-y).) – GeoMatt22 Dec 20 '16 at 06:23
  • The situation was about survival times and a censoring time. This is not just computing probabilities in a generic sense. The issues GeoMatt22 raises about the relationship between independence and correlation are valid. The answer he gives is a good exercise in showing counterexamples to some of the OP's claims. But it doesn't help with the original question. – Michael R. Chernick Dec 20 '16 at 11:07
  • I was reading it in the context of survival analysis. But it's okay to consider $T$ is positive real valued and $R$ is binary. – user 31466 Dec 21 '16 at 00:39
  • And I was confused that if $P(T)=\sum_R P(T|R)$, then aren't $T$ and $R$ dependent ? – user 31466 Dec 21 '16 at 00:46
  • @MichaelChernick I agree, but upthread I said *"If there is an underlying question about the experimental design/interpretation, you should probably update the question to reflect this. As it reads now, it is more a generic probability question."*, to which OP replied *"My question is indeed a generic probability question."* – GeoMatt22 Dec 21 '16 at 00:54
  • @Leaf the term in the sum should be joint Pr rather than conditional (i.e. not "$\mid$", but "$,$" or "$\cap$"). This is the [Law of Total Probability](https://en.wikipedia.org/wiki/Law_of_total_probability), which says nothing about dependence/independence. Dependence is when $P(T|R)\neq{P(T)}$. – GeoMatt22 Dec 21 '16 at 01:00
  • @Leaf your edited question is now just the law of total probability. It applies whether the variables are dependent or independent. In the case of independence it still holds, but could be simplified because then $P(T|R)=P(T)$, i.e. $P(T=t|R=0)=P(T=t|R=1)$. In understanding the various types of distributions involved (joint vs. conditional vs. marginal) perhaps [this](http://stats.stackexchange.com/a/239042/127790) could help? – GeoMatt22 Dec 21 '16 at 04:12

1 Answer


Here is an answer to what I think is the general probability question (?)

Given two random variables $T\in\mathbb{R}_+$ and $R\in\{0,1\}$, the marginal distribution of $T$ is given by $$P[T]=\sum_RP\big[T,R\big]=\sum_RP\big[T\!\mid{\!R}\big]P\big[R\big]$$ This is true whether $T$ and $R$ are dependent or independent. So your last equation says nothing definite on the topic of independence.
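As a rough illustration (a minimal sketch, not part of the original argument), the snippet below applies the formula to two made-up cases. It assumes, purely for convenience, that $T$ takes only the values 1 and 2; the function name `marginal_of_T` and all the numbers are hypothetical. One conditional table makes $T$ depend on $R$, the other does not, yet the same formula applies to both and here even yields the same marginal.

```python
from fractions import Fraction as F

def marginal_of_T(p_r, cond_t_given_r):
    """Law of total probability: P[T=t] = sum over r of P[T=t | R=r] * P[R=r]."""
    marginal = {}
    for r, pr in p_r.items():
        for t, p_t_given_r in cond_t_given_r[r].items():
            marginal[t] = marginal.get(t, F(0)) + p_t_given_r * pr
    return marginal

# Hypothetical distribution of R (assumption for illustration only).
p_r = {0: F(1, 4), 1: F(3, 4)}

# Dependent case: the conditional distribution of T changes with R.
dependent = {0: {1: F(1, 2), 2: F(1, 2)},
             1: {1: F(1, 10), 2: F(9, 10)}}

# Independent case: the conditional distribution of T is the same for every R.
independent = {0: {1: F(1, 5), 2: F(4, 5)},
               1: {1: F(1, 5), 2: F(4, 5)}}

marg_dep = marginal_of_T(p_r, dependent)
marg_ind = marginal_of_T(p_r, independent)

print(marg_dep)
print(marg_ind)
```

Both calls go through the same sum, so the identity on its own cannot distinguish the dependent case from the independent one.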

If $T$ and $R$ are independent then their joint distribution is the product of their individual marginal distributions $$P[T,R]=P[T]P[R]$$ and since $P[R=0]+P[R=1]=1$, the sum becomes $\sum_RP[T]P[R]=P[T]\sum_RP[R]=P[T]$, so the first equation simply reduces to the tautology $P[T]=P[T]$.

On the other hand, two variables are uncorrelated if their covariance is zero $$\mathrm{cov}[T,R]=\langle{TR}\rangle-\langle{T}\rangle\langle{R}\rangle=0$$ If the variables are independent then this holds, because independence gives $\langle{TR}\rangle=\langle{T}\rangle\langle{R}\rangle$.

However in general uncorrelated does not imply independent. To see this, denoting $p=P[R=1]$ we can compute the expectations as \begin{align} \langle{T}\rangle &= \langle{T|R=0}\rangle{(1-p)} + \langle{T|R=1}\rangle{p} \\ \langle{R}\rangle &= (0)(1-p) + (1)p = p \\ \langle{TR}\rangle &= \langle{T(0)|R=0}\rangle{(1-p)} + \langle{T(1)|R=1}\rangle{p} = \langle{T|R=1}\rangle{p} \end{align} So all that is required for the variables to be uncorrelated is $$\langle{T|R=0}\rangle=\langle{T|R=1}\rangle=\langle{T}\rangle$$ i.e. the conditional expectations of $T$ are identical no matter the value of $R$.
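To spell out the intermediate step, here is a short check that uses only the expectations above, under the added assumption that $0<p<1$ (so both values of $R$ actually occur): \begin{align} \mathrm{cov}[T,R] &= \langle{TR}\rangle-\langle{T}\rangle\langle{R}\rangle \\ &= \langle{T|R=1}\rangle{p} - \Big[\langle{T|R=0}\rangle{(1-p)} + \langle{T|R=1}\rangle{p}\Big]p \\ &= p(1-p)\Big[\langle{T|R=1}\rangle-\langle{T|R=0}\rangle\Big] \end{align} Since $p(1-p)>0$, the covariance vanishes exactly when the two conditional means coincide, and in that case both also equal $\langle{T}\rangle$.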

Obviously this integral condition is much less restrictive than the requirement $$P[T|R=0]=P[T|R=1]=P[T]$$ that the conditional probability distributions are entirely independent of $R$ in a point-wise sense.
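A small numerical sketch of the gap between the two conditions follows. It is an illustration under assumptions: $T$ is restricted to a few positive values rather than being continuous, and the numbers are made up. The conditional means of $T$ agree, so the covariance is zero, while the conditional distributions clearly differ.

```python
from fractions import Fraction as F

# Hypothetical setup (assumption for illustration only): p = P[R=1].
p = F(1, 2)

# Conditional distributions of T given R, chosen so both have mean 2:
# given R=0, T is 1 or 3 with equal probability; given R=1, T is always 2.
cond = {0: {1: F(1, 2), 3: F(1, 2)},
        1: {2: F(1)}}

def cond_mean(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(t * q for t, q in dist.items())

# Expectations computed the same way as in the derivation above.
mean_t = cond_mean(cond[0]) * (1 - p) + cond_mean(cond[1]) * p
mean_r = p
mean_tr = cond_mean(cond[1]) * p
cov = mean_tr - mean_t * mean_r

# Point-wise comparison of the conditional distributions at T = 2.
p_t2_given_r0 = cond[0].get(2, F(0))
p_t2_given_r1 = cond[1].get(2, F(0))

print("cov[T,R] =", cov)
print("P[T=2|R=0] =", p_t2_given_r0, " P[T=2|R=1] =", p_t2_given_r1)
```

Here $T$ and $R$ are uncorrelated but dependent: knowing $R=1$ pins $T$ to 2, whereas knowing $R=0$ rules that value out.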

GeoMatt22