
Suppose I have a vector of probability densities (for risk of disease): `c(0.01, 0.002, 0.03, 0.05)` for ages 1, 2, 3, and 4. How do I obtain the cumulative density? Do I (1) simply add them up, i.e. the cumulative densities at ages 1 to 4 are `c(0.01, 0.012, 0.042, 0.092)`? Or should I (2) find the area under the curve?

Does this depend on whether my random variable is discrete or continuous? That is, if my random variable is discrete, then (1) is appropriate; otherwise (2) is appropriate, because $P(X = 1) = 0$ if $X$ is continuous?
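
For concreteness, the two candidate calculations can be sketched in R. This only illustrates options (1) and (2) as described above and does not settle which one is appropriate. The variable names and the choice of the trapezoidal rule for the area are assumptions made for the sketch.

```r
# Values from the question, taken at ages 1 to 4
age <- 1:4
p   <- c(0.01, 0.002, 0.03, 0.05)

# Option (1): running sum of the values
running_sum <- cumsum(p)

# Option (2): area under the piecewise-linear curve through the points,
# accumulated with the trapezoidal rule (an assumed choice of rule);
# the area up to age 1 is taken as 0
trapezoids   <- diff(age) * (head(p, -1) + tail(p, -1)) / 2
running_area <- c(0, cumsum(trapezoids))

print(running_sum)
print(running_area)
```

The two options give different numbers for the same input, which is why the interpretation of the values matters before choosing either.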

Adrian
  • Your intuition is correct – KenHBS Jul 24 '17 at 05:27
  • (1) What would you do if the data were $(0.5, 0.6, 0.7, 0.8)$? (2) What is this "curve" you refer to? (3) What is the "random variable," given that all you have described so far is a vector of probability densities? (4) Could you clarify the meaning of "risk of disease"? Presumably it is a *conditional annual risk*; for instance, the $0.05$ might mean the chance of someone contracting the disease at age 4 *given they have never had it before* is $0.05$. But maybe it's the frequency of the disease among all four-year-olds? Or perhaps it's the chance that they contract it at age 4? – whuber Jul 24 '17 at 15:34
  • @whuber (1) If the data were discrete, then I would be inclined to add them up. But the sum is > 1, so I'm not sure. (2) The curve I'm referring to is the one obtained via `plot(x = 1:4, y = c(0.01, 0.002, 0.03, 0.05))`. (3) I'm not sure any more. I think I am mixing up "random variable" with "data"? (4) Just the chance that they contract it at age 4. – Adrian Jul 24 '17 at 19:43
  • Your response to (1) should convince you that adding these numbers may be meaningless. The standard first-order rectangle estimate of the area under that curve is the sum of the numbers (its Riemann sum), so that will rule out the area as having any meaning, either. You might be conflating "probability" with "random variable." Although the two have some conceptual connection, they are different concepts and have distinctly different roles in statistical models. See https://stats.stackexchange.com/questions/50 and be picky about the answers you read! – whuber Jul 24 '17 at 20:15
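
The two observations in the last comment can be checked numerically. The sketch below assumes unit-width rectangles (one per year of age); `riemann_sum`, `width`, and `alt` are names introduced here for illustration only.

```r
p   <- c(0.01, 0.002, 0.03, 0.05)
alt <- c(0.5, 0.6, 0.7, 0.8)   # the alternative data from the comments

# Rectangle (Riemann) estimate of the area with one unit-width
# rectangle per age: height times width, summed
width       <- 1
riemann_sum <- sum(p * width)

# With unit widths the rectangle estimate is exactly the plain sum
stopifnot(isTRUE(all.equal(riemann_sum, sum(p))))

# For the alternative data the running sum passes 1 at the second age,
# so it cannot be read as a cumulative probability
print(cumsum(alt))
```

Under these assumptions, adding the values and taking the first-order area are the same calculation, so neither is meaningful unless the numbers themselves have a clear probabilistic interpretation.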

0 Answers