
I'm trying to work out which approach to use for my problem. I want to assess, for each subject, the risk of having a specific condition C (5% prevalence); the sample size (30K) is moderate relative to that prevalence. For each subject I have the condition status (yes/no) and a history of past "accidents". I see these small accidents as precursors of the condition, as they are moderately correlated with it (0.20). Accidents are also much more frequent than a positive condition, so it makes sense to use this valuable information.

Each time I get new accidents, I would like to update my risk assessment, and for that reason I see Bayesian inference as a natural fit (disclaimer: I'm not a statistician and I'm trying to learn about Bayesian statistics). From what I understand, the prior on $\theta$ can be multiplied by the likelihood of the data, $P(D|\theta)$, to get a posterior, which then serves as the prior the next time new accidents arrive.
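To make that updating loop concrete, here is a minimal sketch of my understanding, assuming a conjugate Beta prior and 0/1 observations. The function names, the Beta(1, 19) prior (chosen only because its mean is 0.05) and the batches of outcomes are all made up for illustration. Note that the $\theta$ being updated here is the probability of whatever event is actually observed, not the condition itself.

```python
# Sketch: sequential Bayesian updating with a conjugate Beta prior.
# Assumption: theta is the probability of an observed 0/1 event,
# and each batch is a list of such 0/1 outcomes (hypothetical data).

def update_beta(alpha, beta, outcomes):
    """Return the Beta posterior parameters after observing 0/1 outcomes."""
    successes = sum(outcomes)
    failures = len(outcomes) - successes
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical prior with mean 0.05.
alpha, beta = 1.0, 19.0

# Hypothetical batches of new observations arriving over time.
batches = [[0, 0, 1], [0, 1, 1, 0]]

for batch in batches:
    # The posterior from the previous batch is the prior for this one.
    alpha, beta = update_beta(alpha, beta, batch)
    print(f"posterior Beta({alpha}, {beta}), mean {beta_mean(alpha, beta):.3f}")
```

Updating batch by batch gives the same posterior as processing all observations at once, which is what makes the "posterior becomes the new prior" reading work.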

The problem is that the risk assessment I want to give is about the condition, not about accidents. In other words, $\theta$ is my belief about condition C. How can I take accidents into account in this scheme? It seems I am missing something that links the condition to the accidents.

EDIT: Thinking about it some more, would something like this make sense?

$P(\theta|D) \propto P(D|\omega) P(\omega|\theta) P(\theta)$

where $\omega$ is the parameter of the probability distribution of the accidents.

Patrick

1 Answer


In cases like this you usually don't know the exact form of the conditional distribution for such events and use some model to approximate the distribution. The most common choice for this kind of data would be some form of logistic regression.
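As a rough sketch of what that could look like, the snippet below fits a logistic regression of the condition on a single accident count. Everything in it is an assumption made for illustration: the data are synthetic, the "true" coefficients (-4 and 0.6) are invented, and `fit_logistic` and `predict_risk` are hypothetical helper names. It uses plain Newton steps so that it runs with the standard library only; in practice a statistics package would be the usual choice.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += yi - p
            g1 += (yi - p) * xi
            # Negative Hessian (a 2x2 matrix).
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system for the Newton step.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def predict_risk(b0, b1, accident_count):
    """Estimated probability of the condition for a given accident count."""
    return sigmoid(b0 + b1 * accident_count)

# Synthetic data: accident counts 0..5, condition drawn from an invented model.
random.seed(0)
counts = [random.randint(0, 5) for _ in range(2000)]
condition = [1 if random.random() < sigmoid(-4.0 + 0.6 * c) else 0 for c in counts]

b0, b1 = fit_logistic(counts, condition)
for c in (0, 2, 5):
    print(f"accidents={c}: estimated risk {predict_risk(b0, b1, c):.3f}")
```

When a subject's accident count changes, the updated risk is obtained by calling `predict_risk` again with the new count; the model itself would be refitted only when new condition outcomes become available.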

Tim
  • Thanks again @Tim, your help is greatly appreciated. When you say "you usually don't know the exact form of the conditional distribution", does this conditional distribution correspond to $P(\omega|\theta)$ in my formula above, and is that formula correct? Also, is the choice of logistic regression influenced by the nature of the accident variables (for example, suppose it's a count of accidents over the years)? – Patrick Mar 30 '19 at 12:27
  • Or would it instead be $P(\omega |D) \propto P(D|\theta) P(\theta|\omega) P(\omega)$? – Patrick Mar 30 '19 at 20:15