
I am working on a Dirichlet regression problem with three true proportions. The first two are actually measured, while the third is computed as the complement:

$$p_3=1-p_1-p_2$$

The issue is that $p_1$ and $p_2$ are measured with error, so the computed $p_3$ will sometimes be negative, i.e. when $p_1+p_2>1$.

Is there any way to “correct” for this problem? I know it could be handled with a Bayesian approach that models the measurement error, but I would rather use something simpler, such as a proper way to normalise the data.

MarianD
user6384
  • Build a hierarchical model where the $p_i$'s are distinguished from their noisy counterparts, the $q_i$'s. – Xi'an Jan 18 '21 at 15:01
  • Since $p_1$ and $p_2$ are measured with error, how do you guarantee that each of them is in the unit interval? If that happens "naturally" then why not compute $p_3$ the "same" way and then normalize the three of them to sum to one? – mef Jan 18 '21 at 23:13
  • Thanks mef, the issue is that I don't have $p_3$. The setting is equivalent to measuring a leaf area that is initially all healthy tissue. After a leaf disease sets in, you measure at a later time (as a proportion of the initial area) the diseased part ($p_1$) and the healthy part ($p_2$), while some of the leaf has disintegrated and cannot be measured. I was trying to avoid having to use a hierarchical model as Xi'an suggested. – user6384 Jan 19 '21 at 10:42

1 Answer


Think of $p_3$ as a derived parameter that is not part of the model. Once you get posterior draws of $(p_{1}, p_{2})$, compute $p_3$ from each draw by subtraction. Then everything remains consistent.
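
As a rough illustration of the subtraction step, here is a minimal Python sketch. It does not fit any model: the array `draws` is only a stand-in for posterior draws of $(p_1, p_2)$, generated here from a Dirichlet distribution with made-up parameters `alpha`. In a real analysis the draws would come from whatever sampler was used to fit the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: stand-in for posterior draws. In practice these would be
# taken from the fitted model, not simulated like this.
alpha = np.array([4.0, 3.0, 1.0])        # hypothetical values
draws = rng.dirichlet(alpha, size=4000)  # each row lies on the simplex
p1, p2 = draws[:, 0], draws[:, 1]

# p3 is a derived quantity: computed draw by draw, not modelled directly.
p3 = 1.0 - p1 - p2

# Summaries of the derived parameter.
p3_mean = p3.mean()
p3_interval = np.quantile(p3, [0.025, 0.975])
print(p3_mean, p3_interval)
```

Because every draw of $(p_1, p_2)$ satisfies $p_1 + p_2 \le 1$ when the latent proportions are constrained to the simplex, the derived $p_3$ cannot be negative (up to floating-point rounding), even if the raw measurements sum to more than one.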

Frank Harrell
  • Thank you for your answer, Frank. I am a bit unclear on what you mean. Is the idea to set up the likelihood as $(p_1, p_2, 1-p_1-p_2) \sim \text{Dirichlet}$, treating the $p$'s as parameters, and then have two measurement-error likelihoods, for example $y_1 \sim N(p_1, \mathrm{SD})$ and $y_2 \sim N(p_2, \mathrm{SD})$? – user6384 Jan 19 '21 at 10:33
  • Yes to the first part (I think) and not sure about the second part. – Frank Harrell Jan 19 '21 at 13:15
  • Thank you. Do you think that simulating data from known parameter values and then recovering those values would be good enough to be "confident" that this approach is sensible? – user6384 Jan 20 '21 at 05:53
  • This is more about math than about checking algorithm accuracy, but yes, it's usually a good idea to simulate from a model and see what data you get. – Frank Harrell Jan 20 '21 at 13:32
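
The "simulate from a model and see what data you get" idea from the comments could be sketched as follows. This is only an illustration under assumed values: the Dirichlet parameters `alpha` and the error standard deviation `sd` are hypothetical, and the normal measurement-error form is the one proposed in the comment above, not something confirmed in the answer.

```python
import numpy as np

rng = np.random.default_rng(1)

# ASSUMPTIONS: all numeric values below are made up for illustration.
alpha = np.array([4.0, 3.0, 1.0])  # hypothetical Dirichlet parameters
sd = 0.05                          # hypothetical measurement-error SD
n = 10_000

# True proportions (p1, p2, p3) on the simplex.
p = rng.dirichlet(alpha, size=n)

# Noisy measurements of the first two components only.
y = p[:, :2] + rng.normal(0.0, sd, size=(n, 2))

# Naive complement computed from the noisy measurements.
y3_naive = 1.0 - y.sum(axis=1)

# Share of simulated observations with a negative naive complement.
frac_negative = np.mean(y3_naive < 0)
print(frac_negative)
```

Looking at `frac_negative` for different choices of `sd` gives a feel for how often the problem described in the question would arise. Checking parameter recovery would additionally require fitting the model to the simulated `y`, which is not shown here.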