
While analyzing my data, I found some strange behavior and would greatly appreciate any guidance or suggestions.

I am trying to investigate the effect of a categorical variable (cl) on three proportions that sum to 1 (M). Naturally, I fitted both a Dirichlet regression and a multivariate beta regression to my dataset, but when the two were compared using loo, the beta regression fit the data significantly better than the Dirichlet.

$M \sim \mathrm{Dirichlet}([1, \beta_a \cdot tb, \beta_b \cdot tb])$

or

$M_1 \sim \mathrm{Beta}(1, \beta_a \cdot tb)$

$M_2 \sim \mathrm{Beta}(1, \beta_b \cdot tb)$

$M_3 \sim \mathrm{Beta}(1, \beta_c \cdot tb)$

Strangely, under the beta regression the sum of the predicted variables varies between 50% and 150%, which is nonsense. However, the sum of the fitted variables varies only between 95% and 105%, which is an acceptable error.
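A minimal simulation sketch may help show why sums can behave this way. It assumes made-up shape parameters (`alpha` is hypothetical and not taken from my fitted models) and compares joint Dirichlet draws with independent Beta draws that share the same marginal distributions. The independent draws sum to 1 only on average, which would be consistent with fitted means summing to roughly 100% while individual predictions range much more widely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shape parameters, chosen only for illustration;
# they are NOT estimates from the data in the question.
alpha = np.array([1.0, 2.0, 3.0])
alpha0 = alpha.sum()
n_draws = 10_000

# Joint Dirichlet draws: every row lies on the simplex, so it sums to 1.
dirichlet_draws = rng.dirichlet(alpha, size=n_draws)
dirichlet_sums = dirichlet_draws.sum(axis=1)

# Independent Beta draws using the Dirichlet marginals,
# M_i ~ Beta(alpha_i, alpha0 - alpha_i). Nothing ties the components together.
beta_draws = np.column_stack(
    [rng.beta(a, alpha0 - a, size=n_draws) for a in alpha]
)
beta_sums = beta_draws.sum(axis=1)

print("Dirichlet sums:        min", dirichlet_sums.min(), "max", dirichlet_sums.max())
print("Independent Beta sums: min", beta_sums.min(), "max", beta_sums.max())
print("Mean of independent Beta sums:", beta_sums.mean())
```

With these assumed parameters, the Dirichlet row sums stay at 1 up to floating-point error, while the independent Beta sums scatter around 1 even though their mean is close to 1.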

Is it fair to compare the two models? Or does the Dirichlet model, because of its natural sum-to-one constraint, simply yield a worse fit than a multivariate beta regression?

  • Why would you assume that three independent probabilities will sum to 1? – Tim Apr 06 '19 at 13:12
  • To follow on to @Tim's note (+1) - if you remove the constraint that the probabilities sum to 1, you'll naturally get a better fit - the unconstrained fit can't be worse than the constrained one, after all! – jbowman Apr 06 '19 at 18:36
