
Short: I have a series of joint probabilities (likelihoods) describing how likely it is that sample $Q$ belongs to each group $k$. I need to compute a p-value describing how "significant" the "top" group is compared to the other groups. Something like a likelihood ratio test, but since the models aren't nested I don't know how to implement this.


My problem, oversimplified: Given a new query sample $Q$ with particular values for features a, b, and c, figure out which known group $k$ the sample most likely belongs to, given the probabilities each group has for values of those features.

The accepted methodology in my field for doing this is simply: for each group $k$, calculate the product of the probabilities of observing $Q$'s values for features a, b, and c in that group. Because those numbers get tiny, I do this on the log scale. After doing this I have a joint probability (what I'm calling a likelihood here) for each group $k$, indicating how likely it is that sample $Q$ originated from group $k$. I now need to put a p-value on the largest of these likelihoods, to assess how "significant" the "top" group is relative to the other groups.
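
To make the calculation concrete, here is a sketch in symbols. The notation is introduced only for illustration: $q_a$, $q_b$, $q_c$ are $Q$'s observed values for the three features, $p_k(\cdot)$ is the probability of a value in group $k$, and $\ell_k$ is the resulting log joint probability. It assumes the features are treated as independent within a group, which is what multiplying the probabilities implies.

$$\ell_k = \log p_k(q_a) + \log p_k(q_b) + \log p_k(q_c), \qquad \hat{k} = \arg\max_k \ell_k$$

The "top" group is $\hat{k}$, and the question is how to attach a p-value to the gap between $\ell_{\hat{k}}$ and the $\ell_k$ of the other groups.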

It doesn't seem like I can use a typical likelihood ratio test because the models don't seem to be nested. They all use the same features (probability of seeing values for a, b, and c in each population), so the number of "parameters" is the same.

I've looked at 1-6 below, and these make me think I'm not asking the question properly.

Finally, I'd like to implement this in R.

  1. Generalized log likelihood ratio test for non-nested models
  2. Likelihood ratio test - lmer R - Non-nested models
  3. Comparing non-nested models with out of sample likelihood
  4. Non-nested model selection
  5. How to discriminate between non-nested models?
  6. Comparison of log-likelihood of two non-nested models
Stephen Turner
  • Have you looked into [AIC](https://en.wikipedia.org/wiki/Akaike_information_criterion#How_to_apply_AIC_in_practice)? If you have a likelihood, then you should be able to implement this. – call-in-co Jul 16 '15 at 14:31
  • Thanks @ScouserInTrousers. I have, and this section of the WP article explains in simple English how to implement it. My problem is that I'm unsure what the number of parameters should be here. Is it the number of features used to estimate the likelihood (that is, _a_, _b_, and _c_ in my example above)? – Stephen Turner Jul 16 '15 at 15:01
  • I'm not sure what you mean by features but if you have, say, $y =\beta_0 + x_1\beta_1 + \epsilon$ you have three parameters to estimate: namely $\beta_0$, $\beta_1$, and $\sigma^2$ which is the variance of $\epsilon$. So, maybe your answer is three ($a$, $b$, and $c$) but I'm not sure how those three are making a model so I'm not sure of the answer. – call-in-co Jul 16 '15 at 15:17
  • Related thread: https://stats.stackexchange.com/q/137557/930. – chl Nov 09 '20 at 08:35
