
I was reading a post on Introduction to Maximum Likelihood, where the following excerpt is provided:

The distinction between probability and likelihood is fundamentally important: Probability attaches to possible results; likelihood attaches to hypotheses.

So the question is: is a hypothesis a set of probability models, say 5 different ones, and is the goal of likelihood to determine which of those models best describes the data?

The following is a concrete example:

To illustrate how likelihood is attached to a hypothesis: say I hypothesize that a magician has a two-headed coin (which he really has, but ONLY he knows), implying that the probability of heads is 1. Based on this hypothesis $$P(\textrm{Head|Two-headed-coin-hypothesis})=1$$ $$P(\textrm{Tail|Two-headed-coin-hypothesis})=0$$ Now when the magician performs $n$ trials and we see heads each time, the joint probability is $$P(\textrm{Trial 1|Two-headed-coin-hypothesis})\times\dots\times P(\textrm{Trial }n\textrm{|Two-headed-coin-hypothesis})$$ $$=1\times\dots\times 1=1,$$ confirming the hypothesis. An interesting outcome of this is that if the magician had instead held a fair coin, any trial resulting in a tail would have introduced a 0 into the joint probability above, evaluating the whole expression to 0 and therefore disproving the hypothesis.
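A minimal sketch of this joint-probability calculation (Python; the toss sequences are made-up illustrations, not from the original post):

```python
# Likelihood of a sequence of coin flips under a hypothesized P(heads).

def likelihood(tosses, p_heads):
    """Joint probability of the observed tosses given P(heads) = p_heads."""
    result = 1.0
    for toss in tosses:
        result *= p_heads if toss == "H" else (1.0 - p_heads)
    return result

print(likelihood(["H", "H", "H"], 1.0))  # two-headed hypothesis, all heads -> 1.0
print(likelihood(["H", "H", "T"], 1.0))  # a single tail -> 0.0, hypothesis ruled out
```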

However, when we do this for a fair coin: say some random individual has a fair coin, but we don't know this, and we hypothesize that it is indeed fair. Then $$P(\textrm{Head|Fair-coin-hypothesis})=0.5$$ $$P(\textrm{Tail|Fair-coin-hypothesis})=0.5$$ So we ask him to perform the trials $n$ times, getting only the results back. Now for, say, 2 trials the outcome is $0.25$, for 3 trials it is $0.125$, and for $n$ trials it is $0.5^n$. How do these final joint probability values confirm or disprove the hypothesis here? How do we compare 0.25 for 2 trials to our hypothesis, unlike the two-headed-coin case where the joint probability value matched the hypothesis value, both being 1?
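One way to see that $0.5^n$ by itself neither confirms nor disproves anything is to compare likelihoods of competing hypotheses on the same data, e.g. via their ratio. A hedged sketch (the candidate $p$ values and the data are illustrative assumptions, not from the original post):

```python
# Compare candidate hypotheses by their likelihoods on the same data.

def likelihood(tosses, p_heads):
    """Joint probability of the observed tosses given P(heads) = p_heads."""
    result = 1.0
    for toss in tosses:
        result *= p_heads if toss == "H" else (1.0 - p_heads)
    return result

data = ["H", "H", "H", "H"]  # hypothetical: 4 heads in 4 trials
for p in (0.5, 0.9, 1.0):
    print(f"P(heads)={p}: likelihood = {likelihood(data, p):.4f}")
# 0.5 -> 0.0625, 0.9 -> 0.6561, 1.0 -> 1.0000
# No single likelihood value "confirms" a hypothesis on its own; it is the
# comparison (e.g. the ratio 1.0 / 0.0625 = 16) that favors one model over another.
```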

GENIVI-LEARNER
  • Yes: but usually an infinite manifold of probability models is considered rather than just a finite number. – whuber Jan 22 '20 at 17:47
  • Ok, makes sense. What is a manifold here? Is that a 2D surface? – GENIVI-LEARNER Jan 23 '20 at 10:07
  • Good guess: it's the multidimensional generalization of a surface, including curves, surfaces, and so on. For instance, the "Normal distribution" refers to the 2D manifold of all specific Normal distributions. Each "point" on this manifold *is* a Normal distribution. The points can be described by coordinates, much like points on the Earth's surface can be described with latitude and longitude. One coordinatization of the "Normal distribution" is the (mean, SD) pair. The "Pearson Distribution," for instance, requires four such coordinates and so is a 4D manifold. – whuber Jan 23 '20 at 12:37 (see the likelihood-surface sketch after these comments)
  • +1 for intuitive insight – GENIVI-LEARNER Jan 23 '20 at 12:40
  • @whuber Is that the beginning of information geometry? – Dave Jan 25 '20 at 13:21
  • @Dave That's what I had in mind, yes, but what I described is antecedent to that: the mechanics of maximum likelihood estimation imply such a description, which is purely topological. What turns it into *geometry* is the existence of additional structure (an "affine connection") which, in most cases, gives a natural distance between distributions. Geometry begins with distance. – whuber Jan 25 '20 at 14:04
  • @GENIVI-LEARNER: I'm confident that whuber's manifold viewpoint is correct (although I have trouble visualizing manifolds), but as far as your question goes, another way to look at your situation is to view the trials as the outcome of a binomial where you're testing $p = 0.5$. Assuming the number of trials is fairly large, you can construct the z-score $\frac{\hat{p} - 0.5}{\sqrt{\operatorname{var}(\hat{p})}}$ to approximate the test for larger $n$. Maybe use a continuity correction to deal with the integer issue (a sketch of this test appears after these comments). – mlofton Jan 25 '20 at 15:02
  • @mlofton Do we need to have a model for the hypothesis, such as a binomial, normal, or Poisson distribution, to evaluate whether the likelihood "model" is suitable, implying the "hypothesis" is suitable? Or can we look at it on a purely numerical basis, as in the example in the question? I think your z-score is the numerical evaluation of the hypothesis, right? – GENIVI-LEARNER Jan 25 '20 at 15:16
  • @mlofton, in other words, when we hypothesize, do we need to have a model of the hypothesis? Or can we just make a "numerical guess" of the hypothesis, as in the coin example? – GENIVI-LEARNER Jan 25 '20 at 15:18
  • The mathematics doesn't know where the hypothesis comes from. It could be solid theory, an estimate from other data, a number that occurred to the researcher in a dream. As researchers we might have different attitudes to them, but that's not part of the calculation. – Nick Cox Jan 26 '20 at 08:43
  • @NickCox can you please elaborate? – GENIVI-LEARNER Jan 26 '20 at 13:54
  • Hypotheses come in different flavours; numerical guesses are one of several flavours. – Nick Cox Jan 26 '20 at 16:06
  • @GENIVI-LEARNER: You've asked a lot in your comment back to me; I don't know how to fit it all here. The distribution sometimes has to be figured out, but in this case there's no doubt that it's binomial. This is because, if you have a flip with probability $p$ of success, then the number of successes in $n$ trials is binomial$(n, p)$. But forget what I said about the normal distribution. First try to develop the test just using the binomial. That's a better way to go about things for now. Then you can learn about the normal stuff later. – mlofton Jan 26 '20 at 21:51
  • @GENIVI-LEARNER: This is not meant to be offensive, but you sound like you're interested in stat theory. Have you taken any intro probability-statistics classes that use, say, the DeGroot text or the Hoel, Port, and Stone text? I recommend doing that if you haven't, because I don't have the space nor the ability to address all of your comments. You do ask good questions, which is why I'm making the suggestion. – mlofton Jan 26 '20 at 21:54
  • @mlofton yes, I agree the distribution or density has to be figured out. In the first case of the example in the question it is degenerate (all mass on heads), and in the second case it is binomial. So the gist of my question is: if we do not rely on the model and work on a purely numerical basis, is there any rule that confirms or disproves the hypothesis? – GENIVI-LEARNER Jan 27 '20 at 10:03
  • @mlofton, also thanks for the suggestion. I am not familiar with the DeGroot text or the Hoel, Port, and Stone text; keyword searches of those terms don't reveal any useful information. Also, there is plenty of space for your answer in the answer field :) – GENIVI-LEARNER Jan 27 '20 at 10:03
  • @GENIVI-LEARNER: Sometimes you can have data and need to test what distribution the data adheres to. But in some cases, because of the nature of the experiment, you can figure out the distribution of a random variable. In the case where there are $n$ trials and the probability of success on each trial is $p$, the rv has a binomial distribution. I'm not sure how to explain it fully here, but any intro book would give decent detail. Let me find links to DeGroot and to Hoel, Port, and Stone. I think it would be good to check out one of those books. Are you in school, and if so, what level and what major? – mlofton Jan 28 '20 at 02:18
  • This is a link to DeGroot. Back in the dinosaur age he had his own book, but it looks like he collaborated for the updated edition. https://www.amazon.com/Probability-Statistics-4th-Morris-DeGroot/dp/0321500466 – mlofton Jan 28 '20 at 02:20
  • This is a link to Hoel, Port, and Stone. https://www.amazon.com/Introduction-Statistical-Theory-Houghton-Mifflin-Statistics/dp/0395046378 – mlofton Jan 28 '20 at 02:21
  • If you have two semesters of calculus, then this book is quite good, but it's somewhat more advanced than the other two. The general sequence is called the "two-semester math-stat sequence", where probability material is taken in the first course and statistics in the second. I believe in self-teaching, but sometimes a good text or taking a class can help to get you started. https://www.amazon.com/Statistical-Inference-George-Casella/dp/0534243126/ref=sr_1_2?keywords=casella+and+berger&qid=1580178112&s=books&sr=1-2 – mlofton Jan 28 '20 at 02:24
  • @mlofton I am a junior, CS major, choosing ML as my specialization. The links you provided are great! I shall explore them. Thanks a lot for the effort. – GENIVI-LEARNER Jan 28 '20 at 08:53
  • Sure, I'm glad to help. My guess is that you have enough background and passion to take a course that uses Casella-Berger or go through it yourself slowly. The other two texts are easier but also good. All the best. – mlofton Jan 29 '20 at 14:13
  • Have a look at: https://stats.stackexchange.com/questions/112451/maximum-likelihood-estimation-mle-in-layman-terms/112480#112480 – kjetil b halvorsen Apr 02 '21 at 18:30
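To make whuber's manifold picture above a bit more concrete, here is a minimal sketch (assuming a small made-up sample, not from the original discussion) that evaluates the Normal log-likelihood over a grid of (mean, SD) coordinates. Each grid point is one distribution on the 2D manifold, and maximum likelihood selects the point that best accounts for the data:

```python
import math

# Hypothetical sample; each (mean, sd) pair below is one point on the
# 2D manifold of Normal distributions.
data = [4.8, 5.1, 5.3, 4.9, 5.2]

def normal_log_likelihood(sample, mean, sd):
    """Log-likelihood of the sample under Normal(mean, sd)."""
    return sum(
        -math.log(sd * math.sqrt(2 * math.pi)) - (x - mean) ** 2 / (2 * sd ** 2)
        for x in sample
    )

# Crude grid search over the manifold's (mean, sd) coordinates.
best = max(
    ((m / 10, s / 10) for m in range(40, 61) for s in range(1, 21)),
    key=lambda coords: normal_log_likelihood(data, *coords),
)
print("maximum-likelihood point on the manifold:", best)
```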
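And here is a sketch of mlofton's suggested approximate binomial test of $p = 0.5$, with a continuity correction. The counts are hypothetical, and the standard error under the null is used in the denominator:

```python
import math

# Hypothetical counts: heads successes out of n flips of the coin under test.
n, heads = 100, 62
p0 = 0.5
p_hat = heads / n

# Standard error of p_hat under the null; the 0.5/n continuity correction
# accounts for the binomial's integer support.
se = math.sqrt(p0 * (1 - p0) / n)
z = (abs(p_hat - p0) - 0.5 / n) / se

# Two-sided p-value from the Normal approximation: 2 * (1 - Phi(z)).
p_value = math.erfc(z / math.sqrt(2))
print(f"z = {z:.3f}, approximate p-value = {p_value:.4f}")
```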

0 Answers