10

This is more of a history question than a technical question.

Why is the ``Neyman-Pearson lemma'' a Lemma and not a Theorem?

link to wiki: https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma

NB: The question is not about what is a lemma and how lemmas are used to prove a theorem, but about the history of the Neyman-Pearson lemma. Was it used to prove a theorem and then it happened to be more useful? Is there any evidence of this beyond suspicion that this was the case?

Scortchi - Reinstate Monica
Tauto
  • 6
    [Terminology](https://en.wikipedia.org/wiki/Theorem#Terminology): A lemma is a "helping theorem", a proposition with little applicability except that it forms part of the proof of a larger theorem. In some cases, as the relative importance of different theorems becomes more clear, what was once considered a lemma is now considered a theorem, though the word "lemma" remains in the name. – Carl Dec 29 '18 at 02:42
  • 2
    @Carl Sure, but why is the Neyman-Pearson lemma a lemma and not a theorem? was there a Theorem? and is there evidence of it? As I said, it is history question, not a technical one. – Tauto Dec 29 '18 at 03:12
  • 2
    Well, the N-P lemma's used to prove the Karlin-Rubin theorem, & that Rao's score test is locally most powerful; these results are perhaps applied more widely than the N-P lemma itself (point null vs point alternative). – Scortchi - Reinstate Monica Jan 02 '19 at 12:49

2 Answers

10

The classic version appears in 1933, but the earliest occasion of its being referred to as a "lemma" is possibly in Neyman and Pearson's 1936 article Contributions to the theory of testing statistical hypotheses (pp. 1-37 of Statistical Research Memoirs Volume I). The lemma, and the proposition it was used to prove, were stated as follows:

[Image: scan of the proposition and the lemma from the 1936 paper]

This is known today as the generalized Neyman-Pearson Fundamental Lemma (cf. Chapter 3.6 of Lehmann and Romano's Testing Statistical Hypotheses), and it reduces to your everyday Neyman-Pearson when $m=1$. The lemma itself was then studied by several big names from that era (e.g. P.L. Hsu, Dantzig, Wald, Chernoff, Scheffé) and the name "Neyman and Pearson's lemma" thus stuck.
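For readers without the book at hand, here is a sketch of the sufficiency part of the generalized lemma in modern textbook notation. The symbols $f_i$, $c_i$, $k_i$, $\phi$ and $\mu$ are notation chosen here for illustration and are not copied from the 1936 scan. Let $f_1, \dots, f_{m+1}$ be $\mu$-integrable functions and let $\phi$ with $0 \le \phi \le 1$ be a critical function satisfying

$$\int \phi f_i \, d\mu = c_i, \qquad i = 1, \dots, m.$$

If there are constants $k_1, \dots, k_m$ such that

$$\phi(x) = \begin{cases} 1 & \text{when } f_{m+1}(x) > \sum_{i=1}^{m} k_i f_i(x), \\ 0 & \text{when } f_{m+1}(x) < \sum_{i=1}^{m} k_i f_i(x), \end{cases}$$

then $\phi$ maximizes $\int \phi f_{m+1} \, d\mu$ among all critical functions satisfying the same $m$ constraints. Taking $m=1$, $f_1 = p_0$, $f_2 = p_1$ and $c_1 = \alpha$ gives the familiar form: the test that rejects when $p_1(x) > k \, p_0(x)$ is most powerful among tests of size $\alpha$.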

Here's a list of relevant articles/books if one's interested in the history of the Neyman-Pearson lemma:

  • The Neyman–Pearson Story: 1926-34, E.S. Pearson, in Research Papers in Statistics: Festschrift for J. Neyman.
  • Introduction to Neyman and Pearson (1933) On the Problem of the Most Efficient Tests of Statistical Hypotheses, E.L. Lehmann, in Breakthroughs in Statistics: Foundations and Basic Theory.
  • Neyman-From Life, C. Reid.
Francis
  • Yes, but, Neyman-Pearson's lemma fit the definition a lemma in 1933, i.e., it was a lemma at that time, which is why it was subsequently referred to as a lemma. – Carl Jan 03 '19 at 21:35
  • 1
    @Carl, what is your point by using 'but'. Is there something wrong with this answer? – Sextus Empiricus Jan 04 '19 at 10:43
  • What is the particular proposition that they intended to prove with the aid of the lemma? – Sextus Empiricus Jan 04 '19 at 13:46
  • @MartijnWeterings: I cropped the proposition and the lemma together from the scan. I doubt it's going to be helpful without looking at the original paper though. – Francis Jan 04 '19 at 15:17
  • I believe that the proposition is interesting because that is what some texts (e.g. wikipedia) refer to as the lemma. I wonder then what the first use has been of the term *'Neyman-Pearson lemma'* and how it evolved. – Sextus Empiricus Jan 04 '19 at 15:50
  • 1
    @MartijnWeterings: You can search the term on Google Scholar and confine the date range. The earliest use is from P.L. Hsu it seems. [Wald's lecture note](https://catalog.hathitrust.org/Record/009217188) from 1940 also cited it. – Francis Jan 04 '19 at 16:27
  • @MartijnWeterings Francis does not appear to answer *why* the Neyman-Pearson lemma is a lemma and attempts to satisfy us with *when* it was so called. The song [a boy is named Sue](https://www.youtube.com/watch?v=WOHPuY88Ry4) gives us a reason for the name without which the answer to when the name was given would not satisfy. – Carl Jan 04 '19 at 17:40
  • 2
    @Carl, did you miss the following part? *"**NB**:The question is not about what is a lemma and how lemmas are used to prove a theorem, but about the history of the Neyman-Pearson lemma."* It is about the *history*. The question asks for *context* on how this theorem came to be called a lemma, not *why* a theorem (or more specifically this theorem) can be called a lemma. – Sextus Empiricus Jan 04 '19 at 19:31
  • @MartijnWeterings I did not miss that quip. It was added **after** my upvoted comment and changed the implications of the question not at all. Unless the OP modifies the original question entirely, the question is primarily *why* not *when*. A lemma does not become a lemma when someone says so, it becomes one when it fulfills that role. – Carl Jan 04 '19 at 19:32
  • 2
    @Carl, then this answer explains nicely how it fulfilled that role and it includes some history how people have been viewing that role. – Sextus Empiricus Jan 04 '19 at 19:36
  • @MartijnWeterings And yet the OP accepted my original answer to the original question, which is only fair, and the lemma became a lemma in 1933 not in 1936. – Carl Jan 04 '19 at 19:42
-2

NB: This is historically the first answer to the OP's question. In statistics, the Neyman–Pearson lemma was introduced by Jerzy Neyman and Egon Pearson in a paper in 1933. Also, it is used in practice by statisticians as a theorem, not a lemma, and it is called a lemma largely because of the 1936 paper. IMHO, the historical treatment does not answer the "why" question, and this post attempts to do that.

What a lemma is as contrasted to a theorem or corollary is addressed elsewhere and here. More exactly, as to the matter of definition: Lemma, first meaning: "A subsidiary or intermediate theorem in an argument or proof." I agree with the Oxford dictionary but would have changed the word order, and note the exact language: intermediate or subsidiary theorem. Some authors mistakenly believe that a lemma must be intermediary in a proof, and this is the case for many unnamed lemmas. However, it is common, at least for named lemmas, for the lemma result to be an implication arising from an already proven theorem such that the lemma is an additional, i.e., subsidiary theorem.

From the New World Encyclopedia: "The distinction between theorems and lemmas is rather arbitrary, since one mathematician's major result is another's minor claim. Gauss' lemma and Zorn's lemma, for example, are interesting enough per se that some authors present the nominal lemma without going on to use it in the proof of any theorem."

Another example of this is Evans lemma, which follows not from proof of a simple theorem of differential geometry which...shows that the first Cartan structure equation is an equality of two tetrad postulates...The tetrad postulate [Sic, itself] is the source of the Evans Lemma of differential geometry.

Wikipedia mentions the evolution of lemmas in time: "In some cases, as the relative importance of different theorems becomes more clear, what was once considered a lemma is now considered a theorem, though the word "lemma" remains in the name."

However, note well that, whether or not they stand alone, lemmas are also theorems. That is, a theorem that is a lemma may sometimes be an answer to the question, "What does the (above) theorem imply?" Sometimes lemmas are a stepping stone used to establish a theorem.

It is clear from reading the 1933 paper: IX. On the problem of the most efficient tests of statistical hypotheses. Jerzy Neyman, Egon Sharpe Pearson, and Karl Pearson, that the theorem being explored is Bayes' theorem. Some readers of this post have difficulty relating Bayes' theorem to the 1933 paper despite an introduction that is rather explicit in that regard. Note that the 1933 paper is littered with Venn diagrams; Venn diagrams illustrate conditional probability, which is Bayes' theorem. Some people refer to this as Bayes' rule, as it is an exaggeration to refer to that rule as being a "theorem." For example, if we were to call 'addition' a theorem, as opposed to being a rule, we would confound rather than explain.

Therefore, the Neyman-Pearson lemma is a theorem concerning the most efficient testing of Bayesian hypotheses, but is not currently called that because it was not to begin with.

Carl
  • 1
    I'm a bit confused as to what exactly you're saying here. Clearly not that the N-P lemma is used to prove Bayes' theorem, in this paper or elsewhere. So the question "Why 'lemma'?" remains. The N-P lemma *is* used in Sections III & IV of this paper in the derivation of UMP similar tests, & might justly have been called a lemma for this reason. – Scortchi - Reinstate Monica Jan 04 '19 at 09:59
  • @Scortchi Figuratively speaking, lemmas pick up and digest some of the stale bread crumbs left behind by a theorem. Unlike corollaries, which are directly implied by theorems, the chain of induction of a lemma tends to be longer. That role is more important than the matter of proof, *per se,* and the structure tends toward predictions implied by a theorem. See https://math.stackexchange.com/a/463364/373043, and https://math.stackexchange.com/a/111490/373043. – Carl Jan 04 '19 at 19:10
  • 3
    Your statement "Therefore, the Neyman-Pearson lemma could be called a theorem " is unfounded and explains nothing about why we refer to the 'Neyman-Pearson lemma' as a lemma. Furthermore, what it has to do with Bayes theorem is entirely unclear and seems false. Your answer deserves downvotes for being vague and nonsensical but since you do not like those downvotes I will just state that it does deserve them without giving any. – Sextus Empiricus Jan 04 '19 at 19:41
  • @MartijnWeterings I was answering the question "Why is the ``Neyman-Pearson lemma'' a Lemma and not a Theorem?" and commented quoting Wikipedia that "as the relative importance of different theorems becomes more clear, what was once considered a lemma is now considered a theorem, though the word "lemma" remains in the name." Your criticism reflects a rigidity that is not commensurate with historical development, names can change as the implications of the essence of a subject become apparent, but also do not always do so due to social inertia. – Carl Jan 04 '19 at 19:54
  • @MartijnWeterings Why don't you answer the question "Why is it not a theory?" rather than just criticizing my attempt at doing so. If you think that calling the lemma a theorem is ridiculous, why do you not say that? If so, the question is incorrect. I took a stab at it. All you did was criticize, try to contribute a bit more. – Carl Jan 05 '19 at 02:48
  • 2
    A lemma is just a theorem (only placed in a different context as a 'help' in a larger proof). This is not the question and it has been answered in several threads on the mathematics site. We know that lemmas can start to live a life on their own (without the former theorems that they helped). The question explicitly asks for the history of this in relation to the Neyman Pearson Lemma. Francis has already given a fine answer to this and there is no need for another answer. I criticized your answer because it is confusing (with stuff about Bayes rule) and not helpful or even detrimental. – Sextus Empiricus Jan 05 '19 at 09:50
  • Carl, you have referred a few times to the change of the question. But it is unclear what you refer to, since there have been few changes to the question after you answered. Which of the nine versions of the question have you been answering? – Sextus Empiricus Jan 05 '19 at 17:58
  • @MartijnWeterings I answered Dec 30 '18 at 2:12 to the question at the time of my tag edit of that question on Dec 29 '18 at 2:39. Francis's answer appeared on Jan. 5, after the question morphed, and the question morphed because of my comment, the very first comment on this post. I have contributed much to this dialogue, but one cannot tell that from the negative reactions, which are simply heartbreaking. Kick a guy for helping, thanks loads for all the bias. – Carl Jan 05 '19 at 18:17
  • What does it mean for a 'lemma' to 'modify' a theorem? Also, does the discussion of things being called 'theory' imply that this answer claims 'theory' and 'theorem' are interchangeable? Otherwise I don't understand the point of the discussion about things being called 'theory'. – Juho Kokkala Jan 05 '19 at 18:20
  • @JuhoKokkala Theorem is from Ancient Greek θεώρημα (theṓrēma, “speculation, proposition to be proved”) (Euclid), from θεωρέω (theōréō, “I look at, view, consider, examine”), from θεωρός (theōrós, “spectator”), from θέα (théa, “a view”) + ὁράω (horáō, “I see, look”), and theory is from Ancient Greek θεωρία (theōría, “contemplation, speculation, a looking at, things looked at”), from θεωρέω (theōréō, “I look at, view, consider, examine”), from θεωρός (theōrós, “spectator”), from θέα (théa, “a view”) + ὁράω (horáō, “I see,look”), i.e., the etymology is the same. – Carl Jan 05 '19 at 18:31
  • 1
    Carl, so which version did you answer that has become different? – Sextus Empiricus Jan 05 '19 at 18:33
  • @JuhoKokkala I used the word modify in a grammatical sense; an adjective modifies a noun. I will change this to 'refers to' – Carl Jan 05 '19 at 18:34
  • @MartijnWeterings Actually, Francis's answer appeared after mine did, and after the question itself morphed to make that answer look appropriate, even when it does not answer the question. There is no need for such behaviour. Moreover, the concept of 'larger' or 'smaller' for a lemma than the theorem it refers to is incorrect. What the lemma references, according to the introduction in the 1933 paper, is Bayes' theorem. So this is a lemma to what, in your opinion? – Carl Jan 05 '19 at 18:38
  • 1
    @Carl I don't understand 'refer to' either. A lemma is used to prove a theorem. If lemma A is used to prove theorem B, do you say A refers to B. Do you think that Neyman and Pearson proved Bayes theorem using the Neyman-Pearson lemma as an intermediate step? (That's not the impression I get from a brief glance of the introduction, but admittedly I have not read it in detail) – Juho Kokkala Jan 05 '19 at 18:41
  • @MartijnWeterings Asked and answered, version 5, from Dec 29 '18 at 2:39. – Carl Jan 05 '19 at 18:41
  • @JuhoKokkala NO. From [our companion site](https://math.stackexchange.com/a/463364/373043): A lemma is generally used to describe a "helper" fact that is used in the proof of a more significant result. And that is not correct either. A lemma is a chain of implication that takes a theorem as a starting point, but what it implies may be something that may or may not point to the originating theorem, i.e., it may prove the theorem, or, it may lead to something else entirely. – Carl Jan 05 '19 at 18:53
  • 4
    Do you have a source for that interpretation/usage of the word 'lemma'? Otherwise I believe you have simply misunderstood what 'lemma' means. To borrow language from the linked answer from the companion site, I would interpret both the current and the previous versions of this question to mean "What is the more significant result for which the Neyman-Pearson lemma was a 'helper' fact". – Juho Kokkala Jan 05 '19 at 18:58
  • @JuhoKokkala I am using logic, not references. The lemma definition in the link is *supposed* to be the *more significant result*, which is an unnecessary claim. Lemmas can point to results of lesser or greater significance than the theorem used as a starting point. Lemmas answer the question, "What does this theorem imply?". – Carl Jan 05 '19 at 19:05
  • @JuhoKokkala Lemma, [first meaning](https://en.oxforddictionaries.com/definition/lemma): A subsidiary or intermediate theorem in an argument or proof. I agree with the Oxford dictionary but would have changed the word order, and note the exact language: intermediate or subsidiary theorem. – Carl Jan 05 '19 at 19:17
  • @Carl thanks. I'm not fully convinced still as I've never heard lemma used in any other manner than 'intermediate theorem in a proof,' and I don't understand what 'subsidiary' means in the dictionary definition but cannot argue against that. (Perhaps you want to post an answer to one of the math.SE questions if you don't think the answers there are fully correct). – Juho Kokkala Jan 05 '19 at 19:26
  • @JuhoKokkala OK, try this explanation. Suppose you are generating a theorem (which may or may not be completely original). You create a theorem, then you look at it, and realize that it implies something that is interesting. So you explore that chain of thought to see where it leads. Now, since it is not exactly what the theorem states, but rather a somewhat different chain of thought, then you need a name for what you did, and you think: lemma. Such is the nature of things; first I was born; sinful, naked and nameless, then my soul was cleansed and entered into the Good Book; Carl. – Carl Jan 05 '19 at 19:40
  • 1
    *intermediate or subsidiary theorem. That is, a lemma answers the question, "What does the (above) theorem imply?"* This makes no sense. See the quote given by Francis *"the proof of this proposition is a simple consequence of the following Lemma"* There the lemma is *not* an implication of the (above) theorem/proposition. The lemma is another theorem based on which you can build a (larger/more complex/more important) proof for another theorem. – Sextus Empiricus Jan 05 '19 at 20:10
  • @MartijnWeterings Lemmas *can* be either (1) intermediary or (2) subsidiary. Francis's entry relates to (1) above. That does not exclude (2), subsidiary theorems, as also being lemmas. Your comment makes no sense. – Carl Jan 05 '19 at 20:16
  • @MartijnWeterings Look at https://www.thefreedictionary.com/lemma, two of three sources claim that a lemma is a subsidiary proposition, proved for use in the proof of *another* proposition. That is an exaggeration because it does not have to be "another". One of three definitions claims, as you do, that a lemma is "A subsidiary proposition assumed to be valid and used to demonstrate a principal proposition." That that entry is poor quality should be obvious given the nonsense phrase "assumed to be valid," we do no such thing. – Carl Jan 05 '19 at 20:32
  • 2
    *"That is an exaggeration because it does not have to be "another". "* where does this claim come from? This (without being, originally, part of a proof for 'another' theorem) is not how mathematicians use the term lemma. It is very similar to [the use in logic](https://en.wikipedia.org/wiki/Lemma_(logic)) A -> B -> C and the question asks what is C in the case of the lemma B being the Neyman Pearson lemma (it is definitely not Bayes rule/theorem). – Sextus Empiricus Jan 05 '19 at 20:38
  • @MartijnWeterings It comes from the fact that a lemma *can* but does not *have to* prove the theorem that the lemma addresses. That is, if we say *another* we are referencing something other than the principal theorem. I will perhaps hunt around for an example of this, but this is difficult time wise given the cascade of critical comments this thread is generating. Calm down people, would yuh, please? – Carl Jan 05 '19 at 20:47
  • @JuhoKokkala I will eliminate the word "theory" as a synonym of "theorem" as per your comment, even though the meaning is the same, and because people think in terms of context and not usually in terms of semantic equivalency. Interesting that, "all information is context dependent" however, the same Kolmogorov complexity can generate different contexts. – Carl Jan 05 '19 at 21:02
  • @MartijnWeterings Here is a sufficient condition lemma for closure of a theoretical process. It is not a necessary condition to establish that theory, i.e., a proof. An [Itô process](https://en.wikipedia.org/wiki/It%C3%B4_calculus#It%C3%B4_processes) is defined to be an adapted stochastic process that can be expressed as the sum of an integral with respect to Brownian motion and an integral with respect to time. [Itô's lemma](https://en.wikipedia.org/wiki/It%C3%B4%27s_lemma) is an identity used in Itô calculus to find the differential of a time-dependent function of a stochastic process. – Carl Jan 05 '19 at 21:41
  • @MartijnWeterings Do you accept that example as proving the rule or do you need a dozen more? I think the moral of the story is that actual usage of the word *lemma* follows the need to categorize theorems as they are generated, and does not follow from presumptuousness concerning what a lemma *should be*. – Carl Jan 05 '19 at 22:01
  • @MartijnWeterings Regarding a lemma answers the question, "What does the (above) theorem imply?" **This makes no sense.** To you, it didn't. I have altered that to read "a lemma ***may sometimes*** answer the question, "What does the (above) theorem imply?"" – Carl Jan 06 '19 at 01:57
  • @Scortchi Agreed, and a lemma indeed does not have to be part of the proof of the theorem to which it refers. What, at this point, is confusing you, if anything? – Carl Jan 06 '19 at 02:11
  • @JuhoKokkala You said "*Perhaps you want to post a answer to one of the math.SE questions if you don't think the answers there are fully correct*", they are not correct, and no thanks, I do not want to go through the torture of being criticized by people who have not investigated how the term *lemma* is actually being used all over again. – Carl Jan 06 '19 at 02:19
  • 1
    @Carl: It's still not clear precisely what relation between the N-P lemma & Bayes' Theorem you're describing, & why you think that this relation explains the former's being called a lemma. – Scortchi - Reinstate Monica Jan 08 '19 at 10:41
  • @Scortchi Edit: The basic structure of the 1933 paper is the same as the 1936 paper, which is lemma-like; the inequalities flow from assumptions from conditional probability and derivatives. However, I have my doubts as to a justification of its being called a lemma, that is, it is a theorem, and context determines whether it is a lemma or not. Still checking but it may be that in recent work the N-P inequalities arise in a theorem context that then implies the derivative properties. – Carl Jan 08 '19 at 21:45
  • 1
    @Carl the N-P lemma has been used as a lemma in [several articles](https://scholar.google.nl/scholar?cites=1036175528635150601&scipsc=1&q=lemma) that refer to the 1936 paper. In the 1936 paper itself it is even mentioned and used explicitly as a lemma. Note as well that the lemma (equations 23 to 26 in Francis' answer) has in itself nothing to do with probability directly. It is a statement about a set of integrable functions $F_i$ and a comparison of integrals of $F_0$ over regions $w$ for which integrals of $F_i$ (where $i>0$) result in the same constant $c_i$. – Sextus Empiricus Jan 09 '19 at 17:47
  • @MartijnWeterings True enough, only integrability of some type is required (?Borel measurable functions), which begs the question. Is it a lemma of the proposition or a theorem that stands alone? It appears to have larger applicability than the proposition(s) it is claimed to be subordinate to, which proposition(s) were originally presented as probability functions. So when is it fair to pick and choose what we wish to keep while ignoring what does not seem appropriate to our argument of the moment? – Carl Jan 09 '19 at 22:51
  • 1
    Lemmas are always theorems as well. It is just that when the theorem is a subsidiary or intermediate theorem in other proofs then people tend to call it a lemma. This is of course subjective and you can debate/challenge whether or not people should have called it a lemma. However, the fact is that it has been called a lemma, and the question is 'why, what is the history about it?'. It is not strange to see why since it is often used as an intermediate step in proofs for other theorems, and on its own it has little practical use (statisticians use the proposition not the lemma) – Sextus Empiricus Jan 09 '19 at 23:11
  • @MartijnWeterings I agree with you up to a point, and there are other opinions. For example, [In statistics, the Neyman–Pearson lemma was introduced by Jerzy Neyman and Egon Pearson in a paper in 1933.](https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma). Also, it is used in practice by statisticians as a [theorem](https://www.stat.washington.edu/jaw/COURSES/580s/581/LECTNOTES/ch6a.pdf), not a lemma, and it is called a lemma. – Carl Jan 10 '19 at 05:31
  • @Carl, could you say where exactly the lemma is mentioned in that article? It could be one of many errors in Wikipedia. Note that statisticians that have been referring to the lemma all refer to the 1936 article and not the 1933 article. Regarding your second point. The Neyman Pearson test is *not* the Neyman Pearson lemma. See the [German wiki](https://de.wikipedia.org/wiki/Neyman-Pearson-Test) which is better in separating the two *"seine Bedeutung erlangt er durch das Neyman-Pearson-Lemma, das besagt, dass der Neyman-Pearson-Test ein gleichmäßig bester Test ist"* (in English: "it gains its significance from the Neyman-Pearson lemma, which states that the Neyman-Pearson test is a uniformly best test") – Sextus Empiricus Jan 10 '19 at 06:45
  • @MartijnWeterings The very first sentence of that Wikipedia entry is the one cited. It is not, I think, incorrect. The concepts are outlined in the 1933 paper, and presented as a lemma only in 1936. I think we are splitting hairs here. You brought up "what statisticians use" and that is an application of the lemma, is it not? I do not read German, only English, French and Polish, if you insist, I will slog through the German. – Carl Jan 10 '19 at 09:30
  • *"and that is an application of the lemma, is it not?"*, indeed an application is used (so not the lemma itself). That is what I already said: *"statisticians use the proposition not the lemma"* The lemma is subordinate to the proposition(s). – Sextus Empiricus Jan 10 '19 at 11:44
  • If you like you can also not mention the lemma at all and just make the proof without any mention of an intermediary step (but just insert the arguments for the proof of the lemma into the proof for the proposition). That is what Neyman and Pearson did in their 1933 article (on pages 300-301).: *"We shall now show that the necessary and sufficient condition for a region $w_0$, being the best critical region for $H_0$ with regard to the alternative hypothesis, $H_1$, consists in the fulfilment of the inequality $p_0(x_1,x_2,..., x_n)>k p_1(x_1,x_2,...x_n)$ ... at any point outside $w_0$..."* – Sextus Empiricus Jan 10 '19 at 11:47
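
The idea behind that 1933 condition (the best critical region collects the points where the alternative density is large relative to the null density) can be checked by brute force on a toy discrete example. This is only a sketch: the five-point distributions `p0` and `p1`, the size `alpha`, and the function names are hypothetical values chosen for illustration and have nothing to do with the original papers. It compares a region built by ranking points by the ratio $p_1/p_0$ against the best region found by enumerating every subset of size at most `alpha`.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical distributions on five sample points (illustrative values only).
p0 = [Fraction(n, 20) for n in (8, 5, 4, 2, 1)]  # null hypothesis H0
p1 = [Fraction(n, 20) for n in (1, 2, 4, 5, 8)]  # alternative hypothesis H1
alpha = Fraction(7, 20)                          # allowed size of the test


def size(region):
    """Probability of the critical region under H0."""
    return sum((p0[i] for i in region), Fraction(0))


def power(region):
    """Probability of the critical region under H1."""
    return sum((p1[i] for i in region), Fraction(0))


def likelihood_ratio_region(alpha):
    """Add points in decreasing order of p1/p0 until the size would exceed alpha."""
    order = sorted(range(len(p0)), key=lambda i: p1[i] / p0[i], reverse=True)
    region, total = [], Fraction(0)
    for i in order:
        if total + p0[i] > alpha:
            break
        region.append(i)
        total += p0[i]
    return tuple(sorted(region))


def best_region_by_enumeration(alpha):
    """Search all subsets with size <= alpha and keep the most powerful one."""
    candidates = [
        region
        for r in range(len(p0) + 1)
        for region in combinations(range(len(p0)), r)
        if size(region) <= alpha
    ]
    return max(candidates, key=power)


lr_region = likelihood_ratio_region(alpha)
brute_region = best_region_by_enumeration(alpha)
print(lr_region, brute_region, power(lr_region))
```

In this example the ratio-ordered region uses up the size `alpha` exactly, which is the situation the lemma covers, and the two searches agree. With a discrete sample space and an `alpha` that the ratio ordering cannot hit exactly, a non-randomized greedy region need not be optimal, so the agreement here should not be read as a general guarantee.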