32

This question is similar to the one here, but I think it's different enough to be worth asking.

As a starter, I thought I'd put forward what I think is one of the hardest concepts to grasp.

Mine is the difference between probability and frequency. One is at the level of "knowledge of reality" (probability), while the other is at the level of "reality itself" (frequency). It almost always confuses me if I think about it too much.

Edwin Jaynes coined the term "mind projection fallacy" to describe getting these things mixed up.

Any thoughts on any other tough concepts to grasp?

probabilityislogic
  • (I don't know enough to put this as an answer, hence adding a comment.) I always thought it was strange that PI crops up in statistical equations. I mean - what's PI got to do with statistics? :) – Reinstate Monica - Goodbye SE Jan 27 '11 at 09:16
  • I'd agree (in my surprisal) - I think it's that $\pi$ pops up in much of mathematical analysis. Just a note: you can write $\pi$ with the LaTeX command \pi enclosed within $ signs. I use the wiki page to get the syntax http://en.wikibooks.org/wiki/LaTeX/Mathematics . Another trick is to "right click" on an equation you see on this site, and select "show source" to get the commands that were used. – probabilityislogic Jan 27 '11 at 10:59
  • @Wiki If you accept that $\pi$ crops up when you go from measuring the length of a straight piece of line to the length of a piece of circle, I don't see why it would not appear when going from measuring the probability of falling on a segment to measuring the probability of falling on a piece of circle? – robin girard Jan 27 '11 at 12:19
  • @Wiki Whenever you have trigonometric functions (sine, cosine, tangent etc.) you risk having $\pi$ pop up. And remember that whenever you differentiate a function you're actually finding a tangent. What is surprising is that $\pi$ doesn't appear *more* often. – Carlos Accioly Jan 28 '11 at 18:36
  • @Carlos I suspect the prevalence of $2\pi$ is mostly due to the use of the $\ell^2$ metric, leading to n-spheres. In the same vein, I would expect it's $e$ whose prevalence is due to analysis. – sesqu Jan 29 '11 at 22:00
  • Monty Hall problem? :) – Arun Oct 02 '11 at 02:40

12 Answers

31

For some reason, people have difficulty grasping what a p-value really is.
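
One way to make the definition concrete is to simulate it. A minimal sketch (Python, assuming numpy and scipy are available; the test and sample size are arbitrary illustrations): when the null hypothesis is true, the p-value is uniformly distributed, because it is a statement about the data given $H_0$, not about $H_0$ given the data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate many experiments in which the null hypothesis (mean = 0) is TRUE.
pvals = []
for _ in range(10_000):
    sample = rng.normal(loc=0.0, scale=1.0, size=30)  # H0 holds exactly
    _, p = stats.ttest_1samp(sample, popmean=0.0)     # two-sided one-sample t-test
    pvals.append(p)

pvals = np.asarray(pvals)

# Under a true null the p-value is uniform on (0, 1): about 5% of experiments
# land below 0.05. The p-value is P(data at least this extreme | H0),
# not P(H0 | data), and not the probability of a Type I error for this study.
print(f"fraction of p-values below 0.05: {np.mean(pvals < 0.05):.3f}")  # ~0.05
```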

shabbychef
  • @shabbychef: Most people grasp it in the _worst_ possible way, i.e., as the probability of making a Type I error. – suncoolsu Jan 27 '11 at 07:19
  • I think that's mostly related to how p-values are explained in classes (i.e. just by giving a quick definition, without specifying what p-values are NOT) – nico Jan 27 '11 at 07:34
  • I think this is mainly to do with how it is introduced. For me, it was an "add-on" to the classical hypothesis test - so it appears as though it's just another way to do a hypothesis test. The other problem is that it is usually only taught with respect to a normal distribution, where everything "works nicely" (e.g. the p-value *is* a measure of evidence in testing a normal mean). Generalising the p-value is not easy, as there are no specific principles to guide the generalisation (e.g. there is no general agreement on how a p-value should vary with the sample size & multiple comparisons) – probabilityislogic Jan 27 '11 at 11:12
  • @shabbychef +1, though students often have difficulties with p-values (roughly because the concept in testing is a bit more subtle than a binary decision process, and because "inverting a function" is not easy to apprehend). When you say "for some reason", do you mean it is unclear to you why people have difficulties? PS: If I could, I would compute statistics on this site about the relation between "being a top answer" and "talking about p-values" :) . I also ask myself whether the hardest statistical concept to grasp can get the most upvotes (if it is that difficult to grasp ... :) ) – robin girard Jan 27 '11 at 12:28
  • @robin girard - in the spirit of the last part of your comment, one quote I like, which I heard from the comedian Bill Bailey, is: *I analyse things too much...or maybe not enough?*. Another one I like (from me, not Bill Bailey): *I used to be uncertain, but now I'm not so sure...* :). It does look like the p-value is going to win – probabilityislogic Jan 28 '11 at 17:06
  • If the null hypothesis is "your girlfriend hasn't been cheating on you with your best friend," and you catch them hugging, cuddling and holding hands in a dark place, then the p-value is **really** low. Unless she is ugly, in which case the p-value might still be high. I'm completely sure even high-school students could understand p-values that way. – pyon Jan 29 '11 at 22:16
  • @Eduardo - a reasonable alternative hypothesis: "your girlfriend was not chased by a rapist into a dark place, and your best friend found her and got the rapist to leave". If you see them hugging, cuddling, and holding hands in that dark place, this is inconsistent with the above hypothesis (unless your best friend or your girlfriend are uncomfortable about touching people). Hence it too will have a small p-value – probabilityislogic Jan 30 '11 at 01:52
  • @probabilityislogic: Well. At least we both understand p-values. – pyon Jan 30 '11 at 01:59
  • @Eduardo - I don't really get the "ugly" argument. "Ugly" people have sex, and for some people the saying "forbidden fruit is the sweetest" holds true. Another thing is that it is your best friend who needs to think she is ugly for it to apply, and you cannot be certain of this (unless you can read his/her mind). – probabilityislogic Jan 30 '11 at 02:02
  • @probabilityislogic: I was just trying to use an example even the most mentally-challenged high-school student could understand. – pyon Jan 30 '11 at 02:09
  • @eduardo - I think the problem is that the data are insufficient for the question. If you replaced observing "hugging, cuddling, and holding hands" with "walking in on your best friend and girlfriend having sex", then it would be much harder to come up with alternatives (although, having thought about it, an alternative could be assault or blackmail) – probabilityislogic Jan 30 '11 at 02:10
  • @probabilityislogic: This is rapidly going off-topic. We might argue elsewhere. – pyon Jan 30 '11 at 02:11
  • @eduardo - I'd say the example is off topic, but the analogy isn't. This is because you can always come up with the hypothesis of "honesty" - i.e. you can get a small p-value by saying the hypothesis is "these data were not changed to be consistent with a different hypothesis". Because the probability of observing the particular data you observed is always small, the p-value will also be small. Hence it is only by using your *prior information* about the integrity of the data that you can eliminate these kinds of hypotheses. – probabilityislogic Jan 30 '11 at 02:18
  • @probabilityislogic: I know. I never claimed that the p-value would be zero. But a small enough p-value is sufficient to claim "Bullcrap!" when you are (1 minus p-value) sure they're lying to you. Of course, how small "small" has to be is a matter of policy, and statistics doesn't deal with that. – pyon Jan 30 '11 at 02:32
  • @eduardo - yes, a small enough p-value is sufficient to cast doubt on the null hypothesis: but it is calculated *in complete isolation* from an alternative. Using p-values alone, you can never formally "reject" $H_0$, because *no alternative has been specified*. If you formally reject $H_0$, then you must also reject the calculations that were based on the assumption of $H_0$ being true, which means you must reject the calculation of the p-value that was derived under this assumption (it messes with your head, but it is the only way to reason *consistently*). – probabilityislogic Jan 30 '11 at 05:14
  • @eduardo - using p-values in hypothesis testing is similar to the logical case of requiring a set of axioms to prove their own consistency. It just can't be done. However you can assess a given *theorem*, and work out which sets of axioms are consistent with the theorem. (axiom=hypothesis, theorem=data) – probabilityislogic Jan 30 '11 at 05:19
  • @probabilityislogic: Again, I know. If the null hypothesis is wrong, then the test statistic is meaningless. – pyon Jan 30 '11 at 06:18
  • @Eduardo - but if you think about it: say we reject the null hypothesis due to a small p-value. Once we have rejected the null, the decision to reject it was based upon a *meaningless statistic*. Therefore, this invalidates the rejection we just made! – probabilityislogic Jan 30 '11 at 07:20
  • @probabilityislogic: No it doesn't invalidate the rejection we just made. We have already concluded that the null hypothesis can't be true. The p-value is meaningless because it is defined in terms of something that isn't true. From a strictly logical point of view, the rejection of the null hypothesis (on the premise that p-values shouldn't be that low) just forces you to come up with another null hypothesis or stay content with being ignorant about what's going on. – pyon Jan 30 '11 at 07:31
  • I think the above discussion between @eduardo and myself shows quite clearly why p-values are a conceptual nightmare. Even in a seemingly obvious example, they do not appear to be what we think they are. – probabilityislogic Jan 30 '11 at 07:43
  • @eduardo - So you would consider second-order logic to be fallacious reasoning? Because that is what your answer implies. If you reject the *conditions* ($H_0$ is a condition of the p-value) then you must reject the conclusions which were based on those conditions. The same goes for axioms: you cannot simultaneously reject an axiom but keep the theorem that depends on it for its proof. – probabilityislogic Jan 30 '11 at 11:29
  • @probabilityislogic: That's where you're wrong. The p-value isn't a _conclusion_ of the null hypothesis, but a _means to determine its believability_. Thus, if the p-value is too low, I can say, "Hahaha! Bullcrap!" – pyon Jan 30 '11 at 16:46
  • If the p-value isn't a conclusion from the null hypothesis, then why does the null appear as one of the conditions required when calculating the p-value? Presumably, your response implies that the p-value could be constructed without the null hypothesis. And you cannot determine the *believability of the null* without referring to the prior probability of the null. Otherwise, you are led to believe incompatible null hypotheses simultaneously ("the data are fake" vs "the data are not fake" can give the same p-value) – probabilityislogic Jan 31 '11 at 22:11
  • @eduardo - suppose *every* p-value for every null you could think of (including faking the data) was smaller than $10^{-5}$, what are you to believe? – probabilityislogic Jan 31 '11 at 22:14
  • @probabilityislogic: Setting the alpha of a test is a matter of "policy," if you want to give it a name. Given that the null hypothesis is true, the p-value could be _anything_ from 0 to 1, even a small value. So, from a strictly logical point of view, you cannot _decide_ (in a deductive way) whether the null hypothesis is true or not. So, at the risk of being wrong, you establish that the p-value shouldn't be lower than alpha, otherwise, you _don't believe_ that the null hypothesis is true, regardless of whether it is actually true, which you will never know. There is no paradox. – pyon Feb 01 '11 at 07:11
  • @Eduardo - but if you don't believe the null is true, then you are acting *as if* $P(H_0)=0$. And when this occurs, the conditional probability based on $H_0$ is *undefined*: $P(T>T_{obs}|H_0)=\frac{P(T>T_{obs},H_0)}{P(H_0)}$, with zero in the denominator. By putting your *prior probability* of $H_0$ alongside the p-value, $P(H_0)\times P(T>T_{obs}|H_0)$, the $P(H_0)$ in the denominator cancels out; you are no longer dividing by zero, and you are just left with the joint probability. This is a sounder way to reason, because you are rejecting the null AND the data as a combination. – probabilityislogic Feb 01 '11 at 10:49
  • @probabilityislogic: No. Not believing it is plausible that the null hypothesis is true doesn't mean acting as if the probability of the null hypothesis were zero. It means "it might be true, but I won't _bother_ taking that possibility into consideration anymore." Remember that, in real life, we use statistical inference to make decisions when we don't have enough information to use the strictly deductive approach. – pyon Feb 01 '11 at 17:39
  • @Eduardo - but rejecting the null hypothesis *does* mean you are acting *as if it is false*. If you are using p-values as a heuristic guide to your inference, then there are no qualms with that. But a low p-value on its own is *not a direct measure of the evidence* against the null ([Bernardo and Rueda](http://www.stat.duke.edu/research/conferences/valencia/IntStatRev.pdf)). [This paper](http://predictive.files.wordpress.com/2010/03/binder1.pdf) also shows some of the logical flaws of p-values – probabilityislogic Feb 01 '11 at 20:02
  • @probabilityislogic: Who said statistical inference was a logical process? Basically, the situation is this: Someone would like to know something about a distribution (usually to make a decision), however, all he has is a limited subset of data drawn from that distribution. **From a strictly logical point of view, he simply does not have enough information to do anything at all. But he must do something anyway.** Thus, he introduces "premises" (such as, if a p-value were calculated, it shouldn't be lower than alpha) in order to make up for the information he doesn't have. – pyon Feb 01 '11 at 21:32
  • @Eduardo - I think you would have a hard time convincing someone that abandoning logic is the best thing to do. You simply move from *deductive* logic to *inductive* logic when you have insufficient information to do *deductive* logic. And of course he must do something - but the something that he does should be consistent, based on a proper test statistic. A proper test statistic has to be defined when the null is true and when it is false. A p-value is not, as my previous comments show. – probabilityislogic Feb 03 '11 at 02:56
  • @probabilityislogic: Inductive "logic" is not logic at all; it is just (sometimes educated) guessing. Fortunately for us, some forms of induction (such as statistical inference) are _reliable enough_, so the risk associated with the possibility that the induction is wrong is overshadowed by the benefit of getting more information, even if it is somewhat uncertain. We don't seek to have perfect information, just enough information to make good decisions. Remember that both deduction and induction are just tools. – pyon Feb 03 '11 at 05:02
  • @Eduardo - The problem with most forms of statistical inference is that they are hard to generalise, because they are often based on intuition rather than solid foundations. The p-value is such an example (how to adjust for multiple comparisons, how to deal with nuisance parameters, the arbitrary choice of the statistic on which the p-value is based). Without a set of broader axioms or desiderata to guide the generalisation, p-values are basically restricted to the realm of one-parameter problems, or to where pivotal quantities exist. – probabilityislogic Feb 04 '11 at 10:59
  • @probabilityislogic: I don't think it's desirable to generalize p-values to n-parameter problems. I find it conceptually simpler to reduce those problems to 1-parameter goodness of fit problems. It might not fit your definition of beautiful (it certainly doesn't fit mine), but it works. – pyon Feb 04 '11 at 14:17
  • @Eduardo - but then how do you adjust the p-value to account for the multiple one-parameter comparisons you are making? There are no principles to guide you, only your intuition. – probabilityislogic Feb 08 '11 at 08:50
  • @probabilityislogic: I'm more often interested in the whole model than in individual parameters. If the model works (i.e., helps me understand how a physical system works, and predict how it would behave given certain conditions), then each parameter has a good enough value. – pyon Feb 08 '11 at 14:52
  • @Eduardo: I think we're talking at different levels here. p-values usually give good indications in models (such as OLS regression) of the size of parameters. This is because they are based on a sufficient statistic with no nuisance parameters (the t-test). But once you leave this "safe" area (e.g. non-linearity), p-values do not necessarily have good properties. They can be useful as a guide, simply because they tend to be easy to calculate. It makes more sense to calculate the probability of the null, given the data, because we know what data we saw, but do not know which hypothesis to accept – probabilityislogic Feb 09 '11 at 12:53
  • ...cont'd... why do we calculate the probability of something which is certain (the data)? And why do we condition on something which is uncertain (the null)? This has always seemed backwards to me (although, if by prob of "data" we actually mean "future data", then it makes a bit more sense). To me, it makes more sense to condition on what you observe, because you cannot "unobserve" anything once it has been observed. – probabilityislogic Feb 09 '11 at 12:59
23

Similar to shabbychef's answer, it is difficult to understand the meaning of a confidence interval in frequentist statistics. I think the biggest obstacle is that a confidence interval doesn't answer the question that we would like to answer. We'd like to know, "what's the chance that the true value is inside this particular interval?" Instead, we can only answer, "what's the chance that a randomly chosen interval created in this way contains the true parameter?" The latter is obviously less satisfying.
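
That distinction can be demonstrated numerically. A minimal sketch (Python with numpy; a normal model with known $\sigma$ is assumed purely to keep the interval formula simple): the 95% describes the long-run behaviour of the interval-generating procedure, not any single realized interval.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu, sigma, n = 10.0, 2.0, 25
z = 1.96  # two-sided 95% normal quantile

trials, covered = 10_000, 0
for _ in range(trials):
    sample = rng.normal(true_mu, sigma, size=n)
    half_width = z * sigma / np.sqrt(n)  # known-sigma z-interval
    lo, hi = sample.mean() - half_width, sample.mean() + half_width
    covered += (lo <= true_mu <= hi)

# ~95% of the procedure's intervals cover true_mu -- that is the frequentist
# guarantee. Any one realized interval either contains true_mu or it doesn't;
# "there is a 95% chance true_mu is in THIS interval" is not what was shown.
print(f"coverage: {covered / trials:.3f}")
```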

Charlie
  • The more I think about confidence intervals, the harder it is for me to think of what kind of question they can answer at a conceptual level that cannot be answered by asking for "the chance a true value is within an interval, given one's state of knowledge". If I were to ask "what is the chance (conditional on my information) that the average income in 2010 was between 10,000 and 50,000?" I don't think the theory of confidence intervals can give an answer to this question. – probabilityislogic Jan 27 '11 at 11:28
21

What is the meaning of "degrees of freedom"? How about df that are not whole numbers?
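
As one concrete example of non-integer df: Welch's two-sample t-test uses the Welch–Satterthwaite approximation, which almost never produces a whole number. A minimal sketch (Python with numpy; the group sizes and variances are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=12)   # group 1: small, low variance
y = rng.normal(0.0, 3.0, size=20)   # group 2: larger, high variance

vx, vy = x.var(ddof=1), y.var(ddof=1)
nx, ny = len(x), len(y)

# Welch-Satterthwaite approximation to the effective degrees of freedom of
# the two-sample t statistic when variances are unequal -- generally fractional.
df = (vx / nx + vy / ny) ** 2 / (
    (vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1)
)
print(f"Welch df: {df:.2f}")  # between min(nx, ny) - 1 and nx + ny - 2
```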

13

Conditional probability probably leads to the most mistakes in everyday experience. There are many harder concepts to grasp, of course, but people usually don't have to worry about them - this one they can't get away from, and it is a source of rampant misadventure.
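
The classic trap is reading $P(A \mid B)$ as $P(B \mid A)$. A minimal sketch of the standard screening-test illustration (Python; the prevalence, sensitivity, and specificity figures are made up for the example):

```python
# A test for a condition with 1% prevalence, 99% sensitivity, 95% specificity.
# Intuition says a positive result means ~99% chance of having the condition.
prevalence = 0.01
sensitivity = 0.99       # P(positive | condition)
specificity = 0.95       # P(negative | no condition)

# Bayes' theorem: P(condition | positive).
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_condition_given_positive = sensitivity * prevalence / p_positive

# Only ~17%: P(condition | positive) is nothing like P(positive | condition),
# because false positives from the healthy 99% swamp the true positives.
print(f"P(condition | positive) = {p_condition_given_positive:.3f}")  # ~0.167
```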

dmk38
9

I think that very few scientists understand this basic point: it is only possible to interpret the results of statistical analyses at face value if every step was planned in advance. Specifically:

  • Sample size has to be picked in advance. It is not ok to keep analyzing the data as more subjects are added, stopping when the results look good (the sketch at the end of this answer simulates exactly this).
  • Any methods used to normalize the data or exclude outliers must also be decided in advance. It isn't ok to analyze various subsets of the data until you find results you like.
  • And finally, of course, the statistical methods must be decided in advance. It is not ok to analyze the data via both parametric and nonparametric methods and pick the results you like.

Exploratory methods can be useful to, well, explore. But then you can't turn around and run regular statistical tests and interpret the results in the usual way.
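
The first bullet above is easy to verify by simulation. A minimal sketch (Python with numpy/scipy; the batch size, maximum n, and number of looks are arbitrary choices): testing after every batch and stopping at the first p < 0.05 inflates the false positive rate well beyond the nominal 5%, even though the null is true throughout.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def sequential_experiment():
    """Peek at the data after every 10 subjects, up to 200, and stop
    as soon as p < 0.05. The null (mean = 0) is true the whole time."""
    data = []
    for _ in range(20):                        # up to 20 looks at the data
        data.extend(rng.normal(0.0, 1.0, 10))  # add 10 more subjects
        _, p = stats.ttest_1samp(data, 0.0)
        if p < 0.05:
            return True                        # "significant" -- stop and publish
    return False

trials = 2_000
false_positives = sum(sequential_experiment() for _ in range(trials))
# A single pre-planned test would be wrong ~5% of the time; optional
# stopping pushes the rate far above that.
print(f"false positive rate: {false_positives / trials:.3f}")
```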

Harvey Motulsky
  • I think John Tukey might disagree http://en.wikipedia.org/wiki/Exploratory_data_analysis ;o) – Dikran Marsupial Jan 27 '11 at 17:32
  • I would partially disagree here. I think the caveat that people miss is that *the appropriate conditioning operations are easy to ignore* for these kinds of issues. Each of these operations changes the conditions of the inference, and hence the conditions of its applicability (and therefore its generality). This definitely only applies to "confirmatory analysis", where a well-defined model and question have been constructed. In the exploratory phase, you are not looking to answer definite questions - more to build a model and come up with hypotheses for the data. – probabilityislogic Jan 27 '11 at 18:05
  • I edited my answer a bit to take into account the comments of Dikran and probabilityislogic. Thanks. – Harvey Motulsky Jan 28 '11 at 14:52
  • For me, "excluding outliers" is not as clearly *wrong* as your answer implies. For example, you may only be interested in the relationships over a certain range of responses, and excluding outliers actually helps this kind of analysis. For example, if you want to model "middle class" income, then excluding the super-rich and impoverished outliers is a good idea. It is only to the outliers within your frame of inference (e.g. "strange" middle-class observations) that your comments apply – probabilityislogic Jan 29 '11 at 06:10
  • Ultimately the real problem with the issues raised in the initial answer is that they (at least partially) invalidate p-values. If you are interested in quantifying an observed effect, one should be able to do any and all of the above with impunity. – russellpierce Jan 29 '11 at 19:24
  • @drknexus - I wouldn't say they invalidate the p-value per se, rather that the null hypothesis that it is based on is implicitly changed – probabilityislogic Jan 30 '11 at 01:38
  • @probabilityislogic: Agreed. In practice, since we assume a straightforward null and the underlying probabilities governing various choices the experimenter made are probably undefined, do we have any realistic hope of converting a mangled p-value into our standard frame of reference? I guess the difficulty of this task is what prompted me to call them invalidated. – russellpierce Jan 30 '11 at 16:22
  • As far as sample size is concerned: there is this adaptive sampling approach that people seem to take seriously. So I guess you don't always have to pick your sample size up front. – xmjx Sep 23 '11 at 06:41
9

Tongue firmly in cheek: For frequentists, the Bayesian concept of probability; for Bayesians, the frequentist concept of probability. ;o)

Both have merit of course, but it can be very difficult to understand why one framework is interesting/useful/valid if your grasp of the other is too firm. Cross Validated is a good remedy, as asking questions and listening to answers is a good way to learn.

Nick Cox
Dikran Marsupial
  • A rule I use to remember: use probabilities to predict frequencies. Once the frequencies have been observed, use them to evaluate the probabilities you assigned. The unfortunately confusing thing is that often the *probability* you assign is equal to a *frequency* you have observed. One thing I have always found odd is why do *frequentists* even use the word probability? Wouldn't it make their concepts easier to understand if the phrase "the frequency of an event" were used instead of "the probability of an event"? – probabilityislogic Jan 28 '11 at 16:58
  • Interestingly, cross-validation can be seen as a Monte Carlo approximation to the integral of a loss function in decision theory. You have an integral $\int p(x) L(\textbf{x}_{n},x) dx$ and you approximate it by $\frac{1}{n}\sum_{i=1}^{n} L(\textbf{x}_{[n-i]},x_i)$, where $\textbf{x}_{n}$ is the data vector, and $\textbf{x}_{[n-i]}$ is the data vector with the *i*th observation $x_i$ removed – probabilityislogic Jan 30 '11 at 01:30
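
A numerical sketch of that comment's point (Python with numpy; squared-error loss and the sample mean as the "model" are my arbitrary choices): the leave-one-out average and a direct Monte Carlo estimate of the expected loss on fresh data land close together.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(5.0, 2.0, size=200)  # observed data vector x_n
n = len(x)

# Leave-one-out CV: predict each x_i from the mean of the other n-1 points,
# i.e. the (1/n)-weighted sum in the comment above.
loo = np.mean([(x[i] - np.delete(x, i).mean()) ** 2 for i in range(n)])

# Direct Monte Carlo of the integral: expected squared-error loss of the
# fitted mean on fresh draws from p(x).
fresh = rng.normal(5.0, 2.0, size=100_000)
direct = np.mean((fresh - x.mean()) ** 2)

print(f"LOO estimate: {loo:.3f}   fresh-data estimate: {direct:.3f}")  # both ~4
```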
8

From my personal experience, the concept of likelihood can also cause quite a stir, especially for non-statisticians. As Wikipedia notes, it is very often mixed up with the concept of probability, which is not correct.
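
A minimal sketch of the distinction (Python with scipy; the coin-toss numbers are illustrative): hold the data fixed and treat the same formula as a function of the parameter. The result is a perfectly useful likelihood function, but it is not a probability distribution over the parameter.

```python
import numpy as np
from scipy import stats

k, n = 7, 10  # data held fixed: 7 successes in 10 trials

# Likelihood: L(p) = P(data | p), read as a function of the parameter p.
p_grid = np.linspace(0.001, 0.999, 999)
likelihood = stats.binom.pmf(k, n, p_grid)

# Integrated over the parameter it does NOT come to 1 (here ~1/(n+1)), so
# treating the likelihood as "the probability of p" is exactly the mix-up.
area = likelihood.sum() * (p_grid[1] - p_grid[0])
print(f"area under L(p): {area:.3f}")  # ~0.091, not 1.0
```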

radek
7

Fiducial inference. Even Fisher admitted he didn't understand what it does, and he invented it.

onestop
6

What do the different distributions really represent, beyond how they are used?

mariana soffer
  • This was the question I found most distracting after statistics 101. I would encounter many distributions with no motivation for them beyond "properties" that were relevant to topics at hand. It took unacceptably long to find out what any represented. – sesqu Jan 29 '11 at 22:12
  • Maximum-entropy "thinking" is one method that helps in understanding what a distribution is, namely a state of knowledge (or a description of uncertainty about something). This is the only definition that has made sense to me in all situations – probabilityislogic Jan 30 '11 at 04:57
  • Ben Bolker provides a good overview of this in the 'beastiary of distributions' section of [Ecological Models and Data in R](http://emdbolker.wikidot.com/) – David LeBauer Sep 23 '11 at 05:04
5

I think people miss the boat on pretty much everything the first time around. What most students don't understand is that they're usually estimating parameters based on samples. They don't know the difference between a sample statistic and a population parameter. If you beat these ideas into their heads, the other stuff should follow a little more easily. I'm sure most students don't understand the crux of the CLT either.
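
A minimal sketch of both points (Python with numpy; the exponential population and n = 40 are arbitrary): each sample yields a different sample statistic, and the CLT makes the distribution of those statistics approximately normal even when the population is badly skewed.

```python
import numpy as np

rng = np.random.default_rng(5)

# Population parameter: the mean of an Exponential(1) population is exactly 1.
# Each row below is one sample of n = 40; each row mean is a sample statistic.
sample_means = rng.exponential(scale=1.0, size=(10_000, 40)).mean(axis=1)

# CLT: the sampling distribution of the mean is approximately normal, centred
# near the population parameter, with sd ~ population sd / sqrt(n) = 1/sqrt(40).
print(f"mean of sample means: {sample_means.mean():.3f}")  # ~1.000
print(f"sd of sample means:   {sample_means.std():.3f}")   # ~0.158
```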

Adam
5

I think the question is interpretable in two ways, which will give very different answers:

1) For people studying statistics, particularly at a relatively advanced level, what is the hardest concept to grasp?

2) Which statistical concept is misunderstood by the most people?

For 1) I don't know the answer at all. Something from measure theory, maybe? Some type of integration? I don't know.

For 2) p-value, hands down.

Peter Flom
  • Measure theory is neither a field of statistics nor hard. Some types of integration are hard, but, once again, that isn't statistics. – pyon Jan 29 '11 at 22:12
5

The confidence interval in the non-Bayesian tradition is a difficult one.

Shige