
Is there any formal (mathematical) definition of what frequentists understand by ''probability''? I read that it is the relative frequency of occurrence ''in the long run'', but is there a formal way to define it? Are there any known references where I can find that definition?

EDIT:

By ''frequentist'' (see the comment by @whuber, and my comments to the answer of @Kodiologist and to @Graeme Walsh below that answer) I mean those who ''believe'' that this long-run relative frequency exists. Maybe this also (partly) answers @Tim's question.

  • Please explain what you mean by "Frequentist." The uses I have seen in other threads indicate many people have no consistent or clear sense of what this term might mean. A definition would therefore help keep any answers relevant. – whuber Aug 29 '16 at 17:18
  • @whuber: my question was just that: what is ''frequentist probability''? How is this probability defined formally? I read this: https://en.wikipedia.org/wiki/Frequentist_probability, but are there any references other than the wiki? Is this a ''valid'' definition of probability? – Aug 29 '16 at 17:38
  • @whuber I guess the definition of frequentist is "non Bayesian" and of Bayesian is "non frequentist" in most cases :) – Tim Aug 29 '16 at 18:35
  • Closely related: https://en.wikipedia.org/wiki/Empirical_probability – Silverfish Aug 29 '16 at 18:48
  • @whuber after thinking about it, your question seemed interesting enough to be asked in its own right: http://stats.stackexchange.com/questions/232356/who-are-frequentists – Tim Aug 29 '16 at 18:52
  • I was going to say that this http://stats.stackexchange.com/a/230943/113090 would probably be of interest to you, but then I realized that you are the person who posted that answer, so never mind. Anyway, your thought process there might be of interest to others who also have the same question as you (e.g. me): "does there exist a formal frequentist definition of probability?" – Chill2Macht Aug 29 '16 at 19:03
  • @whuber: ''what is frequentist''; see my comment below Kodiologist's answer (the one that reacts to Graeme Walsh's comment) –  Aug 29 '16 at 19:41
  • I am not sure I will have the energy to write an answer myself, but I would like to leave here the same link to the Stanford Encyclopedia of Philosophy entry on [Interpretations of Probability](http://plato.stanford.edu/entries/probability-interpret/) that I posted under your answer in the related thread. The section on frequentist interpretation/definition is a good read. It talks extensively about various conceptual problems with attempts to give a frequentist definition of probability. – amoeba Aug 29 '16 at 23:08
  • `'believe' that this long run relative frequency exists` This might open a Pandora's box of opinions about what it is "to exist" and "to believe", and how these relate to what is "reality". It might turn out in the end that the question you raised, in its current form, is a pseudo-question. – ttnphns Sep 01 '16 at 15:11

2 Answers


I don't think there is a mathematical definition, no. The difference between the various interpretations of probability is not a difference in how probability is mathematically defined. Probability could be mathematically defined this way: if $(Ω, Σ, μ)$ is a measure space with $μ(Ω) = 1$, then the probability of any event $S ∈ Σ$ is just $μ(S)$. I hope you agree that this definition is neutral to questions like whether we should interpret probabilities in a frequentist or Bayesian fashion.
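For concreteness, here is a minimal sketch (in Python; the die, the power-set σ-algebra and the uniform weights are arbitrary illustrative choices, not anything forced by the definition) of a finite space in exactly this sense: $\Omega$ and $\Sigma$ are just sets, $\mu$ is a function with $\mu(\Omega) = 1$, and nothing in it commits us to a frequentist or a Bayesian reading.

```
from fractions import Fraction
from itertools import chain, combinations

# A toy finite probability space (Omega, Sigma, mu) for one roll of a fair die.
# The outcomes and weights below are illustrative assumptions only.
omega = {1, 2, 3, 4, 5, 6}

# Sigma: take the full power set of omega as the sigma-algebra.
sigma = [set(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

# mu: a measure given by point masses that sum to 1 (uniform weights here).
weights = {outcome: Fraction(1, 6) for outcome in omega}

def mu(event):
    """The probability of an event S in Sigma is simply the measure mu(S)."""
    return sum(weights[outcome] for outcome in event)

print(mu({2, 4, 6}))              # 1/2
print(mu(omega))                  # 1, as required by mu(Omega) = 1
assert mu({1, 2}) + mu({3}) == mu({1, 2, 3})   # finite additivity on disjoint events
```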

Kodiologist
  • that's fine, but this definition of probability as a $\mu$ that fulfills the axioms of Kolmogorov is very abstract; it needs to be defined in specific cases. It is the same as 'a circle is the set of points that are at a given distance from a fixed point': it does not mean anything as long as you don't say in which metric space you are, i.e. you should say what the definition of ''distance'' is. I think that defining $\mathbb{P}$ as a long-run relative frequency does fulfill the axioms of Kolmogorov, what do you think? P.S. The definition in the comment of @Silverfish also fulfills these axioms. – Aug 29 '16 at 19:12
  • (continued) so, in short, I can define (**define** is the right word) many $\mu$ that fulfill the axioms of Kolmogorov, and these are all valid probabilities according to the axiomatic theory. – Aug 29 '16 at 19:17
  • Arguably, Kolmogorov's system provides _an_ axiomatic basis - which does not necessarily entail a frequentist or Bayesian interpretation. In the spirit of the frequentist view, the basic idea is that as the number of trials increases to infinity, the empirical frequency stabilizes around, or converges to, some value: the probability of the event. $$ \lim_{n\rightarrow \infty} \left(n_{A}/ n\right) = P_A = P(A).$$ Although the frequency approach improves on the classical approach, its lack of rigour is what led to the axiomatic foundation. Is this more a question about the history of probability theory? – Graeme Walsh Aug 29 '16 at 19:24
  • @Graeme Walsh: could you put that into an answer, and complete it with arguments why such a definition of $P(A)$ is in line with Kolmogorov's axioms? (Of course one can question the existence of the limit, but then we might say that frequentists are those who ''believe'' in its existence?) – Aug 29 '16 at 19:28
  • @fcop First, I don't get the point about having to be in line with Kolmogorov - my point was that he provided _an_ axiomatic approach - indeed, the most popular one, but there are other systems. Second, in history, the frequentist view precedes the very existence of Kolmogorov. The lack of rigour in frequentist interpretation ultimately led to the axiomatic approach. It's a matter of the course of history. No? Are you asking for a formulation of the frequentist approach as a special case of an axiomatic theory like Kolmogorov's? Maybe I am missing something. – Graeme Walsh Aug 29 '16 at 19:42
  • @Graeme Walsh: well it is not so hard to show that the definition with the 'limit' that you provided fulfills all the axioms of Kolmogorov. So 'yes', I think it is a special case of the axiomatic theory. That is not unexpected, as Kolmogorov probably wanted to find an axiomatic system that ''contained'' the definitions that preceded Kolmogorov's theory. I assume he did not want to replace everything but only wanted to find a consistent (axiomatic) basis for all the existing definitions? – Aug 29 '16 at 19:50
  • @fcop I don't see how you're going to get a mathematical formalization of a concept that isn't entirely mathematical, namely, frequentism. But if you disagree, I invite you to try to prove me wrong. – Kodiologist Aug 29 '16 at 20:00
  • @Kodiologist: see the definition given by Graeme Walsh in the comments supra. ''Frequentism'' takes the probability to be the long-run relative frequency of occurrence, so if $n$ is the number of experiments and $n_A$ the number of times that the event $A$ occurs, then $P(A)=\lim_{n\to +\infty} n_A/n$? (A small simulation sketch of this limiting-frequency idea is given after these comments.) – Aug 29 '16 at 20:04
  • @fcop As Walsh notes, this "definition" is not rigorous. – Kodiologist Aug 29 '16 at 20:11
  • @Kodiologist: please explain why it is not rigorous? – Aug 29 '16 at 20:12
  • @fcop It's not rigorous for two reasons (I think). First, how does one create an infinite number of trials? Second, what happens in situations where repeated trials are not possible? – Graeme Walsh Aug 29 '16 at 20:15
  • @Graeme Walsh: How can you then define $\lim_{x \to +\infty} f(x)$? You cannot walk to infinity, can you? – Aug 29 '16 at 20:18
  • @fcop " please explain why it is not rigorous ?" — The first thing that comes to mind: suppose $X$ is a standard normal random variable, and try using this definition to compute the probability that a draw from $X$ is irrational. The sequence of outcomes of $X$ could be all rational, all irrational, or a mixture, but the unique correct answer is 1. – Kodiologist Aug 29 '16 at 20:24
  • @Kodiologist: well, talking about rigour: I will tell you if you give me a rigorous definition of ''irrational''; without that I can't tell. And please, state it in terms of the concepts of the probability space that you mentioned. – Aug 29 '16 at 20:43
  • @fcop A real number $x$ is irrational if there are no integers $a$ and $b$ with $x = a/b$. Am I missing something? Irrationality requires no probabilistic or measure-theoretic concepts to define. – Kodiologist Aug 29 '16 at 20:52
  • @Kodiologist: look at your answer: the probability is only defined for the elements of $\Sigma$. Am I missing something when I say that $\Sigma$ does not mention anything about ''rational'' or ''irrational'' or ''integer''? What are you talking about? You should only talk about $\Omega$, $\Sigma$ or $\mu$; all these other ''things'' are undefined, so non-rigorous? – Aug 29 '16 at 20:59
  • @fcop Yes, you are missing something: the set of irrational numbers is a measurable set and hence is an element of $Σ$. In particular, in this case, $Σ$ is the Borel σ-algebra on ℝ. – Kodiologist Aug 29 '16 at 21:51
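To make the limiting-relative-frequency idea discussed in the comments above concrete, here is a small simulation sketch (Python; the value $p = 0.3$ is an arbitrary assumption). It shows $n_A/n$ stabilizing as the number of trials grows, which illustrates the intuition but of course does not answer the rigour objections: no finite simulation exhibits the limit itself.

```
import random

random.seed(0)

p_true = 0.3   # assumed "true" probability of the event A (arbitrary choice)
n_A = 0        # running count of how many times A has occurred

# Track the relative frequency n_A / n as the number of trials n grows.
for n in range(1, 100_001):
    if random.random() < p_true:   # one Bernoulli trial: does A occur?
        n_A += 1
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:>7,}: n_A/n = {n_A / n:.4f}")
```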

TL;DR It doesn't seem possible to give a frequentist definition of probability that is consistent with the Kolmogorov framework and isn't completely circular (i.e. in the sense of circular logic).

Not too long, so I did read: I want to address what I see as some potential problems with the candidate frequentist definition of probability $$\underset{n \to \infty}{\lim} \frac{n_A}{n}. $$ First, $n_A$ can only reasonably be interpreted as a random variable, so the above expression is not precisely defined in a rigorous sense: we need to specify the mode of convergence for this sequence of random variables, be it almost sure, in probability, in distribution, in mean, or in mean square.

But all of these notions of convergence require a measure on the probability space to be defined in order to be meaningful. The intuitive choice, of course, would be convergence almost surely. This has the feature that the limit needs to exist pointwise except on an event of measure zero. What constitutes a set of measure zero coincides for any family of measures that are mutually absolutely continuous, and this allows us to define a notion of almost sure convergence that makes the above limit rigorous while remaining somewhat agnostic about the underlying measure on the measurable space of events (because it could be any measure absolutely continuous with respect to some chosen measure). This would prevent the circularity in the definition that would arise from fixing a given measure in advance, since that measure could be (and in the Kolmogorov framework usually is) defined to be the "probability".

However, if we are using almost sure convergence, then that means we are confining ourselves to the situation of the strong law of large numbers (henceforth SLLN). Let me state that theorem (as given on p. 133 of Chung) for the sake of reference here:

Let $\{X_n\}$ be a sequence of independent, identically distributed random variables. Then we have $$ \mathbb{E}|X_1| < \infty \implies \frac{S_n}{n} \to \mathbb{E}(X_1)\quad a.s.$$ $$\mathbb{E}|X_1| = \infty \implies \underset{n \to \infty}{\lim\sup}\frac{|S_n|}{n} = + \infty \quad a.s. $$ where $S_n:= X_1 + X_2 + \dots + X_n$.
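As a purely illustrative numerical sketch of the two cases (Python; the specific distributions are my own arbitrary choices): running means of i.i.d. standard normal draws settle down towards $\mathbb{E}(X_1) = 0$, while running means of standard Cauchy draws, for which $\mathbb{E}|X_1| = \infty$, keep getting thrown around.

```
import math
import random

random.seed(1)

def running_mean_at(draw, checkpoints):
    """Return S_n / n at each checkpoint n, for i.i.d. draws from `draw`."""
    s, out = 0.0, {}
    for n in range(1, max(checkpoints) + 1):
        s += draw()
        if n in checkpoints:
            out[n] = round(s / n, 3)
    return out

checkpoints = {100, 10_000, 1_000_000}

# Case E|X_1| < infinity: standard normal draws, so S_n / n -> 0 a.s.
print("normal:", running_mean_at(lambda: random.gauss(0, 1), checkpoints))

# Case E|X_1| = infinity: standard Cauchy draws (via the inverse CDF);
# S_n / n does not settle down, in line with limsup |S_n| / n = +infinity a.s.
print("cauchy:", running_mean_at(lambda: math.tan(math.pi * (random.random() - 0.5)),
                                 checkpoints))
```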

So let's say we have a measurable space $(X, \mathscr{F})$ and we want to define the probability of some event $A \in \mathscr{F}$ with respect to some family of mutually absolutely continuous probability measures $\{\mu_i\}_{i \in I}$. Then by either the Kolmogorov Extension Theorem or the Ionescu-Tulcea Extension Theorem (I think both work), we can construct a family of infinite product spaces $\{(\prod_{j=1}^{\infty} X_j)_i\}_{i \in I}$, one for each $\mu_i$. (Note that the existence of infinite product spaces, which is a conclusion of Kolmogorov's theorem, requires the measure of each space to be $1$, which is why I am now restricting to probability measures instead of arbitrary measures.) Then define $\mathbb{1}_{A_j}$ to be the indicator random variable of $A$ in the $j$th copy, i.e. it equals $1$ if $A$ occurs in the $j$th copy and $0$ if it does not; in other words, $$n_A = \mathbb{1}_{A_1} + \mathbb{1}_{A_2} + \dots + \mathbb{1}_{A_n}.$$ Clearly $0 \le \mathbb{E}_i \mathbb{1}_{A_j} \le 1$ (where $\mathbb{E}_i$ denotes expectation with respect to $\mu_i$), so the strong law of large numbers does apply to $(\prod_{j=1}^{\infty} X_j)_i$, because by construction the $\mathbb{1}_{A_j}$ are independent and identically distributed (note that being independently distributed means that the measure of the product space is multiplicative with respect to the coordinate measures). So we get that $$\frac{n_A}{n} \to \mathbb{E}_i \mathbb{1}_{A_1} \quad a.s., $$ and thus our definition of the probability of $A$ with respect to $\mu_i$ should naturally be $\mathbb{E}_i \mathbb{1}_{A}$.

I just realized, however, that even though the sequence of random variables $\frac{n_A}{n}$ converges almost surely with respect to $\mu_{i_1}$ if and only if it converges almost surely with respect to $\mu_{i_2}$ (where $i_1, i_2 \in I$), that does not mean it converges to the same value: in fact, the SLLN guarantees that it won't unless $\mathbb{E}_{i_1} \mathbb{1}_A = \mathbb{E}_{i_2} \mathbb{1}_A$, which is not true generically.
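A small numerical sketch of this point (Python; the two measures are made up purely for illustration): take $X = \{0, 1\}$, $A = \{1\}$, and two mutually absolutely continuous probability measures with $\mu_{i_1}(A) = 0.3$ and $\mu_{i_2}(A) = 0.7$; the empirical frequency $n_A/n$ converges almost surely under both, but to the different values $\mathbb{E}_{i_1}\mathbb{1}_A = 0.3$ and $\mathbb{E}_{i_2}\mathbb{1}_A = 0.7$.

```
import random

random.seed(2)

def empirical_frequency(p_A, n):
    """Simulate n i.i.d. trials under a measure mu with mu(A) = p_A; return n_A / n."""
    n_A = sum(1 for _ in range(n) if random.random() < p_A)
    return n_A / n

n = 200_000
# Two mutually absolutely continuous measures on {0, 1} (both put positive
# mass on each point), differing in the mass they give to A = {1}.
print("under mu_1:", round(empirical_frequency(0.3, n), 3))  # close to 0.3
print("under mu_2:", round(empirical_frequency(0.7, n), 3))  # close to 0.7
```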

If $\mu$ is somehow "canonical enough", say like the uniform distribution on a finite set, then maybe this works out nicely, but it doesn't really give any new insight. In particular, for the uniform distribution, $\mathbb{E}\mathbb{1}_A = \frac{|A|}{|X|}$, i.e. the probability of $A$ is just the proportion of points or elementary events in $X$ which belong to $A$, which again seems somewhat circular to me. For a continuous random variable I don't see how we could ever agree on a "canonical" choice of $\mu$.
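As a trivial worked example of that: take the uniform distribution on $X = \{1, 2, 3, 4, 5, 6\}$ and $A = \{2, 4, 6\}$; then $$\mathbb{E}\,\mathbb{1}_A = \sum_{x \in X} \mathbb{1}_A(x)\,\frac{1}{|X|} = \frac{|A|}{|X|} = \frac{3}{6} = \frac{1}{2},$$ which is just "the proportion of elementary events belonging to $A$", i.e. the very quantity the definition was supposed to explain.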

In other words, it seems to make sense to define the frequency of an event as the probability of the event, but it does not seem to make sense to define the probability of the event to be the frequency (at least not without being circular). This is especially problematic since, in real life, we don't actually know what the probability is; we have to estimate it.

Also note that this definition of frequency for a subset of a measurable space depends on the chosen measure being a probability measure; for instance, there is no product measure for countably many copies of $\mathbb{R}$ endowed with the Lebesgue measure, since $\mu(\mathbb{R})=\infty$. Likewise, the measure of $\prod_{j=1}^n X$ under the canonical product measure is $(\mu(X))^n$, which either blows up to infinity if $\mu(X) >1$ or goes to zero if $\mu(X) <1$, i.e. Kolmogorov's and Tulcea's extension theorems are very special results peculiar to probability measures.
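As a toy illustration of that last point (the numbers are arbitrary): if $\mu$ is Lebesgue measure on $X = [0, 2]$, so that $\mu(X) = 2$, the canonical product measure gives $$\mu^{\otimes n}\Big(\prod_{j=1}^{n} X\Big) = (\mu(X))^n = 2^n \to \infty,$$ while for $X = [0, \tfrac{1}{2}]$ the same quantity is $(\tfrac{1}{2})^n \to 0$; only $\mu(X) = 1$ keeps the mass of the finite products fixed, which is the feature the extension theorems rely on.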

Chill2Macht
  • Thanks for the nice answer (+1). I agree that there are ''problems'' with the definition in terms of a long-run relative frequency; that was probably one of the reasons why Kolmogorov developed his Grundbegriffe. However, when we speak about frequentists, I think we have to place ourselves in the time frame before Kolmogorov's theory? – Aug 30 '16 at 06:40
  • @fcop I guess honestly I have no idea. I guess what I'm trying to say is that I don't see how any rigorous justification for the frequentist understanding of probability could lead to a useful/non-circular definition. – Chill2Macht Aug 30 '16 at 16:05
  • @fcop I really appreciate the generous bounty -- I was in really quite a bad mood today before receiving it. It honestly has me somewhat floored (in a good way). Again, I really appreciate it – Chill2Macht Sep 02 '16 at 20:51
  • don't mention it, your answer is very well developed and mathematically sound. –  Sep 03 '16 at 06:34