
The definition of a continuous variable in our class seems to be, well, not a definition, as there are exceptions not included in its definition.

I am a 4th year math student and find it appalling that such a rinky-dink thing can be a definition. Could someone possibly give me a completely accurate definition that distinguishes continuous variables from discrete ones, or give me a list of all the continuous variables that need to be treated as discrete, and when? So far I have gathered that money is somehow discrete, as is time, sometimes?

Concentrations I'm not sure about (e.g., parts per million), but weight, length and temperature seem to be continuous. Time seems to be completely confused.

They claim that a continuous random variable is one that would take an infinite amount of time to list all possible values, whereas a discrete one is one that can be counted.
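The Poisson counterexample raised in the comments below can be checked numerically. Here is a minimal R sketch (R is the language used in one of the answers). The rate λ = 3 and the point 0.2 are arbitrary illustration choices and are not part of the course definition.

```r
# Poisson(lambda = 3): infinitely many possible values 0, 1, 2, ..., yet discrete
lambda <- 3
k <- 0:100                      # first 101 values; the tail beyond 100 is negligible
pmf <- dpois(k, lambda)
all(pmf > 0)                    # every listed value has positive probability (an atom)
sum(pmf)                        # the atoms carry (numerically) all of the probability

# Uniform on [0, 0.5]: P(t0 - delta < X <= t0) shrinks to 0 with delta,
# so no single value t0 has positive probability
t0 <- 0.2
delta <- 10^-(1:6)
gaps <- punif(t0, 0, 0.5) - punif(t0 - delta, 0, 0.5)
gaps
```

Both variables have infinitely many possible values. What separates them is whether individual values carry positive probability.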

kjetil b halvorsen
Faust
  • How well do you know measure theory? Really formal definitions involve the Radon-Nikodym derivative. – Dave Sep 13 '21 at 15:20
  • It might help if you could edit the question to quote the definition of "continuous variable" that you have been provided in your class. – EdM Sep 13 '21 at 15:27
  • @Dave you are good to go; I'm familiar with measure theory ^^ – Faust Sep 13 '21 at 15:30
  • @EdM I added what they use. It seemed pretty wishy-washy to me, but I am used to definitions like the ones in analysis or abstract algebra. – Faust Sep 13 '21 at 15:39
  • Oh, goodness, that definition in your edit is terrible (not your fault). A Poisson distribution has an infinite sample space, meaning that it would take an infinite amount of time to list all possible values, yet Poisson most certainly is discrete! // I'm not the measure theoretic probability expert, but we have some members who know that material well. // I was helping someone study for the statistics section of a licensing exam, and the study material gave a similarly terrible definition of a discrete variable as having finitely-many possibilities (to which Poisson is a counterexample). – Dave Sep 13 '21 at 15:40
  • It's pretty bad: they state that definition, then give the example of all the possible values between 0 and 0.5 and state that it's continuous because it would take an infinite amount of time to write them all down... They do specifically state that countably infinite is included in the countable part of the definition of discreet. I didn't want to argue with them that it would also take an infinite amount of time to write down everything in a countably infinite set. I just assumed that they meant to say something is discrete if it's finite or in bijection with N. – Faust Sep 13 '21 at 15:44
  • For the level where your class is operating, countable vs uncountable sample space is likely to suffice as the definition. – Dave Sep 13 '21 at 15:49
  • Statistics is full of assumptions that often turn out to be dispensable. Your pain is only just beginning. – Nick Cox Sep 13 '21 at 17:53
  • @Faust You keep writing discreet but you mean discrete! – AdamO Sep 13 '21 at 18:05
  • Isn't this the same question addressed at https://stats.stackexchange.com/questions/455904? – whuber Sep 13 '21 at 19:14
  • @AdamO you are welcome to reprogram my autocorrect. – Faust Sep 13 '21 at 21:34
  • Time and money are not well-defined mathematical concepts. If you want to bring them into mathematics, you need to choose a variable space, or set of numbers, to represent them. Such a representation could be either discrete or continuous (or a mix of the two). For example, you could consider the number of seconds since some starting point and treat it as either an integer, which would be discrete, or a real number, which would be continuous. – Bernhard Barker Sep 14 '21 at 01:34
  • Genius question. In my opinion, every introductory probability instructor should pose it to their students, instead of waiting for one exceptional student like yourself to ask it. – BCLC Sep 14 '21 at 03:45

5 Answers


A random variable $R$ is said to be continuous if for every real number $t,$ the probability that $R$ equals $t$ is zero: $P(R = t) = 0.$ A random variable $R$ is said to be discrete if there exists a countable set of values $t_1, \ldots, t_n, \ldots$ such that $P(R = t_i) > 0$ for all $i$ and $\sum\limits_i P(R = t_i) = 1.$ The Radon-Nikodym and Lebesgue decomposition theorems show that the cumulative distribution function (a.k.a. CDF) of every random variable can be expressed as $$ F = aF_{ac} + b F_{dc} + c F_{pm} $$ where $a, b, c \geq 0$ and $a + b + c = 1,$ $F_{ac}$ is the CDF of an absolutely continuous random variable (i.e. $F_{ac}$ admits a density), $F_{dc}$ is the CDF of a degenerated continuous random variable, and $F_{pm}$ is the CDF of a discrete random variable (so pm stands for point-mass). It is hard to construct examples of degenerated continuous random variables, because their CDF must be continuous, non-decreasing, not constant, and have zero derivative almost everywhere. A typical example is Cantor's Devil's Staircase function (https://en.wikipedia.org/wiki/Cantor_function). So you usually assume that random variables are either absolutely continuous, discrete, or a mixture of these two types.
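For concreteness, here is a minimal R sketch of the Cantor function mentioned above, viewed as the CDF of a degenerated (singular) continuous random variable. It reads the ternary digits of $x$. The truncation depth of 30 digits is an arbitrary choice that keeps floating-point error in check, so values are approximate at roughly the $2^{-30}$ level.

```r
# Cantor function on [0, 1]: read ternary digits of x; stop at the first digit 1,
# otherwise map ternary digit 0 -> binary digit 0 and ternary digit 2 -> binary digit 1
cantor <- function(x, depth = 30) {
  stopifnot(all(x >= 0 & x <= 1))
  sapply(x, function(v) {
    if (v == 1) return(1)
    result <- 0
    scale <- 0.5                                # weight 2^-i of the current binary digit
    for (i in seq_len(depth)) {
      v <- 3 * v
      digit <- floor(v)                         # next ternary digit: 0, 1 or 2
      v <- v - digit
      if (digit == 1) return(result + scale)    # x lies in a removed middle third
      result <- result + scale * digit / 2
      scale <- scale / 2
    }
    result                                      # truncated after 'depth' ternary digits
  })
}

x <- seq(0, 1, length.out = 1001)
y <- cantor(x)
all(diff(y) >= -1e-6)                     # non-decreasing, up to truncation error
cantor(c(0, 0.25, 1/3, 0.4, 0.5, 2/3, 1))
```

The values at 1/3, 0.4, 0.5 and 2/3 are all 1/2: the function is flat (zero derivative) on the removed middle third, yet it still climbs continuously from 0 to 1, so it has no jumps and no density.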

EDIT: this question received a lot of attention, so let me expand a bit. This definition is motivated by the 1D case (univariate random variables). The condition that $P(R = t) = 0$ for all $t$ means that the CDF of $R$ is a continuous function $\mathbf{R} \to [0,1].$ Indeed, it is a well-known fact that a CDF is a non-decreasing function; a fortiori, it can only have jump discontinuities. But the jump discontinuities of a CDF are located precisely at the "atoms" of the distribution (an "atom" of a random variable $R$ is a value $t$ such that $P(R = t) > 0$). To see this, we use that the CDF $F$ is already (by definition) continuous on the right, so that $F$ is continuous if and only if it is continuous on the left. Now, $$ F(t) - F(t - \delta) = P(R \leq t) - P(R \leq t - \delta) = P(t - \delta < R \leq t), $$ and since the intervals $(t - \delta, t]$ decrease to $\{t\}$ as $\delta \downarrow 0,$ continuity of the measure $P$ from above shows that the right-hand side converges to $P(R = t).$ Hence $F$ is continuous on the left at $t$ if and only if $P(R = t) = 0,$ which is the main motivation for calling an atomless random variable a "continuous random variable."
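The limit $F(t) - F(t - \delta) \to P(R = t)$ can be checked numerically. A minimal R sketch, assuming a hypothetical mixed variable that equals 0 with probability 0.3 and is otherwise Exponential(1), i.e. $a = 0.7$, $b = 0$, $c = 0.3$ in the decomposition above:

```r
# Hypothetical mixed CDF: point mass 0.3 at 0 plus 0.7 times an Exponential(1) CDF
c_pm <- 0.3
F_mixed <- function(t) (1 - c_pm) * pexp(t, rate = 1) + c_pm * (t >= 0)

# Left gap F(t) - F(t - delta) for shrinking delta
deltas <- 10^-(1:6)
gap_at_0 <- F_mixed(0) - F_mixed(0 - deltas)   # stays at P(R = 0) = 0.3: a jump
gap_at_1 <- F_mixed(1) - F_mixed(1 - deltas)   # tends to P(R = 1) = 0: continuous there
rbind(delta = deltas, at_0 = gap_at_0, at_1 = gap_at_1)
```

The jump at 0 is exactly the point mass, while at every other point the gap vanishes, as the argument predicts.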

William M.
  • You should probably have an integration condition for the continuous case: the Lebesgue integral is equal to $1.$ – Adrian Keister Sep 13 '21 at 19:35
  • I believe that I am working with $F_{ac}, F_{dc}$ and $F_{pm}$ representing probability measures. The condition $a+b+c=1$ guarantees that their weighted sum is also a probability measure. Is this what you mean? – William M. Sep 13 '21 at 19:48
  • Oh, sure - that works. Hadn't thought through that aspect of your answer carefully enough, I guess. – Adrian Keister Sep 13 '21 at 19:50
  • pdf usually means density though, esp. in elementary probability? – BCLC Sep 14 '21 at 03:47
  • @BCLC thank you, I totally forgot that in elementary probability they use "pdf" in lieu of "density" as I did. – William M. Sep 14 '21 at 15:12
  • "Degenerated continuous" is perhaps better known as "singular" or "singular continuous". – Peter O. Sep 14 '21 at 20:58
  • What does the Lebesgue decomposition have to say for [probability spaces where all the events are trivial (probability 0 or 1)](https://stats.stackexchange.com/questions/560751/if-every-event-is-trivial-0-or-1-probability-then-every-random-variable-is-a)? – BCLC Jan 17 '22 at 09:37
  • @BCLC I believe in that case $a = b = 0$ and $F_{pm}$ is concentrated in a point. – William M. Jan 18 '22 at 16:57

A random variable is a function that maps a sample space to the real numbers. A continuous random variable is such a function that can take on any value in an interval: not any arbitrary interval, but an interval that makes sense for the particular random variable under consideration. A discrete random variable is a random variable that can only assume a finite or countably infinite number of distinct values.

For reference, see Mathematical Statistics with Applications, 5th Ed., by Wackerly, Mendenhall, and Scheaffer. The random variable is defined as Definition 2.11 on p. 65. The discrete random variable is defined as Definition 3.1 on p. 76. The continuous random variable is defined on p. 136.

In the referenced textbook, I have never seen exceptions to these definitions, except perhaps the so-called "mixed" random variables that are partly discrete, partly continuous.

Adrian Keister
  • It must be some genius idea from my school then. They state that financial items are discreet and that, depending on the situation, time can be discreet. I was just hoping for a clear definition that would allow me to determine which was which for the purposes of the class. However, your definition would imply that both money and time must be continuous random variables at all times, a contradiction of my stupid stats class. – Faust Sep 13 '21 at 15:35
  • Finance could go either way, honestly. It depends on how you're thinking about it, and whether $\$0.005$ is a sensible idea or not. If you only allow your analysis to go to the cents level (well, here in the US, anyway), you could legitimately think of money as discrete. Time is usually continuous (by the way, in this context, you want to spell the word 'discrete', not 'discreet'. They have VERY different meanings.), but it can actually be discrete depending on your data and how often it's collected. – Adrian Keister Sep 13 '21 at 16:05
  • It is **not** that, say, "Money" is discrete or is continuous. Money is in the real world; within mathematics it can be *represented* by a continuous variable, or it might be *represented* by a discrete variable! – kjetil b halvorsen Sep 13 '21 at 16:18
  • Your definition of a continuous variable is incorrect. For instance, any Gamma-distributed random variable can take only positive values. Thus, in the interval $[-1,1]$ (say), it obviously does not satisfy your definition. – whuber Sep 13 '21 at 19:12
  • @whuber: The interval inside which the continuous random variable can take on any value is not any arbitrary interval, but an interval that makes sense according to whichever particular random variable you're examining. – Adrian Keister Sep 13 '21 at 19:31
  • That is empty or circular--it cannot stand either as a definition or a characterization. Even if by "make sense" you mean "subset of the range of the variable," the characterization is incorrect. Consider, for instance, the distribution of a uniform $[0,1]$ random variable $X$ that has been modified to a variable $X^\prime$ by setting it to $0$ whenever $X$ is rational. Now *every* nonempty closed interval in the range of $X^\prime$ includes rational values, thereby *excluding* a value of $X^\prime.$ We can scarcely claim that *no* interval "makes sense" for $X^\prime$! – whuber Sep 13 '21 at 20:38
  • Well, perhaps you've pointed out a flaw in Wackerly, Mendenhall, and Scheaffer. (I'm just reporting their definition.) They didn't bother to have a nice, set-out definition of a continuous random variable. I suppose you could define a continuous random variable as a random variable that is not discrete, but negative definitions aren't as good as positive ones. How would you define a continuous RV? – Adrian Keister Sep 13 '21 at 20:52
  • I define continuous RVs at https://stats.stackexchange.com/a/298434/919. – whuber Sep 14 '21 at 15:28
  • whuber: I think some authors allow $F$ to be continuous from the left at every point (or continuous from the right at every point, I forget which). Don't there have to be limit criteria as well, such as \begin{align*} \lim_{x\to-\infty}F(x)&=0\\ \lim_{x\to\infty}F(x)&=1? \end{align*} Or is that implied by some of your other comments? – Adrian Keister Sep 14 '21 at 15:38

Supplemental to the answers of @WillM and @AdrianKeister (+1 to both).

Sometimes we model essentially discrete situations as continuous. If you want to be fussy about the amount of debt on individual credit cards in the US, then that debt is truly discrete at the one-cent level. Even interest charges are rounded to the nearest cent. But if a continuous model such as $\mathsf{Norm}(\mu = 10000, \sigma=2000)$ is approximately correct for a particular group of cardholders, that is easier to deal with than a discrete random variable with something like $1\,200\,000$ discrete values spaced one cent apart.
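A minimal R sketch of this comparison. It assumes (as the figure of $1\,200\,000$ suggests) that the relevant range is $\mu \pm 3\sigma$, and that the discrete version is obtained by rounding the normal model to the nearest cent; both are illustration choices, and the interval from 9000 to 11000 dollars is arbitrary.

```r
mu <- 10000; sigma <- 2000

# Cent grid covering mu +/- 3 sigma, stored as integer cents to avoid rounding drift
cents <- seq(as.integer((mu - 3 * sigma) * 100), as.integer((mu + 3 * sigma) * 100))
length(cents)                                   # one grid point per cent

# Discrete model: probability that the normal debt rounds to each cent value
dollars <- cents / 100
pmf <- pnorm(dollars + 0.005, mu, sigma) - pnorm(dollars - 0.005, mu, sigma)

# P(9000 <= debt <= 11000) under the discrete and the continuous model
p_discrete   <- sum(pmf[dollars >= 9000 & dollars <= 11000])
p_continuous <- diff(pnorm(c(9000, 11000), mu, sigma))
c(discrete = p_discrete, continuous = p_continuous)
```

The two answers differ only in about the sixth decimal place, while the continuous model needs a single pnorm call instead of a sum over roughly 200 000 cent values.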

Also, it is common to approximate $\mathsf{Binom}(n=100, p=1/2)$ by $\mathsf{Norm}(\mu=50, \sigma=5),$ although this is perhaps done less frequently now that we have software that deals gracefully with 101 discrete outcomes. The probability of getting between 45 and 55 (inclusive) heads in 100 independent tosses of a fair coin is exactly $0.72875$ to five places, computed in R using either the CDF pbinom or the PDF (PMF) dbinom:

```r
diff(pbinom(c(44, 55), 100, .5))   # P(X <= 55) - P(X <= 44)
## [1] 0.728747
sum(dbinom(45:55, 100, .5))        # sum of the PMF over 45, ..., 55
## [1] 0.728747
```

A normal approximation, using the 'continuity correction', is $0.72867$ to five places. In practical terms, it would take an enormous number of coin tosses to distinguish between the two answers.

```r
diff(pnorm(c(44.5, 55.5), 50, 5))  # normal approximation with continuity correction
## [1] 0.7286679
```

[Figure: PDF of BINOM(100, .5) with Normal Approximation]

R code for figure:

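```r
x <- 0:100
PDF <- dbinom(x, 100, .5)                                    # exact binomial probabilities
hdr <- "PDF of BINOM(100, .5) with Normal Approximation"
plot(x, PDF, type = "h", lwd = 2, main = hdr)                # one vertical bar per outcome
abline(h = 0, col = "green2")
curve(dnorm(x, 50, 5), add = TRUE, col = "blue", lwd = 2)    # approximating normal density
abline(v = c(44.5, 55.5), col = "red", lty = "dotted")       # continuity-corrected limits
```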
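To see what the continuity correction (the dotted red lines at 44.5 and 55.5) contributes, the same probability can be computed with and without it. A minimal R sketch reusing the numbers above:

```r
# P(45 <= X <= 55) for X ~ Binom(100, 0.5), three ways
p_exact <- sum(dbinom(45:55, 100, 0.5))          # exact binomial
p_cc    <- diff(pnorm(c(44.5, 55.5), 50, 5))     # normal, with continuity correction
p_no_cc <- diff(pnorm(c(45, 55), 50, 5))         # normal, treating [45, 55] as a real interval
round(c(exact = p_exact, with_cc = p_cc, without_cc = p_no_cc), 5)
```

Without the correction the approximation drops to about 0.683. Each integer outcome $k$ corresponds to the strip $[k - 0.5, k + 0.5]$ under the normal curve, and the uncorrected limits cut off half of the strips at 45 and 55.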

Formally, working below the measure-theoretic level, a continuous random variable can be defined in terms of its density function $f$ [WMS 6e, p. 155], which satisfies $f(x)\ge 0$ for all real $x$ and $\int_{-\infty}^\infty f(x)\,dx = 1.$ The probability of an interval $[a,b]$ with $a\le b$ is defined as $\int_a^b f(x)\,dx,$ with the consequence that the probability of any single point is $0.$
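The density definition can be illustrated with numerical integration in R. This sketch assumes an Exponential(1) density, chosen only because it is zero on half the real line and easy to check against the closed form pexp:

```r
# Example density: Exponential(rate = 1), f(x) = exp(-x) for x >= 0 and 0 otherwise
f <- function(x) dexp(x, rate = 1)

# f integrates to 1 (f is 0 on the negative reals, so [0, Inf) suffices)
total <- integrate(f, 0, Inf)$value

# P(1 <= X <= 2) as the integral of f over [1, 2], checked against the closed form
p_ab <- integrate(f, 1, 2)$value
p_ab_exact <- pexp(2) - pexp(1)

# Probabilities of shrinking intervals around the single point 1 tend to 0
h <- 10^-(1:6)
p_near_1 <- sapply(h, function(w) integrate(f, 1 - w, 1 + w)$value)
c(total = total, p_ab = p_ab, p_ab_exact = p_ab_exact)
p_near_1
```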

BruceET

Not all physical phenomena are strictly discrete or continuous. Some cases are obvious, like the number of children, which is discrete. But "continuous" has more to do with convenience and physical processes than with a be-all-end-all data dictionary. For instance, weight can be rounded to the nearest tenth, or hundredth, down to the actual sensitivity of the instrument used to measure it; the recorded weight is discrete in the sense that you can enumerate the possible values starting with 0.0, 0.1, 0.2, ..., but it is also continuous in the sense that it makes theoretical sense to consider weight as real-valued. (As a 4th year maths student, you can appreciate that the rationals are dense in the reals, and yet countable.)
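A small R simulation of the weight example. The Norm(70, 10) model in kilograms and the 0.1 kg scale resolution are hypothetical choices made only for illustration:

```r
set.seed(42)
true_w <- rnorm(1000, mean = 70, sd = 10)   # continuous model: ties have probability 0
recorded_w <- round(true_w, 1)              # what a scale reading to 0.1 kg reports

length(unique(true_w))       # almost surely 1000 distinct values
length(unique(recorded_w))   # fewer: recorded values repeat on the 0.1 kg grid

# Every recorded value is a whole number of tenths, i.e. lies on a countable grid
on_grid <- all(abs(recorded_w * 10 - round(recorded_w * 10)) < 1e-8)

# Under the continuous model the rounded reading is a discrete variable:
# P(reading = 70.0) = P(69.95 <= W < 70.05) > 0
p_70 <- diff(pnorm(c(69.95, 70.05), mean = 70, sd = 10))
```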

The designation (discrete vs. continuous) only matters when you assign a probability model to those values. Ideally, the choice of model is somewhat justified by the physical, random process giving rise to the values. For instance, if you assume that cars arrive on a segment of highway with constant intensity and independent inter-arrival times, you can use a Poisson counting process to estimate the probability of a certain volume of traffic at a particular time. Interestingly, Quetelet developed the BMI because the distribution looked normal, but modern biometricians suggest that we should have created a metric based on dividing by height cubed, as it would represent a weight density rather than a surface area. Notwithstanding, weight (or mass index) makes sense as a continuous value any way you cut it.
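A minimal R simulation of the traffic example: arrivals with constant intensity and independent Exponential inter-arrival times, whose counts in a fixed window should follow a Poisson distribution. The rate of 2 cars per minute, the 5-minute window and the number of replicates are hypothetical choices.

```r
set.seed(1)
rate  <- 2       # hypothetical: 2 cars per minute, constant intensity
t_max <- 5       # observation window in minutes
n_sim <- 10000   # number of simulated windows

# Each replicate: independent Exponential(rate) gaps -> arrival times -> count in window
counts <- replicate(n_sim, {
  arrivals <- cumsum(rexp(100, rate))   # 100 gaps average 50 minutes, far beyond t_max
  sum(arrivals <= t_max)
})

# Simulated frequencies of 0..20 cars versus the Poisson(rate * t_max) PMF
k <- 0:20
simulated <- tabulate(counts + 1, nbins = length(k)) / n_sim
round(cbind(k, simulated, poisson = dpois(k, rate * t_max)), 4)
```

The count of cars is a discrete variable even though the arrival times themselves are modeled as continuous.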

AdamO

Here is the most elementary definition I can think of.

If $X$ is a random variable, then it has a "distribution function", also known as the CDF. If we let $F(t)$ denote the distribution function, then by definition $F(t) = P(X\leq t)$. If this distribution function is continuous (as a function in ordinary calculus), then we say that $X$ is a continuous random variable.

We say that $X$ is a discrete random variable if the "support of $X$" consists of integers. Recall that the "support of a random variable" is the set of possible values the random variable may take. Let us denote the support by $\text{supp}(X)$. Then we say that $X$ is a "discrete random variable" if $\text{supp}(X)\subseteq \mathbb{Z}$. Even more simply, $X$ is discrete if its value is always an integer.

Note that the above definition of "discrete" is not the most general one, but as an initial definition it is good enough. The most important discrete random variables are integer-valued anyway, so you need not be concerned (yet) with what it means to be a "countable set".

If $X$ is a discrete random variable, then its CDF looks like a staircase: it is piecewise constant, jumping only at the values in the support.
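A quick R check of the staircase shape, using a Binomial(5, 0.5) variable as an arbitrary integer-valued example:

```r
# CDF of an integer-valued variable X ~ Binomial(5, 0.5)
F_binom <- function(t) pbinom(t, size = 5, prob = 0.5)

F_binom(c(2, 2.3, 2.99))          # constant between integers: all equal P(X <= 2) = 0.5
F_binom(3) - F_binom(2.99)        # the step at 3 ...
dbinom(3, size = 5, prob = 0.5)   # ... equals P(X = 3) = 10/32 = 0.3125
```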

Nicolas Bourbaki
  • These definitions fail to meet the "completely accurate" requirement of the question. (1) Discrete variables need not be supported only on the integers (a limitation you note, but I would dispute your "good enough" characterization because there are simple, common discrete variables not supported on integers). (2) A suitable definition cannot refer just to the number of values a variable can take on, since that number can be changed by an uncountable infinity without changing its distribution. Discreteness is a property of the *distribution,* not the variable. – whuber Sep 15 '21 at 20:22
  • @whuber I am well familiar with all of this. However, the original poster clearly indicated that he is an undergraduate student. We need to focus on finding the simplest possible definition that will make sense to him, and tell him that our definition is intentionally simplified. Take, for example, continuous random variables. If we taught undergraduates about absolute continuity, it would confuse almost the entire class. Therefore, we intentionally hold back. There is nothing majorly wrong with my definition, and as a first definition, it will let one do undergraduate probability theory. – Nicolas Bourbaki Sep 16 '21 at 07:29