Why do I get very large observations when I generate data from a t-distribution in R?

Question

I want to generate 200 samples, each of size 10, from a t-distribution with 1 degree of freedom in R.

I use this code

set.seed(1234)
B <- matrix(rt(10*200, 1), 200)

But when I look at sample number 167 (`B[167,]`), I find this very large value: 602.1691029

And this is a strange thing in the t-distribution. What is wrong here?

The support of the t-distribution is `-Infinity` to `Infinity`; even though it is rare, it is possible to generate very high or very low numbers. – Vinícius Félix, Sep 26 '21 at 19:01
See `set.seed(667); matrix(rt(10*200, 1), 200)[90,]` (it includes `-636967`). – r2evans, Sep 26 '21 at 19:13
... or `set.seed(9586); matrix(rt(10*200, 1), 200)[150,]` (`-2264546`). – r2evans, Sep 26 '21 at 19:14
Bottom line, *nothing is wrong*. This is how the Student's t distribution (among many) works: while low-probability, it is definitely feasible to get really high (positive) or really low (negative) numbers with a small sample such as this. – r2evans, Sep 26 '21 at 19:16
Further context: while many things are claimed to be Normally distributed (e.g., [human height](https://ourworldindata.org/human-height)), this is rarely exactly true. If it were, then: (1) height could be negative; (2) height could be zero; (3) height could be infinite; (4) humans could be 1mm tall; etc. While there is a record of a [nearly 9ft tall human](https://www.guinnessworldrecords.com/records/hall-of-fame/robert-wadlow-tallest-man-ever), I hardly expect a 200ft tall human. Perhaps it should be "truncated-Normal"? – r2evans, Sep 26 '21 at 21:55
Have a look at [What is the difference between finite and infinite variance](https://stats.stackexchange.com/questions/94402/what-is-the-difference-between-finite-and-infinite-variance/100161#100161) – kjetil b halvorsen, Sep 27 '21 at 00:50

Glen_b · Answer 1 · 2021-09-27T00:24:19.857

4

And this is a strange thing in the t-distribution.

No, it isn't, not with $1$ degree of freedom.

The tails of the Cauchy (the $t$-distribution with $1$ d.f.) are so heavy that even its mean is undefined (not finite). Very large deviations happen reasonably often: the more values you generate, the bigger the largest-magnitude value will tend to be. Indeed, with the Cauchy it grows roughly linearly with sample size, e.g. $\text{median}(\max_i(|X_{i}|))$ increases approximately in proportion to $n$. With $2000$ standard Cauchy values, the median of the distribution of the largest-magnitude one is over $1800$, and the median of the distribution of the second-largest-magnitude observation is over $750$.
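The roughly linear growth can be checked with a quick simulation. This is only a sketch: the seed, the sample sizes in `n_values` and the replication count `reps` are arbitrary choices made for illustration, not values from the question.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

n_values <- c(500, 1000, 2000)  # sample sizes to compare (illustrative choice)
reps <- 4000                    # replications per sample size (illustrative choice)

# For each n: draw `reps` samples of size n from t with 1 d.f.,
# record the largest absolute value in each, then take the median.
med_max <- sapply(n_values, function(n) {
  median(replicate(reps, max(abs(rt(n, df = 1)))))
})

# If the median of the maximum grows in proportion to n,
# the ratio column should stay roughly constant across rows.
data.frame(n = n_values, median_max = med_max, ratio = med_max / n)
```

A roughly constant `ratio` column is what "approximately in proportion to $n$" means in practice; the simulated medians carry a few percent of Monte Carlo noise.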

Note that $P(|X|\geq 602)\approx 0.001$. If you generate $2000$ of them, you expect about $2$ of those observations to be at least that large in magnitude.
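The tail probability can be computed directly with `pt()`. A minimal sketch, where the variable names `p_tail` and `expected_count` are just illustrative:

```r
# Two-sided tail probability P(|X| >= 602) for t with 1 d.f. (standard Cauchy);
# by symmetry this is twice the lower tail below -602.
p_tail <- 2 * pt(-602, df = 1)

# Expected number of such values among the 10 * 200 = 2000 draws in the question
expected_count <- 2000 * p_tail

c(p_tail = p_tail, expected_count = expected_count)
```

For the Cauchy the same probability has the closed form $\frac{2}{\pi}\arctan(1/602)$, which is a handy cross-check on the numerical value.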

Rather than being surprised to see one of that size, you would often see even larger ones.

What is wrong here?

Nothing, this is typical. You might like to read more about the Cauchy and other t distributions with low d.f.

https://en.wikipedia.org/wiki/Cauchy_distribution

https://en.wikipedia.org/wiki/Student's_t-distribution

A number of posts on this site discuss interesting properties of the Cauchy ($t_1$) distribution.

edited Sep 27 '21 at 00:24

answered Sep 26 '21 at 22:32

Glen_b

+1 This is an important rejoinder to the comments, all of which (at present) suggest $602$ is *possible* (or "feasible") but *unusual.* As you show, absolute values this large are to be *expected* in the sample. – whuber Sep 27 '21 at 13:15
(+1). Indeed, the median of the maximum of $n$ absolute values of a standard Cauchy random variable is $\tan(2^{-1-1/n}\pi)$ according to Mathematica. – COOLSerdash Sep 27 '21 at 13:48
Thanks. It should be possible to derive the exact value pretty directly from the transformed median of a beta (specifically one with second-shape parameter $\beta=1$ -- so it should be fairly doable once you account correctly for the $|X|$ part), but I figured the exact value was not needed. – Glen_b Sep 27 '21 at 16:28
If I recall correctly, for the uniform distribution the median of the largest order statistic comes out to $\left(\frac12\right)^{1/n}$, so you would just take $F^{-1}$ of that. Which looks to correspond. That it then comes out almost proportional to $n$ isn't then hard to see. – Glen_b Sep 27 '21 at 16:44
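The quantile argument in these comments can be sketched in R as follows. The variable names are illustrative, and the step from the CDF of $X$ to the CDF of $|X|$, namely $G(m) = 2F(m) - 1$, is filled in here as an assumption about how "account correctly for the $|X|$ part" is meant.

```r
n <- 2000

# Median of the maximum of n Uniform(0, 1) draws: solve u^n = 1/2
u <- 0.5^(1 / n)

# |X| has CDF G(m) = 2 * F(m) - 1, so G(m) = u  <=>  F(m) = (1 + u) / 2
via_quantile <- qcauchy((1 + u) / 2)

# Closed form quoted in the comment above
via_formula <- tan(2^(-1 - 1 / n) * pi)

# Large-n approximation, which is linear in n
linear_approx <- 2 * n / (pi * log(2))

c(via_quantile = via_quantile, via_formula = via_formula,
  linear_approx = linear_approx)
```

The three numbers should agree closely, which is one way to see why the median of the maximum is almost proportional to $n$.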

score 1 · Answer 2 · answered Sep 26 '21 at 19:14

Sample number 135 contains an even larger value. I guess the problem comes from the fact that you're using df = 1, which looks odd, because the uncertainty is just too large with one observation.

You won't get numbers this large once you use, e.g., df = 5 or more.
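To put a number on the difference between df = 1 and df = 5, the tail probabilities can be compared with `pt()`. This is a small sketch with illustrative variable names; the value `602.1691029` is the one reported in the question.

```r
x <- 602.1691029  # the value reported in the question

# Two-sided tail probabilities P(|T| >= x) under each choice of d.f.
p_df1 <- 2 * pt(-x, df = 1)
p_df5 <- 2 * pt(-x, df = 5)

c(df1 = p_df1, df5 = p_df5)

# Expected number of values at least this large among 10 * 200 = 2000 draws
2000 * c(df1 = p_df1, df5 = p_df5)
```

With df = 5 the probability is smaller by many orders of magnitude, so a value of this size would essentially never appear in 2000 draws.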

answered Sep 26 '21 at 19:14

F. Privé

You could expand on this. `df=1` represents a **Cauchy distribution**, which has an infinite mean (and thus this behaviour is entirely expected). – Ben Bolker Sep 26 '21 at 19:15
@BenBolker "infinite" or just "undefined"? – r2evans Sep 26 '21 at 19:18
Technically, undefined, I guess. It's all a mess. It would be nice to describe the Cauchy as having an "infinite variance" (which, naively, is how it seems to behave), but that doesn't make sense since the variance is the variation around the (undefined) mean ... – Ben Bolker Sep 26 '21 at 19:25
@Ben An accurate and useful characterization is that the Cauchy distribution has an infinite *absolute* first moment. Therein lies the source of the difficulties. The variance need not be defined as variation around a mean, btw: it can be expressed in terms of the expectation of $(X-Y)^2$ where $(X,Y)$ are *iid.* However, when the mean is undefined, perforce the variance will be infinite. – whuber Sep 28 '21 at 13:29
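One way to see the infinite absolute first moment is to truncate the integral of $|x| f(x)$ at $\pm M$ and watch it grow with $M$. The sketch below is an illustration, not part of the original comments: the helper `trunc_abs_moment` and the cutoffs in `M` are hypothetical choices, and the closed form $\log(1 + M^2)/\pi$ follows from integrating the standard Cauchy density directly.

```r
# Truncated absolute first moment: integral of |x| * f(x) over [-M, M],
# computed as twice the integral over [0, M] by symmetry.
trunc_abs_moment <- function(M) {
  2 * integrate(function(x) x * dcauchy(x), lower = 0, upper = M)$value
}

M <- c(10, 100, 1000)  # illustrative cutoffs
numeric_vals <- sapply(M, trunc_abs_moment)

# Closed form of the same integral for the standard Cauchy
closed_form <- log(1 + M^2) / pi

data.frame(M = M, numeric = numeric_vals, closed_form = closed_form)
```

Because $\log(1 + M^2)$ increases without bound, the truncated moment never settles to a finite limit as $M$ grows.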
