
I have a question about the meaning of the conditioning inside a definition.

In a book I've found the following definition of upper tolerance limit:

$P(P(X<\bar X+kS|\bar X, S)>p)=1-\alpha$

where $X$ is a random variable, $\bar X$ is the sample mean of values drawn from the distribution of $X$, and $S$ is the sample standard deviation of those same values. $p$ and $\alpha$ are numbers between 0 and 1.

The question is:

Why do we need the condition on $\bar X$ and $S$, and what does this conditioning mean in this context?

That is, what's wrong with simply writing the definition as

$P(P(X<\bar X+kS)>p)=1-\alpha$

xanz

1 Answer


This is just an obfuscating way to express a simple idea: an upper tolerance limit is just an upper confidence limit for a percentile.

The random variable $X$ is supposed to have some definite (but unknown) distribution $F$. The statistic $\bar X + k S$ (derived from a random sample from $F$) is intended to estimate the $p^\text{th}$ percentile of $X$, $F^{-1}(p)$. That is, we hope that

$$F(\bar X + k S) = p.$$

Of course that won't be exactly true, because $\bar X + k S$ is random. An upper tolerance limit is a procedure intended not to underestimate $F^{-1}(p)$. The value of $\alpha$ is the chance you can tolerate of the procedure being wrong. In other words, you want it to overestimate its target at least $1-\alpha$ of the time. In many cases you can choose $k$ to assure this chance is exactly $1-\alpha$. Thus,

$${\Pr}_F(F(\bar X + k S) \ge p) = 1-\alpha\tag{1}$$

is the defining criterion for a "$1-\alpha$ confidence upper tolerance limit of coverage $p$." In English we could read it as

There is a $1-\alpha$ chance that the true percentile corresponding to the sample statistic $\bar X + k S$ will exceed $p$.
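As a concrete check of criterion $(1)$, here is a small Monte Carlo sketch (mine, not from the original answer), taking $F$ to be standard normal with $n = 10$ and $p = 0.90$. With the tabulated one-sided factor $k \approx 2.355$ for these settings (from Hahn & Meeker's tables), the estimated probability that $F(\bar X + kS) \ge p$ should come out near $0.95$:

```python
import math
import random

def coverage_probability(k, n=10, p=0.90, trials=20000, seed=1):
    """Monte Carlo estimate of Pr_F(F(Xbar + k*S) >= p), with F standard normal."""
    rng = random.Random(seed)
    # F = standard normal CDF, via the error function
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        # F(Xbar + k*S) is the true coverage achieved by this particular limit;
        # count how often it reaches the nominal coverage p.
        if Phi(xbar + k * s) >= p:
            hits += 1
    return hits / trials

# k = 2.355 is the tabulated factor for n = 10, p = 0.90, 1 - alpha = 0.95;
# a smaller k should visibly under-cover.
print(coverage_probability(2.355))
print(coverage_probability(1.0))
```

Note that the randomness here lies entirely in $\bar X$ and $S$; the inner probability $F(\bar X + kS)$ is computed exactly from the known $F$, which is precisely the point of criterion $(1)$.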

If you wanted to make expression $(1)$ look more complicated, you could unravel it using the definition of $F$; to wit,

$$F(z) = {\Pr}_F(X \le z)\tag{2}$$

for any real number $z$. Fixing $z = \bar X + kS$ for the moment and plugging it into $(2)$ would give

$$F(\bar X + k S) = {\Pr}_F(X \le \bar X + k S).$$

That's a mighty ambiguous expression, though, because $F$ determines the distribution of both $X$ (thought of as an abstract random variable in $(2)$) and the distribution of $\bar X + k S$ (because that is determined by a random sample from $F$). To make it clear we are talking in this context only of $X$ as the random variable, with $\bar X + k S$ being treated as a constant, we might write

$$F(\bar X + k S) = {\Pr}_F(X \le \bar X + k S\,|\, \bar X, S).$$

Plugging this into $(1)$ gives an expression like that in the book. (It differs only in that I have been more careful in distinguishing $\ge$ and $\gt$, but that is of no matter.)
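To see what the conditioning does in practice, one can hold a single sample fixed and estimate the inner probability by drawing fresh, independent copies of $X$ from the same $F$; the estimate should agree with $F(\bar X + kS)$ computed directly. A minimal sketch, again assuming a standard normal $F$ (the variable names are mine):

```python
import math
import random

rng = random.Random(0)
# CDF of F = N(0, 1)
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Draw one sample of size n and freeze xbar and s: this is the conditioning.
n, k = 10, 2.355
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
xbar = sum(xs) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
limit = xbar + k * s

# Inner probability Pr(X <= limit | xbar, s): X is a fresh draw from F,
# independent of the sample, while the limit is treated as a constant.
fresh = 100_000
inner = sum(rng.gauss(0.0, 1.0) <= limit for _ in range(fresh)) / fresh

print(inner, Phi(limit))  # the two should agree up to Monte Carlo error
```

This is exactly the multivariate-distribution reading mentioned in the comments below: $X$ is independent of the $X_i$ that produce $\bar X$ and $S$, but shares their distribution $F$.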


References

A standard book is Hahn & Meeker, *Statistical Intervals: A Guide for Practitioners* (John Wiley & Sons, 1991). Here is its explanation:

The following characterizes a tolerance interval that one can claim contains a proportion $p$ of the population with $100(1-\alpha)\%$ confidence: "If one calculated such intervals from many independent groups of random samples, $100(1-\alpha)\%$ of the intervals would, in the long run, correctly include at least $100p\%$ of the population values..."

whuber
  • Actually $\bar X$ and $S$ are random variables and not constants... So this is basically just a "formal" way to express the concept that we are focusing our attention on the distribution of $X$ rather than on that of $\bar X+kS$ within the $Pr_F$ operator? – xanz Jun 11 '16 at 17:00
  • 1
    Actually before you take the sample $\bar X$ and $S$ *are* random variables. All the uncertainty lies in them. The confusing part of how your book defines a UTL is that "$X$" refers to an abstract construct and doesn't even need to appear in the definition $(1)$, which is based solely on $F$. – whuber Jun 11 '16 at 19:46
  • Ok, I just had a talk with a professor in stochastic mechanics who told me that the definition in the book (the one with the conditioning) is kind of "weird" and unclear for a mathematician and thus not suitable for publication in a paper (I didn't understand exactly what he was complaining about)... Honestly, to me this seems right (even if maybe a little confusing at first), but I'm no expert. What is your opinion on this? – xanz Jun 15 '16 at 14:37
  • 1
    The book's definition could be fixed up by first explaining that the author is considering the multivariate distribution of $(\bar X, S, X)$ where $X$ is independent of the $X_i$ from which $\bar X$ and $S$ are derived and has the same distribution as the $X_i$. I happen to think that the reference to $X$ is superfluous (as well as potentially confusing) and would recommend using a more traditional (simpler, clearer) definition similar to that given in my answer. If you would like a reference, please see Hahn & Meeker, *Statistical Intervals.* – whuber Jun 15 '16 at 14:42