
This question is inspired by "Confidence Interval on a random quantity?". That question introduces an interesting concept: a type of interval that is neither a prediction interval nor a confidence interval (possibly one could see it as a tolerance interval, although I believe it is not that either).


A frequentist interval estimate

In short: For pairs of (possibly multidimensional) variables $x_i,y_i$, which are both distributed according to a distribution parameterized by $a$, and where $x_i|a \not\!\perp\!\!\!\perp y_i|a$, we wish to perform interval estimation for the value of $x_i$ as a function of $y_i$, where $a$ is unknown.

Given the following:

  • Let $X,Y$ be random variables that are paired.
  • The random variables $X$ and $Y$ each follow a distribution that is parameterized by $a$ $$f_{Y|a}(y|a) \equiv g_Y(y,a)$$ $$f_{X|a}(x|a) \equiv g_X(x,a)$$
  • There is a known relationship between $X$, $Y$, and $a$ that defines a conditional distribution for $X$ $$f_{X|y,a}(x|y,a) \equiv h(x,y,a)$$
  • There is a sample of measured values $y_i$

We wish to compute:

For each $x_i$, a one-sided interval bound $c(y_i,\alpha)$ such that: $$\forall a : P(X<c(Y,\alpha)) = \alpha$$ or, as a weaker requirement, $$\sup \lbrace P(X<c(Y,\alpha)):a \rbrace = \alpha$$

Here probability is meant in a frequentist sense. If we had a large sample of pairs $x_i,y_i$ (where we only measure $y_i$ and do not know $a$), then the fraction of 'failures' of the interval, $x_i<c(y_i,\alpha)$, should be around $\alpha$ regardless of the true value of $a$ (or, under the weaker requirement, its least upper bound over $a$ should be $\alpha$).
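To make the coverage requirement concrete, here is a minimal simulation sketch under a toy model that is not part of the question: I assume $(X,Y)$ is bivariate normal with both means equal to $a$, unit variances, and a known correlation `RHO`. In that assumed model $X-Y \sim N(0, 2(1-\rho))$ does not depend on $a$, so $c(y,\alpha) = y + z_\alpha\sqrt{2(1-\rho)}$ should fail with frequency close to $\alpha$ whatever $a$ is. The names `bound` and `failure_rate` are hypothetical helpers for this illustration.

```python
import numpy as np
from statistics import NormalDist

RHO = 0.6  # assumed known correlation between X and Y (toy-model assumption)


def bound(y, alpha, rho=RHO):
    """One-sided bound c(y, alpha) from the pivot X - Y ~ N(0, 2 * (1 - rho))."""
    z = NormalDist().inv_cdf(alpha)  # standard normal alpha-quantile
    return y + z * np.sqrt(2.0 * (1.0 - rho))


def failure_rate(a, alpha, n=200_000, rho=RHO, seed=0):
    """Fraction of simulated pairs with x_i < c(y_i, alpha) when the true parameter is a."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    xy = rng.multivariate_normal([a, a], cov, size=n)
    x, y = xy[:, 0], xy[:, 1]
    # Only y is "measured"; x is used here solely to score the bound.
    return float(np.mean(x < bound(y, alpha, rho)))


if __name__ == "__main__":
    for a in (-5.0, 0.0, 3.0):
        print(a, failure_rate(a, alpha=0.05))
```

The bound never uses $a$, yet the simulated failure frequency stays near $\alpha$ for every value of $a$ tried. This only works because the toy model happens to have a pivot; in general no such quantity need exist, which is where the supremum version of the requirement comes in.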


What do/should we call that sort of interval?

This is not a confidence interval, because the estimate is for $X$, which is not a (fixed) population parameter, but a random variable.

Nor is it a prediction interval, because $c(y_i,\alpha)$ is only a region for the $x_i$ that is paired with $y_i$; it is not a region for future values of $X$.

What is it?


Example case problems

  • (this one was mentioned by shabbychef in the comments and relates to the aforementioned question)

    You observe returns from $p$ stocks in vector $\vec{y}_i$. Then from a sample of $n$ such observations, you form the Markowitz Portfolio, based on the sample mean and covariance. Then you wish to estimate the Sharpe Ratio of that sample Markowitz Portfolio.

  • Say I have a batch of films for which I want to predict the strength $X$ of each film. Let the strength be a function of two parameters, say film thickness $Y$ and film density $a$.

    Say I cannot measure $X$ directly (that would damage the film), and I do not know $a$ for every film, nor do I wish to measure it (say it is a costly measurement). I can, however, measure $Y$ for each film, and I know that $Y$ is distributed according to some pdf that is parameterized by $a$.

    So now the idea is to use measurements of film thickness $Y$, which carry information about $a$, to compute some confidence/prediction/tolerance/whatever interval for $X$, which I know depends on $Y$ and $a$. I want this interval to fail only a fraction $\alpha$ of the time.

Sextus Empiricus
  • I think it's not useful to make the stipulation in the fourth (final) bullet, due to the dependence on the unknown parameter $a.$ You need to consider either the supremum or the infimum of the left hand side over the set of posited distributions of $Y,$ depending on your objective. – whuber Jan 28 '19 at 22:37
  • I agree. That is what I did in my answer [here](https://stats.stackexchange.com/a/388904/164061). Beyond that one may wonder whether there ain't better approaches for the problem in practice (but that is beyond the point of the question which is about the principle). – Sextus Empiricus Jan 28 '19 at 22:48
  • Another example would be: you observe returns from $p$ stocks in vector $\vec{y_i}$. Then from a sample of $n$ such observations, you form the Markowitz Portfolio, based on the sample mean and covariance. Then you wish to estimate the Sharpe Ratio of that sample Markowitz Portfolio. – shabbychef Jan 29 '19 at 05:42

1 Answer

We could describe the distribution of $Y$, conditional on $X$ and $a$, as a distribution parameterized by $X$ and $a$:

$$f_{Y|x,a}(y|x,a) = \frac{f_{X|y,a}(x|y,a)\,f_{Y|a}(y|a)}{f_{X|a}(x|a)}$$

In this view the random variable $X$ is a parameter in the (conditional) distribution of $Y$, and we could see the interval estimation of $X$ as a confidence interval for the parameter $X$.

The complications are that the estimate of $X$ depends on the value of the parameter $a$, which acts as a nuisance parameter, and that $X$ itself is distributed according to a distribution parameterized by $a$. So one may not be able to tackle the interval estimation as a 'regular' confidence interval estimation.
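As a worked illustration of this view, consider a toy model that is purely an assumption for the sake of example: $(X,Y)$ given $a$ is bivariate normal with both means equal to $a$, unit variances, and known correlation $\rho$. The standard conditional-normal formula then gives

$$Y \mid X=x, a \;\sim\; N\big(\rho x + (1-\rho)a,\; 1-\rho^2\big)$$

so $x$ enters the distribution of $Y$ exactly like a location parameter, while $a$ appears alongside it as a nuisance parameter. In this particular model the nuisance parameter can be eliminated, because the difference

$$X - Y \;\sim\; N\big(0,\; 2(1-\rho)\big)$$

has a distribution free of $a$ and can serve as a pivot. Such a pivot will not exist in general, which is what makes the problem harder than a regular confidence interval.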

Sextus Empiricus