Dixon's Q statistic is the ratio of the "gap" between an outlier and the nearest value, over the range of the data. I would like to know whether this statistic is ancillary to the parameters of the normal distribution.

I know that the denominator is ancillary to the normal mean $\mu$ and that, by the same argument, so is the numerator. Is my reasoning correct that taking their ratio makes the statistic ancillary to $\sigma$ as well?

Ben
Marj

1 Answer

This statistic is indeed ancillary, but no, your reasoning is not correct.

The fact that both the numerator and denominator are ancillary for $\mu$ is not sufficient to ensure that the ratio is ancillary to $\sigma$. Notwithstanding this, it is quite simple to establish that the statistic is ancillary. As a general rule, if you have data from a normal distribution, then any location-scale-invariant statistic will be ancillary for the parameters. The simplest way to establish this result is via the following broader theorem.

Theorem: Let $X_1,...,X_n \sim p(\mu, \sigma)$ be generated independently from a parameterised family of distributions $p$ that is location-scale invariant. Then any statistic $S$ that is location-scale invariant is an ancillary statistic for the parameters $\mu$ and $\sigma$.


Proof: Since the distribution $p$ is location-scale invariant, there exists a random variable $Z_i \sim \phi$ with a fixed distribution $\phi$ such that $X_i \overset{\text{dist}}{=} \mu + \sigma \cdot Z_i$. Moreover, since $S$ is location-scale invariant, we have:

$$S(\mathbf{x}) = S(a \mathbf{x} + b \mathbf{1}) \quad \quad \quad \text{for all } a \neq 0 \text{ and } b \in \mathbb{R}.$$

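Combining these two results gives the following, where the first equality uses independence (so that the equality in distribution holds jointly for the whole vector) and the second uses invariance with $a = \sigma$ and $b = \mu$: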

$$\begin{equation} \begin{aligned} \mathbb{P}(S(\mathbf{X}) \in \mathcal{A}) = \mathbb{P}(S(\mu \mathbf{1} + \sigma \mathbf{Z}) \in \mathcal{A}) = \mathbb{P}(S(\mathbf{Z}) \in \mathcal{A}). \end{aligned} \end{equation}$$

Since this probability does not depend on $\mu$ or $\sigma$, the statistic $S$ is an ancillary statistic with respect to these two parameters.
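The key step of the proof, $S(\mu \mathbf{1} + \sigma \mathbf{Z}) = S(\mathbf{Z})$, can be checked numerically. The following is only a rough sketch using the Python standard library. The names `studentized_range`, `plain_range` and `simulate` are my own illustrative choices, as are the sample size, the seed and the parameter values. It draws the same standard normal vectors $\mathbf{Z}$ under two parameter settings and compares the resulting statistic values.

```python
import random
import statistics


def studentized_range(xs):
    """Range divided by sample standard deviation (location-scale invariant)."""
    return (max(xs) - min(xs)) / statistics.stdev(xs)


def plain_range(xs):
    """Range only: location invariant, but NOT scale invariant."""
    return max(xs) - min(xs)


def simulate(stat, mu, sigma, n=10, reps=2000, seed=0):
    """Evaluate `stat` on `reps` samples X = mu + sigma * Z of size n.

    The fixed seed means every call reuses the same Z draws, which
    mirrors the representation X = mu + sigma * Z used in the proof.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = [mu + sigma * zi for zi in z]
        values.append(stat(x))
    return values


# Invariant statistic: values agree across (mu, sigma) up to rounding error.
baseline = simulate(studentized_range, mu=0.0, sigma=1.0)
shifted = simulate(studentized_range, mu=50.0, sigma=3.0)
max_diff = max(abs(u - v) for u, v in zip(baseline, shifted))

# Non-invariant statistic: the range simply scales with sigma.
range_base = simulate(plain_range, mu=0.0, sigma=1.0)
range_shift = simulate(plain_range, mu=50.0, sigma=3.0)
range_ratio = statistics.mean(range_shift) / statistics.mean(range_base)

print("largest difference for the invariant statistic:", max_diff)
print("ratio of mean ranges (sigma = 3 vs sigma = 1):", range_ratio)
```

Reusing the same draws is a deliberate choice here: it shows the identity sample by sample rather than only comparing two empirical distributions. The plain range is included as a contrast, since it is ancillary for $\mu$ but its distribution clearly depends on $\sigma$.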

Now, it is easily shown that the class of normal distributions is location-scale-invariant (i.e., the class is closed under shifts in location and changes in scale). Moreover, the Dixon Q statistic can be shown to be location-scale invariant. To see this, suppose we have ordered data $x_{(1)}, ..., x_{(n)}$ and suppose further that the "gap" part of the statistic is the distance $x_{(k+1)} - x_{(k)}$ for some $k$. Then we have:

$$\begin{equation} \begin{aligned} Q(a \mathbf{x} + b \mathbf{1}) &= \frac{\text{gap}(a \mathbf{x} + b \mathbf{1})}{\text{range}(a \mathbf{x} + b \mathbf{1})} \\[6pt] &= \frac{|a| (x_{(k+1)} - x_{(k)})}{|a| (x_{(n)} - x_{(1)})} \\[6pt] &= \frac{x_{(k+1)} - x_{(k)}}{x_{(n)} - x_{(1)}} \\[6pt] &= Q(\mathbf{x}). \\[6pt] \end{aligned} \end{equation}$$

This establishes that the Dixon Q statistic is location-scale invariant, so the theorem ensures that it is an ancillary statistic for data taken from a normal distribution.
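The invariance can also be checked on a small data set. The sketch below is illustrative only: the function `dixon_q`, its default of $k = n - 1$ (i.e., treating the largest value as the suspected outlier), the data values and the constants `a` and `b` are all my own assumptions rather than anything fixed by the question.

```python
import math


def dixon_q(xs, k=None):
    """Gap x_(k+1) - x_(k) over the range x_(n) - x_(1).

    k is 1-indexed as in the text. The default k = n - 1 (largest value
    is the suspected outlier) is an assumption made for this sketch.
    """
    s = sorted(xs)
    n = len(s)
    if n < 2 or s[-1] == s[0]:
        raise ValueError("need at least two distinct values")
    if k is None:
        k = n - 1
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    gap = s[k] - s[k - 1]  # x_(k+1) - x_(k) in 0-indexed list terms
    return gap / (s[-1] - s[0])


data = [1.0, 2.0, 3.0, 4.0, 10.0]
a, b = 2.5, -7.0  # arbitrary scale (a > 0) and shift
transformed = [a * x + b for x in data]

q_original = dixon_q(data)
q_transformed = dixon_q(transformed)

print(q_original, q_transformed)
print(math.isclose(q_original, q_transformed))
```

For this data the gap is $10 - 4 = 6$ and the range is $10 - 1 = 9$, so both calls should return $2/3$ up to floating-point rounding.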

Ben
  • Thanks. The theorem helps. I did not mean to say that it is them being ancillary to $\mu$ that makes the ratio ancillary to $\sigma$, but that the ratio is ancillary to $\sigma$ because $\sigma$ is scaled out (I can add and subtract $\mu$ in both numerator and denominator and then multiply the ratio by $\sigma/\sigma$, which makes it a function of standard normals). – Marj Nov 25 '19 at 00:54
  • Fair enough. You would need to be specific about what you mean by "scaled out" here, and that would lead you to the assertion of location-scale invariance of the statistic. – Ben Nov 25 '19 at 04:16
  • Yes, from which ancillarity follows by the theorem you stated. – Marj Nov 25 '19 at 15:33