5

Can a statistic depend on a parameter?

By definition, a statistic $T(\mathbf{X})$ is a function of the random variables sampled from a population. In Casella and Berger's *Statistical Inference*, in the paragraph immediately below the definition of a statistic, it is stated that a statistic cannot depend on parameters. Wikipedia says it cannot depend on *unknown* parameters.

However, does not the t-statistic depend on a parameter?

kjetil b halvorsen
An old man in the sea.
  • In what sense do you understand a t-statistic as depending on a parameter? Let's be concrete about this: I have a dataset of numbers $2,3,4$ and I wish to use a t-test to compare its mean to the value $5$. Could you please indicate precisely where in the formula for the t-statistic a parameter appears? – whuber Mar 24 '16 at 18:28
  • @whuber From Casella and Berger, parameters are variables whose different values will result in different distributions. Well, from my perspective, I would say the t-stat depends on a variable, not observed, that can alter the distribution of the t-stat when evaluated at a value other than the one under the null... – An old man in the sea. Mar 24 '16 at 19:47
  • Please be specific: *exactly what parameter* of the population are you referring to? I'm afraid I cannot detect any part of the formula for a t statistic that involves any property of the population: all parts of it refer directly to the *sample* and the *hypothesis*: the sample mean, the sample SD, the sample size, and the hypothesized value. – whuber Mar 24 '16 at 19:53
  • @whuber $t_{\mu_0}(\mathbf{X})=\frac{\bar X - \mu_0}{s(\mathbf{X})}$ I'm referring to $\mu_0$ – An old man in the sea. Mar 24 '16 at 20:33
  • $\mu_0$ is not a property of the population: it has no role at all in generating $\bar X$. Therefore it is not a parameter. The acid test of a parameter is this: if, no matter what value you give to it, the (theoretical underlying) distribution of the *data* is unchanged, then it is not a parameter. – whuber Mar 24 '16 at 20:33
  • @whuber I think I get it. $\mu_0$ is the value $H_0$ states for the $\mu$ that governs the population. $\mu_0$'s value may not be equal to the value of the parameter ($\mu$) of the DGP. Correct? – An old man in the sea. Mar 24 '16 at 20:42
  • That sounds like a fair description. I think it's close to what @dsaxton writes in an answer in referring to $\mu_0$ as a "hypothesized value" of $\mu$. It might become even clearer when you consider a one-sided t-test, where the "hypothesized value" isn't a value at all: it's an entire (half-infinite) interval, either $(-\infty,\mu_0]$ or $[\mu_0, \infty)$. Technically, the null hypothesis is a *set of distributions.* This might reinforce the conceptual distinction between it and *the particular distribution* that actually governs the data. – whuber Mar 24 '16 at 20:54

2 Answers

8

A statistic cannot be a function of unknown parameters by definition. In the case of the $t$ test our test statistic takes the form

$$ \frac{\sqrt{n}(\bar{x} - \mu_0)}{s} $$

where $\mu_0$ is the hypothesized value for the unknown mean. That is, the $t$ statistic is a function of the data and the particular hypothesis we happen to be testing (which of course is known), and is not a function of any unknown parameters.
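
To make this concrete, here is a minimal Python sketch (assuming whuber's example data $2,3,4$ from the comments and the hypothesized value $\mu_0 = 5$; the variable names are mine):

```python
import math

x = [2.0, 3.0, 4.0]   # the sample: whuber's example data from the comments
mu0 = 5.0             # hypothesized value, fixed by the null hypothesis rather than by the population
n = len(x)
xbar = sum(x) / n                                            # sample mean
s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))   # sample standard deviation
t = math.sqrt(n) * (xbar - mu0) / s
print(t)   # about -3.46; every input is either observed data or the known constant mu0
```

Every quantity on the right-hand side is either computed from the sample or fixed by the hypothesis; no unknown population quantity is needed to evaluate it.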

dsaxton
  • This could be confusing because $\mu_0$ is not normally considered a parameter of the distribution of $x$. The distinction between "hypothesized value" of a parameter and the value of the parameter itself may be less clear than you intended. – whuber Mar 24 '16 at 18:31
  • Thanks for your answer dsaxton. What's your definition of a parameter? Also, what do you mean by hypothesized value? – An old man in the sea. Mar 24 '16 at 20:45
  • dsaxton, would you like to complete your answer with some of whuber's info from his comments? That way, I can accept your answer. – An old man in the sea. Mar 24 '16 at 22:07
  • I tried to reword my answer, is that a bit more clear? – dsaxton Mar 25 '16 at 01:49
  • From the comments of @whuber I think the point was more that $\Theta_0$ (the values under the null), unlike the population's parameter values (which, granted, are not known), cannot change the distribution of the population. – An old man in the sea. Mar 25 '16 at 20:58
0

A test statistic is a function of observable random variables whose distribution does not depend on any unknown parameters. For example, if $n$ is large enough, the central limit theorem says that the test statistic $$ T=\frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}} $$ is approximately normal with mean zero and variance one. Clearly the formula for this statistic involves unknown parameters. Generally, the inference question in this setting is to test whether or not the population mean, $\mu$, is equal to some value, say $\mu_0$, where $\mu_0$ is known (the test will decide whether or not the mean really is $\mu_0$). The standard error, $\sigma/\sqrt{n}$, must be estimated. But the null distribution of the test statistic is $N(0,1)$, which, importantly, does not have any unknown parameters.
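
A quick simulation sketch of this point (a sketch of mine, with arbitrary sample size $n = 50$ and arbitrary $(\mu, \sigma)$ pairs): whichever parameter values actually generate the data, the distribution of $T$ is the same standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 100_000

# Whatever (mu, sigma) generated the data, the standardized statistic
# T = (xbar - mu) / (sigma / sqrt(n)) has the same N(0, 1) distribution.
for mu, sigma in [(0.0, 1.0), (10.0, 3.0), (-5.0, 0.5)]:
    samples = rng.normal(mu, sigma, size=(reps, n))
    T = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))
    print(mu, sigma, round(T.mean(), 3), round(T.std(), 3))   # roughly 0 and 1 in every case
```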

Many authors consider significance testing to be the same as hypothesis testing, which perhaps leads to confusion on this point. In hypothesis testing, the size of the test is determined a priori, which means the distribution of the test statistic must be known (at least approximately) a priori, and hence must not have any unknown parameters. That is, before obtaining data, computing $\bar{X}_n$, and estimating $\mbox{se}(\bar{X}_n)=\sigma/\sqrt{n}$, the size of the test should be calculable. Here, the size of the test is the probability of making a type I error. More precisely, it is the supremum of the power of the test over the null hypothesis, where the power is the probability of rejecting the null hypothesis at a given value of the parameter(s).
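
As a rough illustration of fixing the size a priori (a sketch assuming $N(\mu_0, \sigma^2)$ data with arbitrary $n$, $\mu_0$, $\sigma$, and using the large-sample normal critical value rather than the exact $t$ quantile): the rejection rule is set before any data are seen, and the simulated type I error rate comes out near $\alpha$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 50, 100_000, 0.05
crit = stats.norm.ppf(1 - alpha / 2)     # rejection threshold fixed before any data are seen

mu0, sigma = 10.0, 3.0                   # data generated under the null hypothesis
samples = rng.normal(mu0, sigma, size=(reps, n))
T = (samples.mean(axis=1) - mu0) / (samples.std(axis=1, ddof=1) / np.sqrt(n))
print((np.abs(T) > crit).mean())         # empirical size, close to alpha = 0.05
```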

In significance testing, a p-value is determined a posteriori. The p-value is the probability of observing a test statistic "at least as large" as the one observed, based on a null distribution. It was not intended to be used in a hypothesis-test setting. One problem with doing so (e.g., rejecting the null hypothesis if the p-value is < alpha) is that there are different ways to calculate the p-value that can change the result depending on the type of test and the experiment conducted. See Goodman (1999, Ann Intern Med, vol. 130, pp. 995 - 1004) for a good discussion about the differences between the two testing procedures. Also, see the ASA's statement on p-values (2016, https://doi.org/10.1080/00031305.2016.1154108).
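
For the definition of the p-value used here, a one-line sketch (the observed value 2.1 is made up, and the answer's $N(0,1)$ null distribution is assumed):

```python
from scipy import stats

t_obs = 2.1                                   # hypothetical observed value of the statistic T
p_two_sided = 2 * stats.norm.sf(abs(t_obs))   # P(|T| >= |t_obs|) under the N(0, 1) null
print(p_two_sided)                            # about 0.036
```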

In the p-value/significance-testing setting, it may not be important for a sample statistic (e.g., $\bar{X}_n$, which is a sample statistic because it is a function of observable random variables) to have a distribution that is free of unknown parameters, because it is calculated after observing the data, without controlling the size of the test.

In summary, a statistic like $\bar{X}_n$ is a sample statistic. Strictly speaking, it is not a test statistic because its distribution, say $N(\mu,\sigma^2)$, depends on unknown parameters. The size, and power, of a hypothesis test cannot be controlled a priori in the presence of these nuisance parameters. But authors who consider the two types of testing to be the same perhaps do not worry about controlling the size of the test; in their setting, a sample statistic would be the same as a test statistic. The statistic $(\bar{X}_n-\mu)/\mbox{se}(\bar{X}_n)$ is a test statistic because its distribution, $N(0,1)$, does not depend on unknown parameters: its mean and variance are known to be zero and one, respectively.