Pivotal quantities, test statistics and hypothesis tests

Question

We are learning pivot functions, test statistics, and hypothesis testing at university but it makes no sense. I've tried reading my textbook/notes, going through examples, etc., but the concepts seem like a random guess and I'm clueless about how to even start guessing what the answer could be.

1st part

Can you please explain how to calculate a pivot function? E.g., $X_{1},\dots,X_{n} \sim N(\mu, \sigma^2)$: what is the pivot function for $\sigma^2$ when $\mu$ is known and when $\mu$ is unknown? Why does $\mu$ being known or unknown matter?

Also how would you calculate the pivot function for the ratio of two variances ($\sigma_{x}^2$ and $\sigma_{y}^2$)? Is it an F distribution? Assume $\mu_x$ and $\mu_y$ are known and $X_1,\dots,X_n \sim N(\mu_x,\sigma_x^2)$ and $Y_1,\dots,Y_n \sim N(\mu_y, \sigma_y^2)$.

2nd part

Can you please explain how to calculate a test statistic? (I get how to show it's a test statistic but don't know how to form one from scratch.)

Last, I have a few questions about hypothesis testing. I don't really understand how to calculate the power of a test or even what it means, to be honest. There is a whole bunch of theory and many definitions out there but they are rather abstract so I don't get it at all... I don't understand the notation or how to calculate the size/power of a test (generic form - not just with numbers).

Example: $X \sim N(\mu, \sigma^2)$. $H_0: \mu \geq \mu_0$ and $H_1: \mu < \mu_0$. $\mu$ and $\sigma$ unknown. Calculate the power and size of this test. How do I even start? I'm so confused. :(

I'm really stuck with all of this and I hope you can help me! :) if there's a better resource out there to help please do let me know.

EDIT1:

Thanks for your reply.

I did ask my lecturer to clarify… but ended up even more confused. He agrees the notes are unclear but will not rectify them because everyone else seems to get them! :( I also went to my class teacher and read through the Statistical Inference chapter several times – I get the basics but still don’t really understand most of it. I have googled around – and read Wikipedia – but it’s just more and more theory with no step by step examples explaining what to do. Everything seems randomly chosen and guesswork and hence my massive confusion.

One thing though – I can’t read this: $$T_{X}=\sum_{i=1}^{N}\Big(\frac{X_{i}-\mu_{X}}{\sigma_{X}}\Big)^{2} \sim \chi^{2}(N)$$

Is it supposed to be in mathematical notation? How do I view it properly?

Yes- you are correct – the pivot function is used to calculate the confidence interval. The thing is – once I have the pivot it’s quite straightforward to calculate the CI. But it’s the pivot that’s causing the problems.

I still don’t get the following: Pivot function for $\sigma^2$ when $\mu$ is known and when $\mu$ is unknown. Why does $\mu$ being un/known matter? How would you calculate the pivot function for the ratio of 2 variances ($\sigma_x^2$ and $\sigma_y^2$)? Is it an F distribution? Assume $\mu_x$ and $\mu_y$ are known and $X_1,\dots,X_n \sim N(\mu_x,\sigma_x^2)$ and $Y_1,\dots,Y_n \sim N(\mu_y,\sigma_y^2)$. Is it $1/F_{n-1,m-1} = F_{m-1,n-1}$? And the hypothesis testing questions above please…. Can you shed some light on this please?

There are good questions in here but you're asking too much at once. Focus on one issue at a time and formulate a specific question for it. Specify what you do know and what efforts you have made towards answering the question. "How do I even start" doesn't give us anything to go on--we would just refer you to a textbook or Wikipedia (which is not a bad idea, by the way). — whuber, Feb 28 '11 at 22:32
@Whuber - this is a bit of a cop-out, @LSEactuary is not asking for the answer to everything. You could easily just provide an answer to one piece of his question. — probabilityislogic, Mar 01 '11 at 08:55
@Probability You don't need to attack me. Let's be constructive. The original post shares several characteristics of questions the FAQ specifies should *not* be asked, including "there is no actual problem to be solved" and "open-ended, hypothetical question[s]." In its original formulation at least four distinct questions were posed, including the hugely general "how to calculate a test statistic". This needed focusing to achieve appropriate, effective answers. — whuber, Mar 01 '11 at 16:46
@LSEactuary There are too many questions here, and I would suggest splitting your initial text into (at least) two parts, as suggested by my edit. The 2nd part deserves to be formulated in a new question (IMHO, but see also @whuber's comment), since only pivotal quantities have been addressed in this thread. — chl, Mar 02 '11 at 13:10
@whuber - I did not mean my comment as an "attack" but I could have phrased it a bit better, in hindsight. I apologise for my poor choice of words in that respect. What I meant was that the comment you have written would have been much better placed as the start of an answer, in which you focus your response on one particular aspect of the question. The "how do I even start?" kind of question is a very good one I think, because this is often the kind of situation many statisticians actually face (I know I have). — probabilityislogic, Mar 06 '11 at 12:09
@probability Thank you for clarifying. You make a good point about the "how do I start" question. It seems to me, though, that there is a fundamental difference between the student and the statistician: the student is supposed to have the resources at hand to tackle the problem, which is supposed to have a definite answer. "Resources" = teacher, textbook, etc. At a minimum, then, the student should be able to describe their initial efforts and articulate the obstacles they are encountering. If not, what are we supposed to do in response? Write a textbook? — whuber, Mar 06 '11 at 16:10
@whuber - I would suggest that "vague" questions can be given either "vague" answers, or they can specialise to a particular case - noting how it is linked to the "vaguary". But as you say, one cannot be expected to give the answer to everything. You could specialise the "how do I start?" question by giving a specific example, and saying how you would start in that particular case. I had a go in my answer with the notion of "standardising" a distribution. Presumably there are other ways to "start". — probabilityislogic, Mar 08 '11 at 07:09
@probability All the more power to you! The vaguer the question, though, the greater the likelihood that all your good work will be of no interest to the OP because you didn't read their mind correctly ;-). That can be frustrating... — whuber, Mar 08 '11 at 07:12

probabilityislogic · Answer 1 · 2011-03-06T12:13:16.900

The first thing you should do is challenge your lecturer to explain these things clearly. If anything whatsoever seems counter-intuitive or backwards, then demand that he/she explain why it is intuitive. Statistics always makes sense if you think about it in the "right" way.

Calculating pivotal quantities is a very tricky business - I completely understand your bewilderment in "where should I start?"

For normal variance parameters, the "pivotal quantity" is the sum of squares divided by the variance parameter:

$$T_{X}=\sum_{i=1}^{N}\Big(\frac{X_{i}-\mu_{X}}{\sigma_{X}}\Big)^{2} \sim \chi^{2}(N)$$

And a similar expression holds for $T_{Y}$. Note that the distribution only depends on $N$, which is known (if $\mu_{X}$ is unknown, replace it by $\overline{X}$ and you lose one degree of freedom in the chi-square distribution). Thus $\frac{T_{X}}{T_{Y}}$ is a pivotal quantity, which is equal to:

$$\frac{\sum_{i=1}^{N}\Big(\frac{X_{i}-\mu_{X}}{\sigma_{X}}\Big)^{2}}{\sum_{i=1}^{N}\Big(\frac{Y_{i}-\mu_{Y}}{\sigma_{Y}}\Big)^{2}} $$

Note that because it is a pivotal quantity, we can create an exact confidence interval using the pivot as a starting point, and then substituting in our statistic. Now, because the two chi-squares are independent (assuming the two samples are) and have the same degrees of freedom $N$, the $1/N$ factors cancel and the ratio does indeed have an F distribution, namely $F(N,N)$. So you can write:

$$1-\alpha=Pr(L < F < U)=Pr(L < \frac{T_{X}}{T_{Y}} < U)$$ $$1-\alpha=Pr(L < \frac{\sum_{i=1}^{N}\Big(\frac{X_{i}-\mu_{X}}{\sigma_{X}}\Big)^{2}}{\sum_{i=1}^{N}\Big(\frac{Y_{i}-\mu_{Y}}{\sigma_{Y}}\Big)^{2}} < U)$$ $$1-\alpha=Pr(L < \frac{\sigma_{Y}^{2}}{\sigma_{X}^{2}}\frac{\sum_{i=1}^{N}(X_{i}-\mu_{X})^{2}}{\sum_{i=1}^{N}(Y_{i}-\mu_{Y})^{2}} < U)$$

Writing the observed ratio of the sum of squares as $R$ we get:

$$1-\alpha=Pr(L < \frac{\sigma_{Y}^{2}}{\sigma_{X}^{2}} R < U)$$ $$1-\alpha=Pr(\frac{L}{R} < \frac{\sigma_{Y}^{2}}{\sigma_{X}^{2}} < \frac{U}{R})$$
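The interval above can be turned into a short computation. What follows is only a minimal sketch, not a definitive implementation: it assumes the two samples are independent, have the same size $N$, and that both means are known, as in the derivation. The function names (`f_equal_df_quantiles`, `variance_ratio_ci`) and the example numbers are hypothetical. To stay within the Python standard library, the cut-offs $L$ and $U$ are approximated by simulating the $F(N,N)$ pivot instead of being read from a table, so they carry some Monte Carlo error.

```python
import random


def f_equal_df_quantiles(n, alpha, reps=100_000, seed=0):
    """Approximate the alpha/2 and 1-alpha/2 quantiles of F(n, n) by simulation.

    A chi-square(n) draw is a Gamma(shape=n/2, scale=2) draw, and the ratio of
    two independent chi-square(n) draws is F(n, n) because the 1/n factors cancel.
    """
    rng = random.Random(seed)
    draws = sorted(
        rng.gammavariate(n / 2, 2.0) / rng.gammavariate(n / 2, 2.0)
        for _ in range(reps)
    )
    lower = draws[int(reps * alpha / 2)]
    upper = draws[int(reps * (1 - alpha / 2)) - 1]
    return lower, upper


def variance_ratio_ci(xs, ys, mu_x, mu_y, alpha=0.05):
    """Interval (L/R, U/R) for sigma_Y^2 / sigma_X^2 when both means are known."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    ss_x = sum((x - mu_x) ** 2 for x in xs)
    ss_y = sum((y - mu_y) ** 2 for y in ys)
    r = ss_x / ss_y  # observed ratio of sums of squares, R in the text
    lower, upper = f_equal_df_quantiles(len(xs), alpha)
    return lower / r, upper / r


# Hypothetical data: sigma_X = 1 and sigma_Y = 2, so sigma_Y^2 / sigma_X^2 = 4.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(30)]
ys = [rng.gauss(5.0, 2.0) for _ in range(30)]
ci_low, ci_high = variance_ratio_ci(xs, ys, mu_x=0.0, mu_y=5.0)
print(ci_low, ci_high)
```

If SciPy is available, `scipy.stats.f.ppf` would give the quantiles directly instead of the simulated approximation used here.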

As for how this solution comes about, I have absolutely no idea. What "principles" were followed (apart from being good at re-arranging statistical expressions)?

One thing that I can think of is that you need to find some way to "standardise" your sampling distribution. So, for example, for normals you subtract the mean and divide by the standard deviation. For a gamma you divide by the scale parameter (equivalently, multiply by the rate). I don't know many pivotal quantities that exist outside of the normal and gamma families.
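The "standardising" idea can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the derivation: the helper names, parameter values, and replication count are all hypothetical. It simulates the normal sum of squares for two very different $(\mu,\sigma)$ pairs and compares the averages with the chi-square mean, which equals the degrees of freedom: $N$ when the true mean is used and $N-1$ when $\overline{X}$ replaces it. It then does the same for exponential data (a gamma with shape 1), rescaling the sample total as $\sum X_i/\theta$ with $\theta$ the scale, which should be Gamma$(N,1)$ with mean $N$ whatever $\theta$ is.

```python
import random

N = 10          # sample size (hypothetical)
REPS = 10_000   # Monte Carlo replications (hypothetical)


def average(values):
    values = list(values)
    return sum(values) / len(values)


def normal_pivot(rng, mu, sigma, mean_known):
    """Sum of squared standardised deviations for one simulated normal sample."""
    xs = [rng.gauss(mu, sigma) for _ in range(N)]
    centre = mu if mean_known else average(xs)
    return sum(((x - centre) / sigma) ** 2 for x in xs)


def exponential_pivot(rng, scale):
    """Sample total divided by the scale parameter (mean) of the exponential."""
    # random.expovariate takes the rate, i.e. 1 / scale.
    return sum(rng.expovariate(1.0 / scale) for _ in range(N)) / scale


rng = random.Random(42)
results = {}
for mu, sigma in [(0.0, 1.0), (50.0, 7.0)]:
    known = average(normal_pivot(rng, mu, sigma, True) for _ in range(REPS))
    unknown = average(normal_pivot(rng, mu, sigma, False) for _ in range(REPS))
    results[(mu, sigma)] = (known, unknown)
    print(f"mu={mu}, sigma={sigma}: true-mean avg {known:.2f}, "
          f"sample-mean avg {unknown:.2f}")

exp_means = {scale: average(exponential_pivot(rng, scale) for _ in range(REPS))
             for scale in (0.5, 20.0)}
print(exp_means)
```

The averages should sit near $N$ and $N-1$ for both parameter pairs, which is what "the distribution does not depend on the unknown parameters" means in practice. Matching means is only a sanity check, not a proof that the whole distribution is the same.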

I think this is one reason why ordinary "sampling statistics" is an art more than a science, because you have to use your intuition about what statistics to try. And then you have to try and figure out if you can standardise your data.

I am almost certain your lecturer will bring up the subject of confidence intervals - be sure to ask him/her what you should do when you only have one sample, or when you have 2 or more nuisance parameters. :)

You might be cautious with your use of terminology, especially when addressing someone new to stats. For example, pivotal quantities are **not** statistics, so using the phrase "pivot statistics" can be (very) misleading. The OP uses "pivot functions", which strikes me as a little unconventional, but certainly more correct. Sorry if this seems pedantic of me. Cheers. — cardinal, Mar 01 '11 at 14:01
@cardinal - thanks for the comment. I have updated my answer accordingly. — probabilityislogic, Mar 06 '11 at 12:10
