
Imagine that I am studying racial discrimination, in particular whether a white candidate (for a job) is chosen more often than an equally qualified minority candidate.

I ask people to choose between a white candidate and an equally qualified minority candidate. Suppose that white candidates are chosen 54% of the time and minority candidates are chosen 46% of the time. Should the hypothesis test for whether white candidates are chosen as often as minority candidates be

$H_0: \pi = 0.50$ or $H_0: \pi = 0.46$?

wwl

2 Answers


I refer to "What follows if we fail to reject the null hypothesis?" to explain that, in hypothesis testing, the objective is to show that your data 'rejects' $H_0$ and 'supports' $H_1$.

You want to 'show' that a white candidate (for a job) is chosen more often than an equally qualified minority candidate.

So if $\pi$ is the population fraction of choices in which the white candidate is chosen, then the hypotheses are $H_1: \pi > 0.5$ versus $H_0: \pi = 0.5$.

These hypotheses are fixed before you have seen the data, as @Greg Snow also argues.

If you draw a random sample of size $n$ ($n$ large enough), then for each sample $s$ you observe a fraction $p_s$ in that sample. In another sample you will observe another $p_s$, so this 'sample fraction' changes from sample to sample and is therefore a random variable. If $n$ is large enough, then this random variable 'sample fraction' is approximately normally distributed, with mean equal to the population fraction $\pi$ and standard deviation $\sqrt{\pi(1-\pi)/n}$.
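The claim about the sampling distribution can be checked by simulation. The following is a minimal sketch, not part of the original argument: the function name `simulate_sample_fractions`, the seed, and the values $\pi = 0.5$, $n = 100$ and 5000 repetitions are all illustrative assumptions.

```python
import math
import random
import statistics

def simulate_sample_fractions(pi, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return the fraction p_s of each sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    fractions = []
    for _ in range(num_samples):
        # Each of the n choices picks the white candidate with probability pi.
        chosen = sum(1 for _ in range(n) if rng.random() < pi)
        fractions.append(chosen / n)
    return fractions

# Assumed, illustrative values (not taken from the question's data).
pi, n = 0.5, 100
fractions = simulate_sample_fractions(pi, n, num_samples=5000)

print("simulated mean of p_s:", statistics.mean(fractions))
print("simulated sd of p_s:  ", statistics.stdev(fractions))
print("theoretical sd:       ", math.sqrt(pi * (1 - pi) / n))
```

The simulated mean and standard deviation should land close to $\pi$ and $\sqrt{\pi(1-\pi)/n}$ respectively, which is what the normal approximation relies on.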

So if you choose a significance level (e.g.) $\alpha=0.05$ then you find that $P(p_s \ge \pi + 1.645 \sqrt{\pi(1-\pi)/n})=0.05$. So the rejection region with $\pi=0.5$ (if $H_0$ is true) and e.g. $n=100$ would be $p_s \ge 0.5+1.645\sqrt{0.25/100}$ or $p_s \ge 0.58$.
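The arithmetic for the rejection region can be reproduced in a few lines. This is a sketch under the normal approximation used above; the helper name `critical_fraction` is hypothetical, and `statistics.NormalDist` from the Python standard library supplies the 1.645 quantile.

```python
import math
from statistics import NormalDist

def critical_fraction(pi0, n, alpha):
    """Smallest sample fraction that rejects H0: pi = pi0 in favour of H1: pi > pi0."""
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return pi0 + z * math.sqrt(pi0 * (1 - pi0) / n)

# Values from the example in the text: pi = 0.5 under H0, n = 100, alpha = 0.05.
crit = critical_fraction(pi0=0.5, n=100, alpha=0.05)
print(round(crit, 2))  # about 0.58
```

Nothing in this calculation uses the observed 54%; only $\alpha$, $n$ and the value of $\pi$ under $H_0$ enter.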

All this reasoning is done without looking at the data; you only have to fix $\alpha$ and the sample size $n$.

Only in the final step do you look at your specific sample, and in your sample the fraction of white candidates that are chosen is 0.54. So the value of $p_s$ for your sample ($\bar{p}_s=0.54$) is not in the critical region ($p_s \ge 0.58$), and with the example values $n=100$ and $\alpha=0.05$ you do not reject $H_0$.

Note that this decision depends on the sample size and on the $\alpha$.
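To see how the decision changes with the sample size, here is a hedged sketch of the one-sided test applied to the observed fraction 0.54. The function name `one_sided_test` is hypothetical, and $n = 1000$ is an assumed value for comparison only; the question does not state the actual sample size.

```python
import math
from statistics import NormalDist

def one_sided_test(p_obs, pi0, n, alpha=0.05):
    """Normal-approximation test of H0: pi = pi0 against H1: pi > pi0."""
    std_err = math.sqrt(pi0 * (1 - pi0) / n)  # sd of the sample fraction under H0
    z = (p_obs - pi0) / std_err
    p_value = 1 - NormalDist().cdf(z)         # upper-tail probability
    return z, p_value, p_value < alpha

# n = 100 is the example from the text; n = 1000 is an assumption for contrast.
for n in (100, 1000):
    z, p_value, reject = one_sided_test(p_obs=0.54, pi0=0.5, n=n)
    print(f"n={n}: z={z:.2f}, p-value={p_value:.4f}, reject H0: {reject}")
```

With $n = 100$ the observed 0.54 is not enough to reject $H_0$, while the same fraction from a sample of 1000 would be, since the standard deviation of the sample fraction shrinks as $n$ grows.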

So it is very important to distinguish between the population fraction $\pi$, which is the one you use in your hypothesis, and the outcome of your specific sample, the fraction $\bar{p}_s$. It is also important to distinguish the fraction of your sample $\bar{p}_s$ (which is only one number, namely the fraction in the sample that you have) from the outcome of any possible sample of the same size. The latter is a random variable $p_s$ because its value depends on the sample, so $p_s$ is not one value but a random variable (and thus has a distribution).


In practice the null hypothesis should be chosen before looking at the data (often before collecting any data). So if the 46% comes from the data that you are using for the test, then it is not appropriate for the null hypothesis. On the other hand, if 46% is a historical value and you want to see whether new data indicate that a change has occurred versus the status quo being maintained, then a null hypothesis comparing to 0.46 is appropriate.

Greg Snow