Consider a simple hypothesis test concerning the mean of a single sample.
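- If the sample is normally distributed and the variance is known, the exact distribution of the sample mean is known: $\bar{X} \sim N(\mu, \sigma^2/n)$, i.e. a normal distribution with standard deviation $\sigma/\sqrt{n}$. (This holds exactly for normal data; the central limit theorem is not needed here.)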
- When the variance is unknown and the random variable $\bar{X}$ still comes from a normal distribution, we can estimate the variance with the sample variance $S^2$. In this case, the statistic $T = \frac{\bar{X}-\mu_0}{S/\sqrt{n}}$ follows a t-distribution with $n-1$ degrees of freedom (a standard normal variable divided by the square root of an independent chi-squared variable over its degrees of freedom).
- When the variance is unknown, the random variable $\bar{X}$ still comes from a normal distribution, and the sample size is large, the sample variance satisfies $S^2 \approx \sigma^2$. Therefore, we can again use the z-score and a standard normal distribution. (This rests on $S^2$ being a consistent estimator of $\sigma^2$, rather than on the central limit theorem.)
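
To make the difference between the second and third case concrete, here is a minimal simulation sketch using only the Python standard library. It draws normal samples under a true null hypothesis, computes $T$ for each, and compares the empirical 97.5% quantile of $T$ with the standard normal value of about 1.96. The function names, the repetition counts and the choice of $N(0,1)$ data are my own assumptions for illustration, not something taken from the textbook.

```python
import math
import random
from statistics import NormalDist


def t_statistic(sample, mu0):
    """One-sample statistic (x_bar - mu0) / (s / sqrt(n)), with s the sample SD."""
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # unbiased sample variance
    return (x_bar - mu0) / math.sqrt(s2 / n)


def simulated_upper_quantile(n, reps=20000, p=0.975, seed=0):
    """Empirical p-quantile of T when the data are N(0, 1) and H0 (mu0 = 0) is true."""
    rng = random.Random(seed)
    stats = sorted(
        t_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], 0.0)
        for _ in range(reps)
    )
    return stats[int(p * reps)]


if __name__ == "__main__":
    z_crit = NormalDist().inv_cdf(0.975)  # standard normal critical value, about 1.96
    for n in (5, 30, 100):
        q = simulated_upper_quantile(n, reps=10000)
        print(f"n={n:4d}  simulated quantile={q:.3f}  normal quantile={z_crit:.3f}")
```

For small $n$ the simulated quantile should sit clearly above the normal one, and the gap should shrink as $n$ grows, which is why using $t$ for a large sample is slightly conservative rather than wrong.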
What are your options when the sample size is quite small, the variance is unknown, and the distribution of $\bar{X}$ is unknown? My textbook explicitly states that the t-distribution is only valid when the random variable has a normal distribution, yet in a number of examples the t-distribution is used when the distribution is unknown! In addition, some problems with a large sample size ($n > 30$ according to their rule of thumb) are solved with a t-distribution rather than the normal one, right after they explained that the normal distribution is appropriate for a large sample size.
There is also a section devoted to the fact that the t-test is quite robust to violations of the normality assumption. They also hint at non-parametric methods as an alternative.
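
One way to probe that robustness claim for a small sample is to simulate the actual type I error rate of the nominal 5% two-sided t-test. The sketch below is my own illustration, assuming $n = 10$, an exponential distribution as the non-normal example, and the tabulated critical value $t_{0.975,\,9} \approx 2.262$; the helper names are hypothetical.

```python
import math
import random

# Two-sided 5% critical value of the t-distribution with 9 degrees of freedom
# (standard table value; valid only for n = 10).
T_CRIT_DF9 = 2.262


def rejects(sample, mu0, t_crit):
    """True if the two-sided one-sample t-test rejects H0: mean = mu0."""
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # unbiased sample variance
    return abs(x_bar - mu0) / math.sqrt(s2 / n) > t_crit


def type_one_error(draw, true_mean, n=10, reps=20000, t_crit=T_CRIT_DF9, seed=0):
    """Fraction of simulated samples in which a true H0 is rejected."""
    rng = random.Random(seed)
    hits = sum(
        rejects([draw(rng) for _ in range(n)], true_mean, t_crit)
        for _ in range(reps)
    )
    return hits / reps


if __name__ == "__main__":
    normal = type_one_error(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)
    skewed = type_one_error(lambda rng: rng.expovariate(1.0), true_mean=1.0)
    print(f"normal data:      {normal:.3f}")
    print(f"exponential data: {skewed:.3f}")
```

With normal data the rejection rate should be close to the nominal 0.05. With strongly skewed data and only ten observations it tends to come out noticeably higher, so "robust" seems to depend on how far from normal the data are and how small the sample is.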
Is there anything I'm missing? Or perhaps a flaw in my reasoning?
I found a helpful thread (When is the distribution of $(\overline{x}-\mu)/{\rm SE}(\overline{x})$ normal and when is it $t$?), but unfortunately it does not address the main problem I'm facing (small sample, unknown distribution).