Compare Z-scores (weight) before / after with a T-test?

Question

I'd like to ask whether it is possible to perform a T test to calculate Z-scores (weight of persons, see https://www.cdc.gov/growthcharts/clinical_charts.htm) with a t-test in the same group (paired t-test), i.e. to compare the Z-scores before and after?

conditions :

1) N: 30

2) Paired samples (same group tested before and after)

3) Shapiro-Wilk normality test: p > 0.05 (null hypothesis of normality not rejected). How should it be interpreted if it is borderline, e.g. p = 0.051?

The Z-scores can take positive and negative values.

Thank you for your response

stefgehrig · Answer 1 · 2020-05-28T12:20:12.450

The use of a paired t-test seems appropriate in your situation, but note that the sample is rather small and, depending on the size of the effect that you expect to show, the test might have low statistical power to detect it (i.e., a risk of type II error).
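To get a feeling for the power issue, a rough calculation with base R's `power.t.test` can help. This is only a sketch: the mean change of 0.3 z-score units and the SD of the paired differences of 0.6 are hypothetical planning values, not numbers from the question, and should be replaced with values that are plausible for your data.

```r
# Hypothetical planning values (assumptions, not taken from the question):
# mean before/after change of 0.3 z-score units,
# SD of the paired differences of 0.6.
res <- power.t.test(n = 30,            # number of pairs
                    delta = 0.3,       # assumed mean difference
                    sd = 0.6,          # assumed SD of the differences
                    sig.level = 0.05,
                    type = "paired",
                    alternative = "two.sided")
res$power

# Turned around: the smallest mean change detectable with 80% power
# when 30 pairs are available (same assumed SD of the differences).
res_min <- power.t.test(n = 30, power = 0.80, sd = 0.6,
                        sig.level = 0.05, type = "paired")
res_min$delta
```

For `type = "paired"`, `n` is the number of pairs and `sd` refers to the standard deviation of the within-pair differences, not of the raw measurements.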

As long as all measurements (before and after) are standardized with the same mean and SD, i.e. with one common linear transformation, it will not matter for the t statistic or the p-value whether you run the t-test on z-scores of the outcome or on the outcome on the original scale, so you might want to preserve the original scale for interpretation. In any case, you can apply a z-transformation to get z-scores with mean = 0 and SD = 1, using a function like:

z_transform <- function(x) { (x - mean(x, na.rm=TRUE)) / sd(x, na.rm=TRUE) }

If the data are approximately normal, these z-scores can be interpreted as quantiles of a standard normal distribution. Note that the link you posted does not mention z-scores, so you might want to specify how the information under the link relates to your question. Also, "performing a T test to calculate Z-scores" is a bit cryptic, because the t-test itself does not transform your data.
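The point about the scale not mattering can be checked with a small simulation. The weights below are simulated purely for illustration, and the pooled mean and SD are one possible choice of a common transformation (an assumption, not something stated in the question). The last lines show why the two time points should not be standardized separately: doing so forces both means to zero and removes the very difference you want to test.

```r
set.seed(1)

# Simulated weights in kg for 30 persons (illustrative data only)
before <- rnorm(30, mean = 70, sd = 10)
after  <- before + rnorm(30, mean = 1, sd = 3)

# One common linear transformation: mean and SD from the pooled data
pooled_mean <- mean(c(before, after))
pooled_sd   <- sd(c(before, after))
z_before <- (before - pooled_mean) / pooled_sd
z_after  <- (after  - pooled_mean) / pooled_sd

# Paired t-tests on the original scale and on the z-scale
t_raw <- t.test(after, before, paired = TRUE)
t_z   <- t.test(z_after, z_before, paired = TRUE)

# Same t statistic and p-value (up to floating-point error)
all.equal(unname(t_raw$statistic), unname(t_z$statistic))
all.equal(t_raw$p.value, t_z$p.value)

# Pitfall: standardizing each time point separately sets both means to 0,
# so the mean paired difference vanishes.
z_transform <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
mean(z_transform(after) - z_transform(before))
```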

Regarding hypothesis tests for distributional assumptions, like the Shapiro-Wilk test that you used, opinions diverge. Using a sharp cutoff like 0.05 to decide in a binary fashion whether you have a normal distribution or not is somewhat arbitrary, as becomes obvious in cases like p = 0.051, which you mentioned, or when the sample size is small and the test has little power to detect non-normality (risk of type II error). This might be a good use case for interpreting the p-value as a continuous measure of evidence against the null hypothesis; see for example:

Amrhein, V., Korner-Nievergelt, F., & Roth, T. (2017). The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research. PeerJ, 5, e3544.
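As a minimal sketch of how the Shapiro-Wilk test might be run here: for a paired t-test, the normality assumption concerns the within-pair differences, so the test is applied to `after - before` rather than to each time point. The z-scores below are simulated and only stand in for real data.

```r
set.seed(2)

# Simulated z-scores for 30 persons (illustrative data only)
before <- rnorm(30, mean = 0.2, sd = 1)
after  <- before + rnorm(30, mean = 0.3, sd = 0.6)

# The paired t-test works on the differences, so check those
d  <- after - before
sw <- shapiro.test(d)

sw$statistic  # W, values close to 1 are consistent with normality
sw$p.value    # read as graded evidence, not as a pass/fail at 0.05
```

A p-value of 0.049 and one of 0.051 carry practically the same amount of evidence against normality, which is why looking at the value itself, together with plots, tends to be more informative than the binary decision.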

It also pays off to look at histograms, and to think about the data-generating process and which type of distribution you would expect. For measures like height, weight or BMI, a normal distribution is very typical*, so you should be on the safe side.

*If you wonder why this should be the case, there is a nice elaboration on processes that produce Gaussian distributions in chapter 4.1, "Why normal distributions are normal", in

McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC press.
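The visual check with histograms suggested above could look roughly like this, again using simulated z-scores in place of real data. A normal Q-Q plot is added here as a common companion to the histogram; that is my addition, not part of the original suggestion.

```r
set.seed(3)

# Simulated z-scores for 30 persons (illustrative data only)
before <- rnorm(30, mean = 0.2, sd = 1)
after  <- before + rnorm(30, mean = 0.3, sd = 0.6)
d <- after - before

# Histogram of the paired differences
h <- hist(d, breaks = 10,
          main = "Paired differences (after - before)",
          xlab = "Difference in z-score")

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(d)
qqline(d)
```

With only 30 observations, histograms look ragged even for truly normal data, so mild irregularities should not be over-interpreted.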
