
Given a sample, to calculate the z-score we do $$Z= \frac{X-\mu}{\sigma}$$

Intuitively, the standard deviation is an indicator of the spread of the data / the amount of variation, and the numerator is the distance between the mean and the data point. As a proportion, how can this be explained intuitively? I know that the z-score is 'the number of standard deviations from the mean' but what is the intuition behind the proportion?

For example, when explaining $$R^2 = \frac{TSS - RSS}{TSS} $$ I explain that $TSS$ is the amount of variance inherent before regression and $RSS$ is the amount of variance left unexplained after regression, so their difference is the amount explained by regression. Dividing this by $TSS$ gives $$R^2 = \frac{\text{amount of variance explained by regression}}{\text{amount of variance inherent before regression}}$$ and so it is clear that the $R^2$ statistic gives the proportion / percentage of variance explained by regression.

How can I explain what $Z$ is in the same intuitive manner?

Edit: made question clearer

15150776
  • The z-score is the one-dimensional signed Mahalanobis distance, explained at https://stats.stackexchange.com/questions/62092. – whuber Dec 04 '20 at 14:20
  • See also https://math.stackexchange.com/questions/630721/explaining-the-concept-of-z-scores-in-high-school-statistics and my inserted recent comment there also. – AJKOER Dec 04 '20 at 15:13
  • The problem with the r squared analogy is that the z score can be negative. So you can't think of it like r squared. – BigBendRegion Dec 05 '20 at 11:18

2 Answers


Broadly speaking, a z-score tells us how far a data point is from the mean, measured on the standard-deviation scale.

First, let me take another example. Suppose A has a balance of 100\$ and, after one year of doing business, receives 1\$ in interest. B has 5\$ and, after one year, also receives 1\$ in interest. They earn the same absolute interest (1\$) but very different relative interest (1% vs 20%).

Why is the standard deviation important? Consider the following figure, obtained from the R code below.

[Figure: density curves of $N(0, 3^2)$ and $N(0, 1)$, with a red point at $x = 2$]

The red point is at the same absolute distance from the mean in both distributions, but at very different relative distances. In $N(0, 1)$, the point is nearly in the right tail. In the wider distribution, the red point is still in the central part.

In other words, the z-score gives us an idea where the point is located in the distribution.
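To put numbers on this, the R code places the red point at $x = 2$, and both simulated distributions have mean $0$, with standard deviations $1$ and $3$. Plugging these into the formula gives:

$$Z_{N(0,1)} = \frac{2 - 0}{1} = 2, \qquad Z_{N(0,3^2)} = \frac{2 - 0}{3} \approx 0.67$$

The same absolute distance of 2 is two standard deviations in the narrow distribution but only about two thirds of a standard deviation in the wide one.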

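n=10^6  # number of simulated draws per distribution
x1=rnorm(n, mean=0, sd=3)  # wide distribution, N(0, 3^2)
x2=rnorm(n)  # standard normal, N(0, 1)

d1=density(x1)  # kernel density estimate of the wide sample
d2=density(x2)  # kernel density estimate of the standard normal sample
plot(d1, ylim=c(0, 0.5), xlab = "", main = "")
lines(d2)

points(2, 0, col="red", cex=0.5, pch=16)  # red point at x = 2 on the horizontal axis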
TrungDung

Intuitively, you can think of the z-score as the number of standard deviations above (or, if negative, below) the mean, i.e. how much larger or smaller $X$ is than the expected outcome, measured in standard deviations.

We can start an intuitive derivation in the following way:

  1. We want to know how large X is.
  2. To do this, we compare it to what we expect: $X-\mu$
  3. But $X-\mu$ is still measured in the units of $X$. We want a score that is comparable across different scales.
  4. By dividing by $\sigma$ we get it in terms of standard deviations: $Z = (X-\mu)/\sigma$
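
The steps above can be sketched in R with made-up numbers. The height, mean, and standard deviation below are hypothetical, chosen only for illustration. The sketch also checks that changing the units of $X$ leaves the score unchanged, which is what dividing by $\sigma$ in step 4 buys us:

```r
# Hypothetical values, for illustration only.
x_cm  <- 190   # observed height in centimetres
mu_cm <- 175   # assumed mean height
sd_cm <- 10    # assumed standard deviation

# Step 2: compare X to what we expect (still in centimetres).
diff_cm <- x_cm - mu_cm

# Step 4: divide by sigma to express the difference in standard deviations.
z_cm <- diff_cm / sd_cm

# Repeat with everything converted to inches (1 inch = 2.54 cm).
cm_per_inch <- 2.54
z_in <- (x_cm / cm_per_inch - mu_cm / cm_per_inch) / (sd_cm / cm_per_inch)

print(z_cm)
print(isTRUE(all.equal(z_cm, z_in)))
```

The raw difference depends on the unit (15 cm here), but the z-score is the same whether the inputs are in centimetres or inches, so scores from differently scaled variables can be compared directly.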