For an intuition about why $\beta_0 + \beta_1 X_1$ is a $z$-value, I find it easier to look at the latent variable interpretation of the probit model:
$Y^\ast = X^T\beta + \varepsilon$
where $X^T\beta = \beta_0 + \beta_1 X_1$, $\varepsilon \sim N(0, 1)$ is the error, and
$Y = \left.\begin{cases} 1 & Y^* > 0 \\
0 &\text{otherwise} \end{cases} \right\} = \begin{cases} 1 & - \varepsilon < \beta_0 + \beta_1 X_1, \\
0 &\text{otherwise}. \end{cases}$
That is, $P(Y = 1 \mid X_1) = P(-\varepsilon < \beta_0 + \beta_1 X_1) = \Phi(\beta_0 + \beta_1 X_1)$, because $-\varepsilon$ is also standard normal. So $\beta_0 + \beta_1 X_1$ being a $z$-value follows from the assumption that the error $\varepsilon$ has a standard normal distribution. This assumption costs nothing: any error mean can be moved to 0 by adding a constant to the intercept $\beta_0$, and any error standard deviation can be moved to 1 by dividing all coefficients ($\beta_0$ and $\beta_1$) by that standard deviation. An arbitrary normal distribution could be used instead, but since its mean and standard deviation can always be absorbed into the coefficients in this way, we lose no generality by sticking with the standard normal.
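To make the "no loss of generality" point concrete, here is a minimal sketch using only Python's standard library. It assumes a latent model $Y^\ast = b_0 + b_1 x + e$ with $e \sim N(\mu, \sigma^2)$; the names `b0`, `b1`, `mu`, `sigma` and all numbers are made up for illustration. It checks that this gives the same $P(Y = 1 \mid x)$ as a standard-normal error with coefficients $(b_0 + \mu)/\sigma$ and $b_1/\sigma$.

```python
from statistics import NormalDist


def prob_general(x, b0, b1, mu, sigma):
    """P(Y = 1 | x) when Y* = b0 + b1*x + e with e ~ N(mu, sigma^2)."""
    err = NormalDist(mu, sigma)
    # Y = 1 iff Y* > 0, i.e. iff e > -(b0 + b1*x)
    return 1.0 - err.cdf(-(b0 + b1 * x))


def prob_standard(x, b0, b1, mu, sigma):
    """Same probability from the standardized model with e ~ N(0, 1)."""
    beta0 = (b0 + mu) / sigma  # error mean absorbed into the intercept
    beta1 = b1 / sigma         # error scale absorbed into all coefficients
    z = beta0 + beta1 * x      # this is the z-value
    return NormalDist().cdf(z)


# Illustrative (made-up) parameter values.
params = dict(b0=-1.0, b1=0.8, mu=0.5, sigma=2.0)

for x in (-2.0, 0.0, 1.5, 4.0):
    p_gen = prob_general(x, **params)
    p_std = prob_standard(x, **params)
    print(f"x={x:5.1f}  general={p_gen:.6f}  standardized={p_std:.6f}")
```

The two columns should agree up to floating-point error, which is why the data cannot tell the two parameterizations apart and we may as well fix the error to $N(0, 1)$.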
When plotting, we're usually not interested in the $z$-value $\beta_0 + \beta_1 X_1$ but rather in $X_1$ on its original scale. Also, $\beta_0$ and $\beta_1$ only set the location and scale of the curve along the $X_1$ axis, so they are not interesting in themselves when plotting: the curve always has the same shape, and its value, as a probability, always lies in the $[0, 1]$ range of $\Phi$. Another way to look at it is that when we plot, we shift the model's error (standard normal) back into the data's error (some other normal). Mainly though, plotting on the $X_1$ scale is driven by convenience, to make interpretation easier.
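As a rough illustration of that "other normal": assuming $\beta_1 > 0$, the curve $\Phi(\beta_0 + \beta_1 x)$ plotted against $x$ is the CDF of a normal distribution with mean $-\beta_0/\beta_1$ and standard deviation $1/\beta_1$. The sketch below uses made-up coefficients (`beta0 = -3.0`, `beta1 = 0.5`) and the standard library's `NormalDist` to compare the two ways of computing the plotted probability.

```python
from statistics import NormalDist

# Hypothetical probit coefficients, chosen only for illustration (beta1 > 0).
beta0, beta1 = -3.0, 0.5

std_normal = NormalDist()  # the model's error distribution, N(0, 1)
# Normal distribution on the original X1 scale implied by the coefficients.
x_scale = NormalDist(mu=-beta0 / beta1, sigma=1.0 / beta1)


def p_via_z(x):
    """Probability computed on the z scale: Phi(beta0 + beta1 * x)."""
    return std_normal.cdf(beta0 + beta1 * x)


def p_via_x(x):
    """Same probability read directly off the X1-scale normal CDF."""
    return x_scale.cdf(x)


for x in (0.0, 4.0, 6.0, 8.0, 12.0):
    print(f"x={x:5.1f}  via z: {p_via_z(x):.6f}  via x: {p_via_x(x):.6f}")
```

With these numbers the implied normal has mean 6 and standard deviation 2, so the plotted curve crosses 0.5 at $x = 6$.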