Yes, of course. ARIMA models are no different from any other model in this respect. The workflow is always the same: first split your data into a training and a test sample (for time series data, the test sample is always the last observations), then fit the model to the training data, then evaluate its predictions on the test set. In-sample measures of fit tell you almost nothing about forecast accuracy, because a more flexible model will always fit the training data at least as well.
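To make the split-fit-evaluate workflow concrete, here is a minimal sketch in plain Python. It deliberately uses three trivial benchmark forecasters (mean, naive, drift) rather than ARIMA, so that it runs without any packages; the toy series and the helper names `time_ordered_split` and `mae` are made up for illustration.

```python
from statistics import mean

def time_ordered_split(series, n_test):
    """Hold out the LAST n_test observations; a time series is never shuffled."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]

def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Toy series with a steady upward trend (made-up numbers).
series = [10, 12, 13, 15, 16, 18, 19, 21, 22, 24]
train, test = time_ordered_split(series, n_test=3)

h = len(test)
# Every candidate is "fitted" using the training data only.
slope = (train[-1] - train[0]) / (len(train) - 1)
candidates = {
    "mean": [mean(train)] * h,                                   # historical mean
    "naive": [train[-1]] * h,                                    # last observed value
    "drift": [train[-1] + slope * (i + 1) for i in range(h)],    # extend the average change
}

# Models are compared on the held-out test set, not on in-sample fit.
scores = {name: mae(test, preds) for name, preds in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The same pattern applies unchanged to ARIMA: only the step that produces the forecasts from `train` differs.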
The model fitting step also includes determining the ARIMA orders, which is therefore done based on the training data only. Just as in fitting an OLS model, we would determine any transformations or interactions needed based on the training data, not the entire dataset. This is standard practice among (sorry) real forecasters; see any issue of the International Journal of Forecasting.
Incidentally, the procedure outlined in that tutorial for determining the AR and MA orders is iffy. ACF/PACF plots can only be used in this way for "pure" AR(p) or MA(q) models; for a mixed ARMA(p,q) process, both the ACF and the PACF tail off, so the orders cannot be read off the plots. In any case, one nowadays uses a search over possible models based on information criteria, rather than the earlier Box-Jenkins approach. This is implemented in the forecast and fable packages for R. I recommend Forecasting: Principles and Practice by Athanasopoulos & Hyndman, in either the 2nd or the 3rd edition.
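As a rough illustration of order selection by information criterion on the training data only, here is a minimal sketch in plain Python. It is restricted to pure AR(p) models fitted by Yule-Walker equations (via the Levinson-Durbin recursion), which is far simpler than the full ARIMA search in the R packages mentioned above; the simulated data, the seed, and all function names are assumptions made for this example.

```python
import math
import random
from statistics import mean

def autocovariances(x, max_lag):
    """Biased sample autocovariances r[0..max_lag] of the series x."""
    n, m = len(x), mean(x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
            for k in range(max_lag + 1)]

def levinson_durbin(r, max_order):
    """Yule-Walker AR fits for every order 0..max_order.

    Returns a list of (coefficients, innovation_variance), indexed by order.
    """
    phi, var = [], r[0]
    fits = [(phi, var)]
    for k in range(1, max_order + 1):
        kappa = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / var
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        var *= 1 - kappa ** 2
        fits.append((phi, var))
    return fits

def select_ar_order(train, max_order):
    """Pick the AR order with the smallest AIC, using the training data only."""
    fits = levinson_durbin(autocovariances(train, max_order), max_order)
    n = len(train)
    # AIC up to an additive constant that is the same for every order.
    aic = [n * math.log(var) + 2 * p for p, (_, var) in enumerate(fits)]
    best = min(range(len(aic)), key=aic.__getitem__)
    return best, fits[best][0], aic

def forecast_ar(train, phi, horizon):
    """Recursive multi-step forecasts from an AR model with coefficients phi."""
    m = mean(train)
    hist = [v - m for v in train]
    out = []
    for _ in range(horizon):
        pred = sum(c * hist[-1 - j] for j, c in enumerate(phi))
        hist.append(pred)
        out.append(pred + m)
    return out

# Simulated AR(2) data (hypothetical; seed and coefficients are arbitrary).
random.seed(0)
series = [0.0, 0.0]
for _ in range(300):
    series.append(0.6 * series[-1] - 0.3 * series[-2] + random.gauss(0, 1))
series = series[2:]  # drop the two artificial starting values

train, test = series[:-50], series[-50:]  # last 50 observations held out

order, phi, aic = select_ar_order(train, max_order=5)  # test data never seen here
preds = forecast_ar(train, phi, horizon=len(test))
test_mae = mean(abs(a - p) for a, p in zip(test, preds))
print(order, round(test_mae, 3))
```

The point of the design is that `select_ar_order` receives `train` only, so the held-out observations influence neither the chosen order nor the coefficients and are used solely for the final accuracy measure.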