Wouldn't a better scaling factor be the MAE of a naive forecast on the test data itself?
When MASE is evaluated on the training set, it essentially compares the forecast model with a naive one. Why don't we take the same approach with the test set?
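To make the comparison concrete, here is a minimal sketch of the two scalings. The first is the standard MASE, which divides the test MAE by the in-sample MAE of the one-step naive forecast. The second is the alternative asked about. "A naive forecast on the test data" could be read in more than one way, so the sketch assumes one reading: the naive forecast repeats the last training value for every test period. The function names and the toy series are hypothetical and only for illustration.

```python
import numpy as np

def mase_in_sample_scale(y_train, y_test, y_pred):
    """Standard MASE: test MAE divided by the in-sample MAE of the one-step naive forecast."""
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(np.diff(y_train)))  # mean |y_t - y_{t-1}| over the training set
    errors = np.abs(np.asarray(y_test, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.mean(errors) / scale

def mase_test_naive_scale(y_train, y_test, y_pred):
    """Hypothetical alternative: test MAE divided by the test MAE of a naive forecast.
    Assumption: the naive forecast repeats the last training value for every test period."""
    y_test = np.asarray(y_test, dtype=float)
    naive = np.full_like(y_test, float(y_train[-1]))
    scale = np.mean(np.abs(y_test - naive))  # zero if every test value equals y_train[-1]
    errors = np.abs(y_test - np.asarray(y_pred, dtype=float))
    return np.mean(errors) / scale

# Toy data, made up for illustration
y_train = [10, 12, 11, 13, 14]
y_test = [15, 17, 16]
y_pred = [14.5, 15.5, 16.5]

print(mase_in_sample_scale(y_train, y_test, y_pred))   # model MAE / training naive MAE
print(mase_test_naive_scale(y_train, y_test, y_pred))  # model MAE / test naive MAE
```

With the test-set version, the denominator depends on which values fall in the test window. It is also zero, so the ratio is undefined, when every test value equals the last training value.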