Why would you expect the residuals to be normal, or even independent of each other, or to have a constant error variance over time? auto.arima does not perform tests of parameter significance, so distributional assumptions matter only if you care about confidence limits for the forecasts, as we all should.
auto.arima simply fits a set of presumed models and picks the best of the set that was tried. That is fitting, not modelling in the larger sense.
It is also not what Box & Jenkins had in mind. The flow chart at https://autobox.com/pdfs/ARIMA%20FLOW%20CHART.pdf comes closer: it reflects an iterative, self-checking process that culminates in separating the data into signal (the forecast) and noise (the random component).
The problem you are having is probably due to auto.arima not dealing well with deterministic structure in the series. Left unmodelled, that structure yields a few large errors, and those produce a skewed (non-normal) residual distribution.
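To make that mechanism concrete, here is a minimal sketch in plain Python (standard library only). It is not what auto.arima does internally. It simulates an AR(1) series, adds a few one-time pulses, and fits a least-squares AR(1) to both versions. The pulse positions and sizes, the seed, and the helper names `simulate_ar1`, `fit_ar1` and `skewness` are all assumptions made up for the illustration.

```python
import random
import statistics


def simulate_ar1(n, phi, seed):
    """Simulate y[t] = phi * y[t-1] + e[t] with standard normal errors."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y


def fit_ar1(y):
    """Least-squares AR(1) with intercept; returns (phi_hat, residuals)."""
    x, z = y[:-1], y[1:]
    mx, mz = statistics.fmean(x), statistics.fmean(z)
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi_hat = sxz / sxx
    intercept = mz - phi_hat * mx
    resid = [b - intercept - phi_hat * a for a, b in zip(x, z)]
    return phi_hat, resid


def skewness(values):
    """Sample skewness: mean of cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Clean AR(1) series, then the same series with four hypothetical pulses added.
clean = simulate_ar1(300, 0.7, seed=1)
pulses = {50: 12.0, 120: 15.0, 200: 10.0, 260: 14.0}  # assumed, for illustration
contaminated = [v + pulses.get(t, 0.0) for t, v in enumerate(clean)]

phi_clean, res_clean = fit_ar1(clean)
phi_cont, res_cont = fit_ar1(contaminated)
skew_clean = skewness(res_clean)
skew_cont = skewness(res_cont)

print(f"clean series:  phi_hat={phi_clean:.2f}, residual skewness={skew_clean:.2f}")
print(f"with pulses:   phi_hat={phi_cont:.2f}, residual skewness={skew_cont:.2f}")
```

With the pulses left unmodelled, the residuals should come out clearly right-skewed and the estimated autoregressive coefficient should be pulled toward zero, even though the underlying process is identical in both fits.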
"The correlogram should be calculated from residuals using a model that controls for intervention administration, otherwise the intervention effects are taken to be Gaussian noise, underestimating the actual autoregressive effect."
In other words, for auto.arima to be useful you need all of the following:
1) a series with no pulses, level shifts, seasonal pulses, or deterministic time structure such as trends;
2) a series where the parameters of the underlying ARIMA model are constant over time;
3) a series where the error variance of the underlying ARIMA model does not change deterministically at different points in time.
Failing one or more of these conditions is probably what led to your conclusion.
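As a rough way to probe condition 3, one can fit a single model to the whole series and compare the residual variance before and after a candidate break point. The sketch below is only an illustration under assumed settings: the series is simulated with an error standard deviation that jumps from 1 to 3 at the midpoint, the break point is taken as known, and `ar1_residuals` is a hypothetical helper, not part of any forecasting package.

```python
import random
import statistics


def ar1_residuals(y):
    """Residuals from a least-squares AR(1) fit with intercept."""
    x, z = y[:-1], y[1:]
    mx, mz = statistics.fmean(x), statistics.fmean(z)
    phi_hat = (sum((a - mx) * (b - mz) for a, b in zip(x, z))
               / sum((a - mx) ** 2 for a in x))
    intercept = mz - phi_hat * mx
    return [b - intercept - phi_hat * a for a, b in zip(x, z)]


# Simulate an AR(1) whose error standard deviation changes at the midpoint.
rng = random.Random(7)
n, phi = 400, 0.5
y, prev = [], 0.0
for t in range(n):
    sd = 1.0 if t < n // 2 else 3.0  # assumed variance change, for illustration
    prev = phi * prev + rng.gauss(0.0, sd)
    y.append(prev)

resid = ar1_residuals(y)
half = len(resid) // 2
var_first = statistics.pvariance(resid[:half])
var_second = statistics.pvariance(resid[half:])
variance_ratio = var_second / var_first

print(f"residual variance, first half:  {var_first:.2f}")
print(f"residual variance, second half: {var_second:.2f}")
print(f"ratio: {variance_ratio:.2f}")
```

A ratio far from 1 suggests the constant-variance condition fails. In real data the break point is not known in advance and has to be searched for, which this sketch does not attempt.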
By the way, one doesn't use a power transform willy-nilly, as there can be negative side effects, i.e. unexpected consequences. See "When (and why) should you take the log of a distribution (of numbers)?"