If you assume that a process $y_t$ can be modelled as an $ARIMA(p,d,q)$, then it makes no difference whether you first difference the data yourself and then fit an $ARMA(p,q)$ model, or use a routine that fits the $ARIMA(p,d,q)$ model directly. For $d=1$, with the lag operator $Ly_t=y_{t-1}$, the manual transformation is $(1-L)y_t=\Delta y_t$, and the direct fit is an $ARIMA(p,1,q)$.
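A minimal sketch of this equivalence, using only the Python standard library. The helper names (`difference`, `fit_ar1`, `fit_arima_1d0`) are hypothetical and written for illustration. The "direct" routine is assumed to difference internally and then fit the same model, and the AR(1) coefficient is estimated by conditional least squares rather than by the maximum likelihood that real packages typically use:

```python
import random

def difference(y, d=1):
    """Apply (1 - L) to the series d times."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y

def fit_ar1(x):
    """Conditional least-squares estimate of phi in x_t = phi * x_{t-1} + e_t."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def fit_arima_1d0(y, d):
    """Hypothetical 'direct' routine: difference d times internally, then fit AR(1)."""
    return fit_ar1(difference(y, d))

# Simulate an ARIMA(1,1,0): the first difference follows an AR(1) with phi = 0.6.
random.seed(0)
phi = 0.6
dy = [0.0]
for _ in range(2000):
    dy.append(phi * dy[-1] + random.gauss(0.0, 1.0))

# Integrate the differences once to obtain the series in levels.
y, level = [], 0.0
for step in dy:
    level += step
    y.append(level)

phi_manual = fit_ar1(difference(y))   # difference first, then fit the AR(1)
phi_direct = fit_arima_1d0(y, d=1)    # let the routine do the differencing
print(phi_manual, phi_direct)
```

Both routes run the same computation on the same differenced series, so the two estimates coincide. With a real package, small numerical differences may still appear, for example from how initial observations or a constant term are handled.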
The only difference from a computational perspective arises when your estimation procedure uses statistical tools to determine the order of integration $d$ automatically (the number of times the data must be differenced until it is stationary). In that case you can encounter the following scenarios:
- You have differenced your data although the automatic routine would not have done so. If you were right to reject the order of integration proposed by the software, you made the right call. Otherwise you have overdifferenced and lose information that the relationship in levels would have let you exploit.
- You have not differenced the data enough, so fitting the ARMA instead of the ARIMA makes a difference because the automatic order determination would difference one more time. In this case, certain asymptotic results break down since the series you are modelling may be non-stationary. Model estimation will still be consistent though.
- You have differenced exactly the number of times suggested by the software. In this case there is no difference at all for ordinary software packages, because the order of integration is not estimated jointly with the AR and MA components; it is usually determined by some sort of unit root test before the actual estimation happens.
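The two-stage procedure behind these scenarios can be sketched roughly as follows. This is an illustrative assumption, not how any particular package works: `df_tstat` and `choose_d` are hypothetical helpers, the test is a plain Dickey-Fuller regression with a constant and no lagged differences, and `-2.86` is the approximate asymptotic 5% critical value for that case.

```python
import math
import random

def df_tstat(y):
    """t-statistic on rho in the regression dy_t = a + rho * y_{t-1} + e_t."""
    x = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    mx = sum(x) / n
    mdy = sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - mdy) for xi, di in zip(x, dy))
    rho = sxy / sxx
    a = mdy - rho * mx
    rss = sum((di - a - rho * xi) ** 2 for xi, di in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)
    return rho / se

def choose_d(y, max_d=2, crit=-2.86):
    """Difference until the unit root is rejected (assumed approx. 5% critical value)."""
    d = 0
    while d < max_d and df_tstat(y) > crit:
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
        d += 1
    return d

random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]  # stationary series
walk, level = [], 0.0
for e in noise:                                        # random walk built from it
    level += e
    walk.append(level)

print(choose_d(noise), choose_d(walk))
```

The order $d$ is fixed by the test before any AR or MA coefficient is estimated. If you have already differenced the series by hand, the routine simply starts its tests from your transformed data.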
In conclusion, the pros and cons of differencing the data yourself versus letting, e.g., an R function do it automatically come down to whether you trust your own judgement or the routine's more, given the data you are trying to model.