There is a trend in machine learning tooling toward making things easier and easier for implementers, which is a very natural engineering concern: easy APIs to create any kind of model you want, easy infrastructure to manage versions of data and models, easy deployment of models as APIs. One of these trends is AutoML, an end-to-end process that produces a model (selected out of many) from only a few general hyperparameters, hiding more and more of the usual statistical workflow, to the point of reducing the need to understand the many hard-to-learn nuances of the statistical practices involved.
At the other end of the spectrum are the efforts to address the replication crisis occurring in many scientific fields, a crisis driven largely by poor use of statistics: confusing statistical significance with effect size, p-hacking, HARKing, and other superficial applications of statistical methods. All of this asks the people who use these tools to understand the nuances of statistical thinking more, not less.
Details are missing about the innards of AutoML: is it running an SVM, a logistic regression, and a random forest with multiple kernels, hyperparameters, etc.? Is it following basic defensive statistics like a Bonferroni correction? Or is it just jumping straight to picking the best p-value (or score) out of all of them?
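To make that last worry concrete, here is a minimal sketch of what I imagine such a search could look like; this is written against scikit-learn rather than any actual AutoML library, and the candidate list, dataset, and selection rule are all assumptions for illustration, not a description of how any real tool works:

```python
# Hypothetical sketch: try several model families and hyperparameters,
# then keep whichever has the best cross-validation score.
# Nothing here adjusts for how many candidates were tried.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

candidates = {
    "logreg_C=0.1": LogisticRegression(C=0.1, max_iter=1000),
    "logreg_C=1.0": LogisticRegression(C=1.0, max_iter=1000),
    "svm_rbf": SVC(kernel="rbf"),
    "svm_linear": SVC(kernel="linear"),
    "rf_100": RandomForestClassifier(n_estimators=100, random_state=0),
    "rf_500": RandomForestClassifier(n_estimators=500, random_state=0),
}

# Report the single best mean CV score across all candidates -- the
# analogue of reporting the best p-value from many tests, uncorrected.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

If the real tools do something like this and nothing more, the reported score of the winning model is a maximum over many comparisons, which is exactly the situation the defensive corrections exist for.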
I've set this up as a dichotomy between ease of use in engineering and correctness of thought in the statistical procedures. AutoML seems like a great thing for creating successful models. But then I wonder whether these tools are not just ignoring the entire history of statistical thinking, but actively running away from it.
Are AutoML researchers successfully accounting for these statistical nuances, or are they enabling even more problematic models by ignoring them (e.g., by selecting among far too many candidate models for the amount of data available)? And, conversely, are statisticians making it harder to build reputable models? As a side question, is this characterization of AutoML as a statistically problematic procedure even accurate?
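To show why "too many models for the amount of data" worries me, here is a small, entirely hypothetical simulation (again scikit-learn, with made-up sample sizes and candidates): fit a handful of models to purely random labels and keep the best cross-validated accuracy. The winner will typically look better than chance even though there is nothing to learn, which is the same selection effect that p-hacking exploits.

```python
# Small-n, no-signal setting: the labels are pure noise, so no model can
# genuinely do better than 50% accuracy. Yet the *best* of many candidates
# usually reports above-chance CV accuracy, purely from selecting the max.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))       # small sample, moderately wide feature set
y = rng.integers(0, 2, size=60)     # labels carry no signal at all

candidates = [
    LogisticRegression(C=c, max_iter=1000) for c in (0.01, 0.1, 1, 10)
] + [
    SVC(kernel=k) for k in ("linear", "rbf", "poly")
] + [
    RandomForestClassifier(n_estimators=n, random_state=0) for n in (50, 200)
]

scores = [cross_val_score(m, X, y, cv=5).mean() for m in candidates]
print("best CV accuracy on pure noise:", max(scores))  # typically above 0.5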
I suppose the TL;DR to all of this is: is AutoML just p-hacking across all the models?