If I understand correctly, I think you are noticing a general tendency in machine learning - that everything is guided by "trial and error", by "experimentation", rather than by some set of overarching rules.
In many cases, when asked "Why did you do X or Y", you might not be able to give an answer any better than "well, that just works best on the data".
If I understand your question correctly, it's a super interesting, philosophical one that probably doesn't have a single, straightforward answer. I will give my thoughts on the subject, though.
I think there are a huge number of possible perspectives, but here is one. Our world is an insanely complex interaction of innumerable physical laws. The results of these interactions often have noticeable distributions, correlations, etc.
Machine learning is not specific to any field - it is simply a "universal"/general set of methods that attempt to extract patterns from data of any kind. Machine learning isn't built to incorporate the laws of physics, or to take into account the physical/chemical/psychological/geological reasons behind observations. It is simply built for "put in distributions of data, get predictions/structure as output".
There are any number of possible variable interactions, correlations, and other kinds of "informational structure" in the world, and machine learning attempts to capture this structure - but it is blind to the "reason" behind it. The job of a RandomForest isn't to discover why people click on Facebook ads. Its job is just to optimize a function representing its predictive accuracy on whether a given person will click or not.
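To make that concrete, here is a minimal sketch with entirely synthetic data (the click-prediction feature names in the comments are made up for illustration). The forest never sees the hidden rule that generates the labels; it only gets to optimize accuracy on them:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000
# Three hypothetical user features, e.g. age, time_on_site, past_clicks.
X = rng.normal(size=(n, 3))
# A hidden "law" generates the clicks; the model is never told this rule.
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=n) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# All the forest ever reports back is a number: predictive accuracy.
print(cross_val_score(clf, X, y, cv=5).mean())
```

The forest may capture the quadratic dependence on the second feature just fine, yet nothing in its output says "because the generating rule was quadratic" - that explanation lives outside the model.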
As a result, it is (in many ways) an "art" - because as long as you're not violating a very small number of core assumptions (avoiding data leakage, etc.), it simply doesn't matter what you do with the data as long as it delivers a good result. Want to do feature selection first, then hyperparameter tuning? Try it. Want to do it the other way around? Try it! There are no laws to break here - there are only good and bad predictive results. You get bad results when the process doesn't allow the ML algorithm to capture useful patterns in the data (if there are any). Good results happen when (for any number of reasons) the algorithm was given information in a way that allowed it to capture enduring patterns/dependencies.
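Here is a hedged sketch of that "just try it" workflow in scikit-learn (the dataset, model, and parameter grid are arbitrary illustrative choices). Putting feature selection and the model into one pipeline means the search re-runs the selection inside every cross-validation fold, which is how you respect the one real rule (no leakage) while freely experimenting with everything else:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=1000, n_features=30,
                           n_informative=5, random_state=0)

# Selection and model live in one pipeline, so every CV fold redoes the
# selection on its own training split - no label information leaks.
pipe = Pipeline([
    ("select", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# "How many features" and "how much regularization" are tuned jointly;
# there is no law dictating which to settle first.
grid = GridSearchCV(
    pipe,
    {"select__k": [5, 10, 20], "clf__C": [0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Whatever combination wins, the only justification on offer is the cross-validated score itself.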
"Why is the algorithm doing this?" . On a most base level, the answer will always just be "because of how the algorithm is designed to work + the data it was given". Any interpretation beyond that will involve subject matter expertise and careful and deep dives into the data. The algorithm doesn't care about that interpretation - it's just optimizing a function for the data that you give it. Any interpretation on top of that is your attempt at understanding the process that generated the data, and how that could have led to the algorithm noticing X or Y.
I know this was a bit vague and hand-wavy, but I hope it gives some space to think about ML broadly and why it is so experimentally focused.
At the moment, a huge amount of ML is focused on prediction - so as long as the model gives good predictions, anything else is, in a way, irrelevant. You just know: "The particular way I'm digesting the data and feeding it to the algorithm results in real-world information being combined in such a way that there are noticeable informational/mathematical dependencies being captured by this sequence of optimization steps I call 'my ML model'".
Important Addition: I think feature engineering (for predictive purposes) is a great illustration of the point. When engineering features, you don't necessarily need to be guided by any principles. Take the log() of the feature, normalize it, take a moving average, take a Z score. You can do all these things, and when you do (and if it works), you are left asking the same question - why did taking a Z score help? You might find an interpretable answer. But part of the answer will always be something like "Because, thanks to whatever physical laws govern the complex interaction you're observing, digesting the information in that particular way makes the pattern in the data clearer, so your model can pick it up more easily". I think that kind of thinking applies generally to this whole question.
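To illustrate, here is a hedged sketch under made-up assumptions: the hidden rule happens to be linear in log-space, so of the candidate transforms, log() "just works", and the cross-validated score is the only arbiter. We can see why it wins only because we wrote the generating rule ourselves; with real data you would be left with exactly the question above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 4000
X = rng.lognormal(mean=0.0, sigma=1.0, size=(n, 2))   # two skewed raw features
# Hidden rule: roughly "click iff x1 * x2 > 1", i.e. log(x1) + log(x2) > 0.
y = (np.log(X[:, 0]) + np.log(X[:, 1]) + rng.normal(scale=0.3, size=n) > 0).astype(int)

candidates = {
    "raw":     X,
    "log":     np.log(X),
    # Z-scoring on the full data is mild leakage; in real work, fit the
    # scaler inside each CV fold. Kept simple here for illustration.
    "z_score": (X - X.mean(axis=0)) / X.std(axis=0),
}
for name, feats in candidates.items():
    score = cross_val_score(LogisticRegression(max_iter=1000), feats, y, cv=5).mean()
    print(f"{name:8s} {score:.3f}")
```

The log transform should come out clearly ahead, and the Z score should barely matter - not because of any ML principle, but because of the particular (here, invented) law that generated the data.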