Students in linear regression courses are taught that more data is good. They're taught that checking assumptions is good. They're taught that the Shapiro-Wilk test is good. Then they're taught that having a lot of data and testing a normality assumption with the Shapiro-Wilk test is bad, because the test is too good at detecting deviations from normality when the sample is large. When you look at the mathematics of the test, this makes perfect sense. But if you take a step back from the mathematics, you may begin to realize how utterly farcical the situation we've created is. It's like the punchline of a joke about mathematicians and assumptions.
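To make the point concrete, here's a quick simulation sketch (Python with scipy; the t-distribution with 10 degrees of freedom is just my stand-in for "mildly non-normal" data). As the sample grows, the Shapiro-Wilk p-value will typically collapse, even though the deviation is of no practical consequence for regression purposes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Mildly heavy-tailed data: close enough to normal for most regression purposes.
for n in [100, 1000, 5000]:
    x = rng.standard_t(df=10, size=n)
    W, p = stats.shapiro(x)  # scipy notes the p-value approximation degrades for n > 5000
    print(f"n={n:5d}  W={W:.4f}  p={p:.2e}")
```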
In a similar vein to the famous aphorism "all models are wrong, but some are useful," I think we can also admit that "most assumptions in statistics are wrong, but some are close enough." With that in mind, is there a way to quantify the practical relevance of a test result, so that it tells us whether an assumption is "close enough"? I'd like to avoid the farce of "we can't trust our tests because they're too accurate."
EDIT 2: This question is being misunderstood as a question about model assumptions or normality assumptions, but it is not. Here's an attempt to clarify the question. The power of a statistical test is its ability to detect deviations from the null hypothesis, and power increases with sample size. If the sample size is huge, even a trivial deviation from the null can produce a "statistically significant" result. If the effect size is small, we may not care that the test is statistically significant. This is not a problem for tests performed within a model, e.g. a t-test on a coefficient in OLS, because we also have an estimate of the effect size.
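For example (a minimal sketch with simulated data and statsmodels, not a real analysis): with OLS, a huge sample will flag a tiny slope as "significant", but the coefficient estimate itself shows the effect is negligible, so the significance is easy to put in context:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(size=n)
y = 0.005 * x + rng.normal(size=n)   # true slope is practically negligible

fit = sm.OLS(y, sm.add_constant(x)).fit()
# The p-value screams "significant", but the estimated slope tells us it's tiny.
print(f"slope estimate = {fit.params[1]:.4f}, p-value = {fit.pvalues[1]:.2e}")
```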
In many other statistical tests, we do not have an estimate of the effect size. Moreover, we already know that the null hypothesis in many tests is false (e.g. we know that real data is never perfectly normal, and that two populations are never perfectly identical); what we really care about is the effect size. If the sample is small or medium, we can get by without an effect size estimate: a test with low or moderate power won't detect small departures from the null, so statistical significance can be read as a rough proxy for a practically meaningful effect, since that is the only kind of effect such a test could have detected. That scheme breaks down when the sample is large, because the power is so high that even a tiny departure from the null is detected and declared statistically significant. In that case, statistical significance contributes nothing to our understanding of the problem we're trying to address.
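To sketch the kind of thing I'm after (this is just my own rough illustration of what "effect size" could mean here, not a proposed answer): for a normality check, one could report a distance between the empirical distribution and a fitted normal, e.g. the Kolmogorov-Smirnov statistic, which settles toward a fixed "how far from normal" number as the sample grows instead of turning everything into a rejection:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# An effect-size-like summary for a normality check: the KS distance between the
# empirical CDF and a normal fitted to the data. We ignore the p-value from this
# call (it isn't valid with estimated parameters anyway) and keep only the distance,
# which stabilizes as n grows rather than driving an ever-smaller p-value.
for n in [1_000, 10_000, 100_000]:
    x = rng.standard_t(df=10, size=n)
    d = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).statistic
    print(f"n={n:6d}  KS distance from fitted normal = {d:.4f}")
```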
The only purpose of my question is to ask whether there are existing ways to estimate the effect size for such tests, so that they can still be used with large samples.