An answer from game theory would be that you should use the Shapley value. In a nutshell, the Shapley value tells you, if you have a company whose M employees together create some amount of profit, how to share that profit fairly amongst them, where fairness is defined relative to each employee's productivity / value added.
The analogue in machine learning (sticking with binary classification for the sake of explanation, but similar analogues can be drawn in regression) goes as follows. Say your classifier is predicting $P(Y=1|\underline{x})=0.83$, whereas the base rate, i.e. $P(Y=1)$ in your data, is perhaps 0.53. For this example, then, the predicted probability of Y being equal to 1 is 0.3 higher than the base rate. This is like a company which made 0.3 units of profit and wants to share that profit out fairly amongst its M employees (where M is the dimensionality of $\underline{x}$, i.e. the number of features).
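To make the profit-sharing analogy concrete, here is a tiny numerical sketch using the numbers above. The feature names and their individual attributions are entirely made up, but the key "efficiency" property, that the attributions must sum exactly to the prediction minus the base rate, is real.

```python
# Toy illustration of the profit-sharing analogy, using the numbers above.
# The feature names and per-feature attributions are invented for illustration.
prediction = 0.83                   # P(Y=1 | x) for this individual
base_rate = 0.53                    # P(Y=1) in the data
profit = prediction - base_rate     # 0.30 "units of profit" to share out

# A hypothetical split across a 3-feature model:
shapley_values = {"income": 0.18, "credit_score": 0.14, "age": -0.02}

# Efficiency property: the per-feature attributions sum exactly to the profit.
assert abs(sum(shapley_values.values()) - profit) < 1e-9
```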
In order to calculate these Shapley values, you need to know what your classifier would predict if it only had access to each subset of the full feature set (these subsets are known as coalitions). There are $2^{M}$ such coalitions, so calculating them in practice is difficult and certainly requires numerical trickery for all but the simplest cases. For the state of the art in this field, see here (disclaimer: I am not an author of that paper, but I do work with some of the authors). For an open source package which takes some shortcuts but is easy to use, try TreeShap.
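To make the coalition bookkeeping explicit, here is a brute-force sketch of the exact computation. It assumes you already have some value function `v(coalition)` that returns the model's prediction when only that subset of features is available (in practice this itself has to be approximated, e.g. by marginalising out the missing features), and it enumerates all $2^{M}$ coalitions, which is exactly why it is only feasible for small M.

```python
from itertools import combinations
from math import factorial

def exact_shapley(v, features):
    """Brute-force Shapley values.

    v: callable taking a frozenset of feature names and returning the model's
       prediction using only those features (building this is the hard part).
    features: list of feature names.
    Enumerates all 2^M coalitions, so only feasible for small M.
    """
    M = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        # Weighted marginal contribution of feature i over every coalition S not containing i.
        for k in range(M):
            for S in combinations(others, k):
                S = frozenset(S)
                weight = factorial(len(S)) * factorial(M - len(S) - 1) / factorial(M)
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi

# Toy usage with a purely additive (made-up) value function over three features:
base_rate = 0.53
contributions = {"income": 0.18, "credit_score": 0.14, "age": -0.02}
v = lambda S: base_rate + sum(contributions[f] for f in S)
print(exact_shapley(v, list(contributions)))   # recovers each feature's contribution exactly
```

Tree-specific explainers like the one in the shap package exploit the structure of tree ensembles to avoid this exponential enumeration, which is where the "shortcuts" mentioned above come in.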
Note: it is very tempting to interpret Shapley values causally, and this is simply incorrect. For example, if your feature vector $\underline{x}$ contains two features which are highly correlated, where one causally affects the target and the other does not, they will likely have similar Shapley values (there are ways around this using asymmetric Shapley values, when you know the causal relationships, but Shapley values can't help you determine what's causative and what's correlative if you don't already know). The Shapley value of a feature must be interpreted strictly as "of the amount by which our model predicts this example to be more or less likely than the baseline to belong to the positive class, this much is attributable to this feature."
Even then, this can be somewhat disappointing/counterintuitive. Using the above example again, when you have two features which are highly correlated (perhaps one equals the other plus white noise), they will likely have very similar Shapley values. If, however, you retrained your model with one of these features removed, the remaining feature would roughly double its Shapley value, absorbing the share previously credited to the feature you dropped.
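Here is a rough sketch of that retraining effect. It leans on the fact that, for a linear model with features treated as independent, the Shapley value of feature j at a point $\underline{x}$ has the closed form $\beta_j (x_j - \bar{x}_j)$; the data, feature names and the choice of Ridge(alpha=500) are all invented for illustration.

```python
# Rough sketch of the "doubling" effect when a correlated duplicate is removed.
# Attributions are computed by hand via coef_j * (x_j - mean_j), the closed-form
# Shapley value of a linear model under a feature-independence assumption.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 5_000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)           # x2 = x1 plus white noise (highly correlated)
y = x1 + 0.1 * rng.normal(size=n)            # only x1 actually drives the target

X_both = np.column_stack([x1, x2])
m_both = Ridge(alpha=500).fit(X_both, y)     # regularisation splits weight across the duplicates
m_one = Ridge(alpha=500).fit(x1.reshape(-1, 1), y)

x_star = np.array([2.0, 2.0])                # the individual we want to explain
phi_both = m_both.coef_ * (x_star - X_both.mean(axis=0))
phi_one = m_one.coef_ * (x_star[:1] - x1.mean())

print("Both features kept :", phi_both)      # credit shared roughly evenly between x1 and x2
print("x2 removed, refit  :", phi_one)       # x1's attribution roughly doubles
```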
The main utility, in my opinion, is exactly for the kind of use case you are describing. Broadly speaking, you know that low-income, low-credit-score individuals are more likely to be rejected for a loan, but it's nice to know, at the individual level, which feature is being used more. This tells you nothing causative; it doesn't tell you how to improve your chances of not defaulting on a loan, but it can tell you how to improve your chances of getting a loan (i.e. should I work on my credit score, or do I need a pay rise, before the algorithm changes its mind about me?).