The question of "which feature has the most impact on the target" (basically your Q1) is a common one, and it comes in many flavours. I do, however, note that you specifically mentioned the bad result being "caused" by x_3. There is basically no way to conclusively prove from raw data alone that x_3 "causes" bad results.
To use a simple example, if the result column represented how long somebody lives (and thus low result = bad), and the x_3 column represented how much they spent in their life on running shoes, you would likely find a correlation. People who spend more on running shoes likely live longer. This is of course because spend on running shoes is likely correlated with things that tend to actually cause you to live longer, such as exercising more, or generally having more disposable income, which is in turn correlated with better access to healthcare.
The problem with interpreting this as causation is that you would incorrectly conclude that "in order to live longer, spend more money on running shoes". People who have no intention of exercising would spend more on running shoes (thus breaking the correlations that existed in your training data) and they wouldn't start living longer. Obviously this is a silly example because it's all very intuitive, but most data is less intuitive than this.
The only way you can truly test for causation is to run a test in which you randomise the value of the variable whose causative properties you're trying to establish, which in many useful real-life situations is very difficult or impossible to do. To de-jargonise this a little: in your case, I'm saying you'd need x_3 to be a feature which you, as the experiment designer, are able to vary randomly without changing the values of any other features (and to be clear, this means features we have access to, and ones we don't). The value of x_3 must not be predictive of anything apart from the result variable.
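To make the running-shoes point concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the hidden `exercise` confounder, the coefficients, and the variable names are assumptions, not anything from your data. Lifespan depends only on exercise, yet observed shoe spend still correlates with it, while randomly assigned shoe spend does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical hidden confounder: how much each person exercises.
exercise = rng.normal(size=n)

# Lifespan depends ONLY on exercise (plus noise); shoes have no causal effect.
lifespan = 75.0 + 3.0 * exercise + rng.normal(size=n)

# Observational data: people who exercise more also buy more running shoes.
shoe_spend_observed = 2.0 * exercise + rng.normal(size=n)

# Randomised experiment: the experimenter assigns shoe spend independently
# of everything else, including the confounder we cannot see.
shoe_spend_randomised = rng.normal(size=n)

corr_observed = np.corrcoef(shoe_spend_observed, lifespan)[0, 1]
corr_randomised = np.corrcoef(shoe_spend_randomised, lifespan)[0, 1]

print(f"observational correlation: {corr_observed:.2f}")
print(f"randomised correlation:    {corr_randomised:.2f}")
```

The observational correlation comes out strongly positive, while the randomised one sits near zero. That gap is exactly what raw observational data cannot reveal on its own.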
Generally, all you'll be able to do with your data is establish the extent to which x_3 predicts your result. This should only be interpreted as "if I know the value of x_3, how much more accurately can I predict result than if I did not know?" and not "if I want to get a better result, is taking action to get a favourable x_3 a viable strategy?"
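One rough way to put a number on "how much more accurately can I predict result if I know x_3" is to compare held-out $R^{2}$ with and without that column. The sketch below uses synthetic stand-in data and a plain least-squares model. The helper `holdout_r2`, the coefficients, and the 70/30 split are all assumptions for illustration, and you could swap in any other model.

```python
import numpy as np

def holdout_r2(X, y, train_frac=0.7, seed=0):
    """Fit ordinary least squares on a random train split; return R^2 on the rest."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(train_frac * len(y))
    train, test = idx[:cut], idx[cut:]
    # Prepend a column of ones so the model has an intercept.
    A_train = np.column_stack([np.ones(len(train)), X[train]])
    A_test = np.column_stack([np.ones(len(test)), X[test]])
    coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    resid = y[test] - A_test @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for your table (made-up coefficients).
rng = np.random.default_rng(1)
n = 2_000
x1, x2, x3 = rng.normal(size=(3, n))
result = 1.0 * x1 - 0.5 * x2 + 2.0 * x3 + rng.normal(size=n)

# Same seed means both models are scored on the same held-out rows.
r2_with = holdout_r2(np.column_stack([x1, x2, x3]), result)
r2_without = holdout_r2(np.column_stack([x1, x2]), result)

print(f"held-out R^2 with x_3:    {r2_with:.2f}")
print(f"held-out R^2 without x_3: {r2_without:.2f}")
```

The difference between the two scores is a statement about predictive value only. It says nothing about what would happen if you intervened on x_3.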
On Q2: yes, fit a linear regression of x_3 on x_1 and x_2, then plot your residuals. If they appear normally distributed, or at least distributed according to some sensible distribution, you can say that x_3 is a linear combination of x_1 and x_2 up to a noise term with that distribution. If that's not the case, you can conclude it's not a linear combination.
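A minimal sketch of the Q2 check, on synthetic data where x_3 really is linear in x_1 and x_2 (the coefficients and noise level are made up). Plotting a histogram of `residuals` would be the natural thing to do. To keep this self-contained, I summarise their shape with skewness and excess kurtosis instead, both of which should be near zero for roughly normal residuals.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Assumed ground truth for the demo: linear in x1, x2 plus Gaussian noise.
x3 = 0.8 * x1 - 1.5 * x2 + 0.3 * rng.normal(size=n)

# Least-squares fit of x3 on [1, x1, x2].
A = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(A, x3, rcond=None)
residuals = x3 - A @ coef

# Rough shape summaries of the residuals (standardised third and fourth moments).
z = (residuals - residuals.mean()) / residuals.std()
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0

# In-sample R^2; valid here because the fit includes an intercept.
r2 = 1.0 - residuals.var() / x3.var()

print("intercept, coef on x_1, coef on x_2:", np.round(coef, 3))
print(f"R^2: {r2:.3f}  skewness: {skewness:.3f}  excess kurtosis: {excess_kurtosis:.3f}")
```

On your real data, it is also worth plotting the residuals against x_1 and x_2 individually, since a curved pattern there is a quick visual hint that the relationship is not linear.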
On Q3: While this isn't conclusive, you could simply fit your favourite regressor (e.g. random forest, xgboost or a neural network, depending on data size) using x_1 and x_2 as features and x_3 as the target. If you manage to predict x_3 quite well (e.g. a good test $R^{2}$ value) and your residuals look sensibly distributed, then you can conclude that x_3 is well-described by a non-linear combination of x_1 and x_2.
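Here is a rough sketch of the Q3 idea. To keep it dependency-light I have swapped in a tiny k-nearest-neighbours regressor written in NumPy as a stand-in for "your favourite regressor". It is not a random forest or xgboost, and in practice you would use one of those. The data is synthetic, with a made-up non-linear relationship, and the helper names `knn_predict` and `r2_score` are my own.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=10):
    """Predict each test point as the mean target of its k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k smallest distances in each row (order within them is arbitrary).
    nearest = np.argpartition(d2, k, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(3)
n = 2_000
x1 = rng.uniform(-2, 2, size=n)
x2 = rng.uniform(-2, 2, size=n)
# Assumed non-linear ground truth for the demo.
x3 = np.sin(2 * x1) * x2 + 0.1 * rng.normal(size=n)

X = np.column_stack([x1, x2])
cut = int(0.75 * n)  # rows are already in random order, so a simple split is fine
X_train, X_test = X[:cut], X[cut:]
y_train, y_test = x3[:cut], x3[cut:]

# Non-linear model: test R^2.
r2_knn = r2_score(y_test, knn_predict(X_train, y_train, X_test))

# Linear baseline on the same split, for comparison.
A_train = np.column_stack([np.ones(cut), X_train])
A_test = np.column_stack([np.ones(n - cut), X_test])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
r2_linear = r2_score(y_test, A_test @ coef)

print(f"test R^2, k-NN:   {r2_knn:.2f}")
print(f"test R^2, linear: {r2_linear:.2f}")
```

A high test $R^{2}$ for the flexible model alongside a poor one for the linear baseline is the pattern that would suggest x_3 is a non-linear function of x_1 and x_2.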