
One could hypothesize that factors x1, x2, x3 predict y, then test that hypothesis using a statistical model or machine learning method restricted to those predictors, with pre-specified criteria for success. Yet no cause-and-effect relationship is demonstrated per se, nor is an if/then relationship ventured. Given the ability of these methods to "predict" given enough data and even modest manipulation, the ability to predict y will almost always hold given a large enough sample size. Risk prediction seems more analogous to having a hammer, nails, and wood in hand and asking "can I build a desk?"

Is risk-modeling a hypothesis-driven, scientific endeavor or some other entity altogether?

Todd D
  • Still looking for an answer? – DeltaIV Jun 01 '18 at 18:43
  • I ask because if you are, I could write an answer. – DeltaIV Jun 04 '18 at 15:06
  • @DeltaIV yes, I am interested in an answer. – Todd D Jun 04 '18 at 15:09
  • Ok, I'll write one. – DeltaIV Jun 04 '18 at 15:19
  • Would you also call astronomy not a science? Except for the few earth based experiments there is no way to perform randomized control studies in Astronomy. So everything is just correlating uncontrolled observations with theories. – Sextus Empiricus Jun 12 '18 at 15:34
  • In astronomy, I could posit that the sun revolves around the earth, gather data to test the hypothesis, and form conclusions and next steps. With risk modeling, I posit that the `x`'s predict `y`. If I cannot predict accurately, maybe I have the wrong association or the wrong model for the association. If I know `x` is associated with `y` from empirical evidence, the ability to predict is thus determined, and whether I can accomplish prediction is a technical question, not a theoretical one. – Todd D Jun 12 '18 at 18:11

1 Answer


Risk modeling is a scientific endeavor, and it bears no resemblance to the desk-building analogy. Let's consider a concrete example: suppose I have the following data

[Figure: scatter plot of experimental observations (blue dots) with the ideal model curve (black line)]

where $x$ is my input, $y$ is my response, the blue dots are experimental observations, and the black line is the ideal model (which I wouldn't know in a real case)¹. From my experimental results, it looks like $y$ has a global maximum around $x=0.5$.

Suppose now that I need a $y$ of at least 0.23 to meet my revenue target. Obviously, if I set $x=0.5$, I'll minimize my chances of not meeting my target, but I want to be more precise and quantify the risk of missing it. In other words, I want to estimate

$$P(Y<0.23|x = 0.5)$$

Of course, in a real case we don't know this probability (we can only estimate it from data), but in my example I used the following generative model:

$$ Y|x \sim\mathcal{N}(x(1-x)\cos(x-0.5)^2, \sigma)$$

with $\sigma = 0.01$, thus the probability is easily computed as

$$ P(Y<0.23|x = 0.5) = \Phi\left(\frac{0.23-0.5(1-0.5)\cos(0.5-0.5)^2}{\sigma}\right) = \Phi\left(-\frac{0.02}{0.01}\right)= 0.02275013$$

where $\Phi(y)$ is the CDF of a standard Gaussian variable.
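This closed-form risk can be checked numerically; here's a minimal sketch in Python, using only the standard library (the generative model and threshold are the ones defined above):

```python
from math import cos, erf, sqrt

def phi(z):
    """CDF of a standard Gaussian, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

sigma = 0.01
x = 0.5
mean = x * (1 - x) * cos(x - 0.5) ** 2   # ideal model at x = 0.5 -> 0.25
risk = phi((0.23 - mean) / sigma)        # P(Y < 0.23 | x = 0.5) = Phi(-2)
print(risk)  # ≈ 0.02275013
```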

This probability doesn't depend on the number of samples we observe: only the accuracy of our estimate does. For example, using Gaussian process regression to estimate the ideal model, with four samples of size $N \in \{100, 1000, 3000, 5000\}$, here's what we get:

[Figures: Gaussian process fits for each of the four sample sizes]

Our estimates change, but the real risk probability (the probability of getting a $y$ value below the green line when $x=0.5$) doesn't:

[Figure: estimated vs. exact risk as a function of sample size]

The red line is the exact risk, while the cyan line is our estimate. Note: I used a log scale for the $y$ axis to show more clearly how the estimated risk converges to the exact risk as the sample size grows.
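The estimation procedure above can be sketched as follows. This is an illustrative reconstruction, not the code actually used for the figures: it uses scikit-learn's `GaussianProcessRegressor` (one of several possible GP implementations), with a white-noise kernel term playing the role of the unknown $\sigma$:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def true_mean(x):
    # Ideal model: x(1-x)cos(x-0.5)^2 (unknown in a real application)
    return x * (1 - x) * np.cos(x - 0.5) ** 2

# Draw N noisy observations from the generative model
N, sigma = 300, 0.01
x = rng.uniform(0.0, 1.0, N)
y = true_mean(x) + rng.normal(0.0, sigma, N)

# RBF captures the smooth mean; the WhiteKernel term learns the noise variance
kernel = RBF(length_scale=0.2) + WhiteKernel(
    noise_level=1e-4, noise_level_bounds=(1e-8, 1e-1))
gp = GaussianProcessRegressor(kernel=kernel).fit(x[:, None], y)

mu_hat = gp.predict(np.array([[0.5]]))[0]          # estimated mean at x = 0.5
sigma_hat = np.sqrt(gp.kernel_.k2.noise_level)     # estimated noise std

risk_hat = norm.cdf((0.23 - mu_hat) / sigma_hat)   # estimated P(Y < 0.23 | x = 0.5)
exact = norm.cdf((0.23 - true_mean(0.5)) / sigma)  # exact risk ≈ 0.0228
print(risk_hat, exact)
```

Rerunning this with increasing `N` reproduces the qualitative behavior in the last figure: the exact risk stays fixed while the estimate tightens around it.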


¹ For example, $y$ could be the yield of a binary chemical reaction, and $x$ the mole fraction of one of the two reactants: if I set $x=0$ or $x=1$, then I have a mole fraction of 1 or 0, respectively, of the other reactant, so the yield $y$ is 0. The yield seems to be highest around $x=0.5$, but since the actual yield is never equal to the theoretical yield, we can imagine I had to perform some experiments to get an actual yield curve (real ones usually look nothing like this, but it's just an example).

DeltaIV
  • @Todd I used a fairly standard definition of [statistical risk](https://en.m.wikipedia.org/wiki/Statistical_risk). If you had something else in mind, you should have stated it more clearly in your question. – DeltaIV Jun 08 '18 at 23:46
  • Your answer doesn't address the question of whether risk modeling is science or not. The absence or presence of fit may be an example. If fit is not achieved, does this test a meaningful natural relationship, or does it purely demonstrate a lack of ability to model the universe of possibilities? – Todd D Jun 10 '18 at 22:30
  • I just proved that risk modeling is a scientific endeavour, by estimating a risk in a consistent way, whatever the number of samples. The sentence "does this test a meaningful natural relationship or purely demonstrates lack of ability to model the universe of possibilities" doesn't make sense. If you think that the sample size affects the Type I error rate of a test, [you're obviously wrong](https://stats.stackexchange.com/a/2519/58675). – DeltaIV Jun 11 '18 at 13:59