A Frequentist approach to modeling uncertainty around decision optimization

Question

I'm curious how a Frequentist would approach an optimization problem that is built from inferred parameters. As an example, I'll use price optimization given a demand curve, where $Revenue(x) = Price \times Volume$ and $x$ is the price. Let's assume my demand curve follows exponential decay.

1. Infer demand curve parameters
2. Maximize $Revenue(x)$

I can perform linear regression, given price and the log of the sales volume, to infer the slope and intercept. Using a Bayesian approach, I'll have a posterior chain to sample from. So, if I have 4000 samples in my posterior chain, I can uniformly draw indices in the range [1, 4000], and retrieve their corresponding slope and intercept values then use them to visualize my revenue function.
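
Under the log-linear demand assumed here, the regression and the revenue function can be written out explicitly. This is only a sketch: it takes $x$ as price, uses $m$ and $b$ for the slope and intercept (matching the names in the code below), introduces $\sigma$ for the noise scale, and drops the noise term from the revenue curve, as the plots do. The closed-form optimum holds only for this exponential form with $m < 0$.

$$\log Volume(x) = m x + b + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

$$Revenue(x) = x \, e^{m x + b}$$

$$\frac{d}{dx} Revenue(x) = (1 + m x)\, e^{m x + b} = 0 \quad\Longrightarrow\quad x^* = -\frac{1}{m}$$

Each draw of $(m, b)$ therefore implies its own optimal price $x^* = -1/m$, which in this particular model depends on the slope alone.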

Big edit: Code included

simulate data

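import numpy as np
import matplotlib.pyplot as plt

sd = 0.5                      # noise sd on the log scale
m, b = -0.25, 5               # true slope and intercept of log(volume)
X = np.linspace(0, 20, 100)   # prices
Y = np.exp(np.random.normal(loc=m*X + b, scale=sd))  # simulated sales volume
plt.scatter(X, Y)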

Model w/ PyMC3

import pymc3 as pm

with pm.Model() as model:
  m = pm.Normal('m', mu=0, sigma=2)   # slope prior
  b = pm.Normal('b', mu=0, sigma=2)   # intercept prior
  s = pm.Exponential('s', lam=1)      # noise scale prior
  y_hat = m * X + b                   # mean of log(volume)
  lik = pm.Normal('lik', mu=y_hat, sigma=s, observed=np.log(Y))
  trace = pm.sample(chains=4)

plot posterior-predictive distribution

def post_plot(trace_obj=trace, samples=100):
  m_draws = trace_obj.get_values('m')   # all posterior draws, chains combined
  b_draws = trace_obj.get_values('b')
  for _ in range(samples):
    # index uniformly over every draw, not just the first len(X) of them
    idx = np.random.randint(len(m_draws))
    Y_hat = np.exp(m_draws[idx]*X + b_draws[idx])
    plt.plot(X, Y_hat)
  plt.scatter(X, Y)

post_plot()

Posterior predictive revenue

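def rev_posterior(trace_obj=trace, samples=100):
  m_draws = trace_obj.get_values('m')   # all posterior draws, chains combined
  b_draws = trace_obj.get_values('b')
  for _ in range(samples):
    idx = np.random.randint(len(m_draws))  # uniform index over every draw
    rev = X * np.exp(m_draws[idx]*X + b_draws[idx])  # revenue = price * volume
    plt.plot(X, rev)

rev_posterior()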


This is useful in two ways: (A) I can see the variance around the sales volume at the optimal price, and (B) I can see that each sampled function peaks in about the same spot; in other words, there is very little variance in the location of the optimal price.

My actual question: Using parameter estimates and confidence intervals, how might a Frequentist construct a similar revenue function plot? I understand that they could simply use the point estimates of the slope and intercept and then find a point estimate of the optimal price, but I'm curious about the variance around the optimal value.

For the code that simulated this data, generated this plot, and fit the Bayesian model that inferred the posterior distribution, see here

At the moment you have not shown any source of randomness or model parameters in your analysis. If you would like a clear exposition on how classical methods solve this problem then that would be helpful. — Ben, Nov 16 '21 at 20:16
I am not sure maximizing revenue is what you would want to do unless your variable costs are zero. — dimitriy, Nov 16 '21 at 20:27
@Ben True, it's not contained in the post directly. For brevity, I've linked it (see the word "here" for the hyperlink to Google Colab) — jbuddy_13, Nov 16 '21 at 20:27
@dimitriy, in practice, I would have purchase, storage, and transportation costs to account for in order to maximize profits. But I decided that adding that additional complexity would take away from the focus of the question, which is ~'what's the frequentist analog of sampling from your posterior chain for downstream computations?' — jbuddy_13, Nov 16 '21 at 20:30
I do not know the answer, but it might be that you are taking a Bayesian approach and then trying to substitute a single step, and that does not work. Perhaps a frequentist analysis would take another path altogether. But I may be wrong. Formulating the statistical model more explicitly could be helpful. Such details need not be relegated to links. — Richard Hardy, Nov 16 '21 at 20:35
@RichardHardy, could you expand on 'substitute a single step'? I'm not sure that I understand. — jbuddy_13, Nov 16 '21 at 23:07
@jbuddy_13, suppose a typical Bayesian analysis consists of steps A, B and C while a frequentist analysis consists of steps D, E, F and G. Further suppose B does not have a conceptual counterpart among D, E, F or G. Then asking how a frequentist would do B does not necessarily make sense. E.g. "How do you select a prior in your model?" is not a relevant question to ask a frequentist. This is roughly what I meant above. — Richard Hardy, Nov 17 '21 at 03:59
To succeed, you need more than the parameter estimates and (individual) confidence intervals: you need a joint *confidence region* for all the parameters. The reason is that the location of the optimum is a highly nonlinear, non-bijective function of the parameters, rendering the usual approximations ("delta method") useless. I suspect many people would use a bootstrap, provided there are enough data to justify appealing to its asymptotic properties. — whuber, Nov 17 '21 at 15:34
@whuber, this makes sense! I've gotten some skepticism in previous comments here, and I'm on the fence whether my original Bayesian approach is principled or if I've gone awry with what the posterior distribution can truly support. — jbuddy_13, Nov 17 '21 at 15:54
The Bayesian approach to this problem is appealing--I see nothing invalid in it. It might be interesting to compare that to a nonparametric bootstrap, because of the relaxed assumptions required by the latter. Note that the Bayesian posterior has a different interpretation than a confidence interval (or, perhaps, confidence distribution), but it seems likely that in practice these results would be used in substantially similar ways for the same purposes. — whuber, Nov 17 '21 at 16:00
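
The nonparametric bootstrap suggested in the comments above might be sketched as follows. This is an illustration under the question's assumptions, not a definitive implementation: the data are re-simulated with the same settings, `optimal_price` is a hypothetical helper, and the formula $x^* = -1/m$ relies on revenue having the form $x \, e^{m x + b}$ with $m < 0$. The percentile interval is one of several possible bootstrap intervals.

```python
import numpy as np

# Re-simulate data with the question's settings (seed chosen arbitrarily).
rng = np.random.default_rng(1)
X = np.linspace(0, 20, 100)
Y = np.exp(rng.normal(loc=-0.25 * X + 5, scale=0.5))

def optimal_price(x, y):
    """Hypothetical helper: fit log(y) = m*x + b by least squares and
    return the revenue-maximizing price -1/m (valid only when m < 0)."""
    m, b = np.polyfit(x, np.log(y), deg=1)  # highest power first: [m, b]
    return -1.0 / m

# Point estimate of the optimal price from the full sample.
x_star = optimal_price(X, Y)

# Resample (price, volume) pairs with replacement and refit each time.
n_boot = 2000
boot = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.integers(0, len(X), size=len(X))
    boot[i] = optimal_price(X[idx], Y[idx])

# 95% percentile interval for the location of the optimum.
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"optimal price: {x_star:.2f}, 95% bootstrap interval: ({lo:.2f}, {hi:.2f})")
```

The array `boot` plays a role similar to the posterior draws of the optimum in the Bayesian version, although, as noted in the comments, its interpretation is different.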

Geoffrey Johnson · Answer 1 · 2021-11-16T23:25:37.667

For uncertainty concerning an unknown fixed true parameter the frequentist would construct a confidence interval. This can be viewed as the inversion of a hypothesis test to identify the set of plausible values for a parameter given the observed data.

The frequentist can use these parameter estimates (slope and intercept) to estimate the unknown fixed true mean revenue and construct a confidence band. This is also based on the inversion of a hypothesis test. Most of the time it is a Wald or t-test that is inverted to form a confidence interval. When dealing with non-normal data a proper link function (transformation of the parameter and the estimator) will yield an estimator that approximately follows a normal distribution. The Wald test can be safely inverted for the transformed parameter, and the confidence limits back-transformed to the scale of interest. Think of logistic regression where the estimator for the log odds is well-approximated by a normal distribution. By using the inverse link function we can obtain confidence limits for the population proportion. Let me know if you need more details. This link function approach is also used in Bayesian inference so that a normal distribution can be used for the prior and posterior.
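
As a rough sketch of the band described above, the code below fits the log-linear model by ordinary least squares, builds a pointwise Wald interval for the linear predictor, and back-transforms it to the revenue scale. The simulated data mirror the question's settings, the normal quantile 1.96 stands in for a t quantile, and the variable names are illustrative. Note that exponentiating the fitted line targets price times the median volume, which is the same curve the question plots.

```python
import numpy as np

# Re-simulate data with the question's settings (seed chosen arbitrarily).
rng = np.random.default_rng(0)
X = np.linspace(0, 20, 100)
Y = np.exp(rng.normal(loc=-0.25 * X + 5, scale=0.5))

# OLS of log(volume) on price: beta = (intercept, slope).
A = np.column_stack([np.ones_like(X), X])
beta, *_ = np.linalg.lstsq(A, np.log(Y), rcond=None)

# Estimated covariance matrix of (intercept, slope).
resid = np.log(Y) - A @ beta
sigma2 = resid @ resid / (len(X) - 2)
cov = sigma2 * np.linalg.inv(A.T @ A)

# Pointwise standard error of the fitted line at each price: sqrt(a_i' cov a_i).
fit = A @ beta
se = np.sqrt(np.einsum("ij,jk,ik->i", A, cov, A))

# Wald limits on the log scale, then back-transform and multiply by price.
z = 1.96  # approximate 97.5% standard normal quantile
rev_hat = X * np.exp(fit)
rev_lo = X * np.exp(fit - z * se)
rev_hi = X * np.exp(fit + z * se)

print("slope, intercept:", beta[1], beta[0])
```

Plotting `rev_hat` with `rev_lo` and `rev_hi` as a shaded region gives a pointwise band for the revenue curve; it is not a simultaneous band, and it says nothing directly about the location of the peak.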

Here is an answer providing a couple of ways to construct a confidence interval for a binomial proportion and to visualize this inference using a confidence curve, analogous to a Bayesian posterior. Here is a discussion of hypothesis tests and confidence intervals, and how they are all approximations to the likelihood ratio test.
