
I am trying to find the $x$ that minimizes the following expression, which involves two sources of randomness. I am stuck and not even sure where to start. Any suggestions would be appreciated. Please let me know if anything is unclear or if you need any further information.

QUESTION 1)

I am not sure closed form solutions exist, so can we safely say that numerical solutions are the only alternative?

QUESTION 2)

How do we even demonstrate that a minimum exists and whether there are multiple minima or a unique one?

OPTIMIZATION PROBLEM

\begin{eqnarray*} & = & \underset{\left\{ x\right\} }{\min}\: E\left\{ x\left(\sqrt{\gamma^{2}\sigma_{\eta}^{2}+\sigma_{\varepsilon}^{2}}\;\right)\left[\left(\frac{\alpha+\beta x}{\sqrt{\gamma^{2}\sigma_{\eta}^{2}+\sigma_{\varepsilon}^{2}}}\right)+\frac{\phi\left(\frac{\alpha+\beta x}{\sqrt{\gamma^{2}\sigma_{\eta}^{2}+\sigma_{\varepsilon}^{2}}}\right)}{\Phi\left(\frac{\alpha+\beta x}{\sqrt{\gamma^{2}\sigma_{\eta}^{2}+\sigma_{\varepsilon}^{2}}}\right)}\right]\right.\\ & + & \left(W-x\right)\left(\sqrt{\gamma^{2}\left\{ a+bx-\theta\eta+\varepsilon\right\} ^{2}\sigma_{\eta}^{2}+\sigma_{\varepsilon}^{2}}\;\right)\\ & & \left.\left[\left(\frac{\left\{ a+bx-\theta\eta+\varepsilon\right\} \left\{ \alpha+\beta\left(W-x\right)-\rho\eta\right\} }{\sqrt{\gamma^{2}\left\{ a+bx-\theta\eta+\varepsilon\right\} ^{2}\sigma_{\eta}^{2}+\sigma_{\varepsilon}^{2}}}\right)+\frac{\phi\left(\frac{\left\{ a+bx-\theta\eta+\varepsilon\right\} \left\{ \alpha+\beta\left(W-x\right)-\rho\eta\right\} }{\sqrt{\gamma^{2}\left\{ a+bx-\theta\eta+\varepsilon\right\} ^{2}\sigma_{\eta}^{2}+\sigma_{\varepsilon}^{2}}}\right)}{\Phi\left(\frac{\left\{ a+bx-\theta\eta+\varepsilon\right\} \left\{ \alpha+\beta\left(W-x\right)-\rho\eta\right\} }{\sqrt{\gamma^{2}\left\{ a+bx-\theta\eta+\varepsilon\right\} ^{2}\sigma_{\eta}^{2}+\sigma_{\varepsilon}^{2}}}\right)}\right]\right\} \end{eqnarray*}

\begin{eqnarray*} \varepsilon\sim N\left(0,\sigma_{\varepsilon}^{2}\right),\qquad\eta\sim N\left(0,\sigma_{\eta}^{2}\right) \end{eqnarray*} Both $\varepsilon$ and $\eta$ are zero-mean IID (independent, identically distributed) random shocks, i.e., white noise.

Here, $\phi$ and $\Phi$ are the standard normal PDF and CDF, respectively, and $E$ is the expectation operator. All other symbols are known constants except $x$, which, of course, is the control variable.

STEPS TRIED

I was initially stuck, but based on some helpful suggestions from group members I have simplified the problem to the expression below.

\begin{eqnarray*} & = & \underset{\left\{ x\right\} }{\min}\: E\left\{ px+qx^{2}+\frac{rx\phi\left(\alpha+\beta x\right)}{\Phi\left(\alpha+\beta x\right)}+\left(W-x\right)\left\{ a+bx-c\eta+\varepsilon\right\} \left\{ \theta-kx-\gamma\eta\right\} \right.\\ & + & \left.\left(W-x\right)\left(\sqrt{d^{2}\left\{ \theta+bx-c\eta+\varepsilon\right\} ^{2}+\sigma_{\varepsilon}^{2}}\;\right)\frac{\phi\left(\frac{\left\{ a+bx-c\eta+\varepsilon\right\} \left\{ \theta-kx-\gamma\eta\right\} }{\sqrt{d^{2}\left\{ \theta+bx-c\eta+\varepsilon\right\} ^{2}+\sigma_{\varepsilon}^{2}}}\right)}{\Phi\left(\frac{\left\{ a+bx-c\eta+\varepsilon\right\} \left\{ \theta-kx-\gamma\eta\right\} }{\sqrt{d^{2}\left\{ \theta+bx-c\eta+\varepsilon\right\} ^{2}+\sigma_{\varepsilon}^{2}}}\right)}\right\} \end{eqnarray*}
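To make this concrete, here is a rough Monte Carlo check of the simplified objective in Python. All parameter values below are placeholders that I made up just to exercise the code, since in my application the actual values can vary (they are only known to be positive):

```python
import numpy as np
from scipy.stats import norm  # standard normal pdf/cdf for phi and Phi

rng = np.random.default_rng(0)

# Placeholder parameter values -- purely illustrative, all positive.
p_, q_, r_ = 0.1, 0.2, 0.3
alpha, beta = 0.5, 0.4
a_, b_, c_ = 1.0, 0.3, 0.2
theta, k_, gamma_, d_ = 0.8, 0.1, 0.3, 0.5
W, sig_eps, sig_eta = 2.0, 0.5, 0.5

def objective(x, n=100_000):
    """Monte Carlo estimate of the simplified expectation for a scalar x."""
    eps = rng.normal(0.0, sig_eps, n)
    eta = rng.normal(0.0, sig_eta, n)
    m = (a_ + b_ * x - c_ * eta + eps) * (theta - k_ * x - gamma_ * eta)
    s = np.sqrt(d_ ** 2 * (theta + b_ * x - c_ * eta + eps) ** 2 + sig_eps ** 2)
    inner = (p_ * x + q_ * x ** 2
             + r_ * x * norm.pdf(alpha + beta * x) / norm.cdf(alpha + beta * x)
             + (W - x) * m
             + (W - x) * s * norm.pdf(m / s) / norm.cdf(m / s))
    return inner.mean()

print(objective(1.0))  # noisy estimate; varies with the draws
```

With a routine like this, the objective can at least be plotted over a grid of $x$ values to see whether multiple local minima appear for a given parameter set.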

texmex
  • You can *considerably* simplify this expression: the first term in the expectation is a constant (so you can ignore it when searching for optimal $x$). The next is a sum of two terms, one of which simplifies and has an easily evaluated expectation. The final one is the messy part, but suitable redefinitions of $\alpha,\beta$ *etc* will at least simplify its form. It would help to write it in terms of $f(z)=\phi(z)/\Phi(z)$. Doing that preliminary (routine) work would be a favor to your readers and increase your chances of obtaining useful answers. – whuber Jun 30 '15 at 13:06
  • Is x a scalar, and are all other symbols scalars having known values? This can basically be solved as a simulation optimization problem. It's "easy" due to only being one-dimensional, if the answer to my first sentence is yes on all counts. Step 1 is to write a stochastic (Monte Carlo) simulation routine to evaluate the expectation. Steps 2 and preferably 3 are to do the same for the first and second derivatives. Then you need to use a robustified safeguarded Newton or steepest descent algorithm on this. I'm not going to tell you how to do the robustification. – Mark L. Stone Jun 30 '15 at 18:56
  • Gradient descent is not a very good or robust algorithm for this problem. There are no a priori guarantees on whether there is only one local minimum, or multiple local minima with different objective values, and that may depend on the values of the various input parameters. I might be able to figure that out to high confidence in the course of solving. Someone would have to make it worth my while to spend time solving this. But first, answer the question in the first sentence of my previous comment. – Mark L. Stone Jun 30 '15 at 18:57
  • You ought to be able to differentiate "under the integral", i.e., d/dx E(f(x)) = E(d/dx f(x)). That lets you avoid using finite differences, which is a good thing to do. You also want to use common random numbers across function and derivative simulations for all values of x encountered in the optimization. – Mark L. Stone Jun 30 '15 at 19:20
  • Thanks Mark, x is indeed a scalar and so are all the symbols, including the two random variables. – texmex Jul 01 '15 at 09:50
  • @whuber. As always, I appreciate your suggestions. I have added a simplified expression above. Is there any closed form solution or approximation possible? Or at least some way to show that a solution, that is, a minimum, would exist? – texmex Jul 01 '15 at 09:52
  • If all else fails, and you can settle for a non-closed form solution, try a metaheuristic like particle swarm optimization or a genetic algorithm. The function seems cheap enough to evaluate, so it shouldn't be a problem. – Marc Claesen Jul 01 '15 at 14:17
  • Do you only want to solve this for one set of input parameters, or for multiple sets? Can you provide numerical values of all the input parameters for at least one case of interest, or perhaps more than one, as applicable? – Mark L. Stone Jul 01 '15 at 17:42
  • @MarkL.Stone There would be multiple sets of input parameters. What I mean by this is that the parameter values can change. At this stage, the only thing we know is that all the parameters are positive. – texmex Jul 02 '15 at 02:44
  • When will you have a complete set of input parameters? No one is going to provide a closed form solution in terms of the parameters. The parameter values are needed to generate a concrete solution. The local extrema landscape and character of the problem could potentially change depending on the values of the parameters. Can you tell me what the application is? – Mark L. Stone Jul 02 '15 at 03:21
  • @MarkL.Stone ... Much appreciative of your suggestions. But the parameter values can vary, and the only restriction on them is that they are positive and $W > x$. So are you saying that there is simply no way to get a solution in terms of the parameters (even after using approximations, say a Taylor approximation or so)? If so, I will generate some sample numerical values for the parameters shortly. The application of this is financial portfolios. – texmex Jul 02 '15 at 03:40
  • I don't know how to solve this in "closed form". I'm not saying no one does. If you make enough approximations, you can solve "it", but the "it" you solve may bear little to no resemblance to what you're trying to solve. You may not believe it, but I saw a guy break the "unity barrier" in probability by use of a two-term Taylor series. The guy had a Ph.D. in Physics and spent 2 years developing his model. For no reason, he took a 2-term Taylor series, and his intermediate "probability" was 1.2, but his final probability wasn't. He didn't know until I told him. Always look at the remainder term. – Mark L. Stone Jul 02 '15 at 04:11
  • Thanks @MarkL.Stone, I see what you mean. I will revert with numerical seeds for the parameters as I progress further on this. Meanwhile, I wanted to get your attention on this other question, which is more interesting: a closed form solution seems to be lurking around the corner, but remains elusive: http://stats.stackexchange.com/questions/157954/conditional-expected-value-of-product-of-normal-and-log-normal-distribution – texmex Jul 02 '15 at 06:58

2 Answers


I will try to answer question 1 now (a numerical optimization scheme: gradient descent). I will edit my answer if I find an answer to question 2.

Let us consider p(x) = E{x*f(x) + (W-x)*g(x)}. As you stated, a simple forward-difference method can be used to approximate the first derivative:

p'(x) ≈ (p(x+h) - p(x))/h for a small value of h

If you want to maximize p(x), initialize x to some value in the domain and update x iteratively as

Repeat till convergence:

1) Find p(x+h), p(x), and p'(x)

2) x <- x + alpha * p'(x)

where alpha is called the learning rate.

Gradient descent can get stuck at a local optimum, since p'(x) = 0 is one of the conditions for convergence. This can be overcome using stochastic variants or methods like simulated annealing, etc.
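As a minimal sketch of this loop on a toy noisy objective (the toy p(x) = E[-(x - 1)^2 + eps] below is just an illustration, not the asker's function):

```python
import numpy as np

rng = np.random.default_rng(42)

def p(x, n=50_000):
    """Toy noisy objective: Monte Carlo estimate of E[-(x - 1)^2 + eps]."""
    eps = rng.normal(0.0, 1.0, n)
    return np.mean(-(x - 1.0) ** 2 + eps)

x = 0.0      # initial guess
alpha = 0.1  # learning rate
h = 1e-2     # forward-difference step

for _ in range(200):
    grad = (p(x + h) - p(x)) / h  # forward-difference estimate of p'(x)
    x = x + alpha * grad          # ascent step; use a minus sign to minimize

print(x)  # settles near the maximizer x = 1, up to simulation noise
```

Note how noisy the finite-difference gradient is here: the simulation error gets divided by the small step h, which is one reason the commenters suggest differentiating inside the expectation instead.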

Naveen Mathew
  • I am trying to find the minimum so I suppose your approach applies either way? – texmex Jun 30 '15 at 09:07
  • It works. You have to change the update step to: x <- x - alpha * p'(x) – Naveen Mathew Jun 30 '15 at 09:08
  • Thanks for your pointers @NaveenMathew. So without knowing the answer to question 2, we won't know how to proceed, is that right? Or are you saying we can try both to see if an extremum exists? – texmex Jun 30 '15 at 09:28
  • Also, please note there are two sources of randomness inside the second part of the function. Would that not matter? – texmex Jun 30 '15 at 09:33
  • I cannot comment on the existence/non-existence of global minima based on the results of gradient descent. I did notice the presence of randomness in the second term. The overall optimum solution may depend on these terms. I would suggest you simulate and obtain the distribution of the optima by varying the values of ε and η. You must also note that random normal and white noise are not necessarily the same. – Naveen Mathew Jun 30 '15 at 10:04
  • I cannot take the derivative without getting the expected value first, right? So the calculus approach is completely ruled out, right? Hence we can only run simulations and get values of the function for the two values of the random variables... would this be correct? – texmex Jun 30 '15 at 12:50
  • From the statistics point of view, you can replace the squares of ε and η with another random variable: http://stats.stackexchange.com/questions/93383/square-of-normal-distribution-with-specific-variance. Then you can simulate, find the expected value, and then take the derivative. It is highly computation-intensive. I'm sure it can be optimized. – Naveen Mathew Jun 30 '15 at 13:00
  • Two sources of error is no problem. That's handled in the simulation. I am guessing that eta and epsilon are independent. But if not, that's not a problem either - just generate dependent pairs of (eta, epsilon) in the simulation. – Mark L. Stone Jun 30 '15 at 19:12
  • This answer seems to miss the crux of the question. This is a one-dimensional problem involving an almost-everywhere smooth function of $x$. It can readily be solved *once the expectation has been computed as a function of $x$.* It is of some interest to determine whether multiple local minima might exist. If they do, avoid gradient descent altogether. – whuber Jul 01 '15 at 14:44
  • Gradient descent should be a LAST resort, never a first resort. Gradient descent MAY NOT DESCEND, It is misnamed. Descent algorithms need to safeguard to ensure descent. But with noisy function evaluation you can never get a perfect guarantee, but there are ways to robustify. I'm developing them now for non-convex functions up to several tens of thousands of variables. I don't have anything ready to share with the world yet, however. – Mark L. Stone Jul 01 '15 at 17:26
  • I completely agree with Mark and @whuber. My answer was based on the assumption that the function is not easily differentiable. I have tried other variants of gradient descent and randomized algorithms which give accurate results even in the presence of multiple local maxima/minima. – Naveen Mathew Jul 02 '15 at 07:30

Here is a compilation and expansion of some of my comments, which should be worthy of being an answer.

This can basically be solved as a simulation optimization problem. It's "easy" due to only being one-dimensional, since per the comments, x is a scalar (as are all the input parameters).

Step 1 is to write a stochastic (Monte Carlo) simulation routine to evaluate the expectation. Steps 2 and preferably 3 are to do the same for the first and second derivatives. You ought to be able to differentiate "under the integral", i.e., d/dx E(f(x)) = E(d/dx f(x)). That lets you avoid using finite differences, which is a good thing to do.
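To illustrate differentiating under the expectation on a toy integrand (f(x, eps) = (x + eps)^2 below is just for demonstration, not the actual objective):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(0.0, 1.0, 200_000)  # one batch of standard normal draws

def f(x, e):
    """Toy integrand: f(x, eps) = (x + eps)^2, so E[f] = x^2 + 1."""
    return (x + e) ** 2

def df_dx(x, e):
    """Pathwise derivative of f in x, taken inside the expectation."""
    return 2.0 * (x + e)

x = 1.5
val_est = np.mean(f(x, eps))        # Monte Carlo estimate of E[f(x)]
grad_est = np.mean(df_dx(x, eps))   # estimate of d/dx E[f(x)] = E[d/dx f(x)]
# Analytically E[f] = x^2 + 1 = 3.25 and dE[f]/dx = 2x = 3.0
print(val_est, grad_est)
```

The gradient estimate here uses exactly the same draws as the function estimate and involves no step-size parameter, unlike a finite difference.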

Two sources of error is no problem. That's handled in the simulation. I am guessing that eta and epsilon are independent. But if not, that's not a problem either - just generate dependent pairs of (eta, epsilon) in the simulation.

You want to use common random numbers across objective function and derivative evaluations for all values of x encountered in the optimization. That is THE MOST IMPORTANT thing to do. There should be extremely high correlation in this simulation, so common random numbers will be of huge benefit to the optimization, and may reduce the needed number of simulation replications, per value of x, by orders of magnitude.
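A toy demonstration of why common random numbers matter (again with a stand-in expectation E[(x + eps)^2], not the actual objective): when comparing two nearby values of x, as an optimizer constantly does, reusing the same draws makes the estimated difference dramatically less noisy.

```python
import numpy as np

def p_hat(x, rng, n=10_000):
    """Monte Carlo estimate of the stand-in expectation E[(x + eps)^2]."""
    eps = rng.normal(0.0, 1.0, n)
    return np.mean((x + eps) ** 2)

diffs_crn, diffs_indep = [], []
for seed in range(200):
    # Common random numbers: the same seed (hence the same draws) at both x values.
    d_crn = (p_hat(1.0, np.random.default_rng(seed))
             - p_hat(1.1, np.random.default_rng(seed)))
    # Independent draws at each x value.
    d_ind = (p_hat(1.0, np.random.default_rng(seed))
             - p_hat(1.1, np.random.default_rng(seed + 1000)))
    diffs_crn.append(d_crn)
    diffs_indep.append(d_ind)

print(np.std(diffs_crn), np.std(diffs_indep))  # CRN spread is far smaller
```

The noise in the objective largely cancels when the same draws are used at both points, which is exactly the "extremely high correlation" being exploited.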

Then you should use a robustified safeguarded Newton or steepest descent algorithm on this. I'm not going to tell you how to do the robustification (see next paragraph).

Gradient descent is not a very good or robust algorithm for this problem. Gradient descent should be a LAST resort, never a first resort. Gradient descent MAY NOT DESCEND; it is misnamed. Descent algorithms need safeguards to ensure descent. With noisy function evaluations you can never get a perfect guarantee, but there are ways to robustify. I won't provide the details here, as I'm not ready to share them with the world in their current state of development.

I don't think anyone is going to be providing an (anywhere close to accurate) closed form solution in terms of the parameters. The parameter values are needed to generate a concrete solution. The local extrema landscape and character of the problem could potentially change depending on the values of the parameters. There are no a priori guarantees on whether there is only one local minimum, or multiple local minima with different objective values, and that may depend on the values of the various input parameters. I might be able to figure that out to high confidence in the course of solving.

Mark L. Stone