Suppose I am trying to tune the weights W of a neural network for a non-smooth problem, using an expensive numerical approximation of the gradients. I have been stuck, unable to get a good solution in a reasonable time.
So I decide to sidestep the problem as follows:
I generate 10K weight vectors W' that cover most of the domain of values W can occupy, and for each of them I calculate the loss L using a very non-trivial set of rules involving the output of the neural network.
I then treat W' as an input space X and fit a second neural network that should approximate the calculated L. First of all, I wanted to understand whether it is even possible to map weights to loss while ignoring the true input and output spaces. It turned out that for simple problems I can actually do that.
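Concretely, what I have in mind is roughly the following (a minimal sketch in PyTorch; `true_loss`, `dim`, and the surrogate architecture are placeholders, not my actual code):

```python
import torch
import torch.nn as nn

dim = 50            # dimensionality of the weight vector W (placeholder value)
n_samples = 10_000  # the 10K random weight vectors

def true_loss(w: torch.Tensor) -> torch.Tensor:
    # Placeholder for the expensive, non-smooth loss computed from the
    # original network's output; the real rule set would go here.
    return torch.remainder(w.abs().sum(dim=-1), 1.0)

# Sample W' so that it covers the domain of W (here: uniform in [-1, 1]^dim).
X = torch.empty(n_samples, dim).uniform_(-1.0, 1.0)
Y = true_loss(X).unsqueeze(-1)  # the calculated losses L

# Surrogate network that maps a weight vector directly to its loss.
surrogate = nn.Sequential(
    nn.Linear(dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

for epoch in range(200):
    opt.zero_grad()
    pred = surrogate(X)
    fit_loss = nn.functional.mse_loss(pred, Y)
    fit_loss.backward()
    opt.step()
```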
For the difficult problem I mentioned at the beginning, however, the learned mapping amounts to little more than averaging the output, with no significant correlation between the true losses Y and the predicted losses Y'.
Can I make the intuitive claim that, when I cannot map weights to loss through a composition of simple functions (which is what a neural network is), the original weight-tuning problem is too noisy or even unsolvable in principle? Or is that statement too vague, or simply wrong?
UPDATE:
Referring to an older question here:
If I am confident that there is a continuous relation between X and the surrogate network's output (where X is the large randomized set of my weight vectors), how can I proceed to apply a method that assumes differentiability to solve argmax(output) over X?
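What I have in mind, roughly, is something like the sketch below: treat the trained surrogate as a differentiable proxy and run gradient ascent on its input. This assumes the `surrogate` and `dim` from the sketch above, and it only finds an argmax of the surrogate's prediction, which is just a proxy for the true output; the starting point and the clamping to the sampled domain are my assumptions, and in practice one would likely use several random restarts.

```python
# Freeze the surrogate and optimize its input (a weight vector) by gradient ascent.
for p in surrogate.parameters():
    p.requires_grad_(False)

w = torch.empty(1, dim).uniform_(-1.0, 1.0).requires_grad_(True)
opt_w = torch.optim.Adam([w], lr=1e-2)

for step in range(1000):
    opt_w.zero_grad()
    objective = -surrogate(w).sum()  # negate: Adam minimizes, we want argmax
    objective.backward()
    opt_w.step()
    with torch.no_grad():
        w.clamp_(-1.0, 1.0)          # stay inside the domain that was sampled

best_w = w.detach()
```

Is this a reasonable way to proceed, or is there a standard method for this kind of surrogate-based optimization that I should use instead?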