Say I want to train a neural network to approximate a function F that depends on an integer k in [1,N] and a vector r of real numbers. The output of the network is a single real number. Two options come to my mind:

  1. set up N neural networks, each of them trained with samples (input=r, output=F(k,r)) for a fixed k and multiple r. When I need to evaluate an input (k,r), I select the network that corresponds to k.
  2. train a single neural network with samples (input=[k, r], output=F(k,r)) for multiple k and r

If N gets large, option 2 seems more convenient, since there is only a single neural network to train (assuming the same total number of training samples). Which option do you suggest? Do you have suggestions/references for the two approaches (e.g. normalization of the integer input)?

Nicola
  • Why not just tack on the integer as another dimension in the vector $r$? // Does your integer represent a category ("dog is 1, cat is 2, horse is 3") or a quantity? – Dave Apr 28 '21 at 12:45
  • It is a quantity; I think what you are suggesting is option 2. In this case I wonder whether special care should be taken, since it is a (positive) integer. – Nicola Apr 28 '21 at 12:53
  • @Nicola In comments on an answer, you write that you want to approximate the function for all $k,r$. What is your goal once you have that function? For instance, if you only need to optimize it, then there are some suggestions here https://stats.stackexchange.com/questions/193306/optimization-when-cost-function-slow-to-evaluate which may be simpler just because training and tuning a neural network is a very expensive process. – Sycorax Apr 28 '21 at 14:40

1 Answer

F(k, r) = y, where $k \in \mathbb{Z}$ and $r \in \mathbb{R}$

In Python:

input:

print(type(k), type(r))

output:

<class 'int'> <class 'float'>

correct?

Assuming this function is not too costly to evaluate, why not have it generate results using sklearn's random search, store the inputs and outputs in a table (columns = [y, k, r]), and use that table to train a neural network (or one of many other model types, for that matter)?
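A minimal sketch of building such a table, using only the standard library rather than sklearn. Everything here is an assumption made for illustration: `F` is a hypothetical stand-in for the real function, `sample_table` is a made-up helper name, and r is drawn uniformly from [-1, 1] in each component, since the real input ranges are not given.

```python
import random


def F(k, r):
    # Hypothetical placeholder for the real function; replace with the actual F.
    return k * sum(x * x for x in r)


def sample_table(n_samples, N, dim, seed=0):
    """Draw random (k, r) pairs and return rows laid out as [y, k, r_1, ..., r_dim]."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        k = rng.randint(1, N)  # integer in [1, N], both ends inclusive
        r = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # assumed range for r
        rows.append([F(k, r), k, *r])
    return rows


table = sample_table(n_samples=200, N=10, dim=3)
```

Each row of `table` can then be split into a target (`row[0]`) and features (`row[1:]`) and passed to whatever model you choose.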

A sensible use case for this type of application would likely be where you have a bunch of measurements of a system F(k, r) and you would like to fill in the blanks in between the measurements. Or if there is a time dimension, maybe you would like to forecast $y_{t+1}$.

In your question it reads as if you plan to have a neural network with at most 2 inputs and 1 output. That is most likely not the best way to solve whatever problem you want this to solve. Option 1 sounds worse than option 2, but it's hard to judge without knowing the problem or the properties of F (for example, how costly is F to compute when generating training data, and what is the application of your model?).
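For option 2, one possible way to feed a quantity-like integer k to a single network is to rescale it to [0, 1] and prepend it to r. This is only a sketch under that assumption, not a recommendation derived from the properties of F; `encode_input` is a hypothetical helper name.

```python
def encode_input(k, r, N):
    """Return one input vector [k_scaled, r_1, ..., r_n] for a single network."""
    if not 1 <= k <= N:
        raise ValueError(f"k must lie in [1, {N}], got {k}")
    # Map k from [1, N] to [0, 1]; with N == 1 there is nothing to scale.
    k_scaled = (k - 1) / (N - 1) if N > 1 else 0.0
    return [k_scaled, *r]


x_first = encode_input(1, [0.5, -0.5], N=10)   # k at the lower end of its range
x_last = encode_input(10, [0.5, -0.5], N=10)   # k at the upper end of its range
```

Rescaling keeps k on a scale comparable to the components of r, which matters for most gradient-based training; whether it is sufficient depends on how F varies with k.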

XiB
  • Actually $r \in R^n$ (in the question I write that r is "a vector r of real numbers"), and indeed F is difficult to evaluate. I would like to train the NN with (relatively) few samples and get an approximation for all k, r. – Nicola Apr 28 '21 at 13:09