
I have run into some problems when trying to train a network that fits a multivariate quadratic function, namely the Euclidean distance between two points in 3-dimensional space that are fairly far apart (typically between $2.1\times10^7$ and $2.4\times10^7$).

To begin with, I have a large amount of data from a simulation, featuring the coordinates of both points and the distance between them, calculated with exactly the same formula I am attempting to fit:

$d=\sqrt{\Delta x^2+\Delta y^2+\Delta z^2}$.

The input data are sets of 3 decimals, the differences of coordinates between the two points, one very close to the origin and the other far away. (Originally 6 values, before I did the subtraction myself.) Each set is generated by a simulation with numpy.random(size = 3) and then scaled up by a suitable multiplier.
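As a sketch of the data generation described above (the exact multiplier in the question isn't stated, so the value here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
scale = 2.2e7  # illustrative multiplier; the question only says "a proper multiplier"

# coordinate differences between the two points, 1000 samples of 3 values each
deltas = rng.random(size=(1000, 3)) * scale

# target distances, computed with the same formula as in the question
dists = np.sqrt((deltas ** 2).sum(axis=1))
```

This mirrors the question's setup: the network's inputs are `deltas` and its targets are `dists`.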

The modified dataset I am currently working on.

With Keras I constructed something like this:

```python
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import InputLayer, Dense, LeakyReLU
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ReduceLROnPlateau

train_f, test_f, train_t, test_t = train_test_split(X_dist, Y_dist, test_size = 0.3)
nn_r = Sequential()
nn_r.add(InputLayer(input_shape = (train_f.shape[1], )))
nn_r.add(Dense(6, activation = 'linear'))
nn_r.add(Dense(12))
nn_r.add(LeakyReLU())   # LeakyReLU is a layer, not an activation string
nn_r.add(Dense(12))
nn_r.add(LeakyReLU())
nn_r.add(Dense(1, activation = 'linear'))
nn_r.compile(loss = 'mse', optimizer = Adam(), metrics = ['mae', 'mse'])
es_r = EarlyStopping(monitor = 'loss', patience = 30)
lr_r = ReduceLROnPlateau(monitor = 'loss')
nn_r.fit(train_f, train_t, epochs = ep, verbose = 1, callbacks = [es_r, lr_r])
score_r = nn_r.evaluate(x = test_f, y = test_t)
```

The network accepts this design and starts training well, but it never gives a reasonable result; the mse ends up somewhere above $1\times10^9$.

I have tried normalizing the input with sklearn.preprocessing, which accelerated training but did not help with accuracy; modifying the network's structure affected the result significantly, but it never ended up acceptable either. I even tried doing the subtraction myself and feeding the network the differences, and that did not work either.
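For reference, here is a minimal sketch of scaling both the inputs and the targets (the question mentions normalizing only the input; the scaler choice and data here are illustrative, not the question's actual pipeline). With raw targets around $2\times10^7$, an unscaled output makes the loss surface hard for Adam's default learning rate:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# illustrative synthetic data in the same ballpark as the question's
X = np.random.random((1000, 3)) * 2.2e7
y = np.sqrt((X ** 2).sum(axis=1, keepdims=True))

x_scaler, y_scaler = StandardScaler(), StandardScaler()
X_s = x_scaler.fit_transform(X)   # inputs: zero mean, unit variance
y_s = y_scaler.fit_transform(y)   # targets scaled the same way

# after training, predictions are inverted with y_scaler.inverse_transform(pred)
```

The mse is then measured on the scaled targets, so a value like $1\times10^9$ on raw targets is not directly comparable.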

Additionally, the system might need to deal with noisy data, so I decided to give NNs a try instead of simply plugging numbers back into the formula above.

I'm not sure where the problem comes from, or how I can make this work better. Thank you in advance.

Suzuco
  • The numerical progression suggested may reduce to something more tractable by a transformation of variables, e.g., taking logarithms. As another example, [orbital resonance](https://en.wikipedia.org/wiki/Orbital_resonance) is a 3-D distance problem that is more tractable in 1-D (orbital radius). I would first characterize the distribution of distances and identify which probability density function best corresponds to it; if they are, for example, lognormal, I would take the logarithm of the distances. Without that initial step, only nonparametric methods (e.g., ranking the data) would apply. – Carl Nov 06 '18 at 20:39
  • Jumping to NN without doing data characterization might be tractable on ranked data, but on raw data, I would expect problems. – Carl Nov 06 '18 at 20:44
  • Thank you. I did check the other (and some more) threads on this and they seem a bit too generalized and did not improve much here. Still in verification now. – Suzuco Nov 07 '18 at 09:09

1 Answer


This is an example test case, so it's worth building it up slowly.

For your particular test case there is a natural neural-network implementation with ReLUs (remember that in 1-D, ReLUs implement a piecewise-linear function). Namely: one layer of ReLUs takes the differences in coordinates and approximates each of the squared functions. So consider each of your differences, and plot the full range of differences vs. the square of the difference. How many 'knot points' would you need to approximate the quadratic function by piecewise-linear steps? That's the number of ReLUs you need (for each coordinate difference). I am guessing you will need 1000s of ReLUs.
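To put rough numbers on the knot-count question (a back-of-envelope estimate, not taken from the question): the maximum error of linear interpolation of $f$ on a segment of width $h$ is $h^2 \max|f''|/8$, which for $f(x)=x^2$ (where $f''=2$) is $h^2/4$:

```python
# worst-case error of a piecewise-linear fit to f(x) = x**2
# on equal segments of width h is h**2 * f''/8 = h**2 / 4
R = 2.4e7                        # input range, from the question
for n in (100, 1000, 10000):     # number of segments ~ number of ReLUs
    h = R / n
    print(n, h ** 2 / 4)
```

So even with 1000 ReLUs per coordinate, the squared-difference approximation is off by about $1.4\times10^8$ in the worst case, which supports the "1000s of ReLUs" guess.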

Now you need a layer to implement the square root. Again, plot the sum of squared differences vs. its square root. How many ReLUs do you need, for your range of values?
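The same interpolation-error bound gives a rough count for the square-root layer (again my own estimate; the input range below assumes the sum of squares equals $d^2$ for distances between $2.1\times10^7$ and $2.4\times10^7$):

```python
# for f(x) = sqrt(x), |f''(x)| = 1 / (4 * x**1.5), so the piecewise-linear
# error h**2 * |f''| / 8 is largest at the small end of the input range
lo, hi = (2.1e7) ** 2, (2.4e7) ** 2   # range of the sum of squared differences
h = (hi - lo) / 1000                  # 1000 equal segments ~ 1000 ReLUs
err_at_lo = h ** 2 / (8 * 4 * lo ** 1.5)
print(err_at_lo)
```

With 1000 knots this comes out well below 1, i.e., far fewer ReLUs suffice here than for the quadratic layer, because $\sqrt{x}$ is nearly linear over this range.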

Finally, you just need an output node that sums them all up.

TL;DR - I suspect you need a much larger number of ReLUs.

[So a simple 'closed-form' solution to this problem (with regular ReLUs) is to place knot points (i.e., bias values) equally spaced over your input range (for each function), with each weight set to the change in slope at its knot point of the function you are trying to approximate (squared / square root).]

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Nov  6 22:22:01 2018
"""

import matplotlib.pyplot as plt
import numpy as np

start = 1e6
stop = 1e7
knot_spacing = 1e5
data_spacing = 1e4
knots = np.arange(start, stop, knot_spacing)
data = np.arange(start, stop, data_spacing)


def relu(x):
    return np.where(x > 0, x, 0)


grad_xsq = (knots[1:] ** 2 - knots[:-1] ** 2) / knot_spacing
# 1st grad is the actual slope, then the following are changes in slope
grads = np.insert((grad_xsq[1:] - grad_xsq[:-1]), 0, grad_xsq[0])

relu_approx = (knots[0] ** 2 +
               relu(grads[np.newaxis, :] *
                    (data[:, np.newaxis] - knots[np.newaxis, :-1])).sum(axis=1))

mat = np.stack((data, data ** 2, relu_approx), axis=1)
ax = plt.gca()
ax.plot(mat[:, 0], mat[:, 1:])
ax.legend(['quadratic', 'relu'])

rmse = np.sqrt(((mat[:, 1] - mat[:, 2]) ** 2).mean())
print('rmse is {:.2E} with {} knots'.format(rmse, len(knots)))
```

(Plot: the quadratic function and its ReLU approximation.)

seanv507
  • Thank you. I added more hidden layers with thousands of relu units and it ended up way better than it had been. Still seeking further optimizations :) – Suzuco Nov 07 '18 at 10:05
  • Did you try creating the same structure as I suggested? E.g., 1st hidden layer: 1000 relu units, then a single linear neuron, then 1000 relu units, then a linear unit. Then graph how the rmse scales with the number of hidden units. The code above basically implements the squared function using relus, so it can give you an idea of the number of nodes you need to achieve a given rmse. – seanv507 Nov 07 '18 at 10:40
  • I've been working on that and it turned out I had insufficient units at the beginning. – Suzuco Nov 09 '18 at 11:27
  • With the method you suggested I found the number of units that gives the best results so far. – Suzuco Nov 09 '18 at 11:34
  • Glad to hear it – seanv507 Nov 09 '18 at 13:12