
Suppose we have a data set $\{(x_i, y_i)\}_{i=1}^{N}$, with $x_i \in \mathbb{R}^3$ and $y_i \in \mathbb{R}$.

I am trying to find a regression function of the form $f(x) = \omega_0 + \sum_{i=1}^{K} \omega_i \phi(\|x-c_i\|)$ using least-squares approximation.

My question is whether it makes any difference if I choose $\phi(r)$ to be Gaussian ($e^{-(\epsilon r)^2}$), inverse quadratic ($\frac{1}{1+(\epsilon r)^2}$), inverse multiquadric ($\frac{1}{\sqrt{1+(\epsilon r)^2}}$), etc.
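For concreteness, here is a minimal sketch of the setup I have in mind (plain numpy; the helper names are my own):

```python
import numpy as np

# Design matrix Phi[j, i] = phi(||x_j - c_i||), with a leading column
# of ones for the bias omega_0; the weights come from least squares.
def fit_rbf(X, y, centers, phi):
    r = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (N, K)
    Phi = np.hstack([np.ones((len(X), 1)), phi(r)])
    omega, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return omega

# The three candidate basis functions, sharing one shape parameter eps.
eps = 1.0
gaussian  = lambda r: np.exp(-(eps * r) ** 2)
inv_quad  = lambda r: 1.0 / (1.0 + (eps * r) ** 2)
inv_mquad = lambda r: 1.0 / np.sqrt(1.0 + (eps * r) ** 2)
```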

Thanks.

chikurin

1 Answer


One way to picture the regression function above is that your $y_i$ are approximated by scaled basis functions sitting on the centers $c_i$. If you plot the basis functions, you can see that they are not the same but fall off differently when $r$ (or $x$, in my notation below) is large. One way to compare them is to write \begin{align} \frac{1}{1+x^{2}} &= \exp\left(\log\frac{1}{1+x^{2}}\right) = \exp\left(-\log\left(1+x^{2}\right)\right),\\ \frac{1}{\sqrt{1+x^{2}}} &= \exp\left(-\frac{1}{2}\log\left(1+x^{2}\right)\right). \end{align}

For $x^2$ close to zero, $\log(1+x^2)\approx x^2$, so around zero the inverse quadratic behaves like the Gaussian $e^{-x^2}$ and all these functions are pretty similar. However, due to the $\log$ in the exponent, the inverse quadratics fall off more slowly than the Gaussian for large $x$. You can also see that the inverse quadratic and the inverse multiquadric differ only by a factor of $\frac{1}{2}$ in the exponent.
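A quick numerical check makes the different tails concrete (taking $\epsilon = 1$):

```python
import numpy as np

# Tail behaviour of the three basis functions: the Gaussian is
# numerically zero by r = 8, the inverse (multi)quadrics are not.
r = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
print(np.exp(-r**2))          # Gaussian:             ~1.6e-28 at r = 8
print(1 / (1 + r**2))         # inverse quadratic:    ~1.5e-2  at r = 8
print(1 / np.sqrt(1 + r**2))  # inverse multiquadric: ~1.2e-1  at r = 8
```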

Another way of comparing the functions is to look at their Fourier transforms: the more their energy is concentrated at low frequencies (i.e. the faster the transform decays at high frequencies), the smoother the function looks.
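If I recall the classical one-dimensional transforms correctly (with the convention $\hat\phi(\omega)=\int \phi(r)\,e^{-i\omega r}\,dr$ and $\epsilon=1$), they are \begin{align} e^{-r^2} &\longmapsto \sqrt{\pi}\,e^{-\omega^{2}/4},\\ \frac{1}{1+r^{2}} &\longmapsto \pi\,e^{-|\omega|},\\ \frac{1}{\sqrt{1+r^{2}}} &\longmapsto 2\,K_{0}(|\omega|), \end{align} where $K_0$ is a modified Bessel function. The Gaussian's spectrum falls off fastest, while the two inverse quadrics only decay exponentially, which matches the picture above.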

In the end, you have to decide which properties fit your data best. If you don't know, you can always look at the prediction error under cross-validation and choose the basis function with the lowest average error, for example as sketched below.
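Here is a sketch of such a comparison in plain numpy (the helper names are my own):

```python
import numpy as np

# k-fold cross-validation of an RBF least-squares fit: the basis
# function with the lowest mean held-out squared error wins.
def design(X, centers, phi):
    r = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.hstack([np.ones((len(X), 1)), phi(r)])

def cv_error(X, y, centers, phi, k=5):
    folds = np.array_split(np.random.permutation(len(X)), k)
    errs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(X)), test)
        w, *_ = np.linalg.lstsq(design(X[train], centers, phi),
                                y[train], rcond=None)
        errs.append(np.mean((design(X[test], centers, phi) @ w - y[test]) ** 2))
    return np.mean(errs)

# candidates = {"gaussian": ..., "inv_quad": ..., "inv_mquad": ...}
# best = min(candidates, key=lambda n: cv_error(X, y, centers, candidates[n]))
```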

In general, Gaussian basis functions are not a bad choice because they have the universal approximation property: with enough basis functions, you can approximate any continuous function (on a compact domain) arbitrarily well. I don't know whether this is true for the other functions as well.

fabee
  • Thanks a lot! May I ask: if I am restricted to the Gaussian function and it falls off too fast (I mean, if I do not tune $\epsilon$ but just use $\epsilon = 1$, i.e. $e^{-r^2}$), so that the matrix in the least-squares problem is nearly singular, how can I choose an $\epsilon$ that works better? I have read this one: http://stats.stackexchange.com/questions/17708/optimal-basis-for-regression-problem, but it is still not that clear to me. Sorry if the question is too amateur. – chikurin Dec 21 '13 at 10:16
  • The matrix is singular when two basis positions coincide, because then at least two columns of the design matrix are identical; you can avoid that by choosing distinct basis positions. You can also make the matrix invertible in any case by adding a small number (e.g. $10^{-6}$) times the identity matrix to it. For choosing $\varepsilon$, I would try $\varepsilon = 2^n/\sigma$ for $n=-3,-2,\dots,3$, where $\sigma$ is the median pairwise distance between the $x_i$; this heuristic usually works well (a sketch follows below). You should definitely adjust $\varepsilon$ to your data to get good results. – fabee Dec 21 '13 at 21:13
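To make the two suggestions in the last comment concrete, here is a minimal sketch (plain numpy/scipy; the names are my own):

```python
import numpy as np
from scipy.spatial.distance import pdist

# Ridge-regularized normal equations: adding lam * I keeps the system
# solvable even when columns of Phi are (nearly) identical.
def fit_ridge(Phi, y, lam=1e-6):
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

# Candidate shape parameters eps = 2^n / sigma for n = -3, ..., 3,
# where sigma is the median pairwise distance between the inputs x_i.
def eps_candidates(X):
    sigma = np.median(pdist(X))
    return [2.0 ** n / sigma for n in range(-3, 4)]
```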