
Suppose there is a function $f(a,b,c,\ldots)$ of $M$ variables (fixed numbers, not random variables). Add some Gaussian noise to this function:

$$ g(a,b,c,\ldots) = f(a,b,c,\ldots) + \varepsilon(a,b,c,\ldots) $$

where $\varepsilon(a,b,c,\ldots) \sim N(0,\sigma^2_{a,b,c,\ldots})$ is Gaussian noise. The $\sigma$ are set so that the noise is large compared to the function value, and the standard deviation of the noise depends on the input parameters (i.e. the noise is heteroskedastic).

Now suppose that I don't know $f$ or $\sigma$, but I have a large number $N$ of realisations of $g$ and each realisation has different input parameters $a,b,c,\ldots$. I am interested in estimating $f$.
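As a purely illustrative sketch of this setup in Python, with a hypothetical $f$ and $\sigma$ and $M = 2$ inputs chosen just to make it concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true function and noise scale, taking M = 2 inputs (a, b).
def f(a, b):
    return a * b + 0.5 * a**2

def sigma(a, b):
    return 2.0 + 3.0 * np.abs(a)   # heteroskedastic: the sd depends on the inputs

# N realisations of g, each at a different input point.
N = 5000
a = rng.uniform(-1, 1, N)
b = rng.uniform(-1, 1, N)
g = f(a, b) + rng.normal(0.0, sigma(a, b))   # noise is large relative to f
```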

If I were doing this parametrically, I could assume that $f$ is some kind of polynomial and fit it by least squares regression. This works because the Gaussian errors are independent with zero mean, so they "cancel out" on average.
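For illustration, such a least-squares polynomial fit might look like the sketch below; the true $f$, the noise scale, and the polynomial degree are all placeholder choices (again with $M = 2$ inputs for concreteness):

```python
import numpy as np

rng = np.random.default_rng(1)

# Parametric baseline: assume f is a (here quadratic) polynomial in the
# inputs and fit it by ordinary least squares.
N = 5000
a = rng.uniform(-1, 1, N)
b = rng.uniform(-1, 1, N)
g = a * b + 0.5 * a**2 + rng.normal(0.0, 2.0 + 3.0 * np.abs(a))  # noisy realisations

# Design matrix with all polynomial terms up to degree 2.
design = np.column_stack([np.ones(N), a, b, a * b, a**2, b**2])
beta, *_ = np.linalg.lstsq(design, g, rcond=None)

f_hat = design @ beta  # least-squares estimate of f at the observed inputs
```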

Is there a non-parametric approach (or semi-parametric approach) to estimate the same thing? What approach do people take in practice?

  • I'm not sure that including a polynomial makes something parametric. I usually think of parametric as having parameters of a known distribution, such as the normal, and nonparametrics don't assume such a distribution, perhaps using ranks. What do you mean by "non-parametric" here? Would quantile regression or ordinal logistic regression (both of which could have polynomial expansions of X) count? Why do you want non-parametric? Are you just worried about the heteroscedasticity? – gung - Reinstate Monica Dec 15 '14 at 22:43
  • Fair point. In my field, "parametric" is taken to mean that you define a model (in this case a polynomial) and fit the data to that model. I would define something like interpolation to be non-parametric because no model is used. I realise that this probably isn't self consistent. – Bob Mortimer Dec 15 '14 at 22:54
  • Regarding the rest of your comment, the important point is that under the assumption of independent normal residuals least squares regression "cancels out" the noise. Would the same hold true for quantile regression/ordinal logistic regression? – Bob Mortimer Dec 15 '14 at 22:56
  • Most standard models assume independence. A typical linear model is unbiased, even if the residuals are not independent (or are heteroscedastic), though. The problem would be with the tests / p-values of those models. For that there are heteroscedasticity (& potentially autocorrelation) consistent 'sandwich' errors. But I don't know enough yet about your thinking to know if that would be (part of) the answer to your question. – gung - Reinstate Monica Dec 15 '14 at 23:00
  • Can you say more about what would constitute a "non-parametric" model for you & why you want one? – gung - Reinstate Monica Dec 15 '14 at 23:02
  • I'd be happy to accept any answer which is an alternative to a polynomial, and which would be accepted good practice for this problem. The problem with polynomials is that they are quite stiff, so for example difficulty fitting to $f$ in one region can affect the fit in another region. – Bob Mortimer Dec 15 '14 at 23:03
  • That helps to clarify. Would you mind editing your question to describe alternatives to polynomials due to the problem of regions affecting each other? – gung - Reinstate Monica Dec 15 '14 at 23:30

1 Answer


Using polynomials to fit curves is a standard feature of stats 102. However, for anything beyond the simplest and best-behaved curves, polynomials are a poor choice (as you note). A better strategy is to use cubic splines. The predictor space is divided into regions at boundaries called 'knots'; one function is fit over the entire space, and additional functions are fit to the region beyond each knot. Consider a case with only one $X$ variable / dimension, fit with a linear spline. Let's say the range of $X$ is partitioned at $X = .7$; then:
$$ X_{\rm spline} = \begin{cases} 0\quad &\text{if } X\le{.7} \\ X-.7\quad &\text{if } X>.7 \end{cases} $$ Then a multiple regression model is fit using the two variables $X$ and $X_{\rm spline}$. This is quite rudimentary, of course. It would be more typical to have, say, 5 knots instead of one, and you can fit polynomials (typically cubics) for each variable. This is a very powerful and flexible strategy for function approximation. Moreover, all the benefits of linear models come naturally with this approach. I have more information about this here: What are the advantages / disadvantages of using splines, smoothed splines, and Gaussian process emulators?
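A minimal sketch of that one-knot linear spline in Python (the data and knot placement are placeholder choices; in practice you would typically use an existing spline implementation rather than building the basis by hand):

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear spline with a single knot at X = 0.7, as in the formula above.
N = 1000
X = rng.uniform(0, 1, N)
y = np.sin(3 * X) + rng.normal(0.0, 0.5, N)   # placeholder data

X_spline = np.where(X > 0.7, X - 0.7, 0.0)    # 0 below the knot, X - 0.7 above

# Multiple regression on an intercept, X, and X_spline.
design = np.column_stack([np.ones(N), X, X_spline])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

y_hat = design @ beta   # piecewise-linear fit, continuous at the knot
```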

gung - Reinstate Monica
  • How does this technique work for functions $f$ which are non-separable, e.g. $f(x,y,z)=xyz$? I can see how to divide the predictor space into partitions, but would this mean that the result wouldn't be guaranteed to be continuous? – Bob Mortimer Dec 19 '14 at 14:54
  • @BobMortimer, it depends on how you do it. A simple linear spline (as shown above) would show a sharp break, but a cubic spline is constrained to have identical 1st & 2nd derivatives on both sides of the knot (see the sketch after this thread). – gung - Reinstate Monica Dec 19 '14 at 16:26
  • Just to be clear, let's suppose we have the (unknown) function $f(x,y,z)=xyz$. Are you proposing the following: (1) partition the 3d variable space into a number of cubes; (2) fit a regression model to each cube (presumably this is of the form $F=X+Y+Z$? or what other predictor would I use?); (3) impose constraints on the regression so that the resulting curve is continuous? – Bob Mortimer Dec 23 '14 at 12:04
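A minimal sketch (not from the thread) of the cubic-spline continuity point mentioned above, using a truncated power basis with a single knot; the data and knot placement are placeholder choices. Because the truncated cubic term and its first and second derivatives all vanish at the knot, the fitted curve is smooth there:

```python
import numpy as np

rng = np.random.default_rng(3)

# Truncated power basis for a cubic spline with one knot at 0.7:
# columns 1, X, X^2, X^3, (X - 0.7)^3_+ .
N = 1000
X = rng.uniform(0, 1, N)
y = np.sin(3 * X) + rng.normal(0.0, 0.5, N)      # placeholder data

knot = 0.7
basis = np.column_stack([
    np.ones(N), X, X**2, X**3,
    np.clip(X - knot, 0, None) ** 3,              # zero below the knot
])
beta, *_ = np.linalg.lstsq(basis, y, rcond=None)

y_hat = basis @ beta   # cubic-spline fit with continuous 1st & 2nd derivatives
```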