
Given a regression setting with covariates $X_{n \times m}$ and response $Y_{n \times p}$ where $p>1$, i.e., the responses are vector-valued or multivariate, is there a Nadaraya-Watson estimator for kernel regression in this setting?

This boils down to how the following can be computed with this form of $Y$:

$$\frac{\sum_{i=1}^{n}K_h(x-x_i)y_i}{\sum_{i=1}^{n}K_h(x-x_i)}$$

But since $y_i$ above is now multivariate as well, what happens to the multiplication in the numerator under this generalization to multivariate responses?

hearse

1 Answer


The operation here works just as well when $y_i$ is a vector instead of a scalar. Think of the estimator as a weighted sum of vectors:

$$\frac{\sum_i w_i \mathbf y_i}{\sum_k w_k} = \sum_i \left(\frac{w_i}{\sum_k w_k} \right) \mathbf y_i = \sum_i \tilde w_i \mathbf y_i$$

where the weights are given in terms of the kernel function:

$$ w_i = K_h(\mathbf x-\mathbf x_i)\\[1em] \Rightarrow \quad\tilde w_i = \frac{K_h(\mathbf x-\mathbf x_i)}{\sum_{k=1}^{n}K_h(\mathbf x-\mathbf x_k)}$$

With this, multivariate Nadaraya-Watson kernel regression simply boils down to a one-dimensional regression in each response dimension, with the same kernel weights $\tilde w_i$ shared across all dimensions.
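For concreteness, here is a minimal NumPy sketch of this estimator (assuming a Gaussian kernel; the function name `nw_estimate` and the toy data below are purely illustrative):

```python
import numpy as np

def nw_estimate(X, Y, x, h):
    """Nadaraya-Watson estimate at query point x for vector-valued responses.

    X : (n, m) covariates, Y : (n, p) responses, x : (m,) query point, h : bandwidth.
    """
    sq_dists = np.sum((X - x) ** 2, axis=1)   # ||x - x_i||^2, shape (n,)
    w = np.exp(-sq_dists / (2.0 * h ** 2))    # Gaussian kernel weights w_i = K_h(x - x_i)
    w_tilde = w / w.sum()                     # normalized weights, summing to 1
    return w_tilde @ Y                        # weighted mean of the rows of Y, shape (p,)

# Toy example: p = 2 responses, each a noisy function of a scalar covariate
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
Y = np.hstack([np.sin(X), np.cos(X)]) + 0.1 * rng.normal(size=(200, 2))
print(nw_estimate(X, Y, x=np.array([0.0]), h=0.3))  # roughly [sin(0), cos(0)] = [0, 1]
```

Note that `w_tilde @ Y` computes all $p$ output dimensions at once: each column of $Y$ receives exactly the same weights, which is the "one-dimensional regression in each dimension" described above.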

davidhigh
  • Awesome. I previously thought it would be contradictory to run multiple 1-d regressions, since the dimensions may not all carry equally useful information. But it's good to know that this is not a problem, and is indeed the right answer. Thanks! – hearse Nov 26 '14 at 03:26
  • @PraneethVepakomma: still, it was slightly misleading (whereas I think you got it right). I made a mistake in the definition of the coefficients. Actually it was correct, as the normalization factor cancels out, but now it's more convenient. – davidhigh Nov 26 '14 at 08:47
  • How is the bandwidth $h$ estimated in this case? Do you know of R packages that implement this for multivariate $Y$? – hearse Nov 26 '14 at 18:10