
Given i.i.d. samples $(x_1, y_1), \dots, (x_n, y_n)$ such that $y_i = f_0(x_i) + \epsilon_i$, $i = 1, \dots, n$, for some function $f_0$.

Suppose I want an estimate $\hat{f}$ of $f_0$ using k-nearest-neighbors regression at each $x_i$ in my dataset. For each $x_i$, I must find its $k$ nearest neighbors and average the $y_j$ with $j \in \mathcal{N}_k(x_i)$, where $\mathcal{N}_k(x)$ is the index set of the $k$ points nearest to $x$:

$$\hat{f}(x_i) = \frac{1}{k}\sum_{j\in\mathcal{N}_k(x_i)} y_j$$
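For concreteness, the estimator above can be written as a brute-force sketch for one-dimensional $x$. The function name `knn_regress_1d` is made up for illustration, and the sketch assumes that $x_i$ counts as one of its own $k$ neighbors and that there are no distance ties. It builds the full $n \times n$ distance matrix, so it is only meant for small $n$.

```python
import numpy as np

def knn_regress_1d(x, y, k):
    """Brute-force k-NN average at every sample point.

    Assumption: x_i is included among its own k nearest neighbours.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise distances |x_i - x_j|; O(n^2) memory, so small n only.
    dist = np.abs(x[:, None] - x[None, :])
    # Column indices of the k smallest distances in each row
    # (their order within the first k columns is arbitrary).
    nbrs = np.argpartition(dist, k - 1, axis=1)[:, :k]
    # Average the responses over each neighbourhood.
    return y[nbrs].mean(axis=1)

# Toy data with one isolated point at x = 10.
x = np.array([0.0, 1.0, 2.5, 10.0])
y = np.array([1.0, 2.0, 3.0, 4.0])
f_hat = knn_regress_1d(x, y, k=2)
```

Note that the isolated point at $x = 10$ still gets a neighborhood of size $k$; its window simply reaches back to the nearest point on its left.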

Now if my $x_i$ are all evenly spaced, then I could simply sort them in ascending order and calculate a moving average over the corresponding elements of $y$ with window size $k$. My question is: will this moving average be approximately equivalent to k-nearest-neighbors regression even if $(x_1, \dots, x_n)$ are not evenly spaced? Are there any tests I can run on the distribution $P(x)$ to check the quality of the approximation?
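One way to probe this empirically is sketched below. In one dimension with sorted, distinct $x$, the $k$ nearest neighbors of $x_i$ (counting $x_i$ itself) always form a contiguous window in the sorted order, so the only difference from a moving average is where that window starts. The sketch computes the exact k-NN window start with a sliding pointer and compares it with a centered window. All function names, the simulated data, and the "agreement fraction" are illustrative assumptions, not an established test; the centered window is also assumed to be clamped at the array edges, which the question does not specify.

```python
import numpy as np

def knn_window_starts(x_sorted, k):
    """Start index of the exact k-NN window for each point.

    Assumes x_sorted is ascending with distinct values and that
    x_i counts as one of its own k neighbours.
    """
    n = len(x_sorted)
    starts = np.empty(n, dtype=int)
    lo = 0
    for i in range(n):
        # Shift right while the next point on the right is strictly
        # closer to x_i than the current leftmost point of the window.
        while lo + k < n and x_sorted[lo + k] - x_sorted[i] < x_sorted[i] - x_sorted[lo]:
            lo += 1
        starts[i] = lo
    return starts

def centered_window_starts(n, k):
    """Start index of a centred window of size k, clamped at the edges."""
    half = (k - 1) // 2
    return np.clip(np.arange(n) - half, 0, n - k)

def window_means(y_sorted, starts, k):
    """Mean of y over [start, start + k) for each start, via cumulative sums."""
    csum = np.concatenate(([0.0], np.cumsum(y_sorted)))
    return (csum[starts + k] - csum[starts]) / k

rng = np.random.default_rng(0)
n, k = 2000, 21                             # odd k, so the centred window is symmetric
x_even = np.arange(n, dtype=float)          # evenly spaced design
x_skew = np.sort(rng.lognormal(size=n))     # unevenly spaced design (simulated)

agreement, max_gap = {}, {}
for name, x in {"even": x_even, "skewed": x_skew}.items():
    y = np.sin(x) + 0.1 * rng.normal(size=n)
    knn_s = knn_window_starts(x, k)
    ma_s = centered_window_starts(n, k)
    # Fraction of points whose k-NN set equals the centred window.
    agreement[name] = np.mean(knn_s == ma_s)
    # Largest absolute difference between the two fitted values.
    max_gap[name] = np.max(np.abs(window_means(y, knn_s, k) - window_means(y, ma_s, k)))
```

On the evenly spaced design the two window sets coincide, so the gap is zero. On uneven data, the agreement fraction depends only on the $x$ values, so it can be computed before looking at $y$. Since the sliding-pointer pass is $O(n)$ after sorting, exact one-dimensional k-NN averaging may also be cheap enough that no approximation is needed.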

Moss Murderer
  • *Moving average model* is a fixed notion which is quite different from *moving average* in general – see [this](https://stats.meta.stackexchange.com/questions/4953/confusing-moving-average-tag-split-into-two) and perhaps edit the title. – Richard Hardy Oct 24 '17 at 06:13
  • Is $x_i$ going to be the midpoint of the window? – Cagdas Ozgenc Oct 24 '17 at 07:49
  • Yes, sorry I left that detail out. $x_i$ will be the middle element/median of the window. Of course, if ($x_1$, ... $x_n$) are evenly spaced, then $x_i$ will also be the mean of the window. – Moss Murderer Oct 24 '17 at 07:56
  • Is the end goal to gain some computation speed? Are there a lot of data points? I mean, why do you want to avoid the nearest-neighbor search? – Cagdas Ozgenc Oct 24 '17 at 07:58
  • Yes. There are 300,000 observations in my dataset so my question was partly motivated by the need to speed it up. However, I am curious to know if I can use moving average as a general strategy because the results looked very similar - at least in my one dataset. – Moss Murderer Oct 24 '17 at 08:04
