Let's say we have some real-valued features $\mathbf{X}$ and real-valued univariate responses $\mathbf{y}$, and we want to fit a regression model to this data:
$$\mathbf{y} = f\left(\mathbf{X};\beta\right) + \mathbf{\varepsilon}$$
where fitting is done by minimizing some loss function $L(f(x;\beta),y)$ over the training points:
$$\hat{\beta} = \arg\min_{\beta} \sum_i L(f(x_i;\beta),y_i)$$
I noticed that although the total loss is a function of $\mathbf{X}$ and the true responses $\mathbf{y}$, it is not sensitive to the distribution of $\mathbf{y}$ or $\mathbf{X}$: the loss assigned to each point does not depend on how the other points are distributed around it.
I'd like to weight the loss for each training point by a scalar $s_i$ to reflect the "novelty" of each $y_i \in \mathbf{y}$. I was thinking of using a 1-d Voronoi tessellation on $\mathbf{y}$ (with the extreme points serving as outer bounds/convex hull for the tessellation) and setting $s_i$ to be the length of the Voronoi cell associated with $y_i$, such that the new, weighted loss function becomes:
$$L_w(f(x_i;\beta),y_i) = s_iL(f(x_i;\beta),y_i)$$
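For concreteness, here is a minimal sketch of what I have in mind (Python with NumPy and scikit-learn; the helper name `voronoi_weights_1d` and the example data are just illustrative). In 1-d the Voronoi cell boundaries are simply the midpoints between adjacent sorted responses, with the extreme points clipped at $\min(\mathbf{y})$ and $\max(\mathbf{y})$, and the resulting cell lengths are passed as sample weights to a weighted fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def voronoi_weights_1d(y):
    """1-d Voronoi cell lengths for the responses y.

    Cells are bounded by midpoints between adjacent sorted values;
    the two extreme points are clipped at min(y) and max(y), so the
    extremes serve as the outer bounds of the tessellation.
    Note: tied values of y produce zero-length cells.
    """
    y = np.asarray(y, dtype=float)
    order = np.argsort(y)
    y_sorted = y[order]
    # Cell boundaries: outer bounds plus midpoints between neighbours.
    bounds = np.concatenate(([y_sorted[0]],
                             (y_sorted[1:] + y_sorted[:-1]) / 2,
                             [y_sorted[-1]]))
    lengths = np.diff(bounds)      # cell length for each sorted point
    s = np.empty_like(lengths)
    s[order] = lengths             # map back to the original ordering
    return s


# Illustrative usage: weighted least squares via sample_weight.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

s = voronoi_weights_1d(y)
model = LinearRegression().fit(X, y, sample_weight=s)
print(model.coef_)
```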
I'd appreciate any references on the statistical properties of this approach, whether applied to the response space (as above) or to the feature space (provided a suitable Voronoi tessellation can be constructed for the feature space).