
Consider a linear regression model: $$ y_i = \mathbf x_i \cdot \boldsymbol \beta + \varepsilon _i, \quad i=1,\ldots ,n, $$ where the $\varepsilon _i \sim \mathcal L(0, b)$, that is, Laplace-distributed with mean $0$ and scale parameter $b$, and mutually independent. Consider the maximum likelihood estimation of the unknown parameter $\boldsymbol \beta$: $$ -\log p(\mathbf y \mid \mathbf X, \boldsymbol \beta, b) = n\log (2b) + \frac 1b\sum _{i=1}^n |\mathbf x_i \cdot \boldsymbol \beta - y_i|, $$ from which $$ \hat{\boldsymbol \beta}_{\mathrm {ML}} = {\arg\min }_{\boldsymbol \beta \in \mathbb R^m} \sum _{i=1}^n |\mathbf x_i \cdot \boldsymbol \beta - y_i|. $$
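For completeness, the negative log-likelihood above follows directly from independence and the Laplace density of the errors, $p(\varepsilon_i) = \frac{1}{2b}\exp(-|\varepsilon_i|/b)$: $$ -\log p(\mathbf y \mid \mathbf X, \boldsymbol \beta, b) = -\sum_{i=1}^n \log\left[\frac{1}{2b}\exp\!\left(-\frac{|y_i - \mathbf x_i \cdot \boldsymbol \beta|}{b}\right)\right] = n\log(2b) + \frac 1b \sum_{i=1}^n |\mathbf x_i \cdot \boldsymbol \beta - y_i|. $$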

How can one find the distribution of the residuals $\mathbf y - \mathbf X\hat{\boldsymbol \beta}_{\mathrm {ML}}$ in this model?


2 Answers


By assumption the errors $\varepsilon_i$ follow a double-exponential (Laplace) distribution; the residuals are your empirical estimates of those errors. If you are fitting actual x and y data points, find their distribution numerically. First compute $\hat{\boldsymbol\beta}_{\mathrm{ML}}$ from all the points using the formula you posted above; this determines a fitted line through the points. Then, for each point, subtract the fitted y value of the line at that x from the observed y value; this difference is the residual for that point. Collecting the residuals of all the points into a histogram gives you an empirical distribution of the residuals.
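A minimal numerical sketch of that procedure in MATLAB (the sample size, design matrix, true coefficients, and scale below are made-up illustrative values, and the LAD fit uses fminsearch, though any minimizer of the absolute-error sum would do):

% Simulate data with Laplace(0, b) errors (illustrative values)
n = 500; b = 1.0;
X = [ones(n,1), randn(n,1)];                 % design matrix with an intercept
beta_true = [2; -1];
u = rand(n,1) - 0.5;
noise = -b * sign(u) .* log(1 - 2*abs(u));   % inverse-CDF sampling of Laplace(0, b)
y = X*beta_true + noise;

% LAD / maximum likelihood fit: minimize the sum of absolute residuals
lad_loss = @(beta) sum(abs(y - X*beta));
beta_hat = fminsearch(lad_loss, X\y);        % start from the least-squares fit

% Residuals and their empirical distribution
res = y - X*beta_hat;
histogram(res, 40);                          % histogram approximates the residual distribution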

There is a good mathematical article on it by Yang (2014).

--Lee


I think this is equivalent to robust regression, specifically least absolute deviations: instead of the 2-norm you minimize the 1-norm of the residuals and look for ${\arg\min }_{\boldsymbol \beta \in \mathbb R^m} \sum _{i=1}^n |\mathbf x_i \cdot \boldsymbol \beta - y_i|$, exactly as you wrote.

One way to solve it is to approximate the 1-norm with a smooth surrogate, e.g. a pseudo-Huber-style smoothing of the absolute value, $h_\eta(x)=\sqrt {x^2+\eta}$ for some small smoothing parameter $\eta$. The loss becomes $\sum _{i=1}^n \sqrt{(\mathbf x_i \cdot \boldsymbol \beta - y_i)^2+\eta}$, which is differentiable, so you can minimize it with something like gradient descent.

Here's some code I wrote for an HW exercise in MATLAB:

% Smoothed-L1 fit by gradient descent; assumes X (design matrix) and z (targets) are already defined.
eta = 1e-3;              % smoothing parameter
epsilon = 1e-6;          % stopping tolerance on the gradient norm
t = eta / norm(X)^2;     % conservatively small fixed step size
w = X\z;                 % warm start from the least-squares solution
iter = 0;
fun_g = @(u) sum(sqrt(u.^2 + eta));          % smoothed sum of absolute values
fun_f = @(w) fun_g(X*w - z);                 % full loss as a function of w
grad_g = @(u) u ./ sqrt(u.^2 + eta);         % derivative of the smoothed absolute value
grad_f = @(w) X' * grad_g(X*w - z);          % chain rule: gradient of the full loss
grad = grad_f(w);
while norm(grad) > epsilon
    w = w - t*grad;
    fun_val = fun_f(w);
    grad = grad_f(w);
    iter = iter + 1;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f\n', iter, norm(grad), fun_val);
end
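On the step size: with $h_\eta(x)=\sqrt{x^2+\eta}$ the gradient of this loss is Lipschitz with constant at most $\|X\|_2^2/\sqrt{\eta}$, so a fixed step $t \le \sqrt{\eta}/\|X\|_2^2$ should keep plain gradient descent stable (the choice in the code above is a conservative one); a backtracking line search is a reasonable alternative if you don't want to tune $t$ by hand.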
Maverick Meerkat