
Consider a linear regression model: $$ y_i = \mathbf x_i \cdot \boldsymbol \beta + \varepsilon _i, \quad i=1,\ldots ,n, $$ where the $\varepsilon _i \sim \mathcal L(0, b)$, that is, Laplace-distributed with mean $0$ and scale parameter $b$, and mutually independent. Consider the maximum likelihood estimation of the unknown parameter $\boldsymbol \beta$: $$ -\log p(\mathbf y \mid \mathbf X, \boldsymbol \beta, b) = n\log (2b) + \frac 1b\sum _{i=1}^n |\mathbf x_i \cdot \boldsymbol \beta - y_i|, $$ from which $$ \hat{\boldsymbol \beta}_{\mathrm {ML}} = {\arg\min }_{\boldsymbol \beta \in \mathbb R^m} \sum _{i=1}^n |\mathbf x_i \cdot \boldsymbol \beta - y_i|. $$
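For completeness, the negative log-likelihood above follows directly from independence and the Laplace density of the errors, $p(\varepsilon_i) = \frac{1}{2b}\exp(-|\varepsilon_i|/b)$: $$ -\log p(\mathbf y \mid \mathbf X, \boldsymbol \beta, b) = -\sum_{i=1}^n \log\left[\frac{1}{2b}\exp\!\left(-\frac{|y_i - \mathbf x_i \cdot \boldsymbol \beta|}{b}\right)\right] = n\log(2b) + \frac 1b \sum_{i=1}^n |\mathbf x_i \cdot \boldsymbol \beta - y_i|. $$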

How can one find the distribution of the residuals $\mathbf y - \mathbf X\hat{\boldsymbol \beta}_{\mathrm {ML}}$ in this model?


2 Answers


By assumption the errors $\varepsilon_i$ follow a double-exponential (Laplace) distribution; the residuals are your empirical estimates of those errors. If you are fitting actual x and y data points, find their distribution numerically. First compute $\hat{\boldsymbol\beta}_{\mathrm{ML}}$ from all the points using the formula you posted above; this determines a fitted line through the points. Then, for each point, subtract the fitted y value of the line at that x from the observed y value; this difference is the residual for that point. Collecting the residuals of all the points into a histogram gives you an empirical distribution of the residuals.
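A minimal numerical sketch of that procedure in MATLAB (the sample size, design matrix, true coefficients, and scale below are made-up illustrative values, and the LAD fit uses fminsearch, though any minimizer of the absolute-error sum would do):

% Simulate data with Laplace(0, b) errors (illustrative values)
n = 500; b = 1.0;
X = [ones(n,1), randn(n,1)];                 % design matrix with an intercept
beta_true = [2; -1];
u = rand(n,1) - 0.5;
noise = -b * sign(u) .* log(1 - 2*abs(u));   % inverse-CDF sampling of Laplace(0, b)
y = X*beta_true + noise;

% LAD / maximum likelihood fit: minimize the sum of absolute residuals
lad_loss = @(beta) sum(abs(y - X*beta));
beta_hat = fminsearch(lad_loss, X\y);        % start from the least-squares fit

% Residuals and their empirical distribution
res = y - X*beta_hat;
histogram(res, 40);                          % histogram approximates the residual distribution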

There is a good mathematical article on it by Yang (2014).

--Lee


I think this is equivalent to robust regression, specifically least absolute deviations: instead of the 2-norm you minimize the 1-norm of the residuals and look for ${\arg\min }_{\boldsymbol \beta \in \mathbb R^m} \sum _{i=1}^n |\mathbf x_i \cdot \boldsymbol \beta - y_i|$, exactly as you wrote.

One way to solve it is to approximate the 1-norm with a smooth surrogate, e.g. a pseudo-Huber-style smoothing of the absolute value, $h_\eta(x)=\sqrt {x^2+\eta}$ for some small smoothing parameter $\eta$. The loss becomes $\sum _{i=1}^n \sqrt{(\mathbf x_i \cdot \boldsymbol \beta - y_i)^2+\eta}$, which is differentiable, so you can minimize it with something like gradient descent.

Here's some code I wrote for an HW exercise in MATLAB:

% Smoothed-L1 fit by gradient descent; assumes X (design matrix) and z (targets) are already defined.
eta = 1e-3;              % smoothing parameter
epsilon = 1e-6;          % stopping tolerance on the gradient norm
t = eta / norm(X)^2;     % conservatively small fixed step size
w = X\z;                 % warm start from the least-squares solution
iter = 0;
fun_g = @(u) sum(sqrt(u.^2 + eta));          % smoothed sum of absolute values
fun_f = @(w) fun_g(X*w - z);                 % full loss as a function of w
grad_g = @(u) u ./ sqrt(u.^2 + eta);         % derivative of the smoothed absolute value
grad_f = @(w) X' * grad_g(X*w - z);          % chain rule: gradient of the full loss
grad = grad_f(w);
while norm(grad) > epsilon
    w = w - t*grad;
    fun_val = fun_f(w);
    grad = grad_f(w);
    iter = iter + 1;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f\n', iter, norm(grad), fun_val);
end
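On the step size: with $h_\eta(x)=\sqrt{x^2+\eta}$ the gradient of this loss is Lipschitz with constant at most $\|X\|_2^2/\sqrt{\eta}$, so a fixed step $t \le \sqrt{\eta}/\|X\|_2^2$ should keep plain gradient descent stable (the choice in the code above is a conservative one); a backtracking line search is a reasonable alternative if you don't want to tune $t$ by hand.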
Maverick Meerkat