Questions tagged [least-squares]

Overview

Refers to a general estimation technique that selects parameter values to minimize the sum of squared differences between two quantities, typically the observed values of a variable and the expected values of those observations given the parameter values. Gaussian linear models are fit by least squares, and least squares is the idea underlying the use of mean squared error (MSE) as a way of evaluating an estimator.

Formulation

Given a set of data $(x_1,y_1),...,(x_n,y_n)$, where $x_i \in \mathbb{R}^{p}$, and a coefficient vector $\beta \in \mathbb{R}^{p}$, the least squares estimate is the solution to the minimization problem:

$$\widehat{\beta}_{LS} = \underset{\beta}{\operatorname{arg\,min}} \sum\limits_{i=1}^{n}\Big(y_i - \sum\limits_{j=1}^{p}x_{i,j}\beta_{j}\Big)^2 = \underset{\beta}{\operatorname{arg\,min}} \; \|{\bf y - X\beta}\|^2$$

Provided ${\bf X}$ has full column rank, so that ${\bf X^TX}$ is invertible, linear algebra gives the closed-form solution:

$$ {\bf \widehat{\beta} = (X^TX)^{-1}X^{T}y} $$
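For concreteness, here is a minimal NumPy sketch of this closed form; the simulated data and variable names are illustrative only. `np.linalg.lstsq` is the more numerically robust route, since it never forms ${\bf X^TX}$ explicitly:

```python
import numpy as np

# Illustrative sketch: simulate n = 100 observations with p = 3 predictors.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Normal equations: solve (X'X) beta = X'y rather than inverting X'X.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# The same estimate via a dedicated least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)  # both close to [1.5, -2.0, 0.5]
print(beta_lstsq)
```

Solving the linear system is preferred over computing $({\bf X^TX})^{-1}$ explicitly; the explicit inverse is both slower and less accurate.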

References

Least squares methods are treated in many introductory statistics resources and textbooks, and there are also advanced resources dedicated entirely to the subject.

2460 questions
98 votes · 5 answers

Mean absolute error OR root mean squared error?

Why use Root Mean Squared Error (RMSE) instead of Mean Absolute Error (MAE)? Hi, I've been investigating the error generated in a calculation - I initially calculated the error as a Root Mean Normalised Squared Error. Looking a little closer, I…
user1665220
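For concreteness, a minimal sketch of the two error measures this question contrasts, with made-up observed and predicted values (the variable names are illustrative):

```python
import numpy as np

# Made-up observed and predicted values; any equal-length pair works.
y_obs = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

err = y_obs - y_pred
rmse = np.sqrt(np.mean(err ** 2))  # weights large errors more heavily
mae = np.mean(np.abs(err))         # treats all error magnitudes linearly

print(f"RMSE = {rmse:.3f}, MAE = {mae:.3f}")  # RMSE >= MAE always
```

RMSE equals MAE only when every absolute error is identical, which is why the two can rank models differently.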
93 votes · 2 answers

When to use regularization methods for regression?

In what circumstances should one consider using regularization methods (ridge, lasso or least angle regression) instead of OLS? In case this helps steer the discussion, my main interest is improving predictive accuracy.
NPE
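As a hedged illustration of the trade-off this question is about: ridge regression shrinks coefficients toward zero, which can reduce prediction variance when predictors are strongly correlated. The data below are simulated and the penalty is arbitrary rather than tuned:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 10
# Strongly correlated predictors: a setting where shrinkage often helps.
z = rng.normal(size=(n, 1))
X = z + 0.1 * rng.normal(size=(n, p))
y = X[:, 0] + rng.normal(scale=0.5, size=n)

lam = 1.0  # illustrative penalty; in practice chosen by cross-validation
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Ridge trades a little bias for less variance: its coefficients are shrunk.
print(np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```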
66 votes · 3 answers

Maximum likelihood method vs. least squares method

What is the main difference between maximum likelihood estimation (MLE) and least squares estimation (LSE)? Why can't we use MLE for predicting $y$ values in linear regression and vice versa? Any help on this topic will be greatly appreciated.
evros
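A standard step connecting the two, sketched here: under the Gaussian linear model $y_i = x_i^{T}\beta + \varepsilon_i$ with $\varepsilon_i \sim N(0,\sigma^2)$ i.i.d., the log-likelihood is

$$\ell(\beta,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum\limits_{i=1}^{n}\left(y_i - x_i^{T}\beta\right)^2,$$

so for any fixed $\sigma^2$, maximizing over $\beta$ means minimizing the sum of squared residuals: under normal errors the MLE of $\beta$ coincides with the LSE.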
66 votes · 3 answers

Why does ridge estimate become better than OLS by adding a constant to the diagonal?

I understand that the ridge regression estimate is the $\beta$ that minimizes residual sum of square and a penalty on the size of $\beta$ $$\beta_\mathrm{ridge} = (\lambda I_D + X'X)^{-1}X'y = \operatorname{argmin}\big[ \text{RSS} + \lambda…
Heisenberg
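One standard way to see the effect of the added diagonal, sketched briefly: if $X'X$ has eigenvalues $d_1 \ge \dots \ge d_p \ge 0$, then $\lambda I + X'X$ has eigenvalues $d_j + \lambda$, so for $\lambda > 0$ the matrix being inverted is well conditioned even when $X'X$ is singular or nearly so:

$$\operatorname{cond}(\lambda I + X'X) = \frac{d_1 + \lambda}{d_p + \lambda} \le \frac{d_1}{d_p} = \operatorname{cond}(X'X).$$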
56 votes · 5 answers

How to derive the ridge regression solution?

I am having some issues with the derivation of the solution for ridge regression. I know the regression solution without the regularization term: $$\beta = (X^TX)^{-1}X^Ty.$$ But after adding the L2 term $\lambda\|\beta\|_2^2$ to the cost function,…
user34790
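The derivation this question asks for is short; a sketch of the standard calculus argument: set the gradient of the penalized objective to zero,

$$\frac{\partial}{\partial\beta}\left(\|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2\right) = -2X^{T}(y - X\beta) + 2\lambda\beta = 0,$$

which rearranges to $(X^{T}X + \lambda I)\beta = X^{T}y$, giving $\widehat{\beta}_\mathrm{ridge} = (X^{T}X + \lambda I)^{-1}X^{T}y$. For $\lambda > 0$ the matrix $X^{T}X + \lambda I$ is positive definite, hence always invertible.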
56 votes · 5 answers

Regression when the OLS residuals are not normally distributed

There are several threads on this site discussing how to determine if the OLS residuals are asymptotically normally distributed. Another way to evaluate the normality of the residuals with R code is provided in this excellent answer. This is another…
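The answer linked in the excerpt uses R; as an independent, minimal Python sketch of two common normality checks on residuals (the residuals here are simulated stand-ins):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
residuals = rng.normal(size=200)  # stand-in for residuals from a fitted OLS model

# Shapiro-Wilk: small p-values indicate departure from normality.
w_stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")

# Normal Q-Q coordinates; plot osm vs. osr to inspect the tails visually.
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
```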
55 votes · 3 answers

Where does the misconception that Y must be normally distributed come from?

Seemingly reputable sources claim that the dependent variable must be normally distributed: Model assumptions: $Y$ is normally distributed, errors are normally distributed, $e_i \sim N(0,\sigma^2)$, and independent, and $X$ is fixed, and …
colorlace
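A small simulation makes the distinction concrete (everything below is made up for illustration): the errors are exactly normal, so every model assumption holds, yet the marginal distribution of $Y$ is bimodal because $X$ is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# X falls in two clusters, e.g. a predictor measured in two distinct groups.
x = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])
# Errors are exactly N(0, 1), just as the model assumes.
y = 2.0 * x + rng.normal(0.0, 1.0, size=1000)

# Marginally, Y is bimodal and decisively non-normal...
print(stats.shapiro(y).pvalue)            # tiny p-value
# ...while the errors themselves look like genuinely normal data.
print(stats.shapiro(y - 2.0 * x).pvalue)  # typically a large p-value
```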
55 votes · 4 answers

Why sigmoid function instead of anything else?

Why is the de-facto standard sigmoid function, $\frac{1}{1+e^{-x}}$, so popular in (non-deep) neural networks and logistic regression? Why don't we use many of the other differentiable functions, with faster computation time or slower decay (so…
Mark Horvath
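One concrete property often cited in answers, sketched here rather than asserted as the whole story: the sigmoid's derivative can be computed from its own output, $\sigma'(x) = \sigma(x)\,(1-\sigma(x))$, which makes gradients cheap:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
s = sigmoid(x)

# The derivative reuses the forward value: no extra exp() evaluation needed.
analytic_grad = s * (1.0 - s)

# Check against a central-difference numerical derivative.
h = 1e-6
numeric_grad = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.max(np.abs(analytic_grad - numeric_grad)))  # very small
```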
50 votes · 6 answers

What algorithm is used in linear regression?

I usually hear about "ordinary least squares". Is that the most widely used algorithm for linear regression? Are there reasons to use a different one?
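By way of illustration: "ordinary least squares" names the estimator, and several algorithms can compute it. A minimal sketch (simulated data) of the numerically stable QR route next to the naive normal-equations route:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

# QR route: factor X = QR, then solve the triangular system R beta = Q'y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# Normal-equations route, for comparison (less stable when X is ill conditioned).
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_qr, beta_ne))  # True on well conditioned data
```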
50 votes · 5 answers

Is minimizing squared error equivalent to minimizing absolute error? Why is squared error more popular?

When we conduct linear regression $y=ax+b$ to fit a bunch of data points $(x_1,y_1),(x_2,y_2),...,(x_n,y_n)$, the classic approach minimizes the squared error. I have long been puzzled by whether minimizing the squared error will yield the…
Tony
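A quick numerical illustration with made-up numbers: for a single location parameter, squared error is minimized by the mean and absolute error by the median, so the two criteria genuinely disagree in the presence of outliers:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier

# Evaluate both losses over a grid of candidate constants c.
grid = np.linspace(0.0, 110.0, 11001)
sq_loss = ((data[None, :] - grid[:, None]) ** 2).sum(axis=1)
abs_loss = np.abs(data[None, :] - grid[:, None]).sum(axis=1)

print(grid[np.argmin(sq_loss)], data.mean())       # ~22.0, the mean
print(grid[np.argmin(abs_loss)], np.median(data))  # ~3.0, the median
```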
46 votes · 6 answers

Why don't linear regression assumptions matter in machine learning?

When I learned linear regression in my statistics class, we were asked to check for a few assumptions which need to be true for linear regression to make sense. I won't delve deep into those assumptions; however, these assumptions don't appear when…
44 votes · 8 answers

Is it valid to include a baseline measure as control variable when testing the effect of an independent variable on change scores?

I am attempting to run an OLS regression. DV: change in weight over a year (initial weight - end weight). IV: whether or not you exercise. However, it seems reasonable that heavier people will lose more weight per unit of exercise than thinner…
42 votes · 1 answer

Proof that the coefficients in an OLS model follow a t-distribution with (n-k) degrees of freedom

Background: Suppose we have an Ordinary Least Squares model with $k$ coefficients in our regression model, $$\mathbf{y}=\mathbf{X}\mathbf{\beta} + \mathbf{\epsilon}$$ where $\mathbf{\beta}$ is a $(k\times1)$ vector of coefficients,…
Garrett
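The skeleton of the proof, stating the standard ingredients without full detail: under normal errors, $\widehat{\beta} \sim N\!\left(\beta, \sigma^2(X^{T}X)^{-1}\right)$ and, independently, $(n-k)s^2/\sigma^2 \sim \chi^2_{n-k}$ with $s^2 = \mathrm{RSS}/(n-k)$; a standard normal divided by the square root of an independent chi-squared over its degrees of freedom is a $t$ variable, so

$$\frac{\widehat{\beta}_j - \beta_j}{s\sqrt{\left[(X^{T}X)^{-1}\right]_{jj}}} \sim t_{n-k}.$$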
39 votes · 3 answers

Why is RSS distributed chi square times n-p?

I would like to understand why, under the OLS model, the RSS (residual sum of squares) is distributed $$\chi^2\cdot (n-p)$$ ($p$ being the number of parameters in the model, $n$ the number of observations). I apologize for asking such a basic…
Tal Galili
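For reference, the usual statement is slightly different from the title's wording: with normal errors, $\mathrm{RSS} = \varepsilon^{T}M\varepsilon$ where $M = I - X(X^{T}X)^{-1}X^{T}$ is idempotent with rank $n-p$, so

$$\frac{\mathrm{RSS}}{\sigma^2} \sim \chi^2_{n-p}, \qquad \mathbb{E}[\mathrm{RSS}] = \sigma^2(n-p);$$

that is, $\mathrm{RSS}/\sigma^2$ follows a $\chi^2$ distribution with $n-p$ degrees of freedom, rather than a $\chi^2$ variable multiplied by $n-p$.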
38 votes · 4 answers

Why squared residuals instead of absolute residuals in OLS estimation?

Why are we using the squared residuals instead of the absolute residuals in OLS estimation? My idea was that we use the square of the error values, so that residuals below the fitted line (which are then negative) would still have to be able to be…
PascalVKooten