Questions tagged [linear]

For statistical topics which involve the assumption of linearity, for example, linear regression or linear mixed models, or for the discussion of linear algebra as applied to statistics.

A linear function (in the sense of calculus) is one of the form $$f(x)=ax+b $$ where $a$ and $b$ are constants. The constant $a$ is the slope and the constant $b$ is the intercept of the function. This type of linear function occurs, for example, in linear regression.

In linear algebra, a linear function is a map $f$ between two vector spaces which is compatible with addition and scalar multiplication: $$f(x+y) = f(x) + f(y), \quad \quad f(ax) = a f(x), $$ where $x$ and $y$ denote elements of a vector space and $a$ denotes a scalar.

Note: the first type of linear function mentioned above is closely related to the second kind: it is a special case of what linear algebra calls an affine function (a linear map plus a constant), and it is linear in the second sense only when $b = 0$.
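The difference between the two notions can be checked numerically. The following is a minimal Python sketch; the names `is_linear`, `through_origin`, and `affine`, and the chosen slope and intercept values, are illustrative assumptions rather than anything defined by the tag description. It tests additivity and homogeneity at a few sample points, which can expose a violation but does not prove linearity in general.

```python
def is_linear(f, x, y, a, tol=1e-9):
    """Check f(x+y) = f(x) + f(y) and f(a*x) = a*f(x) at the given points.

    Passing this check at a few points is necessary, not sufficient,
    for f to be linear in the linear-algebra sense.
    """
    additive = abs(f(x + y) - (f(x) + f(y))) <= tol
    homogeneous = abs(f(a * x) - a * f(x)) <= tol
    return additive and homogeneous


def through_origin(x, slope=2.0):
    # f(x) = a*x: linear in both senses (intercept b = 0).
    return slope * x


def affine(x, slope=2.0, intercept=1.0):
    # f(x) = a*x + b with b != 0: "linear" in the calculus sense only.
    return slope * x + intercept


print(is_linear(through_origin, 3.0, 4.0, 5.0))  # True
print(is_linear(affine, 3.0, 4.0, 5.0))          # False
```

With a nonzero intercept, $f(x+y)$ contains $b$ once while $f(x)+f(y)$ contains it twice, which is why the second check fails.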

1256 questions
37
votes
3 answers

Linearity of PCA

PCA is considered a linear procedure, however: $$\mathrm{PCA}(X)\neq \mathrm{PCA}(X_1)+\mathrm{PCA}(X_2)+\ldots+\mathrm{PCA}(X_n),$$ where $X=X_1+X_2+\ldots+X_n$. This is to say that the eigenvectors obtained by the PCAs on the data matrices $X_i$…
AlphaOmega
22
votes
4 answers

Linear regression what does the F statistic, R squared and residual standard error tell us?

I'm really confused about the difference in meaning, in the context of linear regression, of the following terms: F statistic, R squared, residual standard error. I found this website which gave me great insight into the different terms involved…
21
votes
3 answers

Is a decision stump a linear model?

A decision stump is a decision tree with only one split. It can also be written as a piecewise function. For example, assume $x$ is a vector and $x_1$ is the first component of $x$; in a regression setting, a decision stump can be $f(x)=…
Haitao Du
20
votes
3 answers

Are linear regression and least squares regression necessarily the same thing?

I saw a thread about this, but the answers seem to have got caught up in statistical theory and explain other things than this concept. So can anyone explain the difference between these two regressions in a simple way?
Atilla Colak
18
votes
5 answers

Why Normality assumption in linear regression

My question is very simple: why do we choose the normal distribution for the error term in the assumptions of linear regression? Why don't we choose others, like uniform, t, or whatever?
18
votes
3 answers

How to run linear regression in a parallel/distributed way for big data setting?

I am working on a very large linear regression problem, with a data size so large that the data have to be stored on a cluster of machines. It would be way too big to aggregate all the samples into one single machine's memory (or even disk). To do regression…
James Bond
18
votes
1 answer

In multiple linear regression, why does a plot of predicted points not lie in a straight line?

I'm using multiple linear regression to describe relationships between Y and X1, X2. From theory I understood that multiple regression assumes linear relationships between Y and each X (Y and X1, Y and X2). I'm not using any transformation of…
Klausos
17
votes
4 answers

What is the best programmatic way for determining whether two variables are linearly or non-linearly or not even related

What is the best programmatic way for determining whether two predictor variables are linearly related, non-linearly related, or not related at all, maybe using packages such as scipy or statsmodels, or anything else in Python? I know about approaches like plotting and…
17
votes
2 answers

Why linear regression has assumption on residual but generalized linear model has assumptions on response?

Why do linear regression and generalized linear models have inconsistent assumptions? In linear regression, we assume the residual comes from a Gaussian. In other regressions (logistic regression, Poisson regression), we assume the response comes from some distribution…
Haitao Du
16
votes
5 answers

Why does linear regression use a cost function based on the vertical distance between the hypothesis and the input data point?

Let’s say we have the input (predictor) and output (response) data points A, B, C, D, E and we want to fit a line through the points. This is a simple problem to illustrate the question, but can be extended to higher dimensions as well. Problem…
alpha_989
15
votes
6 answers

Linear regression when Y is bounded and discrete

The question is straightforward: Is it appropriate to use linear regression when Y is bounded and discrete (e.g. the test score 1~100, some pre-defined ranking 1~17)? In this case, is it "not good" to use linear regression, or it's totally wrong to…
15
votes
2 answers

Why does Covariance measure only Linear dependence?

1) What is meant by linear dependence? 2) How can I convince myself that covariance measures linear dependence? 3) How can I convince myself that non-linear dependence is not measured by covariance?
ColorStatistics
15
votes
1 answer

Why do we call the equations of least square estimation in linear regression the *normal equations*?

When we want to estimate the parameters of a linear regression, we form as many normal equations as the linear model contains unknowns. Why are these equations called normal equations?
Rashid Munir
15
votes
1 answer

Other unbiased estimators than the BLUE (OLS solution) for linear models

For a linear model the OLS solution provides the best linear unbiased estimator for the parameters. Of course we can trade in a bias for lower variance, e.g. ridge regression. But my question is regarding having no bias. Are there any other…
Gumeo
14
votes
5 answers

Is linear regression obsolete?

I am currently in a linear regression class, but I can't shake the feeling that what I am learning is no longer relevant in either modern statistics or machine learning. Why is so much time spent on doing inference on simple or multiple linear…
Anonymous Emu