2

Identify the following models as linear or non-linear. For each non-linear model, reduce it to a linear model by a suitable transformation.

$$\begin{aligned} (a)\quad&y=\beta_0+\beta_1 x+\beta_2 x^2+e \\ (b)\quad&y=\frac{x}{\beta_0+\beta_1 x}+e\\ (c)\quad&y=\frac{\exp(\beta_0+\beta_1 x)}{1+\exp(\beta_0+\beta_1 x)}+e \end{aligned}$$

where $e\sim\mathcal{N}(0,\sigma^2).$

I know that model $(a)$ is linear. I think $(b)$ and $(c)$ are non-linear. How can I make them linear models?

whuber
Argha
  • Homework questions should be tagged as such, because they receive a [special treatment](http://meta.stackexchange.com/questions/10811/how-to-ask-and-answer-homework-questions/10812#10812). Users who are always ready to assist you probably expect you to show what you tried to solve those questions. – chl Aug 07 '12 at 20:13
  • Some caution is needed in interpreting this question and its results, because neither (b) nor (c) can be linearized with a transformation. This is because the nonlinear transformations needed to make the parameters enter linearly will affect the error distributions: it will make them non-normal with *nonzero* expectation and with spreads depending on the values of $x$ (heteroscedasticity) and the parameters (a subtle form of nonlinearity). – whuber Aug 07 '12 at 20:36
  • @whuber In my answer I was deliberately ignoring the additive error. I was taking the OP to be describing the model by the functional form and made the transformations accordingly. This seemed to be obviously the intent. – Michael R. Chernick Aug 07 '12 at 20:55

2 Answers

5

We try to find a re-expression $f$ for which $f(y)$ is a sum of two kinds of things. The first kind is a product of (1) something depending only on the data $x$ and (2) something depending only on the parameters $\beta_i$. The second kind of thing represents "random error." Typically we want the random error, at a minimum, to have an expectation of zero. It's even nicer (and usual to assume) that the error not depend on the data or the parameters and that it have a symmetric distribution.

(I am glossing over one delicate but tangential point: the parameter-dependent part of each term ought itself to be just a constant linear combination of parameters. However, in practice this often does not matter. For instance, although the model $y = \log(\beta)x + e$ depends nonlinearly on the parameter $\beta$, we can easily replace $\beta$ by $\gamma=\log(\beta)$, thereby effecting a reparameterization $y = \gamma x + e$ in which the new parameter $\gamma$ (a) appears in the desired (linear) form and (b) determines everything that could be inferred about the original parameter $\beta$.)

Model $(a)$ is already in this archetypal form, because each non-error term clearly is a product of a parameter and something depending only on the data ($x$ and $x^2$). The error $e$, with a $\mathcal{N}(0,\sigma^2)$ distribution, has an expectation of $0$, is distributed independently of $x$ and the $\beta_i$, and has a symmetric distribution. The point of this example is to emphasize that the dependence on the data can be non-linear, as evidenced by the $x^2$, without affecting the linearity of the model.
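This point (not part of the original answer) can be checked numerically: ordinary least squares recovers the coefficients of model $(a)$ even though the design matrix contains the nonlinear column $x^2$. A minimal sketch with NumPy, where all names and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0, -0.5])          # true (beta0, beta1, beta2)
x = np.linspace(-2, 2, 200)
y = beta[0] + beta[1] * x + beta[2] * x**2 + rng.normal(0, 0.1, x.size)

# Design matrix with columns 1, x, x^2: nonlinear in x, but linear in beta.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to (1.0, 2.0, -0.5)
```

The fit is an ordinary linear least-squares problem precisely because each column of $X$ depends only on the data.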

Model $(b)$ is not in this form, but we can see such a form in the denominator on the right hand side. Therefore we try to solve for $y$ algebraically, initially ignoring the errors (and making any additional assumptions required to make the algebra work, such as $x\ne 0$):

$$y = \frac{x}{\beta_0 + \beta_1 x} = \frac{1}{\beta_0/x + \beta_1}$$

implies

$$1/y = \beta_1 + \beta_0(1/x).$$

Each of the terms on the right hand side separates as a product of something depending only on the data ($1$ and $1/x$, respectively) and something depending only on the parameters ($\beta_1$ and $\beta_0$, respectively). Typically, when such an expression can be found, it is essentially unique. (You can often shift a constant factor around within each term; e.g., $\beta_0 \times 1/x = 0.5\beta_0 \times 2/x$, but that's an unimportant change.) Evidently, $f(y) = 1/y$ is how $y$ should be re-expressed.
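To illustrate the algebra (this sketch is not from the original answer, and deliberately uses error-free data so the identity is exact), regressing $1/y$ on $1/x$ recovers $\beta_1$ as the intercept and $\beta_0$ as the slope:

```python
import numpy as np

beta0, beta1 = 2.0, 3.0
x = np.linspace(0.5, 5.0, 50)              # keep x away from 0
y = x / (beta0 + beta1 * x)                # error-free, to show the algebra only

# Regress 1/y on 1/x: the intercept estimates beta1, the slope estimates beta0.
X = np.column_stack([np.ones_like(x), 1.0 / x])
coef, *_ = np.linalg.lstsq(X, 1.0 / y, rcond=None)
print(coef)  # ≈ (3.0, 2.0), i.e. (beta1, beta0)
```

With real (noisy) data the recovery is only approximate, for the reasons discussed next.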

Then, it is important to repeat the calculation with the error term included:

$$1/y = \frac{1}{\frac{x}{\beta_0+\beta_1 x}+e}.$$

This, unfortunately, does not simplify. Assuming the errors are small, we can simplify it by means of an approximation that is linear in the error:

$$1/y \approx \beta_1 + \beta_0/x + \left[-\left(\beta_1 + \beta_0/x\right)^2\right] e + O(e^2).$$

This now is in the desired form, with the error term equal to $-\left[\left(\beta_1 + \beta_0/x\right)^2\right] e$, more or less (neglecting terms of the size of $e^2$, which is typically $\sigma^2$). Because the expectation of $e$ is $0$, the expectation of this error term is (more or less) $0$. However, the dispersion of the error in $1/y$ equals $\left(\beta_1 + \beta_0/x\right)^4 \sigma^2$. This complicated expression depends essentially on both the data and the parameters, leading to a rather complicated--but linear--model. The distribution of this error is no longer normal (in fact, it is asymmetric).
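A quick simulation (not in the original answer; all values illustrative) confirms the heteroscedasticity: the empirical spread of $1/y$ at each $x$ matches the linearized prediction $\left(\beta_1+\beta_0/x\right)^2\sigma$, which varies with both $x$ and the parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 2.0, 3.0, 0.01
x = np.array([0.5, 1.0, 5.0])
g = x / (beta0 + beta1 * x)                 # error-free mean of y

# Simulate y = g + e many times and measure the spread of 1/y at each x.
e = rng.normal(0.0, sigma, size=(100_000, x.size))
sd_inv_y = (1.0 / (g + e)).std(axis=0)

# Linearized prediction: sd of the error in 1/y is (beta1 + beta0/x)^2 * sigma.
predicted = (beta1 + beta0 / x) ** 2 * sigma
print(sd_inv_y)
print(predicted)
```

The two printed arrays agree to within a few percent when $\sigma$ is small relative to $y$, the regime in which the linearization is trustworthy.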

After fitting such a model, one would certainly want to check the assumption required to make the approximation: namely, that $\sigma^2$ is small compared to $y$ for the fitted parameters and for all values of $x$ of interest. If this assumption does not hold, the transformation should be considered unsuccessful and nonlinear modeling (based on the original equation, for which the errors have a particularly nice form) would be advisable.

The analysis of model $(c)$ is similar. The algebraic manipulations change a little, and therefore the answer changes a little, but no new ideas are introduced.
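For concreteness (again ignoring the error term, and with illustrative values not taken from the original answer), the re-expression for model $(c)$ is the logit, $f(y)=\log\bigl(y/(1-y)\bigr)$, which recovers the linear form $\beta_0+\beta_1 x$ exactly on error-free data:

```python
import numpy as np

beta0, beta1 = -1.0, 2.0
x = np.linspace(-3, 3, 50)
y = np.exp(beta0 + beta1 * x) / (1 + np.exp(beta0 + beta1 * x))  # error-free

# logit(y) = log(y / (1 - y)) undoes the logistic function.
logit_y = np.log(y / (1 - y))
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, logit_y, rcond=None)
print(coef)  # ≈ (-1.0, 2.0), i.e. (beta0, beta1)
```

As with model $(b)$, an additive error on $y$ does not pass cleanly through this transformation, so the same cautions about the error distribution apply.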


This question is glib in that it blithely ignores these latter complications concerning the error terms. That's probably ok for an elementary, shallow introduction to the ideas of transformation, but it is important to get in the habit of introducing and tracking the error terms in one's models: that is a part of what "statistical thinking" and modeling are all about.

whuber
  • I deleted my answer. But now this is an answer to both parts a) and b) of the question. I of course get the point you make about the effect of the transformation on the additive error term. However, I am not sure that the problem necessarily meant to have an additive error term. I think it is there mainly to suggest that the model is observed with error, without a particular concern for the form of the error term. Also, we went into a long discussion about my answer and whether or not I should take it down, and then you put back up two-thirds of that answer. – Michael R. Chernick Aug 07 '12 at 22:28
  • @Michael I offered this reply as an example of how one might address questions that look like homework but are not explicitly tagged as such. It is here only as a reaction to your now-deleted post. As a moderator, rather than downvoting low-quality replies, I prefer to provide constructive examples of what a good answer might look like (understanding that my efforts are never perfect and can always be improved: take that as an invitation to post an even better answer!). – whuber Aug 07 '12 at 22:33
  • It just appears that I was being criticized for answering a homework type problem and essentially asked to delete it, and then you give a partial answer to the same question. I see nothing particularly low quality about my reply. It was a complete, correct answer to the question. As I said, although the answer ignores the additive noise component, the additive nature of the noise was not a serious part of the question. I have seen cases where the noise term was deliberately specified as multiplicative just so it would be additive after the log transformation. – Michael R. Chernick Aug 08 '12 at 00:16
  • The intent of the problem was to transform to a linear model. – Michael R. Chernick Aug 08 '12 at 00:17
  • @MichaelChernick: You are absolutely right. Thank you for the help. – Argha Aug 08 '12 at 05:02
  • @Ranabir whuber did not give the answer to part (c) of your question. I did in my deleted answer. You can make the third case linear by using the logit transformation. Look up logit and figure out how logit(y) becomes linear in $\beta_0$ and $\beta_1$. – Michael R. Chernick Aug 08 '12 at 15:55
  • @MichaelChernick: I see that in part (c) I need to use $\ln\frac{y}{1-y}$ to make the model linear. – Argha Aug 08 '12 at 15:58
  • @Ranabir That is correct. – Michael R. Chernick Aug 08 '12 at 16:07
0

SHORT ANSWER: If (and only if) the statistical distribution of a model's noise (error) can be described using only linear combinations of observations, factors and/or predictors, that model is linear. Otherwise, it is not.

People often hear someone in an academic setting state that linearity is "in the parameters," which does get to the point of linearity. But what does that really mean?

Answering this question requires some background in abstract algebra. If a model has the following form:

$$ y = \beta x + \epsilon ,$$

this "normally" (pun intended) implies that its noise (a.k.a. the model error) has a Gaussian distribution with mean $y - \beta x$, i.e.:

$$ \epsilon \sim N(y - \beta x, 1) ,$$

after proper "normalization" to obtain a variance of 1 (pun not intended). NOTE: the dimensions of $y$ and $x$ must be the same; if they are not scalars, $x$ becomes a vector or matrix, usually written in uppercase as $X$ to avoid confusion.

The notation above usually implies that $\beta$ is a parameter (or vector/matrix of parameters). A parameter is not what is measured physically (and often transformed via arithmetic or algebra afterward); those quantities are in $x$.

Linearity is an algebraic concept, involving items called vectors and scalars (and every scalar is also a vector in an abstract sense, by the way). It means that you can form what are known as "linear combinations" of vectors and remain within the "vector space" where you began.

Therefore, if (and only if) you can parametrize your model in the format above such that each parameter in $\beta$ is a scalar, and such that the noise and each predictor in $x$ are vectors (sometimes 1-dimensional), then your model is linear. But what does it mean to be a scalar or a vector, you may ask?

A scalar is literally what it claims to be: something which causes a change in magnitude or scale. In the sense of abstract algebra, it specifically scales things called vectors--for example, real numbers are scalars, and they are also vectors because they can be scaled by any field of scalars, even themselves.

If a predictor quantity can be multiplied by a parameter scalar (shrunk or expanded) and can also be added to other predictor quantities (either with or without scalar multiplication before or afterward) then the predictor is a vector in a linear (vector) space. If you treat all your predictors as vectors and all your parameters as scalars, you can create linear combinations of them, and thereby (and in no other way) you can create a linear model.
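As a tiny numerical illustration of the point above (not part of the original answer; the vectors and scalars are arbitrary), a linear combination of predictor vectors with scalar parameters stays inside the same vector space:

```python
import numpy as np

# Two predictor "vectors" in R^3 and two scalar parameters.
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([0.5, 0.5, 0.5])
b1, b2 = 2.0, -4.0

# A linear combination: scale each vector, then add. The result is
# another vector in R^3, so the model mean lives in the same space.
model_mean = b1 * x1 + b2 * x2
print(model_mean)  # [0. 2. 4.]
```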

brethvoice
  • Incidentally, I would love for anyone in this community to try writing a machine learning algorithm (model) as a linear combination of vector-like quantities. I believe it is impossible to do so, which is why non-linear regression is a thing. – brethvoice Mar 20 '20 at 21:00
  • Also, please note that the statistical distribution of the errors does not need to be Gaussian (normal) in order for the above definition to apply. It is what is most commonly seen/assumed, however. – brethvoice Mar 20 '20 at 21:01
  • By focusing solely on the "noise" (which you don't define), your answer seems overly restrictive and somewhat at odds with what many people consider linear models, as described (*inter alia*) in my post at https://stats.stackexchange.com/a/148713/919. – whuber Mar 21 '20 at 12:28
  • @whuber in my answer I wrote noise (error). Please give more focused feedback on how I failed to define noise. Because all models are wrong, there will always be error; because some are useful, sometimes we can call that error "noise" (twisted version of https://quotesjournal.blogspot.com/2008/12/all-models-are-wrong-but-some-are.html). – brethvoice Mar 30 '20 at 12:51
  • Since writing the above, I have decided that most machine learning models are non-linear because the activation functions, even if invertible, occur at the neuron or layer level of the model, rather than just at the output layer. Also, for any of you who are interested, I posted a link to this discussion on SO hoping to convince some of them to join CV: https://stackoverflow.com/a/60958735/12763497 – brethvoice Mar 31 '20 at 19:51