18

Among all areas of mathematics, linear algebra is incredibly well understood. I have heard it said that the only problems we can really solve in math are linear problems, and that much of the rest of math involves reducing problems to linear algebra.

But what is it about the interchange of a "multiplication" operation and an "addition" operation that is so nice? Why is this interchange desirable, and why, among the many possible properties one could specify, is linearity so important?

Specifically, I am looking for:

  1. An idea of why the interchange that linearity allows is so powerful, whether by appeal to a categorical argument or some other explanation of why these particular rules carry so much weight

  2. An idea of why linear problems, or linearization, shows up so frequently

msm
  • Through linear algebra, you can take a vector and pass it through different dimensions. Also, it is the best way to connect with geometry, since we can all see geometry, but we can't see abstract algebra or real analysis, nor can either of them be connected to geometry so directly. Here lies the beauty of linear algebra. – Anik Bhowmick Aug 04 '18 at 05:26
  • An old professor said to me, “The reason we do *linear* algebra is that it’s the kind we know how to do!” Besides that, the whole point of differential calculus is that anything changing smoothly is *locally* linear, making it accessible to the only kind of algebra we’re really very good at! – G Tony Jacobs Aug 04 '18 at 05:31
  • Perhaps I was not as clear as I thought I was in my original question. When you say, "...An old professor said to me, 'The reason we do linear algebra is that it’s the kind we know how to do!'" my question comes down to "What is it about the property of linearity that MAKES it so that we can do math with linear things and not with non-linear things?" We have all manner of reasonable conditions we put on functions in mathematics. Of all those conditions why is THIS condition of linearity so powerful that we can completely do linear algebra? – msm Aug 05 '18 at 20:12
  • I knew my comment didn’t answer your question, which is why I didn’t post it as an answer. Now that I’ve posted an answer, would you say *it* addresses what you’re asking about? – G Tony Jacobs Aug 06 '18 at 17:49
  • Regarding 2), linear problems show up so frequently because the fundamental strategy of calculus is to replace a complicated nonlinear function $f(x)$ with its local linear approximation $L(x) = f(a) + f'(a)(x - a)$. (The approximation is good when $x$ is near $a$.) This strategy is used all the time throughout math. This does not answer question 1), but if you accept that linear transformations are easy to understand then that helps to explain why the fundamental strategy of calculus is so effective. – littleO Apr 26 '20 at 19:35
  • It's easy to deal with. Linearity seems hard wired with us. We love it. So linear approximations whenever the tolerances permit! – Allawonder Apr 26 '20 at 19:39

5 Answers

3

We work with fields of numbers, such as $\Bbb Q$, the field of rational numbers, $\Bbb R$, the field of real numbers, and $\Bbb C$, the field of complex numbers. What is a field? It's a set in which two invertible operations, addition and multiplication, interact. Elementary algebra is simply the study of that interaction.

What's a linear function defined on one of these fields? It's a function that is compatible with the two operations. If $f(x+y)=f(x)+f(y)$, and $f(cx)=cf(x)$, then the whole domain, before and after applying $f$, is structurally preserved. (That's as long as $f$ is invertible; I'm glossing over some details.) Essentially, such a function is simply taking the field and scaling it, possibly flipping it around as well. In the complex field, the picture is a little more.... complex, but fundamentally the same.
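To see that compatibility concretely, here is a tiny numerical check (my own illustration, assuming Python with NumPy; the maps and sample points are arbitrary): a scaling map respects both operations, while squaring already breaks additivity.

```python
# A quick numeric check: x -> -2.5x respects addition and scalar multiplication,
# while x -> x^2 breaks additivity.  All numbers here are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
x, y, c = rng.normal(size=3)

f = lambda t: -2.5 * t   # linear: a rescaling (with a flip) of the real line
g = lambda t: t ** 2     # nonlinear

print(np.isclose(f(x + y), f(x) + f(y)), np.isclose(f(c * x), c * f(x)))  # True True
print(np.isclose(g(x + y), g(x) + g(y)))                                  # False (generically)
```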

The most intuitive vector spaces - finite dimensional ones over our familiar fields - are basically just multiple copies of the base field, set at "right angles" to each other. Invertible linear functions now just scale, reflect, rotate and shear this basic picture, but they preserve the algebraic structure of the space.

Now, we often work with transformations that do more complicated things than this, but if they are smooth transformations, then they "look like" linear transformations when you "zoom in" at any point. To analyze something complicated, you have to simplify it in some way, and a good way to simplify working with some weird non-linear transformation is to describe and study the linear transformations that it "looks like" up close.

This is why we see linear problems arise so frequently. Some situations are modeled by linear transformations, and that's great. However, even situations modeled by non-linear transformations are often approximated with appropriate linear maps. The first and roughest way to approximate a function is with a constant, but we don't get a lot of mileage out of that. The next, fancier approach is to approximate with a linear function at each point, and we do get a lot of mileage out of that. If you want to do better, you can use a quadratic approximation. These are great for describing, for instance, critical points of multi-variable functions. Even the quadratic description, however, uses tools from linear algebra.
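To make the "looks linear up close" point concrete, here is a minimal numerical sketch (my own illustration, assuming NumPy; the function $\sin$ and the base point are arbitrary choices): the linear approximation $f(a) + f'(a)(x-a)$ is nearly indistinguishable from $f$ in a small window around $a$, and the error shrinks quadratically as the window shrinks.

```python
# Compare sin with its linearization at a; the worst-case error over a window
# of radius r around a shrinks roughly like r^2.
import numpy as np

a = 0.5
for r in (0.1, 0.01, 0.001):
    x = a + np.linspace(-r, r, 101)
    L = np.sin(a) + np.cos(a) * (x - a)      # f(a) + f'(a)(x - a)
    print(r, np.max(np.abs(np.sin(x) - L)))  # error drops by about 100x per step
```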


Edit: I've thought more about this, and I think I can speak further to your question from the comments: "why does the property of linearity make linear functions so 'rigid'?"

Consider restricting a linear function on $\Bbb R$ to the integers. The integers are a nice, evenly spaced, discrete subset of $\Bbb R$. After applying a linear map, their image is still a nice, evenly spaced, discrete subset of $\Bbb R$. Take all the points with integer coordinates in $\Bbb R^2$ or $\Bbb R^3$, and the same thing is true. You start with evenly spaced points all in straight lines, and after applying a linear map, you still have evenly spaced points, all in straight lines. Linear maps preserve lattices, in a sense, and that's precisely because they preserve addition and scalar multiplication. Keeping evenly spaced things evenly spaced, and keeping straight lines straight, seems to be a pretty good description of "rigidity".
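Here is a small numerical sketch of that lattice picture (my own illustration, assuming NumPy; the matrix and the line are made up): integer points lying on a line are sent to points that are still evenly spaced and still collinear.

```python
# Apply an invertible linear map to integer points on the line y = 3x and check
# that consecutive images differ by the same vector (evenly spaced, collinear).
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])                  # an arbitrary invertible matrix

pts = np.array([[k, 3 * k] for k in range(-3, 4)], dtype=float)
images = pts @ A.T                          # apply the linear map to each point

print(np.diff(images, axis=0))              # every row is the same vector [5. 3.]
```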

Does that help at all?

G Tony Jacobs
  • Your answer does not exactly answer my question. There are all manner of properties we ask functions to obey. Like you say, a field has two invertible operations, and that is one choice of axioms. Among possible properties, why does the property of linearity make linear functions so "rigid," in that we can essentially classify all linear functions? What about this property makes linear math "easy" in some sense? Continuity is another nice condition, but continuous functions can be pretty nasty. What about linearity makes it so that we can have an algorithm to find essentially whatever we want? – msm Aug 10 '18 at 18:27
  • Huh... I'm not sure what to say that I haven't already said. Why does preserving addition and multiplication make functions so "rigid"? Well, if you preserve those, then you preserve the entire structure of arithmetic with real numbers. A linear function can't twist and bend the field; it just moves it around, rigidly, because all of arithmetic has to pass through it without breaking. I wonder if you're missing something in thinking of our two operations as "one choice of axioms", rather than appreciating that these two operations capture all of our intuitions about number. – G Tony Jacobs Aug 10 '18 at 19:03
  • Continuous functions keep real numbers "close together" that start out "close together", but that's all relative and allowed to vary wildly over the domain. Continuous functions can take your original field and stretch it out beyond all bounds, or wad it into a ball.... linear functions literally just move it around, and maybe zoom in and out. However, once you know what a linear function does around one generic element, you know what it does everywhere. Why? Because you can get from that generic spot to anywhere else, using addition and multiplication! – G Tony Jacobs Aug 10 '18 at 19:07
  • @msm, I've added a bit more to my answer, to address your concern about "rigidity". – G Tony Jacobs Aug 15 '18 at 23:13
3

I'll give my two cents, from an applied perspective: what makes linearity so powerful is that linear operations are easily invertible. Many, many, many problems in mathematics boil down to having to solve for $x$ in some relation of the form $$ y = f(x). $$ There is of course no general method of computing $x = f^{-1}(y)$ for arbitrary $f$, but if $f$ is linear, i.e. $$ y = Ax $$ for some matrix $A$, and $A$ is invertible, then we can simply do some arithmetic and compute $$ x = A^{-1}y. $$ Even if the problem is overconstrained and there is no exact solution, we can still use linear algebra to compute a pseudo-inverse: $$ x = (A^TA)^{-1}A^Ty, $$ and get the least-squares solution (the best we can hope for) minimizing $\|y-Ax\|_2^2$.
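As a concrete illustration, here is a short sketch of both cases (my own example, assuming NumPy; the matrices and data are invented, and in practice one calls a solver rather than forming $A^{-1}$ or $(A^TA)^{-1}$ explicitly):

```python
import numpy as np

# Square, invertible case: y = A x, so recover x by solving the linear system.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
y = np.array([9.0, 8.0])
print(np.linalg.solve(A, y))                  # [2. 3.]

# Overdetermined case: no exact solution in general, so take the least-squares one.
B = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
z = np.array([1.0, 2.0, 2.0])
x_ls, *_ = np.linalg.lstsq(B, z, rcond=None)  # minimizes ||z - Bx||_2^2
print(x_ls, np.allclose(x_ls, np.linalg.inv(B.T @ B) @ B.T @ z))  # matches (B^T B)^{-1} B^T z
```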

Maybe this raises the question "well then why can we invert linear functions so easily?", but I think this can be explained by the fact that field addition and multiplication are invertible, by definition, and linear maps are composed of nothing but addition and multiplication. It seems pretty natural to me that transformations composed of the fundamentally invertible field operations $(+, \cdot)$ will be invertible by, e.g., back-substitution (in non-degenerate cases, of course). Note that linear algebra is ubiquitous in applications, while module theory is not: the only difference is that a module's scalar multiplication need not be invertible!
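A short sketch of the back-substitution remark (my own example, assuming NumPy; the triangular system is made up): once a system is triangular, solving it uses nothing beyond additions, multiplications, and divisions by the nonzero diagonal entries.

```python
# Solve an upper-triangular system U x = y by back-substitution.
import numpy as np

U = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])            # nonzero diagonal => invertible
y = np.array([1.0, 8.0, 8.0])

x = np.zeros(3)
for i in reversed(range(3)):
    # subtract the already-solved part, then divide by the diagonal entry
    x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]

print(x, np.allclose(U @ x, y))             # [0.833... 1.333... 2.] True
```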


I think this line of reasoning also addresses the question:

An idea of why linear problems, or linearization, shows up so frequently

The reason is that we frequently need to invert things, and often the only way to go about that is to compute a linear approximation and invert that. Two examples:

  • The extended Kalman filter takes a state update and observation model \begin{align*}x_k &= f(x_{k-1}) + w_k \\ z_k &= h(x_k) + v_k \end{align*} and linearizes it to \begin{align*} x_k &= Fx_{k-1} + w_k \\ z_k &= H x_k + v_k \end{align*} with $F = df$ and $H = dh$ the Jacobians of $f$ and $h$ evaluated at the current estimate, which makes it possible to compute the Kalman gain, which requires an inversion: $$ K_k = P_{k|k-1} H_k^T(H_k P_{k|k-1}H_k^T + R_k)^{-1}. $$
  • Newton's method in optimization requires solving $$ \frac{\partial}{\partial \delta} f(x_k + \delta) = 0 $$ for $\delta$ at each step. Taking a quadratic approximation $m_k(\delta) = f_k + \nabla f_k^T \delta + \frac{1}{2}\delta^T \nabla^2 f_k\, \delta$ makes this equation linear, and we are able to invert and solve for the optimal step: $$ \delta = - (\nabla^2 f_k)^{-1} \nabla f_k. $$ (A rough numerical sketch of this step follows below.)
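For the second bullet, here is a rough numerical sketch (my own example, assuming NumPy; the objective $f(x_1,x_2) = x_1^4 + x_2^2 + x_1 x_2$ is made up): each Newton step only requires solving a linear system with the Hessian.

```python
# Newton's method for a small smooth objective: at each step, minimize the local
# quadratic model by solving the linear system (hess f) delta = -grad f.
import numpy as np

def grad(x):   # gradient of f(x) = x1^4 + x2^2 + x1*x2
    return np.array([4 * x[0]**3 + x[1], 2 * x[1] + x[0]])

def hess(x):   # Hessian of the same f
    return np.array([[12 * x[0]**2, 1.0],
                     [1.0,          2.0]])

x = np.array([1.0, 1.0])
for _ in range(8):
    delta = np.linalg.solve(hess(x), -grad(x))   # the only "inversion" needed
    x = x + delta

print(x, grad(x))   # the gradient is (numerically) zero at the limit point
```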
Adam Williams
1

Linear problems are so very useful because they describe small deviations, displacements, signals, etc. well, and because they admit single solutions. For sufficiently small $x$, $f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots$ can be very well approximated by $a_0 + a_1 x$. Even the simplest nonlinear equation, $x^2 - 3 = 0$, has two real solutions, making analysis more difficult. A linear equation in a single unknown has at most one solution.

David G. Stork
  • There may also be infinitely many solutions, e.g. $Ax = 0$ if $A$ has nontrivial nullspace. –  Aug 04 '18 at 04:29
1

I think this question misses the point slightly. It's not about why the axioms of linearity make the subject so well understood; it's more (in my view) about why such simple operations fully characterize what we think of as linear.

Here is a pretty fluffy answer, but something that might help(?)

There are a few ways to understand why linear functions are so desirable.

  1. For a linear function, where you are headed is not determined by where you are.

That is, say for optimization, if you might be either before a peak or after it, it's unclear whether you should increase or decrease your parameter. This is not so for linear functions, and, unlike for other functions, the payoff from increasing your parameter by $\Delta x$ does not depend on where you are either.

This is succinctly the fact that $f(x+\Delta x)=f(x)+f(\Delta x)$ for all choices of $x$.

It makes sense, then, that we would use linear functions in a variety of problems to decide what is going on locally, and this is the key insight of the derivative.
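A quick numeric illustration of that point (my own example, in plain Python): for a linear map, the gain from a fixed step $\Delta x$ is the same wherever you stand, whereas for a nonlinear map it depends on $x$.

```python
# The increment f(x + step) - f(x) is constant for the linear f,
# but varies with x for the nonlinear g.
f = lambda x: 2.0 * x
g = lambda x: x ** 2

step = 0.5
for x in (-3.0, 0.0, 4.0):
    print(x, f(x + step) - f(x), g(x + step) - g(x))
# f's column is always 1.0; g's column changes with x.
```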

  2. Linear transformations have a nontrivial and easy-to-describe geometry.

This is no more difficult than why one computes Riemann integrals (themselves a kind of linearization) by looking at the areas of rectangles, or finds the angle of intersection by intersecting tangent lines. Linear things just have a clear geometry that approximates large classes of objects well. This geometry is one that can scale too, which is essential to our geometric pictures: if we transform something, the picture should not depend on a choice of co-ordinate axes, which is to say that $kf(x)=f(kx)$.

  3. Dimension plays a different role in (finite dimensional) linear algebra; that is, when things behave linearly, the linear geometry (and math) of $\mathbb R^4$ is not that different from that of $\mathbb R^3$. This is basically a consequence of the fact that $V \cong k^n$ for the ground field $k$, where the latter is read as a $k$-vector space. This imposes a certain homogeneity on a vector space (and on the linear functions on it), since vector spaces stay "homogeneous" as you go up in dimension: essentially just more copies of $k$.

Basically: if you solve a problem in a given dimension for one vector space, you've got it for all vector spaces, as long as you can transform one into the other.

Andres Mejia
1

This only addresses your first question.

First, I think it's worth saying that "linearity" can be very complicated. Linear algebra is so pleasant because the maps involved are linear over a field. In general, the study of modules over a ring is much more complicated and opaque than linear algebra (i.e. the study of modules over a field).

From a category-theoretic perspective, the category of vector spaces over a field $\Bbbk$ has rich structure which is simultaneously very simple: it's determined by the field $\Bbbk$. Specifically, it is an abelian symmetric monoidal closed category, whose monoidal unit is simple (admits only trivial quotients), and in which every object is a power of the monoidal unit (the field $\Bbbk$).

In fact, if we require each object to be a finite power of the monoidal unit, then this characterizes the category of finite dimensional vector spaces over a field, as explained here. Thus the extremely pleasant structural properties above actually recover finite dimensional linear algebra.

Remark. In the category of modules over (even) a commutative ring $R$, the $R$-module structure of $R$ is simple iff $R$ has only trivial ideals iff $R$ is a field!

Arrow