There are many motivations, depending on the problem, but the idea is always the same: add a priori knowledge about the problem in order to reach a better solution and to cope with complexity.
Another way to put it is: model selection. Here is a nice example of model selection.
Another idea, deeply related to this one, is to find a similarity measure between data samples (several terms relate to this idea: topographic mappings, distance metrics, manifold learning, ...).
Now, let us consider a practical example: optical character recognition. If you take the image of a character, you would expect the classifier to cope with invariances: if you rotate, displace or scale the image, it should still be able to recognize it. Also, if you apply a slight modification to the input, you would expect the answer/behaviour of your classifier to vary only slightly as well, because both samples (the original and the modified one) are very similar. This is where the enforcement of smoothness comes in.
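As a rough numerical illustration of that smoothness property (a sketch only; the digits dataset, the logistic regression classifier and the 5-degree rotation are my own arbitrary choices, not taken from any particular paper), you can check that a slightly rotated digit receives almost the same predicted class probabilities as the original:

```python
import numpy as np
from scipy.ndimage import rotate
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)     # flatten the 8x8 images
clf = LogisticRegression(max_iter=2000).fit(X, digits.target)

original = digits.images[0]                            # one 8x8 digit image
perturbed = rotate(original, angle=5, reshape=False)   # small rotation of the same digit

p_orig = clf.predict_proba(original.reshape(1, -1))
p_pert = clf.predict_proba(perturbed.reshape(1, -1))

# If the decision function is smooth with respect to this transformation,
# the two probability vectors should be close to each other.
print("change in predicted probabilities:", np.linalg.norm(p_orig - p_pert))
```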
There is a wealth of papers dealing with this idea, but this one ("Transformation Invariance in Pattern Recognition: Tangent Distance and Tangent Propagation", Simard et al.) illustrates these ideas in great detail.
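To make the tangent distance idea from that paper a bit more concrete, here is a minimal sketch of the one-sided tangent distance, assuming a single transformation (rotation) and approximating the tangent vector by finite differences; the step size and the digits dataset are my own illustrative choices, and the paper itself uses several tangent vectors (rotation, translation, scaling, thickness, ...):

```python
import numpy as np
from scipy.ndimage import rotate
from sklearn.datasets import load_digits

def rotation_tangent(image, eps=5.0):
    """Finite-difference approximation of the derivative of the image
    with respect to the rotation angle, evaluated at angle 0."""
    return (rotate(image, angle=eps, reshape=False) - image).ravel() / eps

def tangent_distance(x_img, y_img):
    """One-sided tangent distance: distance from y to the tangent line
    {x + a * t : a in R} spanned by the rotation tangent vector at x."""
    x, y = x_img.ravel(), y_img.ravel()
    t = rotation_tangent(x_img)
    a = t @ (y - x) / (t @ t)        # closed-form minimiser of ||x + a*t - y||
    return np.linalg.norm(x + a * t - y)

digits = load_digits()
x_img, y_img = digits.images[0], digits.images[1]
print("Euclidean distance:", np.linalg.norm(x_img - y_img))
print("tangent distance:  ", tangent_distance(x_img, y_img))
```

Tangent propagation, the second technique in the paper, uses the same tangent vectors during training: it penalizes the directional derivative of the classifier output along each tangent direction, which is exactly the enforcement of smoothness discussed above.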