There are many motivations, depending on the problem, but the idea is always the same: add a priori knowledge about the problem in order to reach a better solution and to cope with complexity.
Another way to put it is: model selection. Here is a nice example of model selection.
Another idea, deeply related to this one, is to find a similarity measure between data samples (several terms relate to this idea: topographic mappings, distance metrics, manifold learning, ...).
Now, let us consider a practical example: optical character recognition. If you take the image of a character, you would expect the classifier to cope with invariances: if you rotate, displace or scale the image, it should still be able to recognize it. Also, if you apply a slight modification to the input, you would expect the answer/behaviour of your classifier to vary only slightly as well, because both samples (the original and the modified one) are very similar. This is where the enforcement of smoothness comes in.
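As a rough numerical illustration of that smoothness property (a sketch only; the digits dataset, the logistic regression classifier and the 5-degree rotation are my own arbitrary choices, not taken from any particular paper), you can check that a slightly rotated digit receives almost the same predicted class probabilities as the original:

```python
import numpy as np
from scipy.ndimage import rotate
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)     # flatten the 8x8 images
clf = LogisticRegression(max_iter=2000).fit(X, digits.target)

original = digits.images[0]                            # one 8x8 digit image
perturbed = rotate(original, angle=5, reshape=False)   # small rotation of the same digit

p_orig = clf.predict_proba(original.reshape(1, -1))
p_pert = clf.predict_proba(perturbed.reshape(1, -1))

# If the decision function is smooth with respect to this transformation,
# the two probability vectors should be close to each other.
print("change in predicted probabilities:", np.linalg.norm(p_orig - p_pert))
```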
There is a wealth of papers dealing with this idea, but this one ("Transformation Invariance in Pattern Recognition: Tangent Distance and Tangent Propagation", Simard et al.) illustrates these ideas in great detail.
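To make the tangent distance idea from that paper a bit more concrete, here is a minimal sketch of the one-sided tangent distance, assuming a single transformation (rotation) and approximating the tangent vector by finite differences; the step size and the digits dataset are my own illustrative choices, and the paper itself uses several tangent vectors (rotation, translation, scaling, thickness, ...):

```python
import numpy as np
from scipy.ndimage import rotate
from sklearn.datasets import load_digits

def rotation_tangent(image, eps=5.0):
    """Finite-difference approximation of the derivative of the image
    with respect to the rotation angle, evaluated at angle 0."""
    return (rotate(image, angle=eps, reshape=False) - image).ravel() / eps

def tangent_distance(x_img, y_img):
    """One-sided tangent distance: distance from y to the tangent line
    {x + a * t : a in R} spanned by the rotation tangent vector at x."""
    x, y = x_img.ravel(), y_img.ravel()
    t = rotation_tangent(x_img)
    a = t @ (y - x) / (t @ t)        # closed-form minimiser of ||x + a*t - y||
    return np.linalg.norm(x + a * t - y)

digits = load_digits()
x_img, y_img = digits.images[0], digits.images[1]
print("Euclidean distance:", np.linalg.norm(x_img - y_img))
print("tangent distance:  ", tangent_distance(x_img, y_img))
```

Tangent propagation, the second technique in the paper, uses the same tangent vectors during training: it penalizes the directional derivative of the classifier output along each tangent direction, which is exactly the enforcement of smoothness discussed above.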