I was recently reading a discussion amongst mathematicians/statisticians about machine learning and deep learning, and how these methods are applied by non-mathematicians/statisticians. The argument was that these methods are often applied incorrectly, since the people using them frequently lack the mathematical/statistical background to understand them. For instance, some machine learning methods, and certainly deep learning methods, require large amounts of data to produce good results, yet people who don't understand these methods often apply them without adequate amounts of data. It was then mentioned that this ignorance sometimes works out when you do have large amounts of data: having a lot of data reduces the need to understand the assumptions of the methods and tends to yield good results regardless. However, it was also said that if one wishes to use these methods in not-so-good conditions (say, in the absence of large amounts of data), it is still possible to get good results, but the statistical assumptions of the methods then become important, since you no longer have the large amounts of data to save/shield you.
As a novice, I want to research this further. What assumptions are being referred to here? In other words, what are the mathematical/statistical assumptions underlying these methods that one must understand in order to actually understand the methods and be able to apply them in not-so-good conditions? The first things that came to mind while reading this were the law of large numbers and the central limit theorem, i.e., the idea that the distribution of sample means approaches a normal distribution as the amount of data increases. Another, less concrete idea was that there is probably some assumption here related to the inequalities taught in probability theory for bounding probabilities, such as Cauchy-Schwarz, Jensen, etc. But since I am a novice, this is all I could come up with.
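To be concrete about what I have in mind (and I may well be mixing these up, given my background), the statements I'm thinking of are roughly

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \;\longrightarrow\; \mu \quad \text{as } n \to \infty \qquad \text{(law of large numbers)},$$

$$\sqrt{n}\,\left(\bar{X}_n - \mu\right) \;\xrightarrow{d}\; \mathcal{N}(0, \sigma^2) \qquad \text{(central limit theorem)},$$

for i.i.d. $X_1, X_2, \dots$ with mean $\mu$ and finite variance $\sigma^2$, plus inequalities like Jensen's, $\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)]$ for convex $\varphi$, as examples of the probability bounds I mean.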
And please reference any research papers that discuss this! That would be much appreciated.
EDIT:
My understanding is that machine learning and deep learning are different (categories of) methods, so I've described them separately in case the underlying assumptions are different between them.
EDIT2:
If the assumptions depend on the specific method and are too many to list, then are there any general assumptions shared across all methods (such as the law of large numbers and the central limit theorem I mentioned)? A sampling of a few important methods, their assumptions, and relevant research papers would be a fine answer. Deep Learning in particular would be an interesting one, since it's said to require so much data: what if I wanted to use Deep Learning with limited data? What assumptions would I need to be aware of? (I've sketched the kind of limited-data situation I mean below.)
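To make the limited-data scenario concrete, here is the kind of toy experiment I have in mind (just a sketch on synthetic data, using scikit-learn's MLPClassifier as a stand-in for a small neural network; the specific setup is my own assumption, not something from the discussion I read):

```python
# Toy illustration of the limited-data regime: a small neural network
# fit on very few samples will typically score much better on its
# training set than on held-out data, which is exactly the situation
# where I'd expect the underlying assumptions to start mattering.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data standing in for a "small" dataset (100 samples, 20 features).
X, y = make_classification(n_samples=100, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Even a modest network has far more parameters than training samples here.
model = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))
```

The gap between those two scores is what I mean by "not-so-good conditions": which assumptions would I need to understand to say something trustworthy in a setting like this?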