For your problem (a linear model fitted to a small data set), the two approaches are quite similar, and the statistical linear model may be more suitable than data mining.
Their emphases are slightly different. Generally speaking, statistics is built more on probability, distributions, and hypothesis tests; to obtain theoretical guarantees it makes more assumptions and tends toward simpler model forms. Data mining, on the other hand, focuses on optimizing predictive performance on the data set; it puts less emphasis on mathematical assumptions and proofs and tends toward complex structures and model averaging. Usually no data-mining model can be guaranteed by mathematical proof to outperform the others consistently before it is applied to the data, so different models with different parameter settings are tried on the data set and compared by performance (for example, by cross-validation). This is why we see many hypothesis tests used to validate statistical models, while data mining usually compares only the test error (prediction performance).
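As a minimal sketch of what "compare by cross-validation" looks like in practice (using Python and scikit-learn on a synthetic data set; nothing here is specific to your problem):

```python
# Sketch: compare candidate models by cross-validated prediction error,
# not by hypothesis tests. Synthetic data, purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=1.0, size=200)

models = {
    "linear regression": LinearRegression(),
    "ridge regression":  Ridge(alpha=1.0),
    "random forest":     RandomForestRegressor(n_estimators=200, random_state=0),
}

for name, model in models.items():
    # sklearn returns negative MSE; flip the sign to report MSE.
    scores = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_squared_error")
    print(f"{name:20s} CV mean squared error: {scores.mean():.3f}")
```

The model with the lowest cross-validated error is preferred, with no distributional assumptions or p-values involved.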
In the linear model there are several classic assumptions, and we use hypothesis tests, residual plots, and so on to validate the model; this works well for small data. With big data it is difficult to read anything from a cluttered residual plot, and some hypothesis tests are no longer convincing. For example, the Shapiro-Wilk normality test will nearly always reject normality, because with a very large N the test has enough power that even tiny, practically irrelevant departures from normality become statistically significant.
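To illustrate the Shapiro-Wilk point, here is a small simulation sketch (scipy-based, my own illustration; the exact p-values depend on the random seed). The same mild departure from normality tends to pass the test at small N but gets rejected as N grows:

```python
# Sketch: a fixed, mild departure from normality (a t-distribution with 10 df)
# is usually not rejected at small N, but the test's power grows with N.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

for n in (50, 500, 5000):
    x = rng.standard_t(df=10, size=n)   # close to normal, but not exactly normal
    stat, p = stats.shapiro(x)
    print(f"N = {n:5d}  Shapiro-Wilk p-value = {p:.4f}")
```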
With big data and computational machine learning, we can use not only more variables but also more complex structures built from the same variables, such as splines and penalized regression. We can also use bootstrap resampling and model averaging (for example, bagging). In these cases the classic statistical hypothesis tests and model-selection criteria such as AIC/BIC are no longer valid, because we cannot even write down a single likelihood or parameter set that is comparable across such different kinds of models. So data mining focuses instead on how to tune the model to get good predictive performance.
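As a sketch of "more complex structure from the same variable, tuned for prediction" (assuming a recent scikit-learn, since SplineTransformer needs version 1.0 or later): expand a single predictor into a spline basis, fit a penalized (lasso) regression, and choose the penalty by cross-validation rather than by a hypothesis test or AIC/BIC.

```python
# Sketch: one variable, expanded into spline basis functions, with a lasso
# penalty whose strength is chosen by cross-validation (no AIC/BIC, no tests).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=300)
y = np.sin(x) + rng.normal(scale=0.3, size=300)   # nonlinear truth
X = x.reshape(-1, 1)

model = make_pipeline(
    SplineTransformer(degree=3, n_knots=10),   # complex structure, same variable
    LassoCV(cv=5),                             # penalty tuned by cross-validation
)
model.fit(X, y)
print("chosen penalty alpha:", model[-1].alpha_)
```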