2

I was reading about when to use standardization vs. normalization, and what I understood was that standardization should be used when the model in use makes certain assumptions about the data (though I don't know why this is). I have also read that standardization retains the original distribution of the data only if it was Gaussian.

Many people on the internet seem to believe that neural nets don't make any assumptions. I do not see how neural networks can make no assumptions, for the following reason:

A few months back I went through the derivation showing that, for a normally distributed error term, minimizing mean squared error is equivalent to maximum likelihood estimation. This assumption about the error term is stated as an assumption of linear regression. So the assumptions depend on the choice of cost function, I believe? Wouldn't that mean neural networks also make assumptions based on the choice of cost function, since otherwise the estimates wouldn't be that good?
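For reference, the equivalence I mean, sketched briefly: if $y_i = f(x_i) + \varepsilon_i$ with i.i.d. errors $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$, the log-likelihood is

$$\log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n \big(y_i - f(x_i)\big)^2,$$

so maximizing the likelihood over $f$ is the same as minimizing $\sum_i \big(y_i - f(x_i)\big)^2$, i.e. the MSE.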

Also, does this mean I should standardize only when the data is normally distributed, or does the distribution not matter?

shekhar
  • 21
  • 2
  • The usage of these terms are not "standardized". What do you mean by *standardization*, and what by *normalization*? – Igor F. May 18 '20 at 08:24
  • Standardization meaning mean 0 variance 1 and normalization meaning min max scaling. – shekhar May 19 '20 at 02:58

2 Answers

2

They don’t. Moreover, normality is not among the core assumptions of linear regression either. It is true that minimizing squared error is equivalent to maximizing a Gaussian likelihood, but this doesn’t mean that you need to make such an assumption when minimizing squared errors. You can use linear regression when the assumption is broken. For linear regression we need the normality assumption to hold mostly for hypothesis tests and confidence intervals, neither of which is typically used, and both of which would be hard to do, in the case of neural networks.
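To illustrate (a toy sketch with made-up data): ordinary least squares still recovers the true coefficients even when the error term is clearly non-Gaussian, because minimizing squared error does not itself require a normality assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a linear relationship with skewed, non-Gaussian errors.
n = 10_000
x = rng.uniform(-3, 3, size=n)
noise = rng.exponential(scale=1.0, size=n) - 1.0   # skewed, mean-zero errors
y = 2.0 + 0.5 * x + noise                          # true intercept 2, slope 0.5

# OLS via least squares (minimizes squared error; no distributional
# assumption is needed for the point estimates).
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(beta)  # close to [2.0, 0.5] despite the skewed errors
```

The point estimates are fine; it is the usual $t$-based confidence intervals and tests that lean on the normality assumption (at least in small samples).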

Tim
  • 108,699
  • 20
  • 212
  • 390
  • 1
    Further, we make the normality assumption about the error term, not the data, and we definitely don’t make a normality assumption about the predictors. – Dave May 18 '20 at 11:14
  • Thanks for the answers! What are some good sources to study about basics of neural nets (and statistics in general) from? Most pages on internet either share half information or just tell that things are a certain way (and not the why they are that way). – shekhar May 19 '20 at 03:18
0

This is a common but misleading wording. Models don't make assumptions. People who use them do! For example, when you decide on using a nearest neighbor classifier, you implicitly assume that points which are close are likely to belong to the same class (or, maybe, you have no idea what you're doing and it's pure luck how your model will perform).

So, when you think about using linear regression, you should consider what can be assumed about the data and what you want your model to capture. If you just want a line through the data that minimizes the sum of squared errors, you don't need normality. But if you want your line at the same time to represent the most likely process which generated the data, then normality, independence, homoskedasticity etc. are an issue.

Regarding data scaling ("standardization"), it's again a question of assumptions and objectives. Imagine a two-dimensional data set with a large spread along one axis (say, $x$) and a small one along $y$. Whether scaling them to the same span or standardizing them makes sense depends on the underlying cause of the spread. If $x$ is measured in millimeters and $y$ in light years, scaling will likely make sense. On the other hand, if the spread along the $x$-axis is due to different classes having distinctively different $x$-values, scaling can lead to an information loss, at least if you use a distance-based algorithm, like nearest neighbor.
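A small sketch of the scale effect (toy numbers, purely illustrative): with features on very different scales, the large-spread feature dominates Euclidean distances, so the nearest neighbor can change after standardization.

```python
import numpy as np

# Two training points whose features live on very different scales:
# feature 0 spans ~10 units, feature 1 spans ~100 units.
train = np.array([[0.0,   0.0],    # point A
                  [10.0, 100.0]])  # point B
query = np.array([0.5, 90.0])      # close to A in feature 0, to B in feature 1

def nearest(train, query):
    """Index of the training point nearest to query (Euclidean)."""
    return int(np.linalg.norm(train - query, axis=1).argmin())

# Raw distances: feature 1 dominates, so B looks nearest.
print(nearest(train, query))  # -> 1 (point B)

# Standardize using the training set's statistics, then re-check:
# both features now contribute comparably, and A becomes nearest.
mu, sigma = train.mean(axis=0), train.std(axis=0)
print(nearest((train - mu) / sigma, (query - mu) / sigma))  # -> 0 (point A)
```

Whether the raw or the standardized answer is the "right" one depends, as said above, on whether the large spread carries class information or is just an artifact of the units.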

Igor F.
  • 6,004
  • 1
  • 16
  • 41
Yeah, it's humans who make assumptions, but for a particular model, when *its* assumptions do not hold, it will give rubbish results. It is not that you can make arbitrary assumptions for any model. – Tim May 18 '20 at 12:31
  • 1
    Let me paraphrase: "Model follows assumptions". **First** you make assumptions and **then**, based on them, the knowledge of how models work, and information you want to obtain, decide which model to use. It's like with choosing any other tool. You make assumptions about the hardness of the nail and the wall, and then choose a suitable object to drive the former into the latter. If you don't know the objects' properties, you can still make a bad choice, e.g. taking a glass bottle instead of a steel hammer, but you probably wouldn't say that glass assumes walls and/or nails to be soft. – Igor F. May 18 '20 at 14:27
  • Thanks for the answer! What kind of information loss are we talking about? Is it the loss of effect of outliers on the results? – shekhar May 19 '20 at 03:28
  • bad scaling in NNs is likely to just slow down training a bit. and a slightly sub-optimal result will be reached if regularization is applied. the information loss @IgorF. is talking about mainly applies to distance based models (not NNs). – carlo May 19 '20 at 13:42