I understand individually what different optimizers, such as gradient descent, Adam, etc., do, and I understand what estimators, such as maximum likelihood, do.
But I am having a hard time putting the two concepts together. Is MLE used instead of an optimizer, or does the optimizer maximize the likelihood? In what part of the network does MLE come into play?
My current understanding is the following. An input is fed into the network; assuming we're not using a pre-trained network, we randomly initialize the weights/biases. Once the input has been forward-propagated, a loss function measures the error between the output and the target. This loss is then backpropagated through the network and the weights are updated, roughly like the sketch below. This seems like a closed loop, so where in this does MLE come into play?
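For concreteness, here is a minimal sketch of the loop I have in mind (PyTorch, with a made-up classification setup and dummy data, so the layer sizes, learning rate, and loss choice are just placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical tiny network, just to make the loop concrete
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 10)          # dummy inputs
y = torch.randint(0, 3, (64,))   # dummy targets

for epoch in range(100):
    optimizer.zero_grad()
    logits = model(x)            # forward pass with the current weights
    loss = loss_fn(logits, y)    # error between output and target
    loss.backward()              # backpropagate the loss
    optimizer.step()             # tweak the weights
```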