I am reading Chris Bishop's Pattern Recognition and Machine Learning.
In Section 2.3.5 he introduces some ideas on the contribution of the $n$th observation in a data set to the maximum likelihood estimator of the mean.
He points out that as the number of observations grows, the contribution of the final data point to the estimate becomes smaller and smaller. This makes good sense.
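For context, the sequential update he derives for the mean is, if I remember the numbering correctly, equation (2.126):

$$\mu_{\mathrm{ML}}^{(N)} = \mu_{\mathrm{ML}}^{(N-1)} + \frac{1}{N}\left(x_N - \mu_{\mathrm{ML}}^{(N-1)}\right),$$

so the correction contributed by the $N$th observation shrinks like $1/N$.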
He then goes on to say:
"However, we will not always be able to derive a sequential algorithm by this route, and so we seek a more general formulation of sequential learning, which leads us to the Robbins-Monro algorithm."
My question concerns the motivation: how exactly are these two ideas connected? I would be glad for some insight into what is going on here.