In Section 4, Empirical Risk Minimization, of the paper *Principles of Risk Minimization for Learning Theory* by V. Vapnik, the author says the following:
In order to solve this problem, the following induction principle is proposed: the risk functional $R(w)$ is replaced by the empirical risk functional
$$E(w) = \dfrac{1}{\mathscr{l}} \sum_{i = 1}^\mathscr{l} L(y_i, f(x_i, w)) \tag{3}$$ constructed on the basis of the training set (1). The induction principle of empirical risk minimization (ERM) assumes that the function $f(x, w^*_\mathscr{l})$, which minimizes $E(w)$ over the set $w \in W$, results in a risk $R(w^*_\mathscr{l})$ which is close to its minimum. This induction principle is quite general; many classical methods such as least squares or maximum likelihood are realizations of the ERM principle.
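For concreteness, here is a minimal sketch (my own illustration, not from the paper) of the ERM principle with squared loss and a linear model, in which case minimizing $E(w)$ is just ordinary least squares:

```python
# Minimal ERM sketch (illustrative; the data-generating process is assumed):
# squared loss L(y, f(x, w)) = (y - f(x, w))^2 with the linear model
# f(x, w) = w[0] + w[1] * x, so minimizing E(w) is ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
l = 50                                                # training-set size (the paper's script-l)
x = rng.uniform(-1.0, 1.0, size=l)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=l)     # assumed true function plus noise

def empirical_risk(w, x, y):
    """E(w) = (1/l) * sum of squared losses over the training set."""
    preds = w[0] + w[1] * x
    return np.mean((y - preds) ** 2)

# With squared loss and a linear model, the ERM solution w*_l has a
# closed form: the ordinary least-squares estimate.
X = np.column_stack([np.ones(l), x])
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)

print("ERM solution w*_l:", w_star)
print("Empirical risk E(w*_l):", empirical_risk(w_star, x, y))
```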
The evaluation of the soundness of the ERM principle requires answers to the following two questions:
1. Is the principle consistent? (Does $R(w^*_\mathscr{l})$ converge to its minimum value on the set $w \in W$ when $\mathscr{l} \to \infty$?)
2. How fast is the convergence as $\mathscr{l}$ increases?
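To make questions 1 and 2 concrete, here is a small simulation (again my own sketch, under the same assumed linear-model, squared-loss setup as above) that approximates $R(w^*_\mathscr{l})$ on a large held-out sample for increasing $\mathscr{l}$; consistency says these values should approach the minimum achievable risk:

```python
# Sketch: approximate the true risk R(w*_l) of the ERM solution on a large
# held-out sample, for growing training-set sizes l (setup is assumed, not from the paper).
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=n)
    return x, y

x_test, y_test = sample(100_000)          # large sample used as a proxy for R(w)

for l in (10, 100, 1000, 10000):
    x_tr, y_tr = sample(l)
    X = np.column_stack([np.ones(l), x_tr])
    w_star, *_ = np.linalg.lstsq(X, y_tr, rcond=None)
    risk = np.mean((y_test - (w_star[0] + w_star[1] * x_test)) ** 2)
    # In this setup the minimum of R(w) is the noise variance 0.3**2 = 0.09.
    print(f"l = {l:6d}   approx. R(w*_l) = {risk:.4f}")
```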
Why is the rate of convergence in question 2 important?