There are many important reasons for it. For example, in the case of search engines and recommender systems the user will consider only the first k items, so you really want to make sure that the first elements of the search result are most relevant for him. Here, just evaluating just how relevant/accurate the top element is, is the wrong metric for the problem.
This is linked to theoretical and computational issues. The more classes you consider, the more you suffer from class ambiguity (classes whose members are much more similar to each other than to other classes). This is also an important issue when scaling image classification problems to a huge number of classes, and has motivated the use of loss functions that optimise the top-k error instead of the zero-one loss.
It has been shown that these problems are computationally easier to solve (scale better), and yield better accuracy. In loose terms, algorithms will have a hard time trying to separate very "similarly looking" classes, and spend lots of computation trying to solve it, without success. And from a practical point of view, it is not critical or even a problem. So, how is the top-k error defined?.
As we have seen, the goal is to have the most k-relevant terms as a result. As an example, consider the approach described in Top-k Multiclass SVM. See Loss Functions for Top-k Error: Analysis and Insights for more details, or do some research on ranking algorithms.
If we have a training set $\{x_{i}, y_{i}\}, i=1, ..., n$, of training samples $x_{i} \in \mathcal{X}$ and labels $y_{i} \in \mathcal{Y} = \{1, ..., m\}$. We want to learn $m$ linear predictors, $\omega_{y}$, whose prediction scores are given by the scalar product $<\omega_{y}, x>$. Let $[.]$ represent the permutation of the labels such that $[j]$ is the largest jth score,
$$
<\omega_{[1]}, x> \geq <\omega_{[2]}, x> \geq .. \geq <\omega_{[k]}, x>
$$
We will have an error when the desired search result is not among the k-best scores. We can formally encode this idea in the top-k zero-one loss defined as,
$$
\text{err}_{k}(f(x), y) = \mathbf{1}_{<\omega_{[k]}, x> > <\omega_{y}, x>}
$$
where $f(x) = (<\omega_{[1]}, x>, <\omega_{[2]}, x>, .. <\omega_{[k]}, x>)$ and $\mathbf{1}_{P} = 1$ if $P$ is true, and 0 otherwise.