Suppose you have a classification problem where you are trying to predict a one-hot class label (e.g., $[0 \: 1 \: 0]^T$) with a model. One way to do this is to use the log loss:
$\Large L_{\log} = -\sum_i[y_i\log \hat{y}_i + (1-y_i)\log (1-\hat{y}_i)]$
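For concreteness, here is a minimal NumPy sketch of how I would compute this (the function name and the clipping constant are my own, not from any library):

```python
import numpy as np

def log_loss(y, y_hat, eps=1e-12):
    """Sum of the per-dimension log loss; y_hat is clipped away from 0 and 1 to avoid log(0)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([0.0, 1.0, 0.0])                  # the one-hot target from above
print(log_loss(y, np.array([0.1, 0.8, 0.1])))  # ~0.43: prediction close to y, small loss
print(log_loss(y, np.array([0.8, 0.1, 0.1])))  # ~4.02: prediction far from y, large loss
```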
The log loss is attractive because it does the right thing: it pushes $\hat{y}_i$ toward $1$ when $y_i$ is $1$, and toward $0$ when $y_i$ is $0$ (the loss blows up toward $\infty$ when the prediction is confidently wrong). But another way to do this is with elementwise division:
$\Large L_{\text{div}} = \sum_i\left[\frac{y_i}{\max (\hat{y}_i, \epsilon)} + \frac{\hat{y}_i}{\max (y_i, \epsilon)}\right]$
Note: $\epsilon$ is a small positive constant to prevent division by zero.
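And for comparison, a sketch of the division-based loss on the same toy target (again, the names are my own; $\epsilon$ is the guard from the note above):

```python
import numpy as np

def div_loss(y, y_hat, eps=1e-12):
    """Sum over dimensions of y_i / max(y_hat_i, eps) + y_hat_i / max(y_i, eps)."""
    return np.sum(y / np.maximum(y_hat, eps) + y_hat / np.maximum(y, eps))

y = np.array([0.0, 1.0, 0.0])                  # the same one-hot target
print(div_loss(y, y))                          # 2.0: the minimum, attained when y_hat == y
print(div_loss(y, np.array([0.1, 0.8, 0.1])))  # ~2e11: dominated by y_hat_i / eps where y_i = 0
print(div_loss(y, np.array([0.8, 0.1, 0.1])))  # ~9e11: larger still for a prediction far from y
```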
Here, the minimum of the loss is attained when $\hat{y}$ matches $y$ on all dimensions. Isn't this a preferable cost function? Why isn't it used?