As I understand it*, link functions are associated with generalized linear models (GLMs). The link function relates the (conditional) expected value of the dependent variable $y$ to a linear predictor constructed from the independent variables $x$, i.e.
$$g[\langle y\mid x\rangle]=L[x,\theta]$$
where $g[\,]$ is the link function and $L[\,]$ is the linear predictor, parameterized by $\theta$, which is to be estimated by MLE, assuming i.i.d. data $y$. The link function is relatively unconstrained, but to be admissible it must have an appropriate domain and range, and it must be invertible.
For the case of Bernoulli distributed data $y\sim\mathrm{Bern}[p]$, the left hand side becomes $g[\,p[x]\,]$, where $p[x]=\langle y\mid x\rangle$ is the conditional mean.
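To make this concrete, here is a minimal numpy sketch (the intercept/slope values and the form $L[x,\theta]=\theta_0+\theta_1 x$ are made up for illustration): the logit link maps the conditional mean $p[x]\in(0,1)$ to the whole real line, and its inverse (the sigmoid) recovers $p[x]$ from the linear predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

def logit(p):      # link g: (0, 1) -> R
    return np.log(p / (1 - p))

def sigmoid(eta):  # inverse link g^{-1}: R -> (0, 1)
    return 1 / (1 + np.exp(-eta))

# Hypothetical linear predictor L[x, theta] = theta0 + theta1 * x
theta = np.array([-1.0, 2.0])
x = rng.normal(size=100_000)
eta = theta[0] + theta[1] * x     # linear predictor
p = sigmoid(eta)                  # conditional mean <y | x> = p[x]
y = rng.binomial(1, p)            # Bernoulli draws with that mean

# Check: the empirical mean of y in a narrow slice of x approximates p[x] there
mask = np.abs(x - 0.5) < 0.05
print(y[mask].mean(), sigmoid(theta[0] + theta[1] * 0.5))
```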
Now in machine learning, the "loss function" is commonly derived from MLE, where it is defined as the negative log-likelihood of the data given the parameters. For MLE involving exponential-family (conditional) PDFs, the log is particularly convenient, but note that any strictly increasing (and hence invertible) transformation of the likelihood preserves the maximizer and could in principle be used for optimization.
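In the Bernoulli case this negative log-likelihood is the familiar log-loss (binary cross-entropy). A minimal sketch, reusing `sigmoid`, `x`, and `y` from the snippet above:

```python
def neg_log_likelihood(theta, x, y):
    """Bernoulli negative log-likelihood (log-loss) under a logit link.

    Minimizing this is equivalent to maximizing the likelihood,
    since log is strictly increasing.
    """
    p = sigmoid(theta[0] + theta[1] * x)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```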
As I understand it*, in GLM each exponential-family distribution has a "canonical link", the one for which the linear predictor equals the distribution's natural parameter, so the log-likelihood (log-loss) takes a particularly simple form. In the Bernoulli case this is the logit function, which gives logistic regression. For probit regression the probit link (the inverse of the standard normal CDF) is used instead.
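As an illustration of swapping the link, here is a hedged sketch that fits both logistic and probit regression by minimizing the negative log-likelihood directly with `scipy.optimize.minimize` (again reusing `sigmoid`, `x`, and `y` from above; in practice one would use a dedicated GLM routine):

```python
from scipy.optimize import minimize
from scipy.stats import norm

def nll(theta, x, y, inv_link):
    """Like the log-loss above, but parameterized by the inverse link."""
    p = inv_link(theta[0] + theta[1] * x)
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Logistic regression: inverse link is the sigmoid (inverse of the logit)
fit_logit = minimize(nll, x0=np.zeros(2), args=(x, y, sigmoid))

# Probit regression: inverse link is the standard normal CDF
fit_probit = minimize(nll, x0=np.zeros(2), args=(x, y, norm.cdf))

print(fit_logit.x, fit_probit.x)   # estimated theta under each link
```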
One final note: from an optimization perspective, GLM operationally gives a framework for translating many estimation problems into a sequence of weighted linear least-squares problems (iteratively reweighted least squares, IRLS), which can be solved with standard techniques that leverage existing linear least-squares toolkits such as LSQR.
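For what it's worth, here is a bare-bones sketch of that reduction for logistic regression: each Newton step of the MLE problem is a weighted linear least-squares solve, handled here by `np.linalg.lstsq` (a toy illustration under the same simulated data as above, not production code):

```python
def irls_logistic(X, y, n_iter=25):
    """Fit logistic regression by iteratively reweighted least squares (IRLS).

    X: (n, d) design matrix (include a column of ones for the intercept).
    y: (n,) array of 0/1 responses.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ theta                          # linear predictor
        p = 1 / (1 + np.exp(-eta))               # current fitted means
        w = np.clip(p * (1 - p), 1e-10, None)    # Bernoulli variance = IRLS weights
        z = eta + (y - p) / w                    # "working response"
        sw = np.sqrt(w)
        # Weighted linear least squares: minimize ||sqrt(w) * (X theta - z)||^2
        theta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * z, rcond=None)
    return theta

X = np.column_stack([np.ones_like(x), x])
print(irls_logistic(X, y))   # should be close to the MLE found above
```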
(*I am not very familiar with GLMs, so parts of this may very well be wrong. Any corrections would be appreciated if so!)