As I understand it*, link functions are associated with generalized linear models (GLMs). The link function relates the (conditional) expected value of the dependent variable $y$ to a linear predictor constructed from the independent variables $x$, i.e.
$$g[\langle y\mid x\rangle]=L[x,\theta]$$
where $g[\,]$ is the link function and $L[\,]$ is the linear predictor, parameterized by $\theta$, which is to be estimated by MLE, assuming i.i.d. data $y$. The link function is relatively unconstrained, but to be admissible it must have an appropriate domain and range, and it must be invertible.
For the case of Bernoulli distributed data $y\sim\mathrm{Bern}[p]$, the left hand side becomes $g[\,p[x]\,]$, where $p[x]=\langle y\mid x\rangle$ is the conditional mean.
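To make this concrete, here is a minimal numpy sketch (the intercept/slope values and the form $L[x,\theta]=\theta_0+\theta_1 x$ are made up for illustration): the logit link maps the conditional mean $p[x]\in(0,1)$ to the whole real line, and its inverse (the sigmoid) recovers $p[x]$ from the linear predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

def logit(p):      # link g: (0, 1) -> R
    return np.log(p / (1 - p))

def sigmoid(eta):  # inverse link g^{-1}: R -> (0, 1)
    return 1 / (1 + np.exp(-eta))

# Hypothetical linear predictor L[x, theta] = theta0 + theta1 * x
theta = np.array([-1.0, 2.0])
x = rng.normal(size=100_000)
eta = theta[0] + theta[1] * x     # linear predictor
p = sigmoid(eta)                  # conditional mean <y | x> = p[x]
y = rng.binomial(1, p)            # Bernoulli draws with that mean

# Check: the empirical mean of y in a narrow slice of x approximates p[x] there
mask = np.abs(x - 0.5) < 0.05
print(y[mask].mean(), sigmoid(theta[0] + theta[1] * 0.5))
```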
Now in machine learning, the "loss function" is commonly derived from MLE, where it is defined as the negative log-likelihood of the data given the parameters. For MLE involving exponential-family (conditional) PDFs, the log is particularly convenient, but note that any strictly increasing (and hence invertible) transformation of the likelihood preserves the maximizer and could in principle be used for optimization.
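In the Bernoulli case this negative log-likelihood is the familiar log-loss (binary cross-entropy). A minimal sketch, reusing `sigmoid`, `x`, and `y` from the snippet above:

```python
def neg_log_likelihood(theta, x, y):
    """Bernoulli negative log-likelihood (log-loss) under a logit link.

    Minimizing this is equivalent to maximizing the likelihood,
    since log is strictly increasing.
    """
    p = sigmoid(theta[0] + theta[1] * x)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```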
As I understand it*, in GLM each exponential-family distribution has a "canonical link", the one for which the linear predictor equals the distribution's natural parameter, so the log-likelihood (log-loss) takes a particularly simple form. In the Bernoulli case this is the logit function, which gives logistic regression. For probit regression the probit link (the inverse of the standard normal CDF) is used instead.
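As an illustration of swapping the link, here is a hedged sketch that fits both logistic and probit regression by minimizing the negative log-likelihood directly with `scipy.optimize.minimize` (again reusing `sigmoid`, `x`, and `y` from above; in practice one would use a dedicated GLM routine):

```python
from scipy.optimize import minimize
from scipy.stats import norm

def nll(theta, x, y, inv_link):
    """Like the log-loss above, but parameterized by the inverse link."""
    p = inv_link(theta[0] + theta[1] * x)
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Logistic regression: inverse link is the sigmoid (inverse of the logit)
fit_logit = minimize(nll, x0=np.zeros(2), args=(x, y, sigmoid))

# Probit regression: inverse link is the standard normal CDF
fit_probit = minimize(nll, x0=np.zeros(2), args=(x, y, norm.cdf))

print(fit_logit.x, fit_probit.x)   # estimated theta under each link
```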
One final note: from an optimization perspective, GLM operationally gives a framework for translating many estimation problems into a sequence of weighted linear least-squares problems (iteratively reweighted least squares, IRLS), which can be solved with standard techniques that leverage existing linear least-squares toolkits such as LSQR.
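For what it's worth, here is a bare-bones sketch of that reduction for logistic regression: each Newton step of the MLE problem is a weighted linear least-squares solve, handled here by `np.linalg.lstsq` (a toy illustration under the same simulated data as above, not production code):

```python
def irls_logistic(X, y, n_iter=25):
    """Fit logistic regression by iteratively reweighted least squares (IRLS).

    X: (n, d) design matrix (include a column of ones for the intercept).
    y: (n,) array of 0/1 responses.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ theta                          # linear predictor
        p = 1 / (1 + np.exp(-eta))               # current fitted means
        w = np.clip(p * (1 - p), 1e-10, None)    # Bernoulli variance = IRLS weights
        z = eta + (y - p) / w                    # "working response"
        sw = np.sqrt(w)
        # Weighted linear least squares: minimize ||sqrt(w) * (X theta - z)||^2
        theta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * z, rcond=None)
    return theta

X = np.column_stack([np.ones_like(x), x])
print(irls_logistic(X, y))   # should be close to the MLE found above
```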
(*I am not very familiar with GLMs, so parts of this may very well be wrong. Any corrections would be appreciated if so!)