8

Is Wikipedia's page on the sigmoid function incorrect?

It states that:

A common example of a sigmoid function is the logistic function

From my knowledge of machine learning, I thought that "the sigmoid function" is defined as the logistic function, $$\sigma(z) = \frac {1} {\left(1 + e^{-z}\right)}\text{.}$$ I have never seen or heard the phrasing that the logistic function is a type of sigmoid function.
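
For concreteness, here is a minimal NumPy sketch of that definition (the name `sigmoid` and the sample values are my own, purely for illustration):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # 0.5, the midpoint of the S-shaped curve
print(sigmoid(10.0))  # close to 1; the curve saturates for large z
```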

Furthermore, that Wikipedia page says that other examples of a sigmoid function are the tanh and arctan functions. Again, I've never seen tanh or arctan described as a type of sigmoid function.

These functions are considered to be peers, usually in a context like:

We can use various non-linear functions in this neural network, such as the sigmoid, tanh, and ReLU activation functions.

What am I missing here? Is the Wikipedia article correct or incorrect? I find that Wikipedia is usually accurate for math terms.

Karolis Koncevičius
stackoverflowuser2010
  • 5
    Please do not vandalize your question. When you posted on SE, you gave up ownership of the content under [CC BY-SA 4.0](https://stats.stackexchange.com/help/licensing). If there are no answers, you may delete your own question (see [here](https://stats.stackexchange.com/help/what-to-do-instead-of-deleting-question) ): just click the faint gray 'delete' at lower left (your account needs to be registered for this). Otherwise, the thread will remain according to SE's rules. – Sycorax Sep 28 '21 at 23:43

4 Answers

60

The unsatisfying answer is "It depends who you ask." "Sigmoid", if you break it into parts, just means "S-shaped".

The logistic sigmoid function is so prevalent that people tend to gloss over the word "logistic". For machine learning folks, it's become the exemplar of the class, and most call it the sigmoid function. (Is it myopia to call it the sigmoid function?) Still, there are other communities that use S-shaped functions.
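
To illustrate that broader "S-shaped" reading, here is a minimal NumPy sketch (the function choices and rescalings are my own, for illustration only): the logistic, tanh, and arctan curves are all monotone, bounded, S-shaped functions, and tanh is in fact just an affine rescaling of the logistic.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh01(z):
    return (np.tanh(z) + 1.0) / 2.0    # tanh rescaled from (-1, 1) to (0, 1)

def arctan01(z):
    return np.arctan(z) / np.pi + 0.5  # arctan rescaled from (-pi/2, pi/2) to (0, 1)

z = np.linspace(-6.0, 6.0, 25)
for f in (logistic, tanh01, arctan01):
    y = f(z)
    # Each curve is increasing and stays strictly between 0 and 1 on this grid
    assert np.all(np.diff(y) > 0) and y.min() > 0.0 and y.max() < 1.0

# tanh is the logistic curve stretched and shifted: tanh(z) = 2*logistic(2z) - 1
assert np.allclose(np.tanh(z), 2.0 * logistic(2.0 * z) - 1.0)
```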

Arya McCarthy
  • 18
    I'm an analytical chemist, and we use "sigmoid" in the more general S-shape sense, without implying exactly which function. E.g. the rather typical detector behaviour, where you get some roughly constant signal at very low concentrations, then the signal increases with analyte concentration (that's the range we want to use), and finally the signal becomes constant again at high concentrations, e.g. because of detector saturation, is called sigmoid. – cbeleites unhappy with SX Sep 15 '21 at 12:20
  • 2
    One common problem of encyclopedias is that they must give general definitions, valid in any context, of things that are usually used and taught in several different, more or less narrow, contexts. – Pere Sep 15 '21 at 12:51
  • 1
    The Wikipedia article is overgeneralizing to the point of being potentially harmful. You said `it depends on whom you ask`. I work in applied ML, and I would probably be knocked down by my peers if I said that I used a sigmoid function in my neural network instead of being more specific and saying that I really used tanh. – stackoverflowuser2010 Sep 15 '21 at 17:59
  • 6
    Your message goes exactly to my point. – Arya McCarthy Sep 15 '21 at 18:02
  • 27
    Re "is it myopia?": yes, definitely. This function has been around--with an established name ("logistic")--since the mid-1800's. A community that creates a new name for such an old, well-known object is actively rejecting its intellectual history. – whuber Sep 15 '21 at 18:12
  • 18
    Intellectual history *does* matter. Those who don't know it are doomed to repeat it, as the adage goes. It is difficult (practically impossible) to acquire a deep understanding of a concept or technique if you have to repeat for yourself centuries of investigation and discovery. In the present case, anybody who has taken a freshman college course in math, chemistry, physics, or biology has learned about logistic functions under that name, so ignorance is no excuse. Even Isaac Newton acknowledged that he "stood on the shoulders of giants." We, too, should take advantage of what precedes us. – whuber Sep 15 '21 at 19:51
  • 25
    My point is that ML people thinking that other (older) disciplines' names are wrong is hubris. (I say this as an ML person.) Just because we're a community with a large, loud online presence doesn't mean that we're the only truth out there. Norms in your field ≠ norms in other fields. So while your colleagues may understand "sigmoid function" to mean the logistic sigmoid function specifically, analytical chemists are also well-grounded in calling a broader class of functions "sigmoid functions". – Arya McCarthy Sep 15 '21 at 19:58
  • 13
    @stackoverflowuser2010 [There are lots of examples of machine learning/neural networks folks redefining terms.](https://stats.stackexchange.com/questions/223256/tensorflow-cross-entropy-for-regression) For instance, I know that when a NN paper writes about "cross entropy loss," they're almost certainly referring to "categorical cross entropy," [even though you can write a cross entropy loss for other distributions.](https://stats.stackexchange.com/questions/378274/how-to-construct-a-cross-entropy-loss-for-general-regression-targets) – Sycorax Sep 15 '21 at 20:14
  • 8
    @Sycorax Heck, sometimes there's examples of machine learning folks redefining their *own field*'s terms in confusing ways, like how "testing" and "validation" datasets can each refer to one of two different things. – nick012000 Sep 16 '21 at 06:53
  • 2
    @whuber I don't think that is accurate. Fields re-define terms all the time, depending on what is more useful. Do you ever use "the Fourier transform"? Because I know of at least six different definitions, of which I have seen four of them used "in the wild" in physics alone. It is just a synecdoche. Not much different from `x^2`, that is defined as convex in higher mathematics, but concave in most other contexts. – Davidmh Sep 16 '21 at 08:35
  • 4
    @David I'm sorry, I can't understand your comment because (1) it's unclear what your initial "that" refers to; (2) the point about the FT concerns different definitions whereas we were discussing re-inventions of new terms for the *same* concept; (3) "synecdoche" makes little sense in this context. – whuber Sep 16 '21 at 15:53
  • Let's not pretend like machine learning is unique in this regard. I'm thinking in particular how almost every term in calculus has multiple different names. The first line of the wikipedia article for 'Antiderivative' is "In calculus, an antiderivative, inverse derivative, primitive function, primitive integral or indefinite integral[Note 1] of a function f is a differentiable function F whose derivative is equal to the original function f." – Brady Gilg Sep 16 '21 at 17:14
  • 4
    That's true about Cross Validated. Your question, though, is _about Wikipedia_. It has a broader scope than statistics/ML people. – Arya McCarthy Sep 16 '21 at 18:57
  • @AryaMcCarthy: I posted this question on a ML site, purposefully to limit the scope of the audience to people in my field. I'm wondering why analytic chemists are being brought into this discussion. – stackoverflowuser2010 Sep 16 '21 at 18:59
  • 12
    Because analytic chemists use statistics. – Arya McCarthy Sep 16 '21 at 19:04
  • 2
    @whuber "anybody who has taken a freshman college course in math, chemistry, physics, or biology has learned about logistic functions under that name," Not true. I have a math degree from Cambridge University (UK). I never heard of sigmoid functions until I taught *myself* about neural networks, probably 20 years after getting the degree. (And they were called sigmoid functions, not logistic functions, in all the literature I read.) – alephzero Sep 17 '21 at 02:40
  • 2
    @alephzero and yet I met some sigmoid functions in my maths A level studies 20 years ago, again in physics. It may be possible that both you and whuber are exaggerating a wee bit for effect. – Ciaran Haines Sep 17 '21 at 03:55
  • 1
    ML people build models, in which they have to choose a particular function to suit their purposes. Chemical analysts observe data and describe it. It's no surprise that analysts will use names to refer to classes of functions, whereas ML folk will use names to refer to specific functions. Calling 1/(1+e^-x) *the* sigmoid function is an abuse of language, similar to calling x^2 *the* quadratic function, and this abuse of language makes sense in model-building ML, but not in analysis. – Stef Sep 17 '21 at 10:16
  • 3
    @aleph I made no claim that everyone would *remember* these lessons. ;-) – whuber Sep 17 '21 at 12:53
  • 2
    @alephzero Two decades after maths at Cambridge, I clearly recall the logistic function from compulsory Part Ia modules Differential Equations (as a solution to the logistic differential eqn, taught in some detail with reference to eg population ecology; "Sigmoid=S-shaped" was used as a more generic term to qualitatively describe various DE solns) & briefly in Dynamics (more emphasis on the logistic map that solves the equivalent discrete-time eqn). Logistic regression wasn't, to my memory, in any compulsory probability & statistics course: maybe 1st appeared in Part II Statistical Modelling? – Silverfish Sep 29 '21 at 00:18
  • 4
    I can believe someone who dodged stats options might not encounter the logistic function on a maths degree if they weren't shown that DE as an example, or at least not known it by name if the lecturer that year treated it purely as an algebra/calculus exercise & didn't find the idea of "logistic growth" worth commenting. Someone on an applied science (eg chemistry, ecology) or stats-heavy (eg economics, psychology) degree might even be more likely to see the term "logistic" than a pure maths student. But certainly terminology 20 years ago at Cam was in line with @AryaMcCarthy's answer! – Silverfish Sep 29 '21 at 00:33
10

As Arya said, it depends who you ask, but this is not specific to Machine Learning, and even within Machine Learning the situation is not consistent (or not consistently bad). Bishop, for example, uses the term "logistic sigmoid function", and Jordan was already using "logistic function" in 1995. In Statistical Mechanics, on the other hand, people are likely to call it the "Fermi-Dirac distribution/function". In some fields of biochemistry, including toxicology, you'll meet the same thing under the name "Hill equation". Etc.
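
To make the "same thing, different names" point concrete, here is a small sketch (the parameter values are my own, purely illustrative; the Hill equation is written in one of its common forms): with suitable substitutions, both the Fermi-Dirac occupancy and the Hill equation reduce to the logistic function.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def fermi_dirac(E, mu=0.0, kT=1.0):
    # Mean occupancy of a fermionic state with energy E
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

def hill(conc, K=1.0, n=2.0):
    # Hill equation: fraction bound at ligand concentration conc
    return conc**n / (K**n + conc**n)

E = np.linspace(-5.0, 5.0, 11)
assert np.allclose(fermi_dirac(E, mu=0.0, kT=1.0), logistic(-E))

conc = np.logspace(-3, 3, 13)
# With K = 1, the Hill curve is the logistic function of n * log(concentration)
assert np.allclose(hill(conc, K=1.0, n=2.0), logistic(2.0 * np.log(conc)))
```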

It is IMHO important to remember that these are only names (words) used for describing a mathematical concept. Words are what people use to communicate, for example, ideas and methods. As long as all participants in the conversation understand which concept they are talking about, it doesn't really matter what words they use for it. Communities develop largely independently of each other (otherwise they would form a single community) and form field-specific "dialects".

As a related example, the words "weight" and "bias", in the context of neural networks (and, through historical development, support vector machines), have completely different meanings from those used in statistics, but there is a historical, field-specific justification for using them.

Update: Actually, neural network pioneers commonly used "logistic function" or "logistic neuron": Hinton, Rumelhart and McClelland (also here), Sejnowski, etc.

Update 2: Also, one might as well ask: "Is RBF just the Gaussian function?" For some reason, equating the two on CV doesn't seem to cause nearly as much commotion as your question does.
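
For what it's worth, that equivalence is easy to check numerically; a minimal sketch (the kernel form exp(-gamma * ||x - y||^2) and the substitution gamma = 1/(2 * sigma^2) are the usual conventions, the numbers are my own):

```python
import numpy as np

def gaussian(r, sigma=1.0):
    # Unnormalised Gaussian function of the distance r
    return np.exp(-r**2 / (2.0 * sigma**2))

def rbf_kernel(x, y, gamma=0.5):
    # "RBF kernel" as commonly written in kernel methods
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
r = np.linalg.norm(x - y)

# With gamma = 1 / (2 * sigma**2), the two expressions give the same value
assert np.isclose(rbf_kernel(x, y, gamma=0.5), gaussian(r, sigma=1.0))
```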

Igor F.
7

I believe one more answer, specifically addressing your points as they currently stand (Revision 11) and the comments, is warranted.

Is Wikipedia's page on the sigmoid function incorrect?

No. In some communities, specifically Machine Learning, some (maybe even most?) people use the term "sigmoid function" in a different, more limited sense, as a synonym for the logistic function. But not the whole community does so, Machine Learning is not the only community using the term, and Wikipedia is not an encyclopaedia of Machine Learning. It addresses a broader audience, which uses a different terminology and has probably been using it since before Machine Learning was invented.

I have never seen or heard the phrasing that the logistic function is a type of sigmoid function.

Wikipedia also doesn't use this exact wording, so you seem to be misquoting it. But, semantically, considering the logistic function just one member of the sigmoid family is not at all uncommon, even in the ML community. See, for example:

A Sigmoid function is a mathematical function which has a characteristic S-shaped curve. There are a number of common sigmoid functions, such as the logistic function, the hyperbolic tangent, and the arctangent.

Examples of such usage in other communities have been given in other answers and comments.

These functions are considered to be peers, usually in a context like:

We can use various non-linear functions in this neural network, such as the sigmoid, tanh, and ReLU activation functions.

Again, this is just ML-specific lingo and even there the situation seems not to be so clear-cut. For example, in Python's Scikit-learn (an ML library!), the neurons in a multi-layer perceptron can have identity, logistic, tanh, or relu activation functions, but not "sigmoid".
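
A minimal sketch of that, assuming scikit-learn is installed (the toy data and network size are my own, purely illustrative):

```python
from sklearn.neural_network import MLPClassifier

# Toy XOR-like data, just to have something to fit
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 0]

# The keyword is 'logistic', not 'sigmoid'; the accepted activations are
# 'identity', 'logistic', 'tanh', and 'relu'.
clf = MLPClassifier(hidden_layer_sizes=(8,), activation='logistic',
                    solver='lbfgs', max_iter=5000, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.0, 1.0]]))
```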

From the comments:

I work in applied ML, and I would probably be knocked down by my peers if I said that I used a sigmoid function in my neural network instead of being more specific and saying that I really used tanh.

When in Rome, do as the Romans do. But, that goes in both directions. Machine Learners, when addressing other audiences, should be specific and use "logistic function" instead of "sigmoid".

I posted this question on a ML site, purposefully to limit the scope of the audience to people in my field.

Cross Validated's scope is broader than just Machine Learning:

Cross Validated is a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization.

Igor F.
-15

It should be clear that the Wikipedia page in question has some terminology issues.

Wikipedia's statement

A common example of a sigmoid function is the logistic function

and assertions that these functions are examples of sigmoid functions

[Screenshot from the Wikipedia article listing example sigmoid functions]

are confusing at best.

The logistic function is not a type of sigmoid function. The sigmoid function is the logistic function. Likewise, the tanh function is not a type of sigmoid function.

Stanford's Andrew Ng states the terminology concisely in this video on neural network activation functions. This is the correct terminology to use if you are working in this field.

https://www.youtube.com/watch?v=P7_jFxTtJEo

Sycorax
stackoverflowuser2010
  • 10
    I don't think you can make such a strong statement. Ng may use "sigmoid" and "logistic" as synonyms (http://www.jojo-m.cn/2021/01/07/machine%20learning-Andrew%20Ng-Stanford/), but not all experts in the field do. In sklearn (https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html), the activation function can be 'tanh' or 'logistic', but not 'sigmoid'. The logistic function and tanh are closely related: $\tanh x = 2\operatorname{logistic}(2x) - 1$. Whoever fires you for using a technically correct term is not worth working for. – Igor F. Sep 20 '21 at 06:53
  • 9
    Besides, your question was about Wikipedia. Wikipedia is **not** an encyclopedia of Machine Learning. – Igor F. Sep 20 '21 at 18:40