Are there any reference documents that give a comprehensive list of misnomers in machine learning? I would like a list, with simple explanations if need be, that I can go through easily (vs. a full encyclopedia) to make sure I know all the jargon pitfalls.

Examples:

  • logistic regression: actually not regression, but classification (in ML terminology).
  • multilayer perceptron: the model actually comprises multiple layers of logistic regression models (with continuous nonlinearities) rather than multiple perceptrons (with discontinuous nonlinearities). (see Christopher M. Bishop's Pattern Recognition and Machine Learning, page 226).
  • non-parametric model: there are actually parameters, potentially an infinite number of them. The difference between a parametric and a non-parametric model is that the former has a fixed number of parameters, while the latter grows its number of parameters with the amount of training data. Non-parametric does not mean parameter-free: the parameters are determined by the training data rather than fixed by the model (see Wikipedia). E.g., Bayesian nonparametric models are not parameter-free, but have an infinite number of parameters.
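A toy sketch of the first misnomer, in plain Python with hand-picked (not fitted) weights: logistic "regression" first produces a continuous probability via the sigmoid, and it only becomes a classifier once that probability is thresholded.

```python
import math

# Hedged toy sketch (weights chosen by hand, not fitted): logistic
# "regression" first produces a continuous probability via the sigmoid;
# it only becomes a classifier once that probability is thresholded.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    # Continuous output in (0, 1) -- the "regression" part.
    return sigmoid(w * x + b)

def predict_class(x, w, b, threshold=0.5):
    # Discrete label -- the thresholding is what makes it classification.
    return 1 if predict_proba(x, w, b) >= threshold else 0

p = predict_proba(2.0, w=1.0, b=-1.0)      # continuous, approx. 0.731
label = predict_class(2.0, w=1.0, b=-1.0)  # discrete: 1
```

Whether one calls this "regression" thus depends on whether the object of interest is the continuous probability or only the thresholded label.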

I'm aware that there is some degree of subjectivity in declaring a term a misnomer. I prefer to have higher recall and lower precision, to a reasonable extent.

Franck Dernoncourt
  • Why is logit not a regression? Some of your examples are odd. – Aksakal Aug 25 '15 at 19:01
  • @Aksakal: Andrew Ng's lectures make this note, which was puzzling to me as well. I think he means that in the ML community, the term regression refers to continuous outcomes, while classification refers to categorical outcomes. Of course, this terminology is *not* consistent with statistical terminology. – Cliff AB Aug 25 '15 at 19:05
  • @Aksakal Same as Cliff AB. What's the issue with the two other examples? – Franck Dernoncourt Aug 25 '15 at 19:06
  • @CliffAB, Ok, but logit is applied to continuous outcomes quite often, e.g. to model default rates or survival rates. – Aksakal Aug 25 '15 at 19:07
  • @FranckDernoncourt, your statements about nonparametric models are unusual, if not outright wrong. – Aksakal Aug 25 '15 at 19:08
  • @Aksakal Feel free to correct the WP page :) But more generally I agree regarding a term as a misnomer is often debatable, the goal is to make sure there is no confusion due to the terminology. – Franck Dernoncourt Aug 25 '15 at 19:08
  • @Aksakal: I agree, I think calling logistic regression only a classification tool is a very messy stance. To really say it's only for categorical outcomes means that you don't care about the difference between $\hat p = 0.25$ and $\hat p = 0.00000001$, which I wouldn't generally think is the case. – Cliff AB Aug 25 '15 at 19:11
  • Logit certainly has merit for modelling continuous outcomes, notably proportions between 0 and 1. Ironically its use as a link function (in modern terms) for binary outcomes (emerging in the 1930s/40s IIRC) long postdates its use as a model for population growth (19th century). – Nick Cox Aug 25 '15 at 19:12
  • @FranckDernoncourt, here's an example. You have a data set $x_i$; find a function $f(t)$ minimizing $\int f''(t)^2\,dt + \sum_i (f(i)-x_i)^2$. There are no parameters, none whatsoever. – Aksakal Aug 25 '15 at 19:16
  • @Aksakal: there's a lot of different definitions of non-parameteric vs. parametric so this is a tricky area. However, in your example (which is a non-parametric problem), there *certainly* will be parameters involved in the solution. Although you will not be able to describe all possible solutions of $f(t)$ with a finite number of parameters (hence it being a nonparametric problem), *conditional on a dataset*, the solution must be representable with a finite set of parameters. Otherwise, there will be no way to find the solution. – Cliff AB Aug 25 '15 at 19:20
  • A machine learning person is someone who uses ML to mean machine learning. A classical statistical person is someone who uses ML to mean maximum likelihood. (Part serious, part facetious.) – Nick Cox Aug 25 '15 at 19:21
  • @NickCox, excellent reminder! [Here](http://papers.tinbergen.nl/02119.pdf)'s the paper with a history of the logit. It appears to have started with a differential equation of all continuous things! – Aksakal Aug 25 '15 at 19:21
  • @NickCox haha nice one :-) – Franck Dernoncourt Aug 25 '15 at 19:23
  • @Aksakal Thanks for the link to the history of logit! – Franck Dernoncourt Aug 25 '15 at 19:24
  • @CliffAB, in my example there are no parameters. Of course, you can point to something and call it a parameter, but there are none, really. The function's value is determined by the data set, but you can't call the data a parameter. – Aksakal Aug 25 '15 at 19:27
  • So how would you characterize the solution then? If you can't summarize it using a finite set of parameters, then it seems you have a non-parametric problem with no practical method of finding the solution. But if you can fully characterize it with a finite number of parameters (even if that requires all the observed data points), then the problem may be a tractable non-parametric problem. – Cliff AB Aug 25 '15 at 19:49
  • @CliffAB, the solutions to these kinds of problems are obtained by variational methods, and may look like this: $f(t)=\sum_i e^{-|x_i-t|}$ (a Laplacian smoother). This type of thing is used a lot in kernel smoothing. – Aksakal Aug 25 '15 at 19:51
  • I am not so familiar with this problem, so it's great to learn. But I don't understand the stance that we can't consider $x_i$ to be a parameter. In the non-parametric problems I've worked on, $x_i$ has a very similar function, although the solution is not closed form. We have little problem referring to $x_i$ as a parameter; clearly in your problem, $f(t)$ changes if you change $x_i$. – Cliff AB Aug 25 '15 at 19:56
  • @CliffAB, a parameter is something unobservable that you want to estimate; data are something you're given and can observe. The data are an input; they come from outside the problem. – Aksakal Aug 25 '15 at 20:02
  • It seems to me that this definition of parameter is very useful for explaining the difference between data and parameters in a parametric model. It does not hold up well in comparing parametric and nonparametric models, though. By this definition, $\hat f(0)$ is a parameter, for example. – Cliff AB Aug 25 '15 at 22:05
  • See https://stats.stackexchange.com/questions/127042/why-isnt-logistic-regression-called-logistic-classification – kjetil b halvorsen Jan 28 '18 at 20:21
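A minimal sketch of the Laplacian smoother mentioned in the comments, $f(t)=\sum_i e^{-|x_i-t|}$ (data values here are made up): the fitted function carries one exponential term per data point, which is the sense in which a non-parametric model's effective parameter count grows with the training set rather than being fixed in advance.

```python
import math

# Hedged sketch of the Laplacian smoother from the comment thread,
# f(t) = sum_i exp(-|x_i - t|), with made-up data values. The fitted
# function carries one exponential term per data point, so the model's
# effective "parameter count" grows with the training set -- the sense
# in which "non-parametric" does not mean parameter-free.

def laplacian_smoother(data):
    def f(t):
        return sum(math.exp(-abs(x - t)) for x in data)
    return f

f_small = laplacian_smoother([0.0, 1.0])                  # 2 terms
f_large = laplacian_smoother([0.0, 0.5, 1.0, 1.5, 2.0])   # 5 terms
```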

0 Answers