So I'm trying to make a neural network that learns a pattern and outputs another number from the sequence. For example: my first test was with factorials. I had an array of numbers as my input, with labels that were the factorials of those numbers. I was working from a tutorial's code, but I realised the tutorial was for classification with binary output. What kind of neural network supports non-binary classification?
1 Answer
Neural networks can learn to solve $c$-class classification problems, where $c$ is the number of classes (categories) to be discriminated.
The general goal is to assign each pattern, or feature vector, to one of $c$ classes; the true class membership of each pattern is considered uncertain. Feed-forward neural networks learn to perform statistical classification when the feature distributions of the different classes overlap. If the number of classes is three, $c=3$, you train with indicator vectors (Target = [1 0 0]', Target = [0 1 0]' and Target = [0 0 1]', where "'" indicates vector transpose) for patterns belonging to each of the three categories. The neural network learns the probabilities of the three classes, $P(\omega_i \mid {\boldsymbol x})$, $i=1,\ldots,c$.
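As a minimal sketch of the setup above (not from the original answer; the Gaussian toy data and all parameter values are made up for illustration), here is a single-softmax-layer feed-forward classifier trained by gradient descent on one-hot indicator targets for $c=3$ classes:

```python
import numpy as np

# Toy data: 2-D feature vectors from c = 3 classes (hypothetical Gaussian blobs).
rng = np.random.default_rng(0)
c = 3
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(50, 2))
               for m in ([0, 0], [3, 0], [0, 3])])
labels = np.repeat(np.arange(c), 50)

# Indicator (one-hot) targets: [1 0 0]', [0 1 0]', [0 0 1]'.
T = np.eye(c)[labels]

# Minimal feed-forward classifier (one softmax layer) trained by
# gradient descent on the cross-entropy loss.
W = np.zeros((2, c))
b = np.zeros(c)
for _ in range(500):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)             # estimated P(w_i | x)
    grad = (P - T) / len(X)                       # dLoss/dlogits for cross-entropy
    W -= 1.0 * (X.T @ grad)
    b -= 1.0 * grad.sum(axis=0)

# Each output row is a probability vector over the three classes.
print((P.argmax(axis=1) == labels).mean())        # training accuracy
```

Each output row of `P` sums to one, so the network's outputs can be read directly as estimated class probabilities rather than a single binary decision.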
The prior class probabilities, ${\hat P}(\omega_i)$, $i=1,\ldots,c$, are estimated from the training set as the fraction of training patterns belonging to each category.
In the notation of Duda & Hart [Duda R.O. & Hart P.E. (1973) Pattern Classification and Scene Analysis, Wiley], denote the class-conditional feature distributions by $P({\boldsymbol x}\,\mid\,\omega_i)$, where ${\boldsymbol x}$ is the feature vector provided as input to the feed-forward neural network; for example, ${\boldsymbol x}=(0.2,10.2,0,2)$ for a classification task with 4 real-valued feature variables. The index $i$ ranges over the possible classes $\omega_1,\omega_2,\ldots,\omega_c$, i.e. $i \in \{1,\ldots,c\}$.
The feed-forward neural network classifier learns the posterior probabilities, ${\hat P}(\omega_i\,\mid\,{\boldsymbol x})$, when trained by gradient descent. This is the major result proved by Richard & Lippmann in 1991. The hat over a probability indicates that it is estimated (learned) and therefore uncertain: $$ {\hat P}(\omega_i\,\mid\,{\boldsymbol x}) = \frac{{\hat P}(\omega_i) \; {\hat P}({\boldsymbol x}\,\mid\,\omega_i)}{\sum_{j=1}^c {\hat P}(\omega_j) \; {\hat P}({\boldsymbol x}\,\mid\,\omega_j)} $$
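Bayes' rule above can be checked numerically. In this sketch the priors and the class-conditional likelihood values are made-up numbers for a single feature vector ${\boldsymbol x}$ with $c=3$ classes:

```python
import numpy as np

# Hypothetical priors P(w_i), i.e. class fractions in the training set.
prior = np.array([0.5, 0.3, 0.2])
# Hypothetical class-conditional likelihoods P(x | w_i) at one input x.
lik = np.array([0.10, 0.40, 0.05])

# Bayes' rule: prior-weighted likelihoods, normalized over the c classes.
post = prior * lik / np.sum(prior * lik)
print(post)   # posteriors P(w_i | x); they sum to 1
```

A well-trained network's output for ${\boldsymbol x}$ approximates this `post` vector, which is why its outputs can be treated as class probabilities.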
Reference:
Michael D. Richard and Richard P. Lippmann, "Neural Network Classifiers Estimate Bayesian a posteriori Probabilities," Neural Computation, Vol. 3, No. 4, pp. 461–483, 1991.

- Alright then, but how can something like a neural net handle a game of chess or, in Google's case, learn to walk? – Michael Ilie May 13 '18 at 20:54
- I argue that we should first jointly be impressed by the results of LeCun and his co-workers on visual recognition obtained by deep-learning convolutional neural networks (2015, Nature). – Match Maker EE May 13 '18 at 22:28
- Alas, this is not true anymore. Unlike old-style shallow MLPs, modern deep neural networks, with all their powerful but arcane regularization tricks (dropout, batch normalization, skip connections, increased width, scale of a dragon, tail of a toad, etc.) [are very poorly calibrated](https://arxiv.org/pdf/1706.04599.pdf). See also [here](https://stats.stackexchange.com/questions/309642/why-is-softmax-output-not-a-good-uncertainty-measure-for-deep-learning-models), and, more recently, 1/ – DeltaIV May 14 '18 at 06:46
- 2/ [here](https://arxiv.org/abs/1803.09546) for the regression case. – DeltaIV May 14 '18 at 06:48
- Scale of a dragon :lol – conjectures May 14 '18 at 08:00