Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can then be used to map new, unseen examples to outputs.
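As a minimal illustration of the description above, the sketch below "infers a function" from labeled (input, output) pairs with a 1-nearest-neighbor rule. The data, the labels, and the name `fit_nearest_neighbor` are made up for illustration; nearest neighbor is just one simple choice of learning algorithm.

```python
def fit_nearest_neighbor(examples):
    """Return an inferred function mapping an input vector to a label.

    `examples` is a list of (input_vector, label) pairs (the training data).
    """
    def predict(x):
        def squared_distance(pair):
            inputs, _ = pair
            return sum((a - b) ** 2 for a, b in zip(inputs, x))
        # The label of the closest training input is the prediction.
        _, label = min(examples, key=squared_distance)
        return label
    return predict

# Hypothetical toy training set: (input vector, desired output).
training = [
    ([0.0, 0.0], "low"),
    ([0.1, 0.2], "low"),
    ([1.0, 1.0], "high"),
    ([0.9, 0.8], "high"),
]

predict = fit_nearest_neighbor(training)
print(predict([0.2, 0.1]), predict([0.8, 0.9]))
```

The returned `predict` closure plays the role of the inferred function: it was built only from the training pairs and is then applied to inputs it has not seen.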
Questions tagged [supervised-learning]
596 questions
59 votes, 5 answers
Apply word embeddings to entire document, to get a feature vector
How do I use a word embedding to map a document to a feature vector, suitable for use with supervised learning?
A word embedding maps each word $w$ to a vector $v \in \mathbb{R}^d$, where $d$ is some not-too-large number (e.g., 500). Popular word…

D.W.
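One common baseline for the embedding question above (not necessarily what the answers recommend) is mean pooling: average the vectors of the words in the document. A minimal sketch, assuming a toy embedding table with made-up words and 3-dimensional vectors; `document_vector` is a hypothetical helper name.

```python
# Toy embedding table; words and vectors are invented for illustration.
EMBEDDINGS = {
    "good": [1.0, 0.0, 2.0],
    "movie": [0.0, 1.0, 0.0],
    "bad": [-1.0, 0.0, -2.0],
}

def document_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens; unknown tokens are skipped."""
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue  # out-of-vocabulary word
        total = [t + v for t, v in zip(total, vec)]
        count += 1
    if count == 0:
        return total  # all-zero vector when no token is known
    return [t / count for t in total]

doc_vec = document_vector("good movie unknownword".split(), EMBEDDINGS, dim=3)
print(doc_vec)
```

The result has the same dimension $d$ regardless of document length, so it can be fed to any standard supervised learner. Averaging discards word order, which may or may not matter for a given task.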
53 votes, 4 answers
Class imbalance in Supervised Machine Learning
This is a general question, not specific to any method or data set. How do we deal with class imbalance in supervised machine learning when about 90% of the labels in the dataset are 0 and about 10% are 1? How do we optimally…

NG_21
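One of several common responses to the 90%/10% situation described above is to reweight the classes inversely to their frequency, so that errors on the rare class cost more during training. A minimal sketch, assuming the heuristic weight = n_samples / (n_classes * class_count); `balanced_class_weights` is a hypothetical helper name, and resampling or threshold tuning are alternatives this sketch does not cover.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Synthetic labels matching the 90% / 10% split in the question.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With this split the minority class gets a weight of 5.0 and the majority class a weight below 1, so both classes contribute equally in total to a weighted loss.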
49 votes, 1 answer
Difference between GradientDescentOptimizer and AdamOptimizer (TensorFlow)?
I've written a simple MLP in TensorFlow which models an XOR gate.
So for:
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
it should produce the following:
output_data = [[0.], [1.], [1.], [0.]]
The network has an input layer, a hidden…

daniel451
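To make the optimizer comparison in the question above concrete, here is a sketch of the two update rules on a toy one-dimensional objective $f(x) = x^2$, written in plain Python rather than TensorFlow. The objective, starting point, and learning rate are assumptions chosen for illustration; the Adam step follows the standard published update with its usual default hyperparameters.

```python
import math

def grad(x):
    """Gradient of the toy objective f(x) = x**2."""
    return 2.0 * x

def sgd_step(x, lr):
    """Plain gradient descent: step size scales with the gradient."""
    return x - lr * grad(x)

def adam_step(x, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are running moment estimates, t >= 1."""
    g = grad(x)
    m = beta1 * m + (1 - beta1) * g          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v

x0, lr = 10.0, 0.1
sgd_move = abs(sgd_step(x0, lr) - x0)
x1, m1, v1 = adam_step(x0, 0.0, 0.0, t=1, lr=lr)
adam_move = abs(x1 - x0)
print(sgd_move, adam_move)
```

On the first step, gradient descent moves by `lr * |gradient|`, while Adam moves by roughly `lr` regardless of the gradient's magnitude, because the update is normalized by the square root of the second-moment estimate. That per-parameter normalization is one key difference between the two optimizers.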
46 votes, 1 answer
How is softmax_cross_entropy_with_logits different from softmax_cross_entropy_with_logits_v2?
Specifically, I suppose I wonder about this statement:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
Which is shown when I use tf.nn.softmax_cross_entropy_with_logits. In the…

Christian Eriksson
38 votes, 4 answers
Is there any supervised-learning problem on which (deep) neural networks obviously couldn't outperform other methods?
I have seen people put a lot of effort into SVMs and kernels, and they look pretty interesting as a starting point in machine learning. But if we expect that we can almost always find a better-performing solution using (deep) neural networks, what is…

Robin
32 votes, 5 answers
How can you account for COVID-19 in your models?
How are you dealing with the coronavirus "event" in your machine learning models?
Let's say you predict the number of sales each month. The virus affected your results last year, and it will keep affecting them for at least a couple more months. So your…

dsbr__0
30 votes, 5 answers
Distinguishing between two groups in statistics and machine learning: hypothesis test vs. classification vs. clustering
Assume I have two data groups, labeled A and B (each containing e.g. 200 samples and 1 feature), and I want to know if they are different. I could:
a) perform a statistical test (e.g. t-test) to see if they are statistically different.
b) use…

MaxG
30 votes, 2 answers
Supervised learning, unsupervised learning and reinforcement learning: Workflow basics
Supervised learning
1) A human builds a classifier based on input and output data
2) That classifier is trained with a training set of data
3) That classifier is tested with a test set of data
4) Deployment if the output is satisfactory
To be used…

Karl Morrison
29 votes, 3 answers
Unsupervised, supervised and semi-supervised learning
In the context of machine learning, what is the difference between unsupervised learning, supervised learning, and semi-supervised learning?
And what are some of the main algorithmic approaches to look at?

Ami
23 votes, 2 answers
What is the manifold assumption in semi-supervised learning?
I am trying to figure out what the manifold assumption means in semi-supervised learning. Can anyone explain in a simple way? I cannot get the intuition behind it.
It says that your data lie on a low-dimensional manifold embedded in a…

user34790
21 votes, 3 answers
How to predict outcome with only positive cases as training?
For the sake of simplicity, let's say I'm working on the classic example of spam/not-spam emails.
I have a set of 20000 emails. Of these, I know that 2000 are spam but I don't have any example of not-spam emails. I'd like to predict whether the…

enricoferrero
19 votes, 4 answers
Why does regularization wreck orthogonality of predictions and residuals in linear regression?
Following up on this question...
In ordinary least squares, the predictions and residuals are orthogonal. $$\sum_{i=1}^n\hat{y}_i (y_i - \hat{y}_i) = 0$$
If we estimate the regression coefficients using some other method but the same model, such as…

Dave
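The orthogonality identity quoted in the question above can be checked numerically. The sketch below uses the simplest possible setting, a single feature with no intercept, where the ridge estimate is $\hat\beta = \sum x_i y_i / (\sum x_i^2 + \lambda)$ and $\lambda = 0$ recovers ordinary least squares. The data and the function name are made up for illustration.

```python
def fit_residual_inner_product(xs, ys, ridge_penalty):
    """Return sum_i yhat_i * (y_i - yhat_i) for one-feature, no-intercept ridge.

    ridge_penalty = 0 gives ordinary least squares.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    beta = sxy / (sxx + ridge_penalty)
    preds = [beta * x for x in xs]
    return sum(p * (y - p) for p, y in zip(preds, ys))

# Hypothetical toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.8]

ols_value = fit_residual_inner_product(xs, ys, ridge_penalty=0.0)
ridge_value = fit_residual_inner_product(xs, ys, ridge_penalty=5.0)
print(ols_value, ridge_value)
```

In this setting the inner product works out to $\hat\beta \cdot \lambda \sum x_i y_i / (\sum x_i^2 + \lambda)$, which is zero (up to floating-point error) only when $\lambda = 0$. Shrinking the coefficient moves the fitted vector away from the orthogonal projection of $y$, which is why the residuals stop being orthogonal to the predictions.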
19 votes, 3 answers
(Why) Is absolute loss not a proper scoring rule?
Brier score is a proper scoring rule and is, at least in the binary classification case, square loss.
$$Brier(y,\hat{y}) = \frac{1}{N} \sum_{i=1}^N\big\vert y_i -\hat{y}_i\big\vert^2$$
Apparently this can be adjusted for when there are three or more…

Dave
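The question above can be explored numerically. A scoring rule is proper if the expected score is optimized by reporting the true probability. The sketch below assumes a binary outcome with true probability $p = 0.7$ (a made-up value) and searches a grid of forecasts $q$ for the one minimizing the expected squared loss and the expected absolute loss.

```python
def expected_loss(p, q, loss):
    """Expected loss of forecast q when the outcome is 1 with probability p."""
    return p * loss(1, q) + (1 - p) * loss(0, q)

def squared(y, q):
    return (y - q) ** 2

def absolute(y, q):
    return abs(y - q)

def best_forecast(p, loss, steps=100):
    """Grid-search the forecast q in [0, 1] with the lowest expected loss."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda q: expected_loss(p, q, loss))

p_true = 0.7  # assumed true probability, for illustration only
best_q_squared = best_forecast(p_true, squared)
best_q_absolute = best_forecast(p_true, absolute)
print(best_q_squared, best_q_absolute)
```

Under squared loss the best forecast is the true probability itself. Under absolute loss the expected loss is $p + q(1 - 2p)$, which is linear in $q$, so the minimizer jumps to the extreme forecast $q = 1$ whenever $p > 0.5$. The forecaster is rewarded for overstating confidence, which is the sense in which absolute loss is not proper.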
18 votes, 4 answers
What *is* an Artificial Neural Network?
As we delve into the neural network literature, we come across other methods with neuromorphic topologies ("Neural-Network"-like architectures). And I'm not talking about the Universal Approximation Theorem. Examples are given below.
Then, it makes…

Firebug
17 votes, 2 answers
What is the support vector machine?
What IS the support vector machine? Can someone clarify my confusion?
Possible answers:
The SVM is the problem: given data $(x_n, y_n), n = 1, \ldots, N$
$$\min_{w, b}\frac{1}{2}||w||^2$$
$$\text{ subject to: } y_n(w \cdot x_n + b) \geq 1,…

Fraïssé