Machine learning when there are two "answers"

Question

I have a problem that I am trying to use machine learning on. At a very high level, I am looking to do a transformation where in the training data I have x and it can map to either 1 or 10. I also have y and it can map to either 2 or 20.

When I use LSTMs or RNNs on the data, the network seems to give me the average of 1 and 10, or of 2 and 20. What I want is for the network to choose one of the values rather than average them.

Is this not a good problem for machine learning or should I use a different model like random forest?

What exactly is your data? What do the numbers mean? Why can't you map those pairs of values into single labels? — Tim, May 27 '19 at 18:29
If you want to choose, then you could just use logistic regression (i.e. a logistic output function). — seanv507, May 27 '19 at 18:59
@Tim it's for placing furniture in a room. So for example sometimes we want the couch placed on the left-hand side of the room. Other times we want it placed on the right side of the room. We never want it placed in the center of the room. I was trying to simplify the problem but I should have just given all the info. — Alexis, May 27 '19 at 22:48
@Alexis it seems to me your neural network has exactly the same problem as we did. It lacks the information that the position in between is not "desirable." Can you signal this undesirability through data or in some other way? — Nino Rode, May 28 '19 at 22:04

Tim · Accepted Answer · 2019-05-29T07:09:40.600

Given the information in your comment, that the labels are about

[...] placing furniture in a room. So for example sometimes we want the couch placed on the left-hand side of the room. Other times we want it placed on the right side of the room. We never want it placed in the center of the room.

then you can treat them as separate categories like "right side of the room", "center of the room", etc., but a much better approach would be to map those categories to two-dimensional coordinates, for example $x \in \{-1, 0, +1\}$ for left, center, right, and $y \in \{-1, 0, +1\}$ for bottom, center, top.

If you want to make predictions like "the piece of furniture can be either on the right side of the room or on the left side", then you want to be able to predict a bi-modal distribution. One approach to such problems is to treat the outcome variable (or variables) as following a mixture distribution. A mixture distribution can be multi-modal, and what your model would predict is the probability of the predicted variables being in a particular region of the variable space. In your case, you could use a mixture density network (Bishop, 1994), with a model like

$$\begin{align} \boldsymbol{h} &= \operatorname{LSTM}(\mathbf{Z}) \\ \boldsymbol{\mu} &= g_\mu(\boldsymbol{h}) \\ \boldsymbol{\Sigma} &= g_\sigma(\boldsymbol{h}) \\ \boldsymbol{\pi} &= g_\pi(\boldsymbol{h}) \\ (x, y) &\sim \sum_{i=1}^k \pi_i \,\mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) \end{align}$$

where $g_\mu$, $g_\sigma$, and $g_\pi$ are sub-networks (e.g. a dense layer followed by an activation function) mapping the latent variables $\boldsymbol{h}$ to the means $\boldsymbol{\mu}$, covariances $\boldsymbol{\Sigma}$, and mixing proportions $\boldsymbol{\pi}$ of a mixture of $k$ bivariate normal distributions, and $\mathbf{Z}$ are the features. Notice that this model lets us assume correlation between the dimensions, so if something usually appears in the "top left" corner, then the model would be able to learn such a relation. This is a simplification of the MDN-RNN network described by Ha and Schmidhuber (2018).
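To make the difference from averaging concrete, here is a minimal sketch in plain Python of the mixture likelihood that such a network would be trained on. It is a one-dimensional simplification (univariate normals instead of the bivariate ones above), using the 1-or-10 targets from the question. The values assigned to `pi`, `mu` and `sigma` are hypothetical stand-ins for what $g_\pi$, $g_\mu$ and $g_\sigma$ might output for a single input; no actual network or training loop is included.

```python
import math
import random


def softmax(logits):
    """Turn unconstrained scores into mixing proportions that sum to 1."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normal_pdf(t, mu, sigma):
    """Density of a univariate normal N(mu, sigma^2) at t."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(t, pi, mu, sigma):
    """Density of the k-component mixture at t."""
    return sum(p * normal_pdf(t, m, s) for p, m, s in zip(pi, mu, sigma))


def mixture_nll(targets, pi, mu, sigma):
    """Mean negative log-likelihood, the loss a mixture density network minimises."""
    logs = [math.log(mixture_pdf(t, pi, mu, sigma)) for t in targets]
    return -sum(logs) / len(logs)


def sample_mixture(pi, mu, sigma, rng):
    """Pick one component according to pi, then draw from that component."""
    i = rng.choices(range(len(pi)), weights=pi, k=1)[0]
    return rng.gauss(mu[i], sigma[i])


# Hypothetical head outputs for one input (assumed values, not learned ones).
pi = softmax([0.0, 0.0])                    # g_pi: softmax keeps proportions valid
mu = [1.0, 10.0]                            # g_mu: one mean per mode
sigma = [math.exp(-1.0), math.exp(-1.0)]    # g_sigma: exp keeps scales positive

targets = [1.0, 10.0, 1.0, 10.0]

# A squared-error model is pushed towards the mean of the targets.
mse_prediction = sum(targets) / len(targets)

# The mixture puts its density on the modes, not on the midpoint.
density_at_mode = mixture_pdf(1.0, pi, mu, sigma)
density_at_mean = mixture_pdf(mse_prediction, pi, mu, sigma)

# Compare with a single wide normal centred on the mean of the targets.
two_component_nll = mixture_nll(targets, pi, mu, sigma)
single_normal_nll = mixture_nll(targets, [1.0], [5.5], [4.5])

# Sampling "chooses" one of the modes instead of averaging them.
rng = random.Random(0)
draws = [sample_mixture(pi, mu, sigma, rng) for _ in range(5)]
```

Under these assumed parameters, each draw lands near 1 or near 10 and the midpoint receives almost no density, which is the behaviour the question asks for. Extending the sketch to the bivariate case in the answer would mean replacing `normal_pdf` with a bivariate normal density with a full covariance matrix.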

Bishop, C.M. (1994). Mixture Density Networks. Technical Report NCRG/4288, Aston University, Birmingham, UK.

Ha, D., and Schmidhuber, J. (2018). Recurrent world models facilitate policy evolution. In Advances in Neural Information Processing Systems (pp. 2450-2462).

Great answer! Another alternative might be treating it as a multi-label classification problem, with X and Y being both binary/ternary variables. — George, May 28 '19 at 21:35
@George yes, but in the above approach you explicitly assume the objects lie on 2D coordinates, so you assume that the position matters and that there can be spatial correlation. — Tim, May 28 '19 at 21:50
@tim Firstly, thank you so much for this amazing answer. I have been reading up on it and all the links. Does the mixture distribution learn the interdependence between different objects? So I should clarify again the data (I'm sorry!). The data has multiple objects in it. For example a couch, a bed, and a nightstand, each with a (x, y) position. Will this try and learn that the bed and the nightstand should be placed by each other? — Alexis, May 29 '19 at 22:09
@Alexis this would need to be adapted. As written, it describes the possible positions of a single object with a $k$-component mixture distribution ($k$ modes). For multiple positions you would need multiple such "units". I think that the paper by Ha and Schmidhuber describes your case. — Tim, May 30 '19 at 04:44
@Tim So in the example with multiple pieces of furniture, `k` would refer to the number of distinct types of furniture in the data set? So with couch, bed, nightstand, it would be 3? I read through the Ha and Schmidhuber paper but didn't see the k-component mixture :/ — Alexis, May 30 '19 at 17:20
@Alexis no. If you have $m$ pieces of furniture, then you would have $m$ such "cells", each consisting of $k$ components. $k$ is the number of distinct locations that are tracked. You would also have $m$ latent vectors $h$, etc. So my description is about one piece of furniture; with more of them, each of the elements gets extra dimensions. — Tim, May 30 '19 at 18:14
As for the paper, they use a very similar architecture. They don't go into the details, but that is what they mean when they say that they use a mixture density network on top of an RNN. There are more details in the references, accompanying blog posts, code, etc. — Tim, May 30 '19 at 18:18
