I am aware of this question and this one, as well as this issue on GitHub. Unless I am missing something, though, all of these fail to explain how the example in the keras docs makes sense:
from keras.models import Sequential
from keras.layers import Embedding
import numpy as np

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))  # vocab of 1000 ids, 64-dim embeddings
input_array = np.random.randint(1000, size=(32, 10))  # batch of 32 sequences, 10 token ids each
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
assert output_array.shape == (32, 10, 64)  # one 64-dim vector per token
Specifically, what is the target against which the mse is computed? In supervised variants, the targets are the class labels of the inputs, and the error is backpropagated from the classification layer down into the embedding weights. In unsupervised variants, the target is the lexical context through which the model captures distributional information (e.g. the surrounding words, or the "middle" word, as in the word2vec embeddings).
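To be concrete about the supervised case I mean, here is a minimal sketch of my own (not from the docs; the head and loss are arbitrary choices for illustration), where the Embedding layer feeds a classification head and the gradients from the label error update the embedding weights:

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense
import numpy as np

clf = Sequential()
clf.add(Embedding(1000, 64, input_length=10))  # embeddings start random
clf.add(Flatten())                             # (10, 64) -> (640,)
clf.add(Dense(1, activation='sigmoid'))        # binary classification head
clf.compile('rmsprop', 'binary_crossentropy')

X = np.random.randint(1000, size=(32, 10))
y = np.random.randint(2, size=(32, 1))         # class labels are the targets
clf.fit(X, y, epochs=1)                        # error backpropagates into the Embedding

Here the loss clearly has something to be computed against. The docs example has no such head, which is exactly what confuses me.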
What is the respective pipeline in keras' Embedding layer? Is this information simply omitted from the above example?
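To make the confusion concrete: as far as I can tell, actually training the docs model as compiled would demand explicit targets matching the (32, 10, 64) output shape, which the example never provides. The dummy targets below are mine, purely to illustrate what the mse would be computed against:

# continuing from the docs snippet above
dummy_targets = np.random.random((32, 10, 64))  # same shape as the layer's output
model.fit(input_array, dummy_targets, epochs=1)  # mse is taken against whatever y is passed

Since the example only ever calls predict, which needs no target at all, the compiled loss seems to play no role in it.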