I have been trying to build a language model that predicts the next word, under the assumption that there can be multiple "correct" answers.
Input: dictionary indices for the preceding words, plus document topic data used to set the initial states
Output: a vector as long as the vocab size (training targets are one-hot vectors), from which I expect the probability of each possible next word
I thought it would be nice to be able to show multiple suggestions, e.g. the next word will be word1 with probability 0.5, word2 with probability 0.3, and so on.
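For concreteness, this is roughly how I imagine turning a probability vector into suggestions (just a sketch; probs stands for a hypothetical length-vocab_size NumPy array of next-word probabilities and index_to_word for a hypothetical index-to-string lookup):

import numpy as np

top_k = np.argsort(probs)[::-1][:3]        # indices of the 3 most likely next words
for idx in top_k:
    print(index_to_word[idx], probs[idx])  # e.g. word1 0.5, word2 0.3, ...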
from tensorflow.keras.layers import Input, Embedding, Concatenate, Dense, LSTM, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.initializers import Constant
vocab_size = 3527
categ_count = 21
W_SIZE = 5
network_size = 4096
# Network
word_input = Input(shape=(W_SIZE,), name="Word_Input")
categ_input = Input(shape=(categ_count,), name="Category_Input")
# embedding_matrix is a pre-trained (vocab_size, 300) embedding matrix defined elsewhere
word_embed = Embedding(vocab_size, 300, input_length=W_SIZE, embeddings_initializer=Constant(embedding_matrix), trainable=True, name="Embedding_Layer")(word_input)
dense_h = Dense(network_size, activation="relu", name="Initial_h")(categ_input)
dense_c = Dense(network_size, activation="relu", name="Initial_c")(categ_input)
lstm1 = LSTM(network_size, dropout=0.3, return_sequences=True, name="LSTM_1")(word_embed, initial_state=[dense_h, dense_c])
lstm2 = LSTM(network_size, dropout=0.3, name="LSTM_2")(lstm1)
output = Dense(vocab_size, activation="softmax", name="Output")(lstm2)
model = Model([word_input, categ_input], output)
model.summary()
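For reference, this is roughly how I compile and train it (a sketch; X_words, X_categ and y_onehot are placeholders for my actual arrays):

# X_words:  (num_samples, W_SIZE) integer word indices
# X_categ:  (num_samples, categ_count) topic vectors
# y_onehot: (num_samples, vocab_size) one-hot encoding of the next word
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit([X_words, X_categ], y_onehot, batch_size=128, epochs=10)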
However, I discovered that traditional neural networks are not good at producing probability distributions, since the loss pushes them toward confident predictions and punishes ambiguous predictions harshly. I have tried looking at some modules in TensorFlow Probability, but I cannot figure out how they could be used in my model.
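The closest I got was something along these lines, wrapping the output in a distribution (only a sketch of what I think the tfp.layers.DistributionLambda / Categorical pattern would look like on top of the layers above; I am not sure it is the right way, and the targets would then be integer word indices rather than one-hot vectors):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Raw logits instead of softmax, fed into a Categorical distribution over the vocabulary
logits = Dense(vocab_size, name="Logits")(lstm2)
dist_output = tfp.layers.DistributionLambda(lambda t: tfd.Categorical(logits=t), name="Output_Dist")(logits)
prob_model = Model([word_input, categ_input], dist_output)

def nll(y_true, rv_y):
    # y_true: integer index of the true next word, shape (batch,) or (batch, 1)
    y_idx = tf.cast(tf.reshape(y_true, [-1]), tf.int32)
    return -rv_y.log_prob(y_idx)

prob_model.compile(optimizer="adam", loss=nll)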
Is there a way to add to or edit my current code so that I can get the probability of each word as the output?