Keras one-hot encoding before an LSTM
Let’s say I have a training dataset of multiple padded sequences, with padding length = 40 and a dictionary of 80 distinct tokens, for example: example = [0, 0, 0, 3, 4, 9, 22, ...]
I want to feed it into an LSTM layer. What I want is to one-hot encode each sequence first, so that example_after_one_hot.shape = (40, 80).
Is there a Keras layer capable of doing this? I’ve tried Embedding, but it doesn’t seem to produce a one-hot encoding.
Edit: Another approach would be to use an Embedding layer. Given that the dictionary has only 80 distinct keys, how should I set the output dimension of the Embedding layer?
Solution
I think you’re looking at a preprocessing task that isn’t strictly part of your network.
Keras has one-hot text preprocessing utilities that might help; take a look at Keras text preprocessing. For a plain NumPy approach, you can do something like this:
import numpy

# sentences: 2-D list of integer-encoded sequences, each padded to length 40,
# with every token index in [0, 80)
X = numpy.zeros(shape=(len(sentences), 40, 80), dtype='float32')
for i, sent in enumerate(sentences):
    for j, word in enumerate(sent):
        X[i, j, word] = 1.0
This will give you a one-hot encoding of a two-dimensional array of “sentences”, where each word in the array is an integer less than 80. Of course, the data doesn’t have to be sentences; it can be any sequence of integer indices.
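As a quick sanity check, the loop can be run end to end on a small toy batch (the data here is illustrative, padded to length 40 as in the question):

```python
import numpy

# Two toy "sentences", already padded to length 40 with index 0,
# every token index below the vocabulary size of 80.
sentences = [
    [0] * 37 + [3, 4, 9],
    [0] * 36 + [22, 5, 7, 1],
]

X = numpy.zeros(shape=(len(sentences), 40, 80), dtype='float32')
for i, sent in enumerate(sentences):
    for j, word in enumerate(sent):
        X[i, j, word] = 1.0

print(X.shape)   # (2, 40, 80)
# X[0, 37] is a length-80 vector with a single 1.0 at position 3
```

Note that the padding index 0 also gets a one-hot vector (a 1.0 in position 0), so you may want to reserve index 0 for padding and mask it downstream.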
Note that an Embedding layer is used to learn a distributed representation of the data, not to put it into one-hot format.
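To illustrate the difference: an Embedding layer is essentially a learned lookup table that maps each integer index to a dense vector, rather than a fixed one-hot vector. A minimal NumPy sketch of that lookup (the embedding dimension of 16 is just an illustrative choice; in Keras the table would be a trainable weight of an Embedding(input_dim=80, output_dim=16) layer):

```python
import numpy

vocab_size = 80   # number of distinct token indices in the dictionary
embed_dim = 16    # output dimension: a free hyperparameter you choose

rng = numpy.random.default_rng(0)
# Stand-in for the trainable weight matrix an Embedding layer would learn.
table = rng.normal(size=(vocab_size, embed_dim)).astype('float32')

sequence = numpy.array([0, 0, 0, 3, 4, 9, 22])
embedded = table[sequence]   # shape (7, 16): one dense vector per token
print(embedded.shape)
```

Since the output dimension is a hyperparameter rather than something dictated by the vocabulary size, values much smaller than 80 are common starting points; the layer then learns which tokens should end up with similar vectors.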