Python – Keras one-hot encoding before an LSTM

Keras one-hot encoding before an LSTM… here is a solution to the problem.

Keras one-hot encoding before an LSTM

Let’s say my training data consists of padded sequences with padding length = 40 and a dictionary of 80 tokens, for example example = [0, 0, 0, 3, 4, 9, 22, ...], and I want to feed it into an LSTM layer. What I want is to apply a one-hot encoding to each sequence, so that example_after_one_hot.shape = (40, 80). Is there a Keras layer capable of doing this? I’ve tried Embedding, but it doesn’t seem to produce a one-hot encoding.

Edit: Another approach would be to use an Embedding layer. Given that the dictionary only has 80 different keys, how should I set the output dimension of the Embedding layer?
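
For concreteness, the transformation being asked for looks something like this (a NumPy sketch; the token ids are made up for illustration):

import numpy

example = numpy.zeros(40, dtype='int64')                     # padded sequence of length 40
example[3:7] = [3, 4, 9, 22]                                 # illustrative token ids < 80
example_after_one_hot = numpy.eye(80, dtype='float32')[example]
print(example_after_one_hot.shape)                           # (40, 80)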

Solution

I think what you’re looking for is a preprocessing step that isn’t strictly part of your network.

Keras has one-hot text preprocessing utilities that might help; take a look at Keras text preprocessing. Alternatively, you can do something like:

import numpy

# One row per sentence, one timestep per word, one column per dictionary entry.
X = numpy.zeros(shape=(len(sentences), 40, 80), dtype='float32')
for i, sent in enumerate(sentences):
    for j, word in enumerate(sent):
        X[i, j, word] = 1.0   # set the column for this word's integer id to 1

This gives you a one-hot encoding of a two-dimensional array of “sentences”, where each word in a sentence is an integer less than 80. Of course, the data doesn’t have to be sentences; it can be any integer-coded sequences.
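
If the padded sequences are already stored as a single integer array, the same tensor can be built in one call with Keras’ own utility (a sketch; sentences is assumed to be an integer array of shape (num_sentences, 40), and the exact import path can differ between Keras versions):

import numpy
from keras.utils import to_categorical

sentences = numpy.asarray(sentences)               # shape (num_sentences, 40), ids < 80
X = to_categorical(sentences, num_classes=80)      # shape (num_sentences, 40, 80)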

Note that an Embedding layer is used to learn a distributed representation of the data, not to convert it into a one-hot format.
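
If you do decide to go the Embedding route from the edit, the output dimension is a hyperparameter you choose rather than something fixed by the dictionary size of 80. A minimal sketch, with layer sizes and the output head chosen arbitrarily for illustration:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential([
    # input_dim = dictionary size, output_dim = chosen embedding size (hyperparameter).
    # mask_zero=True could also be set so the LSTM ignores the 0 padding,
    # in which case index 0 must be reserved for padding only.
    Embedding(input_dim=80, output_dim=32, input_length=40),
    LSTM(64),
    Dense(1, activation='sigmoid'),                # example output head
])
model.compile(optimizer='adam', loss='binary_crossentropy')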
