Python – Use Word2vec with TensorFlow on Windows

Use Word2vec with TensorFlow on Windows

In this tutorial file, TensorFlow loads the word2vec “extension” with the following line (line 45):

word2vec = tf.load_op_library(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'word2vec_ops.so'))

I’m using Windows 10, and as noted in this SO question, .so files are used on Linux.

What is the equivalent extension to load on Windows?

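For context, Windows shared libraries use the .dll extension rather than .so. tf.load_op_library can load such a library too, but only if the op has actually been compiled for Windows, which the tutorial does not do for you. A hypothetical, platform-aware version of the loading line might look like the sketch below; building word2vec_ops.dll itself is the missing piece.

import os
import platform

import tensorflow as tf

# Map the current OS to its shared-library extension. This assumes a
# word2vec_ops binary has actually been built for that OS; the tutorial
# only covers building it on Linux/macOS.
ext = {'Linux': '.so', 'Darwin': '.dylib', 'Windows': '.dll'}[platform.system()]
lib_path = os.path.join(
    os.path.dirname(os.path.realpath(__file__)), 'word2vec_ops' + ext)
word2vec = tf.load_op_library(lib_path)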
Also, I don’t understand why so much else ships with the TensorFlow installation while word2vec has to be built locally. The documentation, Installing TensorFlow on Windows, makes no mention that these extensions must be built.

Is this an old practice that has since changed, so that everything now comes with the installation? If so, how does that change apply to the word2vec module in the example?

Solution

Yes, it has changed! TensorFlow now includes the helper function tf.nn.embedding_lookup, which makes it very easy to look up embeddings for a batch of word ids, with no custom op (and no .so file) required.

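Before the full example, here is a minimal sketch of what tf.nn.embedding_lookup actually does: it gathers rows of an embedding matrix by index (the values below are made up for illustration).

import tensorflow as tf

# A toy 4-word vocabulary with 3-dimensional embeddings (made-up values).
embeddings = tf.constant([[0.0, 0.1, 0.2],
                          [1.0, 1.1, 1.2],
                          [2.0, 2.1, 2.2],
                          [3.0, 3.1, 3.2]])
ids = tf.constant([3, 0, 3])

# Gathers rows 3, 0 and 3 of `embeddings`, giving a [3, 3] tensor.
looked_up = tf.nn.embedding_lookup(embeddings, ids)

with tf.Session() as sess:
    print(sess.run(looked_up))
    # [[3.  3.1 3.2]
    #  [0.  0.1 0.2]
    #  [3.  3.1 3.2]]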
In the full skip-gram model from the tutorial, you can use it like this:

import math

import tensorflow as tf

# Example hyperparameters; adjust for your data.
vocabulary_size = 50000
embedding_size = 128
batch_size = 128
num_sampled = 64  # number of negative examples sampled in the NCE loss

embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))

nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

# Placeholders for inputs.
train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

# Look up the embeddings for the current batch of input word ids.
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# Compute the NCE loss, using a sample of the negative labels each time.
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=train_labels,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))

# We use the SGD optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0).minimize(loss)

# `generate_batch` and `session` come from the tutorial code linked below.
for inputs, labels in generate_batch(...):
    feed_dict = {train_inputs: inputs, train_labels: labels}
    _, cur_loss = session.run([optimizer, loss], feed_dict=feed_dict)

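The loop above consumes (inputs, labels) pairs of word ids: each input is a target word and each label is a context word drawn from its window. The tutorial’s actual generate_batch lives in the code linked below; a deliberately simplified, hypothetical sketch of such a generator could look like this:

import random

import numpy as np

def generate_batch_sketch(data, batch_size, num_batches, skip_window=1):
    """Hypothetical, simplified skip-gram batch generator.

    `data` is a list of integer word ids. For each training example we
    pick a target position and pair the target word with one randomly
    chosen context word within `skip_window` positions of it. The
    tutorial's real generate_batch is more careful, e.g. it walks the
    corpus in order and reuses each target word several times.
    """
    for _ in range(num_batches):
        inputs = np.zeros(batch_size, dtype=np.int32)
        labels = np.zeros((batch_size, 1), dtype=np.int32)
        for i in range(batch_size):
            # Leave room for a full window on both sides of the target.
            t = random.randint(skip_window, len(data) - skip_window - 1)
            offsets = [o for o in range(-skip_window, skip_window + 1) if o != 0]
            inputs[i] = data[t]
            labels[i, 0] = data[t + random.choice(offsets)]
        yield inputs, labels
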
The complete code is here.

In general, I’d be hesitant to rely too heavily on the tensorflow/models repository. Some parts of it are outdated. The main tensorflow/tensorflow repository is better maintained.
