Use Word2vec with Tensorflow on Windows
In this TensorFlow tutorial file, the following line (line 45) loads the word2vec “extension”:
word2vec = tf.load_op_library(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'word2vec_ops.so'))
I’m using Windows 10, as noted in this SO question, but .so files are used on Linux.
What is the equivalent extension loaded on Windows?
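For context, shared-library suffixes differ by platform: .so on Linux, .dll on Windows, .dylib on macOS. A minimal sketch of picking the suffix at runtime (the filename word2vec_ops comes from the tutorial; whether a prebuilt Windows .dll of it exists is an assumption, not something the tutorial provides):

```python
import platform

# Shared-library suffix by OS. (A Windows build of word2vec_ops is an
# assumption here; the tutorial only documents building the .so on Linux.)
suffixes = {'Linux': '.so', 'Windows': '.dll', 'Darwin': '.dylib'}
suffix = suffixes.get(platform.system(), '.so')

lib_name = 'word2vec_ops' + suffix
# The tutorial's load call would then become, e.g.:
#   word2vec = tf.load_op_library(os.path.join(module_dir, lib_name))
print(lib_name)
```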
Also, I don’t understand why so much other functionality is bundled with the TensorFlow installation while Word2Vec has to be built locally. The documentation, Installing TensorFlow on Windows, makes no mention that these extensions must be built.
Is this an old practice that has since changed, so that everything now comes with the installation? If so, how does that change apply to the word2vec module in the example?
Solution
Yes, it has changed! TensorFlow now includes a helper function, tf.nn.embedding_lookup, which makes embedding data very easy. You can use it like this:
import math
import tensorflow as tf

# vocabulary_size, embedding_size, batch_size, num_sampled, generate_batch
# and session are defined elsewhere in the tutorial.
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

# Placeholders for inputs.
train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

# Look up the embeddings for the batch of input word ids.
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# Compute the NCE loss, using a sample of the negative labels each time.
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=train_labels,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))

# We use the SGD optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0).minimize(loss)

for inputs, labels in generate_batch(...):
    feed_dict = {train_inputs: inputs, train_labels: labels}
    _, cur_loss = session.run([optimizer, loss], feed_dict=feed_dict)
The complete code is here.
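Conceptually, tf.nn.embedding_lookup is just a row-gather on the embedding matrix: each input id selects the corresponding row. A NumPy sketch of that operation (illustrative only, not TensorFlow’s actual implementation, and with toy sizes in place of the tutorial’s parameters):

```python
import numpy as np

# Toy embedding matrix: one row per vocabulary word.
vocabulary_size, embedding_size = 5, 3
embeddings = np.arange(vocabulary_size * embedding_size,
                       dtype=np.float32).reshape(vocabulary_size,
                                                 embedding_size)

# A small batch of input word ids.
train_inputs = np.array([3, 0, 3])

# embedding_lookup(embeddings, ids) gathers the rows indexed by ids,
# producing a [batch_size, embedding_size] array.
embed = embeddings[train_inputs]

print(embed.shape)  # (3, 3)
```

Note that repeated ids simply select the same row more than once, which is why lookup is cheap compared with multiplying a one-hot matrix.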
In general, I’d be hesitant to rely too heavily on the tensorflow/models repository. Some parts of it are outdated. The main tensorflow/tensorflow repository is better maintained.