TensorFlow model is not loading correctly
I’m currently trying to train a word2vec model for my company.
To do this, I used the code from https://github.com/tensorflow/models, specifically https://github.com/tensorflow/models/blob/master/tutorials/embedding/word2vec.py.
I downloaded a German Wikipedia dump and extracted text information from it. The task is to train a model using this data.
I work on a virtual machine with Ubuntu 16.04 installed and access to a Tesla M60. Over the weekend I trained the model and kept the checkpoints in a separate folder. By the end of the weekend, the model was able to answer 36% of the evaluation questions I gave it (a German analogue of the example "questions-words.txt"). After training, I want to load the model and run the evaluation task again.
To do this, I changed the code in the following lines (apart from a path change):
with tf.Graph().as_default(), tf.Session() as session:
    saver = tf.train.import_meta_graph(opts.save_path + "/model.ckpt-288720426.meta")
    saver.restore(session, tf.train.latest_checkpoint('./results'))
    print("Model restored.")
    with tf.device("/cpu:0"):
        model = Word2Vec(opts, session)
        model.read_analogies()  # Read analogy questions
        for _ in xrange(opts.epochs_to_train):
            #model.train()  # Process one epoch
            model.eval()  # Eval analogies.
I added the two lines for loading the model (saver = ...) and commented out the training line. Inspecting the metadata and the latest checkpoint file, as well as TensorBoard, shows the trained model; but when I run the code, the evaluation result is 0.1% correct answers, which looks to me as if training restarted from an untrained model. I expect the result to be 36%.
Can anyone tell me the mistakes I made in the code or even in my thoughts?
Solution
You are probably calling build_graph, which runs tf.global_variables_initializer().run(), after the model is restored. So you basically load the weights and then overwrite them with initialization values, and your network starts from scratch.
I implemented checkpoint restoring via command-line options in a small project on Latin; you can see the code here:
https://github.com/CarstenIsert/LatinLearner/blob/master/word2vec.py