Python – TensorFlow models are not loading correctly

TensorFlow models are not loading correctly… here is a solution to the problem.

TensorFlow models are not loading correctly

I’m currently trying to train a word2vec model for my company.
To do this, I used the code from https://github.com/tensorflow/models, specifically https://github.com/tensorflow/models/blob/master/tutorials/embedding/word2vec.py.

I downloaded a German Wikipedia dump and extracted text information from it. The task is to train a model using this data.

I work on a virtual machine running Ubuntu 16.04 with access to a Tesla M60 GPU. Over the weekend I trained the model and kept the checkpoints in a separate folder. By the end of the weekend, the model answered 36% of the evaluation questions I gave it (a German analogue of the example "questions-words.txt"). After training, I want to load the model and run the evaluation task again.
To do this, I changed the following lines of the code (apart from adjusting the paths):

    with tf.Graph().as_default(), tf.Session() as session:
        saver = tf.train.import_meta_graph(opts.save_path + "/model.ckpt-288720426.meta")
        saver.restore(session, tf.train.latest_checkpoint('./results'))
        print("Model restored.")
        with tf.device("/cpu:0"):
            model = Word2Vec(opts, session)
            model.read_analogies()  # Read analogy questions
        for _ in xrange(opts.epochs_to_train):
            # model.train()  # Process one epoch
            model.eval()  # Eval analogies.

I added two lines for loading the model (saver = …) and commented out the training line. Inspecting the metadata and the latest checkpoint file, as well as TensorBoard, shows the trained model, but when I run the code the evaluation result is only 0.1% correct answers, which looks to me as if the evaluation is running on a freshly initialized, untrained model. I expected the result to be 36%.

Can anyone point out the mistake in my code, or in my reasoning?

Solution

You are probably calling tf.global_variables_initializer().run() in build_graph after the model has been restored. So you basically load the weights and then overwrite them with the initialization values, and your network starts from scratch.
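A minimal sketch of one possible fix: build the graph first and restore afterwards, so the checkpoint values overwrite the freshly initialized variables rather than the other way around. This assumes the Word2Vec class and opts object from the tutorial's word2vec.py, that the class exposes its tf.train.Saver as model.saver (as the tutorial code does), and that opts.save_path points at the checkpoint directory.

    import tensorflow as tf

    # Sketch only: Word2Vec and opts come from the tutorial's word2vec.py.
    with tf.Graph().as_default(), tf.Session() as session:
        with tf.device("/cpu:0"):
            # The constructor builds the graph and runs the variable
            # initializer internally.
            model = Word2Vec(opts, session)
            model.read_analogies()  # Read analogy questions

        # Restore AFTER the graph has been built and initialized, so the
        # checkpoint values replace the freshly initialized variables.
        ckpt = tf.train.latest_checkpoint(opts.save_path)
        model.saver.restore(session, ckpt)
        print("Model restored from", ckpt)

        model.eval()  # Evaluate analogies with the restored weights.

Alternatively, you could keep your original ordering but skip the tf.global_variables_initializer().run() call inside build_graph whenever a checkpoint is being restored.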

I implemented checkpoint restoring via command-line options for a small word2vec project on Latin, and you can see the code here:
https://github.com/CarstenIsert/LatinLearner/blob/master/word2vec.py
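As an illustration only (not the linked project's actual code), here is a minimal TF 1.x sketch of gating restoration behind a command-line flag; the flag name restore_path and the helper maybe_restore are hypothetical:

    import tensorflow as tf

    flags = tf.app.flags
    # Hypothetical flag: directory containing a checkpoint to restore from.
    flags.DEFINE_string("restore_path", "",
                        "Checkpoint directory to restore from; empty trains from scratch.")
    FLAGS = flags.FLAGS

    def maybe_restore(session, saver):
        """Restore the latest checkpoint from --restore_path, if one exists."""
        if FLAGS.restore_path:
            ckpt = tf.train.latest_checkpoint(FLAGS.restore_path)
            if ckpt:
                saver.restore(session, ckpt)
                print("Restored from", ckpt)
                return True
        return False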
