Python – Keras prints no output, uses excessive memory and CPU, and ignores the GPU when the TensorBoard callback is used

I ran into a strange situation in Keras that really has me worried.
I'm trying to train a CNN built on a pre-trained Inception model, with extra convolution, global average pooling, and dense layers on top. I'm using ImageDataGenerator to load the data.

The data generator works fine; I've tested it. The model also compiles without errors. But when I run fit_generator, nothing is printed, CPU usage sits at 100%, and memory fills up slowly until it overflows. Even though I have a GPU and have used it many times from TensorFlow (the backend here), Keras ignores it completely.

Thinking the batch size might be the issue, I set it to 1, but that didn't fix the problem. The image size is 299×299, which is not very large anyway.

I'll post the code below for reference, although I don't see anything wrong with it:

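from keras.applications import InceptionV3
from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from keras.layers import Activation, Conv2D, GlobalAveragePooling2D
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator

def get_datagen():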
    return ImageDataGenerator(rotation_range=30,
                        width_shift_range=0.2,
                        height_shift_range=0.2,
                        horizontal_flip=True,
                        fill_mode='nearest'
                        )

# Setup and compile the model.
model = InceptionV3(include_top=False, input_shape=(None, None, 3))

# Set the model layers to be untrainable
for layer in model.layers:
    layer.trainable = False

x = model.output
x = Conv2D(120, 5, activation='relu')(x)
x = GlobalAveragePooling2D()(x)
predictions = Activation('softmax')(x)

model_final = Model(inputs=model.inputs, outputs=predictions)

model_final.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])

# Define the dataflow.
train_gen = get_datagen()
val_test_gen = get_datagen()

train_data = train_gen.flow_from_directory(train_folder, target_size=(299, 299), batch_size=1)
val_data = val_test_gen.flow_from_directory(validation_folder, target_size=(299, 299), batch_size=1)
test_data = val_test_gen.flow_from_directory(test_folder, target_size=(299, 299), batch_size=1)

train_size = train_data.n
val_size = val_data.n
test_size = test_data.n

# Define callbacks.
model_checkpoint = ModelCheckpoint('../models/dbc1/', monitor='val_accuracy', verbose=1, save_best_only=True)
early_stopping = EarlyStopping(monitor='val_accuracy', patience=3, verbose=1, mode='max')
tensorboard = TensorBoard(log_dir='../log/dbc1', histogram_freq=1, write_grads=True)

model_final.fit_generator(train_data, steps_per_epoch=1, epochs=100, 
                          callbacks=[model_checkpoint, early_stopping, tensorboard],
                         validation_data=val_data, verbose=1)

Edit

It seems that the TensorBoard callback is the problem here. When I remove it, everything works fine. Does anyone know why this is happening?

Solution

There seems to be a problem with histogram_freq=1 under certain conditions (possibly related to keras#3358).

You can try setting histogram_freq=0 (the default, so it can simply be omitted) and filing an issue in the Keras repository. You won't get gradient histograms, but at least you can train:

model.fit(...,
          callbacks=[
              TensorBoard(log_dir='./logs/', batch_size=batch_size),
              ...
          ])
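
To make this workaround easy to switch on and off, the TensorBoard arguments can be built in one place. The following is only a minimal sketch: `tensorboard_kwargs` is a hypothetical helper, not part of Keras, and it uses just the keyword arguments that already appear in this thread (`log_dir`, `batch_size`, `histogram_freq`, `write_grads`), which are assumed to match the Keras 2.1.x `TensorBoard` callback.

```python
def tensorboard_kwargs(log_dir, batch_size, histograms=False):
    """Build keyword arguments for keras.callbacks.TensorBoard.

    Hypothetical helper for illustration only. With histograms=False
    the callback logs scalars (loss and metrics) but no histograms,
    which sidesteps the histogram_freq=1 path that seems to hang here.
    """
    kwargs = {
        'log_dir': log_dir,
        'batch_size': batch_size,
        'histogram_freq': 0,   # 0 disables histogram computation
        'write_grads': False,  # gradient histograms need histogram_freq > 0
    }
    if histograms:
        kwargs['histogram_freq'] = 1
        kwargs['write_grads'] = True
    return kwargs


# Safe settings: scalars only.
safe = tensorboard_kwargs('./logs/', batch_size=1)

# Usage, assuming Keras 2.1.x is installed:
#   from keras.callbacks import TensorBoard
#   tensorboard = TensorBoard(**safe)
```

Flipping `histograms=True` restores the original settings, which is handy for checking whether a given model or Keras version still shows the problem.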

I've noticed that not all pretrained models have this issue. If you don't strictly need Inception v3, I recommend switching to another model. So far, I've found that the following code (adapted from yours, using VGG19) works with keras==2.1.2 and tensorflow==1.4.1:

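from keras.applications import VGG19
from keras.applications.vgg19 import preprocess_input
from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from keras.layers import Activation, Conv2D, GlobalAveragePooling2D
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator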

input_shape = (224, 224, 3)
batch_size = 1

model = VGG19(include_top=False, input_shape=input_shape)
for layer in model.layers:
    layer.trainable = False

x, y = model.input, model.output
y = Conv2D(2, 5, activation='relu')(y)
y = GlobalAveragePooling2D()(y)
y = Activation('softmax')(y)

model = Model(inputs=model.inputs, outputs=y)
model.compile('adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

g = ImageDataGenerator(rotation_range=30,
                       width_shift_range=0.2,
                       height_shift_range=0.2,
                       horizontal_flip=True,
                       preprocessing_function=preprocess_input)

train_data = g.flow_from_directory(train_folder,
                                   target_size=input_shape[:2],
                                   batch_size=batch_size)
val_data = g.flow_from_directory(validation_folder,
                                 target_size=input_shape[:2],
                                 batch_size=batch_size)
test_data = g.flow_from_directory(test_folder,
                                  target_size=input_shape[:2],
                                  batch_size=batch_size)

model.fit_generator(train_data, steps_per_epoch=1, epochs=100,
                    validation_data=val_data, verbose=1,
                    callbacks=[
                        ModelCheckpoint('./ckpt.hdf5',
                                        monitor='val_accuracy',
                                        verbose=1,
                                        save_best_only=True),
                        EarlyStopping(patience=3, verbose=1),
                        TensorBoard(log_dir='./logs/',
                                    batch_size=batch_size,
                                    histogram_freq=1,
                                    write_grads=True)])
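
As a side note, both snippets pass steps_per_epoch=1 with batch_size=1, so each epoch trains on a single image. That is fine for reproducing the hang, but probably not what you want for real training. A minimal sketch of deriving the step counts from the dataset size follows. `steps_for` is a hypothetical helper and the sample counts are made-up numbers; in real code they would come from the `.n` attribute of the iterators returned by `flow_from_directory`, as in the question.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to see num_samples once (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError('batch_size must be positive')
    # Round up so the last, possibly smaller, batch is not dropped.
    return math.ceil(num_samples / batch_size)


# Made-up sizes for illustration; in the question these would be
# train_data.n and val_data.n.
train_steps = steps_for(1000, 32)
val_steps = steps_for(250, 32)
print(train_steps, val_steps)
```

These values would then be passed to fit_generator as steps_per_epoch=train_steps and validation_steps=val_steps.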
