Python – Hyperparameter optimization in Tensorflow

I’m using Bayesian optimization to tune the hyperparameters of my convolutional neural network (CNN) in TensorFlow, and I’m getting this error:

ResourceExhaustedError (see above for traceback): OOM when allocating
tensor with shape[4136,1,180,432] and type float on
/job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

These are the hyperparameters I’m optimizing:

# Search-space definitions (scikit-optimize / skopt)
from skopt.space import Integer, Real, Categorical

dim_batch_size = Integer(low=1, high=50, name='batch_size')
dim_kernel_size1 = Integer(low=1, high=75, name='kernel_size1')
dim_kernel_size2 = Integer(low=1, high=50, name='kernel_size2')
dim_depth = Integer(low=1, high=100, name='depth')
dim_num_hidden = Integer(low=5, high=1500, name='num_hidden')
dim_num_dense_layers = Integer(low=1, high=5, name='num_dense_layers')
dim_learning_rate = Real(low=1e-6, high=1e-2, prior='log-uniform',
                         name='learning_rate')
dim_activation = Categorical(categories=['relu', 'sigmoid'],
                             name='activation')
dim_max_pool = Integer(low=1, high=100, name='max_pool')

dimensions = [dim_batch_size,
              dim_kernel_size1,
              dim_kernel_size2,
              dim_depth,
              dim_num_hidden,
              dim_num_dense_layers,
              dim_learning_rate,
              dim_activation,
              dim_max_pool]
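
For reference, these Integer, Real, and Categorical dimensions match the search-space objects from scikit-optimize (skopt). A minimal sketch of how such a space is usually passed to the optimizer, assuming a hypothetical train_and_evaluate function that builds and trains the CNN and returns the validation error, looks like this:

from skopt import gp_minimize
from skopt.utils import use_named_args

# Hypothetical objective: train the CNN with the sampled hyperparameters
# and return the validation error that the optimizer should minimize.
@use_named_args(dimensions=dimensions)
def objective(batch_size, kernel_size1, kernel_size2, depth, num_hidden,
              num_dense_layers, learning_rate, activation, max_pool):
    return train_and_evaluate(batch_size, kernel_size1, kernel_size2,
                              depth, num_hidden, num_dense_layers,
                              learning_rate, activation, max_pool)

result = gp_minimize(func=objective, dimensions=dimensions, n_calls=40)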

The error says the GPU has run out of memory. Why is that?

Am I optimizing too many hyperparameters? Do some of the dimensions not match? Or am I assigning hyperparameter ranges that are larger than the model can handle?

Solution

The OOM occurs because the model becomes too large when several hyperparameters are sampled near the high end of their ranges at the same time, for example a batch size around 50 together with num_hidden around 1500. The number of hyperparameters being optimized is not the issue; a few of them near their upper bounds are enough to break the model.

The specific tensor in the error message has shape [4136, 1, 180, 432], which is about 1.2 GB if each element is a 32-bit float. That is a lot on its own, and it is only one of the many tensors needed for training (for example, the backward pass has to keep activations and gradients around, roughly doubling the memory of the forward pass). No wonder TensorFlow fails with OOM.
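
As a quick sanity check, that figure can be reproduced with a couple of lines of Python:

import numpy as np

shape = (4136, 1, 180, 432)          # tensor shape from the error message
size_bytes = np.prod(shape) * 4      # 4 bytes per float32 element
print(size_bytes / 1024**3)          # ~1.2 GB for this single tensor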

What makes Bayesian optimization especially tricky for hyperparameter tuning is that the algorithm is likely to probe the corners of the search space at some point, i.e. configurations where every value is close to the minimum of its range or close to the maximum (see this question). One workaround is to estimate the model’s size before actually running each iteration and shrink the batch size when the configuration would not fit, but then the algorithm no longer truly optimizes the batch size.
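
A rough sketch of that workaround, assuming a hypothetical estimate_memory_bytes helper that approximates the activation and parameter footprint of the CNN for a given configuration, could look like this:

def clamp_batch_size(batch_size, kernel_size1, kernel_size2, depth,
                     num_hidden, max_pool, budget=8 * 1024**3):
    # Halve the sampled batch size until the estimated footprint fits
    # into the assumed GPU memory budget (8 GB here).
    while batch_size > 1 and estimate_memory_bytes(
            batch_size, kernel_size1, kernel_size2,
            depth, num_hidden, max_pool) > budget:
        batch_size //= 2
    return batch_size

The downside is exactly the one mentioned above: because the sampled batch size gets overridden whenever a configuration is too large, the optimizer never learns the true effect of the batch size at the upper end of its range.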