Python – tf.estimator shuffle – random seed?

tf.estimator shuffle – random seed?

When I run tf.estimator.LinearRegressor repeatedly, the results are slightly different each time. I suspect this is because shuffle=True:

input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)

That's fine for now, but when I try to make it deterministic by seeding both NumPy and TensorFlow:

np.random.seed(1)
tf.set_random_seed(1)

The results are still slightly different each time. What am I missing?

Solution

tf.set_random_seed sets the graph-level seed, but it is not the only source of randomness: there is also an operation-level seed that can be supplied to each individual op.

Unfortunately, tf.estimator.inputs.numpy_input_fn does not expose a seed parameter alongside shuffle to pass down to the underlying operation (see the source code). As a result, the _enqueue_data function always receives seed=None, which resets any seed you set beforehand. Interestingly, many of the low-level feed functions use Python's standard random.seed for shuffling rather than TensorFlow's random ops (see _ArrayFeedFn, _OrderedDictNumpyFeedFn, etc.).
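The effect of that seed=None can be illustrated with plain Python random, which is what those feed functions use internally (a minimal sketch of the mechanism, not the estimator code itself):

```python
import random

# Seeding makes the sequence reproducible...
random.seed(1)
first = random.random()
random.seed(1)
second = random.random()  # same value as `first`

# ...but re-seeding with None (which is effectively what happens when
# _enqueue_data is handed seed=None) pulls fresh entropy from the OS,
# discarding the seed set above.
random.seed(1)
random.seed(None)
third = random.random()  # almost certainly differs from `first`
```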

Summary: there is currently no way to guarantee a stable run with shuffle=True, at least with this API. Your only option is to shuffle the data yourself and pass shuffle=False.
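A minimal sketch of that workaround, shuffling once with a fixed NumPy seed (the array names mirror the question; the values here are placeholders):

```python
import numpy as np

# Placeholder data standing in for the question's x_train / y_train.
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.0, -1.0, -2.0, -3.0])

# Apply one seeded permutation to both arrays so pairs stay aligned.
rng = np.random.RandomState(1)
perm = rng.permutation(len(x_train))
x_shuffled, y_shuffled = x_train[perm], y_train[perm]

# Then build the input_fn with shuffle=False so no further
# (unseeded) shuffling happens inside the estimator:
# input_fn = tf.estimator.inputs.numpy_input_fn(
#     {"x": x_shuffled}, y_shuffled, batch_size=4,
#     num_epochs=None, shuffle=False)
```

Because the permutation comes from a seeded RandomState, repeated runs see the data in the same order.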
