tf.estimator shuffle – random seed?
When I run tf.estimator.LinearRegressor repeatedly, the results are slightly different each time. I assume this is because shuffle=True:

    input_fn = tf.estimator.inputs.numpy_input_fn(
        {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)
That’s fine for now, but then I try to make it deterministic by seeding the random number generators in both np and tf:

    np.random.seed(1)
    tf.set_random_seed(1)
The results are still slightly different each time. What am I missing?
Solution
tf.set_random_seed sets the graph-level seed, but that is not the only source of randomness: there is also an operation-level seed that needs to be provided for each op.
Unfortunately, tf.estimator.inputs.numpy_input_fn does not expose a seed parameter alongside shuffle to pass down to the underlying operation (see the source code). As a result, the _enqueue_data function always receives seed=None, which resets any seed you set in advance. Interestingly, many of the low-level feed functions use the standard Python random module for shuffling rather than TensorFlow's RNG (see _ArrayFeedFn, _OrderedDictNumpyFeedFn, etc.).
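To see why seed=None is destructive, note that re-seeding Python's random module with None discards any previously set seed and re-seeds from system entropy. A minimal sketch in plain Python (no TensorFlow needed; this illustrates the mechanism, not the exact library internals):

```python
import random

# Seeding makes the sequence reproducible...
random.seed(1)
a = [random.random() for _ in range(3)]
random.seed(1)
b = [random.random() for _ in range(3)]
assert a == b  # same seed -> same sequence

# ...but if the library later re-seeds with None (as effectively happens
# when _enqueue_data receives seed=None), the earlier seed is discarded
# and the generator is re-seeded from system entropy.
random.seed(1)
random.seed(None)
c = [random.random() for _ in range(3)]
# c will (almost surely) differ from a
```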
Summary: there is currently no way to make shuffle=True deterministic, at least with the current API. Your only option is to shuffle the data yourself and pass shuffle=False.
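For example, you can pre-shuffle the data with an explicitly seeded NumPy generator and then disable the input_fn's own (unseedable) shuffling. A sketch, where x_train and y_train are placeholder arrays:

```python
import numpy as np

# placeholder data standing in for the real training set
x_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])

# shuffle once, deterministically, with an explicitly seeded generator;
# permuting both arrays with the same index array keeps the pairs aligned
rng = np.random.RandomState(1)
perm = rng.permutation(len(x_train))
x_shuffled, y_shuffled = x_train[perm], y_train[perm]

# then hand the pre-shuffled arrays to the input_fn with shuffle=False:
# input_fn = tf.estimator.inputs.numpy_input_fn(
#     {"x": x_shuffled}, y_shuffled, batch_size=4,
#     num_epochs=None, shuffle=False)
```

Note the trade-off: with shuffle=False and num_epochs=None the data repeats in the same order every epoch, so you give up per-epoch reshuffling in exchange for reproducibility.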