Python – TensorFlow: Use different expressions for forward and backward passes

I have a TensorFlow expression in which I want to use different expressions depending on whether I'm evaluating the forward or the backward (gradient) pass. Specifically, I want to ignore the effect of some randomness (noise) added to the network during the backward pass.

Here is a simple example:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32)
y = x**2
# Multiplicative noise, uniform in [0.9, 1.1]
u = tf.random_uniform(tf.shape(x), minval=0.9, maxval=1.1)
yu = y * u
z = tf.sqrt(yu)
g = tf.gradients(z, x)[0]

with tf.Session() as sess:
    yv, yuv, zv, gv = sess.run([y, yu, z, g], {x: [-2, -1, 1]})

print(yv)
print(yuv)
print(zv)
print(gv)

The output looks something like this (the exact values vary because of the random noise):

[4. 1. 1.]
[4.1626534 0.9370764 1.0806011]
[2.0402582  0.96802706 1.0395197 ]
[-1.0201291  -0.96802706  1.0395197 ]

The last line contains the derivatives of z with respect to x. I want them not to include the multiplicative noise term u, i.e. for these input values of x they should consistently be [-1, -1, 1].

Is there a way to do something like this using only Python? I know I can create a custom operator in C++ and define a custom gradient for it, but I'd like to avoid that if at all possible.

Also, I'd like to use this as part of a Keras layer, so a Keras-based solution would also work (i.e., one where different expressions can be defined for the forward and backward passes through the layer). This means that simply defining a second expression z2 = tf.sqrt(y) and calling tf.gradients on it is not a solution for me, because I don't know how to plug that into Keras (where it would be part of a much longer computational graph).

Solution

The short answer is that Sergey Ioffe's stop_gradient trick (used below) only works if it is applied as the very last stage of the graph, just before the gradient calculation.

I’m assuming you tried the following but it won’t work:

yu_fixed = tf.stop_gradient(yu - y) + y
z = tf.sqrt(yu_fixed)

This still produces a gradient contaminated by the random noise.
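To see the contamination directly, one could take the gradient of this attempted fix. The following is just a quick check, reusing x and the redefined z = tf.sqrt(yu_fixed) from the listings above; g_bad is an illustrative name of my choosing:

g_bad = tf.gradients(z, x)[0]   # z = tf.sqrt(yu_fixed) from the attempt above

with tf.Session() as sess:
    # Values fluctuate from run to run instead of being exactly [-1, -1, 1]
    print(sess.run(g_bad, {x: [-2, -1, 1]}))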

To understand why, let's walk through the gradient calculation, using s as shorthand for tf.stop_gradient. The way it works is that when TensorFlow needs to compute s(expr) in the forward pass it simply returns expr, but when it needs to compute the gradient of s(expr), it returns 0.
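A tiny standalone example makes this concrete (a sketch in the same TF1 graph style as the question):

import tensorflow as tf

a = tf.placeholder(tf.float32)
c = tf.stop_gradient(3.0 * a) + a   # forward value: 3a + a = 4a
dc = tf.gradients(c, a)[0]          # backward: the stop_gradient term contributes 0, so dc/da = 1

with tf.Session() as sess:
    print(sess.run([c, dc], {a: 2.0}))   # -> [8.0, 1.0]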

We want to compute the gradient of z = sqrt(s(yu - y) + y). Now, because
\frac{\partial \sqrt{f(x)}}{\partial x} = \frac{1}{2\sqrt{f(x)}} \frac{\partial f(x)}{\partial x} ,
the gradient of z contains both a term with the derivative of s() and a term containing s() itself. The latter term is not zeroed out, so the computed derivative of z still depends (in a somewhat strange and incorrect way) on the value of yu. This is why the attempted solution above still has randomness in its gradient.
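Concretely, in this example (where y = x**2 and the forward value of s(yu - y) + y is simply yu):
\frac{\partial z}{\partial x} = \frac{1}{2\sqrt{s(yu - y) + y}} \, \frac{\partial \big( s(yu - y) + y \big)}{\partial x} = \frac{2x}{2\sqrt{yu}} = \frac{x}{\sqrt{yu}} ,
so the noisy value yu survives under the square root in the denominator, which is exactly the contamination seen in the gradient above.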

As far as I know, the only way to fix this is to apply Ioffe's trick as the final stage before calling tf.gradients. In other words, you will get the expected result if you do the following:

x = tf.placeholder(tf.float32)
y = x**2
u = tf.random_uniform(tf.shape(x), minval=0.9, maxval=1.1)
yu = y * u
z = tf.sqrt(yu)
# Apply the trick as the very last step before the gradient: the forward value
# is still the noisy z, but the gradient flows only through tf.sqrt(y).
z_fixed = tf.stop_gradient(z - tf.sqrt(y)) + tf.sqrt(y)
g = tf.gradients(z_fixed, x)[0]

with tf.Session() as sess:
    yv, yuv, zv, gv = sess.run([y, yu, z_fixed, g], {x: [-2, -1, 1]})

print(yv)
print(yuv)
print(zv)
print(gv)

Output:

[ 4.  1.  1.]
[ 3.65438652  1.07519293  0.94398856]
[ 1.91164494  1.03691506  0.97159076]
[-1. -1.  1.]
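
To address the Keras part of the question, the same pattern can be wrapped in a Lambda layer so that the trick is applied inside the layer's own computation. This is only a sketch, assuming a TF1-era tf.keras; sqrt_noisy_forward_clean_backward and noise_sqrt_layer are hypothetical names:

import tensorflow as tf
from tensorflow.keras.layers import Lambda

def sqrt_noisy_forward_clean_backward(t):
    # Forward pass: sqrt of the noise-multiplied input.
    # Backward pass: the stop_gradient term contributes 0, so the gradient
    # is that of the clean tf.sqrt(t).
    u = tf.random_uniform(tf.shape(t), minval=0.9, maxval=1.1)
    noisy = tf.sqrt(t * u)
    clean = tf.sqrt(t)
    return tf.stop_gradient(noisy - clean) + clean

noise_sqrt_layer = Lambda(sqrt_noisy_forward_clean_backward)

Note that the caveat above still applies: any layers stacked after this one evaluate their own local gradients at the noisy forward values, so the overall gradient matches the clean expression exactly only when this is the last stage before the gradient computation.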
