Python – TensorFlow gradients with respect to matrices


For background, I’m trying to implement a gradient descent algorithm using TensorFlow.

I have a matrix X

[ x1 x2 x3 x4 ]
[ x5 x6 x7 x8 ]

I multiply it by a column vector Y to get Z

      [ y1 ]
Z = X [ y2 ]  = [ z1 ]
      [ y3 ]    [ z2 ]
      [ y4 ]

Then I pass Z through the softmax function and take the logarithm. I refer to the resulting output as W.

All of this is implemented as follows (with a little boilerplate added so it works).

import tensorflow as tf

sess = tf.Session()
num_features = 4
num_actions = 2

# X: the (2, 4) policy matrix; Y: the (4, 1) state vector fed in at run time
policy_matrix = tf.get_variable("params", (num_actions, num_features))
state_ph = tf.placeholder("float", (num_features, 1))
action_linear = tf.matmul(policy_matrix, state_ph)    # Z = X Y, shape (2, 1)
action_probs = tf.nn.softmax(action_linear, axis=0)   # softmax over the two actions
action_problogs = tf.log(action_probs)                # W, shape (2, 1)
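For reference, a quick way to sanity-check the shapes above (the initializer call and the state values are just illustrative):

sess.run(tf.global_variables_initializer())
w = sess.run(action_problogs,
             feed_dict={state_ph: [[1.0], [2.0], [3.0], [4.0]]})
print(w.shape)  # (2, 1): one log-probability per action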

W (corresponding to action_problogs) looks like

[ w1 ]
[ w2 ]

I want to

find the gradient of w1 with respect to the matrix X – that is, I want to calculate

          [ d/dx1 w1 ]
d/dX w1 = [    ...   ]
          [ d/dx8 w1 ]

(Preferably it would still be shaped like a matrix so I can add it to X, but I don’t really care about that.)

I was hoping tf.gradients would solve the problem, so I calculated the “gradient” like this

problog_gradient = tf.gradients(action_problogs, policy_matrix)

However, when I checked problog_gradient, this is the result I got

[<tf.Tensor 'foo_4/gradients/foo_4/MatMul_grad/MatMul:0' shape=(2, 4) dtype=float32>]

Note that this has exactly the same shape as X, but it shouldn’t. I wanted to get a list of two gradients, each with respect to all 8 elements of X. Instead, I suspect I’m getting two gradients, each with respect to only 4 elements.

I’m new to TensorFlow, so I’d be grateful if someone could explain what’s going on and how I can achieve the behavior I want.

Solution

tf.gradients requires a scalar function, so by default it sums the entries of the output (a short sketch demonstrating this summing behavior follows the Jacobian code below). This is the default behavior because all gradient descent algorithms need that type of functionality, and stochastic gradient descent (or one of its variants) is the preferred method inside TensorFlow. You won’t find any of the more advanced algorithms (like BFGS or something else) because they simply haven’t been implemented yet, and they would require a true Jacobian, which also hasn’t been implemented. For what it’s worth, here is a working Jacobian implementation that I wrote:

def map(f, x, dtype=None, parallel_iterations=10):
    '''
    Apply f to each of the elements in x using the specified number of parallel iterations.

    Important points:
    1. By "elements in x", we mean that we will be applying f to x[0], ..., x[tf.shape(x)[0]-1].
    2. The output size of f(x[i]) can be arbitrary. However, if the dtype of that output
       is different than the dtype of x, then you need to specify that as an additional argument.
    '''
    if dtype is None:
        dtype = x.dtype

    n = tf.shape(x)[0]
    loop_vars = [
        tf.constant(0, n.dtype),
        tf.TensorArray(dtype, size=n),
    ]
    _, fx = tf.while_loop(
        lambda j, _: j < n,
        lambda j, result: (j + 1, result.write(j, f(x[j]))),
        loop_vars,
        parallel_iterations=parallel_iterations
    )
    return fx.stack()
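Before moving on to the jacobian function, here is a quick illustration of what map does (the constant is just an example):

xs = tf.constant([[1.0, 2.0], [3.0, 4.0]])
doubled = map(lambda row: 2.0 * row, xs)  # applies f to xs[0] and xs[1], then stacks
# sess.run(doubled) => [[2., 4.], [6., 8.]]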

def jacobian(fx, x, parallel_iterations=10):
    '''
    Given a tensor fx, which is a function of x, vectorize fx (via tf.reshape(fx, [-1])),
    and then compute the jacobian of each entry of fx with respect to x.
    Specifically, if x has shape (m,n,...,p), and fx has L entries (tf.size(fx)=L), then
    the output will be (L,m,n,...,p), where output[i] will be (m,n,...,p), with each entry denoting the
    gradient of output[i] wrt the corresponding element of x.
    '''
    return map(lambda fxi: tf.gradients(fxi, x)[0],
               tf.reshape(fx, [-1]),
               dtype=x.dtype,
               parallel_iterations=parallel_iterations)
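Applied to the graph from the question, usage would look roughly like this (the feed values are arbitrary; only the shapes matter):

problog_jacobian = jacobian(action_problogs, policy_matrix)

sess.run(tf.global_variables_initializer())
jac = sess.run(problog_jacobian,
               feed_dict={state_ph: [[1.0], [2.0], [3.0], [4.0]]})
print(jac.shape)  # (2, 2, 4): one (2, 4) gradient per entry of action_problogs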

While this implementation works, it breaks down when you try to nest it. For example, if you try to compute the Hessian with jacobian(jacobian(...)), you get some strange errors. This is tracked as Issue 675, and I am still awaiting a response on why it throws an error. I believe there is a deep-seated bug in either the while-loop implementation or the gradient implementation, but I really don’t know.

Anyway, if you just need a jacobian, try the code above.
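As promised above, here is a small sketch of the summing behavior that explains the original (2, 4) result: tf.gradients implicitly sums over the entries of its first argument, so in the question’s graph the following two expressions produce the same values.

summed_grad = tf.gradients(action_problogs, policy_matrix)[0]   # shape (2, 4)
explicit_grad = tf.gradients(tf.reduce_sum(action_problogs),
                             policy_matrix)[0]                  # same (2, 4) values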
