Unable to determine the shape of numpy array in loop containing transpose operation
I've been trying to build a small neural network to learn the softmax function, following the article at https://mlxai.github.io/2017/01/09/implementing-softmax-classifier-with-vectorized-operations.html
It works for a single iteration. However, when I wrap it in a loop to train the network with updated weights, I get the following error: ValueError: operands could not be broadcast together with shapes (5,10) (1,5) (5,10). I have attached a screenshot of the output here.
While debugging, I found that np.max() returns arrays of shape (5,1) and (1,5) in different iterations, even though axis is set to 1. Please help me figure out what is wrong with the following code.
    import numpy as np

    N = 5
    D = 10
    C = 10

    W = np.random.rand(D, C)
    X = np.random.randint(255, size=(N, D))
    X = X / 255
    y = np.random.randint(C, size=(N))
    # print(y)
    lr = 0.1
    reg = 0.1  # regularization strength; not defined in the snippet as posted, value assumed

    for i in range(100):
        print(i)
        loss = 0.0
        dW = np.zeros_like(W)
        N = X.shape[0]
        C = W.shape[1]

        f = X.dot(W)
        # print(f)
        print(np.matrix(np.max(f, axis=1)))
        print(np.matrix(np.max(f, axis=1)).T)
        f -= np.matrix(np.max(f, axis=1)).T
        # print(f)

        term1 = -f[np.arange(N), y]
        sum_j = np.sum(np.exp(f), axis=1)
        term2 = np.log(sum_j)
        loss = term1 + term2
        loss /= N
        loss += 0.5 * reg * np.sum(W * W)
        # print(loss)

        coef = np.exp(f) / np.matrix(sum_j).T
        coef[np.arange(N), y] -= 1
        dW = X.T.dot(coef)
        dW /= N
        dW += reg * W

        W = W - lr * dW
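In your first iteration, W is an instance of np.ndarray with shape (D, C). f inherits ndarray from it, so when you do np.max(f, axis = 1), it returns an ndarray of shape (N,), which np.matrix() turns into shape (1, N). Transposing it then gives shape (N, 1), which broadcasts correctly against f.

But in your next iteration, W is an instance of np.matrix (which it inherits from dW in W = W - lr*dW; dW is a matrix because coef is computed by dividing by np.matrix(sum_j).T). f then inherits np.matrix, and np.max(f, axis = 1) returns an np.matrix of shape (N, 1), which passes through np.matrix() unchanged and becomes shape (1, N) after .T. With N = 5, that is the (1,5) operand in your error message.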
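The shape difference described above can be checked in isolation. This is only a minimal sketch with made-up data; the names f_arr, f_mat and ratio are illustrative and do not appear in the question's code.

```python
import numpy as np

N, C = 5, 10
f_arr = np.random.rand(N, C)   # plain ndarray, as in the first iteration
f_mat = np.matrix(f_arr)       # same values as np.matrix, as in later iterations

row_max_arr = np.max(f_arr, axis=1)
row_max_mat = np.max(f_mat, axis=1)
print(row_max_arr.shape)   # (5,)   the reduced axis is dropped
print(row_max_mat.shape)   # (5, 1) np.matrix always stays 2D

# The same np.matrix(...).T wrapper gives opposite orientations:
col_from_arr = np.matrix(row_max_arr).T
col_from_mat = np.matrix(row_max_mat).T
print(col_from_arr.shape)  # (5, 1) broadcasts against (5, 10)
print(col_from_mat.shape)  # (1, 5) does not broadcast against (5, 10)

# Dividing an ndarray by an np.matrix yields an np.matrix, which is how
# the matrix type leaks into coef, then dW, then W.
ratio = np.exp(f_arr) / np.matrix(np.sum(np.exp(f_arr), axis=1)).T
print(type(ratio).__name__)
```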
To resolve this issue, make sure you don't mix np.ndarray with np.matrix. Either define everything as np.matrix from the beginning (i.e. W = np.matrix(np.random.rand(D,C)), in which case the extra np.matrix(...).T is no longer needed, since np.max on a matrix already returns shape (N, 1)), or use keepdims to maintain your axes, like this:

    f -= np.max(f, axis = 1, keepdims = True)

This lets you keep everything 2D without having to convert to np.matrix. (Do this for sum_j as well.)
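Putting the keepdims suggestion together, the loop could look roughly like the sketch below. It is not a definitive rewrite: reg = 0.1 and the seeded random generator are assumptions, the losses list is added only for inspection, and the per-sample losses are summed into a scalar here rather than kept as a vector as in the question.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded generator (assumption, for repeatability)
N, D, C = 5, 10, 10
W = rng.random((D, C))
X = rng.integers(0, 255, size=(N, D)) / 255
y = rng.integers(0, C, size=N)
lr = 0.1
reg = 0.1                             # assumed value; not given in the question

losses = []                           # hypothetical helper list for inspection
for i in range(100):
    f = X.dot(W)                                       # (N, C) ndarray
    f -= np.max(f, axis=1, keepdims=True)              # subtract (N, 1) row maxima
    sum_j = np.sum(np.exp(f), axis=1, keepdims=True)   # (N, 1)

    # Mean cross-entropy over the batch plus L2 regularization (scalar).
    loss = np.sum(-f[np.arange(N), y] + np.log(sum_j[:, 0])) / N
    loss += 0.5 * reg * np.sum(W * W)
    losses.append(loss)

    coef = np.exp(f) / sum_j                           # softmax probabilities, (N, C)
    coef[np.arange(N), y] -= 1
    dW = X.T.dot(coef) / N + reg * W                   # (D, C) ndarray

    W = W - lr * dW                                    # stays a plain ndarray

print(type(W).__name__, W.shape)
```

Because nothing is ever wrapped in np.matrix, W, f and dW keep the same type and shape in every iteration.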