Unable to determine the shape of numpy array in loop containing transpose operation
I’ve been trying to create a small neural network to learn a softmax classifier, following the article at this site: https://mlxai.github.io/2017/01/09/implementing-softmax-classifier-with-vectorized-operations.html
It works for a single iteration. However, when I put a loop around it to train the network with updated weights, I get the following error: ValueError: operands could not be broadcast together with shapes (5,10) (1,5) (5,10). I have attached a screenshot of the output here.
While debugging, I found that np.max() returns arrays of shapes (5,1) and (1,5) in different iterations, even though axis is set to 1 in both cases. Please help me figure out what is wrong with the following code.
import numpy as np

N = 5
D = 10
C = 10
W = np.random.rand(D, C)
X = np.random.randint(255, size=(N, D))
X = X / 255
y = np.random.randint(C, size=(N))
#print (y)
lr = 0.1
reg = 1e-3  # regularization strength (assumed; not defined in the snippet)

for i in range(100):
    print(i)
    loss = 0.0
    dW = np.zeros_like(W)
    N = X.shape[0]
    C = W.shape[1]
    f = X.dot(W)
    #print (f)
    print(np.matrix(np.max(f, axis=1)))
    print(np.matrix(np.max(f, axis=1)).T)
    f -= np.matrix(np.max(f, axis=1)).T
    #print (f)
    term1 = -f[np.arange(N), y]
    sum_j = np.sum(np.exp(f), axis=1)
    term2 = np.log(sum_j)
    loss = term1 + term2
    loss /= N
    loss += 0.5 * reg * np.sum(W * W)
    #print (loss)
    coef = np.exp(f) / np.matrix(sum_j).T
    coef[np.arange(N), y] -= 1
    dW = X.T.dot(coef)
    dW /= N
    dW += reg * W
    W = W - lr * dW
Solution
In your first iteration, W is an instance of np.ndarray with shape (D, C). f = X.dot(W) is therefore also an ndarray, so np.max(f, axis=1) returns an ndarray of shape (N,), which np.matrix() turns into shape (1, N); transposing that gives (N, 1), which broadcasts fine against f of shape (N, C).

But in the next iteration, W is an instance of np.matrix: coef is an np.matrix (because of the division by np.matrix(sum_j).T), so dW = X.T.dot(coef) is one too, and W inherits that type from W = W - lr*dW. Then f is also an np.matrix, and np.max(f, axis=1) returns an np.matrix of shape (N, 1). np.matrix() leaves that unchanged, and after .T it becomes (1, N), which cannot be broadcast against f of shape (N, C). That is the ValueError you see.
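The shape flip is easy to reproduce in isolation; a minimal sketch using the N=5, C=10 sizes from the error message:

```python
import numpy as np

f_nd = np.random.rand(5, 10)              # f as a plain ndarray (first iteration)
m_nd = np.matrix(np.max(f_nd, axis=1))    # max gives (5,); np.matrix makes it (1, 5)
print(m_nd.T.shape)                       # (5, 1): broadcasts against (5, 10)

f_mat = np.matrix(f_nd)                   # f as np.matrix (later iterations)
m_mat = np.matrix(np.max(f_mat, axis=1))  # max already gives (5, 1); np.matrix keeps it
print(m_mat.T.shape)                      # (1, 5): broadcast error against (5, 10)
```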
To resolve this issue, make sure you don’t mix np.ndarray with np.matrix. Either define everything as np.matrix from the beginning (i.e. W = np.matrix(np.random.rand(D,C))), or use keepdims and keep the shape like this:

f -= np.max(f, axis=1, keepdims=True)

This keeps everything 2D without ever having to convert to np.matrix. (Do the same for sum_j.)
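Putting it together, here is a sketch of the full training loop using keepdims and no np.matrix at all; the value of reg is an assumption, since it isn’t defined in the question:

```python
import numpy as np

np.random.seed(0)
N, D, C = 5, 10, 10
W = np.random.rand(D, C)
X = np.random.randint(255, size=(N, D)) / 255
y = np.random.randint(C, size=N)
lr, reg = 0.1, 1e-3  # reg assumed; not given in the question

for i in range(100):
    f = X.dot(W)
    f -= np.max(f, axis=1, keepdims=True)             # stays (N, C)
    sum_j = np.sum(np.exp(f), axis=1, keepdims=True)  # (N, 1)
    loss = np.mean(-f[np.arange(N), y] + np.log(sum_j.ravel()))
    loss += 0.5 * reg * np.sum(W * W)
    coef = np.exp(f) / sum_j                          # softmax probabilities, (N, C)
    coef[np.arange(N), y] -= 1
    dW = X.T.dot(coef) / N + reg * W
    W = W - lr * dW                                   # W stays a plain ndarray
```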