Aug 2, 2020
Thanks for this really helpful article.
I have a question. Why does the shape of the softmax gradient differ from the shape of the softmax input? How do we apply softmax's gradient to its input? I mean, how do we update the gradients if they have different dimensions?
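To make the question concrete, here is a minimal sketch of the shapes involved, assuming a single 1-D input vector and plain Python. The function names and the upstream gradient values are hypothetical, not taken from the article. For an input of length n, the softmax derivative is an n×n Jacobian, and as far as I understand the chain rule, it is multiplied by the upstream gradient dL/ds (length n), which would give a length-n gradient again:

```python
import math

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(s):
    # J[i][j] = d s_i / d x_j = s_i * (delta_ij - s_j), an n x n matrix.
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def backprop_through_softmax(upstream, jac):
    # Chain rule: dL/dx_j = sum_i (dL/ds_i) * J[i][j], a length-n vector.
    n = len(upstream)
    return [sum(upstream[i] * jac[i][j] for i in range(n)) for j in range(n)]

x = [1.0, 2.0, 3.0]
s = softmax(x)
J = softmax_jacobian(s)
upstream = [0.1, -0.2, 0.3]  # hypothetical dL/ds from the next layer
grad_x = backprop_through_softmax(upstream, J)

# input length, Jacobian shape, gradient length
print(len(x), (len(J), len(J[0])), len(grad_x))  # prints: 3 (3, 3) 3
```

If that is right, the Jacobian itself is never used as the update; only its product with the upstream gradient is, and that product has the same shape as the input.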