Deepak S M
Aug 2, 2020


Thanks for this really helpful article.

I have a question. Why does the shape of the softmax gradient differ from the shape of the softmax input? How do we apply the softmax gradient to its input? I mean, how do we propagate the gradient back when the two have different dimensions?
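A minimal sketch of one common way to see this, in plain Python, assuming a single input vector z of length n and an upstream gradient g = dL/ds from some hypothetical loss L (all function names here are illustrative, not from the article). The full derivative of softmax is an n x n Jacobian, J[i][j] = s_i * (delta_ij - s_j), which is why its shape differs from z. It is applied to the input through the chain rule as a vector-Jacobian product, dL/dz_j = sum_i g_i * J[i][j], and that contraction returns a length-n vector with the same shape as z.

```python
import math

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    # J[i][j] = d s_i / d z_j = s_i * (delta_ij - s_j); an n x n matrix.
    s = softmax(z)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def softmax_backward(z, grad_out):
    # Chain rule as a vector-Jacobian product: dL/dz_j = sum_i dL/ds_i * J[i][j].
    # Contracting the n x n Jacobian with the length-n upstream gradient
    # gives a length-n result, the same shape as the input z.
    J = softmax_jacobian(z)
    n = len(z)
    return [sum(grad_out[i] * J[i][j] for i in range(n)) for j in range(n)]

def softmax_backward_fast(z, grad_out):
    # Equivalent closed form that never builds J: s_j * (g_j - <g, s>).
    s = softmax(z)
    dot = sum(g * si for g, si in zip(grad_out, s))
    return [si * (g - dot) for g, si in zip(grad_out, s)]

z = [1.0, 2.0, 0.5]
g = [0.1, -0.3, 0.2]  # hypothetical upstream gradient dL/ds
J = softmax_jacobian(z)
print(len(J), len(J[0]))            # 3 3
print(len(softmax_backward(z, g)))  # 3
```

The closed form is what frameworks typically favor in practice, since it costs O(n) instead of materializing an O(n^2) Jacobian, but both produce the same gradient for z.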



Software Engineer. Math & Physics enthusiast. Dreaming of a survival-free world