Hacker News

Also, the gradient of softmax cross-entropy loss with respect to the logits is the same as the gradient of mean-squared-error loss with a linear output: both come out to (ŷ − y). So, up until the output activation, the network is optimizing the same function.
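A quick numerical sketch of this gradient equivalence (names and values here are illustrative, not from the comment): the analytic gradient of softmax cross-entropy w.r.t. the logits, softmax(z) − y, is checked against a finite-difference estimate, and has the same (ŷ − y) form as the MSE-with-linear-output gradient.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    # Cross-entropy of softmax(z) against a one-hot target y
    return -np.sum(y * np.log(softmax(z)))

z = np.array([0.5, -1.2, 2.0])   # logits (arbitrary example values)
y = np.array([0.0, 0.0, 1.0])    # one-hot target

# Analytic gradient w.r.t. the logits: softmax(z) - y,
# i.e. the same (y_hat - y) form as MSE with a linear output.
analytic = softmax(z) - y

# Central finite-difference check of the same gradient
eps = 1e-6
numeric = np.array([
    (cross_entropy(z + eps * np.eye(3)[i], y)
     - cross_entropy(z - eps * np.eye(3)[i], y)) / (2 * eps)
    for i in range(3)
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```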


