Hacker News
yobbo on Aug 23, 2023 | on: Kullback–Leibler divergence
Also, the gradient of the softmax cross-entropy loss with respect to the output layer's pre-activations has the same form as the gradient of mean-squared-error loss with a linear output: both reduce to "prediction minus target". Up to the output activation, the rest of the network receives the same error signal and learns to optimize the same function.
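A minimal NumPy sketch of this claim (the vectors `z` and `y` are illustrative values, not from the comment): for a one-hot target, the analytic gradient of cross-entropy through softmax with respect to the logits is `softmax(z) - y`, which a finite-difference check confirms, and it matches the "prediction minus target" form of the MSE gradient with an identity output.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    # cross-entropy of one-hot target y against softmax(z)
    return -np.sum(y * np.log(softmax(z)))

z = np.array([1.0, -0.5, 2.0])   # example logits (pre-activations)
y = np.array([0.0, 0.0, 1.0])    # one-hot target

# analytic gradient of cross-entropy w.r.t. the logits: p - y
grad_ce = softmax(z) - y

# finite-difference check of the same gradient
eps = 1e-6
grad_fd = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    grad_fd[i] = (cross_entropy(zp, y) - cross_entropy(zm, y)) / (2 * eps)

assert np.allclose(grad_ce, grad_fd, atol=1e-5)

# MSE with a linear (identity) output activation: gradient of
# 0.5 * ||yhat - y||^2 w.r.t. yhat is also "prediction minus target"
yhat = z                      # identity output
grad_mse = yhat - y
```

Both `grad_ce` and `grad_mse` have the form "prediction minus target", so the layers below the output activation see the same kind of error signal in either setup.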