Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The architecture is also important: there's a trade-off for MoE. There used to be a rough rule of thumb that a 35bxa3b model would be equivalent in smarts to an 11b dense model, give or take, but that's not been accurate for a while.
 help



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: