
Prompt caching is functionally identical to snapshotting the model's state after it has processed the prompt. And you need the KV cache for inference anyway, so keeping it around doesn't even cost extra memory if every single inference task shares the same prompt prefix.
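A minimal sketch of the idea, using a toy single-head attention in pure Python (all names, dimensions, and the `embed`/`process` helpers are illustrative, not any real model's API): processing the shared prompt once yields a KV snapshot, and extending that snapshot for each continuation produces the same output as re-reading the whole prompt from scratch.

```python
import math

D = 4  # embedding dimension (toy value)

def attend(q, ks, vs):
    """Causal attention for one query over the cached keys/values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(D) for k in ks]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, vs)) for j in range(D)]

def embed(tok):
    # Stand-in for real token embeddings: a deterministic pseudo-vector.
    return [math.sin(tok * (j + 1)) for j in range(D)]

def process(tokens, kv=None):
    """Run tokens through the 'model', extending a prior KV cache if given."""
    ks, vs = ([], []) if kv is None else (list(kv[0]), list(kv[1]))
    out = None
    for t in tokens:
        x = embed(t)
        ks.append(x)  # in a real model these would be K = W_k x, V = W_v x
        vs.append(x)
        out = attend(x, ks, vs)
    return out, (ks, vs)

# Process the shared prompt once and snapshot its KV cache.
_, prompt_kv = process([1, 2, 3])

# Two different continuations reuse the snapshot instead of re-reading the prompt.
out_a, _ = process([7], kv=prompt_kv)
out_b, _ = process([9], kv=prompt_kv)

# Recomputing from scratch gives the same result as reusing the cache.
full_a, _ = process([1, 2, 3, 7])
assert all(abs(a - b) < 1e-12 for a, b in zip(out_a, full_a))
```

Because attention is causal, the keys and values for the prompt tokens don't depend on anything that comes after them, which is why the snapshot is valid for any continuation — and why only a shared prefix (not an arbitrary shared substring) is cacheable this way.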

