> even proprietary content like the books themselves
This definitely raises an interesting question. A good chunk of popular literature (especially from the 2000s) exists online in big HTML files. House of Leaves, Infinite Jest, Harry Potter, basically any Stephen King book - they've all been posted somewhere at some point.

Do LLMs have a good way of inferring where knowledge from the context ends and knowledge from the training data begins?