coroxout on Jan 30, 2015 | on: Thousands of early English books released online t...

It's precisely those idiosyncrasies of early modern orthography which make it difficult to use an off-the-shelf OCR package, which is presumably why these are hand-transcribed instead. Perhaps there is a specialist antiquarian OCR package which can deal with long s, interchangeable u and v, non-standardised spelling, etc., but I have yet to come across one.
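To make the long s and u/v point concrete, here is a rough Python sketch of the kind of normalisation a transcription would need before it could be matched against modern spellings. The function name `normalise_word` and both u/v rules are illustrative assumptions, not taken from any existing OCR package, and they deliberately cover only lower-case words and the simplest cases.

```python
import re

LONG_S = "\u017f"  # the long s character

def normalise_word(word: str) -> str:
    """Heuristic modernisation of one lower-case early modern word.

    Hypothetical helper for illustration only; real texts need far
    more context than these three rules provide.
    """
    # Long s is always a plain s in modern spelling.
    w = word.replace(LONG_S, "s")
    # Word-initial v before a consonant usually stands for u ("vpon").
    w = re.sub(r"^v(?=[^aeiouy])", "u", w)
    # u between two vowels usually stands for v ("loue", "haue").
    w = re.sub(r"(?<=[aeiou])u(?=[aeiou])", "v", w)
    return w

if __name__ == "__main__":
    for sample in ["vpon", "loue", "\u017felfe", "euery", "queen"]:
        print(sample, "->", normalise_word(sample))
```

Even this toy version shows why the problem is hard: the rules apply after recognition, so they do nothing for an OCR engine that misreads long s as f in the first place, and non-standardised spellings such as "selfe" pass through untouched.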

acdha on Jan 30, 2015

Have you looked at the Early Modern OCR Project (eMOP)? My understanding is that they're working on exactly that, as well as on better tools for reviewing and retraining on a large scale:

http://emop.tamu.edu/

coroxout on Jan 31, 2015

No, I hadn't, and am grateful for the link - thank you!
