For a project that required saving dozens of printed pages as text documents, I used the Tesseract OCR program together with the Netpbm library. Here’s what the three-step workflow looks like.
Yudit text editor
Hindi vocabulary list
Back from the Landour Language School in Mussoorie. I copied the Hari Kitaab (Green Book) vocabulary lists up to and including lesson 18, typed them in Yudit, and pasted them into an OpenOffice document.
Download the Hindi vocabulary (PDF 135kB)