From print to an OpenOffice text document in three steps:
Note that a little command-line proficiency is required here.
Scan all documents.
The printed text consisted of A4 pages as well as small (A5) booklets.
The A4 pages were scanned in full, the booklets two pages at a time. Scanner settings: 300 dpi, B/W (1 bit), saved as .tif files with a sequence number suffixed to the filename.
Process the .tif files in batch with this shell script. Adjust it to your specific needs, e.g. the pixel size of the scan images.
Relying heavily on the spellchecker, use OpenOffice to clean up the raw .txt and save the final text as .odt.
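The batch step can be sketched roughly as follows. This is not the script referred to above, only a minimal illustration: the `scan_NNN.tif` naming, the `raw.txt` output name, and the function names `ocr_batch` and `mm_to_px` are assumptions made for the example. It also assumes the Dutch language data (`nld`) is installed for Tesseract. Splitting the two-page booklet scans is not covered here; `mm_to_px` only shows the arithmetic for working out pixel sizes at a given resolution.

```shell
#!/bin/sh
# Sketch only: OCR every scan_*.tif in a directory and collect the text.
# Assumed for illustration: the scan_NNN.tif naming, raw.txt as output name.

# Convert a length in millimetres to pixels at a given resolution,
# rounded to the nearest pixel (1 inch = 25.4 mm).
mm_to_px() {
    mm=$1
    dpi=$2
    echo $(( (mm * dpi * 10 + 127) / 254 ))
}

# Run tesseract on each scan and append the results to <dir>/raw.txt.
ocr_batch() {
    dir=$1
    lang=${2:-nld}                  # Dutch by default, as in the sample text
    out="$dir/raw.txt"
    : > "$out"                      # start with an empty result file
    # The glob expands in sorted order, so zero-padded sequence
    # numbers keep the pages in scan order.
    for tif in "$dir"/scan_*.tif; do
        [ -e "$tif" ] || continue   # no scans found: skip the literal pattern
        base=${tif%.tif}
        # tesseract writes its result to <base>.txt
        tesseract "$tif" "$base" -l "$lang" || return 1
        cat "$base.txt" >> "$out"
    done
}
```

Called as `ocr_batch ./scans nld`, this leaves one .txt file per scan plus the combined `raw.txt` for editing in OpenOffice. `mm_to_px 210 300` gives the width in pixels of an A4 page scanned at 300 dpi.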
Tesseract 3.00, MacPorts package
XDialog 2.3.1, also a MacPorts package
Sample text (Dutch)
The scanned original (text snippet taken from this lecture):
Tesseract OCR result:
The final text after editing: