Looking at the hOCR format.


There are also GUI programs that wrap tesseract in a graphical interface, e.g. gscan2pdf.
It can also create hOCR.
hOCR is highly structured HTML.
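
To get a feel for what "highly structured" means, here is a rough sketch that pulls words and their bounding boxes out of a small hOCR fragment using only the Python standard library. The sample markup is hand-written for illustration (not real tesseract or gscan2pdf output), and the names SAMPLE_HOCR, HocrWordParser, parse_bbox and extract_words are made up for this note. It assumes words are marked with class "ocrx_word" and carry a "bbox x0 y0 x1 y1" property in the title attribute, as the hOCR spec describes, and that word spans do not contain nested spans.

```python
from html.parser import HTMLParser

# Hand-written sample for illustration only; real OCR output has more attributes.
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 640 480'>
  <span class='ocr_line' title='bbox 10 20 300 50'>
    <span class='ocrx_word' title='bbox 10 20 90 50; x_wconf 96'>Hello</span>
    <span class='ocrx_word' title='bbox 100 20 180 50; x_wconf 91'>world</span>
  </span>
</div>
"""


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None if absent."""
    # Properties in the title are separated by ';', e.g. "bbox 10 20 90 50; x_wconf 96".
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) pairs for every element with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            self._in_word = True
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumption: a word span holds no nested <span>, so the next </span> closes it.
        if self._in_word and tag == "span":
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._in_word = False


def extract_words(hocr):
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE_HOCR):
        print(text, bbox)
```

The same approach should extend to lines or paragraphs by matching other class names such as "ocr_line", though real files may need a more forgiving parser.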