moz-hocr-edit uses hOCR, a format for representing the output of OCR software that is used by tools such as OCRopus (you can see the public specification for hOCR here). Basically it's a microformat: ordinary HTML in which the OCR information (pages, lines, bounding boxes) is carried in class and title attributes. Given some hOCR, moz-hocr-edit enables you to edit the OCR output line by line.
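To make the format a little more concrete, here is a minimal Python sketch that pulls the text lines and their bounding boxes out of a small hOCR fragment. The fragment is made up for illustration (it is not taken from the demo document), and `parse_bbox`, `HocrLineParser` and `extract_lines` are hypothetical helper names, not part of moz-hocr-edit. It assumes lines are marked with the `ocr_line` class and carry a `bbox x0 y0 x1 y1` property in their `title` attribute.

```python
from html.parser import HTMLParser

# Made-up hOCR fragment for illustration only (not from the demo document).
SAMPLE_HOCR = """
<div class="ocr_page" title="bbox 0 0 1000 1500">
  <span class="ocr_line" title="bbox 100 120 900 160">Case 3368</span>
  <span class="ocr_line" title="bbox 100 170 900 210">Eatoniella Dall, 1876</span>
</div>
"""

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "wbr"}


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrLineParser(HTMLParser):
    """Collect (bbox, text) for every element whose class includes ocr_line."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._depth = 0      # > 0 while we are inside an ocr_line element
        self._bbox = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested element (for example a word-level span) inside a line.
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        attr = dict(attrs)
        if "ocr_line" in (attr.get("class") or "").split():
            self._depth = 1
            self._bbox = parse_bbox(attr.get("title") or "")
            self._chunks = []

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._chunks).strip()
                self.lines.append((self._bbox, text))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_lines(hocr):
    parser = HocrLineParser()
    parser.feed(hocr)
    parser.close()
    return parser.lines


if __name__ == "__main__":
    for bbox, text in extract_lines(SAMPLE_HOCR):
        print(bbox, text)
```

Because each line is an ordinary HTML element with its coordinates in the `title` attribute, an editor can change the text of a line without touching its position on the page.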
I've created a simple demo based upon "Case 3368 Eatoniella Dall, 1876 and EATONIELLIDAE Ponder, 1965 (Mollusca, Gastropoda): proposed conservation". For the demo to work you will need to use the Firefox web browser with the moz-hocr-edit add-on installed.
- Go to http://dl.dropbox.com/u/639486/hocr/80780.html
- You will see a simple HTML representation of the OCR text from "Case 3368 Eatoniella Dall, 1876 and EATONIELLIDAE Ponder, 1965 (Mollusca, Gastropoda): proposed conservation". I created this HTML from the original ABBYY FineReader XML from the Internet Archive.
- In the bottom right-hand corner of the Firefox browser window you should see hOCR. Click on it and select "Edit this hOCR document".
- Firefox will open a new tab containing the hOCR editor.
- You can now edit individual lines of text, and see your edits applied to the HTML below.
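For anyone curious about the conversion mentioned above (ABBYY FineReader XML to hOCR-style HTML), here is a rough Python sketch of one way it could be done. It is not the script used to build the demo. It assumes the ABBYY file has `page` elements with `width` and `height` attributes, `line` elements with `l`, `t`, `r`, `b` coordinates, and one `charParams` element per character; the sample XML below is hand-written and heavily simplified, and `local_name` and `abbyy_to_hocr` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hand-written, heavily simplified sample in the style of ABBYY FineReader XML.
# Real files carry coordinates and confidence values on every charParams element.
SAMPLE_ABBYY = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader6-schema-v1.xml">
  <page width="1000" height="1500">
    <block blockType="Text" l="100" t="120" r="900" b="210">
      <text>
        <par>
          <line l="100" t="120" r="900" b="160">
            <formatting><charParams>C</charParams><charParams>a</charParams><charParams>s</charParams><charParams>e</charParams></formatting>
          </line>
          <line l="100" t="170" r="900" b="210">
            <formatting><charParams>3</charParams><charParams>3</charParams><charParams>6</charParams><charParams>8</charParams></formatting>
          </line>
        </par>
      </text>
    </block>
  </page>
</document>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def abbyy_to_hocr(xml_text):
    """Return hOCR-style HTML with one ocr_line span per ABBYY line element."""
    root = ET.fromstring(xml_text)
    out = []
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        out.append('<div class="ocr_page" title="bbox 0 0 %s %s">'
                   % (page.get("width", "0"), page.get("height", "0")))
        for line in page.iter():
            if local_name(line.tag) != "line":
                continue
            # Join the per-character elements back into the text of the line.
            text = "".join(cp.text or "" for cp in line.iter()
                           if local_name(cp.tag) == "charParams")
            # ABBYY left/top/right/bottom maps onto hOCR's x0 y0 x1 y1.
            bbox = " ".join(line.get(k, "0") for k in ("l", "t", "r", "b"))
            out.append('  <span class="ocr_line" title="bbox %s">%s</span>'
                       % (bbox, escape(text)))
        out.append("</div>")
    return "\n".join(out)


if __name__ == "__main__":
    print(abbyy_to_hocr(SAMPLE_ABBYY))
```

The sketch only keeps line-level text and coordinates, which is all that line-by-line editing needs; paragraphs, fonts and per-character confidence are dropped.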