Grants:IdeaLab/Djvu text layer editor
What is the problem you're trying to solve?
Wikisource relies heavily on the OCR text layer, but actually uses only a small part of it (the bare text). The DjVu text layer contains much more information (words, lines, paragraphs, regions, columns, and the page coordinates of the text). Unfortunately, this information is more easily exported in a Lisp-like format or as XML than as hOCR.
What is your solution?
- To test VE (VisualEditor) or other, simpler WYSIWYG HTML/XML editors for editing the text only, while keeping the information wrapped in XML tags when saving;
- to test extraction of the text layer from DjVu files, and upload of the edited layer back into them, through a simple web interface.
Ideas for a test tool
A test could be done with existing tools:
- DjVuLibre (running on Tool Labs), and particularly:
- djvutoxml, which extracts the internal mapped text of DjVu pages as an XML file;
- djvuxmlparser, which loads the modified mapped text back into the DjVu file;
- tinyEditor, to edit the XML text in a comfortable WYSIWYG interface (XML tags are hidden; only the editable text is shown, inside an HTML textarea);
- a little bit of CGI on Tool Labs to manage such a web editing interface.
- Proofreading could then be split into two steps:
- DjVu text editing (saving the result into the DjVu text layer);
- text formatting.
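
As a rough illustration of the round trip described above, here is a minimal Python sketch. It assumes the XML written by djvutoxml nests the text as HIDDENTEXT / PAGECOLUMN / REGION / PARAGRAPH / LINE / WORD, with each WORD carrying a `coords` attribute; the sample page, the file names and the helper functions are hypothetical, and the command-line options shown should be checked against the DjVuLibre man pages before use. The point is that only the text of a word is changed, while its coordinates are left untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-line page, shaped like the XML that djvutoxml is assumed to produce.
SAMPLE_XML = """<DjVuXML><BODY>
<OBJECT data="file://localhost/book.djvu" type="image/x.djvu" height="3300" width="2550">
<HIDDENTEXT><PAGECOLUMN><REGION><PARAGRAPH>
<LINE><WORD coords="100,220,260,180">Wikisourcc</WORD> <WORD coords="280,220,360,180">text</WORD></LINE>
</PARAGRAPH></REGION></PAGECOLUMN></HIDDENTEXT>
</OBJECT></BODY></DjVuXML>"""


def export_cmd(djvu_path, xml_path, page):
    """Command line to extract the text layer of one page (assumed options)."""
    return ["djvutoxml", "--page", str(page), "--with-text", djvu_path, xml_path]


def import_cmd(xml_path, djvu_path):
    """Command line to load the edited XML back into the DjVu file (assumed options)."""
    return ["djvuxmlparser", "-o", djvu_path, xml_path]


def list_words(root):
    """Return (text, coords) pairs for every WORD element, in document order."""
    return [(w.text or "", w.get("coords")) for w in root.iter("WORD")]


def correct_word(root, index, new_text):
    """Replace the text of the index-th word; its coords attribute is not modified."""
    words = list(root.iter("WORD"))
    words[index].text = new_text


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_XML)
    correct_word(root, 0, "Wikisource")  # fix an OCR error in the first word
    for text, coords in list_words(root):
        print(coords, text)
    # These lists could be passed to subprocess.run() on a host where DjVuLibre is installed.
    print(" ".join(export_cmd("book.djvu", "page1.xml", 1)))
    print(" ".join(import_cmd("page1.xml", "book.djvu")))
```

In a web tool, the editor would play the role of `correct_word`: the user sees and changes only the word text, and the surrounding tags and coordinates go back into the DjVu file unchanged.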