Grants:IdeaLab/Djvu text layer editor

status: idea
idea creator:
project contact:
alex.brollo(_AT_)gmail.com
participants:
summary:
Use some VisualEditor (VE) features to edit the DjVu text layer
created on: 12:10, 13 March 2014

Project idea

What is the problem you're trying to solve?

Wikisource makes heavy use of the OCR text layer, but effectively uses only a small part of it (the bare text). The DjVu text layer contains much more information (words, lines, paragraphs, regions, columns, and the page coordinates of the text); unfortunately, it is more easily exported in a Lisp-like format or as XML than as hOCR.
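To illustrate the kind of information involved, here is a minimal sketch that reads words and their coordinates from an XML text layer. The sample fragment is hand-written for illustration; it assumes the nested HIDDENTEXT / PAGECOLUMN / REGION / PARAGRAPH / LINE / WORD layout with a `coords` attribute on each word, and real exported files should be checked before relying on it. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, assumed to resemble an exported text layer.
SAMPLE = """<HIDDENTEXT>
  <PAGECOLUMN>
    <REGION>
      <PARAGRAPH>
        <LINE>
          <WORD coords="100,200,180,170">Hello</WORD>
          <WORD coords="190,200,260,170">wrold</WORD>
        </LINE>
      </PARAGRAPH>
    </REGION>
  </PAGECOLUMN>
</HIDDENTEXT>"""


def extract_words(xml_text):
    """Return (text, coords) pairs for every WORD element, in document order."""
    root = ET.fromstring(xml_text)
    words = []
    for word in root.iter("WORD"):
        # coords is kept as a plain tuple of integers; no meaning is
        # assigned to the individual values here.
        raw = word.get("coords", "")
        coords = tuple(int(c) for c in raw.split(",") if c)
        words.append((word.text or "", coords))
    return words


def plain_text(xml_text):
    """Return the bare text, one output line per LINE element."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iter("LINE"):
        lines.append(" ".join((w.text or "").strip() for w in line.iter("WORD")))
    return "\n".join(lines)
```

`plain_text` gives the "naked text" that Wikisource uses today, while `extract_words` shows the extra mapping data that is normally discarded.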

What is your solution?

  • To test VE or other, simpler WYSIWYG HTML/XML editors for editing the text only, while preserving the information wrapped in the XML tags;
  • to test extraction of the text layer from DjVu files, and upload of the edited text layer back into them, using a simple web interface.
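The "edit the text only" idea can be sketched as follows: the word text is replaced while every tag and attribute is left untouched. This is only an illustration under assumptions; the sample line is hand-written, and `apply_corrections` with its index-based `corrections` mapping is a hypothetical helper, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Hand-written sample line with one OCR error ("teh").
SAMPLE_LINE = (
    '<LINE>'
    '<WORD coords="10,40,60,20">teh</WORD>'
    '<WORD coords="70,40,130,20">text</WORD>'
    '</LINE>'
)


def apply_corrections(xml_text, corrections):
    """Replace the text of selected WORD elements, keeping tags and coords.

    corrections maps a word's position (0-based, document order)
    to its corrected text.
    """
    root = ET.fromstring(xml_text)
    for index, word in enumerate(root.iter("WORD")):
        if index in corrections:
            word.text = corrections[index]
    # Serialize back to a string; attributes such as coords are unchanged.
    return ET.tostring(root, encoding="unicode")
```

For example, `apply_corrections(SAMPLE_LINE, {0: "the"})` returns the same line with the first word corrected and both `coords` attributes intact. A WYSIWYG editor would play the role of producing the `corrections` mapping.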

Ideas for a test tool

A test could be done with existing tools:

  • DjVuLibre (running on Tool Labs), and particularly:
    • djvutoxml, which extracts the internal mapped text of DjVu pages as an XML file;
    • djvuxmlparser, which loads the modified mapped text back into the DjVu file;
  • tinyEditor, to edit the XML text through a comfortable WYSIWYG interface (XML tags are hidden; only the editable text is shown, in any HTML textarea);
  • a little CGI on Tool Labs to manage such a web editing interface.
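The extract/upload round trip with the two DjVuLibre programs could be driven from a small script, roughly as sketched below. The helper names and file names are hypothetical, and the command-line options used (`--page` for djvutoxml, `-o` for djvuxmlparser) are written from memory of the DjVuLibre manual pages, so they should be verified against the installed version.

```python
import subprocess


def djvutoxml_cmd(djvu_path, xml_path, page=None):
    """Build the command that exports the text layer to an XML file."""
    cmd = ["djvutoxml"]
    if page is not None:
        # Assumed option: restrict the export to a single page.
        cmd += ["--page", str(page)]
    cmd += [djvu_path, xml_path]
    return cmd


def djvuxmlparser_cmd(xml_path, djvu_path=None):
    """Build the command that loads an edited XML file back into a DjVu file."""
    cmd = ["djvuxmlparser"]
    if djvu_path is not None:
        # Assumed option: name the DjVu file that receives the result.
        cmd += ["-o", djvu_path]
    cmd.append(xml_path)
    return cmd


def run(cmd):
    """Run one command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

A web interface would call `run(djvutoxml_cmd("book.djvu", "page.xml", page=3))`, hand the XML to the editor, and then call `run(djvuxmlparser_cmd("page.xml", "book.djvu"))`. Keeping command construction separate from execution makes the commands easy to inspect before anything touches a file.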

Project goals

  • To split proofreading into two steps:
    • DjVu text editing (saving the result into the DjVu text layer);
    • text formatting.

Get involved

Welcome, brainstormers! Your feedback on this idea is welcome. Please click the "discussion" link at the top of the page to start the conversation and share your thoughts.

See also

