Community Wishlist Survey 2017/Wikisource/Specify transcription completion with more granularity

From Meta, a Wikimedia project coordination wiki

Specify transcription completion with more granularity

  • Problem: Currently, the Wikisource revision system only allows a single global completion status for a transcription, whereas a more flexible solution supporting multiple, extensible sets of criteria would be welcome.
  • Who would benefit: Anybody interested in having fine-grained information about transcription completion status.
    • To give a very concrete example, one might want to study the evolution of hyphenation in a subset of the Wikisource corpus. But currently, hyphenation is often dropped in the transcription process, and even when it is taken into account, there is no obvious way to query which transcriptions preserve it and which do not, nor to see the completion status for this criterion in the work completion overview.
      In this precise case, part of the problem might be solved through categories. For example, on the French Wikisource, there is the template Césure, which allows one to transcribe the text with hyphenation. It then renders the text hyphenated when viewed in the Page namespace, and unhyphenated otherwise, such as when it is transcluded in the main namespace. This template could add a category stating that the page uses it. However, also recording the level to which the page is completed with regard to the hyphenation criterion would be cumbersome, and it wouldn't allow a quick overview of progress on this topic in the Livre (Work) namespace.
    • Additionally, this would prevent pages from staying in an "uncompleted" status when the transcription has been done and reviewed and only the layout has not yet been made to match the original page as closely as possible. That distinction is useful information. Indeed, the transcription is not globally complete, but for merely reading the text through transclusion in the main namespace, it is wrong to state that the work is not complete.
  • Proposed solution:
    • Allow users to input the status of a transcription along an extensible set of parameters, such as rates of sign matching, layout matching, and so on, for things like tables and trees, which might have a proper rendering but an improper HTML structure, or the opposite.
    • Allow users to switch criteria in the transcription completion overview of the work.
    • Possibly, a "global completion" criterion should provide a weighted mix of all existing criteria.
  • More comments: This also pertains to the remark of @Alex brollo: above about the true digitization of an edition.
  • Phabricator tickets:
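
To make the proposed solution above more concrete, here is a minimal Python sketch of how per-criterion statuses and a weighted "global completion" could be modelled. It is only an illustration under stated assumptions: the names PageStatus, global_completion and work_overview, the criterion names, the 0.0 to 1.0 rate scale and the weights are all hypothetical, and none of this reflects an existing Wikisource or MediaWiki API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PageStatus:
    """Completion rates (0.0 to 1.0) per criterion for one page (hypothetical model)."""
    title: str
    rates: Dict[str, float] = field(default_factory=dict)

    def rate(self, criterion: str) -> float:
        # Assumption: a criterion that nobody has assessed yet counts as 0.0.
        return self.rates.get(criterion, 0.0)


def global_completion(page: PageStatus, weights: Dict[str, float]) -> float:
    """Weighted mix of all criteria, normalised by the sum of the weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * page.rate(c) for c, w in weights.items()) / total


def work_overview(pages: List[PageStatus], criterion: str) -> float:
    """Average completion of a single criterion across the pages of a work."""
    if not pages:
        return 0.0
    return sum(p.rate(criterion) for p in pages) / len(pages)


# Illustrative data only: page titles, criteria and weights are made up.
pages = [
    PageStatus("Page:Example.djvu/1", {"text": 1.0, "layout": 0.5, "hyphenation": 0.0}),
    PageStatus("Page:Example.djvu/2", {"text": 1.0, "layout": 1.0}),
]
weights = {"text": 2.0, "layout": 1.0, "hyphenation": 1.0}

for page in pages:
    print(page.title, global_completion(page, weights))
for criterion in weights:
    print(criterion, work_overview(pages, criterion))
```

With such a model, switching the criterion in the work overview amounts to calling work_overview with a different criterion name, and a work whose text is fully reviewed can be reported as complete for reading even while its layout or hyphenation rates are still low.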