Community Wishlist Survey 2020/Archive/hOCR should work for all wikisource

From Meta, a Wikimedia project coordination wiki

hOCR should work for all wikisource

Merged into Community Wishlist Survey 2020/Wikisource/New OCR tool.

  • Problem: hOCR does not work for non-Latin-script Wikisources. At present, the Phetools hOCR tool creates a Tesseract OCR text layer for all Latin-script Wikisources. For Indic Wikisources, we have a temporary proprietary solution, Google OCR, to do this. I therefore propose that Phetools work for all non-Latin-script Wikisources, including the 12 Indic Wikisources.
  • Who would benefit: All contributors to non-Latin-script Wikisources.
  • Proposed solution: Implement the same approach as on enws and frws, and create the OCR text layer by updating langdata.
  • More comments: This proposal was merged into Community Wishlist Survey 2020/Wikisource/New OCR tool.
  • Phabricator tickets: phab:T228594
  • Proposer: Jayantanth (talk) 16:56, 26 October 2019 (UTC)


Phe's tool has suffered some serious difficulties recently and nobody seems to be able to solve them; see phab:T228594. That is why I have suggested replacing this external tool with a brand-new tool that would be an integral part of MediaWiki and would not depend on the availability of a single, unreachable volunteer (see the proposal New OCR tool). I suggest merging our proposals. --Jan.Kamenicek (talk) 20:24, 26 October 2019 (UTC)

@Jan.Kamenicek, thanks for your reply. The two proposals can be merged into one, if you have no objection. Jayantanth (talk) 07:23, 27 October 2019 (UTC)
@Jayantanth: I have merged it. Can you please check whether I worded it properly there? Thanks! --Jan.Kamenicek (talk) 11:59, 27 October 2019 (UTC)