Community Wishlist Survey 2015/Wikisource/arz

This page is a translated version of the page Community Wishlist Survey 2015/Wikisource and the translation is 4% complete.

Voting has CLOSED! Thanks for your votes!

Allow Copy of Pages

As a Wikibooks user, I would like to see a "copy this page" feature (source, target) next to the move feature. This would be great for ordinary users extending a book without touching the original one.

Example: Let's say you would like to improve a book about the programming language "Java". The book describes version 6 and you would like to start a new book on version 8. If it is worth having both books, then I see no way to preserve the original book while editing the text with respect to the original editors.

Example: Let's say two contributors have different but strong opinions about the future of a book. This typically leads to conflicts, in which some users leave the community. The copy-the-page feature would be a good technical solution to settle the conflict.

-- Qwertz84 (talk) 23:17, 9 November 2015 (UTC)
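Until such a button exists, the current text of a page can already be copied with two standard MediaWiki Action API calls: `action=query&prop=revisions` to read the latest wikitext, then `action=edit` to create the target page. The sketch below assumes a Wikibooks endpoint and uses hypothetical helper names; unlike mw:Extension:Duplicator it copies only the latest revision, not the edit history.

```python
import json
import urllib.parse
import urllib.request

# Assumed target wiki; any MediaWiki api.php endpoint works the same way
API = "https://en.wikibooks.org/w/api.php"
UA = {"User-Agent": "copy-page-sketch/0.1 (hypothetical example)"}


def read_params(source: str) -> dict:
    """Query parameters that return the latest wikitext of `source`."""
    return {
        "action": "query",
        "prop": "revisions",
        "titles": source,
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }


def extract_text(response: dict) -> str:
    """Pull the wikitext out of a formatversion=2 query response."""
    page = response["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]


def write_params(source: str, target: str, text: str, token: str) -> dict:
    """POST parameters for action=edit; createonly refuses to overwrite."""
    return {
        "action": "edit",
        "title": target,
        "text": text,
        # Naming the source in the summary keeps an attribution trail
        "summary": f"Copied from [[{source}]]",
        "createonly": "1",
        "token": token,  # CSRF token from action=query&meta=tokens
        "format": "json",
    }


def fetch_source_text(source: str) -> str:
    """Download the current wikitext of `source` (network access needed)."""
    url = API + "?" + urllib.parse.urlencode(read_params(source))
    req = urllib.request.Request(url, headers=UA)
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

The edit call has to be POSTed from a logged-in session with a valid CSRF token; `createonly` makes it fail rather than silently overwrite an existing target page.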

Earlier discussion and endorsements

Endorsed. Would be very helpful on Wikisource too... --Ernest-Mtl (talk) 03:35, 10 November 2015 (UTC)

This could be considered a use case for phab:T113004 (branching support), and there is a related proposal above. Cscott (talk) 18:47, 11 November 2015 (UTC)

@Qwertz84 and Ernest-Mtl: There is an extension to duplicate pages (with their edit history): mw:Extension:Duplicator. It works fine. -- Reise Reise (talk) 17:21, 14 November 2015 (UTC)

Votes

Oppose per Wikipedia:Content forking. Also looks like a duplicate of 2015 Community Wishlist Survey/Miscellaneous#Support for version branching for pages. MER-C (talk) 10:46, 1 December 2015 (UTC)
Support Would ease the work to be done when we have to work on different editions of the same book. --Ernest-Mtl (talk) 14:53, 1 December 2015 (UTC)
Support, this sounds like a good way to deal with s:Annotations#Clean texts Beleg Tâl (talk) 15:16, 1 December 2015 (UTC)
Support. We sometimes have different versions of the same text on he.wikisource (I assume it would be similarly useful in other languages as well), and a copy function would definitely be useful for duplicating existing text in order to edit each page according to a specific edition. --Nahum (talk) 19:27, 1 December 2015 (UTC)
Nahum, could you give an example on he.wikisource? John Vandenberg (talk) 08:34, 2 December 2015 (UTC)
Nahum may have a different example in mind, but an obvious one to me is the text of the Hebrew Bible ("Old Testament"), which is available on HEWS with and without diacritics, and with and without cantillation marks. Ijon (talk) 09:54, 14 December 2015 (UTC)
Support --Usien6 (talk) 21:12, 1 December 2015 (UTC) // For Wikibooks only!!
Support--Manlleus (talk) 15:57, 2 December 2015 (UTC)
Support --Le ciel est par dessus le toit (talk) 10:38, 3 December 2015 (UTC)
Support --Kasyap (talk) 15:40, 7 December 2015 (UTC)
Support ----Nrgullapalli (talk) 09:51, 8 December 2015 (UTC)

Better support for djvu files

DjVu is a very interesting open format for full book digitization, but MediaWiki uses DjVu files only as "proofreading tools". They could instead be a valuable output of Wikisource work, by thoroughly editing their text layer and making full use of their metadata. Even when they are used simply as "proofreading tools", much more could be done with the details of the text layer mapping, since it contains useful hints about formatting (text alignment and indentation, text size, paragraphs, blocks...) that are presently not used at all.

Here is a list of ideas:

to switch to the indirect DjVu structure (allowing faster access to individual pages with a browser DjVu reader extension);
to add a set of API requests, as an interface to all read-only DjVuLibre routines;
to add an API for editing the text layer via djvuxmlparser;
to allow minor changes to DjVu files (e.g. editing some words in the text layer) without re-uploading the whole DjVu file (the history of text edits could be saved in something like a revision history).

--Alex brollo (talk) 14:07, 10 November 2015 (UTC)
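Several of these ideas map fairly directly onto existing DjVuLibre command-line tools: `djvmcvt -i` converts a bundled document to the indirect form, `djvutxt` and `djvused ... print-txt` read the text layer, and `djvused ... set-txt ...; save` replaces one page's text layer. A minimal sketch, assuming the DjVuLibre binaries are installed on the server and file names contain no spaces; the function names are hypothetical and this is not an existing MediaWiki API.

```python
import subprocess


def to_indirect_cmd(bundled: str, out_dir: str, index: str = "index.djvu") -> list[str]:
    # djvmcvt -i splits a bundled document into one file per page plus an index,
    # so a reader can fetch single pages instead of the whole book
    return ["djvmcvt", "-i", bundled, out_dir, index]


def page_text_cmd(djvu: str, page: int) -> list[str]:
    # Read-only: plain hidden text of one page
    return ["djvutxt", f"--page={page}", djvu]


def page_layout_cmd(djvu: str, page: int) -> list[str]:
    # Read-only: text layer with zone coordinates (column, region, para, line, word),
    # i.e. the formatting hints that are currently unused
    return ["djvused", djvu, "-e", f"select {page}; print-txt"]


def set_page_text_cmd(djvu: str, page: int, txt_file: str) -> list[str]:
    # Write: replace one page's text layer from a file in djvused text syntax,
    # then save the modified document back into the same file
    return ["djvused", djvu, "-e", f"select {page}; set-txt {txt_file}; save"]


def run(cmd: list[str]) -> str:
    # Execute one command and return its standard output
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Keeping the command construction separate from `run` would let a hypothetical API module validate and log each request before touching the file.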

Earlier discussion and endorsements

Endorsed really, really useful in the Wikisource context. --AlessioMela (talk) 14:12, 10 November 2015 (UTC)

Endorsed --Yann (talk) 16:30, 10 November 2015 (UTC)
Endorsed Jayantanth (talk) 18:00, 11 November 2015 (UTC)
Endorsed -- George Orwell III (talk) 02:25, 22 November 2015 (UTC)
Endorsed I wanted to fix a detail in a DjVu file once and realized I had a program for that on my old PC, but not on my new Mac. It was too complicated and I gave up. Every step towards better support for DjVu files is welcome. --Alexmar983 (talk) 21:14, 26 November 2015 (UTC)

Votes

Support Goldzahn (talk) 12:53, 30 November 2015 (UTC)
Support--Alexmar983 (talk) 16:42, 30 November 2015 (UTC)
Support --Yodin T 17:34, 30 November 2015 (UTC)
Support John Vandenberg (talk) 01:29, 1 December 2015 (UTC)
Support Risker (talk) 04:15, 1 December 2015 (UTC)
Support--Kippelboy (talk) 05:41, 1 December 2015 (UTC)
Support--Candalua (talk) 08:57, 1 December 2015 (UTC)
Support --Accurimbono (talk) 09:05, 1 December 2015 (UTC)
Support--Shizhao (talk) 09:27, 1 December 2015 (UTC)
Support --Anika (talk) 09:35, 1 December 2015 (UTC)
Support --Xavier121 (talk) 09:42, 1 December 2015 (UTC)
Support--C.R. (talk) 12:02, 1 December 2015 (UTC)
Support--Jayantanth (talk) 14:33, 1 December 2015 (UTC)
Support--David Saroyan (talk) 14:46, 1 December 2015 (UTC)
Support --Arnd (talk) 15:00, 1 December 2015 (UTC)
Support--KRLS (talk) 15:07, 1 December 2015 (UTC)
Support Anubhab91 (talk) 15:54, 1 December 2015 (UTC)
Support--Arxivist (talk) 20:08, 1 December 2015 (UTC)
Support — George Orwell III (talk) 23:27, 1 December 2015 (UTC)
Support--Yodaspirine (talk) 12:59, 2 December 2015 (UTC)
Support — NickK (talk) 16:04, 2 December 2015 (UTC)
Support --AlessioMela (talk) 20:06, 2 December 2015 (UTC)
Support --Pymouss Tchatcher - 20:13, 4 December 2015 (UTC)
Support - Bcharles (talk) 22:16, 8 December 2015 (UTC)
Support --Davidpar (talk) 14:20, 14 December 2015 (UTC)

To implement a Internet Archive-like digitalization service

Like many other Wikisource users, I greatly appreciate the Internet Archive digitization service, and I use it as thoroughly as I can (DjVu files are only one of the many uses of the rich file set that can be downloaded; the collection of high-resolution JP2 images and the ABBYY XML are extremely interesting).

I'd like MediaWiki to implement a similar digitization environment, but with a wiki approach and a Wikisource-oriented philosophy: sharing the best possible applications for pre-OCR processing of book page images (splitting, rotating, cropping, dewarping... in brief, "ScanTailoring" images) and saving excellent lossless images from that pre-OCR work; then running the best possible OCR, with the ABBYY OCR engine or similar software if any exists, saving both the text and the full-detail OCR XML; then using the excellent images and the best possible OCR text to produce excellent searchable PDF and DjVu files; and finally, in a truly "wiki" step, fixing the embedded text through the usual proofreading work done on Wikisource.

This is a bold dream; a less bold idea is to get full access to the best, heavy IA files (jp2.zip and ABBYY XML) and to build tools that use them as thoroughly as possible.

--Alex brollo (talk) 07:08, 11 November 2015 (UTC)
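For the "less bold idea", the Internet Archive already exposes every file of an item through its metadata API (`https://archive.org/metadata/<identifier>`) and serves them under `https://archive.org/download/<identifier>/<file name>`. A minimal sketch that picks out the heavy derivatives from that file list; the suffixes `_jp2.zip`, `_abbyy.gz` and `_abbyy.xml` are an assumption based on common IA naming, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

IA = "https://archive.org"
# Assumed suffixes of the heavy derivatives; IA naming varies between items
WANTED = ("_jp2.zip", "_abbyy.gz", "_abbyy.xml")


def heavy_files(identifier: str, metadata: dict) -> dict[str, str]:
    """Map each wanted suffix to a download URL, using the metadata file list."""
    found = {}
    for entry in metadata.get("files", []):
        name = entry["name"]
        for suffix in WANTED:
            if name.endswith(suffix):
                found[suffix] = f"{IA}/download/{identifier}/{urllib.parse.quote(name)}"
    return found


def fetch_metadata(identifier: str) -> dict:
    """Fetch the item's JSON description from the metadata API (network needed)."""
    with urllib.request.urlopen(f"{IA}/metadata/{identifier}") as resp:
        return json.load(resp)
```

Usage would be `heavy_files(ident, fetch_metadata(ident))`; a Wikisource tool could then unpack the JP2 pages and parse the ABBYY XML instead of relying only on the DjVu derivative.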

Earlier discussion and endorsements

Endorsed. Jayantanth (talk) 17:42, 11 November 2015 (UTC)
Endorsed --Yann (talk) 14:06, 17 November 2015 (UTC)

Votes

Support --Yodin T 17:35, 30 November 2015 (UTC)
Support--Accurimbono (talk) 09:07, 1 December 2015 (UTC)
Oppose, since I believe it is out of scope for the current process; there are solutions for this outside of Wikisource, although I acknowledge this is something useful. Alleycat80 (talk) 09:08, 1 December 2015 (UTC)
Comment A surprising statement. Are you a busy user of Wikisource proofreading? --Alex brollo (talk) 17:30, 1 December 2015 (UTC)
Support--Shizhao (talk) 09:28, 1 December 2015 (UTC)
Support --Xavier121 (talk) 09:42, 1 December 2015 (UTC)
Support --Jayantanth (talk) 14:52, 1 December 2015 (UTC)
Support --Natkeeran (talk) 14:54, 1 December 2015 (UTC)
Support--KRLS (talk) 15:07, 1 December 2015 (UTC)
Support --Artem.komisarenko (talk) 19:28, 1 December 2015 (UTC)
Support--Barcelona (talk) 12:08, 2 December 2015 (UTC)
Support--Manlleus (talk) 15:57, 2 December 2015 (UTC)
Support — NickK (talk) 16:03, 2 December 2015 (UTC)
Support --AlessioMela (talk) 20:08, 2 December 2015 (UTC)
Support--Alexmar983 (talk) 23:23, 2 December 2015 (UTC)
Support - Wieralee (talk) 17:13, 4 December 2015 (UTC)
Support Lionel Scheepmans ✉ Contact (French native speaker, sorry for my poor spelling) 23:09, 4 December 2015 (UTC)
Support --Yeza (talk) 10:45, 7 December 2015 (UTC)
Support --Kasyap (talk) 15:40, 7 December 2015 (UTC)
Support --Davidpar (talk) 14:20, 14 December 2015 (UTC)
Support --Rahmanuddin (talk) 15:11, 14 December 2015 (UTC)

Tool to upload from Panjab Digital Library

Panjab Digital Library has 1,791 manuscripts and 8,996 books on its website. All the manuscripts are in the public domain, and many of the books are as well. Most of the manuscripts and books are in Punjabi, but some are in English, Hindi and Persian. Everything has been digitized as images, which are not searchable, and the images are published in a form that makes them quite difficult to download. I think a tool should be created to download all the manuscripts and books that are in the public domain. This would help develop Punjabi Wikisource as well as Punjab-related content on other Wikisources, and in turn help improve other projects too.

--Satdeep Gill (talk) 07:31, 13 November 2015 (UTC)

Earlier discussion and endorsements

We have BUB, which could be fixed and patched to support this source as well. The maintainers don't currently have time to make it work again. Nemo 09:49, 13 November 2015 (UTC)
Hi Nemo, a patch to support the Digital Library of India would also be appreciated. -- Bodhisattwa (talk) 11:47, 14 November 2015 (UTC)

Endorsed. We definitely need these sources to be made available in multiple formats and from multiple sources, as government sites always go missing all of a sudden. Another reason to upload these sources to Wikimedia projects is to overcome the technical difficulties in finding the material and researching it further through collaboration. Omshivaprakash (talk) 10:34, 14 November 2015 (UTC)

Endorsed the idea of fixing BUB or the DLI Downloader. --Subhashish Panigrahi (talk) 12:31, 14 November 2015 (UTC)

Endorsed--Charan Gill (talk) 15:14, 14 November 2015 (UTC)

Endorsed--Hundalsu (talk)

Endorsed--Dineshkumar Ponnusamy (talk) 08:43, 17 November 2015 (UTC)

Votes

Oppose One-off task that does not result in a long-lasting improvement to editor productivity; impact is limited to a small number of wikis. MER-C (talk) 10:03, 30 November 2015 (UTC)
Oppose per MER-C. We have bot tools to do one-off tasks like this. It may not be easy to find someone able and willing to do it. Sounds like a good Hackathon project. John Vandenberg (talk) 01:33, 1 December 2015 (UTC)
Neutral, could be implemented at https://tools.wmflabs.org/bub/index . Jayantanth (talk) 14:45, 1 December 2015 (UTC)
BUB has been down since August. --KRLS (talk) 15:07, 1 December 2015 (UTC)
Comment: By the way, BUB SHALL be fixed ASAP. It is a very useful tool for all Wikisources. --Accurimbono (talk) 08:17, 2 December 2015 (UTC)
Comment: this is a perennial proposal at grants; we need to get a Swartz interested enough to migrate texts to the Internet Archive. Good Global South project. Slowking4 (talk) 02:41, 3 December 2015 (UTC)

Tool to use Google OCRs in Indic language Wikisource

Tracked in Phabricator:
Task T120788

For a long time, Indic-language Wikisource projects depended entirely on manual proofreading, which wasted not only a lot of time but also a lot of energy. Recently Google released OCR software for more than 20 Indic languages. This software is far better and more accurate than previous OCR tools, but it has many limitations. Uploading the same large file twice (once for Google OCR and again to Commons) is not a practical solution for most contributors, as Internet connections are very slow in India. What I suggest is to develop a tool that can feed PDF or DjVu files already uploaded to Commons directly to Google OCR, so that uploading them twice can be avoided.

-- Bodhisattwa (talk) 13:50, 10 November 2015 (UTC)

Earlier discussion and endorsements

Endorsed --Yann (talk) 16:30, 10 November 2015 (UTC)
Endorsed Jayantanth (talk) 18:09, 10 November 2015 (UTC)
Endorsed, and other languages --Shizhao (talk) 02:28, 11 November 2015 (UTC)
Endorsed --Satdeep Gill (talk) 07:34, 13 November 2015 (UTC)
Endorsed --Pmlineditor (t · c · l) 16:34, 13 November 2015 (UTC)
Endorsed--Charan Gill (talk) 11:33, 14 November 2015 (UTC)
Endorsed--Omshivaprakash (talk) 12:02, 14 November 2015 (UTC)
Endorsed--Vikassy (talk) 16:26, 17 November 2015 (UTC)
Endorsed--Leutha (talk) 22:20, 20 November 2015 (UTC)
Endorsed -- George Orwell III (talk) 02:32, 22 November 2015 (UTC)

Votes

Support --Tobias1984 (talk) 11:35, 30 November 2015 (UTC)
Comment This sounds like it relies on an OCR service hosted by Google, similar to Yandex (see SaaSS). Which Google service is this? Has the legality of using the service been checked? John Vandenberg (talk) 01:37, 1 December 2015 (UTC)
Oppose. Yeah, this is SaaSS. Oppose per https://www.gnu.org/philosophy/who-does-that-server-really-serve.html. MER-C (talk) 08:43, 1 December 2015 (UTC)
Comment - There is no free and open-source OCR software for Indic languages that is as accurate as Google OCR. People have tried to build such free OCRs, but with no real luck so far, and we fear not in the near future either. Even WMF is not ready to develop free OCRs due to a lack of expertise and infrastructure, as stated recently at the Wikisource Conference 2015 in Vienna, even though it was acknowledged at the conference that this is one of the highest-priority needs of the Wikisource community. Google OCR is the only successful OCR available to us, so we just cannot ignore it because it is SaaSS. Ravi has explained this in detail below. -- Bodhisattwa (talk) 19:54, 1 December 2015 (UTC)
Support --Satdeep Gill (talk) 14:25, 1 December 2015 (UTC)
Support Most needed as of now for Indic Wikisource. We are suffering. Jayantanth (talk) 14:30, 1 December 2015 (UTC)
Support This would come in handy for many Wikisourcers. --Subhashish Panigrahi (talk) 14:43, 1 December 2015 (UTC)
Conditional support: We need to check with legal first, as the OCR service is hosted by a for-profit organisation; if the legal team gives the green light, then support, otherwise oppose. ~ Nahid Talk 14:44, 1 December 2015 (UTC)
Comment On Commons, there is a JavaScript gadget that uses Google Images to check whether an uploaded image is present on other websites. The gadget is listed in the preferences section of Commons. I don't think the legal team has any problem with that, and the gadget relies on the same for-profit organization. Besides, we are only talking about semi-automation here, just like that gadget, nothing more: just trying to make our lives easier. -- Bodhisattwa (talk) 19:25, 1 December 2015 (UTC)
Support, very much needed. Comment For many years, Wikisources have been using Tesseract, a free (Apache-licensed) OCR now sponsored by Google; are we talking about Tesseract or similar software? Regards, VIGNERON * discut. 14:47, 1 December 2015 (UTC)
Comment This is the OCR in Google Drive. It is not FOSS and can only be used via Google Drive or other Google services. Not the best situation, but the best practical solution for now. It should be viewed as a transitional solution until FOSS OCR solutions become effective. --Natkeeran (talk) 15:00, 1 December 2015 (UTC)
Thank you for this, but could you give more info? For instance: does this software have a name? A license? What are the "more than 20 Indic languages"? And do they all have a Wikisource? (No need to answer all my questions, but I'm curious.) Regards, VIGNERON * discut. 15:30, 1 December 2015 (UTC)
This is the Google OCR we are talking about. And these are the Wikimedia projects running in 22 Indian languages. As you can see, not all of them have Wikisource projects, but almost every major old language has one, and a few are also on the multilingual Wikisource. This proposal will help not only Indic-language contributors, but also others who face the same problem as us. At the bottom of this link there is a list of languages supported by Google OCR, and there are plenty. -- Bodhisattwa (talk) 19:31, 1 December 2015 (UTC)
Thank you for explaining that this service is part of w:Google Drive. Their terms of service do not allow accessing it "using a method other than the interface and the instructions that we provide." There is an official API; however, it does not allow upload by URL (only direct POST/PUT from the client), so I expect it is against their TOS to integrate Google Drive into any process that automatically transmits a document from Commons to Google Drive, as it would be the Wikimedia Foundation doing the upload instead of the end user. It may be more legal to upload to Google Drive first, mark the document as public, and then have Wikimedia Commons import it, with its OCR, from Google Drive. John Vandenberg (talk) 07:47, 2 December 2015 (UTC)
Support - This is a common task in many Indian languages, including Tamil. We are looking for a similar tool to upload scanned images via Google OCR into Wikisource. --Natkeeran (talk) 14:55, 1 December 2015 (UTC)
Support --KRLS (talk) 15:07, 1 December 2015 (UTC)
Strong support If we can do it, it would be excellent. -- Tito☸Dutta 15:12, 1 December 2015 (UTC)
Neutral Humbly acceptable, but only as a brief temporary solution while waiting for, and actively working on, free open-source software for an excellent "Wikisource OCR service". --Alex brollo (talk) 17:37, 1 December 2015 (UTC)
Comment I agree that building an accurate free open-source OCR is the only permanent solution. As discussed recently with WMF staff (Community Tech + Language Engineering) at the Wikisource Conference 2015 in Vienna, it became clear that WMF is not interested in developing OCR software due to a lack of infrastructure and expertise in this field. Besides, the other available FOSS-based OCRs are far from accurate and, practically speaking, useless. So integrating Google OCR is the only practical alternative available to us, and it is not only accurate but saves a lot of time and effort. -- Bodhisattwa (talk) 19:25, 1 December 2015 (UTC)
Wikisource Community User Group/Wikisource Conference 2015/Participants doesn't list anyone from mw:Wikimedia Language engineering. Who did you talk with? Nemo 14:28, 3 December 2015 (UTC)
We had a long discussion with Frances Hocutt, Software Engineer, WMF, about this matter. We also had a Skype session with Amir Aharoni, Software Engineer, Language Engineering team, WMF. Thanks - Bodhisattwa (talk) 14:43, 5 December 2015 (UTC)
Support Anubhab91 (talk) 15:53, 1 December 2015 (UTC)
Support There are Wikimedia projects running in 22 Indian languages. While many of the Wikipedias in these languages grow slowly owing to our socio-economic and political conditions, many Indian languages have rich sources of books and classics in the public domain dating back thousands of years. Unlike the Western world or Global North, we do not have Gutenberg-like projects with an army of volunteers to proofread and transcribe. Hardly 1 in 20 Wikipedians contributes to Wikisource projects, and only 1 in 10 million people becomes a Wikipedian. Stats of the leading Indian-language Wikisource projects for Malayalam, Tamil, Telugu and Bengali can be checked. And to better inform the global community, this is what we mean by Google OCR, and here is what we need: check this example page in Tamil Wikisource, where we have the Proofread Page extension installed. Google OCR will help us transcribe this page and make our proofreading job easier. But we need to upload images page by page to Google OCR; we can't upload more than 10 pages at a time, and we are also limited by the storage capacity of each person's Google Drive. Our Wikipedian T. Shrinivasan came up with this Python script to automate the process, but not everyone is tech-savvy enough to run it. What we need is an OCR solution that is as easy to use as the Proofread Page extension itself, or one that integrates with it. Even a third-party bookmarklet that interacts with Google would be enough. We have seen enough FOSS-based and other industry-grade OCR solutions that won't come anywhere near Google OCR's output in the next decade, simply because they cannot match Google's resources or approach to this problem. It is not a question of whether WMF should do this or whether it is within its operating principles of free software. In the past, WMF has redefined global norms when it believed that was in the best interest of our mission.
This is a matter of immense impact, and the question should be how the community can be helped. After all, the output will be available as free content in Wikimedia projects and will also be of great use for adding references to Wikipedia. If needed, the WMF should talk to Google to get an API or a special agreement that supports Wikisource, as Google, too, can only benefit from more content being added to the web. --Ravi (talk) 16:01, 1 December 2015 (UTC)
Oppose - No Google in Wikipedia, please. Singhalawap (talk) 17:01, 1 December 2015 (UTC)
Comment Can you please elaborate on your reason for opposing? As explained above, we are not incorporating Google into Wikipedia!!! For your information, we are using Google OCR in practically all Indic-language Wikisource projects every day. We just want to make the task semi-automated, that's all. -- Bodhisattwa (talk) 18:35, 1 December 2015 (UTC)
Support This will be an investment for the future, especially if it is a free and open-source OCR. I see that Tesseract (FOSS OCR software developed and sponsored by Google) supports some Indian languages; maybe extend it to Wikisource and improve it? Kenrick95 (talk) 01:48, 2 December 2015 (UTC)
Support --Sayant Mahato (talk) 04:20, 2 December 2015 (UTC)
Support This is very much needed. - Shubha (talk) 04:26, 2 December 2015 (UTC)
Support It's a shame there's no FOSS option, but this sounds like a pretty good way to go for the time being. — Sam Wilson ( Talk • Contribs ) … 08:53, 2 December 2015 (UTC)
Support But I also think WMF should try to negotiate a freer fair-use agreement with Google (if there are any restrictions). We should also invest in developing open OCR software for Indian languages once sufficient training data is available in Wiki projects. -- Sundar (talk) 08:58, 2 December 2015 (UTC)
Support This would not affect non-Indic Wikisources, but it would have a huge impact on Indic ones: I think "language equity" is an important goal for Wikimedia projects, so this is definitely something to do. Aubrey (talk) 09:03, 2 December 2015 (UTC)
Support - I would suggest that, if any automatic spell-check is available, it be included with an option to replace words with the suggested ones. This would save the time spent typing the correct word and speed up proofreading. --Sushant savla (talk) 09:22, 2 December 2015 (UTC)
Support - Reasons are well explained by Ravi (see above). -Nan (talk) 10:38, 2 December 2015 (UTC)
Strong Support --Mathanaharan (talk) 10:41, 2 December 2015 (UTC)
Support - In order to protect the anonymity of contributors, a solution through an API agreement between WMF and Google would be better. --Arjunaraoc (talk) 10:55, 2 December 2015 (UTC)
Support--Balurbala (talk) 12:19, 2 December 2015 (UTC)
Support This would come in handy for many Wikisourcers --Kurumban (talk) 13:07, 2 December 2015 (UTC)
Support I am of the opinion that this will help Tamil (my mother tongue) and other Indic languages --உமாபதி (talk) 13:20, 2 December 2015 (UTC)
Support --Sivakosaran (talk) 15:14, 2 December 2015 (UTC)
Support--Parvathisri (talk) 17:45, 2 December 2015 (UTC)
Oppose. I do understand the frustration of the Indian community that the only good OCR tool is a non-free tool. I am not sure, however, that we can do anything here unless Google is generous enough to release the source code of their OCR under a free license. In the end this request depends on Google and not on the Wikimedia Foundation, as WMF can do nothing without a number of actions by Google; thus it is not a task for Community Tech — NickK (talk) 16:09, 2 December 2015 (UTC)
Support - this is a game changer: Wikisource can go where Gutenberg does not. Language support in the Global South. Slowking4 (talk) 02:38, 3 December 2015 (UTC)
Support --Vikassy (talk) 16:51, 3 December 2015 (UTC)
Support -- it will be very useful for Indic languages. --சஞ்சீவி சிவகுமார் (talk) 08:07, 4 December 2015 (UTC)
Support --Pymouss Tchatcher - 20:15, 4 December 2015 (UTC)
Strong Support --ViswaPrabha വിശ്വപ്രഭ (talk) 23:51, 4 December 2015 (UTC)
Oppose The Google Drive OCR feature is not even a product with an available API. It is a proprietary feature within Google Drive. I don't know what you mean by integrating a non-existent product into Wikisource. The OCR engines Google has open-sourced are Tesseract and OCRopus. Tesseract is already integrated with Wikisource. I believe the proprietary feature in Google Drive is more an optimization of these engines. Why doesn't Wikimedia invest Community Tech effort in optimizing and improving Tesseract and OCRopus? The issue with Indian languages is the absence of financial support for people working in these domains. If Wikimedia can address that, we can easily beat it. Integrating a non-existent proprietary service is always a burden and does not help solve the OCR problem in the long run. -- AniVar (talk) 08:48, 5 December 2015 (UTC)
Support Very much useful. -தமிழ்க்குரிசில் (talk) 13:06, 5 December 2015 (UTC)
Support -- நி.மாதவன் (talk)
Support --Kasyap (talk) 15:39, 7 December 2015 (UTC)
Support This will save many man-hours. Yohannvt (talk) 07:23, 8 December 2015 (UTC)
Strong support This is an obvious idea that any wiki librarian working on Indic-language Wikisources would have. It will be handy, helpful and a boost to Indic Wikimedians. So, I support this. --Pavan santhosh.s (talk) 07:59, 8 December 2015 (UTC)
Support -- Mayooranathan (talk) 18:39, 8 December 2015 (UTC)
Support As the Indian-language communities have no other good option, it should be implemented. --Jnanaranjan sahu (talk) 18:54, 8 December 2015 (UTC)
Support - If an API solution cannot be negotiated with Google, then something along the lines of user:John_Vandenberg's suggestion under point 8 above. Bcharles (talk) 23:03, 8 December 2015 (UTC)
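The semi-automated workflow discussed above (page images sent to Google Drive's OCR in small batches, with the upload done by the end user rather than by WMF) can be sketched as below. Assumptions, all loud: the 10-page batch size comes from Ravi's comment, not from any documented limit; the request shape follows the Google Drive v3 `files.create` upload, where storing an image as a Google Docs document triggers OCR and `ocrLanguage` gives a language hint; the helper names are hypothetical, and nothing here resolves the terms-of-service question raised in the discussion.

```python
from typing import Iterator

GDOC_MIME = "application/vnd.google-apps.document"
UPLOAD_URL = "https://www.googleapis.com/upload/drive/v3/files"


def batches(pages: list[str], size: int = 10) -> Iterator[list[str]]:
    """Yield page files in groups of at most `size` (10, as reported above)."""
    for start in range(0, len(pages), size):
        yield pages[start:start + size]


def ocr_upload_request(image_name: str, lang: str) -> tuple[str, dict, dict]:
    """URL, query parameters and metadata for one Drive v3 OCR upload.

    Asking Drive to store an image as a Google Docs document makes it convert
    the image, which runs OCR; ocrLanguage is an ISO 639-1 hint such as "ta".
    """
    params = {"uploadType": "multipart", "ocrLanguage": lang}
    metadata = {"name": image_name, "mimeType": GDOC_MIME}
    return UPLOAD_URL, params, metadata
```

The actual upload would be sent with the contributor's own OAuth credentials, and the recognized text could then be exported as `text/plain` (Drive `files.export`) and pasted into the Page: namespace for proofreading.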

Visual Editor adapted for Wikisource

Tracked in Phabricator:
Task T48580

Currently, Wikisource uses the old but reliable wikitext editor. This requires all Wikisource contributors to know lots of templates that differ from one Wikisource language to another. A special version of the Visual Editor adapted to Wikisource's needs would facilitate inter-language help on Wikisource and make things easier for new contributors to the Wikisource projects. Dedicated buttons in this adapted Visual Editor for titles, centering, left- or right-aligned text, drawing lines, etc. would be easy to learn, especially if they followed the general look of a word processor, and would help bring people to Wikisources in other languages...

Placing a title in French, English, Spanish or Croatian would then be the same thing: selecting the text and pressing a button, not using a differently named template depending on which Wikisource you are on. Many people could help proofread pages in different languages, for example with a global project of the week... Myself, being a French-speaking Canadian, I could proofread in French, English, Russian, Ukrainian and Spanish, but I'd need to know all the different templates in all these languages, and since I am not as fluent in these languages as in my native one, it is sometimes difficult to find things and search on the other Wikisource projects... But nothing would prevent me from helping on any of those, or even on Italian, Bulgarian or Portuguese special projects... These use the same alphabets... Proofreading only requires being able to compare the original text of the book with the transcribed text... But not knowing all the templates on the other Wikisources prevents me from helping other communities...

The magic in all this is that we don't have to reinvent the wheel! I figure it would be easy to modify the current Visual Editor used on the other projects to condense Wikisource editing needs into a concise list of buttons for the most basic tasks, which would allow proofreading 95% or even more of the actual book pages... In the worst case, the remaining 5% would be done the old-fashioned way...

--Ernest-Mtl (talk) 03:31, 10 November 2015 (UTC) — WMCA
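The core of the proposal is that one button stands for the same action on every Wikisource, while each wiki keeps its own template names. A minimal sketch of that mapping, assuming a per-wiki configuration table; the table, the function name and the template names in it are illustrative only and would have to be checked against each Wikisource's actual conventions.

```python
# Hypothetical per-wiki table: which local template implements each button.
# The names are illustrative and would need checking on each Wikisource.
TEMPLATES = {
    "en.wikisource.org": {"center": "center", "right": "right"},
    "fr.wikisource.org": {"center": "Centré", "right": "Droite"},
}


def apply_button(site: str, action: str, text: str) -> str:
    """Wrap the selected text in the local template for `action`.

    Falls back to the unchanged text when the wiki has no mapping (the
    "old-fashioned way"). The explicit 1= keeps an "=" in the text from being
    read as a parameter name; a literal "|" would still need escaping.
    """
    name = TEMPLATES.get(site, {}).get(action)
    if name is None:
        return text
    return "{{" + name + "|1=" + text + "}}"
```

With this design a contributor only learns the buttons, and each community maintains its own row of the table, so adding a new Wikisource means adding configuration rather than changing the editor.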

Earlier discussion and endorsements

Endorsed Slowking4 (talk) 04:36, 11 November 2015 (UTC) The user interface is a major impediment to new users; make WSUG happy.

I believe the linked phab task (phab:T48580) is at least one of the prerequisites for this task. Generally speaking, Wikisource uses several extensions to MediaWiki core, and both Visual Editor and Parsoid need to have code added to support those extensions. cscott (talk) 18:57, 11 November 2015 (UTC)

Endorsed - VE for Wikisource would greatly lower the barriers to entry on what (IMHO) is the sister project with the greatest potential for rapid growth if a few resources were allocated to it. Wittylama (talk) 12:21, 23 November 2015 (UTC)

Votes

Support --Tobias1984 (talk) 11:36, 30 November 2015 (UTC)
Support VE is likely to see a much higher adoption rate on Wikisource than on Wikipedia, and the ability to extract high-fidelity Wikisource content as a DOM using Parsoid will be a very important step forward. John Vandenberg (talk) 01:42, 1 December 2015 (UTC)
Support Risker (talk) 04:17, 1 December 2015 (UTC)
Support--Kippelboy (talk) 05:41, 1 December 2015 (UTC)
Support --Candalua (talk) 08:57, 1 December 2015 (UTC)
Support --Accurimbono (talk) 09:08, 1 December 2015 (UTC)
Support --Anika (talk) 09:28, 1 December 2015 (UTC)
Support --Rahmanuddin (talk) 15:10, 14 December 2015 (UTC)
Support --Xavier121 (talk) 09:40, 1 December 2015 (UTC)
Support -- Bodhisattwa (talk) 13:27, 1 December 2015 (UTC)
Support --Satdeep Gill (talk) 14:26, 1 December 2015 (UTC)
Support obviously. Regards, VIGNERON * discut. 14:35, 1 December 2015 (UTC)
Support --Tito☸Dutta 14:43, 1 December 2015 (UTC)
Support--David Saroyan (talk) 14:46, 1 December 2015 (UTC)
Support --KRLS (talk) 15:07, 1 December 2015 (UTC)
Support -- Wittylama (talk) 15:07, 1 December 2015 (UTC)
Support To start making the Wikisource proofreading workflow what it should be: an easy task anyone can do. Tpt (talk) 15:15, 1 December 2015 (UTC)
Support Beleg Tâl (talk) 15:18, 1 December 2015 (UTC)
Support Sadads (talk) 16:18, 1 December 2015 (UTC)
Support --Wesalius (talk) 19:17, 1 December 2015 (UTC)
Support--Arxivist (talk) 20:10, 1 December 2015 (UTC)
Support -- Daniel Mietchen (talk) 20:46, 1 December 2015 (UTC)
Support Trizek from FR 22:17, 1 December 2015 (UTC)
Support --Aristoi (talk) 22:55, 1 December 2015 (UTC)
Support — George Orwell III (talk) 23:28, 1 December 2015 (UTC)
Support but please ensure immediate switching between VE editing and traditional editing while editing, without needing to save and re-edit the page. If this feature already exists, I apologize for this inappropriate comment. If this feature is impossible, please allow simple, fast and persistent disabling of VE as a user preference. --Alex brollo (talk) 00:00, 2 December 2015 (UTC)
Support Of course, we are talking about VE in the Proofread page, right? Aubrey (talk) 09:06, 2 December 2015 (UTC)
Support --Barcelona (talk) 12:09, 2 December 2015 (UTC)
Support--Yodaspirine (talk) 13:02, 2 December 2015 (UTC)
Support --Le ciel est par dessus le toit (talk) 13:56, 2 December 2015 (UTC)
Support, one of the few cases where VisualEditor will really make life simpler for experienced users — NickK (talk) 16:05, 2 December 2015 (UTC)
Support Pyb (talk) 01:10, 3 December 2015 (UTC)
Support this is a hard ask, but it is a game changer. Even incremental progress would be appreciated. Slowking4 (talk) 02:42, 3 December 2015 (UTC)
Support - Wieralee (talk) 17:14, 4 December 2015 (UTC)
Support --Pymouss Tchatcher - 20:12, 4 December 2015 (UTC)
Support Halibutt (talk) 00:24, 5 December 2015 (UTC)
Support --Yeza (talk) 10:47, 7 December 2015 (UTC)
Support --Kasyap (talk) 15:41, 7 December 2015 (UTC)
Support Abyssal (talk) 16:50, 10 December 2015 (UTC)
Support --ESM (talk) 16:41, 13 December 2015 (UTC)
Support --Davidpar (talk) 14:21, 14 December 2015 (UTC)