Community Wishlist Survey 2022/Wikisource

Wikisource
11 proposals, 172 contributors, 374 support votes
The survey has closed. Thanks for your participation :)



New search tool using information from Index pages

  • Problem: The search tool on Wikisource works the same way as the one on Wikipedia: it suggests only pages whose title exactly matches the request, and returns a list of pages containing every word of the request. However, on Wikisource, users commonly want to find a text by its title or its author: for example, they search "Chats Fleurs du mal" or "Chats Baudelaire" to find Baudelaire's poem "Les Chats", published in Les Fleurs du mal, whose transclusion is named "Les Fleurs du mal/1857/Les Chats". For very famous texts like this one, a network of redirects makes it possible to find them even if their title is not included in the name of their transclusion, but maintaining it costs contributors a lot of time. It would be smarter if the search tool used not the whole text of pages, but just the information included in the header template (which often comes from the Index: page), namely Author, Title, Subtitle, Editor, Translator, Publisher, etc. Continuing with the same example, the header template of the transclusion records that the page is called "Les Chats", the author is "Charles Baudelaire" and the book is titled "Les Fleurs du mal".
  • Proposed solution: Create another search tool for Wikisource, which uses the information included in the header templates or the Index: pages to find results. It could also be offered as an advanced search tool, enabling users to search by title, by author, by date of publication, etc., as in a library catalogue.
  • Who would benefit: Readers (and contributors who create redirects)
  • More comments:
  • Phabricator tickets:
  • Proposer: ElioPrrl (talk) 13:51, 13 January 2022 (UTC)
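
As a rough illustration of the idea in this proposal, the Python sketch below matches a query against header metadata only, rather than against page text. The `HeaderRecord` type, its field names and the second sample record are assumptions made for the example; nothing here reflects how Wikisource search is actually implemented.

```python
from dataclasses import dataclass
import unicodedata


@dataclass
class HeaderRecord:
    """Hypothetical record built from a header template or Index: page."""
    page: str    # name of the transclusion page
    title: str
    author: str
    book: str


def normalize(text: str) -> set[str]:
    """Lower-case, strip accents, and split into a set of word tokens."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in stripped)
    return set(cleaned.split())


def search(records: list[HeaderRecord], query: str) -> list[str]:
    """Return pages whose header metadata contains every word of the query."""
    wanted = normalize(query)
    hits = []
    for rec in records:
        haystack = normalize(f"{rec.title} {rec.author} {rec.book}")
        if wanted <= haystack:  # every query word appears in the metadata
            hits.append(rec.page)
    return hits


records = [
    HeaderRecord("Les Fleurs du mal/1857/Les Chats", "Les Chats",
                 "Charles Baudelaire", "Les Fleurs du mal"),
    # Illustrative second record; the page name is an assumption.
    HeaderRecord("Candide/Chapitre 1", "Chapitre 1", "Voltaire", "Candide"),
]

print(search(records, "Chats Baudelaire"))
print(search(records, "Chats Fleurs du mal"))
```

Both queries from the problem statement find the poem even though neither matches the page name exactly, which is the behaviour the redirects currently have to provide by hand.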

Discussion

Voting

Search in books

  • Problem: When a book is transcluded across many pages, it is difficult to search its text. For example, if I want to search for Ajax in the Iliad, I must open 24 transclusions (Iliad/Canto I, Iliad/Canto II, etc.) and run a text search on every page; another solution, barely known even to Wikisource contributors, is to use the advanced search, namely its field "Subpages of this page". To remedy this, some contributors create a new page with the full text of the book: Iliad/Full text. I think it would be more useful to provide a search bar on every page of the main namespace that finds text in the current page or in all subpages.
  • Proposed solution: Provide a search bar on every page of the main namespace, giving results in the current page or in the whole book. In short, this search bar would search text in exactly the same pages as the ones exported by the Export button. More precisely, it might be inserted next to the Export button, at the very top of the page, and could function in a similar way: if the Export button only allows exporting the current page, the search bar reads « Search in this page » and gives the same results as Ctrl+F/Cmd+F; but when the Export button allows exporting the whole book, the search bar reads « Search in full book ».
  • Who would benefit:
    Readers: on French Wikisource, the full-text pages of famous books (like Candide by Voltaire) are read more than the individual transclusions, because readers consult these books on Wikisource mainly for full-text search.
    Editors: we could save time if we stopped creating full-text pages, which often have problems (too many templates, broken links, etc.)
  • More comments: A possible improvement: enable regular expressions in this search bar (which Ctrl+F cannot do; this would further justify having the search bar on every page).
  • Phabricator tickets: task T309490
  • Proposer: ElioPrrl (talk) 10:14, 14 January 2022 (UTC)

Discussion

But transcluded pages do not contain the text of the book. In Iliad/Canto I, there is not "The wrath of Peleus' son etc.", but <pages index="Homer - Iliad, translation Pope, 1909.djvu", etc. And even if they did, using prefixes, keywords, etc. is not easy for the average user. On every page of Gallica (the website of the French National Library) or Internet Archive, when you consult a book in their portal, there are two search bars: the first one for searching the whole website, the second one (commonly integrated in the book viewer) for searching the book. — ElioPrrl (talk) 12:31, 17 January 2022 (UTC)
Your use case above does not adequately explain what you're trying to explain to me then, because I still don't get why searching in wikitext doesn't get you what you want. Izno (talk) 19:26, 17 January 2022 (UTC)
@Izno: Are you familiar with Wikisource? In Wikisource, the text of a page is rarely contained in the wikitext, but is transcluded from another page. Example: this transclusion only contains <pages index="Homer - Iliad, translation Pope, 1909.djvu" from=35 to=51 />, a command that transcludes the text contained in this page and the following ones; and note that the reader normally never accesses the Page: namespace. Therefore, even when restricting results to subpages of a given page, using insource: to search transclusions of a book would not give any result: give it a try! In fact, the results are better without this parameter.
Anyway, this is a very complicated method, known only to contributors, not to the average user. In the same way, it is possible, on IA or the BnF, to use the advanced search to find words in a given book; but users of IA and the BnF do not search books this way: they use the search bar integrated in the book viewer. I am asking to emulate this on Wikisource: a search bar integrated into transclusions. I advise you to consult SWilson's first solution below to understand what I mean. — ElioPrrl (talk) 09:55, 18 January 2022 (UTC)
  • It's actually sort of already possible to put a search box in the 'indicators' area (where the export button is). Here's a hacky version: https://en.wikisource.beta.wmflabs.org/wiki/Test_Book — although, it's only possible to set a prefix to search, rather than following all the rules that WS Export uses (e.g. finding subpages referenced within ws-summary sections). I'm not sure of the best way around that, because it'd be really slow to determine what the extent of the work is (and even then it's still not foolproof), but we could think about adding a 'search' button next to export which would display a search dialog with some more info and the search form. —SWilson (WMF) (talk) 06:08, 17 January 2022 (UTC)
Thanks! You have completely understood my thought. It does not matter if the first solution is not totally reliable: it's easier to urge developers to improve an existing tool than to create one from scratch. I only thought that, in order to save developers' time, the lines of code used in WSexport to select the pages to be exported could be reused for this new purpose, but maybe it's not feasible. — ElioPrrl (talk) 12:31, 17 January 2022 (UTC)
The prefix parameter would be basepagename, not fullpagename. The intent is to search pages with the same base page. Other than that, excellent.--Snævar (talk) 07:10, 20 January 2022 (UTC)
@SWilson (WMF) and Snævar: I would also note that books sometimes have transclusions whose title does not begin with the title of the base page. In particular, on French Wikisource, the complete works of an author are sometimes transcluded under their individual titles; for example, in the book whose table of contents is found in Complete Works of Molière, the different plays are transcluded in the pages Tartufe, Don Juan, etc., not Complete Works of Molière/Tartufe, Complete Works of Molière/Don Juan (this is something I criticise on French Wikisource, but the large majority of contributors still create pages this way, because of this very issue: they are afraid that readers could not find the page under the long title, and they don't want to waste their time creating redirects to prevent this). — ElioPrrl (talk) 12:02, 4 February 2022 (UTC)
  • @ElioPrrl, maybe I did not understand every aspect of your question, but look at a random ns0 page in it.wikisource, e.g. this Iliad: under the header you can see a search box that even a layperson can use to search for a word in the entire work (here is the result for Aiace). Is this what you asked for? - εΔω 08:18, 4 February 2022 (UTC)
@OrbiliusMagister: It is indeed what I am asking for. It seems that it is added automatically by your template Intestazione, isn't it? It would be nice if it could be added automatically on every page of ns0, even those without any template. — ElioPrrl (talk) 12:04, 4 February 2022 (UTC)
@ElioPrrl: helping other users is my pleasure. On it.source the search box is actually an embedded Template:Ricerca, but you may just help yourself with it in many projects. - εΔω 20:13, 4 February 2022 (UTC)
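
The prefix-based search discussed in this thread can be sketched in Python as follows. It is only an illustration: the function names are hypothetical, the site and page are examples, and it assumes the standard search URL form together with the CirrusSearch `prefix:` keyword (which is placed at the end of the query). Following Snævar's remark, the prefix is the base page name rather than the full page name.

```python
from urllib.parse import urlencode


def base_page_name(title: str) -> str:
    """Drop the last subpage segment, as {{BASEPAGENAME}} does in wikitext."""
    head, sep, _ = title.rpartition("/")
    return head if sep else title


def book_search_url(site: str, current_page: str, words: str) -> str:
    """Build a search URL restricted to pages under the current base page."""
    # The prefix: keyword goes last; the trailing slash limits hits to subpages.
    query = f"{words} prefix:{base_page_name(current_page)}/"
    # ns0=1 restricts the search to the main namespace.
    return f"https://{site}/w/index.php?" + urlencode({"search": query, "ns0": 1})


print(book_search_url("en.wikisource.org", "Iliad/Canto I", "Ajax"))
```

As ElioPrrl points out, a prefix cannot cover works whose parts are not subpages of a common base page (the Molière example), and for deeply nested titles the base page is not the root of the work, so this approach remains an approximation of what WS Export selects.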

Voting

Translation of texts published in another language

  • Problem: Texts published on Wikisource in their original language (where copyright is respected) can be translated into another language. However, it is currently not possible to get help with that translation from the tool available for use on Wikipedia.
  • Proposed solution: Create a tool that helps produce a translation on one language's Wikisource from a text available on another language's Wikisource. This could be an adaptation of the tool that can currently be used on Wikipedia.
  • Who would benefit: Wikisource collaborators willing to translate texts already available on the Wikisource of one language and add them to the Wikisource of another.
  • More comments:
  • Phabricator tickets:
  • Proposer: JLVwiki (talk) 19:38, 13 January 2022 (UTC)

Discussion

  • @JLVwiki: Thanks for your proposal! I've machine-translated it into English and moved your original text to the Spanish subpage, in preparation for it being translated to other languages. SWilson (WMF) (talk) 03:45, 21 January 2022 (UTC)
  • Note that not all Wikisource projects permit user-made translations. They are a source of edit wars in communities where they are not permitted, and a source of potentially low-quality content. So I recommend checking whether there is a wide consensus across Wikisource projects for such an extension. --Ruthven (msg) 10:59, 25 January 2022 (UTC)

Voting

Stick the toolbar in the page namespace

  • Problem: The page namespace could be compared to a word processor in editing mode and to a browser in viewing mode. In both of them, the toolbar usually sticks to the top of the window. The behaviour is different in Wikisource, where we have to scroll up to reach a button and then come back down to see whether the operation was performed in the text. Losing focus on the text we are working on is annoying and unproductive.
  • Proposed solution: Harmonize the interface of the page namespace with what is found elsewhere on the market, so that direct access to the toolbar is always maintained while scrolling down the text.
  • Who would benefit: All the Wikisource communities
  • More comments: The page namespace represents more than 90% of our contributions
  • Phabricator tickets:
  • Proposer: Denis Gagne52 (talk) 18:21, 20 January 2022 (UTC)

Discussion

  • This seems like it would already be fulfilled by Vector 2022 (skin with sticky header) and the 2017 wikitext editor. Although I guess the new wikitext editor doesn't support ProofreadPage at all... I would prefer to invest in the new wikitext editor however. —TheDJ (talk • contribs) 16:02, 7 February 2022 (UTC)

Voting

Export of modernised texts

  • Problem: On French, Spanish and Portuguese Wikisource, and maybe on other Wikisource projects, it is possible, for texts using old spellings (e.g. containing the long s: ſ), to modernise the spelling automatically (e.g. substitute s for ſ). This is made possible by a template called Modernisation, which creates a new tab in the pages that contain it and substitutes old spellings with their modern counterparts, without creating a new page.
However, when using WSexport, it is impossible to export the modernised version of the page.
  • Proposed solution: Add a new parameter in WSexport which makes it possible to choose between the different versions of the same text.
  • Who would benefit: Every reader who wants to read old texts without getting a headache from old-fashioned spellings.
  • More comments: For more information on the template Modernisation, see its documentation.
  • Phabricator tickets:
  • Proposer: ElioPrrl (talk) 10:31, 11 January 2022 (UTC)

Discussion

  • I was going to say that this is achievable via template styles, but then realised that the crux of it is that it needs to be user-selectable at the time of export. That's right isn't it? It seems like it might be doable by adding a system of wikis being able to define multiple stylesheets in the way that they can currently define ebook.css, and then showing those as options in the export form. SWilson (WMF) (talk) 05:16, 12 January 2022 (UTC)
It must be user-selectable, indeed: I was thinking of a new option in the export form. I don't know if it can be managed by multiple stylesheets, because it's not a question of formatting: when clicking on the "Modernise" button, some text is found and replaced by some other text (e.g. avoient, old spelling, by avaient, modern spelling). But I'm not a technical person, so maybe I'm misunderstanding what you're saying. — ElioPrrl (talk) 17:19, 13 January 2022 (UTC)
  • There is a system in MediaWiki that converts text, Language converter. It is capable of converting text without creating a new page. It might be more suitable than a module.--Snævar (talk) 18:44, 12 January 2022 (UTC)
    I have just tested WSexport with Chinese Wikipedia, and WSexport does not support Language Converter either. (In fact there is no such option.) C933103 (talk) 22:14, 15 January 2022 (UTC)
    Good point. LanguageConverter might be suitable for doing this for language variants (and actually we should probably look at implementing that anyway in WS Export) but I'm not sure it'd work for things like translating long S to normal S (that's not a language variant but a typographical archaism). That's done in some Wikisources by having a template output HTML for both variants and then hiding one or the other via JavaScript: <span class="long-s">ſ</span><span class="normal-s">s</span>, which is why I wonder if it could be done by making multiple stylesheets available in WS Export. This whole topic could definitely take some more investigation though! SWilson (WMF) (talk) 01:40, 17 January 2022 (UTC)
    Maybe I have not chosen my example judiciously. The typical action of modernisation modules is not replacing typographical variants (like s/ſ), but replacing words with other words (in French, avoient/avaient or tems/temps, in English shew/show or reflexion/reflection). — ElioPrrl (talk) 12:45, 17 January 2022 (UTC)
    Unfortunately, doing this with CSS has two disadvantages: (1) you replace a 2-byte character with 56 bytes of HTML, so you can hit the transclusion limit much more easily (note, this would be transcluded twice: once from a template and a second time using ProofreadPage page transclusion), and (2) the words containing this code will likely be non-searchable due to HTML markup inside the words. I personally would also appreciate it if this could be done without creating extra pages or an extra namespace, as is done here and here. Ankry (talk) 20:30, 17 January 2022 (UTC)
    These are wise remarks. And indeed, on French WS, modernisation does not create extra pages, and we do not want this behaviour to change. ElioPrrl (talk) 10:04, 18 January 2022 (UTC)
    In Polish WS we tend to create separate pages just because of the current ws-export limitations. However, we would happily stop doing this. Ankry (talk) 11:36, 18 January 2022 (UTC)
  • You only want character replacement. Perhaps the easiest way is simply to replace the characters on the front end of the page via JavaScript. ✍️ Dušan Kreheľ (talk) 16:36, 26 January 2022 (UTC)
    I agree, that is one way to do this (and I think it's how it's already done on some Wikisources). But it has shortcomings for other things, such as exporting to other formats via WS Export (or any other tool that uses the rendered HTML). I'm sure we'll figure something out though! :-) SWilson (WMF) (talk) 00:57, 27 January 2022 (UTC)
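
To make the distinction raised in this discussion concrete, here is a minimal Python sketch of modernisation as text replacement rather than styling: characters are substituted first (ſ to s), then whole words are swapped using a dictionary. The dictionary holds only the French pairs quoted above, and the function is an illustration, not the code of the Modernisation template or of WS Export.

```python
import re

# Word pairs quoted in the discussion; a real dictionary would be far larger.
WORDS = {"avoient": "avaient", "tems": "temps"}
# Typographical archaisms handled character by character.
CHARS = {"ſ": "s"}


def modernise(text: str) -> str:
    """Return the text with archaic characters and listed old spellings replaced."""
    text = text.translate(str.maketrans(CHARS))

    def swap(match: re.Match) -> str:
        word = match.group(0)
        modern = WORDS.get(word.lower())
        if modern is None:
            return word  # not a known old spelling: leave untouched
        # Preserve an initial capital (e.g. at the start of a sentence).
        return modern.capitalize() if word[0].isupper() else modern

    return re.sub(r"\w+", swap, text)


print(modernise("Ils avoient peu de tems."))
```

Because the output is plain text with no extra markup, a transformation of this kind applied at export time would avoid both the transclusion-size and the searchability concerns raised by Ankry.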

Voting

Index creation wizard

  • Problem: Uploading books to Wikisource is difficult. In the current workflow you need to upload the file on Commons, then go to Wikisource and create the Index page (and you need to know the exact URL). The files need to be DJVU, which has different layers for the scan and the text. This is important for tools like Match & Split (if the file is a PDF, this tool doesn't work).

    More importantly, the current workflow (especially for library uploads) includes Internet Archive, and the famous IA-Upload tool. This tool is now fundamental for many libraries and uploaders, but it has several issues.

    Since Internet Archive stopped creating DJVU files from its scans, the international community has struggled to solve the problem of automatically creating a DJVU for uploading to Commons and then Wikisource. This has created a situation where libraries love Internet Archive and want to use it, but then get stuck because they don't know how to create a DJVU for Wikisource, and IA-Upload is buggy and often fails.

  • Proposed solution:
  1. Incorporate the features of the IA-Upload tool into a wiki-side Index/Book Creation Wizard with a friendlier UX redesign.
  2. Expand the wizard's functionality to similar services (e.g. Hathi Trust, Google Books, Gallica, etc.)
  3. Allow the wizard to assist in the creation of indexes and in configuring their pagelists.

Discussion

Thanks for creating this proposal! I went ahead and added the most robust description to the problem, based on the previous wishes you linked to. Feel free to edit if you feel the problem statement seems outdated or needs more details! The more detailed the description of the problem, the better, hence my edit. Thanks so much! NRodriguez (WMF) (talk) 17:36, 17 January 2022 (UTC)

Voting

Complete the development of the integrated feature to display other language editions in toolbar from wikidata

  • Problem: due to the structure of data on wikidata for works/editions, wikisource editions do not have automatic wikilinks with editions in other languages. An integration in the interface was begun, some time ago, and now displays in the tool bar a link to WP work article, when it exists - but the integration was never finished, thus pushing ill-informed contributors to put all wikisource editions on the same item with the work item, thus making it impossible to properly describe the editions.
  • Proposed solution: using the edition or translation of (P629) and has edition or translation (P747) link to display all other editions in the toolbar, automatically, without having to maintain old manual wikilinks in all wikisource projects...
  • Who would benefit: all wikisource and wikidata (book) contributors
  • More comments:
  • Phabricator tickets: phab:T128173, phab:T71735, probably others...
  • Proposer: Hsarrazin (talk) 12:41, 23 January 2022 (UTC)
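
The link traversal this proposal relies on can be sketched in a few lines of Python: from an edition, follow "edition or translation of" (P629) to the work, then collect the other editions of that work (the inverse direction, which P747 records on the work item) and their Wikisource sitelinks. All item identifiers and page titles below are made-up placeholders held in memory; a real implementation would read the statements from Wikidata.

```python
# P629 ("edition or translation of"): edition item -> work item.
# Every identifier here is a placeholder, not a real Wikidata ID.
EDITION_OF = {
    "Q-edition-fr": "Q-work",
    "Q-edition-en": "Q-work",
    "Q-edition-it": "Q-work",
    "Q-edition-other": "Q-other-work",
}

# Wikisource sitelink of each edition item: (site, page title).
SITELINKS = {
    "Q-edition-fr": ("frwikisource", "Iliade"),
    "Q-edition-en": ("enwikisource", "Iliad"),
    "Q-edition-it": ("itwikisource", "Iliade"),
}


def other_editions(edition: str) -> list[tuple[str, str]]:
    """Return the sitelinks of all other editions of the same work."""
    work = EDITION_OF.get(edition)
    if work is None:
        return []  # the item is not described as an edition of any work
    siblings = [e for e, w in EDITION_OF.items() if w == work and e != edition]
    return sorted(SITELINKS[e] for e in siblings if e in SITELINKS)


print(other_editions("Q-edition-en"))
```

With editions kept on their own items like this, the toolbar links can be derived automatically, which removes the incentive to attach every Wikisource page to the work item.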

Discussion

Voting

Bibliographic Structured Data on Wikisource

  • Problem: Cataloging Wikisource with Wikidata is possible, but an embedded solution like Structured Data on Commons would make it possible to add more detailed metadata holdings and annotations to the documents
  • Proposed solution: a structured data part in Wikisource like the Structured Data on Commons + integration Template:Annotate_QID developed by Mfchris84 in 2021, Wikidata: Q106805878
  • Who would benefit: visibility and SPARQL-Query usage for Wikisource, textbox quality, interoperability, Wikisource communities in participating international wikisource versions
  • More comments: There could be two ways in which Structured Data on Wikisource could be used:
  1. Bibliographic metadata about documents on Wikisource are stored in Wikidata and linked via the sitelink (schema:) to Wikisource. The Wikisource community could use this data to automatically generate infoboxes (textboxes).
    • Maybe there is a lack of discussion about the bibliographic metadata model on Wikisource. IMHO, Wikisource pages represent a new version/edition of the underlying work, so instead of linking the Wikisource pages directly as sitelinks to the original works or editions, something similar to SDC's "digital representation of" would be more precise.
  2. A great benefit of "structured data on Wikisource" could be the possibility to tag or (semi-)automatically recognize "named entities" within the given text corpora. Many items (e.g. persons, places) are mentioned in different ways in the text, in interesting and important ways, but are not important enough to be defined as "main subject" (P920) in the bibliographic (Wikidata) item. If we could create a tagged map of named entities linked by Q-IDs, a really new way of searching or discovering Wikisource articles could be enabled.

Discussion

Hi. This is the renewed, partly enriched proposal from Wishlist Survey 2021. We got some strong support last year. I hope some planning and development will take place step by step. --Jeb (talk) 09:36, 22 January 2022 (UTC)
WE-Framework is helpful for adding metadata, but Ankry and others said it's a dangerous tool. Matlin (talk) 11:34, 22 January 2022 (UTC)

Voting

Ability to perform batch tasks on all (or selected) pages in a work

  • Problem: Sometimes we need to make an identical change to a large number of pages in a work. At present, the only solution is to ask a bot editor to make the change.
  • Proposed solution: Something akin to VisualFileChange.js on Wikimedia Commons.
  • Who would benefit: Wikisource editors
  • More comments:
  • Phabricator tickets: T289506
  • Proposer: Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:31, 15 January 2022 (UTC)

Discussion

  • You may also be interested in AutoWikiBrowser. It can find and replace texts in several pages, add code to the beginning or end of several pages, add/remove/substitute categories on several pages, etc. — ElioPrrl (talk) 20:24, 15 January 2022 (UTC)
  • @Pigsonthewing: Thank you for your proposal. Would you mind being a bit more specific about what kind of changes you are referring to? Thanks DMaza (WMF) (talk) 16:01, 19 January 2022 (UTC)
    • I'm thinking initially of complex find & replace actions (as I said, like those in Commons 'perform batch task'). For example, I recently had a work with something like {{rh||LOREM IPSUM DOLOR SIT AMET|}} on half the pages and {{rh||{{uc|Lorem Ipsum Dolor Sit Amet}}|}} on the other half, and I wanted to standardise them to the latter. In another example, suppose one transcriber had used {{ls}} in half the pages, and another transcriber had just used "s". We might want to replace all the templates with plain text. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:19, 19 January 2022 (UTC)
  • Just a side note, this can be done easily with WP:JWB and some basic knowledge of RegExp. AFAIK, JWB works on all public Wikimedia wikis as long as the pages are technically edit-able and you have AWB access. NguoiDungKhongDinhDanh 22:00, 30 January 2022 (UTC)
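
The two find-and-replace examples given in this discussion can be expressed as regular-expression rules. The Python sketch below is only an illustration of those rules applied to a set of page texts held in memory; the function names and the sample page names are hypothetical, and it is not how VisualFileChange, AWB or JWB work internally.

```python
import re

# Matches {{rh||SOME TEXT|}} where the centre field holds plain text only
# (no braces or pipes), so headers already wrapped in {{uc|...}} are skipped.
RH_PLAIN = re.compile(r"\{\{rh\|\|([^{}|]+)\|\}\}")


def standardise_header(wikitext: str) -> str:
    """Rewrite upper-case running headers to the {{uc|Title Case}} form."""
    def fix(match: re.Match) -> str:
        centre = match.group(1)
        if not centre.isupper():
            return match.group(0)  # leave mixed-case headers alone
        # str.title() is naive about apostrophes; good enough for a sketch.
        return "{{rh||{{uc|" + centre.title() + "}}|}}"
    return RH_PLAIN.sub(fix, wikitext)


def plain_long_s(wikitext: str) -> str:
    """Replace the {{ls}} template with a plain letter s."""
    return wikitext.replace("{{ls}}", "s")


def batch_apply(pages: dict[str, str]) -> dict[str, str]:
    """Apply both rules to every page and return only the pages that changed."""
    changed = {}
    for name, text in pages.items():
        new_text = plain_long_s(standardise_header(text))
        if new_text != text:
            changed[name] = new_text
    return changed


pages = {
    "Page:Example.djvu/1": "{{rh||LOREM IPSUM DOLOR SIT AMET|}}",
    "Page:Example.djvu/2": "{{rh||{{uc|Lorem Ipsum Dolor Sit Amet}}|}}",
    "Page:Example.djvu/3": "a mi{{ls}}t",
}
print(batch_apply(pages))
```

Returning only the changed pages mirrors what a batch tool would need in order to show a preview before saving anything.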

Voting

Fix search and replace in the Page namespace editor

  • Problem: Like every year, we're begging for this bug to be fixed; it is ridiculous that it is still present. The search and replace function in the default editor is broken, and it's a basic function of every editor.
  • Proposed solution: Fix the underlying issues in the JS API that cause the search and replace function to be broken. Alternatively, remove the search-and-replace button and reimplement the function as an independent widget.
  • Who would benefit: All Wikisource users
  • More comments: Former submissions:

Discussion

I strongly support this proposal. Fixing known bugs should always be prioritized. --Alex brollo (talk) 14:11, 18 January 2022 (UTC)

Well, in fact fixing known bugs should be a routine, not a subject of annual pleading. --Jan Kameníček (talk) 00:33, 29 January 2022 (UTC)
Totally agree... — ElioPrrl (talk) 11:06, 29 January 2022 (UTC)

According to T183950, the problem here is that PRP's Page: content model breaks assumptions other parts of MediaWiki make regarding the data of a wikipage. Specifically, Page: wikipages consist of three distinct sections: the header, main content area, and footer. Other parts of the stack fundamentally assume a wikipage is one complete part. To handle this PRP overrides (among other things) the text selection methods from the jquery.textSelection plugin, but so far it has only implemented the getSelection method because that was needed to make VisualEditor work (I am unclear on what specifically this need was; VE has at least some specific knowledge of PRP for other reasons, so it's entirely possible the getSelection override isn't even necessary any more). What it's doing is concatenating the header, body, and footer so that it can provide VE a single text unit to operate on. But when the 2010 WikiEditor's search and replace function runs, it gets the same concatenated text, meaning that when it finds matching substrings within the text it finds a range that is offset relative to the first character of the header rather than the first character of the body. PRP does not yet override setSelection, so when the 2010 WikiEditor selects the matched text it uses this offset and ends up selecting a range that is off by the number of characters in the header (which includes <noinclude> tags etc., so even an apparently empty header will throw it off). The same holds true for when it tries to replace the found text.

As a working assumption, the fix is simply to implement overrides for the other jquery.textSelection methods in PRP, in the vicinity of /mediawiki/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js L385. The overrides should presumably just need to track the offset caused by the header and adjust the value when calling through to the original method. If we want search and replace to work in header and footer fields we'd need to be a little more fancy, keeping track of which ranges correspond to which text field and mapping to the correct offset depending on which field we're in. To be hyper-hyper fancy we'd need to add UI to the 2010 WikiEditor to allow toggling on and off searching in the header/footer; but I don't think there is any real need for this. Just getting search and replace working in the body will be a massive improvement.

This isn't really a bug per se. Multiple parts of the tech stack have changed over the years, leading to the missing functionality (setSelection override and friends) that didn't use to be a problem now showing up as seeming bugs in other components. In other words, AIUI this should be firmly within the CommTech CW scope and ought to be a nicely manageably-sized task. tpt is also familiar with the existing code there (judging by T183950 and git blame) and has historically been very generous with their limited time in answering questions about such things. I also believe SWilson and Matmarex have touched this code for various reasons and may be able to assess whether my understanding expressed above is at least approximately correct. --Xover (talk) 11:24, 5 February 2022 (UTC)
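
The offset problem described in this comment can be reproduced with a small model. The Python sketch below is an illustration only: the `PageFields` class and the function names are hypothetical, and it is not the ProofreadPage code (which is JavaScript). It shows why a range found in the concatenated header + body + footer must be shifted by the header length before being applied to the body field.

```python
from dataclasses import dataclass


@dataclass
class PageFields:
    """Hypothetical model of a Page: wikipage with its three text fields."""
    header: str
    body: str
    footer: str

    def concatenated(self) -> str:
        # What the search function sees: one single text unit.
        return self.header + self.body + self.footer


def to_body_range(page: PageFields, start: int, end: int):
    """Map a range in the concatenated text to a range in the body field.

    Returns None when the range is not entirely inside the body.
    """
    offset = len(page.header)
    if start < offset or end > offset + len(page.body):
        return None
    return start - offset, end - offset


def replace_first_in_body(page: PageFields, needle: str, replacement: str) -> bool:
    """Replace the first match located in the body; return True on success."""
    start = page.concatenated().find(needle, len(page.header))
    if start == -1:
        return False
    mapped = to_body_range(page, start, start + len(needle))
    if mapped is None:
        return False
    s, e = mapped
    page.body = page.body[:s] + replacement + page.body[e:]
    return True


page = PageFields("<noinclude></noinclude>", "The wrath of Peleus' son",
                  "<noinclude></noinclude>")
naive_start = page.concatenated().find("wrath")
# The naive offset counts the header's characters too, so it is too large.
print(naive_start, to_body_range(page, naive_start, naive_start + 5))
```

Even the apparently empty header in this example shifts the match by the length of its <noinclude> tags, which is the off-by-header-length selection described above.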

On French Wikisource, there is another button (RegEx, in the Aide à la relecture (Proofread tools) tab) that performs search and replace without any bug. Maybe it would suffice to deploy it on every Wikisource. — ElioPrrl (talk) 19:21, 12 February 2022 (UTC)

Voting

Allow side by side display of different versions of the same text

  • Problem: Sometimes, a document on Wikisource was originally written in several languages/scripts, or is a translated version of a raw document. In these cases, it would be useful to display the source document, transliterated documents, and translated documents side by side.

    A number of wikisources are doing the same to some of their documents, by hosting copies of different versions of a source document on different pages, either within the same wikiprojects or from other wikiprojects. This is bad for controlling, editing, and correcting documents whenever applicable.

    Thus I believe there should be a tool that allows each page to host exactly one language/script version of a document, and lets different versions of the same document be softly linked with each other through some sort of linking mechanism, possibly through Wikidata or similar, defaulting to displaying the source document in its original language/script together with a translated modern version of the text, without needing to have the same data copied character by character to multiple pages/sites.

  • Proposed solution: Create an interface that can display two or more different texts from the same or different Wikisources side by side, with the default versions to display defined in wikitext, but also allowing users to choose their own versions to compare.
  • Who would benefit: Anyone who uses, contributes to or manages multilingual content, or translated content, on Wikisource.
  • More comments: Note that there are some JavaScript tools which try to do the same, but as far as I am aware they are limited to two versions of a text, and do not line up the documents line by line properly.

    The wish was previously proposed in 2019 Community Wishlist.

  • Phabricator tickets:
  • Proposer: C933103 (talk) 03:41, 17 January 2022 (UTC)

Discussion

Voting