Community Wishlist Survey 2022/Wikisource/Export of modernised texts

From Meta, a Wikimedia project coordination wiki

Export of modernised texts

  • Problem: On French, Spanish and Portuguese Wikisource, and maybe on other Wikisource projects, texts that use old spellings (e.g. containing the long s, ſ) can have their spelling modernised automatically (e.g. replacing ſ with s). This is made possible by a template called Modernisation, which adds a new tab to the pages that contain it and replaces old spellings with their modern counterparts, without creating a new page.
However, when using WSexport, it is impossible to export the modernised version of the page.
  • Proposed solution: Add a new parameter to WSexport that lets the user choose between the different versions of the same text.
  • Who would benefit: Every reader who wants to read old texts without getting a headache from old-fashioned spellings.
  • More comments: For more information on the template Modernisation, see its documentation.
  • Phabricator tickets:
  • Proposer: ElioPrrl (talk) 10:31, 11 January 2022 (UTC)
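
As a rough illustration of the request above, the following minimal sketch shows what a user-selectable export variant could look like. It is not WSexport code and not the Modernisation template's actual logic: the `MODERN_SPELLINGS` word list, the `modernise` and `export_text` functions and the `variant` parameter are all hypothetical names chosen for this example.

```python
import re

# Hypothetical word list; a real wiki would supply its own dictionary.
MODERN_SPELLINGS = {
    "avoient": "avaient",
    "tems": "temps",
}


def modernise(text, spellings=MODERN_SPELLINGS):
    """Replace the long s, then whole words listed in `spellings`."""

    def replace(match):
        word = match.group(0)
        modern = spellings.get(word.lower())
        if modern is None:
            return word  # not in the word list: leave untouched
        # Keep an initial capital if the old spelling had one.
        if word[0].isupper():
            return modern[0].upper() + modern[1:]
        return modern

    text = text.replace("ſ", "s")  # plain character substitution
    return re.sub(r"\w+", replace, text)  # word-level substitution


def export_text(text, variant="original"):
    """Hypothetical export step; `variant` stands in for the proposed parameter."""
    if variant == "original":
        return text
    if variant == "modernised":
        return modernise(text)
    raise ValueError(f"unknown variant: {variant!r}")


print(export_text("Ils avoient le tems", variant="modernised"))
```

The point of the sketch is only that the choice is made at export time, on the same source text, without a second page being created.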

Discussion

  • I was going to say that this is achievable via template styles, but then realised that the crux of it is that it needs to be user-selectable at the time of export. That's right, isn't it? It seems like it might be doable by letting wikis define multiple stylesheets in the same way that they can currently define ebook.css, and then showing those as options in the export form. SWilson (WMF) (talk) 05:16, 12 January 2022 (UTC)
    It must be user-selectable, indeed: I was thinking of a new option in the export form. I don't know if it can be handled with multiple stylesheets, because it's not a question of formatting: when clicking on the "Modernise" button, some text is found and replaced by other text (e.g. avoient, old spelling, by avaient, modern spelling). But I'm not a technical person, so maybe I'm misunderstanding what you're saying. ElioPrrl (talk) 17:19, 13 January 2022 (UTC)
  • There is a system in MediaWiki that converts text: LanguageConverter. It is capable of converting text without creating a new page. It might be better suited than a module. --Snævar (talk) 18:44, 12 January 2022 (UTC)
    I have just tested WSexport with Chinese Wikipedia, and WSexport does not support LanguageConverter either (in fact, there is no such option). C933103 (talk) 22:14, 15 January 2022 (UTC)
    Good point. LanguageConverter might be suitable for doing this for language variants (and actually we should probably look at implementing that anyway in WS Export) but I'm not sure it'd work for things like translating long S to normal S (that's not a language variant but a typographical archaism). That's done in some Wikisources by having a template output HTML for both variants and then hiding one or the other via JavaScript: <span class="long-s">ſ</span><span class="normal-s">s</span>, which is why I wonder if it could be done by making multiple stylesheets available in WS Export. This whole topic could definitely take some more investigation though! SWilson (WMF) (talk) 01:40, 17 January 2022 (UTC)
    Maybe I did not choose my example judiciously. The typical action of modernisation modules is not replacing typographical variants (like s/ſ), but replacing words with other words (in French, avoient/avaient or tems/temps; in English, shew/show or reflexion/reflection). ElioPrrl (talk) 12:45, 17 January 2022 (UTC)
    Unfortunately, doing this with CSS has two disadvantages: (1) you replace a 2-byte character with about 60 bytes of HTML, so you can hit the transclusion limit much more easily (note that this would be transcluded twice: once from a template and a second time through ProofreadPage page transclusion), and (2) the words containing this code will likely be non-searchable because of the HTML markup inside the words. I would personally also appreciate it if this could be done without creating extra pages or an extra namespace, as is done here+here. Ankry (talk) 20:30, 17 January 2022 (UTC)
    These are wise remarks. And indeed, on French WS, modernisation does not create extra pages, and we do not want this behaviour to change. ElioPrrl (talk) 10:04, 18 January 2022 (UTC)
    On Polish WS we tend to create separate pages only because of the current ws-export limitations. However, we would happily abandon this practice. Ankry (talk) 11:36, 18 January 2022 (UTC)
  • You only want character replacement. Maybe one easy way is simply to replace the characters in the page's front end using JavaScript. ✍️ Dušan Kreheľ (talk) 16:36, 26 January 2022 (UTC)
    I agree, that is one way to do this (and I think it's how it's already done on some Wikisources). But it has shortcomings for other things, such as exporting to other formats via WS Export (or any other tool that uses the rendered HTML). I'm sure we'll figure something out though! :-) SWilson (WMF) (talk) 00:57, 27 January 2022 (UTC)
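
To make the dual-markup approach discussed above concrete, here is a minimal sketch of how a tool that works on the rendered HTML might keep one variant and drop the other before export. The class names `long-s` and `normal-s` come from the example given in the discussion; the `select_variant` function is hypothetical, it assumes simple non-nested spans, and it is not a description of how WS Export behaves.

```python
import re

# Matches one non-nested <span> carrying either variant class.
VARIANT_SPAN = re.compile(r'<span class="(long-s|normal-s)">(.*?)</span>')


def select_variant(html, keep):
    """Keep the content of spans with class `keep`; drop the other variant."""
    if keep not in ("long-s", "normal-s"):
        raise ValueError(f"unknown variant class: {keep!r}")

    def replace(match):
        css_class, content = match.groups()
        # Unwrap the chosen variant, remove the other one entirely.
        return content if css_class == keep else ""

    return VARIANT_SPAN.sub(replace, html)


markup = 'cho<span class="long-s">ſ</span><span class="normal-s">s</span>e'
print(select_variant(markup, keep="normal-s"))
print(select_variant(markup, keep="long-s"))
```

Because the chosen variant is unwrapped into plain text, the exported word no longer has markup inside it, which is one of the concerns raised above about searchability.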

Voting