Talk:Community Tech/Ebook Export Improvement

From Meta, a Wikimedia project coordination wiki

Project Overview: Request for Feedback (May 2020)

Hello, everyone! We invite you to read the content page of the project, which includes an analysis of the ebook export process and its primary issues, and share your feedback below. Thank you!

Have we covered the main reasons why people export ebooks?

  • Yes, this is accurately covered and clearly explained. MartinPoulter (talk) 18:47, 28 May 2020 (UTC)[reply]
  • Great summary. In our actions, we should always keep in mind that there are two types of users: contributors, but also visitors to Wikisource. We have to make sure visitors have a good experience exporting books to whatever device they have. --Viticulum (talk) 19:25, 28 May 2020 (UTC)[reply]
    • @MartinPoulter and Viticulum: Thank you for the feedback! Also, it's a great point that we'll need to be continually mindful of both contributors and visitors. While these two groups will have some overlapping needs in this project (such as being able to find & download books), the contributors may have greater familiarity with Wikisource. For this reason, it's important that we identify the largest problems with the user experience, so we can hopefully improve UX overall. Thanks again! --IFried (WMF) (talk) 15:03, 4 June 2020 (UTC)[reply]

Have we covered the main methods to export ebooks?

  • Well explained. I learned something new myself.
    • In French Wikisource, we mostly use option "#4: Export via links at the top of text", and "#3: Export via links on the main page" when announcing new books.
    • We also have those links on the author page for each book: Ex.: See an author
    • My main concern is that external users can easily find and understand how to export books. With "#2: Export via the left side panel", external users can't export a full book, and this can be misleading to them. --Viticulum (talk) 19:28, 28 May 2020 (UTC)[reply]
  • This depends on the Wikisource. On Czech WS we have only the PDF and EPUB options, and no others in gadgets. On sk.WS there is only the PDF option. Export should be available on all language versions for all users. JAn Dudík (talk) 11:30, 29 May 2020 (UTC)[reply]

@MartinPoulter, Viticulum, and JAn Dudík: Thank you for the feedback! It's also very helpful to be reminded of the fact that different wikis have different common practices, and some do not have as many options available. Ideally, we will want to improve overall user experience, so that: 1) users can easily discover how to download books, and 2) users have various options available to them, if possible, rather than being limited by one option. We'll investigate how we can improve this experience. Thank you again! --IFried (WMF) (talk) 19:57, 9 June 2020 (UTC)[reply]

Have we covered the main problems experienced when exporting ebooks?

  • Yes, and I like that reliability was placed first as book export is so core to the functionality of Wikisource that it needs really high uptime. An observation that occurs near the end is crucial: "The WSExport tool is not easily discoverable, and it doesn't provide an intuitive user experience", yes this is Colleagues, who are intelligent enough and very familiar with the Web, have looked at this site and not grasped that any book on the site can be exported in a variety of formats, and it's easy to see why they miss that. MartinPoulter (talk) 18:47, 28 May 2020 (UTC)[reply]
@MartinPoulter: Thank you for sharing this! It is a fantastic point and it really helps frame some of the key issues. We want the WSExport to work well, of course. In addition, we need people to be able to find it, otherwise all of our potential improvements will have only limited impact. For this reason, we have been investigating some of the primary issues related to user experience and we hope to improve the discoverability of WSExport, along with its general reliability and performance. Again, thank you for sharing this perspective! --IFried (WMF) (talk) 23:43, 24 July 2020 (UTC)[reply]

Problem exporting the image on the first page in PDF format

  • For me, the most frustrating problem: when exporting an ebook in "pdf" format, in most cases (see, for example, the book https://fr.wikisource.org/wiki/Le_Lorgnon_(Scribe)), the image on the cover page does not appear on the first page, and an error message is displayed when opening the file (in French, "Données insuffisantes pour une image", i.e., "insufficient data for an image"). Thanks, Laurent --Lorlam (talk) 17:36, 28 May 2020 (UTC)[reply]

@Lorlam: Thank you; much appreciated! --IFried (WMF) (talk) 22:06, 10 July 2020 (UTC)[reply]
This seems a side effect of changes made earlier in 2020, as this was not happening before. --Viticulum (talk) 19:33, 28 May 2020 (UTC)[reply]
Yes, this problem appeared in February 2020 --Lorlam (talk) 20:43, 28 May 2020 (UTC)[reply]

@MartinPoulter, Lorlam, and Viticulum: Thank you for this information! We completely agree that the user experience is not optimal, and we hope to improve it (both for experienced editors/readers & newcomers). Also, thanks for the information regarding the PDF export issue, which seems to have appeared around February 2020. I'll share this information with the team & see if we can investigate. --IFried (WMF) (talk) 20:24, 9 June 2020 (UTC)[reply]

@Lorlam: Thanks for this comment. The link you provided doesn't seem to have an image on the first page, so I wasn't able to properly test/reproduce this issue. Do you have another example? Thanks! --IFried (WMF) (talk) 16:18, 11 June 2020 (UTC)[reply]
@IFried: The link given (https://fr.wikisource.org/wiki/Le_Lorgnon_(Scribe)) should have an image after exporting the book (see https://fr.wikisource.org/wiki/Livre:Scribe_-_Th%C3%A9%C3%A2tre,_13.djvu), but I can give other examples (https://fr.wikisource.org/wiki/Le_Codicille ... https://fr.wikisource.org/wiki/%C3%89chec_et_mat_(Feuillet) ... https://fr.wikisource.org/wiki/Monsieur_de_Chimpanz%C3%A9). Thanks, --Lorlam (talk) 18:02, 11 June 2020 (UTC)[reply]
@Lorlam: Thanks for the information! We have tested this, and we noticed the following: When we first downloaded the book, the first cover page was blank. However, when we looked at the download a day later (in one case), the cover page had changed to display the expected content. The page was red-colored, which was not expected, but the content and imagery looked fine. Does this reflect your experience? --IFried (WMF) (talk) 23:00, 24 June 2020 (UTC)[reply]
@IFried: Interesting! A red-colored image instead of a grayscale image! This may indicate a bad object definition in the PDF structure, with insufficient data to render the green and blue channels. The object definition for the Scribe cover should be: <</Type/XObject/BitsPerComponent 8/ColorSpace/DeviceGray/DL 22383/Filter[/DCTDecode]/Height 565/Length 22383/Subtype/Image/Width 400>> instead of <</Type/XObject/BitsPerComponent 8/ColorSpace/DeviceRGB/DL 22383/Filter[/DCTDecode]/Height 565/Length 22383/Subtype/Image/Width 400>> as written by Calibre. As mentioned, a workaround is to convert the 8-bit grayscale cover to 24-bit true color. --Denis Gagne52 (talk) 15:37, 27 June 2020 (UTC)[reply]
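To illustrate the mismatch described in the comment above, here is a minimal diagnostic sketch. It reads the header of a JPEG stream (the data behind a /DCTDecode filter) and reports how many color components it declares, so that the count can be compared with the /ColorSpace entry of the PDF image object. The function names are hypothetical, and this is not how WSExport or Calibre work internally; it only shows one way the inconsistency could be checked.

```python
import struct

# Start-of-frame markers carry the image geometry and the component count.
# 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) share the range but are not SOF markers.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}

# Color space one would expect in the PDF image dictionary for each count.
COLORSPACE_BY_COMPONENTS = {1: "/DeviceGray", 3: "/DeviceRGB", 4: "/DeviceCMYK"}


def jpeg_component_count(data: bytes) -> int:
    """Return the number of color components declared in a JPEG header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before the real marker code
            pos += 1
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Segment body: precision (1), height (2), width (2), components (1)
            if pos + 9 >= len(data):
                raise ValueError("truncated SOF segment")
            return data[pos + 9]
        pos += 2 + length  # skip the marker and the whole segment
    raise ValueError("no SOF segment found")


def expected_colorspace(data: bytes) -> str:
    """Map the JPEG component count to the matching PDF color space name."""
    return COLORSPACE_BY_COMPONENTS[jpeg_component_count(data)]
```

For a grayscale cover, `expected_colorspace` would return "/DeviceGray"; an image object that declares /DeviceRGB for the same stream would show the kind of mismatch described above.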
@Denis Gagne52: Ah, interesting; thanks for providing that potential explanation (which I have noted). Much appreciated. --IFried (WMF) (talk) 21:57, 7 July 2020 (UTC)[reply]
@IFried: Sorry, for me, for books which have problems, the cover page never displays (for information, I use Acrobat Reader to open pdf files) --Lorlam (talk)
@IFried:I had information about this problem which is described on Github https://github.com/wsexport/tool/commit/2fcf826411ff97b60fa4bf78e91092d046800302Github cu --Lorlam (talk) 15:58, 26 June 2020 (UTC)[reply]
@Lorlam: Thanks for getting back to us and providing this additional information. We'll check it out. One thing: It seems like the Github link that you provided didn't work (404 error when I tried to access it). Can you try to share it again? Thanks! --IFried (WMF) (talk) 21:54, 7 July 2020 (UTC)[reply]
@IFried: Yeh ! it is https://github.com/wsexport/tool/commit/2fcf826411ff97b60fa4bf78e91092d046800302 cu --Lorlam (talk) 00:10, 8 July 2020 (UTC)[reply]
@IFried: Hi! We are in August now... For me this problem is still not corrected :-( --Lorlam (talk) 17:30, 21 August 2020 (UTC)[reply]
@Lorlam: Hello, and thank you for reaching out! We have not yet begun prioritizing and tackling individual bugs, such as the one you described. We are still primarily in the research phase of the project. We cannot yet guarantee that we will fix specific Wikisource or WSExport bugs (and there are many for us to look into); however, we do hope to fix many bugs, with priority given to those related to: 1) people being unable to download books, and 2) people being unable to read the text within the books (i.e., font rendering issues). In the meantime, we would love it if you could check out our August update and share your feedback. By sharing your feedback, you'll help us get closer to wrapping up the research phase of the project, and we'll then be able to dig into some of the highest priority Wikisource-related bugs. Also, one quick note: My username is actually IFried_(WMF), so if you use that in the future, I'll definitely get the ping. Anyway, thank you for all of your feedback so far, and we hope to read more in response to the August update! --IFried (WMF) (talk) 23:32, 21 August 2020 (UTC)[reply]
Just to say, although this may seem a minor issue, it actually is quite a nuisance for end users who download multiple books. Without a set cover unique to each book, they all look the same (Wikisource logo plus small text title, or boring text title). ebooks with designed covers eg the original title page are much more identifiable etc. JimKillock (talk) 13:27, 28 August 2020 (UTC)[reply]

Which formatting and style issues are the most common and frustrating, in your opinion?

  • Many times we add an image using the crop image tool. For the web view it is okay. But if we try to download the book, the whole image of the page is included in the book instead of the cropped image.
  • Tables are often not rendered properly in the downloaded book.
  • The sfrac template is not rendered properly in the downloaded book.

--Balajijagadesh (talk) 18:27, 27 May 2020 (UTC)[reply]

@Balajijagadesh: Thank you for this information! One question: When we conducted some basic tests, the fractions in ebook exports looked okay. Maybe you can provide some examples of the sfrac template issue, which we can use for analysis? Thanks! --IFried (WMF) (talk) 23:07, 24 June 2020 (UTC)[reply]
@IFried (WMF): Hi. Thanks for reaching out. The sfrac template is rendered properly in pdf and epub formats, but it is not rendered properly in mobi format. The horizontal bar disappears, which introduces an alignment problem. Let me know if you can reproduce the problem. Regards -- Balajijagadesh (talk) 13:50, 2 July 2020 (UTC)[reply]
@Balajijagadesh: Thank you for sharing this information! We have tested this issue on mobi, and we have been able to reproduce the issue. I have written a ticket for this. Appreciate it! --IFried (WMF) (talk) 22:22, 10 July 2020 (UTC)[reply]
@Sushant savla: Thank you for this feedback! We think we captured this issue in our example #5 on the project page. Is this correct/is this the same issue that you are describing? Thanks! --IFried (WMF) (talk) 23:09, 24 June 2020 (UTC)[reply]
  • I have found that the main problem with "Download as PDF" is fonts. When special fonts are used, especially those that support diacriticals, the output is not always rendered in the same font. Rather, a standard font is sometimes used, one which does not support the diacriticals. There are also sometimes unexpected changes to font size that can ruin the formatting. Dovi (talk) 12:12, 28 May 2020 (UTC)[reply]
@Dovi: Thank you for providing this information! We have some follow-up questions (so we can better understand the problem). Our questions: Can you provide an example of where you are seeing this issue? And how are you downloading the PDF? Is it via the side-panel (and, therefore, via ElectronPDF) or via the top panel (and, therefore, via WSExport), or somewhere else? Thanks in advance! --IFried (WMF) (talk) 23:13, 24 June 2020 (UTC)[reply]
@Dovi: Hello! The ability to see "Choose format" should be available, if WSExport is enabled, on the wiki. If you want to enable it in the sidebar, you can try to contact someone with interface admin rights on your wiki in order to enable it in the sidebar. --IFried (WMF) (talk) 23:15, 24 June 2020 (UTC)[reply]
  • (Perhaps this should be in a new section, feel free to move): The first example from enWS is, in my opinion, not a good example. The markup at enWS was using the <center> tag, and it wasn't using the s:en:Template:page break template, which inserts some CSS-styled div to produce a page break in ereaders (break-after:page; page-break-after:always;). So, I think these issues are not really the fault of the WS-export tool, but rather an issue that should be fixed at enWS. Perhaps WS-export could spot "suspect" markup and make a best-effort attempt to hotfix them during export, but that would mask the underlying issue of poor markup at the source and offload the burden onto the WS-export maintainers. Inductiveload (talk) 10:53, 29 May 2020 (UTC)[reply]
@Inductiveload: Thanks for sharing this information; it was very helpful. We can see, like you wrote, that the example is due to incorrect markup (i.e., template:pagebreak should have been used instead of <center> tag). In this case, the issue seems to be community outreach and education rather than a technical issue. However, we still want to document that this is happening, so that we can inform our communities how to mitigate these issues when they export books using WSExport. We'll also look into adding more details on the project page about this. Thanks again! --IFried (WMF) (talk) 23:18, 24 June 2020 (UTC)[reply]
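The suggestion above that an export tool could spot "suspect" markup can be sketched as follows. This is only an illustration using Python's standard HTML parser: the set of flagged tags and the function name are assumptions made for the example, not something WSExport is known to do.

```python
from html.parser import HTMLParser

# Presentational tags that are obsolete in HTML5; the choice is illustrative.
SUSPECT_TAGS = {"center", "font", "big", "strike", "tt"}


class SuspectMarkupFinder(HTMLParser):
    """Collect (tag, line, column) for every suspect opening tag."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag in SUSPECT_TAGS:
            line, column = self.getpos()
            self.findings.append((tag, line, column))


def find_suspect_markup(html):
    """Return the suspect tags found in a rendered page, in document order."""
    finder = SuspectMarkupFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

A report like this could be shown to editors at the source wiki rather than silently patched during export, which keeps the fix where the markup lives.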
  • I tried to export several books with the WSExport tool, and the biggest issue was metadata. On cs.wikisource, all content pages have an infobox with information about the author, source, licence, etc. The same table appeared at the beginning of every chapter in the exported book. There should be an option to hide this information on export and have it only once in the text. JAn Dudík (talk) 11:50, 29 May 2020 (UTC)[reply]
@JAn Dudík: Thanks for the feedback! While we see that someone provided a solution to the metadata issue with ebook exports, we also understand that there are other issues, and we hope to improve the ebook export experience overall. Furthermore, we see that there’s an issue with encoding in external hyperlinks, which we've noted. Thanks! --IFried (WMF) (talk) 23:32, 24 June 2020 (UTC)[reply]
  • @JAn Dudík: The support for WSExport on cs.wikisource is very poor. If the cs.wikisource community wants good exported e-books, it would unfortunately require a lot of changes there. Hiding the metadata table is one of the simple changes. --EBookian (talk) 20:35, 29 May 2020 (UTC)[reply]
  • @JAn Dudík: WSExport is quite a simple tool which takes some pages and translates them into an e-book; there is not much to document, though it surely is lacking in some areas. You added a microformat there, which is a good thing. On the other hand, cs.wikisource relies heavily on those metadata tables at the moment, and if you exclude them from export now you will see no divide between chapters. You need to unify the style of pages, create e-book CSS, ... I am getting out of scope of this page; if you wish, we can continue this talk somewhere else. --EBookian (talk) 21:20, 29 May 2020 (UTC)[reply]
@JAn Dudík: Thanks for bringing up this question about documentation! We also see that improved documentation of best practices can help people encounter less confusion and errors. We’re currently looking into how to do this, and we’ll update the project page when we have information. --IFried (WMF) (talk) 22:06, 7 July 2020 (UTC)[reply]
@JAn Dudík: Thanks for this information. In order to better understand the problem, we have a few questions: 1) When you say you are using the mobile app, what do you mean, exactly (since there is no Wikisource app?). Are you using the mobile view of a desktop browser, for example? 2) Did you use the download PDF button on this page (we are asking because this link uses ElectronPDF rather than WSExport)? Thanks! --IFried (WMF) (talk) 22:10, 7 July 2020 (UTC)[reply]
@IFried (WMF): I used wsexport to generate an epub file from a cs.wikisource book. Then I copied it to my mobile and opened it using the Cool Reader app (but you can imagine any other e-book reader). The text of the book and the images were correct, but external links from infoboxes had bad encoding. JAn Dudík (talk) 09:22, 8 July 2020 (UTC)[reply]
@JAn Dudík: Thank you for this explanation! We tested accessing a downloaded epub via a mobile reading app, and we didn’t see any issues with the external links. However, we understand that this issue may still occur sometimes. For this reason, we have documented this issue in Phabricator. We may not have time to fix it in the scope of this project, since we are primarily focusing on issues related to the WSExport tool not working/working too slowly or books not being readable (i.e., basic functionality of the tool and basic readability of the text). However, we have it documented, in case someone would like to fix it now or in the future. Thank you for reporting it! --IFried (WMF) (talk) 23:33, 24 July 2020 (UTC)[reply]
  • When converting text from Wikisource into pdf or rtf, the text is indented at the start of every paragraph. It even indents the first line of a poem, even if it is enclosed in a poem tag. So the output for poems is bad, spoiling all their alignment. The poems are not indented in epub or mobi format. The issue can be seen here -- Balajijagadesh (talk) 07:06, 3 July 2020 (UTC)[reply]
@Balajijagadesh: Thanks for this feedback! We have tested this issue on epub, pdf, and mobi. As you wrote, the pdf version had incorrect indentation. The mobi version had the numbers smashed into the text, which also looked strange. The only version that looked okay was epub. We have written a ticket to track the issue, and we’ll see if we can look into this. In addition, we are beginning to investigate the best practices for proofreading content to Wikisource. Once we share these findings, we hope it can help prevent some formatting and styles issues in the future. Thanks! --IFried (WMF) (talk) 22:03, 10 July 2020 (UTC)[reply]
@Balajijagadesh: Thank you for bringing this up! We have covered this issue in example #2 on the project page, and we agree that this is a big problem. We really hope that we can fix it, and we have begun investigating how we may be able to do this. Thanks again and we hope to provide updates on this issue soon. --IFried (WMF) (talk) 22:05, 10 July 2020 (UTC)[reply]

Which user experience issues are the most common and frustrating, in your opinion?

@Balajijagadesh and Nemo bis: Thank you for the feedback on the most frustrating UX issues! This is helpful and we will take a look. --IFried (WMF) (talk) 22:09, 10 July 2020 (UTC)[reply]

Which problems, overall, do you find the most critical to fix, and why?

  • Since the latest version, WSExport is slower than before. External visitors may not be patient if the system is too slow (they may think it is not working). When the time-out is reached, the message is not user-friendly for external visitors. --Viticulum (talk) 19:31, 28 May 2020 (UTC)[reply]
@Viticulum: Thanks for the feedback! One question: What is the latest version you are referring to? Also, thanks for the comment about the need to improve user-friendly messaging (we’ll look into it). --IFried (WMF) (talk) 22:11, 10 July 2020 (UTC)[reply]
@IFried (WMF): Sorry for taking so long to get back to you. The slowness was experienced in May in production. I do not know how to determine versions. I will test 10 books every day this week. Results on Friday. --Viticulum (talk) 19:40, 26 July 2020 (UTC)[reply]
@IFried (WMF): Please see here the result of my test for Export Time. --Viticulum (talk) 19:56, 2 August 2020 (UTC)[reply]
@Viticulum: Thank you for sharing this very useful information! We have included it as a note in our current investigation about Wikisource errors and issues. This will help us have a better understanding of the wait times experienced by some Wikisource users when downloading books. We hope this analysis can help us identify primary issues and how we can go about fixing or improving them. Thank you again! --IFried (WMF) (talk) 21:31, 11 August 2020 (UTC)[reply]
  • We need multi-year reliability. Multi-page export needs to be provided by a MediaWiki extension again, for all the formats people need: PDF and EPUB at a minimum (but when you support EPUB, it's easy to add ZIM and ODT as well). The development and maintenance of the extension needs to be outsourced to a third party, with sufficient funding for at least 5 years, so that users and partners (for instance libraries) can be sure that it will keep existing in the future and not vanish overnight if a couple of people at WMF decide so. Without a reliable export, it's impossible to get national libraries and the various access methods to bring users to Wikisource. Nemo 13:45, 29 May 2020 (UTC)[reply]
@Nemo bis: Thanks for the feedback! Just to make sure we understand your comment, can you clarify what you mean by “multi-year reliability?” To your other point, we agree that Wikisource should have more standardized and easily accessible tools and gadgets. For this reason, we will be working to improve this issue, especially through the ‘Migrate Wikisource specific edit tools from gadgets to Wikisource extension’ wish. Finally, to your point regarding maintenance: While the Community Tech team will not be maintaining Wikisource, overall, in a long-term capacity, we are hoping to increase the overall health and usability of Wikisource, so that it is easier to maintain in the future. --IFried (WMF) (talk) 22:13, 10 July 2020 (UTC)[reply]

Anything else you would like to add?

  • I would like the developers/technical team to pay attention to eBooks in RTL languages. These are written right-to-left (e.g., Hebrew and Arabic). I hope the Export tool will also support such languages. From past experience, such support is not automatic, and special care is needed to ensure this.--Naḥum (talk) 12:14, 28 May 2020 (UTC)[reply]
@Nahum: Thanks so much for this feedback! We would love to learn more about the issues and challenges unique to RTL users on Wikisource, especially regarding ebook exports. Can you provide more details? We agree that this should be looked into as well, so we look forward to your response. --IFried (WMF) (talk) 22:15, 10 July 2020 (UTC)[reply]

Modernisation does not export

Hi! One issue with the export is that the modernisation system that we use, at least on fr.wikisource, does not work in exported formats because it is implemented in JS. This makes for very unpleasant reading of old texts that have been transcribed in the original version and then modernised with the modernisation system. It's very convenient to use on Wikisource itself but very disappointing in the export. --M0tty (talk) 12:00, 28 May 2020 (UTC)[reply]

See this example [1] for modernisation of old French: On middle/left there is "Orthographe originale" or "Orthographe moderne". This is done for each chapter. It is not possible to extract a chapter or the whole book in modernised French. This functionality is not incorporated in WSExport. I believe this would be a whole project in itself. Tpt could give more insight. --Viticulum (talk) 19:46, 28 May 2020 (UTC)[reply]
Ideally, as it seems possible to include some Javascript in an ePub, it would be great if the ePub file could contain both versions and switch from one to the other using exactly the same Javascript code as in the French Wikisource. However it's possible that this would require to load not only the "local" replacements present as a parameter of the modernisation model, but also the entire Wikisource modernization dictionary, or at least the subset of words which are found in the exported text. --George2etexte (talk) 14:13, 2 June 2020 (UTC)[reply]
@M0tty, Viticulum, and George2etexte: Thanks for sharing this information. From my understanding, you are writing about the fact that Wikisource readers online can choose which orthography to select, but this is not available for ebook exports. Is this correct? And, if so, can you provide a bit more explanation and context around it (for example, do you know if there is already a Phabricator ticket that documents this problem)? The fix for this may be a large project that is out of scope for the current project. However, it’s good for us to still know about this issue, and we would like to document it in Phabricator. We look forward to your response. Thanks! --IFried (WMF) (talk) 22:17, 10 July 2020 (UTC)[reply]
Hi @IFried (WMF): Yes, that's exactly it. The epub export can't export the modernisation layer. I haven't found any ticket on Phabricator regarding this issue. Thanks for looking into this. --M0tty (talk) 17:46, 11 July 2020 (UTC)[reply]
@M0tty: Thank you for your response! We have documented the issue on Phabricator. We may not be able to work on it during the span of this project, since we’re primarily focused on fixing issues related to people not being able to download, access, or read books (i.e., core, basic usage bugs). However, we wanted to document it, and we hope that it can be picked up by someone to be fixed in the future. Thank you for letting us know about this issue! --IFried (WMF) (talk) 23:17, 17 July 2020 (UTC)[reply]

Math export

  • Currently, on the various wikis, it is possible to activate MathML to get a nice rendering of mathematical formulas instead of vector images. The current export process does not support this MathML format and includes all mathematical formulas as images, in the old MediaWiki way. Since MathML is now widely supported, it would be really useful to be able to export the code with MathML. — Alan Talk 13:16, 28 May 2020 (UTC)
@Nalou: Thanks so much for this information! As a first question, can you let us know a bit more about how you activate and use MathML (with an example, preferably)? If we understand correctly, you are writing about the inability to use math markup in Wikisource. For this reason, users need to employ tactics that aren’t ideal, such as capturing an image of a formula with the crop tool. Is that correct? Thanks! --IFried (WMF) (talk) 22:19, 10 July 2020 (UTC)[reply]
@IFried (WMF): Thanks for following up on my remark. To activate MathML, I simply checked the dedicated button in the Appearance tab of the preferences (at the bottom of the page). It gives a nice MathML rendering in pages. One example can be found here. There are LaTeX formulas embedded in math tags in the wikicode. We can have math markup on the Wikisource website; this is an example of it. And it works very nicely in a web browser. But if I want to export a PDF version of the book, the exported document uses images for the math formulas. If you try the export in htmlz format, you'll see in the HTML code that the formulas are included as images. To reformulate what I said, I would like to have proper math formulas like in a LaTeX document. The wikicode exists. I do not know how it is treated by WSExport, but the math tags are exported as images, whereas there are ways to handle the LaTeX formulas directly. Your last sentence is partly correct: I tried to automate the modification of the HTML export of the book to replace the images with the LaTeX formulas. But it is quite complicated, so I stopped... As a test, I suggest that you look at the export of the book I mentioned previously (direct link to wsexport here). If you look at page 6, you will see that the math symbols are not in the same font as the main text and that they cannot be selected. This comes from the pre-rendering of math formulas by the MediaWiki engine and their inclusion in the document. This system was designed to allow the best cross-platform accessibility of math in Wikipedia, but it is not well suited to exporting documents today. I am convinced that a better solution is possible. Scientific books are a huge part of our culture. It would be very nice to be able to produce modern versions of old and inaccessible books in PDF or EPUB. I tried to be a bit more exhaustive in the description. Feel free to ask me more if it is not clear (it is quite hard to describe everything using text). — Alan Talk 13:03, 11 July 2020 (UTC)
@Nalou: Thank you so much for your detailed response! From my understanding, the issue is that mathematical formulas are sometimes expressed as images rather than text, which limits what people can do in terms of reading, sharing, and analyzing the information. Are we correct in this analysis? If so, we have documented this issue on Phabricator, and we’ll see if we can do anything to fix it. If this isn’t the issue, we would love to hear more details so we can understand. Thanks! --IFried (WMF) (talk) 23:15, 17 July 2020 (UTC)[reply]
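One way to check how formulas ended up in an HTML export, as described in this thread, is to count MathML elements against image-based formulas. The sketch below is only an illustration: it assumes that image formulas are `<img>` elements whose class attribute contains "math" (the class name used in the test sample is an assumption about the exported markup), and the function name is hypothetical.

```python
from html.parser import HTMLParser


class MathRenderingCounter(HTMLParser):
    """Count formulas exported as MathML and formulas exported as images."""

    def __init__(self):
        super().__init__()
        self.mathml = 0
        self.images = 0

    def handle_starttag(self, tag, attrs):
        if tag == "math":
            self.mathml += 1
        elif tag == "img":
            # Assumption: image formulas carry a class name containing "math".
            classes = dict(attrs).get("class") or ""
            if "math" in classes:
                self.images += 1


def count_math_renderings(html):
    """Return (mathml_count, image_count) for one exported HTML document."""
    counter = MathRenderingCounter()
    counter.feed(html)
    counter.close()
    return counter.mathml, counter.images
```

Running this over the files of an htmlz export would show whether any formula survived as selectable markup or whether all of them were turned into images.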

Wrong date order for exports in French language

  • For book exports from French Wikisource, the date order is inverted (month/day/year), as is the norm in English. For example, today: "Exporté de Wikisource le 05/28/20". It is strange to have the comment "Exporté de…" in French with a date in the English format (in French the date order is day/month/year, so today is 28/05/20 and not 05/28/20). Thanks, Laurent --Lorlam (talk) 17:48, 28 May 2020 (UTC)[reply]

=> Ok now, the problem has been fixed. --Lorlam (talk) 00:39, 25 June 2020 (UTC)[reply]
@Lorlam: Thanks for reporting this issue! As you wrote, the issue appears to be fixed in some cases. However, we still see this issue arising in other cases, such as in Tamil exports, so we’ll look into this. One possible solution may be to display the name of the month rather than the number. Thanks! --IFried (WMF) (talk) 22:24, 10 July 2020 (UTC)[reply]
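The difference between the two numeric orders, and the suggestion above to display the month name instead, can be sketched as follows. The pattern table, the month list, and the function names are illustrative assumptions; this is not WSExport's actual date handling.

```python
from datetime import date

# Numeric patterns per language: month-first for US English, day-first for French.
NUMERIC_PATTERNS = {"en-US": "%m/%d/%y", "fr": "%d/%m/%y"}

# Hard-coded so that the example does not depend on system locales.
FRENCH_MONTHS = {
    1: "janvier", 2: "février", 3: "mars", 4: "avril",
    5: "mai", 6: "juin", 7: "juillet", 8: "août",
    9: "septembre", 10: "octobre", 11: "novembre", 12: "décembre",
}


def numeric_export_date(day, language):
    """Format a date numerically in the order expected for the language."""
    return day.strftime(NUMERIC_PATTERNS[language])


def french_long_date(day):
    """Spell out the month, which removes the day/month ambiguity."""
    day_text = "1er" if day.day == 1 else str(day.day)
    return f"{day_text} {FRENCH_MONTHS[day.month]} {day.year}"


export_day = date(2020, 5, 28)
print(numeric_export_date(export_day, "en-US"))  # 05/28/20
print(numeric_export_date(export_day, "fr"))     # 28/05/20
print(french_long_date(export_day))              # 28 mai 2020
```

Spelling out the month avoids the ambiguity entirely, at the cost of needing translated month names for every export language.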

Bad export in "pdf" for French civility titles

  • The "pdf" export tool does not correctly handle the French civility title templates that we use in French Wikisource (written in curly braces: "M." for Monsieur / "MM." for Messieurs / "Mlle" for Mademoiselle / "Mme" for Madame / "Mmes" for Mesdames / etc.). The "pdf" output shows these characters underlined with a dotted line, which does not look good and is not very readable (for an example, see the cast list of the play https://fr.wikisource.org/wiki/Un_gros_mot). Thanks, Laurent --Lorlam (talk) 18:16, 28 May 2020 (UTC)[reply]

  • To add a clue for this problem, someone on French Wikisource said it is a problem with the {{abréviation}} template (see here https://fr.wikisource.org/wiki/Mod%C3%A8le:Abr%C3%A9viation), and all other templates which use it. All these "civility title" templates are described here: https://fr.wikisource.org/wiki/Cat%C3%A9gorie:Mod%C3%A8les_de_titre_de_civilit%C3%A9 ... thx --Lorlam (talk) 21:00, 28 May 2020 (UTC)[reply]
=> This problem has been fixed by modifying the template on French Wikisource, so it is okay now ;-) --Lorlam (talk) 21:10, 31 May 2020 (UTC)[reply]
@Lorlam: Thanks for reporting this! It appears that this issue has been fixed, as you have written. However, if this issue arises again, please do let us know. Thank you! --IFried (WMF) (talk) 22:25, 10 July 2020 (UTC)[reply]

e-book navigation

The way Table of Contents is translated into e-book navigation (I mean e-book reader navigation, not ToC that would be printed) is very limited. It would be beneficial if there was a way to allow editors to change the structure of e-book navigation to align better with the book structure (probably by some ToC tags). --EBookian (talk) 20:58, 29 May 2020 (UTC)[reply]

@EBookian: Thanks for the information! Can you provide more details on this problem (perhaps a specific example of where you are seeing this problem)? This will help us understand the problem better. Much appreciated! --IFried (WMF) (talk) 22:26, 10 July 2020 (UTC)[reply]

Long chapters and footnotes

I have observed on several occasions a problem with footnotes in books that have no chapters, or that have chapters longer than 80 to 100 pages. The wsexport EPUB tool arbitrarily splits the chapter, which breaks the links to the footnotes; the only workaround is to split the chapter artificially. (See the example in Histoire de l'affaire Dreyfus T.2.) --Cunegonde1 (talk) 03:36, 30 May 2020 (UTC)[reply]

@Cunegonde1: Thank you for this information! We have conducted some basic tests on EPUB and PDF to try to reproduce the splitting and link problems. However, we were unable to reproduce the issues. The footnotes appeared to properly display at the end of the chapter with linking functionality. Perhaps you can share a screenshot and more details that demonstrate the issue? This will help us understand the problem better and see if it is something we can fix. Thanks! --IFried (WMF) (talk) 22:28, 10 July 2020 (UTC)[reply]
@IFried: You can see the issue in this book: Sade - histoire_de_Juliette. If you create the EPUB and open it in an editor, you can see that the first footnote call is located in the file c1_L_histoire_de_Juliette_premiere_partie.xhtml (page 62), while the text of the footnote is located in a different file, c1_L_histoire_de_Juliette_premiere_partie_2.xhtml. The link between the footnote call and the footnote text, <a xmlns:epub="http://www.idpf.org/2007/ops" href="#cite_note-1" epub:type="noteref">[1]</a>, does not point to the file that contains the footnote text. Excuse my poor English. Summary (translated from French): the link between the footnote call and the note itself does not work; the call is in one section of the EPUB and the note is in another, with no link pointing to that section. --Cunegonde1 (talk) 06:53, 11 July 2020 (UTC)[reply]
@Cunegonde1: Thank you for your response! Are you saying that the footnote link (for example, “1” on page 62) does not actually redirect the user to the appropriate footnote section when they click on it? If so, we are able to reproduce the issue & we have documented this behavior in a Phabricator ticket. If this is something else, maybe you can provide a screenshot or more details? Thanks. --IFried (WMF) (talk) 23:13, 17 July 2020 (UTC)[reply]
@IFried (WMF): Thanks, you describe exactly the issue.--Cunegonde1 (talk) 05:46, 18 July 2020 (UTC)[reply]
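To make the broken link described above concrete: when a chapter is split, a same-file link such as href="#cite_note-1" would need to become a cross-file link that names the part holding the note. The sketch below shows one way this could be done after splitting. It is an assumption for illustration, not WSExport's actual code; `fix_note_links` and the file names in the usage lines are hypothetical.

```python
import re

NOTEREF = re.compile(r'href="#(cite_note-[^"]+)"')
ID_ATTR = re.compile(r'id="(cite_note-[^"]+)"')


def fix_note_links(files: dict[str, str]) -> dict[str, str]:
    """Point each footnote call at the file that actually contains the note.

    `files` maps a file name to its XHTML source.
    """
    # First pass: record which file defines each footnote id.
    owner = {}
    for name, html in files.items():
        for note_id in ID_ATTR.findall(html):
            owner[note_id] = name

    # Second pass: rewrite links whose target lives in another file.
    fixed = {}
    for name, html in files.items():
        def repl(match, current=name):
            target = owner.get(match.group(1))
            if target is None or target == current:
                return match.group(0)  # leave same-file or unknown links alone
            return f'href="{target}#{match.group(1)}"'
        fixed[name] = NOTEREF.sub(repl, html)
    return fixed


# Usage with two hypothetical parts of a split chapter:
parts = {
    "part.xhtml": '<a href="#cite_note-1" epub:type="noteref">[1]</a>',
    "part_2.xhtml": '<li id="cite_note-1">Note text</li>',
}
print(fix_note_links(parts)["part.xhtml"])
```

Regular expressions are used here only to keep the sketch short; a real fix would more safely use an XML parser.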

Prevent page breaks after headings

I would also like to draw your attention to some unwanted page breaks, typically between a section heading and the text of that section, when the section does not begin on a new page. For example, in the EPUB exported from this work by Gauss, whose chapters are themselves divided into short numbered articles (as you can see in this example), the article number and the beginning of the article often end up on two separate pages. On French Wikisource, the templates {{t2}} to {{t6}} were created specifically to format headings and set their hierarchy, based on the HTML tags h2 to h6. Could some parameters of these templates, or of the export tool, be changed to prevent a page break between such a heading and the start of the section, however many line breaks follow the heading in the code? ElioPrrl (talk) 15:27, 30 May 2020 (UTC)[reply]

Did you consider using these tags:
  • <div style="page-break-inside: avoid;"> <!-- Start of block: avoid page break -->
  • Your text-block…
  • </div> <!-- End of block: avoid page break -->
This could be encapsulated in a template. --Denis Gagne52 (talk) 16:27, 27 June 2020 (UTC)[reply]
@ElioPrrl: Thanks for reporting this issue! We have tested this issue, and we were able to reproduce it. We also see that the tags described above by Denis Gagne52 could possibly fix it. Can you let us know whether the tags do indeed fix the issue? Thanks! --IFried (WMF) (talk) 22:31, 10 July 2020 (UTC)[reply]
@Denis Gagne52 and IFried (WMF): These tags do not fit the bill, since I want to avoid page breaks not inside a paragraph but after one (most of the time, after a heading tag). There is also a property called page-break-after, but I have not succeeded in fixing the problem with it (I am new to CSS, having learned it for only six months, which may explain my difficulties). Looking at the exported code, I saw that the heading tags h1, h2, etc. are often followed by one or several blank lines, <p><br/></p>, where the page can break; I think this is why page-break-after is ineffective. And since we cannot predict how many blank lines will follow a heading tag, I have no idea how to prevent these page breaks regardless of the number of line breaks after the heading. ElioPrrl (talk) 15:14, 17 July 2020 (UTC)[reply]

@ElioPrrl: Your title must be enclosed in the div together with the paragraph that follows it (or part of that paragraph). The page break will then happen before or after the div.

<div style = "page-break-inside: avoid;">
Your title block
The following paragraph
</div>
--Denis Gagne52 (talk) 01:32, 19 July 2020 (UTC)[reply]
@Denis Gagne52: I do understand, but then there cannot be any page break inside the following paragraph either. If the paragraph is more than three to five lines long (and even more so if it is longer than a page), this solution is much too coarse. By the way, it would be far more convenient if the solution were implemented either in MediaWiki or in our templates t2, ..., t6. — ElioPrrl (talk) 08:53, 19 July 2020 (UTC)[reply]
@Denis Gagne52 and ElioPrrl: Thank you for this explanation and feedback! This issue is focused on problems related to proofreading, if we understand correctly. The current project focuses on improving ebook exports, rather than proofreading, so this issue is not in the scope of this project. However, we hope to improve the experience of proofreading by sharing documentation of best practices for all Wikisource users (the research is in progress, and we’ll share our findings in the future). I hope this can be of help, and thank you again! --IFried (WMF) (talk) 23:31, 24 July 2020 (UTC)[reply]
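On the blank <p><br/></p> lines after headings that ElioPrrl describes above: one conceivable approach is to strip those empty paragraphs from the exported HTML, so that a heading is directly followed by its text and a page-break-after rule on the heading has a chance to apply. The following is only a sketch under that assumption. `drop_blank_lines_after_headings` is a hypothetical helper, not part of WSExport or MediaWiki, and whether a given e-reader then keeps the heading with its text is not guaranteed.

```python
import re

# One or more empty paragraphs (<p><br/></p>) directly after a closing h1-h6 tag.
BLANKS_AFTER_HEADING = re.compile(
    r'(</h[1-6]>)(?:\s*<p>\s*<br\s*/?>\s*</p>)+',
    re.IGNORECASE,
)


def drop_blank_lines_after_headings(html: str) -> str:
    """Remove empty paragraphs that sit between a heading and its text."""
    return BLANKS_AFTER_HEADING.sub(r'\1', html)


sample = '<h2>12.</h2><p><br/></p>\n<p><br/></p><p>Text of the article</p>'
print(drop_blank_lines_after_headings(sample))
# <h2>12.</h2><p>Text of the article</p>
```

This handles any number of blank lines after the heading, which is the part that could not be predicted in advance. It also removes the vertical spacing those lines provided, so a margin on the heading would be needed to compensate.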

Initials

Initials built with the lettrine template on French Wikisource are not properly displayed in the exported EPUB files (see e.g. this play). It is a bit better in the PDF exports (the font size is a bit larger than the text, although it does not seem to adapt to the number of lines given in the « lignes= » parameter of the template). --George2etexte (talk) 14:13, 2 June 2020 (UTC)[reply]

@George2etexte: Thank you for letting us know about this! We have conducted some basic tests. In our tests, we found that the lettrine was represented better in EPUB than PDF, but we understand that there may be different experiences on different devices. We may not have capacity to fix this issue, but we have noted it. Is there a Phabricator ticket for documentation purposes? If not, would you like to create one and tag us? Thanks! --IFried (WMF) (talk) 22:32, 10 July 2020 (UTC)[reply]

Cropped image handling

Telugu wikipedia wikisource uses cropped scans extensively to represent images or figures in text, as in the example page. The current wsexport handles this well, and we would like this functionality to be kept in the future. --49.206.8.248 04:50, 3 July 2020 (UTC)[reply]

@49.206.8.248: Thank you! We are happy to hear that WSExport handles cropped images well on Telugu Wikisource (we assume you meant Wikisource?). However, we also know that there are cropped image issues experienced by other users, so we’ll see if there is something that we can do to improve this issue. Thanks again for commenting. --IFried (WMF) (talk) 22:07, 10 July 2020 (UTC)[reply]
Thanks IFried (WMF) for your response. --Arjunaraoc (talk) 16:52, 4 September 2020 (UTC)[reply]

Early findings: Request for feedback (August 2020)

Pinging everyone who previously commented on this page (and apologies if I missed anyone!).

@Balajijagadesh, Sushant savla, M0tty, Sannita, Dovi, Nahum, Nalou, Lorlam, MartinPoulter, Consulnico, Viticulum, Inductiveload, JAn Dudík, Nemo bis, EBookian, Cunegonde1, ElioPrrl, George2etexte, and Denis Gagne52:

Hello, everyone! We have just posted an August update for the ebook export improvement project, which shares our findings related to the project so far. We invite you to read our analysis and share your feedback below. We deeply appreciate your feedback, which will help us determine next steps for the project. Thank you in advance!

What are your general thoughts about the guiding principles that we have learned from the consultation so far (i.e., “Lessons from the consultation”)? Is there anything that you think we should add or change?

I am very happy to see all the recognition given to the export tool and its importance. Thank you to the whole team for all the great work. The visitor experience will always remain “my priority”. (I know, I am repeating myself…) I understand that resources are limited, but we are certainly heading in the right direction.
Sharing good practices is a great idea. --Viticulum (talk) 16:10, 21 August 2020 (UTC)[reply]
@Viticulum: Thank you so much for all of your feedback! It makes us really happy to know that you think we are going in the right direction. We agree that user experience is very important, and we also want to improve it. For this reason, we will be sharing an update in September about proposed UX improvements. In the meantime, we’ll continue investigating and looking into how the WSExport tool, font rendering, and other core issues can be improved, as well. Thank you again! --IFried (WMF) (talk) 21:32, 28 August 2020 (UTC)[reply]

Is there anything you would like to share about the work we have done so far (i.e., VPS work, Calibre upgrade, various investigations, and the consolidation of tickets)? We’re open to any thoughts or suggestions!

What do you think of the proposal to investigate cache generated ebooks? Would this be useful and high-priority, in your view? Do you have any concerns?

I think this would make server resource utilization much more efficient. This is also complementary to the next section (request queue optimizing; see also my comment there) and (IMO) can be developed together. However, we should have an option to skip or clear the cache for a specific request, especially as an e-book test feature for Wikisource editors. Ankry (talk) 22:08, 14 August 2020 (UTC)[reply]
@Ankry: Thank you for sharing this feedback! It’s great that you think this could be a useful improvement. In regard to the ability to skip the cache, is the main reason that the user may want to see a new version of the book rather than the cached version? We think this is a good reason; we just want to confirm that we understand your thought process correctly. Thank you and we look forward to your response! --IFried (WMF) (talk) 21:35, 28 August 2020 (UTC)[reply]
@IFried (WMF): I mean the case when Wikisource editors make some fixes and want to see the result (whether their fixes work or not). This is a special case, so definitely not for default behaviour. Ankry (talk) 13:11, 29 August 2020 (UTC)[reply]
I understand the technical point of view, and the usefulness for speed. My concern is that books continue to be validated and corrected even after they are published. (The first phase is correcting until all pages are yellow; then sometimes another user validates them to green.) Since WSExport “cannot” know this, we may not be downloading the latest version. --Viticulum (talk) 16:14, 21 August 2020 (UTC)[reply]
@Viticulum: Yes, this is a great point, and we touched upon this in the discussion above with Ankry. One possible solution may be to allow editors to skip the cache, if they want, so it’s a choice rather than a requirement. I will communicate this point to the engineers, and we’ll see if we can come up with a solution that takes into account this concern. Much appreciated for bringing it up! --IFried (WMF) (talk) 21:36, 28 August 2020 (UTC)[reply]
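The cache-with-bypass idea discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the design the team chose: `EbookCache`, its `skip_cache` flag, and the one-hour default lifetime are all hypothetical.

```python
import time


class EbookCache:
    """Cache generated ebooks, with an opt-in way to force regeneration."""

    def __init__(self, generate, ttl_seconds=3600, clock=time.monotonic):
        self._generate = generate      # callable(title, fmt) -> ebook content
        self._ttl = ttl_seconds        # how long a cached ebook stays valid
        self._clock = clock
        self._store = {}               # (title, fmt) -> (created_at, ebook)

    def get(self, title, fmt, skip_cache=False):
        key = (title, fmt)
        now = self._clock()
        entry = self._store.get(key)
        if not skip_cache and entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # fresh cached copy: no regeneration
        # Cache miss, expired entry, or explicit bypass: regenerate and store.
        ebook = self._generate(title, fmt)
        self._store[key] = (now, ebook)
        return ebook
```

With this shape, ordinary downloads are served from the cache by default, while an editor checking a fix would pass `skip_cache=True`; the regenerated book then also replaces the stale copy for everyone else. A time limit alone does not address Viticulum's concern about later validations, which would need something like keying the cache on the latest revision of the book's pages.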

What do you think of the proposal to investigate job queue for more efficient ebook generation? Would this be useful and high-priority, in your view? Do you have any concerns?

I think that efficient request handling is important, as it would shorten the time that clients wait for the requested e-book. It is IMO more likely that the same e-book is requested multiple times within a short period (e.g. because it was announced as a newly completed work, or because people are sharing information about it) than that completely random e-books are requested. The advantage would be twofold: higher end-user satisfaction from receiving the e-book faster, and lower server load from merging multiple requests into a single e-book generation process.
It would also be nice if users immediately got information about the status of the e-book generation process. My tests on an external server suggest that many users who do not get any result within 10-15s try to request the e-book again and again. E-book generation time is usually longer than 15s. Ankry (talk) 21:52, 14 August 2020 (UTC)[reply]


I agree with Ankry. But when a book is announced, I think this is also the period when the most corrections are being made, for example when another user decides to start validating the book. It is not easy to find a compromise, but speed is important. --Viticulum (talk) 16:20, 21 August 2020 (UTC)[reply]
@Ankry: Thank you so much for sharing this! We are very happy that you think the job queue work would help address issues related to reliability. Your detailed explanation was helpful, as well. Regarding the export status: We agree that it is currently confusing to users who may not know the status of the export. We would like to address this, as well, if possible. I’ll talk with the team about how we can let users know the status of the export in a more intuitive and accessible way. --IFried (WMF) (talk) 21:38, 28 August 2020 (UTC)[reply]
@Viticulum: Yes, this will be a recurring theme in this project (i.e., balancing the need for speed with the need to have the latest version of the book). This is something that we will be mindful of as a team and consider high priority in terms of how we think about the user experience. For this reason, I’ll be sharing this topic with the engineers, and we’ll discuss what can be done. Once we have a better idea, I can share an update on the project page. --IFried (WMF) (talk) 21:42, 28 August 2020 (UTC)[reply]
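Ankry's two points above (merging repeated requests for the same e-book into one generation, and telling users where their request stands) can be illustrated with a small single-threaded sketch. `ExportQueue` and its methods are hypothetical names, and a real job queue would run generation in background workers rather than through an explicit `run_next` call.

```python
from collections import OrderedDict


class ExportQueue:
    """FIFO queue that merges duplicate requests for the same (title, format)."""

    def __init__(self, generate):
        self._generate = generate       # callable(title, fmt) -> ebook content
        self._pending = OrderedDict()   # (title, fmt) -> list of requesters

    def submit(self, title, fmt, requester):
        """Queue a request; return True only if it created a new job."""
        key = (title, fmt)
        is_new = key not in self._pending
        self._pending.setdefault(key, []).append(requester)
        return is_new

    def status(self, title, fmt):
        """Human-readable status that could be shown to a waiting user."""
        key = (title, fmt)
        if key not in self._pending:
            return "not queued"
        position = list(self._pending).index(key) + 1
        return f"queued at position {position}"

    def run_next(self):
        """Generate the oldest pending ebook once and hand it to every requester."""
        if not self._pending:
            return None
        key, requesters = self._pending.popitem(last=False)
        ebook = self._generate(*key)
        return {requester: ebook for requester in requesters}
```

In this sketch, a user who clicks again after 10-15 seconds is simply attached to the job already in the queue, so the repeated clicks add no server load.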

What do you think of the proposal to investigate how to prevent incomplete book downloads? Would this be useful and high-priority, in your view? Do you have any concerns?

Yes, this should be looked into. It gives external visitors a bad impression. --Viticulum (talk) 16:21, 21 August 2020 (UTC)[reply]
Great! We appreciate the feedback. --IFried (WMF) (talk) 21:43, 28 August 2020 (UTC)[reply]

What do you think of the proposal to switch to a new system of fonts? Would this be useful and high-priority, in your view? Do you have any concerns?

What work or investigations would you like to see that is *not* being addressed or is being addressed in a different way than you would expect? In other words, what do you think we’re overlooking, if anything?

I think you should investigate the "TOC tree" (I don't know what it's called) of the generated EPUB. Especially for larger books, it's very useful to have sections and subsections, and sometimes even deeper levels. We should encourage the use of semantic tags, such as h1, h2, etc., for that purpose. I think French Wikisource already does that, but I don't know how it translates into the EPUB. Regards, Ignacio Rodríguez (talk) 01:48, 14 August 2020 (UTC)[reply]

This is an improvement that I would appreciate. A single level table of contents is not suitable for many types of books. It could be easier, in my opinion, to specify the position from the index rather than the h2, h3, ... tags. The only way to get this result actually is to split the index between the main page and the sub-pages, an arduous process that should be simplified. --Denis Gagne52 (talk) 23:47, 21 August 2020 (UTC)[reply]

@Ignacio Rodríguez and Denis Gagne52: Thank you so much for sharing this information! You both wrote about difficulties related to the Table of Contents, and specifically about the fact that WSExport does not always download complete books that have sub-sub pages. For this reason, it may be useful for the team to analyze how we can better support books that have sub-sub pages in the ebook export process. We have two follow-up questions:
  1. Do you have specific examples of the problem related to Table of Contents to share? We would like to examine any specific examples you can provide.
  2. Do you have any ideas of the best way to solve it? We noticed that your comments express differing views on whether we should use h1, h2… tags. Can you share more on your opinion on the tags, among other solutions?
We are very curious about any additional information that can be provided. Thank you in advance! --IFried (WMF) (talk) 21:48, 28 August 2020 (UTC)[reply]
Apologies in advance if I don't make myself clear, as English isn't my first language. I haven't personally experienced problems with downloading only partial books. I am referring to the resulting TOC that my ebook reader gets. Normally it takes the elements directly from the "index page" (.ws-summary div?), but sometimes the structure inside of that is lost. I am suggesting that if you specify the structure with "h tags", that would take precedence and the TOC could be built from there. Take for example this book I proofread in 2017. I specified h2, h3, and h4 sections (corresponding to the Book, Chapter and Section levels of the original book). But when I download the EPUB, the resulting TOC only has the links I provided (linking to the Chapter [h3] levels), and there's no clean way that I know of to make a TOC that respects the section-level links.
The other option I can think of is to try to establish a format directly from the Index, like Denis suggested, but I think that would be harder, as every project has its own index formatting templates. --Ignacio Rodríguez (talk) 02:32, 29 August 2020 (UTC)[reply]
@Ignacio Rodríguez: Thank you for your response! If I understand correctly, the problem in the example you provided is that only chapters are included in the TOC, but the book names (such as Libro Primero) and sub-titles for chapters (such as “Resumen de la…”) are not included in the TOC. Is that correct? --IFried (WMF) (talk) 14:42, 10 September 2020 (UTC)[reply]
@IFried (WMF): That's it --Ignacio Rodríguez (talk) 14:58, 10 September 2020 (UTC)[reply]
@Ignacio Rodríguez and IFried (WMF): We both share the same goal. The means do not matter as long as the result is achieved. As we cannot put aside the current method which supports all that is in inventory, my proposal is to add a notion of hierarchy to the ws-summary which would be independent of the division into pages. Currently we have to modify links in the main page and repeat them in subpages so that they appear second in the TOC. Here’s an example from the book I am working on to show the difficulty of producing a two-level TOC with ws-export. --Denis Gagne52 (talk) 23:58, 30 August 2020 (UTC)[reply]
@Denis Gagne52: Hello, and thank you for your response! We have tested the example you provided, and we were able to see the chapters properly displayed (and linking to the relevant content) for both PDF and EPUB. Are you still experiencing this issue -- and, if you are, can you provide some more details or a screenshot example? Sorry for the inconvenience; we just want to ensure that we are getting all the information we can about bugs, and we unfortunately cannot reproduce this one. For this reason, any further information would be appreciated. Thank you in advance! --IFried (WMF) (talk) 14:43, 10 September 2020 (UTC)[reply]
@IFried (WMF): This example was provided not for you to see the chapters properly displayed but to show the complexity to build a multi-level TOC. If it was user-friendly we would find many of these in Wikisource. --Denis Gagne52 (talk) 21:10, 10 September 2020 (UTC)[reply]
@Denis Gagne52: Thank you for clarifying the issue you were describing. From our understanding, you are talking about the complexity of building a multi-level Table of Contents. Ideally, this process should be easier for people to do. While this is outside the scope of WSExport work, and it sounds like a different wish, it is good for us to know about. Perhaps it can be approached as a new wish in a future wishlist or a volunteer developer can take it on. Thank you again! --IFried (WMF) (talk) 14:54, 15 October 2020 (UTC)[reply]
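For reference, the "h tags" approach suggested above by Ignacio Rodríguez amounts to turning the flat sequence of h2, h3 and h4 headings in the exported text into a nested tree. A minimal sketch of that step, using only Python's standard library, is shown below. It is an illustration, not WSExport's behaviour; `HeadingCollector` and `build_toc` are hypothetical names, and the choice of h2 to h4 follows the Book, Chapter and Section levels mentioned in the thread.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h2, h3 and h4 headings in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3", "h4"):
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def build_toc(html):
    """Return a nested list of {"title", "children"} dicts built from headings."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()

    root = []
    stack = [(1, root)]  # (heading level, list that receives its children)
    for level, title in collector.headings:
        # Close every open heading at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        node = {"title": title, "children": []}
        stack[-1][1].append(node)
        stack.append((level, node["children"]))
    return root
```

A tree of this shape could then be written into the EPUB's navigation document, giving the reader a multi-level TOC without editors having to restructure the index page by hand.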

Anything else you would like to add?

Hi! I'm a little surprised, because the August status update starts with some lessons that the tech team has learned: 1. Keeping in mind both contributors and visitors, and 2. Thinking about user experience improvements rather than technical improvements. These are two excellent points. But it seems to me that none of the suggested improvements below follow these two principles: the cache for generated e-books is a technicality that will provide only a very marginal improvement and is just about performance, the font change seems to be a detail to a non-tech guy like me, and so on. As important as they may be, it seems to me that the impact of these improvements on the user experience will be minor. They are still relevant for improving WSExport overall, though. Good job. Greetings. --M0tty (talk) 19:14, 21 August 2020 (UTC)[reply]

@M0tty: Hello! Apologies, I should have clarified that we'll be sharing our proposal to improve the user & reader experience in the next status update (most likely, in September). We are definitely planning to address general user experience issues, which are high priority for us. We're just still conducting research on that front, so we're not quite ready to share our findings yet. I'll update the August update to make that more clear. Also, thanks for your other comments. We agree that improving the reliability of the WSExport tool is very important, and we hope to make a meaningful difference through our work. Thanks again for your comment! --IFried (WMF) (talk) 19:54, 21 August 2020 (UTC)[reply]
Hi @IFried (WMF):! Thx for the clarification. Cheers! --M0tty (talk) 22:45, 21 August 2020 (UTC)[reply]

Portal site

Hey, I would like to use this chance to recommend the creation of a portal where readers can browse, read, and download all available ebooks in all languages. As far as I know, most metadata is now stored in Wikidata (including genre, and so on), so it should be pretty straightforward to build a simple browse-and-download site for readers, with the option to go to Wikisource to edit the source. On a separate note, I appreciate that the Foundation is spending some time on Wikisource. The mobile version doesn't work that well, and mobile seems to be becoming the standard these days. --LibraryFighter (talk) 13:35, 5 September 2020 (UTC)[reply]

@LibraryFighter:Thank you so much for your feedback! You provided a very interesting and exciting vision for a future Wikisource experience. Unfortunately, this would be a whole different project (since it doesn’t directly deal with improving WSExport and the ebook export experience). Also, this project idea could be quite large, due to the fact that it would be for so many different communities. However, we encourage you to continue exploring this idea, and perhaps a team can explore it in the future (especially if it became smaller in scope). You may also consider proposing a new project inspired by this idea. Finally, we thank you for your kind words about this initiative that the team is taking on. We’ll also look into the mobile experience and see if we can improve it. Much appreciated! --IFried (WMF) (talk) 14:55, 15 October 2020 (UTC)[reply]

Request for feedback on UX mockups & general updates (November 2020)

Pinging everyone who previously commented on this page (and apologies if I missed anyone!).

@Balajijagadesh, Sushant savla, M0tty, Sannita, Dovi, Nahum, Nalou, Lorlam, MartinPoulter, Consulnico, Viticulum, Inductiveload, JAn Dudík, Nemo bis, EBookian, Cunegonde1, ElioPrrl, George2etexte, Denis Gagne52, LibraryFighter, Ignacio Rodríguez, and Ankry:

Hello, we are requesting your feedback on the November update. It is very important to us to read your feedback, so we can determine the best way to improve user experience of the ebook export process. Your feedback will also help us determine if we are making the right decisions around improving font support and overall reliability of the WS-Export tool. Please refer to the questions below, and thank you in advance!

Hello @IFried (WMF): I did not receive this notice. Maybe the ping did not work for others either? I will read this November update carefully. It seems a lot of work has been done! --Viticulum (talk) 17:52, 30 November 2020 (UTC)[reply]
@Viticulum: Thank you for updating me and letting me know the ping did not go through. I had a feeling something wasn't quite right. I'll reach out to folks on their user talk pages this week, so we can ensure that people are aware of the update and share their feedback. And, yes, we would absolutely love if you could read the update and share your feedback below. There's a lot we have worked on (and more coming up!), and we want to know what Wikisourcers think of everything we have shared. We're especially interested in collecting feedback on the mockups for a new download user experience. Thank you! --IFried (WMF) (talk) 19:43, 30 November 2020 (UTC)[reply]

What do you think of our recent font support work? Does the issue seem to be largely resolved of boxes appearing rather than text?

Even if this is not an issue on French Wikisource, I am very happy that more fonts are being supported, as this will increase interest in Wikisource for many other languages. --Viticulum (talk) 17:30, 1 December 2020 (UTC)[reply]
The font issues with Tamil are largely resolved. As far as I have checked, the boxes no longer appear in place of characters. Kudos -- Balajijagadesh (talk) 02:32, 19 December 2020 (UTC)[reply]
@Viticulum and Balajijagadesh: Thank you so much for the feedback and apologies for the delayed reply (I was on holiday break). We are so happy to read that font support is now improved for you and other Wikisourcers. If you see the issue of boxes appearing in the future, please do let us know, as we want to ensure that this issue doesn’t appear again. Thank you! --IFried (WMF) (talk) 20:13, 7 January 2021 (UTC)[reply]

What do you think of our recent and upcoming reliability work? Do you have any thoughts, concerns, or suggestions to share?

A remarkable speed increase, as per my tests. Congratulations. --Viticulum (talk) 17:29, 1 December 2020 (UTC)[reply]
Seems to be faster and more reliable. Thanks! — Alan Talk 13:43, 4 December 2020 (UTC)
The speed and reliability is awesome. No complaints so far. -- Balajijagadesh (talk) 02:33, 19 December 2020 (UTC)[reply]
Everyone is very happy with the downloading speed and the French Wikisource contributors would like to congratulate the technical team for this great success. It is beyond our expectations. --Viticulum (talk) 17:03, 21 December 2020 (UTC)[reply]
@Viticulum, Nalou, and Balajijagadesh: Everyone, thank you so much for sharing this feedback! We are thrilled to read that you are all seeing significant improvements in the speed and reliability of WS-Export. Meanwhile, we still have more work we are doing to improve the tool, so we feel very excited about the next steps of the project. We will provide another update soon on our current work and thank you again for your feedback! --IFried (WMF) (talk) 20:14, 7 January 2021 (UTC)[reply]

What do you think of our proposed improvement to the download user experience overall? Do you like the general idea and user flow (as displayed in the mockups)?

To me, it is a very good idea. The download option would be much more visible, and it would be easier for random visitors to see that they can download the book. The idea of the automatic download is interesting. I only download multiple formats when I check whether the export works well. I have no preference between the two options. For the automatic option, why not allow people to select a format in their preferences? — Alan Talk 13:43, 4 December 2020 (UTC)
The proposed improvement to the download user experience is very nice. Looking forward to it -- Balajijagadesh (talk) 02:34, 19 December 2020 (UTC)[reply]
Comments from the French Wikisource: The general feeling is : keep it simple.
Downloading multiple formats at the same time: there seems to be no interest in such a feature, as different formats are normally downloaded on different devices.
It will be difficult to change the habits of contributors: a single Download button versus the 3 icons (French Wikisource). My personal choice is the single Download button, with an explanation of the different formats for visitors. Visitors are the ones who need more information, and it is a good idea to think from their point of view.
On the graphic, I would not recommend step 3.2 and 5. Step 6 seems interesting. --Viticulum (talk) 17:11, 21 December 2020 (UTC)[reply]
@Nalou, Balajijagadesh, and Viticulum: Thank you, everyone, for your feedback! This is really helpful. We are now able to have a better understanding of some of the main concerns and ideas around the mockups. To summarize what we read: It seems that people are generally in approval of the idea of improving the download user experience, since it will be easier and more inviting. From what we have read, automatic download may not be necessary, according to feedback shared above. We’ll look into this more as a team. Furthermore, the status indicators may be useful, especially the ones that tell users if there is an error in the download. Above all, people want things to be easy and simple. Thank you again for this feedback! --IFried (WMF) (talk) 20:16, 7 January 2021 (UTC)[reply]

Do you usually download the same file format (e.g., PDF, MOBI, EPUB, etc) every time you download a book, or do you often pick a different format?

As a contributor, I mainly test PDF, but when I am satisfied with the result, I try to test the other formats: EPUB and MOBI. --Viticulum (talk) 17:31, 1 December 2020 (UTC)
I always download EPUBs because I have an Android-based reader and I want a reflowable document for a small reader. However, if I had a Kindle, I'd always download MOBIs. Inductiveload (talk) 19:27, 2 December 2020 (UTC)
I always export in "pdf" format. I would like to keep a simple export process (with just the "pdf export" button), without many parameters to select every time I want to export a file... Thanks for your work. --Lorlam (talk) 09:44, 3 December 2020 (UTC)
Tamil-language readers select different formats depending on their device, purpose, etc. So offering different options when downloading is good and helpful. -- Balajijagadesh (talk) 02:35, 19 December 2020 (UTC)
From the French Wikisource: Auto-download: there does not seem to be much interest for such an option. --Viticulum (talk) 17:13, 21 December 2020 (UTC)[reply]
@Viticulum, Inductiveload, Lorlam, and Balajijagadesh: Everyone, thank you so much for this feedback! We heard from some people that they download the same format every time, which could be PDF, EPUB, or MOBI, depending on the user. Meanwhile, other people did not download the same format every time, so it would be important for them to have all options available. In other words, there is no strong consensus for any one approach. This means we will try to think of a solution that is more flexible for different use cases. Overall, this was very helpful. Thank you very much! --IFried (WMF) (talk) 21:05, 15 January 2021 (UTC)[reply]

Is there anything else you would like to share?

Hi. I still have a problem when exporting several books from the French Wikisource in "pdf" format: the first-page image is not visible in the exported file, and I get an error message when opening the file with Acrobat Reader: "Données insuffisantes pour une image" (insufficient data for an image). For example, I have the problem with the following book: https://fr.wikisource.org/wiki/Les_Inconsolables. I already reported this problem last year without success :-( --Lorlam (talk) 09:41, 3 December 2020 (UTC)
Can't wait for the maths export :) Maybe you would need some special fonts to handle maths export properly. You can find nice ones in LaTeX. TeX Gyre Pagella, TeX Gyre Schola or TeX Gyre Termes offer a nice text font with support for maths formulas. That would give some homogeneity to the text. Thanks for the great progress! — Alan Talk 13:43, 4 December 2020 (UTC)
@Nalou: Thanks for this input! Can you share a bit more information on this suggestion, and why you think it may work well? Any extra information would be very helpful as we analyze this request. Thank you! --IFried (WMF) (talk) 21:11, 15 January 2021 (UTC)[reply]
The first line of every paragraph or line except the poem is indented. Also the text is justified by default. Is it possible to do something about it?--Balajijagadesh (talk) 02:37, 19 December 2020 (UTC)[reply]
@Balajijagadesh: Can you provide some specific examples and let us know what you would like to see instead? Thank you in advance! --IFried (WMF) (talk) 21:13, 15 January 2021 (UTC)[reply]
From the French Wikisource: Issue "Investigate PDF cover page bug" Phabricator:T254937 : This is an important issue for many contributors, as it gives a bad impression of Wikisource. It may also trigger visitors to try the export many times, not being certain the export worked well. --Viticulum (talk) 17:15, 21 December 2020 (UTC)[reply]
@Lorlam and Viticulum: Thank you for providing this information about the cover page bug for French Wikisource! I have shared the relevant Phabricator ticket with the Community Tech engineers. When we began to discuss it, we realized we needed more information to be able to analyze it properly. Can you provide specific examples of what you expect to see in the download vs. what you see? Any extra details on what the French Wikisource community expects to see in the cover page would be very helpful, since we're not 100% sure what should be fixed. Thank you in advance! --IFried (WMF) (talk) 21:09, 15 January 2021 (UTC)[reply]
@IFried (WMF): For example on the page https://fr.wikisource.org/wiki/Une_Visite_%C3%A0_Bedlam : if you export using the upper page "Télécharger en pdf" button, you obtain a pdf file with the first page empty while it should be the image of book cover page. cu --Lorlam (talk) 23:45, 15 January 2021 (UTC)[reply]
@Lorlam: Thank you! This information was very helpful, and I see what you are talking about. I have updated the ticket (Phabricator:T254937), and will discuss this issue with the team. --IFried (WMF) (talk) 02:45, 19 January 2021 (UTC)[reply]
@IFried (WMF): I have sent you by e-mail 2 examples of books that did not have the error message in the past (2019), and now have the message. Hope this is useful. Thanks. --Viticulum (talk) 17:48, 21 January 2021 (UTC)[reply]

Project updates: Request for feedback (March 8, 2021)

@Balajijagadesh, Sushant savla, M0tty, Sannita, Dovi, Nahum, Nalou, Lorlam, MartinPoulter, Consulnico, Viticulum, Inductiveload, JAn Dudík, Nemo bis, EBookian, Cunegonde1, ElioPrrl, George2etexte, and Denis Gagne52: --IFried (WMF) (talk) 23:40, 8 March 2021 (UTC)[reply]

We have shared another project update, and we would love to read your feedback below. Thank you in advance!

What do you think of the new "Download" pop-up? Do you find it easy to use?

  • I like it. Availability on mobile, the ability to add buttons on a page (especially in our New Texts list), and access to the full export form for more options are probably the biggest things missing for me. All are tracked, though. Inductiveload (talk) 09:03, 9 March 2021 (UTC)
Hello, Inductiveload. Thank you for your response! We are very happy to hear that you like the “download” button. We have created a ticket to improve the download experience for mobile device users (T276976), which we will discuss as a team. As for the question of adding the download button for featured texts or new texts on the main page, we have created a ticket for this (T277187), which we will also discuss as a team. We have also created a ticket for the “other formats” link in the pop-up (T274999), which is already on our work board and will be taken up by engineers soon. Finally, can you clarify what you mean by “all are tracked”? We’re not totally clear. Thank you! --IFried (WMF) (talk) 17:40, 12 March 2021 (UTC)
@IFried (WMF): I just meant that I know these tasks are already known and have tickets in the system (I made two of those linked!), as opposed to coming up with new tasks. The "add buttons" task might also be T275003: there's no specific need to limit it to the main page. Just a way to say "download button for text at page XXX, positioned right here" will do. Inductiveload (talk) 15:57, 15 March 2021 (UTC)[reply]
@Inductiveload: Thank you for clarifying that you mean there are tickets for these issues in Phabricator. Much appreciated! --IFried (WMF) (talk) 17:38, 19 March 2021 (UTC)[reply]

What do you think of our additional work to improve reliability? Are you still seeing improvements in the speed and performance of WS Export?

@Inductiveload: Thank you for this feedback! Yes, we understand that one side effect of implementing the cache work is that now, when you download a book that was just recently downloaded, you may not see new tweaks or changes. However, one work-around is to check the box for “Bypass all caching” on the Wikisource Export page. Has that been helpful for you? --IFried (WMF) (talk) 17:42, 12 March 2021 (UTC)[reply]
@IFried (WMF): to be honest, not really, as I often find that even with "nocache" (or visiting the form and checking the box), things sometimes don't update for quite some time. I haven't measured it yet—normally I wander off and it resolves in the background. I'm not sure if it's an export thing, or a more general replag thing. Inductiveload (talk) 16:02, 15 March 2021 (UTC)[reply]
@Inductiveload: Thank you for clarifying and providing more context! We understand how it can be frustrating for you. Since we are closing out this project soon, we don’t know if we will be able to get to this issue. However, we know that it is documented in T276418, and we see that you have shared a hack in the ticket that you are using in the meantime. Overall, this may be something that our team, another team, or a volunteer developer can look into in the future, and we’ll definitely keep the ticket open so further progress on it can hopefully be made. Thank you again for reporting this! --IFried (WMF) (talk) 17:39, 19 March 2021 (UTC)[reply]

What do you think of our remaining work to disable credits (as an option on the Wikisource Export page)?

@Inductiveload: Thank you for this feedback! It’s great to know that you would be fine with us creating the option to remove the credits from the download. --IFried (WMF) (talk) 17:43, 12 March 2021 (UTC)[reply]

What do you think of our remaining work to improve messaging and support when there are download errors?

Is there anything else you would like to add?

Although there has been some discussion of embedding fonts, I don't see any attention paid to individuals with different abilities. Shouldn't we also consider the needs of this community when discussing fonts or the generation of ebooks? What do we need to make ebooks disability friendly? Languageseeker (talk) 04:02, 9 March 2021 (UTC)[reply]

Epub accessibility is often something that the Wikisource needs to add at source. For example, image alt texts, sane semantic markup in templates, etc. Font choice is actually a minor thing, because the thing about EPUBs is that the user agent (e.g. an e-reader) can, and often does, apply its own font, and certainly can change the font size. It's possible the exporter can do good things with ARIA roles, but I haven't gotten that far yet.
That said, a "nice" accessibility feature might be to streamline a way to download PDFs (and EPUBs, but that's less critical) with something like OpenDyslexic. Making dyslexic users dig through a list of nearly 200 font names each time is probably poor UX. Inductiveload (talk) 09:14, 9 March 2021 (UTC)
Partially agreed. If we are to support accessibility at the source, then the ebook export tool should warn about accessibility issues, and the markup should allow for alternative renderings. For example, does an image have a description besides the caption? Are there contrast issues? Are there static font sizes? Etc. Languageseeker (talk) 18:58, 10 March 2021 (UTC)
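As a rough illustration of the kind of warning suggested above, here is a minimal sketch in Python. It is not part of WS Export; the class and function names are hypothetical, and it assumes the exporter has the rendered HTML of a page available as a string. It flags only two of the issues mentioned (images without alt text and fixed font sizes in inline styles). Contrast checking is left out, and intentionally empty alt text on decorative images would be flagged as well.

```python
import re
from html.parser import HTMLParser

# Matches an inline style that pins the font size in absolute units.
FIXED_FONT_SIZE = re.compile(r"font-size\s*:\s*[\d.]+\s*(px|pt)")


class AccessibilityLinter(HTMLParser):
    """Hypothetical helper: collects simple accessibility warnings."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # An <img> with a missing or blank alt attribute has no text alternative.
        if tag == "img" and not (attr_map.get("alt") or "").strip():
            src = attr_map.get("src", "?")
            self.warnings.append(f"image without alt text: {src}")
        # Fixed px/pt sizes prevent the reader's device from scaling text.
        style = (attr_map.get("style") or "").lower()
        if FIXED_FONT_SIZE.search(style):
            self.warnings.append(f"<{tag}> uses a fixed font size")


def lint_accessibility(html_text):
    """Return a list of warning strings for an HTML fragment."""
    linter = AccessibilityLinter()
    linter.feed(html_text)
    linter.close()
    return linter.warnings


if __name__ == "__main__":
    sample = (
        '<p style="font-size: 12px">Text</p>'
        '<img src="a.png">'
        '<img src="b.png" alt="A map of Paris">'
    )
    for warning in lint_accessibility(sample):
        print(warning)
```

A check like this could run before the export and show its warnings next to the download, so that contributors can fix the source page rather than the generated file.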
I think that we should automatically convert tool tips to footnotes for ePub and pdf export. Tooltips do not work on either and footnotes function more like tooltips on ebook readers. Languageseeker (talk) 18:58, 10 March 2021 (UTC)[reply]
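The tooltip-to-footnote conversion suggested above could look roughly like the following Python sketch. It rests on assumptions not stated in this discussion: that tooltips are rendered as `<span title="...">text</span>`, and that spans are not nested (the regular expression would mishandle nested spans). The function name and the `fn`/`ref` id scheme are hypothetical, not something WS Export is known to use.

```python
import html
import re

# Assumed tooltip markup: <span ... title="tooltip text" ...>visible text</span>
TOOLTIP = re.compile(
    r'<span\b[^>]*\btitle="([^"]*)"[^>]*>(.*?)</span>', re.DOTALL
)


def tooltips_to_footnotes(fragment):
    """Replace tooltip spans with numbered footnote references.

    Returns the rewritten fragment with an ordered list of notes appended.
    """
    notes = []

    def replace(match):
        # Store the tooltip text and leave the visible text plus a marker.
        notes.append(html.unescape(match.group(1)))
        n = len(notes)
        return f'{match.group(2)}<sup><a href="#fn{n}" id="ref{n}">[{n}]</a></sup>'

    body = TOOLTIP.sub(replace, fragment)
    if not notes:
        return body
    items = "".join(
        f'<li id="fn{n}">{html.escape(note)}</li>'
        for n, note in enumerate(notes, start=1)
    )
    return f'{body}<ol class="footnotes">{items}</ol>'


if __name__ == "__main__":
    page = '<p>The <span title="Old spelling of show">shew</span> began.</p>'
    print(tooltips_to_footnotes(page))
```

A real implementation would more likely use an HTML parser and the note markup of the target format. The regular expression here only keeps the sketch short.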

Final project update: March 31, 2021

@Balajijagadesh, Sushant savla, M0tty, Sannita, Dovi, Nahum, Nalou, Lorlam, MartinPoulter, Consulnico, Viticulum, Inductiveload, JAn Dudík, Nemo bis, EBookian, Cunegonde1, ElioPrrl, George2etexte, and Denis Gagne52: --IFried (WMF) (talk) 23:40, 8 March 2021 (UTC)[reply]

We have shared our final update for this project! Thank you to everyone who participated in this project as a voter, collaborator, advocate, or thought partner! We really valued all of your help, and we learned so much! Please check out the update for a summary of our work and its impact. Please feel free to share your feedback below, and we sincerely hope to see you as collaborators in our next Wikisource project: OCR Improvements. Again, thank you so much everyone! --IFried (WMF) (talk) 19:50, 31 March 2021 (UTC)[reply]

@IFried (WMF): thanks for the update, and thanks for all the work in cleaning this tool up. The uptick in exports is very encouraging! What is the status of the Export tool going forwards? Will unfulfilled tasks at Phabricator be worked on by Comm Tech, or are those things that the general community will need to work on if they want them? For example, phab:T275003 will be handy for putting export buttons on the front page that can invoke the shiny new dialog. Inductiveload (talk) 15:01, 1 April 2021 (UTC)[reply]
@Inductiveload: Thank you so much for your comment! We are very happy to read that you find the uptick in exports to be encouraging. We also find the data to be very exciting, and we think it says a lot about the impact that can be made when technical support is given to the Wikisource communities. In response to your question: We will need to now focus our energy on the next Wikisource project (OCR Improvements). For this reason, we won’t have the capacity or resources to work on further WS Export tasks in the immediate future. However, if you would like us to do further work on the tool, we strongly encourage you to propose a new wish on WS Export in the 2022 Community Wishlist Survey, which will be later this year. In the meantime, there may also be possibilities for volunteer developers to do work on the tool. Thank you again for your help and collaboration over the course of the project, and we hope to see you sharing feedback on the OCR Improvements talk page! --IFried (WMF) (talk) 20:47, 8 April 2021 (UTC)[reply]
@IFried (WMF): First, many many thanks for the development of wsexport! Many ws users (and ebook readers) appreciate the improvements.
However, I believe that work on "investigate how to prevent incomplete book downloads" should be given high priority, because many works are generated without their required content (subpages). It should not be acceptable that correctly created pages, which were exported with their subpages before the "Ebook Export Improvement" project, are not handled correctly by the new ws-export. Backward compatibility should be ensured. There may be many such books on pl ws (see: phab:T275870#6974070). Zdzislaw (talk) 22:20, 5 April 2021 (UTC)
@Zdzislaw: Thank you so much for your feedback and kind words about the project! We are very happy to read that many Wikisourcers appreciate our work. In response to your comment: We are already working on fixing phab:T275870#6974070, and a fix should be deployed to production soon. As for the more general topic of incomplete exports and subpage issues, this is a large topic that could be a new wish. We encourage you to submit a wish in the 2022 Community Wishlist Survey, which will open up later this year. In the meantime, it is possible that a volunteer developer can also issue further improvements to WS Export. In the immediate future, the Community Tech team will be focusing our energy on the next Wikisource project: OCR Improvements. We would love your feedback on it, and we invite you to share your feedback on the talk page. Thank you! --IFried (WMF) (talk) 20:50, 8 April 2021 (UTC)[reply]

Congratulations & Incorporation of the button in the mobile version

First of all, I want to congratulate the whole development team on this great tool. It is so cool, and it finally does some justice to the much-needed renovation of the sister projects. I would like to take this opportunity to ask about a problem that I guess might be reasonably easy to solve. Could you kindly add the blue Download button and its display interface at the top or top right of the mobile version? It would be nice if people could easily find and use the tool to download books to their smartphones and tablets, given that many readers nowadays use these big-screen devices rather than the computer version. Thank you so much again! @IFried (WMF) Xavi Dengra (MESSAGES) 07:33, 4 May 2021 (UTC)