Community Tech/Ebook Export Improvement
The Ebook Export Improvement project aims to improve the experience of exporting books from Wikisource. Under the current system, users struggle with a variety of issues, including reliability, formatting, styles, and user experience. These issues add complexity and frustration to the Wikisource process, and they discourage some users from deeply engaging with Wikisource. This project was the #1 request from the 2020 Community Wishlist Survey.
Overall, Wikisource ebook exports have tremendous potential, but they must be improved in order to serve a wider audience. In the course of this project, we’ll aim to investigate and identify the key issues, collaborate with various Wikisource communities, and implement solutions that further sustain and improve ebook exports. We look forward to community feedback on the Talk page.
Why Export Ebooks
Generally speaking, ebook exports are a core part of the Wikisource experience, and users export ebooks for a variety of reasons. First, they may export ebooks to avoid issues with internet accessibility. With offline books, users can easily read materials on a variety of devices, no matter if their internet connection is slow, intermittent, or unreliable.
Second, users may want to read the book on a device that is optimized for ebooks. For example, devices such as the Kindle or Kobo allow users to customize the interface, add notes, and look up words, and they have a long battery life. Consequently, many users choose to export an ebook into a compatible file format, which can then be transferred to an eReader device.
Third, users may want to share an ebook with someone in an easily accessible format. For this reason, they may choose to export the ebook in a preferred format, which they can share with the recipient. This process is generally more flexible than sharing a web link to Wikisource.
Fourth, users may work in a professional setting, such as an educational institution, archive, or museum, and want to store an offline copy for general reference or educational purposes. Exporting an ebook allows them to integrate the content into their own files and workflow, rather than having to access Wikisource repeatedly.
How Ebook Exports Work
In Wikisource, there are four primary methods to download an ebook: directly via WSExport, via the left side panel, via links on the main page (such as a featured book), or via the links above texts. There is also a fifth method (“Create a book”), which is technically available but rarely used. We will discuss all of the methods below. However, it is important to understand that not all wikis have the same download options. Some have only one download option, while others have many.
#1: Export directly via WSExport
First, WSExport is the primary tool for exporting ebooks on Wikisource. This tool, originally developed by user Tpt for French Wikisource, permits downloads in a variety of formats, such as EPUB and PDF. To access WSExport, one can navigate to the left side-panel and click on “Choose format,” under the “Download/print” section. Alternatively, users can directly visit the URL: https://wsexport.wmflabs.org/tool/book.php
When using WSExport, the user must specify certain things to generate an ebook export. These include the language code, the title of the page, fonts included (if any), and whether images should be included. The user must manually type in the language code and page name, but they can select the file format and fonts via drop-down. Once the user clicks “Export,” the tool will provide a downloadable version of the page in the user’s specified format. Further documentation on the tool can be found on Wikisource:WSexport.
Note that the “Include fonts” section has different use cases, depending on the language selection. For Latin-script languages, the user is generally selecting a font, such as “Free Serif.” However, for Indic languages, the user must often specify an actual script, which is required to properly export the ebook.
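The form fields described above can also be thought of as parts of a download link. The following is a minimal sketch in Python, assuming the tool accepts the same fields as URL query parameters. The parameter names (`lang`, `page`, `format`, `fonts`, `images`), the `"true"`/`"false"` encoding, and the helper name `build_export_url` are assumptions for illustration and are not confirmed by this page; only the base URL comes from the text above.

```python
from urllib.parse import urlencode

# Base URL as given in the text above.
WSEXPORT_URL = "https://wsexport.wmflabs.org/tool/book.php"


def build_export_url(lang, page, file_format="epub", fonts=None, images=True):
    """Build a hypothetical WSExport download URL from the form fields.

    The query parameter names used here are assumptions for illustration;
    check the tool's own documentation before relying on them.
    """
    params = {"lang": lang, "page": page, "format": file_format}
    if fonts:
        # Only sent when the user picked a font or script.
        params["fonts"] = fonts
    # Assumed encoding for the "include images" toggle.
    params["images"] = "true" if images else "false"
    return WSEXPORT_URL + "?" + urlencode(params)


# Example: an EPUB of "The Jungle" from English Wikisource.
print(build_export_url("en", "The Jungle", file_format="epub"))
```

A helper like this mainly shows which values the user has to supply by hand (language code and page title) and which are picked from a fixed set (format and fonts).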
#2: Export via the left side panel
Second, users can access ebook exports via the left side panel (“Download/print”). With this method, the user can see all file formats available for download. For example, in the screenshot below, the user has the option to download a PDF, EPUB, or MOBI file of The Jungle.
When using the side panel, the default option available is typically only single-part PDFs, which are downloaded via ElectronPDF. If you want to download a multi-part book in the side panel, you need to enable WSExport in Preferences > Gadgets > Interface > “Add a print/export link to download pages as EPUB files using the WSExport tool.” This will add the EPUB and MOBI option to the navigation menu, so that you can directly download multi-part books. In the example below, you can see that the user can download the entire contents, since they have enabled multi-part book downloads.
#3: Export via links on the main page
Third, users can download books via links on the main page. For example, in the screenshot below, you will see that “April’s Featured Article” has a section called “Grab a download!” One can choose from four main file formats, which are shown when the user hovers over each icon. The user can click an icon, which automatically triggers a download that uses the WSExport tool.
#4: Export via links above texts
Fourth, users can sometimes access download links at the top of a text. For example, in the screenshot below, download options are presented for a text in Bengali Wikisource. As with the side panel, the user sees the file formats that are available for download, and the downloads are handled by the WSExport tool.
#5: “Create a book”
Fifth, you can create a book as an ODT and ZIM file. To do this, go to the “Create a book” link in the side panel, which will redirect you to Special:Book, also known as “Manage your book.” On this page, you can manually specify each page that you want in your book, which you will need to do repeatedly until you have specified all the pages. It should be noted that this process was designed for Wikipedia users who want to create source texts. It is not convenient for Wikisource users, so it is rarely used for Wikisource-related purposes.
The Primary Issues with Current Ebook Exports
There are many issues with the current ebook export process, which we will divide into three categories: reliability, formatting and styles, and user experience.
Reliability
The WSExport tool is not consistently reliable. Users report frequent downtime and timeouts, which prevent them from exporting books. Some of these issues have been documented in Phabricator tickets, such as T250614 and T219330#5060262. Improving reliability was the #4 wish in the 2019 Community Wishlist Survey. As a result, the Community Tech team launched the Ebook Export Reliability project, which aimed to improve the export experience. By the completion of the project, WSExport had 99.42% average monthly uptime (recorded on June 20, 2019), and total downtime fell from 941 minutes (May 1-15, 2019) to 179 minutes (June 1-15, 2019). Further data on Wikisource reliability after the team’s changes can be found in T226136.
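To put the downtime figures in perspective, the short calculation below converts them into uptime percentages. It is a rough sketch that assumes each measurement window covers 15 full days (21,600 minutes); the page does not state the exact window boundaries, and the result is not meant to reproduce the 99.42% monthly figure, which covers a different period.

```python
MINUTES_PER_DAY = 24 * 60


def uptime_percent(downtime_minutes, window_days):
    """Return uptime as a percentage of the measurement window."""
    total_minutes = window_days * MINUTES_PER_DAY
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# Downtime totals quoted above; the 15-day window length is an assumption.
before = uptime_percent(941, 15)   # May 1-15, 2019
after = uptime_percent(179, 15)    # June 1-15, 2019

# Relative reduction in downtime between the two windows.
reduction = 100.0 * (941 - 179) / 941

print(f"Uptime before: {before:.2f}%")
print(f"Uptime after:  {after:.2f}%")
print(f"Downtime reduced by {reduction:.1f}%")
```

Under that assumption, uptime rose from roughly 95.6% to roughly 99.2%, which corresponds to a downtime reduction of about 81%.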
Despite these improvements, problems persisted for the Wikisource community. Users continued to experience issues, such as those detailed in “Problems detected in epub generated with Wsexport,” compiled by Viticulum on French Wikisource. For example, on September 30, 2019, WSExport was inoperable for about 12 hours. In total, 13 outages were reported, many of them lasting for hours. Furthermore, recent tests conducted by the Community Tech team have identified intermittent problems when trying to download MOBI files. In addition, the timeout message currently appears only on the WSExport page, not when a download is started via download links. Overall, this situation can be very frustrating for Wikisource users, who often don’t know how to respond to such issues.
Formatting & Styles
Ebook exports often have formatting and style issues. These issues vary, but they may include: missing or altered text, duplicated text, poor pagination, missing table titles, incorrect capitalization, incorrect border styles, incorrect content alignment, incorrect table alignment, and incorrect table styles. In some cases, the words themselves are altered. These errors can be confusing and concerning to users. They also go against the Wikisource policy of mirroring the source text. Below, we have provided some examples of the issues we’re seeing. This isn’t an exhaustive list, but it can give some idea of common errors.
- Example #1: Page split between 2 pages
In the screenshot below, you will see that the content is divided into two pages. However, in the original version in English Wikisource, it was displayed on one page.
- Example #2: Fonts not rendered
In the screenshots below, you will see that the files are not properly exported from Tamil Wikisource (bottom left) or Kannada Wikisource (bottom right). Instead, the text displays as rectangles. For Kannada, this is because the script is not included in the “Include fonts” section. Tamil is included in “Include fonts,” but its exports still have issues.
- Example #3: Consonant conjuncts incorrectly rendered
In the example below, you will see that the text is incorrectly displayed. In the original version in Bengali Wikisource, the user will see “প্রথম.” However, in the ebook export, the word is changed to “পরথম.” This error involves conjunct consonants, in which two or more consonants are combined into a single cluster within a word. In this example, the consonants are displayed separately due to issues with font rendering. This issue occurs for users in many Indic languages.
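The difference between the two quoted words can be made explicit by listing their Unicode code points. The sketch below is illustrative only: it compares the two strings as quoted above and does not establish whether an exported file actually drops a character or whether the reading software simply fails to form the conjunct. The variable and function names are hypothetical.

```python
import unicodedata

# The two words quoted above, written as escapes so each code point is visible.
original = "\u09aa\u09cd\u09b0\u09a5\u09ae"  # as shown on Bengali Wikisource
exported = "\u09aa\u09b0\u09a5\u09ae"        # as shown in the ebook export


def describe(text):
    """List each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Characters present in the original word but absent from the exported one.
missing = [ch for ch in original if ch not in exported]

print("original:", describe(original))
print("exported:", describe(exported))
print("missing: ", describe("".join(missing)))
```

The only character that differs is U+09CD (BENGALI SIGN VIRAMA), the sign that joins the first two consonants into a conjunct. Without it, or without a font that can shape it, the consonants appear as separate letters.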
- Example #4: Incorrect text wrap
In the example below, you will see that the text is wrapped around the image. However, in the original version on Bengali Wikisource, the text is displayed below the image with no text-wrap.
- Example #5: Content alignment altered
In the example below, you will see that content is aligned to the left in the PDF. However, in the original text from Armenian Wikisource, the content is centered on the page.
Accessibility & User Experience
The ebook export process is not very inviting to newcomers. There are many quirks and exceptions that one must learn. The WSExport tool is not easily discoverable, and it doesn’t provide an intuitive user experience. For example, some scripts, such as Bengali, are missing from the “Include fonts” section. Even if a script is included in “Include fonts,” the export may still contain language errors. Meanwhile, in the sidebar, it is confusing for new users to determine how to download multi-part ebooks, among other issues. If we want Wikisource to expand, we need the experience to be intuitive and accessible to newcomers. For this reason, the UX of the ebook export process may be investigated for improvement as well.
- Have we covered the main reasons why people export ebooks?
- Have we covered the main methods to export ebooks?
- Have we covered the main problems experienced when exporting ebooks?
- Which formatting and style issues are the most common and frustrating, in your opinion?
- Which user experience issues are the most common and frustrating, in your opinion?
- Which problems, overall, do you find the most critical to fix, and why?
- Anything else you would like to add?
We look forward to reading your feedback on the Talk page! Thank you!