Static content group


This is a group and set of projects dedicated to gathering and sharing static content from Wikimedia projects, including CDs and DVDs, single-file databases for use with specific browsers and readers, and PDF and HTML exports.

Background

The first Wikipedia CD was released in 2004, and the concept of WikiReaders for offline printing and reading was refined.  In 2006, the Special Projects Committee authorized the creation of a subcommittee dedicated to static snapshots of Wikimedia content, to identify the groups working on such projects, and to help them work effectively together and share their results.  This page is derived from those efforts and the work of all who have pursued similar goals.

Related objectives

  • content quality/vetting for "Wikipedia 1.0"
  • content on paper
  • topical subsets
  • selections for different audiences

Goals

  1. Maintain, offer and assist interested parties in getting
    • static content of Wikimedia projects in a variety of formats
    • metadata about its completeness, complexity/age level, and quality.
  2. Coordinate efforts to produce and distribute snapshots, and software for viewing snapshots.
  3. Stimulate research about content and making content accessible in an offline or semi-online environment.

Status

Older status

As of mid-2006, the Wikimedia Foundation served an outdated version of the static content of Wikipedias (November 2005 snapshot - [1]) and offered the same content for download ([2]). There seemed to be persistent small problems with static content installation (categories, search, perhaps others).

There were plans to set up an up-to-date server with current static content of Wikipedias.

MediaWiki 1.5 included routines to dump a wiki to HTML, rendering the HTML with the same parser used on a live wiki.

There have been several separate attempts at producing software to convert SQL dumps into data formats suitable for offline use. Directmedia Publishing GmbH in Germany had by then a successful history of distributing Wikipedia content on DVD for Windows and Mac OS X (the Linux version was in beta).

Recent status

Needs updating since 2007!

See List of Offline Projects for updated list of offline projects in the community.

Food for thought

German content

Other languages

Other project languages: Chinese? Dutch? Italian? Russian?

Other languages as of 2013

The following languages are working on creating offline files for some subset of the Wikimedia projects.

Language | Project type | Project name | Project status
English | General collection | Wikipedia 1.0 | 0.8 completed; waiting for 1.0 update 2011
English | Schools collection | Wikipedia for Schools | 2008-09 version completed; waiting for the 2011 update
English | Schools collection | Offline Wikipedia for Indian Schools | in progress
Spanish | General collection | Wikipedia 1.0 | Est. August release date; for articles see ToolServer
Portuguese | Schools collection | | Ongoing! Project Page; List of articles
French | General collection | Wikipédia:1.0 |
German | General collection | |
Polish | General collection | |
Hebrew | Total collection | |
Tamil | General collection | |
Malayalam (Wikipedia) | General collection of selected articles | Wikipedia 1.0 | Released on 2010 April 17
Malayalam (Wikisource) | General collection of selected books | Wikisource 1.0 | Released on 2011 June 11
Malayalam (Wikisource) | General collection of selected books | Wikisource 2.0 | Released on 2013 October 14

In addition to these aggregate selections, there is also a need and room for niche collections: e.g., collections of articles on Chemistry, India, etc. Some of these can be created independently as books via the PediaPress book extension tool on Wikipedia. These are independent projects and can be found at Wikipedia:Books.

Other projects

Since before 2004:

Formats and readers

Distribution formats: plain (x)HTML, PDF, TomeRaider, Plucker, Webaroo pack, proprietary formats

e-Reader projects:

  • Directmedia (free, if not open source, on Linux and Mac platforms - Linux version (digibux) is GPL)
  • KDE
  • Browser-based (generic reader platform; javascript optional)
  • Other (See below)

Kiwix

Kiwix on iPhone (iOS)
Kiwix Flyer - Your Wikipedia Offline

Kiwix brings internet content to people without internet access. It is free as in beer and as in speech.

As an offline reader, it is primarily intended to make Wikipedia available offline, but technically any kind of web content can be stored in a ZIM file (a highly compressed open format) and then read by the app: there are currently several hundred different content packages available in more than 100 languages, from Wikipedia, Wikiquote and Wiktionary to TED conferences, the Project Gutenberg library, Stack Exchange and many others.

Why offline matters

We feature a quote here from the UN Broadband Commission's September 2013 report because it is the simplest, most pragmatic and straightforward way to show the importance of disseminating knowledge and information offline, complementing everything we do online: "While more and more people are coming online, over 90% of people in the world’s 49 Least Developed Countries remain totally unconnected."[1]

Projects that involve Wikipedia Offline

Kiwix is mostly installed in schools that cannot afford broadband internet access. In these cases, it's so much faster to use Wikipedia offline.

Wikimed

Wikimed is a free Android app that collates all medicine-related articles from Wikipedia and makes them available offline. It is currently available in English, French, Arabic, Farsi, German, Spanish, Chinese, Odia, and Portuguese. You can download it here.

Wikipedia offline in jails

Main article: Wikipedia in jails

Since March 2013, inmates of the Bellevue prison in Gorgier (western Switzerland) have been able to access Wikipedia offline on request, because Swiss prisoners have very restricted access to the Internet. The idea is to stimulate or support an interest in education among prisoners, the large majority of whom are serving long sentences. After a three-month pilot phase, the project proved very successful. Of the 36 prisoners at Bellevue, 18 own or rent a computer. All of them asked for Wikipedia offline to be installed on their PC.

The feedback is unanimously positive: it reveals that access to Wikipedia is seen as an improvement of education and/or information activities in jail.

The follow-up to the project aims to use Wikipedia in the prisoners' training program: using Wikipedia in classes, organizing general-knowledge contests, and even training new Wikipedia editors. The partnership between Wikimedia CH and the prison management is intended to last. Wikimedia CH installed the Kiwix files and trained the prison's IT team, who can now install the software for every new prisoner who requests it. Detention centers for minors are excluded from this program in Switzerland, as minors there have access to the Internet and do not need Wikipedia offline.

In 2014, WMCH started to collaborate with the Swiss Institute for Education in Detention Centers to expand the coverage of Wikipedia offline in prisons all over Switzerland. As of May 2014, all prisons in the German-speaking part of Switzerland have access to Wikipedia offline, thanks to the Swiss Institute for Education in Detention Centers.

Canada, Germany, the US, France, Belgium and Italy (jail in Pavia, where a Kiwix server runs in a dedicated computer room, led by http://www.informaticisenzafrontiere.org) also have similar projects in prisons that involve Wikipedia offline.

Wikipedia for Schools

"At SOS Children, we wanted to bring this fantastic resource to children without internet access around the globe. So we began work on an ambitious project to get the very best content from Wikipedia into a self-contained selection which could be distributed on a CD. We checked every article for child friendliness and structured the content around the national curriculum. Today, Wikipedia for Schools is in its fourth incarnation, and the new version is ready to go - this time on USB. At EduWiki 2013, we will show you how the project has benefited students and teachers here in the UK, and in countries across the developing world. With the help of others, we have distributed copies globally, and we have had an amazing response from the people who count. In the UK, Wikipedia for Schools has been a great classroom companion for students and teachers alike." [2]

Mesh Sayada

Mesh Sayada[3] is a collaboratively designed and built wireless network in the town of Sayada, Tunisia. The network serves as a platform for locally hosted content, such as Wikipedia offline in Arabic and French (thanks to Kiwix software), free ebooks and OpenStreetMap. The mesh is serviced and maintained by a local NGO, CLibre[4], with the help of local volunteers.

User Feedback

  • "Very important and helpful source of information" (User from Bahrain)
  • "Thank you for your help! Now my school can use Wikipedia offline." (User from Mexico)
  • "I like to browse my favourite encyclopedia even when there is no network" (User from Yemen)
  • "I have no internet in my house. Kiwix is such a help, because I need Wikipedia for my study." (User from Cuba)

Features & Tech specifications

  • Open-source: all code is stored on GitHub;
  • Available on Windows, GNU/Linux, macOS, iOS and Android;
  • Works like your regular browser;
  • Allows searching of articles and within articles;
  • Web Server: you can share content on your LAN.

Get involved

There are many ways to participate and to work with us in order to develop the Kiwix - Wikipedia offline project. The following list features many topics where help would really be appreciated:

  • Translate: the Kiwix user interface is translated into more than 100 languages. We still have some more work to do here: see translatewiki:Translating:Kiwix to help.
  • Share: Kiwix has a broad user community - we need to care for it and share news. See below to become an ambassador;
  • Deploy: if you want to deploy Kiwix anywhere, let us know! You can also file for a grant request with the Wikimedia Foundation.
  • Develop: if you are a coder, feel free to join us on GitHub - openZim for scrapers, and Kiwix for the app itself. Look for tickets labelled "Good First Issue" (to get started) or "Help wanted" (for real challenges). See https://www.kiwix.org/en/support-us/code/

Become an ambassador

As an ambassador, you are going to spread the word about Kiwix in different ways:

  • Mention Kiwix when talking about Wikimedia: e.g., add a slide about Kiwix when making a public presentation about Wikipedia. If you are interested in giving a talk at a meet-up or conference or organizing a Kiwix event, the other Kiwix guy can provide you with slides, flyers and other material you might find useful. Don't forget to add the Category:Kiwix presentations to your uploaded file on Commons.
  • Represent Kiwix in conferences and workshops.
  • Answer questions about the project within your community.
  • Use social media to get in touch with users.
  • Ask your language community to add a link in the sidebar of the wiki inviting readers to download the content through Kiwix. (See example).

Official country and language ambassadors

This user is a Kiwix ambassador.

The Ambassador program lists people who are familiar with both Kiwix and a language-related project: they will try to assist with questions and requests for presentations. An ambassador is a trusted volunteer that has been vetted by existing ambassadors and/or their local chapter. Contact them if you need help!

Countries and languages


Are you a Kiwix user? Do you want to help? Contact us!

Get in touch

References

  1. http://www.broadbandcommission.org/Documents/bb-annualreport2013.pdf Annual UN Broadband Commission Report 2013
  2. https://wiki.wikimedia.org.uk/wiki/EduWiki_Conference_2013/Abstracts#Workshops by Jamie Goodland, who works with the International children’s charity SOS Children
  3. Case Study: Mesh Sayada by Ryan Gerety, Andy Gunn and Will Hawkins Open Technology Institute
  4. Association pour la culture numérique Libre

Useful (external) links

Software

Database dumps

Participants

Initial members of the subcommittee:

Other interested people:

  • Eric Astor (working on OEPC)
  • Erik Garrison (working on related statistics)
  • Eyu100 (asked to get involved)

Guidelines and coordination

Style guidelines for each project should be written down, for the benefit of projects to come after them. Coordination across projects of aspects such as script writing can also be quite helpful -- in catching mistakes and corner cases, and in avoiding repeated effort. Some specific ideas follow.

Registering a new snapshot

Needed : a process for announcing and registering a snapshot for others to see in progress, contribute to, or download and use. Start with finished projects to date in German, English, and Polish.

Current snapshots:

  • En:wp: SOS CD project; Andrew wrote some scripts for this. 2006 articles on a CD.
    • Initially distributed to benefit a children's charity in early 2006
    • Wikiwizzy's version of the above for distribution in S.Africa
  • De:wp: Directmedia CD, then DVD.
  • Pl:wp: ?? DVD, planned for completion in October (slightly new deadline).

Questions

  • How should snapshots recognize authors? What's the best way to attribute WP as a project as well as individuals, not simply to satisfy the GFDL?
  • How can snapshots share algorithms? Part of snapshot design is choosing content.
  • How can snapshots get updated? Scripting the creation so they don't get stale; minimizing editorial time needed.
  • Style : different ways to handle
    • templates (navigational, other)
    • foul language
    • images (size, content)
    • text (when too long, for balance)
    • citations
    • redlinks
    • interlanguage, interwiki links
  • Enhancing content : how to handle
    • delicate subjects (warning templates)
    • conflicted subjects (pov templates)
    • fast-changing subjects (news / current event templates)
    • (note: all this can update the dynamic database directly)

Listing related scripts

Main article: Static version tools
  • Interactivity and interfaces : Front-ends to read and interact with different snapshot formats.
  • Reducing text : summarizing, auto-excerpting
  • Ranking text : bot-assisted reviewing/vetting/rating, metric analysis (apsp, grank, hit-popularity, edit-popularity, expertise, writing style, &c)
  • Metadata : bot-assisted annotation (audience, type, categorization)
  • Spellchecker, grammar checker
  • Copyvio checker
  • Image resizing & compression
  • Metadata extraction
    • History metadata (list of users, freshness, &c)
    • Image/media metadata
  • Index generation (for browsing)
    • Category tree generation
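
As an illustration of the index and category tree generation items above, here is a minimal Python sketch. It assumes the category links are already available as (child, parent) pairs; the function names `build_tree` and `render` are hypothetical and do not refer to any existing Wikimedia tool. Category graphs on a wiki can contain cycles and multiple parents, so the sketch visits each category only once.

```python
from collections import defaultdict


def build_tree(pairs):
    """Map each parent category to a sorted list of its child categories."""
    children = defaultdict(list)
    for child, parent in pairs:
        children[parent].append(child)
    for kids in children.values():
        kids.sort()
    return children


def render(children, root, depth=0, seen=None):
    """Return indented lines for the tree under `root`.

    Each category is listed once: a category already visited (because of a
    cycle or a second parent) is skipped on later encounters.
    """
    seen = set() if seen is None else seen
    if root in seen:
        return []
    seen.add(root)
    lines = ["  " * depth + root]
    for kid in children.get(root, []):
        lines.extend(render(children, kid, depth + 1, seen))
    return lines
```

Calling `render(build_tree(pairs), "Science")` returns a list of indented lines that could be written out as a simple browsing index. Listing a multi-parent category only under its first parent is a simplification; a real index might repeat it or cross-reference it.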

General steps

General steps to completion

  1. Select eligible articles / topics
  2. Identify best WikiVersion
  3. Final review of articles and classifications
  4. Convert to offline format & compile index
  5. Release to public
Activities:
  • Review broadest scope of content
  • Rate articles on quality and importance
  • Create WikiProjects for each division of the content necessary
  • WikiProjects to verify indexes of articles within projects
  • WikiProjects to verify quality of articles
  • Create bot to appropriately rate article veracity throughout time (e.g., WikiTrust)
  • Automatically select most recent high-quality / non-vandalized article for selected articles
  • Establish other avenues of validation (e.g., manual labor / crowd-sourcing)
  • Update article categorizations automatically, based on WikiProjects proposals
  • [Optional] Expert/manual review of index and classifications
  • Information Specialists to review the compilations of indexes to ensure comprehensiveness and accuracy
  • Review collection and/or pieces of the collection with relevant local education specialists
  • Convert to openZIM
  • Pull in Index integrated with Reader Kiwix
  • Verify legality
  • Publish on Wikipedia offline content library
  • [Optional] Prepare media releases
  • Distribute to appropriate audiences (install in computers, e.g.)
Tech steps:
  • Run Quality bot to select articles; send page IDs to CSV feed
  • WikiTrust selects best version of the articles; sends table with revision IDs back to the Toolserver doing article selection
  • Tool converts HTML to openZIM
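
The first tech step (selecting articles and sending page IDs to a CSV feed) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the tooling the projects actually used: the `Article` record, the rating scales, the scoring weights, and the function names `select_articles` and `write_page_ids` are all hypothetical, and the ratings are assumed to come from WikiProject quality and importance assessments.

```python
import csv
from dataclasses import dataclass

# Hypothetical rating scales; real values would come from WikiProject assessments.
QUALITY = {"FA": 5, "GA": 4, "B": 3, "C": 2, "Start": 1, "Stub": 0}
IMPORTANCE = {"Top": 3, "High": 2, "Mid": 1, "Low": 0}


@dataclass(frozen=True)
class Article:
    page_id: int
    title: str
    quality: str
    importance: str


def score(article):
    """Combine importance and quality into one number (the weights are an assumption)."""
    return 2 * IMPORTANCE[article.importance] + QUALITY[article.quality]


def select_articles(articles, limit):
    """Return the `limit` highest-scoring articles, ties broken by page ID."""
    ranked = sorted(articles, key=lambda a: (-score(a), a.page_id))
    return ranked[:limit]


def write_page_ids(articles, out):
    """Write page IDs and titles as CSV rows to an open text stream."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["page_id", "title"])
    for article in articles:
        writer.writerow([article.page_id, article.title])
```

The resulting CSV would then be handed to whatever service picks the best revision of each page; that step is not shown here because its interface is not described in this page.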

General considerations

Just as for the online resource, the offline versions of Wikimedia content are only as strong as their contents. The main constraint on an offline product is data size: the entirety of Wikipedia (and/or other Wikimedia projects) must somehow be condensed to fit on a CD, DVD, or USB stick, which is then used with a computer, mobile phone, or e-reader. The goal of carefully selecting content, then, is to provide the best and most appropriate static content.
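
To make the size constraint concrete, the following Python sketch picks articles greedily, best score first, until a byte budget is used up. It is a simplified illustration: the `fit_to_budget` name, the (title, size, score) tuple layout, and the `CD_BYTES` figure are assumptions made for this example, and a greedy pass does not guarantee the best possible use of the available space.

```python
CD_BYTES = 700 * 1024 * 1024  # nominal CD capacity, used here only as an example budget


def fit_to_budget(candidates, budget):
    """Greedily choose titles from (title, size_bytes, score) tuples.

    Candidates are visited from highest to lowest score (ties broken by
    title); each one is kept if it still fits in the remaining budget.
    """
    chosen = []
    remaining = budget
    for title, size, _score in sorted(candidates, key=lambda c: (-c[2], c[0])):
        if size <= remaining:
            chosen.append(title)
            remaining -= size
    return chosen
```

For example, `fit_to_budget(candidates, CD_BYTES)` would return the titles to include on a CD-sized collection, given per-article sizes measured after compression.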

The goal is:

To provide a library of the highest-quality and most relevant offline collections.

This implies having content options most relevant for the different offline audiences. Many factors must be taken into consideration, including

  1. Language
  2. Culture
  3. Location/geography
  4. Article complexity

Want to create your own custom offline collection? See:

Created in partnership with PediaPress, the Wikipedia Book Creator enables users to easily select specific articles from Wikipedia and create a book. This "book" can then be downloaded for free in PDF or openZIM format, or for a fee the content can be ordered as a physical book.

See Book Creator on Wikipedia for more information or to create a book. All collections are available for offline, openZIM download in the Offline Repository.

Content selection for schools

The Schools Content project page shows the list of articles included in the distribution, and provides a workspace for suggestions for adding articles.

Some articles are excluded from the collection because they are inappropriate or non-essential for schools. A list of articles intentionally excluded by OLPC can be seen here: http://mad.printf.net/blacklist2.

Updates and recommendations

See also