Static content group
- Static content group: CD/DVD on meta; WP 1.0 on meta; German CD on meta; Polska DVD on meta; Mandriva on meta; Offline readers; Offline task force on strategy wiki
- Software tools: Alt parsers (on MediaWiki); WikiMiner (pl, en); Kiwix; wiki2cd
- German WP 1.0: de info in English
- Polish WP 1.0
- Italian WP 1.0 (it)
- Malayalam WP 1.0 (ml)
- English WP 1.0: Bot; Criteria; SOS Children DVD online browsable; Version 0.5 (bot, nominations); Core topics (torrent); Work via WikiProjects
- French: Wikipédia Junior (active); French CD (very old)
This is a group and set of projects dedicated to gathering and sharing static content from Wikimedia projects, including CDs and DVDs, single-file databases for use with specific browsers and readers, and PDF and HTML exports.
Background
In 2004, the first Wikipedia CD was released, and the concept of WikiReaders for offline printing and reading was refined. In 2006, the Special Projects Committee authorized the creation of a subcommittee dedicated to static snapshots of Wikimedia content, to identify the groups working on such projects, and to help them work effectively together and share their results. This page is derived from those efforts and the work of all who have pursued similar goals.
Related objectives
- content quality/vetting for "Wikipedia 1.0"
- content on paper
- topical subsets
- selections for different audiences
Goals
- Maintain and offer the following, and help interested parties obtain them:
  - static content of Wikimedia projects in a variety of formats
  - metadata about its completeness, complexity/age level, and quality.
- Coordinate efforts to produce and distribute snapshots, and software for viewing snapshots.
- Stimulate research on content and on making content accessible in offline or semi-online environments.
Status
Older status
As of mid-2006, the Wikimedia Foundation served an outdated static version of the Wikipedias (November 2005 snapshot, [1]) and offered the same content for download ([2]). There seemed to be persistent small problems with installing static content (categories, search, perhaps others).
There were plans to set up an up-to-date server with current static content of Wikipedias.
MediaWiki 1.5 included routines to dump a wiki to HTML, rendering the HTML with the same parser used on a live wiki.
There have been several separate attempts at producing software to convert SQL dumps into data formats suitable for offline use. By then, Directmedia Publishing GmbH in Germany already had a successful history of distributing Wikipedia content on DVD for Windows and Mac OS X (the Linux version was in beta).
Recent status
This section has not been updated since 2007.
See List of Offline Projects for updated list of offline projects in the community.
Food for thought
German content
- gateway to Wikipedia content publication on WP-De
- w:Directmedia
- Wikipress.de
- "WP 1.0" related to same (on hold)
- PDA-friendly content hosted by wikipedia:Axel Schäfer, member of parliament ([3] and [4])
- /Germany: an introduction to the current situation of "Static content in Germany and/or in German"
Other languages
- En: Organised by the w:Wikipedia:Version 1.0 Editorial Team with help from a bot. Release of w:Wikipedia:Version 0.5 (a test CD) is planned for late 2006.
- En: Earlier in 2006 the UK children's charity SOS Children released "A World of Learning" containing 2200 articles suitable for kids. They are now working here to expand the selection.
- Polish Wikipedia DVD: Major release planned for autumn 2006 by Wikimedia Polska in collaboration with Helion SA. See also this report in English.
- French Wikipedia: Work beginning on software, now collaborating with en and po.
- En/Fr Mandriva project: Wikimedia and Mandriva (very little activity)
- No: See no:Wikipedia:Wikipedia_1.0 (put on hold).
- Sv: See sv:Wikipedia:Wikipedia_1.0 (little activity).
- Bn: See bn:উইকিপেডিয়া:উইকিপ্রকল্প সিডি প্রকাশনা (Target: A 2000 article CD by the end of 2007)
Other project languages: Chinese? Dutch? Italian? Russian?
Other projects
Since before 2004:
- fixedreference
- The en:Webaroo Wikipedia and related packs
- OEPC: en:Wikipedia:One_Encyclopedia_Per_Child (simple, laptopwiki)
Formats and readers
Distribution formats: plain (x)HTML, PDF, TomeRaider, Plucker, Webaroo pack, proprietary formats
e-Reader projects:
- Directmedia (free of charge on Linux and Mac platforms, though not necessarily open source; the Linux version, digibux, is GPL-licensed)
- KDE
- Browser-based (generic reader platform; JavaScript optional)
- Gearswiki and Google Gears efforts
- Other (see below)
Kiwix
Kiwix is an offline reader for web content, designed especially to make Wikipedia available offline. It reads a project's content from a file in the ZIM format, a highly compressed open format with additional metadata.
- Pure ZIM reader
- Content and download manager
- Case and diacritics insensitive full text search engine
- Bookmarks & Notes
- kiwix-serve: ZIM HTTP server
- PDF/HTML export
- Multilingual
- Search suggestions
- ZIM indexing capacity
- Support for MacOSX / Linux / Windows / Sugar
- DVD/USB launcher for Windows (autorun)
- Tabs
See also:
- (English) (Français) (Español) Official Web site
- (English) RSS/Atom Planet
- (English) Follow our latest improvements...
- translatewiki:Translating:Kiwix for localisation
- Wikimedia endorsement (recent):
  - (English) Download the text of the entire English Wikipedia (Wikimedia Foundation)
  - (Français) "A USB key with all of Wikipedia and free software!" (Wikimedia France)
  - (Italiano) wmit:Kiwix (Wikimedia Italia) and w:it:Wikipedia:Kiwix
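To make the description of the ZIM format more concrete, here is a minimal Python sketch that reads the fixed-size header at the start of a ZIM file. The 80-byte little-endian layout and the magic number 72173914 follow our reading of the openZIM specification and should be checked against the current spec; the class, field, and function names are our own and are not part of Kiwix or libzim. A real reader would go on to follow the pointer-list offsets to entries and clusters, which this sketch does not attempt.

```python
import struct
from dataclasses import dataclass

# Assumed header layout (openZIM spec, as we read it): little-endian,
# magic number, major/minor version, 16-byte UUID, entry and cluster counts,
# four 64-bit offsets, main/layout page indices, checksum offset.
ZIM_MAGIC = 72173914  # 0x044D495A
HEADER_FORMAT = "<IHH16sIIQQQQIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 80 bytes, no padding with "<"


@dataclass
class ZimHeader:
    major_version: int
    minor_version: int
    uuid: bytes
    entry_count: int
    cluster_count: int
    path_ptr_pos: int      # offset of the list of entry pointers sorted by path
    title_ptr_pos: int     # offset of the title-ordered index
    cluster_ptr_pos: int   # offset of the list of cluster pointers
    mime_list_pos: int     # offset of the MIME type list
    main_page: int         # index of the main page entry
    layout_page: int
    checksum_pos: int      # offset of the checksum at the end of the file


def parse_zim_header(data: bytes) -> ZimHeader:
    """Decode the first HEADER_SIZE bytes of a ZIM file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("data too short to hold a ZIM header")
    fields = struct.unpack_from(HEADER_FORMAT, data)
    if fields[0] != ZIM_MAGIC:
        raise ValueError("not a ZIM file (bad magic number)")
    return ZimHeader(*fields[1:])


def read_zim_header(path: str) -> ZimHeader:
    """Read and decode the header of the ZIM file at `path`."""
    with open(path, "rb") as f:
        return parse_zim_header(f.read(HEADER_SIZE))
```

Checking the magic number first is a cheap way for a reader or download manager to reject a truncated or mislabelled file before doing any further work.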
Useful (external) links
Online content
Software
- http://meta.wikimedia.org/wiki/Alternative_parsers#A_non-parser_dumper
- Tero-dump wikipedia to html converter
- Convert Wikipedia's SQL dump to static HTML for local installation (only a prototype; very limited, but usable)
- Wikimedia content for Sharp Zaurus
- TomeRaider and TomeRaider database
- Encyclopodia
- Wikipedia in ipodlinux.org
Database dumps
- Static HTML tree dumps for mirroring or CD distribution
- Dynamic HTML generation from a local XML database dump
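Generating HTML from a local XML dump starts with reading the dump. The following Python sketch, a minimal illustration rather than any existing tool, streams (title, wikitext) pairs out of a MediaWiki XML export with the standard library's `xml.etree.ElementTree.iterparse`, so the dump never has to fit in memory. It assumes a current-revisions dump (one `<revision>` per `<page>`); for a full-history dump it would keep only the last revision's text. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def _local(tag: str) -> str:
    # Drop the "{namespace}" prefix; the export schema version varies by dump.
    return tag.rsplit("}", 1)[-1]


def iter_pages(source) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for each <page> in a MediaWiki XML dump.

    `source` is a file name or a binary file object.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = _local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""  # last revision seen wins
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # free the finished page's subtree
```

A compressed dump can be read without unpacking it first, for example `iter_pages(bz2.open("pages-articles.xml.bz2", "rb"))`; the file name is only an example.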
Participants
Initial members of the subcommittee:
Other interested people:
- Eric Astor (working on OEPC)
- Erik Garrison (working on related statistics)
- Eyu100 (asked to get involved)
Guidelines and coordination
Style guidelines for each project should be written down for the benefit of the projects that come after it. Coordinating aspects such as script writing across projects can also help, both in catching mistakes and corner cases and in avoiding repeated effort. Some specific ideas follow.
Registering a new snapshot
Needed: a process for announcing and registering snapshots so that others can follow their progress, contribute to them, or download and use them. Start with the projects completed to date in German, English, and Polish.
Current snapshots:
- En:wp: SOS CD project; Andrew wrote some scripts for this. 2006 articles on a CD.
- Initially distributed to benefit a children's charity in early 2006
- Wikiwizzy's version of the above for distribution in South Africa
- De:wp: Directmedia CD, then DVD.
- Pl:wp: ?? DVD, planned for completion in October (deadline slightly revised).
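As one possible starting point for such a registry, here is a hypothetical record type in Python. Nothing here is an existing Wikimedia tool; the fields (language, project, medium, status, source dump date, contacts) are assumptions about what others would need in order to find, follow, or reuse a snapshot. The example entries mirror the list of current snapshots and leave unknown values empty rather than guessing them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PLANNED = "planned"
    IN_PROGRESS = "in progress"
    RELEASED = "released"


@dataclass
class SnapshotEntry:
    """One registered snapshot (hypothetical schema)."""
    language: str                      # wiki language code, e.g. "en", "de", "pl"
    project: str                       # human-readable name of the snapshot
    medium: str                        # e.g. "CD", "DVD", "ZIM file"
    status: Status
    dump_date: Optional[str] = None    # source dump as "YYYYMMDD", if known
    contacts: List[str] = field(default_factory=list)


def by_status(registry: List[SnapshotEntry], status: Status) -> List[SnapshotEntry]:
    """Return the entries in the given state, e.g. to list work in progress."""
    return [entry for entry in registry if entry.status is status]


# Example entries based on the current snapshots; unknown fields stay empty.
REGISTRY = [
    SnapshotEntry("en", "SOS Children CD", "CD", Status.RELEASED),
    SnapshotEntry("de", "Directmedia DVD", "DVD", Status.RELEASED),
    SnapshotEntry("pl", "Polish Wikipedia DVD", "DVD", Status.IN_PROGRESS),
]
```

Recording the source dump date is what would let a later project rebuild or compare against the same content.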
Questions
- How should snapshots credit authors? What is the best way to attribute WP as a project as well as individual contributors, beyond simply satisfying the GFDL?
- How can snapshots share algorithms? Part of snapshot design is choosing content.
- How can snapshots get updated? Scripting the creation so they don't get stale; minimizing editorial time needed.
- Style: different ways to handle
- templates (navigational, other)
- foul language
- images (size, content)
- text (when too long, for balance)
- citations
- redlinks
- interlanguage, interwiki links
- Enhancing content: how to handle
- delicate subjects (warning templates)
- conflicted subjects (pov templates)
- fast-changing subjects (news / current event templates)
- (note: all this can update the dynamic database directly)
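For the redlinks question, one common approach in a subset snapshot is to turn links to articles that are not included into plain text, so readers do not hit dead links. The Python sketch below does this with a regular expression over wikitext. It is a simplification under stated assumptions: it only handles `[[Target]]` and `[[Target|label]]`, leaves anything containing a colon (files, categories, interwiki and interlanguage links) for separate handling, and normalizes titles the way most Wikipedias do (underscores as spaces, first letter capitalized). The function names are hypothetical.

```python
import re
from typing import Set

# [[Target]] or [[Target|label]]; nested brackets are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def normalize_title(title: str) -> str:
    """Approximate MediaWiki title normalization (assumes first-letter case)."""
    t = title.replace("_", " ").strip()
    return t[:1].upper() + t[1:]


def unlink_missing(wikitext: str, included: Set[str]) -> str:
    """Replace links to articles outside `included` with their visible text.

    `included` must hold normalized titles of the articles in the snapshot.
    """
    def repl(m):
        target, label = m.group(1), m.group(2)
        if ":" in target:
            return m.group(0)  # namespaced or interwiki link: handled elsewhere
        page = normalize_title(target.split("#", 1)[0])
        if not page or page in included:
            return m.group(0)  # same-page anchor, or article is in the snapshot
        return label if label else target

    return LINK_RE.sub(repl, wikitext)
```

Because only the brackets are removed, link trails such as `[[dog]]s` still read correctly as "dogs" after unlinking.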
- Interactivity and interfaces: front-ends to read and interact with different snapshot formats.
- Reducing text: summarizing, auto-excerpting
- Ranking text: bot-assisted reviewing/vetting/rating, metric analysis (apsp, grank, hit-popularity, edit-popularity, expertise, writing style, etc.)
- Metadata: bot-assisted annotation (audience, type, categorization)
- Spellchecker, grammar checker
- Copyvio checker
- Image resizing & compression
- Metadata extraction
- History metadata (list of users, freshness, etc.)
- Image/media metadata
- Index generation (for browsing)
- Category tree generation
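Several of the ranking metrics listed above are link-based. As an illustration only, the sketch below computes plain PageRank by power iteration over an article link graph, which is one way to score articles when choosing a subset; we are not certain which metric "grank" refers to, so treat this as a stand-in rather than the group's method. The damping factor 0.85 and 50 iterations are conventional defaults, not values taken from this page.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Plain PageRank by power iteration over an article link graph.

    `links` maps each article title to the titles it links to. Articles that
    only appear as link targets are included too. Rank held by articles with
    no outgoing links ("dangling" pages) is spread evenly over all articles,
    so the scores always sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        dangling = 0.0
        for p in pages:
            out = set(links.get(p, ())) - {p}  # ignore duplicate and self links
            if out:
                share = damping * rank[p] / len(out)
                for target in out:
                    new_rank[target] += share
            else:
                dangling += rank[p]
        for p in pages:
            new_rank[p] += damping * dangling / n
        rank = new_rank
    return rank
```

For example, with links A→B, B→C, C→A and D→C, article C ranks highest because it receives links from both B and D, while D, which nothing links to, ranks lowest.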
Updates and recommendations
- Add links to the new Moulin and Kiwix projects, and update links from local Wikipedias regarding 1.0 (via interwiki links). +sj
- Add links to review ideas, including content stamping (since 2000). +sj 20:15, 26 February 2008 (UTC)
- Set up a cron job with a script that updates the static content whenever a content dump is complete.
- This actually sounds like a project for the Wikimedia Toolserver. -- Mathias Schindler 09:19, 3 June 2006 (UTC)
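The cron-job suggestion can be sketched as a small check that a scheduled script runs before rebuilding anything. The Python function below assumes dump runs are identified by YYYYMMDD directory names (as on dumps.wikimedia.org) and that the caller passes only runs it has already confirmed as complete; how completion is detected, and the rebuild itself, are left out. The function name is hypothetical.

```python
import re
from typing import Iterable, Optional

# Dump runs are assumed to be named by date, e.g. "20240101".
RUN_NAME = re.compile(r"\d{8}")


def newer_dump(completed_runs: Iterable[str],
               last_built: Optional[str]) -> Optional[str]:
    """Return the newest completed dump run newer than the last build, else None.

    `completed_runs` should contain only runs already known to be finished;
    other names (such as "latest") are ignored. YYYYMMDD strings sort in
    date order, so plain string comparison is enough.
    """
    runs = sorted(r for r in completed_runs if RUN_NAME.fullmatch(r))
    if not runs:
        return None
    newest = runs[-1]
    if last_built is None or newest > last_built:
        return newest
    return None
```

A crontab entry such as `0 3 * * * python3 /path/to/check_and_rebuild.py` would run a script built around this check daily at 03:00; the script path is a placeholder. Storing the date of the last successful build, rather than rebuilding on every run, keeps the job cheap when no new dump has appeared.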