Wikipedia on CD/DVD
From Meta
Static content
- Static content group (talk)
- CD/DVD on Meta
- WP 1.0 on Meta
- German CD on Meta
- Polish DVD on Meta
- Mandriva on Meta
- Software tools: WikiMiner (pl, en)
- German WP 1.0: de info in English
- Polish WP 1.0
- English WP 1.0: Bot, Criteria, SOS Children CD, Version 0.5 (bot, nominations), Core topics, Torrent, Work via WikiProjects
- French: Wikipédia Junior (active), French CD (very old)
Almost all of the content of the Wikimedia projects is published under licenses that allow anybody to download it and republish it in any way they like. Various projects for republishing our content have been considered in the past.
Before we (Wikimedia) can publish Wikipedia on DVD, flashcard, or any other form of fixed digital medium, we must decide what the goal of this distribution shall be. Should it simply be a means of accessing Wikipedia offline, or should it offer a more feature-rich multimedia encyclopædia, such as the Britannica DVD, or Microsoft's Encarta?
If a more interactive, feature-rich edition were chosen, could it be built as a program that can use both the online Wikipedia and an offline data file as its source? In that case, creating a more traditional digital encyclopedia could be combined with creating an offline version of Wikipedia. If this approach were adopted, an open source project would be needed to create the application, the file formats and protocols for the offline version, and the communication with the central server.
In any case, whether the offline version is an HTML dump or a more sophisticated application, it must also be decided how it will be compiled. Should it be possible to download the offline Wikipedia at any time and get an up-to-date version, or should one be compiled, say, every week, month or year?
Once all this has been decided, one must decide how to package it (digitally): how much space it can use, whether it should target CD, DVD, etc., and what to bundle with it. Should it come with interfaces for an array of platforms, say Linux, Windows and Mac, or should it simply be a data-only distribution for which a client has to be downloaded separately? This question matters less if an HTML-only approach is taken, but even then it must be decided how the filesystem on the distribution should be organized.
Lastly, after ALL the work above is done, one can start talking about physical distribution. Since this is a free publication, it would make great sense to allow the disk image to be downloaded online, so that people can create their own media. In addition, one would have to decide how to distribute pre-produced media, if there are to be any. Investigating the interest in such media, and what price people would be willing to pay, is of great importance here. Relevant questions include: would people be willing to pay the additional cost of a more environmentally friendly disc?
Environment
If and when Wikipedia is put on CD/DVD, the discs should be made from recycled material, just as a paper Wikipedia should be.
<<Sorry, this section is a bit distant and has some unnecessary parts, but the page is overall well worth reading. Please keep on.>>
YES! I will only support a paper Wikipedia if it's made from recycled paper. I will only support a CD/DVD Wikipedia if it's made from recycled plastic.
- Can CDs actually be made out of recycled plastic? I tried to find some information on the subject, but didn't get very far...surely the plastic would have to be of quite high quality to minimise optical distortion (but I don't know much on the subject)... Nick04 10:46, 13 Jun 2004 (UTC)
- Sanyo has some technology for making CDs out of corn, but I forgot its name. 61.8.35.6 02:57, 16 Dec 2004 (UTC)
- Recycled CDs (if possible) would cost a lot more than regular CDs, which is not economically feasible for Wikipedia. It is possible to recycle CDs, but there is no such thing as a "recycled DVD-R". You can probably find recycled CD sleeves, which I recommend if we are to package the DVDs/CDs. I would not buy a plastic case or anything other than a paper sleeve, as cases 1) crack, 2) are bulky, and 3) are costly. Bought in bulk, 10,000 CD sleeves can cost .01 cents or so... Not sure about true pricing. Cheers, Maxwell
- Of course, the ideal format for Wikipedia is something like a Young Lady's Illustrated Primer, or w:The Hitchhiker's Guide to the Galaxy. But that may be a few years off :) Hell yeah, Wikipedia should be put on CD! I don't have any preference whether it's recycled. A current-version-only edition isn't likely to fill more than a couple of discs anyway. Maybe we should talk to CheapBytes or other low-cost outlets about selling cheap Wikipedia CDs. -- Wapcaplet 16:29 1 Jul 2003 (UTC)
Bootable Wikipedia
Maybe it can be put on a bootable, Knoppix-like live CD... Wikipedix or Knoppedia. Bootable DVDs will probably pop up sometime soon; then there will be enough space to keep all the software that comes with Knoppix :) -- Guaka 21:47 5 Jul 2003 (UTC)
- I find this idea quite delightful! :) --Brion VIBBER
- This is already done! - http://www.copy4freedom.de/index.php?id=6
- This might be a job for Morphix. The mini-CD version takes about 200 MB and comes with a very nicely set-up Xfce desktop and Firebird as the web browser. -Elo.
- But, as a user, what would you prefer? Shut down your Windows, boot some "strange" Linux just to look up something, then reboot Windows, thus not being able to copy'n'paste stuff from the 'pedia to Word etc.? Or a Window$ program that you double click like everything else?
- The bootable wikipedia would be an option for showcases (?), libraries, schools etc., running on not-up-to-date machines, though. --Magnus Manske 15:50, 4 Dec 2003 (UTC)
- It doesn't have to be exclusively for booting, either. Something that can be run as a user application from within several operating systems and also can boot itself would just be a neat extra. --Brion VIBBER
- One could use QEMU on a host system to run the guest system with Wikipedia on it. --ugehrig
- Funny, we had the same discussion on the German Wikipedia some months ago, but it was archived... We did not think of the "showcase or not-up-to-date" examples...
- It is a shame that we don't really manage to get other languages onto Meta... Fantasy 14:27, 5 Dec 2003 (UTC)
- I can help a bit in this area. I can prepare a Knoppix-like live Linux that will also run as an "application window" inside Windows, using coLinux [1]. You can also use IE from Windows to browse the Wikipedia running under coLinux. If you think this is interesting, bug me (alekibango) on IRC at [2].
Copyright issues
Any material which infringes someone's copyright would cause a major problem. It might mean a batch of CDs/DVDs would have to be withdrawn. It would also be virtually impossible to be sure that there was no infringing material on the disc.
- Malcohol 08:47, 8 Jul 2003 (UTC)
- The material for a CD/DVD needs to be checked for problems anyway, for an official release. Besides copyright issues there can also be NPOV issues, vandalism or just plain nonsense. I think it is better to leave such articles out completely.
- What can be done is something CVS-like, with a 'stable' and an 'unstable' branch. We probably need a group of 'trusted' users who can mark certain states of articles as 'okay for official release' or 'decent' or... When a release is needed, it is announced, and then after a certain period, or once a certain number of 'decent' articles has been reached, or even at a specific size (400 MB?), there will be a release. Guaka 00:19, 25 Sep 2003 (UTC)
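Guaka's stable/unstable proposal can be sketched as a selection rule. This is a minimal C sketch under assumptions: the `revision` struct, its `approved` flag (standing in for a trusted user's "okay for official release" mark) and both function names are hypothetical, not part of MediaWiki. For each article it keeps the newest approved revision, and it signals a release once either an article-count or a total-size threshold is met; the time-based trigger is left out.

```c
#include <stddef.h>

/* Hypothetical revision record: the article it belongs to, its revision
   number, its size in bytes, and whether a trusted user marked it
   "okay for official release". */
typedef struct {
    int article_id;
    int rev_id;
    long long bytes;
    int approved;
} revision;

/* Build the "stable branch": for every article keep the newest approved
   revision. stable[] has n_articles slots indexed by article_id; slots stay
   NULL for articles without an approved revision. Returns how many articles
   have one. */
size_t select_stable(const revision *revs, size_t n_revs,
                     const revision **stable, size_t n_articles) {
    size_t count = 0;
    for (size_t i = 0; i < n_articles; i++)
        stable[i] = NULL;
    for (size_t i = 0; i < n_revs; i++) {
        const revision *r = &revs[i];
        if (!r->approved || r->article_id < 0 ||
            (size_t)r->article_id >= n_articles)
            continue;                      /* unapproved or out of range */
        const revision **slot = &stable[r->article_id];
        if (*slot == NULL)
            count++;                       /* first approved revision seen */
        if (*slot == NULL || r->rev_id > (*slot)->rev_id)
            *slot = r;                     /* keep the newest one */
    }
    return count;
}

/* Hypothetical trigger: release once enough approved articles exist, or
   once their combined size reaches target_bytes (e.g. about 400 MB). */
int release_ready(const revision **stable, size_t n_articles,
                  size_t min_articles, long long target_bytes) {
    size_t count = 0;
    long long total = 0;
    for (size_t i = 0; i < n_articles; i++) {
        if (stable[i] != NULL) {
            count++;
            total += stable[i]->bytes;
        }
    }
    return count >= min_articles || total >= target_bytes;
}
```

With the 400 MB figure above, a caller would pass `400LL * 1024 * 1024` as `target_bytes`.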
thanks a lot!
Cafepress.com
Cafe Press now offers CD burning. $4.99 is the base price, we could tack whatever amount on top. How big is Wikipedia (English) anyway? Minus the talk pages, if possible. - user:zanimum
- $4.99 is shockingly overpriced, though since CafePress deals in one-off items that's not surprising. Letting people burn their own CDs would be virtually free; a moderate-sized pressing of 1000 discs through a more typical house might cost about $1.50 each (according to the price list of the first hit I found on Google), though there may be additional overhead for distribution. The easiest thing would be to put out ISOs and let the folks who already repackage public-domain, shareware, and freely redistributable software for retail channels deal with that if people want it.
- The compressed size of the English Wikipedia (wikitext only, current revisions only, including talk pages) is 109 MB. Uncompressed and in HTML it runs more towards 400 MB IIRC (some people have been playing with scripts to do this). A more efficient storage method (say, storing everything in a zip file) could easily pack it back down a ways. Images and media are a few hundred more megs; judicious culling, resizing, and recompression may be required to fit everything on one CD, but a multi-CD set (eewwww) or a DVD-ROM would be no problem. --Brion VIBBER 02:32, 27 Aug 2003 (UTC)
Start with Wikipedia on hard disk?
Shouldn't there first be a simple .tar.gz (and maybe an .exe installer for Windows users)? This way people could simply install it on their drives, and GNU/Linux distributions could offer packages. I'd love to have the 'pedia on my laptop... It could also decrease the strain on the servers, since people might check their local version first. Of course, it might then be a good idea not to offer the Wikipedia files on the Wikipedia servers :) Guaka 00:28, 25 Sep 2003 (UTC)
- I have the Tagalog Wikipedia (tl) in my hard disk. --Bentong Isles 03:29, 29 October 2005 (UTC)
Flash Wiki
In addition to the above, flash cards seem to be getting pretty cheap (in the UK, £20 will buy you a 2 GB SD card). Given that a compressed Wikipedia is around 110 MB, and that much higher-capacity cards for the same price can't be that far away (6 months?), putting Wikipedia onto a flash card should also be investigated. This would let people use it in laptops and portable devices. Of course, it's still rather expensive, but (IMO) a neat idea.
- This thing is already working. -- Mathias Schindler 19:18, 17 Jul 2004 (UTC)
Experiments
Try my Windows installer for the German wikipedia (including images and stand-alone-webserver) here. --Magnus Manske 14:29, 4 Dec 2003 (UTC)
wik2dict.py
I wrote a little script that does a very rough conversion of the database dump into a w:en:DICT file. I put it on Wik2dict. Guaka 16:09, 23 Jul 2004 (UTC)
- Is the dict file legal to distribute? I don't see any attribution at all.
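The page doesn't describe how wik2dict.py writes its output. As a hedged illustration of the target format only, here is a C sketch of one line of a DICT `.index` file, assuming the conventions of the dictd tools: the headword, then the byte offset and byte length of its definition in the companion `.dict` file, tab-separated, with both numbers written in dictd's base-64 digit scheme (alphabet `A-Z a-z 0-9 + /`, most significant digit first, no padding). The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Digit alphabet assumed to match the one dictd uses for .index numbers. */
static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: write val as base-64 digits, most significant first,
   without padding or leading 'A's; zero becomes "A". out needs 12 chars. */
void dict_b64(unsigned long val, char out[12]) {
    char tmp[12];
    int n = 0;
    do {
        tmp[n++] = B64[val % 64];
        val /= 64;
    } while (val > 0);
    for (int i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];   /* reverse into most-significant-first order */
    out[n] = '\0';
}

/* Hypothetical helper: format one .index line (headword, offset, length,
   tab-separated). Returns -1 if the headword contains a tab or newline,
   otherwise the snprintf result. */
int dict_index_line(const char *headword, unsigned long offset,
                    unsigned long length, char *out, size_t out_len) {
    char off[12], len[12];
    if (strpbrk(headword, "\t\n") != NULL)
        return -1;
    dict_b64(offset, off);
    dict_b64(length, len);
    return snprintf(out, out_len, "%s\t%s\t%s\n", headword, off, len);
}
```

For example, a 1234-byte definition at the start of the `.dict` file gives the line `Berlin`, tab, `A`, tab, `TS`.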
Digibux software
en:Directmedia Publishing has released the Linux version of its reader software under the GPL. Digibux is shipped on Directmedia's German DVD containing Wikipedia content: http://savannah.nongnu.org/projects/digibux -- Mathias Schindler 15:46, 26 May 2005 (UTC)
- In contrast to the windows software it cann
Scope and Compatibility
Should it simply be a means of accessing Wikipedia offline, or should it offer a more feature-rich multimedia encyclopædia?
- Absolutely the former. I believe this is most people's goal: accessibility of Wikipedia where or when the Internet is not readily available. A "feature-rich" version should come only once that type of content has evolved in the central version; otherwise you could not leverage the enormous collaboration that is probably Wikipedia's strongest point. Requiring this type of content just for an offline version could also keep this project from getting off the ground for a number of years.
could this be created as a program that can use both the online wikipedia, and an offline datafile as a source?
- We could learn a lot here from the help system in Apple's OS X. It appears to be HTML-based, using a local copy. If it detects an Internet connection, it transparently downloads and displays the latest version in the help application, keeping it for future offline use. This would alleviate the problem of offline content becoming outdated; even if the reader runs from read-only media, it could cache changes on the hard disk. Another major advantage would be providing Edit/Discussion etc. hyperlinks, so users can still contribute to Wikipedia when a connection is available. It would also give users the choice of making it their standard way of viewing Wikipedia wherever they are, rather than a fork they need to switch to and from depending on which device or location they're accessing it from.
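The cache-first behaviour described above reduces to a lookup order. A minimal C sketch, assuming a writable cache directory on the hard disk and a read-only directory on the disc; `locate_page`, the directory layout and the `online` flag (standing in for real connection detection) are all hypothetical, and the actual download is left to the caller.

```c
#include <stdio.h>

/* Where a page should be read from. */
typedef enum { FROM_NETWORK, FROM_CACHE, FROM_DISC, NOT_FOUND } page_source;

/* 1 if the file can be opened for reading, else 0. */
static int file_exists(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Hypothetical lookup order. Online: fetch the latest version and store it
   at the cache path written to out (the download itself is up to the
   caller). Offline: a previously cached copy on the hard disk first, then
   the read-only disc. On NOT_FOUND, out is set to the empty string. */
page_source locate_page(int online, const char *cache_dir, const char *disc_dir,
                        const char *name, char *out, size_t out_len) {
    snprintf(out, out_len, "%s/%s", cache_dir, name);
    if (online)
        return FROM_NETWORK;
    if (file_exists(out))
        return FROM_CACHE;
    snprintf(out, out_len, "%s/%s", disc_dir, name);
    if (file_exists(out))
        return FROM_DISC;
    out[0] = '\0';
    return NOT_FOUND;
}
```

Checking the cache before the disc is what lets updates fetched earlier override the outdated copy on read-only media.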
Should it come with interfaces for an array of platforms, say Linux, Windows, Mac, or should it simply be a general data-only distribution where a client is required to be downloaded separately? This becomes less applicable if a HTML only approach is taken
- It should absolutely be cross-platform in the interests of openness and accessibility, which I believe are the fundamental ideologies that made Wikipedia possible. From a technical aspect, HTML seems the obvious choice: a simple mirror of Wikipedia means it would work anywhere out of the box (think handhelds, phones, One Laptop per Child) with no development or software installation required, with the option of viewing it in a platform-specific client that can perform the aforementioned Internet detection and syncing. This client could be a stand-alone application or as simple as a browser plugin, depending on platform availability. Xlynx 03:42, 24 November 2007 (UTC)
- I should acknowledge that an HTML-only option would mean losing the search feature. However, this should not be seen as a negative, because platform-specific clients can implement their own search if we bundle a simple search index (SQLite?). Platforms without a client would still benefit, because taking any other route leaves them with no access at all! In those cases one would fall back to categories, redirects and cross-references, as demonstrated on http://static.wikipedia.org/. We can continue to improve these on the main site, allowing everyone to reap the benefits. Xlynx 10:37, 24 November 2007 (UTC)
My Thoughts
Hope this is seen as a discussion page... otherwise please move my message anywhere... or delete it. So, I just tried the DVD of the German Wikipedia... and I was shocked. It was horrible! No images, no variable fonts... nothing comparable to an Encarta-like encyclopedia (sorry to say that). But saying that something is bad is easy, so I thought of trying to write software on my own. Before I spend a lot of time on something that's already been started by someone else, it would be interesting to know whether anyone out there has already done something in this direction...
What I thought: such a program should be available for nearly every platform (the main ones: Linux x86, Win32 and Mac OS), so one has to choose a programming language or library that makes the software easy to port. An HTML export would be possible but VERY hard to use for a novice user, and it could not offer useful search functions... JavaScript could perhaps help out with a big index file, but I'm convinced not in a reasonable time! Anything else? Flash..? No... even if it can read XML. The index file alone would have to be at least 65 MB (I tried...). A web server on a disc is better than nothing... but who do you want to reach? PC pros? What would be better than Java? It has many benefits:
- Available for the platforms mentioned above.
- Good free libraries are available (and usable in this application):
  - HSQL: an embeddable SQL engine that can make a database read-only (e.g. for CDs). No network access is made, not even loopback. It can run as a standalone, built-in database.
  - SWT (Eclipse): a platform-independent API to the installed web browser, which can be IE or Mozilla or (as far as I know) Safari. It comes with a platform-dependent part, which is available for the platforms above. The browser API supports listening for location changes (clicks on a URL) and feeding the browser a String containing the HTML to display. (None of the Java browsers I tried is usable in its current state; JEditorPane only supports HTML 3, and the rendered pages look horrible.)
  - Lucene (Apache Jakarta): the one and only GOOD open-source full-text search engine. It supports ranking, indexes written to files, and everything a developer in such a situation could dream of. Perhaps full-text search is not needed, but it's a possibility to think about.
So, I have already started experimenting a little. I downloaded the cur table of the German Wiki (1 GB unpacked), deleted the content of the discussion pages to speed up my MySQL server a little, and created an HSQL database that contains only the titles of all pages, their IDs and the is_redirect column. To speed up title searches I added a column 'utitle' containing the title in upper case, which supports case-insensitive search in a reasonable time. This, together with the index on these columns, increased the speed of wildcard searches *dramatically*. A search for all entries matching a wildcard (WHERE utitle LIKE 'E%'), which can easily be >> 5000, takes at most half a second. The table is now about 120 MB. That is not small, but the fast search makes up for it.
I thought it would be possible to store the text files in a directory structure, each file compressed with bzip or gzip. Perhaps a better solution would be to store them in the HSQL database too, but I have no experience with storing binary (compressed) data there. Compression saves >60% here. The format would have to be the same as in the MySQL table, not prerendered HTML, for many reasons (open to discussion). I think these text files will take ~400-500 MB for the German wiki.
That brings me to a big problem: how to generate HTML from the wiki markup? I thought of running PHP standalone (invoked as an executable by the Java app), because it's available for nearly all platforms too. But I think that idea is bullshit. So I had a look at Parser.php... I think it's possible to port it to Java, but it's a BIG job. I have no solution for that at this time, and it is currently giving me headaches.
Then comes the Lucene index. I have some experience with Lucene indexes of ~3000-4000 documents, but 300,000 (as in the German Wikipedia) is another league. I have no idea how big the Lucene index would be... I think ~150 MB could be possible. Then the images... tooooooo big.
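The `utitle` trick is a case-insensitive prefix search over a sorted index. A minimal C sketch of the same idea without a database, assuming the upper-cased titles fit in memory as a sorted array: two binary searches find the block of titles matching `LIKE 'E%'`. The function names are hypothetical, and the upper-casing is ASCII-only, so German umlauts would need a proper Unicode case mapping.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy s into dst (capacity n >= 1) in upper case, like
   the 'utitle' column. ASCII only: non-ASCII letters are left unchanged. */
void to_upper(const char *s, char *dst, size_t n) {
    size_t i = 0;
    for (; s[i] != '\0' && i + 1 < n; i++)
        dst[i] = (char)toupper((unsigned char)s[i]);
    dst[i] = '\0';
}

static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the upper-cased titles once; the sorted array plays the index role. */
void build_index(const char **utitles, size_t n) {
    qsort(utitles, n, sizeof *utitles, cmp_str);
}

/* Return how many titles start with prefix (already upper-cased), the
   equivalent of utitle LIKE 'PREFIX%', and store the position of the first
   match in *first. Two binary searches: O(log n) string comparisons. */
size_t prefix_range(const char **utitles, size_t n, const char *prefix,
                    size_t *first) {
    size_t plen = strlen(prefix);
    size_t lo = 0, hi = n;
    while (lo < hi) {                  /* first title >= prefix */
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(utitles[mid], prefix) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    *first = lo;
    size_t end = lo;
    hi = n;
    while (end < hi) {                 /* first title not starting with prefix */
        size_t mid = end + (hi - end) / 2;
        if (strncmp(utitles[mid], prefix, plen) == 0)
            end = mid + 1;
        else
            hi = mid;
    }
    return end - *first;
}
```

The matches are then `utitles[first]` to `utitles[first + count - 1]`, so listing them costs nothing beyond the two searches.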
Does anyone know where I can download the images from Commons? Or is there no such download? I've only seen the images of the de wiki, which are about 10 GB now. I'll download them in the next few days to get an impression. There will be a way to filter out unused pictures (those not linked from any page, or only from user pages...), but even then it will be too much. If some Commons pictures are added on top of that, the size will explode. I don't know if it's realistic to resize and recompress every image. I think even big servers will need a lot of time for this job, and currently I have no idea how big the result will be. Will we have to drop some pictures? Or split onto 2 DVDs?
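Filtering unused pictures amounts to checking, for each image, whether any non-user page uses it. A minimal C sketch, assuming a list of (page, image) usage pairs has already been extracted, for instance from MediaWiki's `imagelinks` table; the struct, the function names and the namespace prefixes checked (`User:` and German `Benutzer:`) are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One hypothetical row of image-usage data: a page and an image it shows. */
typedef struct {
    const char *page;
    const char *image;
} image_use;

/* True if the page title is in a user namespace (English or German prefix). */
static int is_user_page(const char *title) {
    return strncmp(title, "User:", 5) == 0 ||
           strncmp(title, "Benutzer:", 9) == 0;
}

/* Keep an image only if at least one page outside the user namespace uses
   it; images used nowhere, or only on user pages, would be left off the DVD. */
int keep_image(const char *image, const image_use *uses, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(uses[i].image, image) == 0 && !is_user_page(uses[i].page))
            return 1;
    return 0;
}
```

A linear scan is fine for a sketch; for hundreds of thousands of images one would sort the usage list by image name first.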
So, I hope my entry didn't bore you... If anyone has any suggestions, solutions or further interest, please leave me a message on my discussion page in the German wiki [3] or send an email to "m_p2 AT gmx DOT de"
Just some thoughts about distribution
- What about an official DVD distribution plus up-to-date, bleeding-edge CDs made on the fly (server-side) for people who are willing to donate to cover the bandwidth? Others could then help share the burden using a P2P application such as BitTorrent. We could also have a group of Wikipedians willing to send burned copies of the encyclopedia to anyone who requests one (free of charge, or for a nominal shipping fee). We must keep in mind, though, that there should also be a CD distribution, because many people in third-world countries might not have DVD-ROM drives. Heck, I know a lot of people who still don't have a DVD-ROM drive and don't know why they would need one.
-Yves 09:48, 15 Jun 2005 (UTC) - de:Benutzer:Cljk
- If you release it on BitTorrent, I would recommend a rolling release: release it very discreetly to certain members first, then expand to more users (it will probably leak), then maybe do a public release (for non-savvy users). This could save bandwidth and let the torrent swarm grow gracefully and exponentially, unless of course there are some major bandwidth donations. --X1987x 01:06, 9 October 2005 (UTC)
w:Project Gutenberg is doing something similar to this. They are sending a free copy of their CD and DVD to anyone who asks. They also ask for a donation, and they would prefer that people download, but they will send out their discs free of charge. So far, donations have kept pace with requests. [4] --Cannona 17:00, 20 September 2005 (UTC)
- Gutenberg's entire collection was archived to one DVD ISO image as of June 2006 and made available through BitTorrent. This is a very simple and efficient way to distribute a set of Wikipedia DVD ISO images, assuming the collection is split into eight sections in order to archive each topic area into an ISO image that will fit on a DVD. Each section of Wikipedia (Arts, Biography, Geography, History, Mathematics, Science, Society, and Technology) will have its own DVD. These DVDs can also be sold as a collection to raise funds for the Wikimedia Foundation. The pages can be converted into HTML and would therefore become instantly accessible from the DVD with no need for a custom client-side application. -fisherm77
An idea a couple of friends and I were discussing would be a customizable version of the 'pedia. For the *ix people out there: when you install almost any distro, you can select the packages or groups of packages you want installed. This "custom" option could come as multiple "module-packages", each containing related topics/subjects. That way, people could download whichever packages fit the (little, as the full 'pedia is 23 GB) storage space they have available, from hard drives to flash media, and keep the material most relevant to them. Someone with a big hard drive could hold more general topics; someone with a puny 512 MB flash drive could stick with what they use most.
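Picking module-packages for a given amount of storage is a small selection problem. A minimal C sketch, assuming each package has a size and a user-assigned priority: it takes packages in priority order and keeps each one that still fits. This greedy rule is a simple stand-in; it does not always fill the space optimally (that would be a knapsack problem). The `package` struct and its fields are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical module-package: size in MB, user priority (higher = more
   wanted), and whether the selection picked it. */
typedef struct {
    const char *name;
    unsigned size_mb;
    int priority;
    int selected;
} package;

/* qsort comparator: highest priority first. */
static int by_priority_desc(const void *a, const void *b) {
    const package *pa = a;
    const package *pb = b;
    return (pb->priority > pa->priority) - (pb->priority < pa->priority);
}

/* Greedy selection: sort pkgs by priority (the array is reordered), then
   select each package that still fits in the remaining space.
   Returns the number of MB used, never more than capacity_mb. */
unsigned choose_packages(package *pkgs, size_t n, unsigned capacity_mb) {
    unsigned used = 0;
    qsort(pkgs, n, sizeof *pkgs, by_priority_desc);
    for (size_t i = 0; i < n; i++) {
        pkgs[i].selected = pkgs[i].size_mb <= capacity_mb - used;
        if (pkgs[i].selected)
            used += pkgs[i].size_mb;
    }
    return used;
}
```

For the 512 MB flash drive mentioned above, a front end would call `choose_packages(pkgs, n, 512)` with the real package sizes.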
Another benefit of the package system would be system independence. If the files contained only the data for the 'pedia, "front-end" programs could be made for the different operating systems and platforms, so people could download what is relevant to their situation. Someone who wants a nice, pretty front end for a more permanent location such as a hard drive could use a larger one; someone who wants a simple text front end for a flash drive with limited room could use a small one.
Tell me what you guys think, I would really like to be a part of whatever happens with porta-wiki. -User:Ciphercast
Crossplatform Application
The Oxford Advanced Learner's Dictionary uses Mozilla-based application development to make its product accessible across multiple operating systems. I think a similar approach for Wikipedia on CD, DVD or any other medium would be an excellent choice, for these reasons: 1) you get to use plain old HTML plus any enhancements (video/audio); 2) the content pages could be updated independently of the Mozilla-based application; 3) users could move effortlessly between the web-based and the offline version and keep up with the "unstable" Wikipedia via CVS, darcs or whatever versioning system is appropriate; 4) the application can be made to look like the real Encartas or Britannicas of the world (it's Mozilla-based, but not necessarily a browser ;). Hope that helps. In case you want to see a Mozilla application as opposed to the browser, try ChatZilla ;) an IRC client.
Underlying architecture
I believe we could use SQLite as an embeddable database engine, since it doesn't require an external process like MySQL does. The database could simply be shipped on the disc, and an application linked with SQLite could access the data. When a page is loaded, the application pulls the page's markup from the database, converts it into HTML and writes it temporarily to the hard disk. This HTML file can then be loaded by an HTML control in the application's main window; perhaps the Mozilla ActiveX control could be used for this. Images could be stored in an LZMA archive.
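The step "pull the markup and convert it into HTML" is where most of the work lies. A minimal C sketch covering only a tiny subset of wikitext: `'''bold'''`, `''italic''` and escaping of `&`, `<` and `>`. It is not MediaWiki's parser: links, templates, tables and `'''''` (bold italic) are not handled, and the SQLite query and the temporary file are left out. `render_inline` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Append s to out (capacity cap, current length *len), truncating if
   needed; out always stays NUL-terminated. */
static void emit(char *out, size_t cap, size_t *len, const char *s) {
    size_t n = strlen(s);
    if (*len + n >= cap)
        n = cap - 1 - *len;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
}

/* Hypothetical sketch: render '''bold''', ''italic'' and HTML-escape
   &, < and >; everything else is copied as-is. Unclosed markup is closed
   at the end. cap must be at least 1. */
void render_inline(const char *wiki, char *out, size_t cap) {
    size_t len = 0;
    int bold = 0, italic = 0;
    out[0] = '\0';
    for (const char *p = wiki; *p != '\0';) {
        if (strncmp(p, "'''", 3) == 0) {
            emit(out, cap, &len, bold ? "</b>" : "<b>");
            bold = !bold;
            p += 3;
        } else if (strncmp(p, "''", 2) == 0) {
            emit(out, cap, &len, italic ? "</i>" : "<i>");
            italic = !italic;
            p += 2;
        } else {
            char c[2] = { *p, '\0' };
            const char *s = c;
            if (*p == '&')
                s = "&amp;";
            else if (*p == '<')
                s = "&lt;";
            else if (*p == '>')
                s = "&gt;";
            emit(out, cap, &len, s);
            p++;
        }
    }
    if (italic)
        emit(out, cap, &len, "</i>");
    if (bold)
        emit(out, cap, &len, "</b>");
}
```

Escaping before writing matters because the markup comes straight from user edits; anything not escaped would be interpreted by the HTML control.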
Contact joshua DOT morgan AT (nospam) gmail DOT com for comments, etc. delete (nospam).
Other goings-on
- Wikimedia France is developing free software to build a DVD version of Wikipedia;
- Wikimedia Italia is also developing a DVD, based on a web browser interface;
- Italian user Emc2 is developing a Qt-based viewer for wiki content (which could also be used for the DVD);
The ugly is trying to bring all these efforts together...
Other Option
I would suggest a program on the computer that downloads requested raw files from the server, parses them for links, and downloads those up to x levels deep (like a website-downloading spider). A database on the server holding MD5 hashes of the raw data would come in handy for comparing your version's hash with the server's.
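The spider's core step is finding the links in a raw page. A minimal C sketch, assuming plain `[[Target]]`, `[[Target|label]]` and `[[Target#Section]]` links: it collects each target so a crawler could queue titles it hasn't fetched yet, up to the chosen depth. Templates, interwiki prefixes and the MD5 comparison are not covered, and `extract_links` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TITLE 256

/* Hypothetical helper: collect the targets of [[...]] links in raw
   wikitext. A target ends at '|' (label), '#' (section) or ']' (end of
   link). Section-only links and targets of MAX_TITLE bytes or more are
   skipped. Returns the number of targets written (at most max_links). */
size_t extract_links(const char *wiki, char titles[][MAX_TITLE],
                     size_t max_links) {
    size_t count = 0;
    const char *p = wiki;
    while (count < max_links && (p = strstr(p, "[[")) != NULL) {
        p += 2;                              /* skip the opening brackets */
        if (strstr(p, "]]") == NULL)
            break;                           /* unterminated link: stop */
        size_t len = strcspn(p, "|#]");
        if (len > 0 && len < MAX_TITLE) {
            memcpy(titles[count], p, len);
            titles[count][len] = '\0';
            count++;
        }
        /* keep scanning right after "[[", so links nested inside e.g.
           image captions are still found */
    }
    return count;
}
```

A crawler would call this on each downloaded page and enqueue every target not seen before with depth + 1, stopping at depth x.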
This would allow users to download only what they want (with or without images, et cetera), a precompiled archive of all the files on a certain topic, or even all the files. To make it completely cross-compatible, I suggest storing the files on the CD (or hard drive) as simple text files (maybe gzipped raw data?).
There would then be a parser, similar to the MediaWiki PHP system, that parses the raw files (unzipping them first? It would only have to hold one file in memory, or however many are configured) and presents them to the user in a way they can interact with.
As for languages, any number of them are cross-platform (Python, Java, ...). A system such as this would allow anybody to create a reader that runs on their computer. I believe that just setting up the system, apart from the parser, would create an explosion of parsers that could run on any system and any architecture.
Blu-ray Disc
Given the current size of English Wikipedia, the only optical disc option left for a full release with images would appear to be HD-DVD. --Beta
- Blu-ray would be better, largely as it's more widespread and HD-DVD is failing. Why is the title "Blu-ray Disc" if you go on to say that it would be better on HD-DVD? Danr2k6
Kiwix
Open source Wikimedia offline reader.
- HTML offline browser
- Skin
- Search
- Support for Linux / Windows / Mac OS
http://www.kiwix.org
Installation
Kiwix 0.5 source - http://sourceforge.net/projects/kiwix/
Kiwix - 0.5 - http://ftp.crihan.fr/mirrors/wikipediaondvd.com/kiwix-0.5.iso.bz2
Mediawiki 1.9.3 - http://sourceforge.net/projects/wikipedia/
Build
Mediawiki 1.9.3
You will need to strip your MonoBook skin down thoroughly: no menus, user URLs, edit links, etc., just the plain content. Then dump HTML with the included maintenance script:
php dumpHTML.php -d html -k monobook --no-shared-desc
Prepare a copy of the image directory, without the thumb and temp directories.
kiwix 0.5 source
Compile the kiwixnormalizer HTML parser. Check the requirements in the source code (/kiwixbuilder). Usage:
kiwixnormalizer $fulldir/html
If you get a 'segmentation fault', check whether:
a) the page contains only text; b) the page does not contain at least one link; c) otherwise, check the last page processed before the error.
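Hints a) and b) can be checked before running kiwixnormalizer. A minimal C sketch, assuming a page "has a link" when its HTML contains an `<a ` tag; `page_has_link` is a hypothetical helper, not part of Kiwix, and it only flags candidate pages rather than fixing the crash.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pre-check: returns 1 if the HTML file contains at least one
   "<a " tag, 0 if it contains none (a candidate for the crash), and -1 if
   the file cannot be read. */
int page_has_link(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    long size = ftell(f);
    if (size < 0) {
        fclose(f);
        return -1;
    }
    rewind(f);
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }
    size_t got = fread(buf, 1, (size_t)size, f);
    buf[got] = '\0';                    /* make the contents a C string */
    fclose(f);
    int found = strstr(buf, "<a ") != NULL || strstr(buf, "<A ") != NULL;
    free(buf);
    return found;
}
```

Running it over every file in the HTML dump and listing the pages that return 0 narrows down where to look.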
kiwix 0.5
This is the full released version of Kiwix. It includes files that are not on the Kiwix 0.5 source CD but are needed:
a) kiwixcomponent (it can also be compiled), b) Mac OS support, c) Windows support.
It also includes code fixed later.
CD
Several modifications are needed afterwards:
Browser:
chrome/content/interfacewiki/interfacewiki.xul (browser layout definition)
chrome/content/interfacewiki/js/mybrowser.js (browser layout handling)
Logo:
chrome/skins/OceanBlue/img/wiki/logowiki.png
Exe:
i586-mingw32msvc-gcc -o name.exe name.c
/* name.c: start the bundled XULRunner with application.ini, then exit */
#include <windows.h>
int main(void)
{
    ShellExecuteA(GetDesktopWindow(), "open", "xulrunner\\xulrunner.exe", "application.ini", NULL, SW_HIDE);
    return 0;
}
Editing:
Rename everything that should carry your new name, including:
*.ico *.exe *.sh autorun.inf
Iso-ize:
mkisofs -o cd-name.iso -J -R /directory-of-cd
See also
- de:Wikipedia:Wikipedia-CD (old)
- de:Wikipedia:DVD
- fr:Projet:Wikipédia Junior
- Wikimedia and Mandriva
- en:Wikipedia talk:Version 1.0 Editorial Team
- Static version tools: an effort to collect tools to assemble a CD
- en:Wikipedia talk:Pushing to 1.0
- Making offline copy from Ultimate Wiktionary and dicologos
- en:Wikipedia:Wikipedia-CD/Download


