Wikipedia on CD/DVD
Almost all of the content of the Wikimedia projects is published under licenses which allow anybody to download the content and publish it any way they like. This has allowed various projects to republish our content, for example from the German, Polish, Portuguese and English Wikipedias.
Before any new project is planned for publishing Wikipedia on DVD, flashcard, or any other form of fixed digital medium, it should be clear what the goal of the distribution will be. Should it simply be a means of accessing Wikipedia offline, or should it offer a more feature-rich multimedia encyclopædia, such as the Britannica DVD, or Microsoft's Encarta? Will it incorporate interactive features?
Offline releases involve a clear series of steps:
- Determine the scope of the release, which is often dictated by the space available on the medium used and by the expected use of the CD/DVD. Will the release be a general one, or limited to (say) just articles on birds? Will it include all articles, or only the more important/better quality ones? Will it include text-only, or thumbnail pictures, selected pictures, or complete image/video files?
- If some selection process is involved, then criteria for selection must be determined, usually based on a combination of topic-importance and quality. Then the process itself needs to be determined - this needs to be scalable (experience has shown that manual selection is very labor-intensive and impracticable beyond 5000 articles) - and the relevant infrastructure put in place. In the English and French Wikipedias, selection has been successfully done using metadata compiled from WikiProjects; on the English Wikipedia, a selection of 31,000 articles was made from a pool of 1.6 million assessed articles.
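The combined importance/quality selection described above can be sketched as a simple scoring pass over assessment metadata. The rating scales, weights and cut-off below are illustrative assumptions, not the actual WikiProject scheme:

```python
# Sketch: pick articles for an offline release from assessment metadata.
# The quality/importance scales and the weighting are illustrative assumptions.

QUALITY = {"FA": 5, "A": 4, "GA": 3, "B": 2, "Start": 1, "Stub": 0}
IMPORTANCE = {"Top": 4, "High": 3, "Mid": 2, "Low": 1}

def score(article):
    """Combine quality and importance into a single rank."""
    return QUALITY.get(article["quality"], 0) + 2 * IMPORTANCE.get(article["importance"], 0)

def select(articles, limit):
    """Pick the `limit` best-scoring articles from the assessed pool."""
    return sorted(articles, key=score, reverse=True)[:limit]

pool = [
    {"title": "Bird", "quality": "FA", "importance": "Top"},
    {"title": "Penguin", "quality": "GA", "importance": "High"},
    {"title": "Obscure subspecies", "quality": "Stub", "importance": "Low"},
]
release = select(pool, 2)
```

Ranking the whole assessed pool automatically is what makes the process scalable beyond the few thousand articles that manual selection can handle.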
Whether the offline version will be an HTML dump or a more sophisticated application, it must also be decided how it will be compiled. Should it be possible to download an up-to-date version of the offline Wikipedia at any time? Or should one be compiled, say, every week, month, or year?
After this has all been decided, one must choose a way to package it (digitally), given how much space it will require and whether it should be targeted at CD, DVD, etc., and decide what to bundle with it. Should it come with interfaces for an array of platforms, say Linux, Windows, and Mac, or should it simply be a general data-only distribution for which a client must be downloaded separately? This becomes less applicable if an HTML-only approach is taken, but a decision must still be made regarding which file system to use on the distribution media.
Lastly, after ALL the work above is done, one can start talking about physical distribution. The most common distribution method is simply to allow the disk image to be downloaded online so that people can create their own media sets, since it is a free publication. If instead you wish to distribute pre-produced media, it would be useful to determine what price people would be willing to pay before committing to the potentially substantial costs of producing physical media and shipping them. Relevant questions include: would people be willing to pay the additional cost of a more environmentally friendly disc?
If and when Wikipedia is put on CD/DVD, it should be made of recycled material, just as a paper Wikipedia should be.
<<Sorry, this section is a bit off-topic and has some unnecessary parts, but the page is overall well worth reading. Please carry on.>>
YES! I will only support a paper Wikipedia if it's made from recycled paper. I will only support a CD/DVD Wikipedia if it's made from recycled plastic.
- Can CDs actually be made out of recycled plastic? I tried to find some information on the subject, but didn't get very far...surely the plastic would have to be of quite high quality to minimise optical distortion (but I don't know much on the subject)... Nick04 10:46, 13 Jun 2004 (UTC)
- Sanyo has some technology for making CDs out of Corn, but I forgot the name of it. 126.96.36.199 02:57, 16 Dec 2004 (UTC)
- A Japanese company by the name of Youko Inc. has developed a CD made of recycled plastic, but it only holds 1.02MB of data. Their last one held only 89KB, so they're getting there.
- Recycled CDs (if possible) would cost a lot more than regular CDs; not economically feasible for Wiki. Yes, it is possible to recycle CDs, but there is no such thing as a "recycled DVD-R". You can probably find recycled CD sleeves, which I recommend if we are to package the DVDs/CDs. I would not buy a plastic case or anything other than a paper sleeve, as a case will 1) crack, 2) get bulky, and 3) cost more. If bought in bulk, 10,000 CD sleeves can cost $.01 or so each... Not sure on true pricing. Cheers -Maxwell
- Of course, the ideal format for Wikipedia is something like a Young Lady's Illustrated Primer, or w:The Hitchhikers Guide to the Galaxy. But that may be a few years off :) Hell yeah, Wikipedia should be put on CD! I don't have any preference whether it's recycled. A current-version-only edition isn't likely to fill more than a couple discs anyway. Maybe we should talk to CheapBytes or other low-cost outlets about selling cheap Wikipedia CDs. -- Wapcaplet 16:29 1 Jul 2003 (UTC)
- What about using rewritable DVDs? ...like with Linux distros:
- 1. download the latest version
- 2. record it on a DVD-RW
- 3. go to step 1 or 2
--Esteban.barahona 04:20, 8 September 2008 (UTC)
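The loop above might look like this in outline; the version-stamp format and the download/burn hooks are hypothetical placeholders:

```python
# Sketch of the rewritable-DVD loop: re-download the dump and reburn the
# DVD-RW only when the published version stamp differs from the local one.
# The stamp format and the download/burn callables are hypothetical.

def needs_update(local_stamp, remote_stamp):
    """True when the published dump is newer than the local copy."""
    return local_stamp is None or remote_stamp > local_stamp

def update_cycle(local_stamp, remote_stamp, download, burn):
    """One pass of the loop: fetch and burn only if something changed."""
    if needs_update(local_stamp, remote_stamp):
        image = download()   # e.g. fetch the latest dump image
        burn(image)          # e.g. write it to the DVD-RW
        return remote_stamp
    return local_stamp       # nothing new: keep the current disc
```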
Maybe it can be put on a bootable Knoppix-like live CD... Wikipedix or Knoppedia. Bootable DVDs will probably pop up sometime soon; then there will be enough space to keep all the software that comes with Knoppix :) -- Guaka 21:47 5 Jul 2003 (UTC)
- I find this idea quite delightful! :) --Brion VIBBER
- This is already done! - http://www.copy4freedom.de/index.php?id=6 [dead link]
- This might be a job for Morphix. The mini-CD version takes about 200 megs, and comes with a very nicely set up version of XFCe, and Firebird as a web browser. -Elo.
- But, as a user, what would you prefer? Shut down your Windows, boot some "strange" Linux just to look up something, then reboot Windows, thus not being able to copy'n'paste stuff from the 'pedia to Word etc.? Or a Window$ program that you double click like everything else?
- The bootable wikipedia would be an option for showcases (?), libraries, schools etc., running on not-up-to-date machines, though. --Magnus Manske 15:50, 4 Dec 2003 (UTC)
- It doesn't have to be exclusively for booting, either. Something that can be run as a user application from within several operating systems and also can boot itself would just be a neat extra. --Brion VIBBER
- One could use Qemu on a host system to run the guest system with Wikipedia on it. --ugehrig
- Funny, we had the same discussion on the German Wikipedia some months ago, but it was archived... We did not think of the "showcase or not-up-to-date" examples...
- It is a shame that we don't really manage to get other languages onto Meta... Fantasy 14:27, 5 Dec 2003 (UTC)
- I can help a bit in this area. I can prepare a Knoppix-like live Linux which will also run as an "application window" inside Windows, using coLinux. You can also use IE from Windows to browse a Wikipedia run via coLinux. If you think this is interesting, bug me (alekibango) using IRC at .
Any material which infringes someone's copyright would cause a major problem. It might mean a batch of CDs/DVDs would have to be withdrawn. It would also be virtually impossible to be sure that there was no infringing material on the disc.
- Malcohol 08:47, 8 Jul 2003 (UTC)
- The material for a CD/DVD needs to be checked for problems anyway, for an official release. Apart from copyright issues there can also be NPOV issues, or even vandalism or just plain nonsense. I think it is better to leave articles like this out completely.
- What can be done is something CVS like, where there can be a 'stable' and an 'unstable' branch. We probably need a group of 'trusted' users that can mark certain states of articles as 'okay for official release' or 'decent' or... When there is the need for a release, this is announced, and then after a certain period, or after a certain number of 'decent' articles has been reached, or even after a specific size (400 MB?), there will be a release. Guaka 00:19, 25 Sep 2003 (UTC)
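The "trusted users mark revisions" idea could be modelled roughly like this; the status strings and the release threshold are illustrative assumptions:

```python
# Sketch: trusted users mark specific revisions of articles, CVS-style, and
# a release is cut once enough articles have an approved revision. Status
# strings and the threshold are illustrative assumptions.

marks = {}  # title -> list of (revision_id, status), in marking order

def mark(title, revision_id, status):
    """A trusted user marks one revision of an article."""
    marks.setdefault(title, []).append((revision_id, status))

def stable_revision(title):
    """The most recently approved revision of an article, or None."""
    approved = [rev for rev, status in marks.get(title, [])
                if status in ("decent", "okay for official release")]
    return approved[-1] if approved else None

def ready_for_release(minimum_articles):
    """Announce a release once enough articles have an approved revision."""
    approved = sum(1 for title in marks if stable_revision(title) is not None)
    return approved >= minimum_articles

mark("Bird", 101, "decent")
mark("Bird", 102, "vandalism")
mark("Fish", 201, "okay for official release")
```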
thanks a lot!
Cafe Press now offers CD burning. $4.99 is the base price, we could tack whatever amount on top. How big is Wikipedia (English) anyway? Minus the talk pages, if possible. - user:zanimum
- $4.99 is shockingly overpriced, though since CafePress deals in one-off items that's not surprising. Letting people burn their own CDs would be virtually free; a moderate-sized pressing of 1000 discs through a more typical house might cost about $1.50 each (price list for the first hit I found on Google), though there may be additional overhead for distribution. The easiest thing would be to put out ISOs and let the folks who already repackage public-domain, shareware, and freely redistributable stuff for retail channels deal with that, if people want it.
- The compressed size of the English Wikipedia (wiki text only, current revisions only, includes talk pages) is 109MB. Uncompressed and in HTML it runs more towards 400 MB IIRC (there are people who've been playing with scripts to do this). A more efficient storage method (say, storing everything in a zip file) could easily pack it back down a ways. Images and media are a few hundred more megs; judicious culling, resizing, and recompression may be required to fit everything on one CD, but a multi-CD set (eewwww) or DVD-ROM would be no problem. --Brion VIBBER 02:32, 27 Aug 2003 (UTC)
- If you go to lulu.com I believe the base price for a DVD or CD is $1. As I recall. CC.
Guys, I don't know how the data for Wikipedia is organized. But this is what I'd really like to see -- and believe me, I've gone to the database section a few times and tried to make this happen; but I just don't have the technical understanding--
I'd like to be able to download the Wikipedia front-end; the thing that takes the raw data and turns it into a Wikipedia (or whatever-format) page.
Then -- well, you know how the "Advanced Features" of a custom install are organized? You go through and check boxes; and it gives you a quick synopsis of the section and the space it takes up?
With only a few degrees of freedom -- turning images on or off, etc. -- and a couple of good categorization hierarchies, I think you'd have a real knockout distribution package.
So, in other words, I could go to the ACADEMIC section and say I want 1 GB of the basic info; then click down to LITERATURE and specify 2 GB; then MATHEMATICS for 1 GB; and then, back to the root, I could go to POPULAR CULTURE for 780 GB; or whatever.
The issue for me is I don't have online access at home. So, when I'm working on a project (for example, Shakespeare Studies), it'd be *incredibly cool* if I could query Wikipedia for "Shakespeare-related stuff" and have it sitting quietly on my laptop for later delving-into.
Honestly, I've tried downloading the data files (and I think I have a mirror sitting somewhere on my harddrive) but I just can't for the life of me figure out what to do with that file. It's a database format of some kind. I run Windows. I'm insulated from doing anything real.
Anyway, Wp is way-cool. CC.
I think that a good starting point would be to just provide some sort of database dump and let the world get creative with it. The ideal user interface is subject to individual needs and whims. Providing the actual database would not only allow designers to build various interfaces on top of it, but will also allow developers to use the information in various ways – not only encyclopedia type applications, but also semantic Web applications, for instance, which might extract taxonomies and such from the database.
Start with Wikipedia on hard disk?
Shouldn't there first be a simple .tar.gz (and maybe an .exe installer for Windows users)? This way people can simply install it on their drives, and GNU/Linux distributions can offer packages. I'd love to have the 'pedia on my laptop... It could also decrease the strain on the servers, since people might first check their local version. Of course, then it might be a good idea not to offer the Wikipedia files on the Wikipedia servers :) Guaka 00:28, 25 Sep 2003 (UTC)
- I have the Tagalog Wikipedia (tl) in my hard disk. --Bentong Isles 03:29, 29 October 2005 (UTC)
In addition to the above: it appears that flash cards are generally getting pretty cheap (in the UK, £20 will buy you a 2 GB SD card). Given that a compressed Wiki is around 110 MB, and that much higher capacity cards for the same price can't be that far (6 months?) away, perhaps putting Wiki onto a flash card should also be investigated. This would enable people to use Wiki in laptops and portable devices. Of course, it's still rather expensive, but (IMO) a neat idea.
- This thing is already working. -- Mathias Schindler 19:18, 17 Jul 2004 (UTC)
- Is the dict file legal to distribute? I don't see any attribution at all?
en:Directmedia Publishing has released the Linux version of its reader software under the GPL. Digibux is shipped on the German DVD containing Wikipedia content by Directmedia http://savannah.nongnu.org/projects/digibux -- Mathias Schindler 15:46, 26 May 2005 (UTC)
- In contrast to the windows software it cann
Scope and Compatibility
Should it simply be a means of accessing Wikipedia offline, or should it offer a more feature-rich multimedia encyclopædia
- Absolutely the former. I believe this is most people's goal: accessibility of Wikipedia where/when the Internet is not readily available. A "feature-rich" version should come only once this type of content has evolved in the central version; otherwise you could not leverage the enormous collaboration which is probably the strongest point of Wikipedia. Requiring this type of content just for an offline version could also prevent this project from getting off the ground for a number of years.
Could this be created as a program that can use both the online Wikipedia and an offline data file as a source?
- We could learn a lot here from the help system in Apple OS X. This appears to be HTML based, using a local copy. If it detects an Internet connection, it will transparently download and display the latest version in the help application, keeping it for future offline use. This would alleviate the nature of offline content becoming outdated; even if it runs off read-only media it could cache changes to hard disk. Another major advantage would be to provide Edit/Discussion etc. hyperlinks so users can still contribute to Wikipedia when a connection is possible. This way it would also give the user the choice of it becoming their standard way of viewing Wikipedia, wherever they are, rather than a fork which they need to switch between depending on which device/location they're accessing from.
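The Apple-Help-style behaviour described above (serve the local copy, transparently refresh it whenever the network is reachable) can be sketched as a small cache with an injected fetcher; all names here are hypothetical:

```python
# Sketch: serve the cached copy when offline, refresh it when online.
# The fetcher is injected so the logic is testable; names are hypothetical.

class ArticleCache:
    def __init__(self, fetch_online):
        self.fetch_online = fetch_online  # callable: title -> HTML, raises when offline
        self.store = {}                   # cached copies; on disk in a real build

    def get(self, title):
        try:
            html = self.fetch_online(title)  # connection available:
            self.store[title] = html         # keep it for future offline use
            return html
        except OSError:
            return self.store.get(title)     # offline: fall back to the cache
```

The same shape also supports the edit/discussion links mentioned above: when `fetch_online` succeeds, the client knows a connection is available and can enable contribution features.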
Should it come with interfaces for an array of platforms, say Linux, Windows, Mac, or should it simply be a general data-only distribution where a client is required to be downloaded separately? This becomes less applicable if a HTML only approach is taken
- It should absolutely be cross platform in the interests of openness and accessibility, which I believe are the fundamental ideologies which made Wikipedia possible. From a technical aspect, HTML seems the obvious choice - a simple mirror of Wikipedia means it would work anywhere out-of-the-box (think handhelds, phones, one child per laptop) with no development and software installation required, with the option of the user viewing it in a platform-specific client which can perform the aforementioned Internet detection and syncing. This client could be a stand-alone application or as simple as a browser plugin, depending on platform availability. Xlynx 03:42, 24 November 2007 (UTC)
- I should acknowledge that an HTML-only option would see the loss of the search feature. However, this should not be seen as a negative, because platform-specific clients can implement their own searching if we bundle a simple search index (sqlite?). Platforms without a client would still benefit, because taking any other route leaves them with no access at all! In those cases one would fall back to using categories, redirections and cross references as demonstrated on http://static.wikipedia.org/. We can continue to improve these on the main site, allowing everyone to reap the benefits. Xlynx 10:37, 24 November 2007 (UTC)
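A bundled search index for an HTML-only release could be as simple as a keyword table in an SQLite file shipped alongside the HTML tree; the schema and tokenisation below are assumptions for illustration:

```python
# Sketch of a bundled search index: a small SQLite database mapping keywords
# to page files, shipped next to the HTML tree so a client can search
# without a server. Schema and tokenisation are illustrative assumptions.

import re
import sqlite3

def build_index(conn, pages):
    """pages: {file name: page text} -> populate a word -> page table."""
    conn.execute("CREATE TABLE idx (word TEXT, page TEXT)")
    for page, text in pages.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            conn.execute("INSERT INTO idx VALUES (?, ?)", (word, page))
    conn.execute("CREATE INDEX word_ix ON idx (word)")

def search(conn, word):
    """All pages containing the (case-insensitive) word."""
    rows = conn.execute("SELECT page FROM idx WHERE word = ?", (word.lower(),))
    return sorted(row[0] for row in rows)

conn = sqlite3.connect(":memory:")  # would be a file next to the HTML dump
build_index(conn, {"Bird.html": "Birds are warm-blooded",
                   "Fish.html": "Fish are cold-blooded"})
```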
Hope this is seen as a discussion page... otherwise please move my message anywhere... or delete it. So, I just tried the DVD of the German Wikipedia... and I was shocked. It was horrible! No images, no variable fonts... nothing that would be comparable to an Encarta-like encyclopedia (sorry to say that). But saying that something is bad is easy, so I thought of trying to write software of my own. But before I spend a lot of time on something that's already been started by someone else, it would be interesting to know if there's someone out there who has already done something in this direction...
- Available for the above-mentioned platforms.
- Good freely available libraries (usable in this application):
- HSQLDB
- an embeddable SQL engine with the ability to make a database read-only (e.g. for CDs). No network access is made, not even loopback. It can be used as a standalone built-in database.
- SWT (Eclipse)
- provides a platform-independent API to the installed web browser, which can be IE, Mozilla or (as far as I know) Safari. It comes with a platform-dependent part, which is available for the platforms shown above. The browser API supports listening to location changes (clicks on a URL) and feeding the browser a String containing the HTML to be displayed. (All the Java browsers I tried are not usable in their current state; JEditorPane only supports HTML 3 and the rendered pages look horrible.)
- Lucene (Apache Jakarta)
- The one and only GOOD open-source full-text search engine. Supports ranking, indexes written to files, and everything a developer in these circumstances could dream of. Perhaps a full-text search is not needed, but it's a possibility to think about.
So, I have already started experimenting a little. I downloaded the cur table of the German wiki (1 GB unpacked), deleted the content of the discussion pages to speed up my MySQL server a little, and set up an HSQL database which contains only the titles of all pages, the corresponding IDs and the is_redirect column. To speed up title searches I added a column 'utitle' which contains the title in uppercase, to support case-insensitive search in reasonable time. This, plus the index on these columns, increased the speed of a wildcard search *dramatically*: the search for all entries matching a wildcard (where utitle like 'E%'), which can easily be >> 5000 rows, takes at most half a second. The table is about ~120MB now. That is not exactly small, but the fast search makes up for it.
I thought it would be possible to store the text files in a directory structure, each file shrunk by bzip or gzip. Perhaps a better solution would be to store them in the HSQL database too, but I have no experience with storing binary (compressed) data there. Compression saves >60% of the data here. The format would have to be as in the MySQL table, not prerendered HTML, for many reasons (open to discussion). I think these text files will take ~400-500MB for the German wiki.
That brings me to a big problem: how to generate HTML from the wiki markup? I thought of using PHP standalone (invoked as an executable by the Java app), because it is available for nearly all platforms too, but I think that idea is nonsense. So I had a look at Parser.php... I think it is possible to port it to Java, but it is a BIG job. No solution for that at this time; it currently gives me headaches.
Then comes the Lucene index. I already have some experience with Lucene indexes of ~3000-4000 documents, but 300,000 (as in the German Wikipedia) is another league. I have no idea how big the Lucene index would be... I think ~150MB could be possible. Then the images... tooooooo big.
Does anyone know where I can get a download of the images on the Commons? Or is there none? I've only seen the images of the de wiki; it's about 10GB now. I'll download it over the next few days to get an impression of that. There will be a way to filter out unused pictures (those not linked from any page, or only from user pages...), but even that will be too much. If the data of some Commons pics is added on top, the size will explode. I don't know if it's realistic to resize and recompress every image; I think even big servers would need a lot of time for this job, and I currently have no idea how big the result would be. Does one have to drop some pics? Or split across 2 DVDs?
So, I hope my entry didn't bore you... If anyone has any suggestion, solution or further interest, please leave me a message on my discussion page in the German wiki, or send an email to "m_p2 AT gmx DOT de"
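The indexed-uppercase-title trick described above can be sketched with SQLite standing in for HSQLDB; the schema follows the description (id, title, utitle, is_redirect), and everything else is illustrative:

```python
# Sketch of the 'utitle' technique: keep an indexed uppercase copy of each
# title so case-insensitive prefix searches stay fast. SQLite stands in for
# HSQLDB here; the sample pages are invented for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (id INTEGER, title TEXT, utitle TEXT, is_redirect INTEGER)")
conn.execute("CREATE INDEX utitle_ix ON page (utitle)")  # speeds up prefix search

def add_page(pid, title, redirect=0):
    conn.execute("INSERT INTO page VALUES (?, ?, ?, ?)",
                 (pid, title, title.upper(), redirect))

def prefix_search(prefix):
    """Case-insensitive title search: compare against the uppercase copy."""
    rows = conn.execute("SELECT title FROM page WHERE utitle LIKE ?",
                        (prefix.upper() + "%",))
    return sorted(row[0] for row in rows)

add_page(1, "Elephant")
add_page(2, "elektron")
add_page(3, "Giraffe")
```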
Just some thoughts about distribution
- What about an official DVD Media distribution and up-to-date bleeding-edge CDs which are made on-the-fly (server-side) for people who are willing to donate to cover the bandwidth? We could then have others help cover the burden by using a P2P application such as BitTorrent or similar. Then, we could have a group of Wikipedians who are willing to send out burned copies of the encyclopedia to anyone who requests one (free of charge, or for a nominal shipping fee). We must consider though, that there should be a CD distribution due to the fact that a lot of third-world countries might not have DVD-ROM drives. Heck, I know a lot of people who still don't have a DVD-ROM drive and don't know why they would need one.
- If you release it on BitTorrent I would recommend a rolling release: release it to certain members first, extremely secretively, then expand to more users (which will probably leak), then maybe a public release (for non-savvy users). This could save bandwidth and make the torrent 'swarm' grow rather gracefully and exponentially, unless of course there are some major bandwidth donations. --X1987x 01:06, 9 October 2005 (UTC)
w:Project Gutenberg is doing something similar to this. They are sending a free copy of their CD and DVD to anyone who asks. They also ask for a donation, and they would prefer that people download, but they will send out their discs free of charge. So far, donations have kept pace with requests.  --Cannona 17:00, 20 September 2005 (UTC)
- Gutenberg's entire collection was archived to one DVD ISO image as of June 2006 and made available through BitTorrent. This is a very simple and efficient way to distribute a set of Wikipedia DVD ISO images, assuming the collection is split into eight sections in order to archive each topic area into an ISO image that will fit on a DVD. Each section of Wikipedia (Arts, Biography, Geography, History, Mathematics, Science, Society, and Technology) will have its own DVD. These DVDs can also be sold as a collection to raise funds for the Wikimedia Foundation. The pages can be converted into HTML and would therefore become instantly accessible from the DVD with no need for a custom client-side application. -fisherm77
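Splitting the collection into per-topic DVD images, as proposed, amounts to binning articles by section and checking each bin against the disc capacity; the sizes below are invented purely for illustration:

```python
# Sketch of the one-DVD-per-topic split: total up each section and flag any
# that will not fit on a single disc. Article sizes here are invented; real
# numbers would come from the dump.

DVD_BYTES = 4_700_000_000  # single-layer DVD capacity, roughly

def plan_discs(articles):
    """articles: iterable of (title, section, size_bytes) -> totals by section."""
    totals = {}
    for _title, section, size in articles:
        totals[section] = totals.get(section, 0) + size
    return totals

def oversized(totals, capacity=DVD_BYTES):
    """Sections that need to be split further or pruned to fit the medium."""
    return sorted(section for section, size in totals.items() if size > capacity)
```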
An idea a couple of friends and I were discussing would be to have a customizable version of the 'pedia. Meaning, for the *ix people out there: when you go to install most any distro, you can select the packages or groups of packages you want installed. This "custom" option could be available in multiple "module-packages", each of which would contain relevant topics/subjects. There could be multiple packages, so that someone can download and use whatever (little, as the full 'pedia is 23 GB) storage space they have available, from hard drives to flash media, to contain the most relevant material for their own use. If that person has a big hard drive, they could hold more general topics; on the other hand, if they have a puny 512MB flash drive, they could stick with what they use most.
Another benefit to using the package system would be system independence. If the files contained just the data for the 'pedia, then "front-end" programs could be made for the different operating systems and platforms, so people can download what is relevant for their situation. If they want a nice, pretty front-end that would be larger, for a more permanent location such as a hard drive, then there could be one for that; on the other hand, if they want just a simple text front-end for a flash drive with limited room, then there might be one for that as well.
Tell me what you guys think, I would really like to be a part of whatever happens with porta-wiki. -User:Ciphercast
The Oxford Advanced Learner's Dictionary uses Mozilla-based application development to make their product accessible across multiple operating systems. I think a similar approach for Wikipedia on a CD, DVD or any other medium would be an excellent choice. The reasons are: 1) you get to use plain old HTML plus any enhancements (video/audio); 2) updates to the content pages could be made independently of the Mozilla-based application; 3) users could move effortlessly between the web-based and the offline version and keep up with the "unstable" Wikipedia via CVS or darcs or whatever versioning system is appropriate; 4) the application can be made to look like the real Encartas or Britannicas of the world (Mozilla-based, but not necessarily a browser ;). Hope that helps. In case you want to have a look at a Mozilla application as opposed to the browser, try ChatZilla ;) an IRC client.
I believe we could utilize SQLite as an embeddable database engine, as it doesn't require an external process like MySQL does. The database could then simply be shipped on the disc, and an application with SQLite can access the data. When a page is loaded, the application pulls the page's markup from the database and converts it into HTML, which is written to the hard disk temporarily. This HTML file can then be loaded by an HTML control in the application's main window; perhaps Mozilla ActiveX could be used for this. Images could be stored in an LZMA archive.
Contact joshua DOT morgan AT (nospam) gmail DOT com for comments, etc. delete (nospam).
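A rough sketch of the embedded-SQLite reader described above: pull markup from a shipped database, convert a tiny subset of wiki markup to HTML, and write it to a temporary file for the HTML control to load. The markup subset and schema are assumptions, not MediaWiki's real parser:

```python
# Sketch of the embedded-SQLite reader: markup comes out of a shipped
# database, is converted to HTML (a toy subset: bold and links only), and is
# written to a temp file for an embedded HTML control. Schema is illustrative.

import os
import re
import sqlite3
import tempfile

def to_html(markup):
    """Minimal conversion: '''bold''' and [[links]] only."""
    html = re.sub(r"'''(.+?)'''", r"<b>\1</b>", markup)
    html = re.sub(r"\[\[(.+?)\]\]", r'<a href="\1.html">\1</a>', html)
    return "<html><body>%s</body></html>" % html

def render_page(conn, title):
    """Fetch markup, render it, and return the temp file path to display."""
    row = conn.execute("SELECT markup FROM page WHERE title = ?", (title,)).fetchone()
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "w") as f:
        f.write(to_html(row[0]))
    return path  # hand this path to the embedded HTML control

conn = sqlite3.connect(":memory:")  # shipped read-only on the disc in reality
conn.execute("CREATE TABLE page (title TEXT, markup TEXT)")
conn.execute("INSERT INTO page VALUES (?, ?)", ("Bird", "'''Birds''' are [[Dinosaur]]s"))
```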
- Wikimedia France is developing free software to build a DVD version of Wikipedia;
- Wikimedia Italia is also developing a DVD, based on a web browser interface;
- Italian user Emc2 is developing a Qt-based viewer for wiki content (which could also be used for the DVD);
The ugly part is trying to put all these efforts together...
I would suggest a program on the computer that would download requested raw files off the server and parse each file for links, downloading those x levels deep (like a website downloader spider). A server-side database of md5 hashes of the raw data would come in handy for comparing your version's hash with the server's hash.
This would allow the user to download only what they want (choose images, or not, et cetera), or to download a precompiled archive of all the files of a certain topic, or even all the files. In order to make it completely cross-compatible, my suggestion is to use a simple text file (maybe a gzip of the raw data?) and to store the files on the CD (or harddrive).
There would then be a parser, similar to the MediaWiki PHP system, that would parse the raw files (unzipping them first? one would only have to keep one file in memory, or however many one configures) and present them to the user in a way that they can interact with.
As far as language is concerned, there are any number of languages that are cross-compatible (Python, Java, ...). A system such as this would allow anybody to create one that ran on their computer. I believe that just setting up the system, besides the parser, would create an explosion of parsers that could run on any system on any architecture.
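The md5 comparison proposed above could work roughly like this; the shape of the server's published hash list is an assumption:

```python
# Sketch of the md5 sync check: hash each local raw file and compare it with
# the hash the server publishes, so only changed pages are re-downloaded.
# The format of the server's hash list is an illustrative assumption.

import hashlib

def md5_of(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def stale_pages(local_pages, server_hashes):
    """Titles whose local copy no longer matches the server's published hash."""
    return sorted(title for title, text in local_pages.items()
                  if server_hashes.get(title) != md5_of(text))
```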
Or, an application like Google Earth, which would be installed on the PC and get data from the internet, can be quite handy. 188.8.131.52 13:41, 3 July 2008 (UTC)
Given the current size of the English Wikipedia, the only optical disc option left for a full release with images would appear to be Blu-ray. A conventional ISZ (compressed ISO format) image can store 20 GB of text data (which compresses greatly and fast) in less than 2 GB of file; it can then be mounted, for example with Daemon Tools, to navigate the ISO image of the 'pedia. --Beta
- Isn't the point of this to make Wikipedia more accessible? How many people do you think have Blu-ray drives now, and how many of them lack internet? If money isn't the problem, then it should be distributed on 128 GB flash drives, or gold-plated hard drives?
Kiwix is an offline reader for web content, designed especially to make Wikipedia available offline. It works by reading the content of the project stored in the ZIM file format, a highly compressed open format with additional metadata.
- Pure ZIM reader
- Content and download manager
- Case and diacritics insensitive full text search engine
- Bookmarks & Notes
- kiwix-serve: ZIM HTTP server
- PDF/HTML export
- Search suggestions
- ZIM indexing capacity
- Support for MacOSX / Linux / Windows / Sugar
- DVD/USB launcher for Windows (autorun)
- See also
- (English) (Français) (Español) Official Web site
- (English) RSS/Atom Planet
- (English) Follow our last improvements...
- translatewiki:Translating:Kiwix for localisation
- Wikimedia endorsement (recent)
Software (currently only a GNU/Linux version) for generating a CD/DVD of selected wiki articles.
- Software: http://github.com/santhoshtr/wiki2cd
- Documentation: http://wiki.github.com/santhoshtr/wiki2cd/
This project aims at distributing the Wikimedia knowledge to people in developing countries: MoulinWiki page
The code base was merged with the Kiwix one. The next releases will be technically based on Kiwix and published and advertised under the MoulinWiki name.
Concepts for a dedicated "MediaWiki" client application with live update and patrolling tools
Long comments, but lots of ideas are presented here that could be useful for the long term of Wikimedia projects (and other sites built with MediaWiki).
Summary of the concepts
- build two applications: a proxy application that hosts a local installation of Apache, MySQL, PHP, and MediaWiki, plus a proxy service communicating with master MediaWiki servers using raw pages only, and an update scheduler that can be patrolled.
- an application that features a built-in HTML renderer (Gecko, WebKit?) and useful add-ons for local edits and rendering through this proxy application; it also contains support for the authorization key needed to patrol the downloaded pages in the local cache (which will be rendered in local browsers, or in the specific MediaWiki browsing application, using the HTML renderer built into the proxy application).
- put those two applications on the bootable CD/DVD, and allow the CD/DVD to host the application so it can also be run from any other supported client OS.
- allow these applications to be installed and used even if the local database is empty (the database archive could be downloaded separately, including through the specific MediaWiki application).
- markup/tags added to help build the CD/DVD/downloadable archive distribution (automatically excluding some non-cacheable content that will not be part of the distribution).
- new tags added to the MediaWiki syntax to mark content in categories or directly in some pages (using standard content rating), and to enforce national legal restrictions on some kinds of content.
- support for live updates (controlled by users themselves or by the supervisor of the central proxy application running on a LAN).
- support for a manager of the Wikimedia sites' policy (for sending page edits), built into the proxy.
- avoiding vandalism or other abuses (including copyright abuses or leakage) made "anonymously" from private company or school networks by students or workers.
- enforcing the legal national copyright policies (and restrictions about illegal contents that are legal in other areas, or contents not suitable for a company's work).
- better patrolling tools and cooperation in teams with a supervizor : allows relecture and correction by all local members of this team, and local supervision of updates.
- reducing the work load of the Wikimedia HTML renderer (in MediaWiki and PHP) and of existing slave caches, by allowing to do this task directly in the local proxy application (that just communicate with MediaWiki servers with raw pages and raw history queries): all this can be done locally.
- permitting collaboration in private working teams (intermediate work and edits invisible online, until they are validated by team supervizors and submitted by the proxy application).
- permitting the creation of serious relecture teams like scientific relecture groups, research centers, academies (that will patrol and select the best stable versions they want in their local distribution, and possibly submitting their selection online through the existing online patrolling tools and article edit tools).
- most Bots working now offline (and much faster) on their own local database through the cache of the local proxy, with supervizion before final submission by the proxy (enforcing the edit policy for the team or bot account).
- no more need for these bots to retreive and install a full copy of the database (updated automatically by the proxy application).
- no more need for team-specific or user-specific categories or pages to monitor their projects (patrolling and content supervizion made locally and offline).
- possibility offered now by the proxy for local-only discussion pages in teams (no need to send them in the online repository): less load for the common servers: they can use local-specific private namespaces for their local user pages, local projects and local discussions.
- teams can be formed also online (not just in LANs : just an URL to configure for pointing the team cooperation proxy), but can still contribute indirectly to the common Wiki projects: this is full decentralization of powers, less work for Wikimedia admins and patrollers, for teams that are corerctly managed themselves and easily contactable: the team supervizors read held responsible for all works made by their team members and submitted by their common proxy ; and teams can use their own prefered MediaWiki server-side extensions or client-side Gadgets for this task, also hosted by the proxy in their private namespaces.
Initial long post follows (still needs revisions and discussion)
It would be great if the CD/DVD also came with a dedicated browser application that fetches articles from the local database, but can also check whether updates are available online and offer to download the updated version into a local cache on the hard disk. Then, when you visit a page, you would first be offered the validated version (on the CD/DVD, or marked as patrolled in the online database), plus a link to see the live version.
When an article does not exist on the CD/DVD (including discussion pages or User and Talk pages), the page would be retrieved online (caching user pages or talk pages locally could be left as an option, off by default).
Pages that are in cachable namespaces (i.e. the namespaces configured on the CD/DVD or downloadable package) but that should not be in the distribution (such as project pages, and pages constantly updated to reflect current online activity) should also be markable with a dedicated MediaWiki magic keyword like __NOCACHE__ (which would insert the page into a special hidden category). This would help build the CD/DVD/downloadable distribution by excluding those pages (or all pages in categories marked similarly with __NOCACHECAT__, without having to mark every listed page or subcategory individually with __NOCACHE__). This would require a minor addition to MediaWiki, very similar to what has already been done for __HIDDENCAT__.
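To make the idea concrete, here is a minimal sketch (with hypothetical names; __NOCACHE__ and __NOCACHECAT__ are the proposed keywords, not existing MediaWiki magic words) of how a distribution builder could honour them: pages carrying __NOCACHE__, pages in a category whose description page carries __NOCACHECAT__, and the marked category pages themselves are all left out of the offline archive.

```python
def excluded_categories(pages):
    """Collect names of categories whose description page is marked."""
    return {
        title[len("Category:"):]
        for title, text in pages.items()
        if title.startswith("Category:") and "__NOCACHECAT__" in text
    }

def select_for_distribution(pages, categories_of):
    """pages: title -> wikitext; categories_of(title) -> set of category names."""
    banned = excluded_categories(pages)
    return [
        title for title, text in pages.items()
        if "__NOCACHE__" not in text
        and "__NOCACHECAT__" not in text
        and not (categories_of(title) & banned)
    ]
```

The category-level keyword avoids tagging every page individually, exactly as __HIDDENCAT__ already works for hiding categories.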
The dedicated browser application could be stored on the CD-ROM/DVD-ROM (or USB flash key), and could also be used separately, either directly from that medium or from an installable package. It would be built using a free HTML renderer (most probably the Gecko engine from Mozilla), or could be configured (when installed on PCs) to use the HTML renderer of the underlying OS (so why not the IE engine, or Opera, or WebKit as used by Safari and Google Chrome).
This application would be ported to Windows, Linux, and Mac OS X (at least), and probably to Mac OS 9 as well.
It would manage the wiki cache in a local MySQL installation, and would render the pages itself (using a local PHP installation) instead of caching only the pages rendered by the online website (this would save server resources).
It would also allow users to configure which namespaces to cache (notably whether they want to cache locally the image thumbnails rendered by the online server, and whether they want to cache the full image media files as well, in which case the local thumbnail cache would no longer be necessary and could be purged automatically).
The application could also automatically prefetch into its cache one set of pages directly referenced by the visited page. It would contain a download queue with priorities, and a policy manager to avoid abuse; for example, the automatic downloader would prefetch at most one queued page every 30 seconds. Users could manage this queue, for instance by inserting a list of pages (such as a category page) into a personal job to complete, competing with the default job of the automatic prefetcher (users could manage the priorities/shares of activity between the default auto-prefetcher and their personal download or upload jobs).
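The queue just described can be sketched in a few lines; this is an illustrative model (the class and the 30-second figure come from the paragraph above, not from any real Kiwix or MediaWiki code): user jobs with a lower priority number are served before the background auto-prefetcher, and at most one page is fetched per interval.

```python
import heapq

class PrefetchQueue:
    """Rate-limited priority download queue (hypothetical sketch)."""

    def __init__(self, fetch, interval=30.0):
        self.fetch = fetch          # callable: title -> raw page text
        self.interval = interval    # minimum seconds between two fetches
        self.heap = []              # entries: (priority, insertion order, title)
        self.seq = 0
        self.last_fetch = float("-inf")

    def enqueue(self, title, priority=10):
        # priority 10 = default auto-prefetcher; user jobs can use e.g. 1
        heapq.heappush(self.heap, (priority, self.seq, title))
        self.seq += 1

    def step(self, now):
        """Fetch the highest-priority pending page if the rate limit allows."""
        if not self.heap or now - self.last_fetch < self.interval:
            return None
        _, _, title = heapq.heappop(self.heap)
        self.last_fetch = now
        return title, self.fetch(title)
```

The insertion counter keeps ordering stable among entries of equal priority, so the auto-prefetcher's own queue stays first-in, first-out.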
It would also contain additional tools, notably an easier article editor (instead of the online editor) or an external notepad to edit the wikicode, and new ways to explore the online and offline histories. It would integrate some patrolling tools. It would also work even if there is no local installation of the database (like the one built on the CD-ROM/DVD-ROM).
Finally it would contain an update check tool that allows updating the local installation of PHP, MySQL, Apache web server, and of the MediaWiki web service.
This application could let users make all their edits offline and test all pages; a submission tool would then send the edits online, marking them accordingly. This would be permitted provided that the set of pages sent in one operation is limited to 50 pages (and it would use the policy for users without the bot flag, i.e. one update per minute), unless the application is configured to send updates through a bot-approved account (in which case it would use the policy set up for bots on the associated wiki project).
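A sketch of that submission policy, with the figures from the paragraph wired in as assumptions (the 50-page cap and the one-minute spacing are the proposal's numbers, not actual MediaWiki limits, and the bot interval is invented for illustration):

```python
def plan_submission(pending_pages, is_bot=False, max_batch=50):
    """Return (batch, delays) where delays[i] is the number of seconds to
    wait before sending batch[i], enforcing the per-account rate policy."""
    batch = pending_pages[:max_batch]
    interval = 5 if is_bot else 60   # assumed policy intervals (illustrative)
    delays = [i * interval for i in range(len(batch))]
    return batch, delays
```

Pages beyond the cap would simply stay in the local queue for the next batch.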
This application would save lots of work currently made by the server: the application would not download the rendered HTML pages, but directly the raw pages in wiki format, that the application would render locally using its own local PHP engine and SQL engine for its builtin MediaWiki server installation.
Because the application would contain its local database and a local offline cache of the online database, it will still work and render the pages, even if the Wiki site has problems (like HTTP errors 40x, 50x, or Internet connectivity problems, or no Internet connection at all).
Globally, even online navigation on Wikimedia sites (using the application with no local database, just the local cache) would be much faster through this application, which would then become the preferred way to work with Wikimedia projects, instead of using only a web browser.
Users could also manage the content of the local pages by patrolling them for their own offline use. For example, a teacher could visit some wiki pages (which would be cached automatically) and then decide which version in the history to keep as patrolled. Once this is done, the Internet connection can go offline, and students can browse only those patrolled pages stored in the local cache.
The application would also serve as a standard web server accessible from a LAN, instead of just by the local application itself with its builtin browser renderer (acting as a proxy when it will download and cache the raw pages that it currently does not have locally), so that other computers on the LAN would be pointing to the private LAN host, using either a standard browser or the same application, instead of pointing to the online web service.
With this facility, collaboration in teams would be much easier: they would still be able to edit pages together, and those edits would be stored in the local cache of the central LAN application holding the cache. The central application could patrol those edits and then mark the ones to send online using the features (and policies) described above. This is the perfect way to allow schools to work collaboratively on articles under the supervision of teachers!
This would mean less vandalism seen online (the LAN in schools could be configured so that the online Wikimedia websites are NOT directly accessible without going through the central LAN proxy application, which hosts the local cache of downloaded pages and the cache of local edits that are not sent without central supervision by the teacher or school admins).
This proxy would also help enforce copyright rules for pages edited on private school networks (because pages could not be sent to the online database without first being patrolled on the central LAN application by a central supervisor). Such a proxy could, for example, also force the addition of missing or incorrect copyright and license indications when local users create pages or upload media.
Of course this is not limited to schools; it could also be used on enterprise networks to avoid leaks of private information in edits made by workers. With it, many companies (and also Chinese ISPs restricted by Chinese law) would finally be able to fully reopen the link (through this patrolling proxy) to all Wikimedia sites, without having to ban them completely through severe and blunt firewall or proxy restrictions.
The best schools, correctly supervised and patrolled, could then gain a new flag for their own "Supervisor" user account (similar to the existing bot flag, something like a "Team Supervisor" flag for this special account) if they have many contributing students: their update queue job could send pages faster than isolated users (who are limited to one update sent to the servers per minute), because everything they send will have been patrolled locally by their central supervisor.
It would also become possible for every Wikipedian and MediaWiki admin to talk directly to the supervisor and report abuses in the team they are supposed to supervise and patrol. This LAN proxying application would also save local LAN resources and Internet link bandwidth (because of its built-in cache).
If the caching MediaWiki proxy application is correctly set up, it would not even be necessary to download a local copy of the database to start interacting this way with MediaWiki sites: it could start with a preconfigured empty local cache, with caching enabled, and the cache would update itself nearly automatically (without causing more traffic to the online web service than many users connecting directly to the Internet with their own local browsers).
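The fetch-on-miss behaviour described here reduces to a very small core; in this sketch `fetch_remote` is a stand-in for a raw-wikitext request to the central MediaWiki servers (a hypothetical hook, not an actual API call):

```python
class CachingProxy:
    """Starts with an empty local store and fills it as pages are visited,
    so no pre-downloaded database is required (illustrative sketch)."""

    def __init__(self, fetch_remote):
        self.fetch_remote = fetch_remote
        self.cache = {}        # title -> raw wikitext
        self.remote_hits = 0   # only cache misses generate online traffic

    def get(self, title):
        if title not in self.cache:
            self.cache[title] = self.fetch_remote(title)
            self.remote_hits += 1
        return self.cache[title]
```

Every repeat visit is served locally, which is why total server traffic stays no higher than direct browsing.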
This application could also be usable to work as well on private-only MediaWiki databases (without any interaction as a slave cache with a master online database like Wikimedia services). It would work as well for those external websites that are replicating the content of the Wikimedia servers, but are currently generating too much traffic on the Wikimedia servers, just to update themselves.
Note: The patrolling features built into the proxy would also be usable for parental control (for content in pages or categories that parents consider unsuitable for their children), but I think MediaWiki would do better to adopt support for existing content and audience rating schemes (about sex, violence, drugs, gaming, live discussions...) by generating the appropriate standard HTML header meta-tags for pages (or categories) containing some wiki markup.
Such markup could also be used to mark the pages or categories considered illegal in some countries; this would please China (however, the use of such "politically sensitive" markup should be made under the supervision of the online community, to avoid someone marking too many pages unrelated to the genuinely illegal content) and would lighten the work that supervisors need to do (if there are too many pages to patrol). It would facilitate the use of Wikimedia sites in companies as well (because it would become possible to reopen the link to specific subjects or categories that are not marked with certain content-rating tags, not just those listed in categories related and useful to the company's work).
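An illustrative sketch of the meta-tag generation step (the "content-rating" scheme name is invented for this example, not an existing standard mapping): rating labels carried by a page's wiki markup or categories become HTML `<meta>` headers that filtering proxies or parental-control software could act on.

```python
import html

def rating_meta_tags(labels):
    """labels: e.g. {"violence": "high"} -> list of <meta> tag strings."""
    return [
        '<meta name="content-rating:%s" content="%s">'
        % (html.escape(key), html.escape(value))
        for key, value in sorted(labels.items())
    ]
```

Because the tags sit in the rendered HTML head, any downstream filter can honour them without understanding wikitext at all.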
What do you think about this great concept? verdy_p 10:20, 16 November 2008 (UTC)
Comments about these concepts
You've made some excellent points here. The only thing is that this page is quite dormant of late. If you want to re-initialise discussion of this topic, perhaps try Wikimedia Forum. — This unsigned post was made by Anonymous Dissident 12:59, 16 November 2008 (UTC)
- The page is not supposed to be dormant; it is linked from the side-bar on the left (the link is named "DVDs" in the frame titled "beyond the web"). With those concepts implemented, creating CD/DVD distributions would become much easier (with fewer issues, notably those related to locally illegal or unsuitable content, or to copyright). verdy_p 12:15, 16 November 2008 (UTC)
- Note: Instead of restarting the discussion, I've posted short pointers to this page in the Wikimedia Forum, and in some talk pages (on Meta and on the English Wikipedia) where some projects for a Wikipedia 1.0 distribution are underway (but difficult to complete for now). verdy_p 12:30, 16 November 2008 (UTC)
Why did I speak about the possibility of using the supervision tools for both downloading (visiting) and uploading (editing) content on MediaWiki projects? Wikipedia and similar projects are criticized a lot for the lack of policy and legal enforcement (and for the difficulty of applying a single policy that is usable worldwide in all places).
This has caused Wikipedia to be banned completely in some countries or organizations, instead of just blocking the content that national regulators object to. Instead of banning these sites completely, they could have their own user account (flagged as supervisor) with which they could add markup to pages or categories that they don't want, or that are proven to be illegal in their countries, forbidden at work, or unsuitable for children under parental control (such content rating is now mandatory even in free countries). This is a way to keep Wikipedia sites open to everybody for MOST of the content, which does not cause problems, and it would stop the unproductive discussions about what is suitable or legal in some places but not in others.
This means that we would be pleased to see again the many millions (or billion?) of Chinese contributors, and contributors from Islamic countries, or even those currently living in dictatorships with severe Internet restrictions (Tunisia, Belarus, Burma, China, Iran... and even some US governmental sites) that have already cut the link completely because they really can't manage the huge flow of content available there.
- For this to work reliably, I propose that user accounts with the "Team Supervisor" flag must be authenticated (no anonymity, proof of identity), and must also have a reachable and responsive contact address (in case of problems, just as with bots). Since their submissions are authenticated, Wikipedia or other sites could not be accused of violating a legal rule for content that the supervisors have approved themselves through the supervision account used to submit it via their proxy. The MediaWiki history would be proof of approval and patrolling by these team supervisors (or national Internet regulators) and could be flagged as such: it would also contain the existing patrolling flags they are allowed to use, and these would be searchable specifically in the history.
- If their applicable policy changes, they should contact the wiki project admins, speak with the community, and explain why they want some content removed or no longer approved: they should provide proof and references about why this change is necessary. It should not be acceptable for those supervisors to remove content directly; however, they may still add a tag to the pages without having to remove them.
- Such added content-rating or restriction tags/labels would be shown in the history and would be reversible, per community decision, if they do not match the proven applicable policy (this should prevent private companies such as ISPs from policing content themselves, based on unproven allegations of violation, instead of asking their national regulator or national content-rating bodies for their official advice about these allegations).
- These new features would also offer better security for users in those countries or areas, who would no longer have to fear being accused of ignoring a legal restriction (and thus risking jail, seizure of their computers, blocking or closure of their Internet access, financial fines, losing their jobs, career restrictions... or even worse, physical abuse by the police or military forces) just because they were visiting some untagged and unprotected page, or because they were visiting and contributing to legal pages on Wikipedia despite a total ban of the site.
- With these concepts implemented, Wikipedia would no longer need to be fully blocked at home by parental-control programs, or by organizations' filtering proxies and firewalls. This means more users online (in addition, it allows better content for the long term on all wikis, due to better classification, rating, and annotation by serious teams with their own supervision, with tools that can really help them classify suitable/legal content).
- Note that the CD/DVD/downloadable distribution of the database should also contain the (hidden) rating tags/labels. The distributed compressed database itself could be encrypted and decrypted automatically on the fly by the proxy application, and served to the user if the user profile matches these rating labels (a country could be selected immediately when installing the application, to enforce national restrictions). If this is still not enough, or if encryption is not an option (under the GFDL, but also because the protection would be insufficient for legal enforcement), specific selected distributions could be made for these countries, thanks to these labels/tags.
- There's nothing worse for the project than a complete local ban of Wikipedia (due to only a small part with problematic content) that cuts the link to all the many other useful contents. This means more freedom for users, rather than less. verdy_p 12:51, 16 November 2008 (UTC)
- Thanks for alerting us at the English 1.0 page. I don't understand all of the suggestions here, because I don't understand all of the technical aspects - I'm not very good at such things, I'm afraid! If the goal is to standardise on software between languages, I think things are already moving in that direction. Kiwix is French software used to deliver the English release, with the German article format. We also consulted closely with the Polish 1.0 team during their work. At our last IRC meeting for the English 0.7 release, we had people from fr, en and de participating. I think we will end up with some standard software packages that all languages can use, and already several languages are using the standard article assessment scheme.
The purpose is not to standardize things across language versions, because I know that would not work as expected, due to huge differences in administration levels across wikis. It's about tagging the pages (stored in the online database) for legal issues and adult content rating. With those tags (which would ideally be inherited automatically by articles belonging to a category marked with them), it would become much easier to generate the offline distributions for a given country or rated audience, because those articles could be excluded from the distribution directly. The same concept could be used to tag the versions of articles that have been reviewed for inclusion in an offline distribution (this way, articles can continue to evolve while the distribution remains stable). verdy_p 18:36, 21 November 2008 (UTC)
- Adding custom features to the software is certainly a good idea. Copyright issues are certainly a headache, and you're right: an offline release can handle such things better than a global website. I think we all aim to make our releases updatable, and our current approach to article selection on en is designed to facilitate that. But you seem to go well beyond simple updating; these ideas are certainly interesting. Have you discussed these ideas with Kelson or Pascal Martin? You might also discuss these ideas with User:CBM on en, who produced the 0.7 selection. Thanks, Walkerma 05:09, 19 November 2008 (UTC)
- I don't know them, and have not had any contact with them. But if you know them, you can give them a link to this page and ask them what they think about these concepts. I don't think these projects are complicated to implement. In fact, this will require several separate projects:
- updating MediaWiki to support the queuing/supervision extensions needed for proxies (not needed for the live version, except possibly for per-user and per-project queues in the long term). These will take the form of a few new Special pages for handling the outgoing queue, and for managing the hierarchy of external databases and the import/export policy used by the proxy or set according to the user preferences.
- making an installable version of MediaWiki with all the tools needed: a web server, PHP, renderers (images/thumbnails, and streaming media if needed for players embedded in pages), and MediaWiki itself (including its managed queue and local caches/databases).
- making a dedicated client (with add-on facilities for edits and patrolling/reviewing), based on standard browser components (IE, Gecko) plus some specific toolbars and local UI menus. This requires developers who know each target client platform (Windows, Linux, Mac OS X) and how to create a custom browser, plus the preference panels needed to configure and manage the hierarchy of proxies/servers and the logons (user profiles and passwords for each one, and the optional security credentials needed for them, notably for proxies that may require strong authentication to protect their local supervisor accounts). Note that dedicated clients for small mobile devices or accessibility devices could be useful as well, but they are less essential for the WMF.
- building separate projects for creating distributions (CD/DVD/downloads) of the database. This means a team with its own local users, and mappings from these users to online users on the upstream remote central databases.
- The produced distribution could optionally include the last two software packages (on the CD/DVD or on the team's download site). For this separate project, teams may receive help from the makers of the two software suites. Ideally, the WMF should be able to create and manage the distribution of the software and of snapshots of its own databases itself (as it already does). A small installable package containing the two applications could be available and installable by anyone (such a minimal package would probably not contain the extensions needed only by advanced users, such as Python for scripts/bots).
- I've seen Google looking into starting such a free project in its experimental labs, but other large content providers would also be helpful (think of Yahoo, or ISPs and other advertising portals). The good question will then be: which third-party provider will make the best installable version of the two packages? And how can we make them collaborate to ensure interoperability of their clients/proxies... verdy_p 18:44, 19 January 2009 (UTC)
The purpose of a dedicated MediaWiki client running on PCs is to facilitate the editing of articles, while allowing the online servers to do much less work:
- the dedicated MediaWiki proxy software (installed on PCs and running from the offline distribution, containing the MediaWiki software in PHP, an Apache server, the proxying extension, and the supervision extension) will do the rendering itself, and will allow changes to be monitored and stored locally before being sent online once they have been supervised. The distribution will no longer need to contain rendered HTML, but can directly contain the raw articles in wikicode format. It will still be possible to use this software without downloading and installing a large local database: the software will manage its own local database for the pending changes and for the cached raw pages downloaded automatically by the proxy when the user visits a page.
- the dedicated MediaWiki client software will connect by default through the installed proxy. Most users will still be able to edit articles and save them quickly. These pages will be stored in the local database and added to a list of pages to update online. Once the user (or the supervising user managing the proxy) has tested and validated the pages, they can be sent online automatically by the proxy, which will contain its own job queue and a local history log to manage errors and edit conflicts (because these articles could have been updated online). It's up to the proxy's update job to make sure that the page to send has not had another version submitted online (when the proxy's built-in submitter sends a page, it will send the hidden version id of the page, or the fact that it is supposed to create a new page). The client can still check this status and let the user manage the conflict, exactly as the online server already does.
- With these two applications working together, the server will see far fewer edits, and users will be able to save pages immediately without having to preview them, because they will be stored only locally until the user reviews their pending edits and commits them online.
- The MediaWiki client would then work very much like a CVS/SVN client managing edits and commits through the MediaWiki protocol implemented in the proxy. (An optional shell-integrated GUI similar to TortoiseCVS/TortoiseSVN would also be possible, allowing users to create pages locally, using various edit tools on local files, without having to use dangerous bots that are also difficult to manage. In fact, this interface would deprecate many existing bots.)
- The central online MediaWiki servers will not necessarily have to be modified to support user-specific databases for pending changes, but this is theoretically possible as well, if the servers can support and store those user-specific edits. The idea is that the central online servers would not only work mostly with raw pages (without rendering them, saving MUCH work in the existing online proxies, and much bandwidth and CPU for HTML generation), but would also not have to deal with the many intermediate edits made by users. With the dedicated client, previewing pages before saving would become unnecessary: users would see the effect of their edits directly in their dedicated MediaWiki client when saving pages. When a user sees that all is OK, he can look at his local queue of pending changes (just as with CVS/SVN clients) and decide to commit them (or discard them completely and revert his local edits to the copy cached by the local proxy): they would then be committed indirectly but automatically through the local proxy.
- the local proxy's update job will also check for possible edit conflicts: the local proxy will not lose the local edits even in case of an edit conflict, but it will change the status of the page in the local update queue, so that the local client can see which queued pages have conflicts to be resolved by the local user of the local proxy.
- the local proxy can also be used to hide locally all the local accounts performing edits through it, and send their updates under another online account (for example, a single online account for all submissions made by the participants of a working team, such as a review team, or a team preparing a CD/DVD distribution): users can work on the proxy under their own local accounts, with stronger security for authenticating them (this improves privacy for users participating in a local team with strong security). When a local user connects to a MediaWiki site using the dedicated client, he is in fact just connected to the local proxy, not directly online. However, for the simple CD/DVD/download distribution format, the local user account and the online user account would be automatically configured to be the same, and the role of the proxy would be preconfigured, by default, to submit updates automatically without supervision (supervision for teams would have to be configured specifically, notably in a configuration panel in the dedicated client specifying the URL of the local proxy, which has full Internet access and is managed by a supervisor user, the only one who can review the pending edits made by local users and mark them as transmissible online).
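The conflict check in the proxy's update job is classic optimistic concurrency, which can be sketched like this (all names are illustrative, not actual MediaWiki API fields): each queued local edit remembers the hidden revision id it was based on, and on commit the proxy compares it with the current online revision, flagging a conflict instead of losing either version.

```python
def try_commit(entry, current_online_rev):
    """entry: dict with 'title', 'base_rev', 'text'.
    Returns ('ok', text) when the online page is unchanged since base_rev,
    or ('conflict', current_online_rev) so the client can merge by hand."""
    if entry["base_rev"] == current_online_rev:
        return ("ok", entry["text"])
    return ("conflict", current_online_rev)
```

A conflicting entry would simply stay in the queue with its status changed, exactly as described above, until the local user resolves it.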
I don't think this will require a lot of work to implement!
- In fact, to write the dedicated MediaWiki proxy, most of the software is already written (MediaWiki itself, in PHP; the PHP engine; the MySQL client; and Apache for serving the generated HTML pages locally and running MediaWiki's PHP code). The only things to write (in PHP, as an extension to MediaWiki itself) are the management of a submission job queue for local users, a manager for local users, possibly a security manager for authenticating the authorized local users, the management of multiple local databases (one cache for online pages, one for pending updates made by local users), and a resolver for conflicts between local edits and the central online database.
- For implementing the local MediaWiki client, most of the work is already available as well: we have embeddable Gecko renderers for rendering the HTML generated by the local proxy from its local cache or local database. It would basically resemble an existing web browser, and it could also take the form of a simple standard extension for Firefox or other browsers, if there is no need for more specific tools. (However, this extension must still be able to manage the local user's queue of pending edits, the list of pending commits awaiting approval by a local supervisor, and the list of failed commits due to edit conflicts with the online database: the proxy would have to download the central remote version and cache it locally, to allow local users to see which conflicts occurred and how to resolve them.)
verdy_p 18:36, 21 November 2008 (UTC)
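To make the proposal above more concrete, here is a minimal sketch (in Python, purely illustrative: all names such as `PendingEdit` and `EditQueue` are hypothetical, not MediaWiki APIs) of the proxy's outgoing-edit queue, with the shared online account, optional supervision, and conflict detection against the central revision:

```python
# Hypothetical sketch of the proxy-side submission queue described above.
# Nothing here is a real MediaWiki interface; it only models the workflow.
from dataclasses import dataclass


@dataclass
class PendingEdit:
    page: str
    local_user: str     # account known only to the proxy, hidden online
    text: str
    base_revision: int  # central revision the edit was made against
    approved: bool = False


class EditQueue:
    def __init__(self, supervised: bool = True):
        self.supervised = supervised
        self.edits: list[PendingEdit] = []

    def submit_local(self, edit: PendingEdit) -> None:
        """A local user queues an edit; nothing goes online yet."""
        # An unsupervised proxy (the simple CD/DVD/download case)
        # auto-approves, mimicking direct submission.
        edit.approved = not self.supervised
        self.edits.append(edit)

    def approve(self, page: str) -> None:
        """Called only by the supervisor to mark a page transmissible."""
        for e in self.edits:
            if e.page == page:
                e.approved = True

    def flush(self, central_revision_of) -> list[PendingEdit]:
        """Send approved edits online under the single shared account.
        Edits whose base revision is stale stay queued as conflicts."""
        sent, kept = [], []
        for e in self.edits:
            if e.approved and central_revision_of(e.page) == e.base_revision:
                sent.append(e)
            else:
                kept.append(e)
        self.edits = kept
        return sent
```

In this model, a conflicting edit simply remains in the queue after `flush()`, which matches the idea that the proxy keeps the local copy and lets users resolve the conflict against the cached central version.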
OK, I still don't understand the technical aspects, but I checked with someone who is familiar with both WP1.0 and the technical side, and this is certainly of interest. One part I do understand is the idea of using tags for categorization - I presume that you're referring to extending the parameters in the WikiProject talk page tags, which seems a very good idea. Some categorization is already being done, e.g., to categorize things by country, but the "adult only" or "child-friendly" parameter sounds excellent. So now we need to decide where to go with this. Bear in mind that some of the active 1.0 people on en, including myself, are tied up with Version 0.7 right now, so you may not get much action there until next month.
- What should be done first? Can you propose a series of specific tasks, and the order in which they need to be done?
- There's nothing to do first. Most of these ideas can be worked on and implemented separately. These are separate tasks:
- The dedicated server/caching proxy (featuring a standard web server, the standard PHP engine, the standard MediaWiki software, and a local cache). Most of this is already written, with the exception of:
- the project supervisor status for users (in managed projects, there may be as many participant users as the team wants, but some users would have the privilege to mark updated pages in the outgoing queue as ready to be sent to the online project).
- the managed submission queues, handled per user or globally per locally managed wiki project: the queue allows supervision either by all local users (if no dedicated supervisor is needed in the team, i.e. every local user has supervisor status) or only by local users with supervisor status. This status would allow submitting the reviewed and approved modifications using specific tools. The same mechanism could be used on the main online MediaWiki projects as well, to mark stable pages that are ready for archiving.
- The dedicated client (featuring a standard browser component with editing add-ons, which connects to the local proxy). Most of this is already written (except the submission queue management, which could at first be implemented only in the server pages).
- On the main server (in the MediaWiki software, which can also become part of the local server/proxy), implementing only the submission queue and local user profiles: these manage the local members of a revision team, or the participants of a project team that contributes either under its own name on MediaWiki projects, or by mapping local members to online MediaWiki users (with their unified login).
- Nothing really new is needed in terms of tags (in MediaWiki project pages), but some revision teams could possibly use their own tags in pages, which could be queued and sent online like other regular modifications that have been reviewed or patrolled. Everyone can become their own supervisor. verdy_p 18:53, 19 January 2009 (UTC)
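The supervisor-status task in the list above could be modelled as follows (a Python sketch under assumed names: `LocalUser`, `ManagedProject` and the unified-login mapping are illustrative, not part of MediaWiki):

```python
# Hypothetical model of local user profiles and supervisor privileges
# for a managed project running behind the proxy.
class LocalUser:
    def __init__(self, name, online_account=None, supervisor=False):
        self.name = name
        # Unified-login mapping: default to the same name online.
        self.online_account = online_account or name
        self.supervisor = supervisor


class ManagedProject:
    def __init__(self):
        self.users = {}
        self.ready_for_submission = set()  # pages marked transmissible

    def add_user(self, user):
        self.users[user.name] = user

    def mark_ready(self, actor_name, page):
        """Only supervisors may mark pages ready for the outgoing queue."""
        actor = self.users[actor_name]
        if not actor.supervisor:
            raise PermissionError("only supervisors may mark pages")
        self.ready_for_submission.add(page)
```

A solo editor is simply a project whose single user has `supervisor=True`, matching the remark that everyone can become their own supervisor.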
- Where should the work be done? Even if this is a meta project, much of the work would only get done in the individual 1.0 projects, such as en, fr and de. At en WP:1.0, we work by using sub-projects (listed on the main page), and I would recommend that you start the appropriate sub-project, and coordinate it there yourself; there are plenty of people there who could help, as long as they can be persuaded that it's worthwhile.
- The main aspect of these ideas is that most of the editing/supervision work, and even the rendering, can be done locally by each user on their own PC. This will save the MediaWiki servers a lot of the bandwidth consumed by edits and rendering, because the main servers will communicate only raw pages with the user's or team's local caching proxy. In addition, the local caching proxy can fetch pages directly from its local cache (or a locally installed database, coming from a CD/DVD for example) instead of the central database. This creates a hierarchy of databases, the central one containing the last version that has been patrolled or reviewed by users or team supervisors. The local client will look for a page first in the local personal database, then in the locally installed database, then on the central server. When fetching a raw page from the central server, it will automatically update the locally installed database (if the central page is newer) and the personal database. The personal database thus effectively becomes a local cache for most retrieved pages, but it will not propagate local edits immediately: only when the user (or the team supervisors managing the team's proxy) approves these edits are they forwarded to the next lower level of server. The dedicated client could have an option to list the preferred databases to query, and should be able to autodetect pages that exist in several versions: the most local copy of a page should be updated automatically on retrieval if it has not been edited locally (no update remaining in the outgoing queue for this page); if there have been local edits, an extra header displayed at the top of the page would let users select which version to view.
- The same system can be used on the central server as well (navigated online from a classic browser, without any dedicated client) to select the page to see: the patrolled/stable one or the last edited one. Note that patrolled pages and last edited pages are already managed on the server by the stored history. In the complete system, however, this concept extends the history into a hierarchy, with local branches created and stored only by the local proxy (and not necessarily on the server itself, so there's no immediate need to support branches in the version history, unless MediaWiki is extended to let all private edits made by users stay in user-specific branches until they validate them using their patrolling/reviewing tools; but in my opinion those local branches should not go to the server and should remain private to each user or team, who manage their outgoing edit queue themselves, on their own PC/storage or on a local proxy server with supervision).
- For schools, this would be really great: students can connect locally to the school's installed proxy, but have no direct access to the online project (the proxy will manage all communications with the server).
- For organizations, the proxy would be an efficient filter allowing prior approval of the submitted content (so there would be no more leakage of private company information, since the data forwarded by the proxy is assumed to have been approved by the local supervisor managing the company's proxy).
- For users making lots of edits (or sending massively generated data) with bots or scripts, all edits will first go through the local proxy (adding extra security for the online project): complex modifications can be tested, and a complete set of related edits can be sent more coherently, without leaving various parts broken for a long period. There will be far fewer edit conflicts for these large projects, and more coherence (and fewer errors produced by massive imports due to unrelated edit conflicts caused by other users). The proxy will autodetect queued edits that could not be submitted, but will still keep its local copy of the edit that was to be submitted, separately from the version used on the central server.
- For producing CD/DVDs: nothing is needed on the server; instead, a team is formed that manages its own local proxy/cache for storing and preparing the CD/DVD. This copy can be tweaked and, after review, can be submitted back to the central server. This means it becomes possible to prepare a CD/DVD much more easily, from the content of the locally edited cache instead of the live central database. In other words, a CD/DVD version (or downloadable version) is just a local branch separate from the central history. This is easier to manage (and in addition, no work done locally is lost for the next version, because you can still patrol the local edits made for the current CD/DVD/download and queue them out for submission back to the central "live" database). The total volume of edits stored in branches would then be limited to just what is immediately needed to ensure the coherence and self-containment of the prepared version for the CD/DVD/download.
- Local proxies need not necessarily be online: teams may choose to create their local proxy either on open public websites or on their own private servers. The local proxy can also optionally filter the live content (caching only locally stored pages rather than the online live versions).
- With such a system, the content of live MediaWiki projects would no longer be completely blocked by dictatorships: they could deploy nationally their own local caching proxy, to filter out the pages they don't want seen, without having to ban the Wikimedia servers from the web entirely (which blocks all content, including content that would be highly beneficial to them, their citizens, and their schools). This means that China could reopen the link (indirectly) to the MediaWiki servers on the web. For countries with more restrictive copyright rules, the same thing could be deployed to block pages that are considered illegal in that country. There would be no more need, on the central "live" server, to filter out content just because there's a legal problem in a specific country. The central "live" servers could be much more permissive (which also means fewer legal risks for Wikimedia: no more legal actions or "cease and desist" demands, except possibly in the US, where the servers are located and open to anyone; but even in the US, there could be a local proxy separate from the central "live" content serving international users, so that content which is legal in Europe but illegal in the US could still be stored centrally, even if filtered out on the US proxy).
- verdy_p 18:53, 19 January 2009 (UTC)
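The three-level lookup described in the points above (personal cache, then a locally installed database such as a DVD copy, then the central server, with write-back into the caches) might be sketched like this (Python, illustrative only: `PageStore` and `fetch` are assumed names, and the "newer revision" check is simplified away):

```python
# Minimal sketch of the hierarchical page lookup proposed above.
# Each level maps title -> (revision, text); none of this is MediaWiki code.
class PageStore:
    def __init__(self, pages=None):
        self.pages = dict(pages or {})

    def get(self, title):
        return self.pages.get(title)

    def put(self, title, rev, text):
        self.pages[title] = (rev, text)


def fetch(title, personal, installed, central):
    """Return (revision, text), consulting the most local store first and
    refreshing the outer caches when falling back to the central copy."""
    for store in (personal, installed):
        hit = store.get(title)
        if hit is not None:
            return hit
    rev, text = central.get(title)   # assume the central store has the page
    installed.put(title, rev, text)  # refresh the DVD-level cache copy
    personal.put(title, rev, text)   # and the personal cache
    return rev, text
```

A fuller version would also compare revisions so that a newer central page overwrites stale cached copies, and would skip the write-back for pages with local edits still in the outgoing queue, as the proposal describes.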
- Can we move this proposal and discussion to the discussion page? Thanks, Walkerma 05:10, 2 December 2008 (UTC)
- Which discussion page? Isn't it this page? My concepts are in fact very simple at their root. The discussion is long because these two basic concepts (a caching proxy and a dedicated client) can be of very significant interest, including for the WMF. My proposals offer interesting economies in the long term, and allow more collaboration with third parties without breaking the rule about free content with free licences promoted by the WMF: they allow building an external hierarchy of servers/clients supporting most of the work currently supported (at considerable cost) only by the WMF. Notably, if the WMF servers spent most of their bandwidth only on delivering raw pages, and were not needed for ALL edits made on the central database, we could get much higher throughput, faster responses, and faster consolidation, in addition to increased cooperation between users, teams, and free-content providers. It also allows long-term preservation of the project (by avoiding the international legal risks linked to very complex copyright issues): instead of trying to solve everything on a single site, most things can be delegated. verdy_p 18:53, 19 January 2009 (UTC)
- Have you made any progress? Please keep us updated on what you've achieved, over at en:Wikipedia_talk:Version_1.0_Editorial_Team. If you need to recruit help, that is also a good place to set up a subproject, something like en:Wikipedia:Version_1.0_Editorial_Team/ClientApplication. Please let us know how you're getting on. Thanks, Walkerma 07:57, 9 March 2009 (UTC)
- mw:Manual:Using content from Wikipedia
- de:Wikipedia:Wikipedia-CD (old)
- fr:Projet:Wikipédia Junior
- Wikimedia and Mandriva
- en:Wikipedia talk:Version 1.0 Editorial Team
- Static version tools – an effort to collect tools to assemble a CD
- en:Wikipedia talk:Pushing to 1.0
- Making offline copy from Ultimate Wiktionary and dicologos