NEH Reference materials grant application/Narrative
- 1 Introduction
- 2 Significance
- 3 History, Scope, and Duration
- 4 Methodology and Standards
- 5 Work Plan
- 6 Staff
- 7 Dissemination
On January 15, 2001, Wikipedia (www.wikipedia.org) was started by Jimmy Wales and Larry Sanger as a scratchpad for Nupedia, an online encyclopedia founded by Wales in March 2000. Articles submitted to Nupedia had to be peer reviewed by a cadre of experts, and authors had to be "true experts in their fields [...] and possess Ph.D.s." Wikipedia, it was hoped, would allow anyone to start working on articles which could then be peer reviewed and published on Nupedia.
Birth of the Wikimedia model
In the end, only a few dozen finished articles were created through the complex Nupedia process, while Wikipedia quickly acquired a community of enthusiasts and passed 1,000 articles after its first month in existence. It was clear that the Wikipedia project would become the dominant one, and Nupedia was soon abandoned.
In response to strong user demand, Wikipedias were soon set up in languages other than English. Today, three-and-a-half years later, Wikipedia is at the center of an international community of learners devoted to the creation of a high quality online encyclopedia and other reference materials that people anywhere can access for free via the Internet. Wikipedia exists in over 50 language editions with more than 800,000 articles in total, ranging from very large editions, such as German and Japanese, to emerging ones, such as Cherokee and Cornish.
The project's success is evident from its widespread use, its international community of users, and the volume and quality of articles that have emerged using the Wikimedia model. In the English-language Wikipedia alone, there are over 300,000 articles covering virtually every subject imaginable, from art history to software engineering, from theology to astrophysics.
An article often starts as a stub, only a few words long, and is developed into an in-depth treatment over time. Every week, the English Wikipedia community picks an article which is particularly lacking in depth and focuses on improving it. Within the week, many articles have grown to twenty times their initial size through this effort, and some are now among the finest articles in the encyclopedia.
The Wikimedia Foundation
The Wikimedia Foundation, Inc., a non-profit organization registered in the state of Florida, was founded on June 20, 2003 to facilitate and coordinate operations of Wikipedia and its sister projects. These other projects include Wiktionary (a dictionary), Wikibooks (a collection of free and community-written textbooks) and Wikisource (a collection of annotated and translated public-domain source materials).
The aim of the Wikimedia Foundation is to spread knowledge by creating free, high quality learning resources, and to build a community devoted to this purpose. The Foundation believes that anyone can be a teacher by sharing their knowledge with others, and as such, encourages everyone to "edit" Wikimedia pages. Regardless of the scope of their participation, each author enhances the quality of the material for the next generation of users.
The community has grown at an exponential rate since the founding of the first project, the English-language Wikipedia. Some 30,000 people worldwide have contributed to this effort in any number of ways, and in so doing, created a tightly knit community that transcends national, cultural, ideological, political, and religious boundaries. These divisions, too often over-emphasized in daily life, are superseded by the sense of commitment to a common goal: making knowledge free and accessible to everyone.
Recent growth surge
To date, participants have been able to keep up with the accelerating growth of the various Wikipedias and ancillary projects through extensive parallel volunteer efforts and sporadic fundraising drives. Until recently, the project's only real bottleneck has been the rate of new contributions. However, hardware and maintenance costs, and now the initial costs of infrastructure for stable growth and for cooperating with other like-minded ventures, are a pressing concern and will require reliable financial support. While more and more people have shown themselves willing to donate time and knowledge to support these projects, Wikimedia needs additional servers, off-line organizational components, and one or two paid staff to handle responsibilities which are not easily distributed among the pool of volunteers.
Current projections suggest it will cost over $500,000 for Wikimedia's technical and other operations to smoothly keep pace with its increased use over the next two years, and to develop the infrastructure to effectively sustain these efforts in the future. We expect to be able to raise over $200,000 in community donations over the same time period, and are requesting $300,000 to help take Wikipedia and the other Wikimedia Foundation projects to the next level of development.
Wikipedia and the other Wikimedia projects are unusual as resource works in that they are being built by a community of users, representing a spectrum of age, expertise, and shade of opinion, without any real editorial hierarchy. The projects are all reference works, built by the very people who would otherwise refer to them -- often by recording the results of their research on a topic of interest, or in response to a request by community members -- and therefore offer a reflection of the needs of the learning community.
Wikipedia is the largest and oldest of the projects being developed by the Wikimedia Foundation. It is an online encyclopedia, built by users around the world. The English version already has 300,000 articles, while other languages combined have some 500,000. Each article is created and edited by the users, so that new articles are constantly appearing and older articles are constantly being improved; the average English article has been edited over 10 times. There is, however, little central control of how this process takes place. Growth is natural, reflecting the interests and ideas of any number of users on any day.
That is not to say that the structure is entirely random and chaotic. Experience has shown the opposite to be true. In just a brief time, users have organized themselves into "Wikiprojects," establishing guidelines for contributors and ensuring that similar articles follow similar formats. These guidelines in turn inform the overall Wikipedia style guide. Examples of this include standardized naming formats adopted for monarchs, which were hammered out after considerable negotiations among the community, and biology taxoboxes, which are being inserted for all articles about plants and animals. Nevertheless, there is a certain fluidity within the model, so that if a significant group of users decides that some other format would be more helpful to them, it will be discussed by the community and changes implemented accordingly. In this way, the reference material consistently reflects the needs of people who regularly refer to it.
While the Wikipedia article is the final product that end-users see, they can also investigate each article's "History" and its often extensive "Talk page," to see how the article developed, and how compromise positions acceptable to all of the participants were hammered out. In a metaphoric sense, they are able to go through all the "out-takes" that led to the final product, and to find out why a particular detail was included or omitted. Finally, readers can add to the conversation about an article to ask for clarification or more information -- and know that most contributors to the article will see these requests show up on their watchlists.
Similar to Wikipedia in objectives is the Wiktionary project to create comprehensive online dictionaries in each of the Wikipedia languages. These dictionaries aim to provide definitions, etymologies, examples of usage, historical usage, and translations into other languages. Each dictionary will eventually include definitions of words in other languages as well. As an online resource with limitless virtual space, Wiktionary could become far more comprehensive than a standard paper multilingual dictionary. Wiktionary projects currently appear in seven languages, and the frameworks for additional languages are already in place.
The existing Wiktionary format can be readily expanded to include other useful components such as a rhyming dictionary and a thesaurus. A related Wikiquote project, containing notable quotations organized by author and theme, is already underway.
Wikibooks, another project of the Wikimedia Foundation, has as its goal the compilation of information collected by Wikipedia and its sister projects and the transformation of this information into textbook-like formats that are accessible online. These textbooks, created by the same collaborative process as Wikipedia and Wiktionary, are planned on a broad range of topics from the Sciences, the Humanities, and Languages, and on a number of different levels. This will make it possible for visitors to Wikimedia projects not only to look up specific information but also to learn particular subjects in an orderly fashion.
Some of these books will also refer readers to Wikisource, a collection of primary sources referred to by the various Wikiprojects. In the future, these and other sources will be translated into various languages so that such essential documents as the U.S. Constitution or the U.N. Bill of Rights will be available in full and in their native language to anyone visiting the Wikimedia Foundation website.
It is hoped that in the future, Wikibooks will form the basis of a Wikiversity project devoted to online learning, where people can employ the latest technologies associated with distance learning to take free courses on many different subjects.
Akin to Wikibooks are Wikireaders, two of which (Internet and Sweden) have already been launched in German. Wikireaders are small collections of related peer-reviewed Wikipedia articles on a particular theme, marketed by the Wikimedia Foundation and serving as a potential source of future revenue. Printed and sold via an online store, Wikireaders can provide a handy reference text on any number of subjects, and plans are underway to create similar readers in English on a variety of subjects.
"Neutral Point of View"
Wikipedia has contributors with a wide range of cultural backgrounds, religious beliefs, and political points of view. It is inevitable that such controversial topics as abortion and the teaching of evolution would spark considerable controversy on Wikipedia. We do not avoid this controversy; rather, we channel the differing views of contributors into a discussion of the various sides of an issue. The policy that defines this focus is the "Neutral Point of View", or NPOV policy.
Wikipedia's NPOV policy, simply stated, is that articles should be written "without bias, representing all views fairly". A common rule of thumb is that any reasonable person, regardless of their private opinions, should be able to agree that a Wikipedia article is accurate.
While this policy is intended as a matter of good academic practice, it is absolutely essential to the functioning of the Wikipedia community. Wikipedia unites people of vastly differing views who are willing to work with each other because they share the goal of providing a non-biased resource that is useful for everyone.
Rather than reject ideological conflict between users, we embrace it, so long as it is conducted with civility and mutual respect. The Wikipedia attitude is that such debate in Talk pages actually enriches the article. Open dialogue ensures that one position does not overshadow the other, and that the article reflects the views of everyone, without abandoning its intellectual neutrality.
This makes Wikipedia an invaluable tool for people researching contemporary American and international issues. All major aspects of controversial issues are presented, and all arguments (or at least those seriously proposed by Wikipedia contributors) can be found. This is the result of contributors representing the intersection of a common interest (the improvement and enhancement of the project) and their partisan interests, whatever they may be. Fifty years from now, even the history and discussion evoked by an occasional edit war will be a valuable resource to researchers attempting to understand the personal positions that were at stake. As Wikipedia grows, the ability to trace those issues and their subsequent development will remain invaluable.
History, Scope, and Duration
History of Wikipedia
The idea of collecting all of the world's knowledge under a single roof has been a popular one since as far back as the ancient libraries of Alexandria and Pergamon. The modern notion of a general purpose, widely distributed, printed encyclopedia dates from shortly before Denis Diderot and the 18th century encyclopedists.
The idea of using automated machinery beyond the printing press to build a more useful encyclopedia can be traced to H. G. Wells' World Brain essays (1937) and Vannevar Bush's vision of the microfilm-based Memex in As We May Think (1945). Another important milestone along this path was Ted Nelson's Project Xanadu.
At the same time, the open source and free software movement founded by Richard Stallman had led to the concept of collaborative creation of works under copyright licences that ensured the equitable sharing of the end product. The most notable example of this is the Linux operating system, built from Stallman's GNU project and Linus Torvalds' operating system kernel.
Stallman had also proposed a GNU Network Encyclopedia, which would apply the same principle to the creation of an encyclopedia. However, nothing came of this proposal.
Wikipedia was established in January 2001 by Jimmy Wales and Larry Sanger. It was inspired by previous free encyclopedia projects such as the GNU Network Encyclopedia and the Nupedia project. None of these projects had succeeded in producing any substantial quantity of work, although the Nupedia project had produced a few high-quality articles over the course of the previous year. In an attempt to revitalize Nupedia, Wales and Sanger decided to apply the new concept of collaborative Wiki software to the writing of an encyclopedia. This was possible because the Wiki software was already available under an open-source licence, and could be quickly adapted to Wikipedia's needs.
Initially Wikipedia was intended only as a scratchpad for drafting new articles for Nupedia's formal peer review process; however, Wikipedia was an immediate success, and many articles were rapidly created within it. Larry Sanger was recruited as an editor and mentor for the nascent encyclopedia and its community, and provided pump-priming text from his philosophy course notes.
Soon, Wikipedia had reached critical mass, and article numbers were growing exponentially. At the same time, it was noticed that article quality was improving, rather than declining.
Wikipedia's growth was helped along by regular attention from the active Slashdot community, and by the body of active contributors to preexisting, community-oriented wikis who were eager to try out the medium for a new kind of project.
- more narrative needed here...
Wikipedia's goals are ambitious: it aims
- to be an encyclopedia, in the normal sense of a collection of all human knowledge
- to be freely editable by anyone (except for banned users, and excluding protected pages)
- to be open content, using the copyleft GNU Free Documentation License.
- to do all of the above in all known human languages
Since there is little in the way of space limitation on Wikipedia, it also aims to subsume the functions of many specialist encyclopedias. Unlike a paper encyclopedia, Wikipedia can encompass articles for both elementary and advanced treatments of the same subject.
In addition to traditional encyclopedic topics, Wikipedia is able to react quickly to current events and to provide information about them almost as soon as they happen. It is arguably more accurate and unbiased than some regular media sources due to its NPOV standards. This is invaluable for educational uses, where teachers and students need to be up to date. As an example, the UK Government's Butler Review on the conduct of the British security services in relation to the Iraq War was embargoed until 12.30pm GMT on 14 July 2004. Links to the text of the full report were posted on Wikipedia just over an hour later.
Wikipedia carries topics not found so comprehensively elsewhere on the Internet, and enables expert writers to share their knowledge with a modicum of effort. Indeed, Wikipedia gives specialist scholarly material far wider dissemination than any print medium can achieve, and a much larger potential audience.
All modern print encyclopedias have to make space for new topics by jettisoning old ones. This is particularly true for science and technology topics. Wikipedia provides a vehicle for a more balanced record of such topics to be maintained, thus materially contributing to history resources.
Wikipedia is particularly rich in topics relating to IT, computing and computers, and the Internet. It is also strong in media topics, such as cinema, television and music. The current content of Wikipedia thus suggests something about its contributors' interests and hobbies. As more focused initiatives to attract editors are launched, and as Internet access becomes more of a commodity around the globe, the scope of Wikipedia's content will continue to expand.
- more narrative needed here...See comments in discussion
Methodology and Standards
To be written
Preparation and processing of material
Wikipedia is accessed through a World Wide Web interface. All articles can be created and edited using any Web browser, without any additional software. This is accomplished through the use of wikitext, an intuitive and easy-to-learn markup system, to edit each article as plain text.
Image files to illustrate articles may be uploaded using the standard file upload function within Web browsers.
Within specialist articles, certain kinds of specialist markup may be used. The most prominent is embedded TeX markup, which allows the TeX mathematical typesetting system to be used to create graphics for mathematical notation that cannot be represented in all web browsers.
- more needed here
When an article change is saved, the Wikipedia servers then render it as XHTML, including producing any images that may be needed, and serve this to the web browser, allowing the full typeset version of the article to be viewed.
Wikipedia makes extensive use of XHTML and Cascading Style Sheets to try to separate content from presentation. This allows the maximum possible customizability and reuse of article material, as well as accessibility for devices such as readers for the blind. All of this is designed to be backwards-compatible as far as possible with early versions of Web browsers.
Open peer review and the removal of errors
Apart from the Neutral Point of View policy, another distinguishing characteristic of Wikipedia is open peer review. Every page is open to editing by any person, and those edits are visible to everyone. Any user can correct an error, enlarge an article, amplify a point or copyedit at any time. It is true that this open access permits the insertion of vandalism or bad information; but the open nature of the editing process, the complete audit trail kept of all pages, and auditing features built into the MediaWiki software, allow for the rapid removal of any bad edits. For instance, a "recent changes" feature allows readers to review new edits as they occur, and a "watchlist" feature allows readers to track changes to their favorite articles. In effect, the Wikipedia readership functions not only as a pool of editors, but as a review board.
Where there are differences of opinion that need to be resolved between editors, a system of talk pages allows changes to pages to be discussed (and mediated, if need be). Finally, persistently abusive users can be blocked from editing, although Wikipedia policy is to do this only as a last resort.
Organization of and access to material
The articles are organized as a series of single web pages, linked together by hyperlinks. Unlike on the rest of the Web, Wiki hyperlink syntax is very simple, and the address of the linked page is simply its title. (These links are then resolved by the software into normal HTML hypertext links at render time.)
When an article grows too long to be comfortably read as a single web page, it will be split up into sub-articles. In this way, what would have been a single lengthy article in a paper encyclopedia will typically correspond to perhaps ten or so Wikipedia pages, with one master page acting as a summary page and table of contents for the topic.
However, the non-hierarchical nature of the hyperlinks within Wikipedia actually allows much more sophisticated linking patterns.
One common pattern is the linking of words within articles without knowing whether they yet refer to an article. In many cases, these links will not correspond to any article, leaving a "red link" which is shown to warn users that this link does not as yet correspond to any article. As time goes by, these unresolved links provide a stimulus for editors to write new articles covering their topics.
Alternatively, this "accidental" linking will lead to a link to a pre-existing article, providing readers and editors with links to further material which may be related to the topic of the article.
A set of naming conventions exists to make it as likely as possible that a new article will match pre-existing hyperlinks, or vice versa.
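The link handling described above can be illustrated with a minimal sketch. It is not the MediaWiki implementation: the `/wiki/` URL prefix, the `new` CSS class, the `render_links` function, and the sample set of existing titles are all assumptions made for illustration. The sketch resolves each double-bracketed title into an HTML link and marks links whose target article does not yet exist, so that a stylesheet could show them in red.

```python
import re
from html import escape
from urllib.parse import quote

# Hypothetical set of existing article titles, for illustration only.
EXISTING_TITLES = {"Library of Alexandria", "Encyclopedia"}

# Matches [[Title]] and [[Title|displayed text]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def render_links(wikitext, existing=EXISTING_TITLES):
    """Replace wiki links with HTML anchors; other text is left untouched."""

    def replace(match):
        title = match.group(1).strip()
        label = match.group(2) or title
        # The page address is derived directly from the title.
        href = "/wiki/" + quote(title.replace(" ", "_"))
        # A link whose target has no article yet gets an extra class,
        # which a stylesheet could render in red.
        css = "" if title in existing else ' class="new"'
        return '<a href="{}"{}>{}</a>'.format(href, css, escape(label))

    return WIKILINK.sub(replace, wikitext)
```

For example, `render_links("See [[Encyclopedia]].")` yields an ordinary link, while `render_links("[[Memex|the Memex]]")` yields a link carrying the `new` class because no article with that title is in the sample set.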
Storage, maintenance, and protection of data
All of the Wikipedia data is stored in a MySQL relational database. Multi-gigabyte backups of the database are taken at regular intervals to staging machines, and to guard against a disaster such as the burning of the Library of Alexandria, a number of people worldwide periodically download and save these backups over the Internet.
Every version of every article is saved to the database, so each article has a complete audit trail of every edit. This is the principal measure against casual vandalism of Wikipedia: it takes longer to vandalize an article than it does to revert it to a known good previous version.
As well as an audit trail for every article, there is a publicly visible record of "recent changes". This is monitored by a large number of users, who usually rapidly pick up on suspicious patterns of behavior and intervene to prevent vandalism.
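Why reverting is cheap can be seen in a small sketch of an append-only version history. This is an illustrative model only, not the actual database schema: the `Revision` and `Article` classes and their method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Revision:
    author: str
    text: str


@dataclass
class Article:
    title: str
    revisions: List[Revision] = field(default_factory=list)

    def save(self, author, text):
        """Append a new revision; earlier revisions are never overwritten."""
        self.revisions.append(Revision(author, text))

    @property
    def current(self):
        """Text of the most recent revision."""
        return self.revisions[-1].text

    def revert_to(self, index, author):
        """Restore an earlier version by saving its text as a new revision."""
        self.save(author, self.revisions[index].text)
```

Because a revert is just one more saved revision, the vandalized text also stays in the history, so the audit trail remains complete.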
Articles are written and stored in a simple text format with additional formatting codes that is readable by humans and can be translated automatically into other formats such as HTML or PDF. Metadata is provided in Dublin Core using RDF technologies. Efforts are under way to standardize the article syntax, making it even easier for other projects to reuse Wikimedia content.
- to be written
Wikipedia has been edited by thousands of people (referred to as Wikipedians). There is no editor-in-chief, as such. The two people who founded Wikipedia are Jimmy Wales (former CEO of the small Internet company Bomis, Inc.) and Larry Sanger. For the first thirteen months, Sanger was paid by Bomis to work on the project. Sanger was said to have taken a role of mediator at times, making decisions on issues that aroused contention. This was based not on formal authority, but on demands from users at large. Funding ran out for his position, leading to his resignation in February of 2002. Other current and past Bomis employees who have done some work on the encyclopedia include Tim Shell, one of the co-founders of Bomis, and its current CEO, and programmers Jason Richey and Toan Vo.
Wikimedia expects to continue experiencing exponential growth in the near future. As the number of articles, editors, and projects continues to grow, a paid staff to coordinate, evaluate, and develop content in areas currently underrepresented in the projects will become necessary.
Needed staff members include:
A Projects Coordinator will oversee the work of the volunteer editors at each of the Wikimedia projects. The Projects Coordinator will determine what subjects are ready to be published as printed topical encyclopedias or Wikibooks, and will then be responsible for ensuring the quality of the published work. They will also recommend areas for future growth in the various projects, and will work with the Public Relations Coordinator to expand the volunteer community.
A Public Relations Coordinator will ensure that the public as a whole is aware of Wikipedia and the other Wikimedia projects. This person will also act as a press contact for the Foundation. The Public Relations Coordinator, in consultation with the Projects Coordinator, will recruit volunteer editors from academic disciplines underrepresented in the project. The areas most in need of further editors at this time are the arts and social sciences.
Full-time developers will be hired to work on the MediaWiki software that is used to receive and display contributions to the Wikimedia projects. Additional "programming bounties" may be offered to volunteer developers in order to complete particular tasks.
A full-time systems administrator is also needed to maintain the Wikimedia servers, perform backups and coordinate necessary hardware purchases.
A Chief Financial Officer will be needed to coordinate the financial affairs of the Foundation. Working with the CFO will be a Development Coordinator, who will coordinate all fundraising activities of the Foundation.
- to be further expanded
Wikipedia is distributed both online and in print. So far, the print versions include only the German-language WikiReaders. Wikimedia aims to expand these to cover a variety of topics in English and other languages. Plans for a print version containing a broader, less selective set of articles are also underway. Online and offline distribution of selected topics, and of the complete encyclopedia, to schools both within America and in developing countries is expected to occur during the period of this grant.
- to be further expanded
Although Wikipedia is not maintained by professional librarians, a great deal of work has gone into trying to maintain consistency throughout Wikipedia. In particular:
- multiple meanings of the same term are dealt with by so-called disambiguation pages
- multiple terms related to the same article are dealt with by redirect pages
A number of software tools are available to aid editors in maintaining this consistency, by checking and flagging possible errors for human review. These tools are also used to find possibly broken or bad content, such as very short articles.
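How redirect and disambiguation pages might interact during a title lookup is sketched below. The three tables, their sample entries, and the `lookup` function are hypothetical and serve only to illustrate the idea; they do not describe how the software stores pages.

```python
# Hypothetical page tables, for illustration only.
ARTICLES = {"Mercury (planet)", "Mercury (element)"}
REDIRECTS = {"Quicksilver": "Mercury (element)"}
DISAMBIGUATION = {"Mercury": ["Mercury (planet)", "Mercury (element)"]}


def lookup(title):
    """Resolve a title to an article, a disambiguation list, or nothing."""
    seen = set()
    # Follow redirects, stopping if a chain loops back on itself.
    while title in REDIRECTS and title not in seen:
        seen.add(title)
        title = REDIRECTS[title]
    if title in DISAMBIGUATION:
        # One term with several meanings: offer the candidate articles.
        return ("disambiguation", DISAMBIGUATION[title])
    if title in ARTICLES:
        return ("article", title)
    return ("missing", title)
```

In this sketch, `lookup("Quicksilver")` follows the redirect to the element article, while `lookup("Mercury")` returns the list of candidate meanings for the reader to choose from.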
A category system has recently been added to the software, and categories are rapidly being added to existing articles. The category system allows multiple systems of categories, and will allow both the use of informal categories and standardized category systems such as the Dewey Decimal System and the Library of Congress classification system. Categories can themselves be categorized, allowing the creation of category trees. It is intended that automatic tools will be written to mine information from article categories.
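As one illustration of the kind of automatic tool envisaged here, the sketch below walks a category tree to collect every article filed under a category or any of its subcategories. The data and the `articles_under` function are hypothetical; this is not an existing Wikipedia tool.

```python
# Hypothetical data: each category's direct subcategories and articles.
SUBCATEGORIES = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Astrophysics"],
}
MEMBERS = {
    "Physics": ["Gravity"],
    "Astrophysics": ["Neutron star"],
    "Biology": ["Cell (biology)"],
}


def articles_under(category, _seen=None):
    """Collect articles in a category and, recursively, its subcategories."""
    seen = set() if _seen is None else _seen
    # Categories can be filed under more than one parent, so each
    # category is visited only once to avoid repeated work and loops.
    if category in seen:
        return set()
    seen.add(category)
    found = set(MEMBERS.get(category, []))
    for child in SUBCATEGORIES.get(category, []):
        found |= articles_under(child, seen)
    return found
```

With the sample data, `articles_under("Science")` gathers the articles from Physics, Astrophysics, and Biology in a single pass.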
A great many manual indices within Wikipedia have already been compiled in the forms of lists. Examples include lists of years, lists of people by profession, lists of inventions and so on. These are carefully maintained in simple formats. Many of these contain information that is available to be added to the category system.
Wherever possible, Wikipedia has used pre-existing standards. For example, it uses ISO language codes to describe the various languages supported, and keeps its timestamps using the UTC date/time scheme. Date and time formats within articles are automatically parsed, so that dates and times can be presented in any needed format.
Character sets and encodings
All Wikipedias support the Unicode character set, either directly through the use of the UTF-8 character encoding or by the use of HTML entities. The few remaining non-UTF-8 Wikipedias use ISO 8859-1, and are in the process of being converted one by one to UTF-8 encoding.
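The difference between the two approaches can be sketched with the Python standard library. The helper name `to_entities` is hypothetical; the sketch shows the same text stored directly as UTF-8 bytes and, alternatively, as plain ASCII with numeric HTML entities, which is one way a page in an older encoding can still carry characters outside that encoding.

```python
from html import unescape


def to_entities(text):
    """Replace every non-ASCII character with a numeric HTML entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


word = "café"

# Direct approach: the character is stored as UTF-8 bytes.
utf8_bytes = word.encode("utf-8")

# Entity approach: the stored text is pure ASCII, so it survives any
# ASCII-compatible encoding, and decodes back to the same string.
entity_text = to_entities(word)
round_trip = unescape(entity_text)
```

Here `entity_text` is `caf&#233;` and `round_trip` equals the original word, so no information is lost either way; UTF-8 simply stores the text more compactly and without the extra decoding step.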