WikiScholar

From Meta, a Wikimedia project coordination wiki
This page is a proposal for a new Wikimedia Foundation Sister Project.
Status Closed (could be re-opened under new policy).
Reason Inactive proposal. --Sannita (talk) 18:41, 17 September 2013 (UTC)
Prompt Response
What is the proposed name for the project? WikiScholar
Project description
What is the project purpose? What will be its scope? How would it benefit from being part of Wikimedia?
A new Wikimedia Foundation project dedicated to the creation of a universal bibliography.
How many wikis?
Will there be many language versions or just on one multilingual wiki?
How many languages?
Is the project going to be in one language or in many?


Technical requirements
If the project requires any new features that the MediaWiki software currently doesn't have, please describe in detail. Are additional MediaWiki extensions needed for the project?
Development wiki
Interested Participants:
Important notice: This is a draft proposal that is actively being edited.
A more general proposal will be developed at the original Wikicite page.

A free and universal bibliography for the world

Project proposal for the Wikimedia Foundation


"Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That's what we're doing." -- Jimmy Wales

Wikipedia's mission is to synthesize every notable topic into an encyclopedic compendium of knowledge, free to the world. Wikipedia is not a source of original research; it relies on the authority and quality of information found in books, scholarly articles, webpages, movies, maps and more. Eventually, every statement in Wikipedia should be traceable to the sources from which it originated, and every notable topic in every source should be explained in Wikipedia. To know that a topic is truly notable, one must be able to confirm that its source is authoritative. Likewise, documenting every notable topic in every source requires that all of those sources be readily available in one collection. Stated plainly, free access to the sum of all human knowledge involves access not just to the metadata of every source from which this knowledge does or could originate, but also to meta-information about those sources: information that guides one's interpretation of them and helps the community determine what should ultimately be included in Wikipedia.

The solution to these challenges is a new Wikimedia Foundation project dedicated to the creation of a universal bibliography. This bibliography would contain one article for every source containing ideas that could eventually be documented in Wikipedia. Each article would consist of the metadata for a given source plus additional information created by the community, including summaries, fair-use content, recreations of materials, and ratings along dimensions such as quality, reliability, notability and trustworthiness. The bibliography would also contain non-bibliographic entries that document information found in the metadata, such as authors and institutions; articles that summarize collections of literature; and any other kinds of articles that the community votes into this project's policy pages.

Motivation

A fact is only as reliable as our ability to source it and to weigh that source carefully. In an effort to broaden its base of users and to increase its reliability, usability and credibility, the Wikipedia community has held several related discussions on improving Wikipedia's scholarly apparatus. The need to cite sources is now on the list of community standards, upgrading the citation of articles is the subject of the Fact and Reference Project, and the Encyclopedic Standards project has discussed automatic, or at least software-assisted, citations. There has also been a coding effort to support footnotes.

The reason for having such a system goes beyond the need to cite: there must also be the ability to annotate works and to provide at least some summary for a reader who does not have access to the book. This is particularly important if a work is obscure, hard to obtain, or in a language other than the reader's own. Making the "body" of each card an editable wiki space is therefore essential to tying together the functions of citation.

Moreover, this tool would be useful in itself, beyond Wikipedia or even Wikipedia and Wiktionary. For example, it would let users search through the web of citations between papers, let scholars cite works against a public database, and let web searches find credible links within the citation web of the published literature.

By this means Wikipedia could not only match the bibliographic tools that are available commercially, but leapfrog them by making the information live and linked. Annotations on a book could include information that is critical of it, or that extends or expands on it.

Just as Amazon has user reviews and credibility rankings, so too should Wikipedia have ratings on the utility of a particular work for the purposes of judging the credibility of the citation.

Implementation

A database of all the world's citations presents unique technological challenges for the Wikimedia Foundation's existing software and infrastructure. The total number of articles in this project would quickly surpass the combined number of articles across all of the other projects. Additionally, each article would have a template defining the metadata for its source, containing anywhere from a few to 50 metadata fields. This information must be queryable so that it can support two functions: first, being cited by the other projects; second, letting users write boolean queries that generate arbitrary output.

While there are known issues with scaling Semantic MediaWiki to the scope of this project, it was designed with many of this project's needs in mind. Any software that satisfies this project's technical constraints will necessarily share many features of Semantic MediaWiki, so an example that uses it is instructive.

Let us consider an example entry in the bibliography. The key system for this bibliography will be discussed at length in a later section, but let us assume we are using the simple key Author1Author2Author3EtAl20101911. In this key we list the first three authors, followed by EtAl if there are more than three, followed by the date of publication. This scheme is ambiguous (any two papers by the same authors with the same date receive the same key), but it suffices for an example. Let us also assume that the template used to define citations is the {{Citation}} template.
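The example key scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a proposed implementation: authors arrive as a list of surnames, the date is an opaque string, and `make_key` is a hypothetical helper name.

```python
def make_key(surnames: list[str], date: str) -> str:
    """Build an Author1Author2Author3EtAl<date> key (hypothetical helper)."""
    # Use at most the first three surnames, with internal spaces removed.
    parts = [name.replace(" ", "") for name in surnames[:3]]
    # Mark that further authors were omitted.
    if len(surnames) > 3:
        parts.append("EtAl")
    # The publication date closes the key.
    parts.append(date)
    return "".join(parts)


print(make_key(["Watson", "Crick"], "1953"))               # WatsonCrick1953
print(make_key(["Smith", "Jones", "Lee", "Kim"], "2010"))  # SmithJonesLeeEtAl2010
```

Because nothing in the key distinguishes two papers by the same authors with the same date, a real scheme would need a disambiguating suffix or additional fields.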

Our example article will be WatsonCrick1953, Watson and Crick's 1953 paper on the structure of DNA. Its citation template is:

{{Citation
|title=The structure of DNA
|authors=Watson, J. D. and Crick, F. H. C.
|date=1953
|journal=Cold Spring Harbor symposia on quantitative biology
|doi=10.1101/SQB.1953.018.01.020
|volume=18
|pages=123-131
|keywords=DNA
|abstract=It would be superfluous at a Symposium on Viruses to
introduce a paper on the structure of DNA with a discussion on its
importance to the problem of virus reproduction. Instead we
shall not only assume that DNA is important, but in addition that
it is the carrier of the genetic specificity of the virus (for
argument, see Hershey, this volume) and thus must possess in some
sense the capacity for exact self-duplication. In this paper we
shall describe a structure for DNA which suggests a mechanism for
its self-duplication and allows us to propose, for the first
time, a detailed hypothesis on the atomic level for the
self-reproduction of genetic material.
}}
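Because each entry carries its metadata in a template, the fields can be extracted mechanically. The Python sketch below assumes a flat template with one `|name=value` pair per parameter, continuation lines belonging to the previous parameter (as with the abstract above), and no nested templates; `parse_citation` is a hypothetical helper, and a production system would rely on the MediaWiki parser or Semantic MediaWiki rather than line-based parsing.

```python
def parse_citation(wikitext: str) -> dict[str, str]:
    """Turn a flat {{Citation ...}} template into a field dict (hypothetical helper)."""
    fields: dict[str, str] = {}
    current = None  # name of the parameter currently being read
    for raw in wikitext.strip().splitlines():
        line = raw.strip()
        # Skip blank lines and the template's opening and closing lines.
        if not line or line.startswith("{{") or line == "}}":
            continue
        if line.startswith("|") and "=" in line:
            # New parameter: split on the first "=" only, since values may contain "=".
            name, value = line[1:].split("=", 1)
            current = name.strip()
            fields[current] = value.strip()
        elif current is not None:
            # Continuation line, e.g. a wrapped abstract: append to the previous value.
            fields[current] += " " + line
    return fields


sample = """{{Citation
|title=The structure of DNA
|authors=Watson, J. D. and Crick, F. H. C.
|date=1953
|doi=10.1101/SQB.1953.018.01.020
|abstract=It would be superfluous at a Symposium on Viruses to
introduce a paper on the structure of DNA
}}"""

record = parse_citation(sample)
print(record["doi"])  # 10.1101/SQB.1953.018.01.020
print(record["abstract"])
```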

There are two predominant use cases for this data: inline citations in Wikipedia articles, and output of this article together with others as the result of boolean queries. Both share an underlying pattern: we write a query that requests the data in a particular format and places the result inline with the text at the location of the query.

A familiar example query that uses a {{cite}} template might look like this: {{cite|WatsonCrick1953|APA}}. The analogous query using Semantic MediaWiki would look like:

{{#show:[[WatsonCrick1953]]
|format=template
|template=APA
}}

As with the <ref> tag, in some use cases we will want a small [1] to appear at the location of the citation and the APA-formatted reference to appear at the end of the article, like so:

Watson, J. D., & Crick, F. H. C. (1953). The structure of DNA.
   Cold Spring Harbor symposia on quantitative biology, 18, 123-131.
   doi:10.1101/SQB.1953.018.01.020
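On the wiki this formatting would live in the APA template. The Python sketch below shows the same transformation outside MediaWiki, assuming the field names of the {{Citation}} example and authors separated by " and " as in that template. `apa_authors` and `format_apa` are hypothetical names, and the output only approximates APA style (no italics, no en-dash in page ranges, no title casing).

```python
def apa_authors(authors: str) -> str:
    """Convert "A and B and C" (the template's convention) to APA's "A, B, & C"."""
    names = [n.strip() for n in authors.split(" and ")]
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + ", & " + names[-1]


def format_apa(rec: dict[str, str]) -> str:
    """Render an APA-like reference; italics and title casing are ignored."""
    ref = f"{apa_authors(rec['authors'])} ({rec['date']}). {rec['title']}."
    if "journal" in rec:
        ref += " " + rec["journal"]
        # Volume and page range follow the journal name when present.
        for field in ("volume", "pages"):
            if field in rec:
                ref += ", " + rec[field]
        ref += "."
    if "doi" in rec:
        ref += " doi:" + rec["doi"]
    return ref


watson_crick = {
    "title": "The structure of DNA",
    "authors": "Watson, J. D. and Crick, F. H. C.",
    "date": "1953",
    "journal": "Cold Spring Harbor symposia on quantitative biology",
    "doi": "10.1101/SQB.1953.018.01.020",
    "volume": "18",
    "pages": "123-131",
}
print(format_apa(watson_crick))
```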

This particular use case can, for the most part, be handled by existing technologies. But consider an alternate use case: an article that lists all publications that emerged from the Cold Spring Harbor symposia on quantitative biology in 1953, and that is updated dynamically whenever a new article that fits the requirements is added to the bibliography. A query that uses Semantic MediaWiki to output such a list would look like this:

{{#ask:[[journal::Cold Spring Harbor symposia on quantitative biology]][[date::1953]]
|?title
|?authors
|?date
|format=template
|template=MyCustomQueryOutput
}}

For each result of the query, Semantic MediaWiki then passes the title, authors and date as template parameters to the MyCustomQueryOutput template, like so:

{{MyCustomQueryOutput
|The structure of DNA
|Watson, J. D. and Crick, F. H. C.
|1953
}}
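The semantics of this query can be mimicked in plain Python to make the mechanics concrete. The sketch assumes exact-match conditions combined with AND, which is how consecutive conditions in the #ask query above combine; Semantic MediaWiki's query language also supports disjunction and comparison operators, which this sketch omits. `ask` here is a hypothetical function, not an SMW API, and the second bibliography entry is invented for illustration.

```python
Record = dict[str, str]


def ask(records: list[Record], conditions: dict[str, str],
        printouts: list[str]) -> list[list[str]]:
    """Return printout values for records matching every condition (logical AND)."""
    rows = []
    for rec in records:
        # Each [[property::value]] condition is an exact match; all must hold.
        if all(rec.get(prop) == value for prop, value in conditions.items()):
            # Project the ?printout properties, using "" for missing values.
            rows.append([rec.get(p, "") for p in printouts])
    return rows


bibliography = [
    {"title": "The structure of DNA", "authors": "Watson, J. D. and Crick, F. H. C.",
     "date": "1953", "journal": "Cold Spring Harbor symposia on quantitative biology"},
    # Hypothetical entry that fails the journal condition.
    {"title": "An example paper", "authors": "Doe, J.",
     "date": "1953", "journal": "Example Journal"},
]

rows = ask(bibliography,
           {"journal": "Cold Spring Harbor symposia on quantitative biology", "date": "1953"},
           ["title", "authors", "date"])
for title, authors, date in rows:
    # Stand-in for the MyCustomQueryOutput template: one line per result.
    print(f"{authors} ({date}). {title}")
```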

This system gives the community unlimited flexibility in reorganizing the world's bibliography into subcollections defined by arbitrary boolean queries. It also makes creating custom user bibliographies simpler. Additionally, it settles the question of exactly how articles ought to be uniquely specified when cited in Wikipedia. For instance, one might like to use {{cite|WatsonCrick1953}}, or one might like to use {{cite|10.1101/SQB.1953.018.01.020}}. Semantic MediaWiki allows any set of fields that uniquely identifies an article to be used.
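Letting either the entry key or an identifier such as a DOI resolve to the same entry amounts to keeping an index from every unique identifier to its entry. The sketch below assumes entries stored in a dict keyed by their page key and treats only `doi` and `isbn` as identifier fields; `build_index` and the field choice are hypothetical.

```python
def build_index(entries: dict[str, dict[str, str]],
                id_fields: tuple[str, ...] = ("doi", "isbn")) -> dict[str, str]:
    """Map each entry key and each unique identifier value to the entry key."""
    index: dict[str, str] = {}
    for key, rec in entries.items():
        index[key] = key
        for field in id_fields:
            value = rec.get(field)
            if value is None:
                continue
            # An identifier is only usable for citing if it points at one entry.
            if index.get(value, key) != key:
                raise ValueError(f"{field} {value!r} does not identify a unique entry")
            index[value] = key
    return index


entries = {
    "WatsonCrick1953": {"title": "The structure of DNA",
                        "doi": "10.1101/SQB.1953.018.01.020"},
}
index = build_index(entries)

# {{cite|WatsonCrick1953}} and {{cite|10.1101/SQB.1953.018.01.020}} resolve alike.
print(entries[index["WatsonCrick1953"]]["title"])              # The structure of DNA
print(entries[index["10.1101/SQB.1953.018.01.020"]]["title"])  # The structure of DNA
```

DOIs are case-insensitive, so a real index would also normalize them (for example by upper-casing) before storing and looking them up.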

Need for Live Data

These projects need not only to be joined together but to be joined together in a live manner, which allows for the creation of a bibliographic apparatus. The Library of Congress is working on such a project for its own purposes; the purpose of this project is to create an open wiki system that will allow:

  1. Software-assisted citation: making it easier for editors to cite, and making citations comprehensive enough to include a link to the author's article, the book's card and the date as a wikilink.
  2. Card catalogs that allow users to annotate a work and to link it to other works, which could include later editions, bibliographies and textual apparatus, so that the card catalog is live data rather than dead data.
  3. Support for a footnote system in Wikimedia projects, to improve the ability to assess the credibility and standards compliance of articles and their information.


Additional Functionality

  1. Add journal articles, at least for the major journals. This is particularly important in many scientific fields, where the paper, rather than the book, is the basic means of distributing information.
  2. Edition linking, so that editions of the same book could be compared.
  3. Bibliography project, to add the bibliographies of books themselves, so that searches can go down, and not just up, the chain.
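Once the bibliographies of works are stored, searching the chain in both directions amounts to keeping each entry's "cites" links and inverting them to obtain "cited by" links. The sketch below assumes bibliographies are available as lists of entry keys and uses breadth-first search with a depth limit; the function names and the PaperA/PaperB/PaperC keys are hypothetical.

```python
from collections import defaultdict, deque


def invert(cites: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn "X cites Y" edges into "Y is cited by X" edges."""
    cited_by: dict[str, list[str]] = defaultdict(list)
    for source, targets in cites.items():
        for target in targets:
            cited_by[target].append(source)
    return dict(cited_by)


def reachable(graph: dict[str, list[str]], start: str, max_depth: int) -> list[str]:
    """Breadth-first search: entries within max_depth hops of start, nearest first."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


# Hypothetical entries: PaperC cites PaperB, which cites PaperA.
cites = {"PaperC": ["PaperB"], "PaperB": ["PaperA"], "PaperA": []}
print(reachable(cites, "PaperC", 2))          # ['PaperB', 'PaperA']  (works it draws on)
print(reachable(invert(cites), "PaperA", 2))  # ['PaperB', 'PaperC']  (works built on it)
```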


Referencing in an Article

  • [[cite:type:identifier]] Link to card in database
  • {{cite:type:identifier}} Expands to bibliographic reference

The type would be defined by lookups in the database, and would include at least ISBN.

Additional fields:

  1. edition
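A parser for the proposed syntax could start from a regular expression. This is a sketch under assumptions: the type is a plain word such as isbn, the identifier contains no brackets, braces or pipes, additional fields such as edition are not parsed, and `find_cites` is a hypothetical name. The real feature would hook into the MediaWiki parser rather than scan text with a regex, and the sample ISBN only illustrates the syntax.

```python
import re

# [[cite:type:identifier]] links to the card; {{cite:type:identifier}} expands it.
CITE_PATTERN = re.compile(r"(\[\[|\{\{)cite:([A-Za-z]+):([^\]}|]+)(\]\]|\}\})")
PAIRS = {"[[": "]]", "{{": "}}"}


def find_cites(wikitext: str) -> list[tuple[str, str, str]]:
    """Return (action, type, identifier) for each well-formed cite reference."""
    found = []
    for match in CITE_PATTERN.finditer(wikitext):
        opener, id_type, identifier, closer = match.groups()
        # Reject mismatched delimiters such as [[cite:isbn:...}}.
        if PAIRS[opener] != closer:
            continue
        action = "link" if opener == "[[" else "expand"
        found.append((action, id_type.lower(), identifier.strip()))
    return found


text = "See [[cite:isbn:0-306-40615-2]] and {{cite:ISBN:0306406152}}."
print(find_cites(text))
# [('link', 'isbn', '0-306-40615-2'), ('expand', 'isbn', '0306406152')]
```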

Expansion

The macro would expand to the following fields if present:

  1. Author: link to the Wikipedia article in the local namespace.
  2. Author card: link to the Wikicite author card, listing all works by that author.
  3. Editor: link to the Wikipedia article in the local namespace.
  4. Editor card: link to the Wikicite author card, listing all works by that editor.
  5. Title: link to the Wikipedia article about the book in the local namespace. Every Wikipedia article about a book should have a link to its Wikicite entry added at the top.
  6. Publisher: link to the Wikicite publisher card.
  7. Date: link to the Wikipedia article in the local namespace.
  8. ISBN: link to the Wikicite book card, or to the entry's primary key if there is no ISBN.
  9. Source: link to Wikisource, if the cited material is there, or an external link otherwise.
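The expansion rules can be sketched as a function that emits wikitext. The `Cite:` page-name prefixes, the field names and the sample book are all hypothetical, and the editor and source fields are left out for brevity; the ISBN-or-primary-key fallback for the book card follows the list directly.

```python
def expand(rec: dict[str, str]) -> str:
    """Expand a bibliographic entry into wikilinks (hypothetical page naming)."""
    parts = []
    if "author" in rec:
        author = rec["author"]
        # Local Wikipedia article, plus the author card listing all works.
        parts.append(f"[[{author}]] ([[Cite:Author/{author}|works]])")
    if "title" in rec:
        parts.append(f"''[[{rec['title']}]]''")
    if "publisher" in rec:
        parts.append(f"[[Cite:Publisher/{rec['publisher']}|{rec['publisher']}]]")
    if "date" in rec:
        parts.append(f"[[{rec['date']}]]")
    # Book card: by ISBN when one exists, otherwise by the entry's primary key.
    card = f"ISBN/{rec['isbn']}" if "isbn" in rec else rec["key"]
    parts.append(f"[[Cite:{card}|card]]")
    return ", ".join(parts)


book = {"key": "Doe2001", "author": "Jane Doe", "title": "An Example Book",
        "publisher": "Example Press", "date": "2001"}
print(expand(book))
```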

Other feature requests

BibTeX

Entries in the WikiScholar database should be "compatible" with BibTeX entries. BibTeX is the citation format of choice for mathematics, physics, computer science and some other areas. This means that a BibTeX entry should be structurally a subset of the WikiScholar entry, so that some trivial text processing with Perl or gawk can extract BibTeX from the WikiScholar entry. Possibly a BibTeX entry by itself could be a minimal WikiScholar entry. This is a design issue.

WikiScholar would have numerous advantages over a raw BibTeX entry. One could use it to direct readers to specific sections of a book, warn about pitfalls in other sections, summarize what you "really need to know" to understand a paragraph, record points of view, and so on. This is impossible in BibTeX.
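If the BibTeX fields are a subset of the WikiScholar fields, extraction is a field-by-field mapping that drops everything else. The Python sketch below assumes the field names of the {{Citation}} example, a `date` that holds only a year, and a hypothetical mapping table; the `authors` value already uses BibTeX's " and " separator, so it passes through unchanged, while fields such as `keywords` are dropped.

```python
# Hypothetical mapping from WikiScholar template fields to BibTeX field names.
BIBTEX_FIELDS = {
    "authors": "author", "title": "title", "journal": "journal",
    "date": "year", "volume": "volume", "pages": "pages", "doi": "doi",
}


def to_bibtex(key: str, rec: dict[str, str]) -> str:
    """Emit a BibTeX entry from the WikiScholar fields that have BibTeX equivalents."""
    entry_type = "article" if "journal" in rec else "misc"
    lines = [f"@{entry_type}{{{key},"]
    for ours, theirs in BIBTEX_FIELDS.items():
        if ours not in rec:
            continue  # fields without a mapping (e.g. keywords) are never emitted
        value = rec[ours]
        # BibTeX conventionally writes page ranges with "--".
        if theirs == "pages" and "--" not in value:
            value = value.replace("-", "--")
        lines.append(f"  {theirs} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


watson_crick = {
    "title": "The structure of DNA",
    "authors": "Watson, J. D. and Crick, F. H. C.",
    "date": "1953",
    "journal": "Cold Spring Harbor symposia on quantitative biology",
    "doi": "10.1101/SQB.1953.018.01.020",
    "volume": "18",
    "pages": "123-131",
    "keywords": "DNA",
}
print(to_bibtex("WatsonCrick1953", watson_crick))
```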

Namespace

WikiScholar entries should be in their own namespace, equally accessible to all Wikimedia projects.

ISBN support

One level of WikiScholar implementation would be to make available ISBN-numbered books, or some subset of them. A 10-digit ISBN (ISBN-10) identifies a book and is used for commercial and citation purposes; since 2007, newly assigned ISBNs use the 13-digit ISBN-13 format. An ISBN-10 has 9 significant digits and a check digit. The check digit is computed modulo 11, so it takes a value from 0 through 10, written as the digits 0 to 9 or an "X" for 10.

The ISBN code is broken into 4 parts. The standard says hyphens are to be placed between these four parts, but often they are not.

  • Group/Country id
  • Publisher id
  • Title id - by format
  • Check digit
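The ISBN-10 check digit is chosen so that 10·d1 + 9·d2 + ... + 2·d9 + 1·d10 is divisible by 11, where d10 is the check digit. The Python sketch below computes and verifies it, stripping the optional hyphens between the four parts first; the function names are hypothetical, the sample numbers are used only for the arithmetic, and ISBN-13 (which uses a modulo-10 check with alternating weights 1 and 3) is not handled.

```python
def isbn10_check_digit(first_nine: str) -> str:
    """Check character for the nine significant digits of an ISBN-10."""
    # Weights 10, 9, ..., 2 are applied to the nine significant digits.
    total = sum((10 - i) * int(d) for i, d in enumerate(first_nine))
    # Choose the check value c (0-10) that makes total + c divisible by 11.
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """Validate an ISBN-10 written with or without hyphens between its parts."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or not all(c in "0123456789" for c in chars[:9]):
        return False
    return chars[9] == isbn10_check_digit(chars[:9])


print(isbn10_check_digit("030640615"))   # 2
print(is_valid_isbn10("0-306-40615-2"))  # True
print(is_valid_isbn10("0-306-40615-3"))  # False
```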

One can purchase commercial ISBN data from the Library of Congress and other sources. However, Wikimedia would probably begin by taking MARC records downloaded from Z39.50 sources.

A MARC record is a machine-readable card catalog format, and the Z39.50 protocol is used to search for and retrieve records in this format. This means an initial test implementation could be "sparse", creating records that simply link to the ISBN page as citations. See also: Wikipedia:List of ISBN ranges.

Project history

Related projects have been proposed at various times by Stirling Newberry, Alterego, Sj, and Kneiphof, among others.

Alternate names

  • Wikibibliography, Wikigraphy (http://www.wikibibliography.org once claimed for this)
  • Wikicite (currently used on Meta to indicate a different project focused on citing facts)
  • OpenScholar

Related projects

from German Wikipedia

from English Wikipedia

Proposed on Meta

Mailing list links

2009

  • There were a couple of discussions in the fall.

2010

Demos

see other projects above


People interested

see also the English-Wikipedia project

  1. Stirling Newberry
  2. +sj
  3. jleybov
  4. Nichtich
  5. Kneiphof
  6. master_flosse
  7. Computerjoe
  8. Syndicate
  9. Kv75
  10. Alterego
  11. Roy
  12. Random
  13. Deyyaz
  14. Aubrey
  15. COGDEN
  16. Jtneill
  17. Skippy le Grand Gourou
  18. gracefool
  19. Daniel Mietchen
  20. econterms
  21. HLHJ

External links