Wikicite (2006 proposal)

From Meta, a Wikimedia project coordination wiki
This is a proposal for a new Wikimedia sister project.
Status of the proposal
Status: closed
Reason: Inactive proposal. --Sannita, 17:14, 17 September 2013 (UTC)

This version of the project proposal refers to an idea for standardizing how facts are cited. For the former proposal about a global bibliography wiki, see this revision from November 2005.

Introduction

Citation: A many-armed beast

A fact is only as reliable as the ability to source it and to weigh that source carefully. In an effort to broaden its base of users and to increase its reliability, usability and credibility, Wikipedia's community has held several related discussions on improving the scholarly apparatus of Wikipedia. The need to cite sources is now in the community standards list, the desire to upgrade the citation of articles is the subject of the Fact and Reference Check Project, and the Encyclopedic Standards Project has discussed automatic, or at least software-assisted, citations. There has also been a coding effort to support footnoting, article validation and stable-version designation.

Wikicite incorporates features from all of these efforts in an attempt to achieve the following core objectives:

  • Facilitate the citation of all factual assertions. Ideally every non-obvious factual assertion should connect to evidence that corroborates it. Some major reference books hire researchers to check facts; most rely instead on contributors to check their own articles. Most reference works do not have a citation scheme as elaborate as the one proposed here.
  • Improve the quality of cited material. A source may not be "state-of-the-knowledge" within its particular field for various reasons: lack of original research, obsolescence, or association with an untested school of thought or methodology. Mapping the authority relationships within a particular literature makes it easier for editors to discover authoritative research material.
  • Organize article review around fact- and citation-checking. The number of cited vs. uncited facts and the number of correct vs. made-up citations should play central roles in determining the quality of an article and whether a particular revision deserves "publication" (e.g. stable, or version "1.0", designation) by the community.
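The two counts named in the last objective can be reduced to simple ratios. This Python sketch assumes each fact-point carries one verdict per citation, using the hypothetical labels "confirmed", "refuted" and "unchecked"; the proposal itself does not define such labels or any formula.

```python
def article_metrics(verdicts_per_point: list[list[str]]) -> dict[str, float]:
    """Summarize citation coverage and correctness for one article revision.

    verdicts_per_point holds one list per fact-point and one verdict per
    citation. The verdict labels ("confirmed", "refuted", "unchecked") are
    hypothetical.
    """
    n_points = len(verdicts_per_point)
    # A fact-point counts as cited if it has at least one citation.
    n_cited = sum(1 for verdicts in verdicts_per_point if verdicts)
    # Only citations that reviewers have actually checked enter the second ratio.
    checked = [v for verdicts in verdicts_per_point for v in verdicts if v != "unchecked"]
    n_confirmed = checked.count("confirmed")
    return {
        "cited_ratio": n_cited / n_points if n_points else 0.0,
        "confirmed_ratio": n_confirmed / len(checked) if checked else 0.0,
    }
```

Whether a revision then qualifies for "publication" would depend on thresholds that the community would have to choose.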

Article Development Example

This section illustrates the main features of Wikicite in the context of a typical article development process, using Spider as our example article.

Step 1: Identifying Factual Assertions

At this stage an article is either newly created or has just begun the refinement process; in either case, its distinct factual assertions have yet to be identified. This is done with a special fact-point markup ("++fn"), somewhat similar to a footnote. For example:

  Columbus was most likely Genoese++fn, although ++some historians
  claim he could have been born in other places, from the Crown
  of Aragon to the Kingdoms of Galicia or Portugal++fn, or in the
  Greek island of Chios++fn among others.

Logically, the fact-point mark-up delimits the following 3 assertions within the text (see Wikicite Fact-point for a formal explanation of how they work):

  1. Columbus was most likely Genoese
  2. some historians claim he could have been born in other 
     places, from the Crown of Aragon to the Kingdoms of Galicia 
     or Portugal
  3. or [he was born] in the Greek island of Chios
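The delimiting rule implied by this example can be sketched in Python. This is an inferred reading, not the formal Wikicite Fact-point definition. It assumes three things: a fact-point opens at an explicit "++" marker, or otherwise right after the previous "++fn" (or at the start of the text); it closes at the next "++fn"; and surrounding commas and spaces are trimmed.

```python
import re

# "++fn" closes a fact-point; a bare "++" explicitly opens one.
# The \b stops "++fn" from matching the start of a longer word such as "++fnord".
MARKER = re.compile(r"\+\+fn\b|\+\+")


def parse_fact_points(text: str) -> list[str]:
    """Return the fact-point fragments delimited by ++ / ++fn markup (assumed rules)."""
    text = " ".join(text.split())  # collapse the line breaks of the wikitext
    points = []
    start = 0  # where the current fact-point begins
    for m in MARKER.finditer(text):
        if m.group() == "++":
            start = m.end()  # explicit opening marker
        else:
            points.append(text[start:m.start()].strip(" ,;"))
            start = m.end()  # the next fact-point may begin right here
    return points


columbus = """Columbus was most likely Genoese++fn, although ++some historians
claim he could have been born in other places, from the Crown
of Aragon to the Kingdoms of Galicia or Portugal++fn, or in the
Greek island of Chios++fn among others."""

for i, point in enumerate(parse_fact_points(columbus), 1):
    print(i, point)
```

The parser returns the literal fragments. The bracketed "[he was born]" in assertion 3 is an editorial gloss and is not produced by the markup.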

Once a fact-point is created, evidence may be connected to it in the form of a citation to a book, journal article, web page, etc. A single fact-point can have multiple citations. This is useful when strengthening a claim, or when a fact-point actually contains several distinct factual assertions that cannot easily be broken into meaningful sentence fragments. Fact-point 2 above is an example: it asserts that some historians claim Columbus may have been born in the Crown of Aragon, AND that some claim he may have been born in Galicia, AND that some claim he may have been born in Portugal.


In the case of our Spider example article, an editor would mark up the article as follows:

[Figure: Fact-point markup]

After clicking the PREVIEW button, the editor would see every factual assertion just delineated flagged in red, since none of them is sourced yet:

[Figure: Logical fact-points]

The editor may choose to save the article at this point: although no evidentiary sources have been added, the editor has at least identified where they are needed. Subsequent viewers will see the uncited fact-points flagged in red. This alerts readers to which parts of the article are possibly untrustworthy, and editors to which parts most need attention.
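The red flagging can be approximated by a rendering pass that wraps each uncited fact-point in a highlighted element. This Python sketch assumes fact-points have already been located as (start, end) character offsets with a known citation count. The `uncited` CSS class and the inline style are hypothetical choices.

```python
import html


def render(text: str, spans: list[tuple[int, int, int]]) -> str:
    """Render plain text to HTML, flagging uncited fact-points in red.

    spans: (start, end, citation_count) for each fact-point, in text order
    and non-overlapping.
    """
    out, pos = [], 0
    for start, end, n_citations in spans:
        out.append(html.escape(text[pos:start]))  # text between fact-points
        fragment = html.escape(text[start:end])
        if n_citations == 0:
            out.append(f'<span class="uncited" style="color:red">{fragment}</span>')
        else:
            out.append(fragment)  # cited fact-points render unflagged
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)


t = "Spiders have eight legs. Some spiders eat their mates."
print(render(t, [(0, t.index("."), 1), (t.index("Some"), len(t) - 1, 0)]))
```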

Step 2: Adding Citations

At some point citations will be supplied for the various fact-points. This can be done in conjunction with Step 1, or at a later time by a completely different editor. For every fact-point created, input fields are automatically generated on the Preview Page to record citation information:

[Figure: Adding citations]

These fields capture the source's identifier (ISBN for books, WebCite-archived URLs for web pages, etc.), the page number or its equivalent, and the cited evidence text (as opposed to the paraphrase text in the article). For most printed material, only a unique identifier such as an ISBN or ISSN is needed; all other bibliographical information (title, author, date of publication, etc.) will be taken from the Wikicat bibliographical database. Multiple sources can also be associated with a single fact-point.
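These fields suggest a small record type. In this Python sketch, the field names, the "isbn:" identifier prefix and the in-memory `CATALOG` standing in for Wikicat are all hypothetical. The design point is that a citation stores only an identifier, a locator and evidence text, and looks up everything else in the catalog.

```python
from dataclasses import dataclass

# Hypothetical stand-in for Wikicat: identifier -> bibliographic record.
CATALOG = {
    "isbn:EXAMPLE-1": {"title": "An Example Book on Spiders", "author": "A. Author", "year": 2001},
}


@dataclass(frozen=True)
class Citation:
    identifier: str     # e.g. an ISBN/ISSN for print, a WebCite URL for web pages
    locator: str        # page number or its equivalent
    evidence_text: str  # quoted text from the source, not the article's paraphrase

    def bibliography(self) -> dict:
        """Fetch title, author, etc. from the catalog instead of storing them here."""
        return CATALOG.get(self.identifier, {})


@dataclass
class FactPoint:
    text: str
    citations: list[Citation]  # several sources may back one fact-point
```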

Once the user clicks SAVE, the citation data is stored in its own citation/text relation database. Fact-points with source information are no longer flagged by the page renderer.
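One possible shape for the citation/text relation store is sketched below using Python's built-in `sqlite3` module. The table and column names are hypothetical; the proposal's actual schema lives on its Technical Design page. With this layout, the renderer only needs the fact-points that have no citation rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_point (
    id INTEGER PRIMARY KEY,
    article TEXT NOT NULL,
    revision INTEGER NOT NULL,
    text TEXT NOT NULL
);
CREATE TABLE citation (
    id INTEGER PRIMARY KEY,
    fact_point_id INTEGER NOT NULL REFERENCES fact_point(id),
    source_id TEXT NOT NULL,  -- ISBN, ISSN or WebCite URL
    locator TEXT,             -- page number or equivalent
    evidence_text TEXT
);
""")


def uncited_fact_points(conn, article, revision):
    """Texts of fact-points in one revision that have no citation rows."""
    rows = conn.execute("""
        SELECT fp.text FROM fact_point fp
        LEFT JOIN citation c ON c.fact_point_id = fp.id
        WHERE fp.article = ? AND fp.revision = ? AND c.id IS NULL
        ORDER BY fp.id
    """, (article, revision))
    return [text for (text,) in rows]
```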

Step 3: Article Review

Once an article reaches an acceptable level of quality, a revision is designated as a stable/publication candidate by user vote, bureaucrat action, etc. Users can then rate the article on such overall, subjective characteristics as readability and thoroughness. They will also, however, be asked to verify its accuracy by checking the correctness of each cited fact-point. Does the source actually prove what the paraphrase text asserts? Does the cited text even exist in the source? In practice, the votes of only a few users should be enough to establish whether a citation checks out or not.
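The idea that a few votes should settle a citation can be expressed as a margin rule. This sketch makes two assumptions: each reviewer answers the questions above with a single yes/no verdict, and a citation is settled once one side leads by a fixed margin. The margin of 2 is a hypothetical parameter, not a figure from the proposal.

```python
from collections import Counter


def citation_status(votes: list[bool], margin: int = 2) -> str:
    """votes: True if a reviewer found the claim in the cited source, else False."""
    tally = Counter(votes)  # missing keys count as 0
    lead = tally[True] - tally[False]
    if lead >= margin:
        return "verified"
    if -lead >= margin:
        return "failed"
    return "undecided"  # keep collecting reviews
```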

Keep in mind that different sources may contain different or contradictory information. It is not Wikipedia's role to act as umpire among them, but editors should choose the most likely statements of fact.

[Figure: Article review page]

If the article is of sufficient quality, it is designated as stable or "published", giving it certain view privileges on the site. For example, the stable version of the article appears by default on all visits to its page, even if there have been subsequent edits. Alternatively, a separate stable Wikipedia site or URL could feature only verified articles.

Designation as stable, however, does not mean an article is frozen and no further work may be done upon it. At the very least, it will need to be updated to reflect recent developments or advances in scholarship.

Step 4: Source Review (in The World of Tomorrow)

Work done using the tools provided by Wikicite will seed a text relationship database which, among other things, will contain a record of citations between texts. Which book or article is most often cited regarding female spiders eating their mates? Which is now widely considered outdated or methodologically flawed? Are there two schools of thought on the issue? Are both schools mentioned by the article?
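A question such as "which source is most often cited about female spiders eating their mates?" reduces to counting citation edges. This sketch assumes the text relationship database can export (fact-point text, source identifier) pairs. It uses a crude keyword filter in place of real topic matching, and all identifiers are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical export of citation edges: (fact-point text, cited source identifier).
edges = [
    ("Female spiders sometimes eat their mates", "isbn:EXAMPLE-1"),
    ("Sexual cannibalism occurs in some spider species", "isbn:EXAMPLE-1"),
    ("Female spiders sometimes eat males after mating", "issn:EXAMPLE-2"),
    ("Spiders have eight legs", "isbn:EXAMPLE-3"),
]


def most_cited(edges, keywords, n=3):
    """Rank sources by how many matching fact-points cite them (substring match only)."""
    counts = Counter(
        source for text, source in edges
        if any(k in text.lower() for k in keywords)
    )
    return counts.most_common(n)


print(most_cited(edges, ["eat", "cannibalism"]))
```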

By providing a map of the authority relationships within a particular scholarly literature, Wikicite empowers editors to improve their sources.

In addition, the tool could be used to systematically verify claims about a particular source. Similar to the "What links here" feature, it would become feasible to track all Wikipedia articles that cite a particular source and to see the associated claims. A single reviewer familiar with the source could then quickly verify whether the claims attributed to it are indeed found there. For frequently cited sources, this would be far more efficient than reviewing source statements on a per-article basis.
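A reviewer's worklist for one source is a reverse index from that source to the (article, claim) pairs that cite it, analogous to "What links here". This sketch assumes the citation store can be read as (article, claim, source identifier) triples. The data shown is hypothetical.

```python
from collections import defaultdict


def claims_by_source(citations):
    """Map each source identifier to the (article, claim) pairs that cite it."""
    index = defaultdict(list)
    for article, claim, source in citations:
        index[source].append((article, claim))
    return index


citations = [
    ("Spider", "Female spiders sometimes eat their mates", "isbn:EXAMPLE-1"),
    ("Sexual cannibalism", "Sexual cannibalism occurs in some spider species", "isbn:EXAMPLE-1"),
    ("Spider", "Spiders have eight legs", "isbn:EXAMPLE-3"),
]

# Everything a reviewer of isbn:EXAMPLE-1 would need to check, across articles.
for article, claim in claims_by_source(citations)["isbn:EXAMPLE-1"]:
    print(f"{article}: {claim}")
```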

A central source repository would also allow tracking citations across languages, and thereby help address cultural biases: are there important French sources on the Normandy Invasion that are not cited in the English Wikipedia article, or vice versa? As such, it would play an important role in stimulating scholarly discourse without violating Wikipedia's prohibition of original research. Sources that were not previously considered relevant in a given language or culture, or not even known there, could gain visibility simply by being cited in Wikipedia. This would give scholars an incentive to enrich the encyclopedia with useful sources (and, on the negative side, to promote themselves).
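With a shared repository, the cross-language comparison becomes a set difference. This sketch makes two assumptions: the language editions' articles on one topic are already linked, and every edition refers to a source by the same central identifier. Providing that shared identifier is exactly what the repository would do. The identifiers below are hypothetical.

```python
def citation_gaps(cited: dict[str, set[str]], a: str, b: str) -> tuple[set[str], set[str]]:
    """Sources cited in edition a but not in b, and in b but not in a."""
    return cited[a] - cited[b], cited[b] - cited[a]


# Hypothetical sources cited by the English and French articles on the Normandy Invasion.
normandy = {
    "en": {"isbn:EXAMPLE-1", "isbn:EXAMPLE-2"},
    "fr": {"isbn:EXAMPLE-2", "isbn:EXAMPLE-4"},
}
only_en, only_fr = citation_gaps(normandy, "en", "fr")
print("Cited only in en:", only_en)
print("Cited only in fr:", only_fr)
```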

Technical Components

[Figure: Technical components]

At a high level, Wikicite consists of the following technical components:

  • Wikipedia Extensions: A set of extensions to Wikipedia which, among other things, add fact-point support to the article view and edit pages, and article review/stable candidate designation capabilities.
  • Wikicat: A bibliographical catalog, implemented as a Wikidata dataset.
  • Bibliographical Catalog Server: Used to seed Wikicat with basic bibliographic data on an "as-needed basis" (i.e. whenever a source is cited from within Wikipedia). Assuming it agrees to the extra load, the Library of Congress's SRU-enabled Voyager Z39.50 MARC server would be an ideal candidate for this role.
  • WikiTextrose: A text relationship database, implemented as a Wikidata dataset. Among other things, it will record citation information between Wikipedia articles and their evidentiary sources (books, journal articles, etc.)
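The "as-needed" seeding described for the Bibliographical Catalog Server follows a cache-aside pattern: Wikicat is consulted first, and the external server is contacted only on a miss. In this sketch the `fetch_remote` callable is a placeholder for whatever Z39.50 or SRU client would actually be used. No real client API is assumed, and the class is a hypothetical stand-in for Wikicat.

```python
from typing import Callable, Optional

Record = dict


class Wikicat:
    """Hypothetical stand-in for the Wikicat catalog, seeded lazily from an external server."""

    def __init__(self, fetch_remote: Callable[[str], Optional[Record]]):
        self._records: dict[str, Record] = {}
        self._fetch_remote = fetch_remote  # placeholder for a Z39.50/SRU client

    def lookup(self, identifier: str) -> Optional[Record]:
        if identifier not in self._records:
            # Only reached the first time a source is cited from within Wikipedia.
            record = self._fetch_remote(identifier)
            if record is None:
                return None  # unknown to the server; do not cache the miss
            self._records[identifier] = record
        return self._records[identifier]
```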


See Wikicite (2006 proposal)/Technical Design for implementation details for all these components.

Adapters

Wikicite will be able to support various adapters for data visualization and transformation. For example, an adapter could be written to generate bibliographic data for an article in BibTeX format, or a References section could be generated automatically from the works an article cites. Indeed, the entire typographic apparatus of the footnote could be reproduced. This raises the question, though, of whether it should be, particularly given the superior capabilities (e.g. hyperlink support) of the standard Wikipedia user agent, i.e. the web browser. It is therefore strongly recommended that new approaches to data visualization be explored.
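A BibTeX adapter is essentially a formatter over catalog records. This sketch has three limits: the record keys are assumed rather than taken from Wikicat's actual attributes, only the @book entry type is handled, and BibTeX special characters are not escaped.

```python
def to_bibtex(key: str, record: dict) -> str:
    """Render one catalog record as a BibTeX @book entry, skipping absent fields."""
    order = ["author", "title", "publisher", "year", "isbn"]
    # {{{...}}} in an f-string yields the value wrapped in single braces.
    fields = [f"  {name} = {{{record[name]}}}" for name in order if name in record]
    return "@book{" + key + ",\n" + ",\n".join(fields) + "\n}"


record = {"author": "A. Author", "title": "An Example Book on Spiders", "year": 2001, "isbn": "EXAMPLE-1"}
print(to_bibtex("author2001", record))
```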

Project Milestones

The Wikicite project will advance according to the following milestones:

  • Wikicat "dress" implementation: Basic, "dress" design and implementation of Wikicat to support storing only those data sufficient for citation (e.g. title, date of publication, author name, etc.) This means only the Wikicat Manifestation entity will be modelled initially, and with only those attributes that can be populated through MARC import capabilities.
    • Status:
      • Datamodel design effectively complete
      • Implementation and integration in progress...
  • MARC import capabilities: Support importing bibliographic data into Wikicat through Z39.50
  • Wikipedia Fact-point extensions: Add extensions to support fact-points within Wikipedia
  • Article Validation Extensions: Add fact-point-centered article validation extensions to Wikipedia
  • WebCite Support: If a URL is the primary reference, it should be cached/archived using WebCite, which has an XML-based API. The original URL should be replaced with the WebCite link (www.webcitation.org/ID, where ID is the WebCite identifier, or the cited URL and a timestamp), or the WebCite link should be added after the cited URL. Caching ensures that the cited information remains accessible.
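The first milestones call for populating only those Manifestation attributes that MARC import can supply, which amounts to a field mapping. This sketch uses a few standard MARC 21 tags: 020 (ISBN), 100 (main entry, personal name), 245 (title statement) and 260 (publication). It uses a simplified in-memory record instead of a real MARC parser. The attribute names are assumptions, not Wikicat's datamodel, and newer RDA records that carry publication data in field 264 are ignored.

```python
# A MARC field simplified to (tag, {subfield code: value}); a real importer would parse MARC.
MarcField = tuple[str, dict[str, str]]

# MARC 21 (tag, subfield) -> catalog attribute: a small, illustrative subset.
MAPPING = {
    ("020", "a"): "isbn",
    ("100", "a"): "author",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}


def manifestation_from_marc(fields: list[MarcField]) -> dict[str, str]:
    """Keep only citation-relevant attributes, taking the first occurrence of each."""
    record: dict[str, str] = {}
    for tag, subfields in fields:
        for code, value in subfields.items():
            attr = MAPPING.get((tag, code))
            if attr and attr not in record:
                # Crudely strip trailing ISBD punctuation such as " /", ",", ".".
                record[attr] = value.strip(" /:,;.")
    return record


example = [
    ("020", {"a": "EXAMPLE-1"}),
    ("100", {"a": "Author, A."}),
    ("245", {"a": "An example book on spiders /", "c": "A. Author."}),
    ("260", {"a": "Example City :", "b": "Example Press,", "c": "2001."}),
]
print(manifestation_from_marc(example))
```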

Project Members

See also

Discussion/Historical

Related Projects

External links

  • BEAT: Library of Congress Bibliographic Enrichment Advisory Team
  • ISBNDB.com: ISBN database of 1.5M titles.
  • DRUMS: Proposal for a third-party centralized database of authored content.
  • Wiki Research Bibliography: Bibliography of scientific literature on wikis.