

From Meta, a Wikimedia project coordination wiki
This is a proposal for a new Wikimedia sister project.
Status of the proposal
Reason--Sannita (talk) 17:22, 17 September 2013 (UTC)

WikiTextrose (a portmanteau of "text" and "(compass) rose") is a text relationship database for mapping the various interactions between interpretable artifacts (i.e. "texts"). Though the project is inspired by long-established theories in the field of citation analysis, it expands upon these by considering all the ways in which one text may interact with another.



WikiTextrose elaborates upon traditional citation analysis by treating a citation as simply a specific type of text relationship with the following values for its Type and Mode attributes:

  • Type: the relationship is a positive evidentiary citation, that is, an appeal by one text to the authority of another
  • Mode: the relationship is immanent and explicit, with one text openly and knowingly referring to another

Yet there are many interesting text relationships with different values for both these attributes. Most obviously, a citation may be negative, with one text attacking the authority of another. In a particular literature such a relationship may be not just common but prevalent. For example, in the field of psychiatry, the psychoanalytic theories of Freud have come under increasing attack. In the field of historiography, methodologies and established interpretations are constantly being challenged.

In addition to Type, the Mode of a relationship may differ from what is assumed in traditional citation analysis. A relationship may be implicit rather than explicit, and transcendent (outside of either text) rather than immanent. As an example of the former, consider plagiarism or derivativity, where it is in the author's interest to hide or ignore any dependencies. As an example of the latter, consider the case of independent co-discovery, which historically occurred in the fields of calculus and evolutionary theory. Darwin and Wallace's papers introducing the theory of evolution are related (say, by the "independent co-discovery" relationship), yet this relationship exists outside either text.
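The Type and Mode attributes described above could be modeled roughly as follows. This is only an illustrative sketch: the class and field names (`Visibility`, `Locus`, `TextRelationship`, `rel_type`) are assumptions made for the example and are not part of the proposal.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    EXPLICIT = "explicit"   # one text openly and knowingly refers to another
    IMPLICIT = "implicit"   # the dependency is hidden or ignored


class Locus(Enum):
    IMMANENT = "immanent"          # the relationship is present within a text
    TRANSCENDENT = "transcendent"  # the relationship exists outside either text


@dataclass(frozen=True)
class Mode:
    visibility: Visibility
    locus: Locus


@dataclass(frozen=True)
class TextRelationship:
    source: str    # the text that bears the relationship
    target: str    # the text it is related to
    rel_type: str  # user-definable Type, e.g. "positive evidentiary citation"
    mode: Mode


def is_traditional_citation(rel: TextRelationship) -> bool:
    """True for the case classic citation analysis assumes:
    a positive evidentiary citation that is explicit and immanent."""
    return (
        rel.rel_type == "positive evidentiary citation"
        and rel.mode == Mode(Visibility.EXPLICIT, Locus.IMMANENT)
    )


# The Darwin/Wallace example: transcendent per the text above.
# Treating it as implicit is an assumption made for this sketch.
co_discovery = TextRelationship(
    source="Darwin's paper",
    target="Wallace's paper",
    rel_type="independent co-discovery",
    mode=Mode(Visibility.IMPLICIT, Locus.TRANSCENDENT),
)
```

Keeping Type as a free-form string rather than a fixed enumeration reflects the intention that users may create their own relationship Types.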

The next section describes different WikiTextrose use scenarios, each with a different set of typical text relationships. Relationships other than those listed are entirely possible for a particular use scenario, and may in fact give rise to new and valuable uses of the wiki. Users will thus be allowed to create their own relationship Types, and by categorizing these aggressively WikiTextrose should remain searchable and consistent.


WikiTextrose is a generalization of the Wikicite project proposal, and should be fully compatible with its main objectives; indeed, it is strongly recommended that the two projects be merged if at all possible.

As users create citations within the pages of external wikis, WikiTextrose will record these text relationships, thus serving the original purpose of the Wikicite proposal, which is to document the connections between Wiki articles and their evidentiary sources. As Wikimedia projects become more authoritative and are themselves cited (i.e. as inbound links to their articles are created in addition to all the outbound links currently originating from them), interesting new use cases with their own Types of text relationships may arise. Until then, though, the primary utility of WikiTextrose for these other projects will be as an auditing tool. For example, metrics may be run on articles to determine not only how extensive (say, as a percentage of total text within the article) are their citations, but also how credible these citations are, based upon authority metrics computed for their immediate sources.
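The two audit metrics mentioned above (how much of an article is backed by citations, and how credible those citations are) might be computed along these lines. The `TextBlock` structure and the 0-to-1 `source_authority` score are assumptions for illustration; the proposal does not define how authority would be measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextBlock:
    text: str
    # Hypothetical authority score of the cited source (0.0 to 1.0);
    # None means the block carries no citation at all.
    source_authority: Optional[float] = None


def citation_coverage(blocks: List[TextBlock]) -> float:
    """Fraction of the article's characters that sit in cited blocks."""
    total = sum(len(b.text) for b in blocks)
    if total == 0:
        return 0.0
    cited = sum(len(b.text) for b in blocks if b.source_authority is not None)
    return cited / total


def mean_source_authority(blocks: List[TextBlock]) -> Optional[float]:
    """Average authority of the sources actually cited, or None if there are none."""
    scores = [b.source_authority for b in blocks if b.source_authority is not None]
    return sum(scores) / len(scores) if scores else None


article = [
    TextBlock("x" * 30, source_authority=0.5),
    TextBlock("y" * 70),  # uncited assertion
]
```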

Use Scenarios


Citation Analysis


By modeling evidentiary citations as general text relationships, more granular and accurate citation analysis becomes possible. As discussed above, it is important to distinguish between negative and positive evidentiary citations when mapping the authority relationships within a literature. It is also useful to model the weight of each citation, which can be done by introducing different relationship Types based upon the general level of a citation's strength. A work that is referred to as "promising" is clearly not of the same authority as one that is called "definitive", a fact ignored by citation analysis schemes which measure influence only by frequency of reference.
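To illustrate the difference between counting references and weighing them, the sketch below sums a signed weight per citation. The Type names and the numeric weights in `STRENGTH` are invented for the example; the proposal only states that such strength levels could exist.

```python
from collections import defaultdict

# Hypothetical weights per relationship Type: positive values lend
# authority, negative values attack it.
STRENGTH = {
    "definitive": 1.0,
    "promising": 0.25,
    "discredited": -1.0,
}


def authority_scores(citations):
    """Sum signed citation weights for each cited text.

    `citations` is an iterable of (citing_text, cited_text, rel_type) tuples.
    """
    scores = defaultdict(float)
    for _citing, cited, rel_type in citations:
        scores[cited] += STRENGTH[rel_type]
    return dict(scores)


citations = [
    ("A", "X", "definitive"),
    ("B", "X", "promising"),
    ("C", "Y", "promising"),
    ("D", "Y", "promising"),
    ("E", "Y", "discredited"),
]
scores = authority_scores(citations)
```

In this sample, text Y is referenced more often than text X, yet X ends up with the higher score, which is the distinction a frequency-only scheme would miss.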

A citation/authority relationship text map.



The wiki will support evidentiary citations within historiographical texts at a level usable by the professional historian. That is, in addition to recording citations between secondary sources of interest to the general reader, the wiki can record dependencies on ultimate/primary sources (including unpublished ones such as manuscripts, archival materials, pottery sherds, etc.) and thus provide a catalog of all the ultimate source "texts" within a particular historiographical discipline.

Typical historiographical text relationships.



Though explicit citations are rare in the aesthetic genres of literature, many notable text relationships still exist in such categories of work as novels, plays, and poems which are of interest to both the literary scholar and general reader. Some possible relationship Types include:

  • Influence: Works of literature rarely escape the influence of predecessor texts, as when they reuse the latter's plots, characters, or themes. Mapping such influence relationships allows the development of popular tropes not only to be better understood, but also to be re-situated in its original interpretative contexts. For example, the plots of nearly all of Shakespeare's plays were taken from pre-existing sources, and that of Hamlet comes from a popular genre of the time called the "revenge play".
  • Textual Evolution: A text can undergo many changes before its final publication or expression. For example, the current, "canonical" version of Hamlet which we possess is in fact a composite of two different texts, one of them almost certainly an unauthorized, pirate printing.
Literary text relationships

News and Media Analysis


Evidentiary citations are an immensely important, if informally executed, part of news journalism. Nearly every credible newspaper enforces strict evidentiary standards on its reporters, such that almost all non-obvious assertions are supported with reference to either primary source texts (e.g. transcripts, official memos, eye-witness accounts) or appeals to expert/scholarly authority.

Yet much discretionary editing and interpretation go into the creation of a news article. Though primary sources may be cited, these almost always appear as excerpts from much longer texts, as in summaries of proposed legislation which mention only the bill's most "pertinent" features, or quotes from speeches by public officials which select only their most "significant" (usually defined as controversial) moments.

By mapping the text relationships which exist in news journalism and its derivative literatures (op-ed pieces, policy analysis essays, and the various other types of opinion journalism) the following use cases become possible:

  • Authority Analysis: When one news outlet obtains exclusive information on an important news story other outlets will cite it in their own reports. Recording such citations allows authority relationships within news journalism to be mapped and perhaps even quantified, such that questions like "Which is the most influential U.S. newspaper?" may be answered. Another type of authority relationship is "prescience", where credit may be given to news outlets which diligently cover a particular issue well before its importance is belatedly acknowledged by other, perhaps more famous, news outlets.
  • Credibility/Bias Analysis: Some primary sources are of questionable authority. For example, a scholar may be controversial within his field, while think tanks (whose press releases often directly result in news stories) may be noted for their ideological bias. By backtracking to those texts which either attack or support the credibility of a particular source, the veracity of news stories relying upon it may be better understood. Another possible use case is to document (mis-)usage of primary sources within opinion journalism, with new citation Types introduced for different scenarios; for example, when a primary source is willfully misconstrued this may result in a relationship Type of, say, "tendentious citation", being asserted between the two texts.
  • Historiographical Cataloging: Mapping citations to such primary source texts as interview transcripts, policy memos, etc., produces a convenient catalog for the historian gathering his data.
  • Engaged Reading: Links to primary sources are of use to active/engaged readers (such as bloggers) who desire a less mediated news consuming experience, or who wish to engage in criticism/analysis of dominant media outlets.
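As one possible reading of the "prescience" relationship in the Authority Analysis item above, the following sketch credits each outlet with the number of days by which its first report preceded wider acknowledgment of an issue. The function name, the `acknowledged_on` cut-off, and the sample outlets and dates are all hypothetical.

```python
from datetime import date


def prescience_lead_days(first_coverage, acknowledged_on):
    """Days by which each outlet's first report preceded wider acknowledgment.

    `first_coverage` maps an outlet name to the date of its first report.
    Outlets that reported on or after `acknowledged_on` receive no credit.
    """
    return {
        outlet: (acknowledged_on - first_report).days
        for outlet, first_report in first_coverage.items()
        if first_report < acknowledged_on
    }


# Hypothetical outlets and dates, for illustration only.
coverage = {
    "Outlet A": date(2004, 1, 10),
    "Outlet B": date(2004, 3, 1),
    "Outlet C": date(2004, 3, 20),
}
leads = prescience_lead_days(coverage, acknowledged_on=date(2004, 3, 1))
```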

Inter-Wiki Collaboration




WikiTextrose may be used in conjunction with Wikipedia to improve the quality (particularly the authority) of the latter's articles, with several types of usages possible depending upon how closely the two wikis are coupled. At the loosest level of integration, WikiTextrose may simply serve as a reference work, allowing users of Wikipedia to discover the most authoritative sources on a particular topic and then incorporate them into relevant Wikipedia articles.

Much closer couplings are possible as well. Ideally, every text block within a Wikipedia article containing a factual assertion (in principle nearly every text block) will cite the evidence upon which it relies. This text block may then be rendered (say, using alternate colors) according to the strength of its evidentiary citation: grey for a block with no citations to back up its assertions, red for a section which relies on a discredited work, etc. A possible citation syntax may be:

[[cite:scheme:key:index|cited text|paraphrase]]

For example:

[[cite:isbn:067943593X:p11|"In the Second Century of the Christian Era, the empire of Rome
comprehended the fairest part of the earth and most civilized portion of mankind."|
Gibbon describes the Roman empire at the time of the Antonines in very favorable terms.]]

Note that the syntax completely encloses the citing (paraphrase) text, allowing those sections of an article that are with evidentiary backing to be easily distinguished from those that are without by the page renderer. Note also that the cited text (unparaphrased and as it appears within its source) may be included in the citation mark-up as well, allowing the reader to verify its accuracy if they so wish.
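A page renderer would first need to extract these citations from the article text. The following is a minimal sketch of such a parser, assuming the scheme and key never contain a colon or pipe and the cited text never contains a pipe; the names `Citation` and `parse_citations` are invented for this example.

```python
import re
from typing import List, NamedTuple

# Matches [[cite:scheme:key:index|cited text|paraphrase]], possibly
# spanning several lines.
CITE_RE = re.compile(
    r"\[\[cite:(?P<scheme>[^:|\]]+):(?P<key>[^:|\]]+):(?P<index>[^|\]]+)"
    r"\|(?P<cited>[^|]*)\|(?P<paraphrase>.*?)\]\]",
    re.DOTALL,
)


class Citation(NamedTuple):
    scheme: str
    key: str
    index: str
    cited_text: str
    paraphrase: str


def _tidy(text: str) -> str:
    """Collapse line breaks and runs of whitespace into single spaces."""
    return " ".join(text.split())


def parse_citations(wikitext: str) -> List[Citation]:
    """Return every citation found in `wikitext`, in order of appearance."""
    return [
        Citation(
            m["scheme"], m["key"], m["index"],
            _tidy(m["cited"]), _tidy(m["paraphrase"]),
        )
        for m in CITE_RE.finditer(wikitext)
    ]


sample = (
    '[[cite:isbn:067943593X:p11|"In the Second Century of the Christian Era, '
    'the empire of Rome\n'
    'comprehended the fairest part of the earth and most civilized portion '
    'of mankind."|\n'
    'Gibbon describes the Roman empire at the time of the Antonines in very '
    'favorable terms.]]'
)
gibbon = parse_citations(sample)[0]
```

Applied to the Gibbon example, the parser yields the scheme `isbn`, the key `067943593X`, and the index `p11`, with the quotation and the paraphrase as separate fields.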

As part of new style guidelines, editors could be encouraged to always include factual assertions within this citation markup, using special stub citation mark-up for facts they cannot immediately establish:


Such evidentiary "holes" can then be made conspicuous by the renderer, providing an immediate visual overview of both the quality of an article as well as which portions of it are most eligible for improvement:

Citation "hole" rendering.
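The color-coded rendering described in this section could be sketched as below. Only the grey (no citation) and red (discredited source) cases come from the text; the status strings, the black fallback, and the HTML `span` output are assumptions made for illustration.

```python
from html import escape
from typing import Optional

# None stands for a block with no citation at all (a "hole").
COLORS = {
    None: "grey",
    "discredited": "red",
}
DEFAULT_COLOR = "black"  # assumed color for blocks with acceptable citations


def render_block(text: str, source_status: Optional[str] = None) -> str:
    """Wrap a text block in a span colored by the status of its citation."""
    color = COLORS.get(source_status, DEFAULT_COLOR)
    return f'<span style="color: {color}">{escape(text)}</span>'


hole = render_block("An assertion with no source yet.")
weak = render_block("A claim resting on a retracted study.", "discredited")
```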

See the following discussion of citation mark-up on the Wikitech Mailing List

The drawback of any close integration is the increased computational resources necessary to render each page. One way to mitigate this impact is to cache citation information, recomputing it only on a scheduled or explicitly requested basis.
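One way to picture this caching strategy is the sketch below: a cached value is reused until it exceeds a maximum age (the scheduled case) or is explicitly invalidated (the requested case). The `CitationCache` class and its parameters are hypothetical and do not correspond to any existing MediaWiki component.

```python
import time


class CitationCache:
    """Caches per-page citation information computed by `compute`."""

    def __init__(self, compute, max_age_seconds, clock=time.monotonic):
        self._compute = compute          # expensive function: page title -> info
        self._max_age = max_age_seconds  # how long an entry stays fresh
        self._clock = clock              # injectable so the sketch can be tested
        self._entries = {}               # page title -> (computed_at, info)

    def get(self, page):
        """Return cached info, recomputing only if it is missing or stale."""
        now = self._clock()
        entry = self._entries.get(page)
        if entry is None or now - entry[0] >= self._max_age:
            entry = (now, self._compute(page))
            self._entries[page] = entry
        return entry[1]

    def invalidate(self, page):
        """Force recomputation on the next request for `page`."""
        self._entries.pop(page, None)
```

A renderer would call `get` on every page view and `invalidate` only when an editor explicitly asks for fresh citation data.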



There is the potential for highly synergistic collaboration between WikiTextrose and Wikinews. As described in the use case section for "News and Media Analysis", WikiTextrose can be used not only as a reference tool when writing new articles, but also as a media analysis tool whose results may in themselves be newsworthy.

External Competition

  • Amazon.com: Amazon has begun cataloging citations between the printed works that it carries. This cataloging is automated, and so while WikiTextrose will not be able to (at least initially) match Amazon in the breadth of its coverage, it can exceed it in detail.





WikiTextrose depends upon a highly structured data model, and so has Wikidata as its chief technical prerequisite. An optional technical requirement is a graphics library for rendering graph/tree structures so that text relationships may be more easily visualized.

Data Model


The following are the primary entities within WikiTextrose:

  • Texts: i.e. interpretable artifacts/sites of interpretation
  • Text Relationships: a relationship between two Texts, with its primary attribute being Type
  • Text Relationship Relationships: relationships between Text Relationships, the most important being "sub-category" or "subclass"; the purpose of this relational table is to categorize new relationship Types so that the texts they join will appear for searches done for more general relationship Types
  • Containers: a logical or physical grouping of Texts; for example, a newspaper article could have as its Container a particular issue of The New York Times, which is itself contained within all issues of The New York Times for that year, as well as within all issues of The New York Times ever published, etc.
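The role of the Text Relationship Relationships entity can be illustrated with a small in-memory sketch: a search for a general relationship Type also returns relationships whose Type was declared a subclass of it. The `RelationshipCatalog` class and its method names are assumptions for this example; an actual implementation would presumably live in Wikidata rather than in Python objects.

```python
from collections import defaultdict


class RelationshipCatalog:
    def __init__(self):
        self._parents = defaultdict(set)  # Type -> its more general Types
        self._relationships = []          # (source text, target text, Type)

    def add_subclass(self, subtype, supertype):
        """Record a "subclass" relationship between two relationship Types."""
        self._parents[subtype].add(supertype)

    def add_relationship(self, source, target, rel_type):
        self._relationships.append((source, target, rel_type))

    def _is_a(self, rel_type, general_type):
        """Walk up the subclass links to see if rel_type falls under general_type."""
        seen, stack = set(), [rel_type]
        while stack:
            current = stack.pop()
            if current == general_type:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self._parents.get(current, ()))
        return False

    def search(self, general_type):
        """All relationships whose Type is general_type or one of its subclasses."""
        return [r for r in self._relationships if self._is_a(r[2], general_type)]


catalog = RelationshipCatalog()
catalog.add_subclass("tendentious citation", "negative citation")
catalog.add_subclass("negative citation", "citation")
catalog.add_relationship("Op-ed", "Policy memo", "tendentious citation")
catalog.add_relationship("Play", "Chronicle", "influence")
```

Here a search for the general Type "citation" would also surface the op-ed's "tendentious citation", while the unrelated "influence" relationship is left out.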

Strictly speaking, only the Text Relationship and Text Relationship Relationship entities are needed. Indeed, not only is the Text entity suitable for hosting within its own Wikidata dataset, but for efficiency's sake it may be necessary to pursue integration with the WorldCat catalog.

In addition, the "Functional Requirements for Bibliographic Records" recommends modeling such entities as People, Places, and Events so that Texts may be searched according to their relationship with these entities. Obviously such entities will be of use to other Wikimedia projects and should be hosted in their own Wikidata datasets.