WMDE Technical Wishes/extending references

Tracked in Phabricator:
Task T100645
Example Refined Book Referencing

The wish

Currently, when editors reference different parts of the same work, they have to repeat all the information about this work in each reference. This is cumbersome, lengthens the wikitext, clutters the references section with repeated appearances of the same source, and obscures how many references actually come from the same work.

This wish is about extending references so that when several pages from the same work are used as references in an article, it is not necessary to name the whole work in each reference. This wish has existed in the community for more than 10 years. It was voted a top wish in the Technical Wishes surveys of 2013 and 2015 and ranked #24 in the international Community Wishlist Survey 2015.

Status

  • Available for testing on the beta cluster (example article)
  • Support for Visual Editor is still lacking
  • July 2021: No further work on the project by the Technical Wishes team.
  • 2022-2024: Plans to return to working on this and other improvements; see WMDE Technical Wishes/Reusing references

Why this wish is cancelled

We have been working on this project for a long time and would very much have liked to finish it. Repeated inquiries showed us that at least some editors are eagerly awaiting the feature. However, after considering all the possibilities and risks, we concluded that the case for discontinuing the project is stronger, because there are too many uncertainties and the prospects for success are poor:

  • The implementation for the VisualEditor would be very extensive and could not be accomplished by the Technical Wishes team outside of a focus area.[1] However, it does not make sense to withhold a central function from VisualEditor users for the foreseeable future, because everyone should be able to participate equally in the wikis. Moreover, making it available only in source editing would further divide the experience of VisualEditor and wikitext users, which in turn could increase conflicts among editors.
  • It is also difficult to foresee how much more effort it would take to finish the feature for source editing. It is well advanced, but in the past we have often run into problems that set us back months, and we have to expect the same again. We want to prevent this wish from continuing to absorb a lot of time that is then missing from other projects. Further delays are already looming, for example in coordinating with the WMF Editing team (because changes to source editing always have an impact on the VisualEditor as well).[2]
  • In addition, the wish received only 10 points in the 2013 survey and 17 points in 2015. Although overall participation in the surveys was lower then, these are very few votes compared to the focus areas (Templates: 298; Geo-information: 280). Instead of spending a lot of time on this request, we would rather focus on topics that can provide improvements for many users.

In light of all this, we want to make a clean break rather than keep the project in limbo for another few months or years. However, it is conceivable that the wish could be taken up again in the course of an upcoming survey, provided that a suitable topic wins the vote (for example, "working with references" or "maintenance") and this project turns out to be the biggest problem in it. Also, the wish could be submitted to the Community Wishlist of Team Community Tech (WMF). A related wish made it to #54 there last year.

A small consolation, perhaps, is that the work of the Technical Wishes team has not been in vain, because we reworked and modernized the Cite extension (the part of the software that generates footnotes) from the ground up while working on this wish. This will make it much easier to make improvements in this area in the future.

Footnotes

  1. The implementation for the VisualEditor would be a large standalone project. There are no Phabricator tickets for this yet.
  2. Even if we were to provide support for wikitext only, there would need to be minimal integration for the VisualEditor so that, for example, extended references created in wikitext could not be accidentally broken in the VisualEditor. Some challenges are described in T245299.

The planned solution

Please note that this project has been cancelled. The following description covers how the feature was supposed to work.

Our solution aims to reduce duplicate content both on the input side and on the output side:

Output: What would it look like on the rendered page?

A mockup of what a rendered page could look like with book referencing
  • Refinements are shown as indentations below the main reference and get their own subordinate reference number. So e.g. when the work called "Pierson" is reference number 1, the refinement "pp. 123–163" could be shown as 1.1.
    • The format 1.1, 1.2 is only one example of how this could look. We expect that each wiki will be able to decide individually how to format their refinements, like they already can with multiple uses of the same footnote.
    • There will be only one degree of indentation, meaning that e.g. 1.1.1. won’t exist because adding more degrees would cause a lot of complexity and create many edge cases.
  • The proposed solution doesn’t restrict how references are being refined. It can be used for page numbers just as well as verses, chapters, etc.
  • This solution works independently of language, script and writing direction.
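
The numbering scheme in the list above can be illustrated with a minimal Python sketch. It is not the Cite extension's implementation: the function name `footnote_labels`, the second work "Miller", and the refinement "p. 201" are hypothetical, and the sketch assumes that every refinement receives a fresh sub-number and that the "1.1" format is used, which each wiki could configure differently.

```python
from typing import Dict, List, Optional, Tuple

# A citation is (name of the main reference, refinement text or None).
Citation = Tuple[str, Optional[str]]


def footnote_labels(citations: List[Citation]) -> List[str]:
    """Return one footnote label per citation, in order of appearance."""
    main_number: Dict[str, int] = {}  # main reference -> top-level number
    sub_count: Dict[str, int] = {}    # main reference -> refinements issued so far
    labels: List[str] = []
    for name, refinement in citations:
        # A main reference receives its number the first time it is cited.
        if name not in main_number:
            main_number[name] = len(main_number) + 1
        if refinement is None:
            labels.append(str(main_number[name]))
        else:
            # Assumption: every refinement gets a fresh sub-number.
            # Only one level exists, so labels like 1.1.1 never occur.
            sub_count[name] = sub_count.get(name, 0) + 1
            labels.append(f"{main_number[name]}.{sub_count[name]}")
    return labels


example = [
    ("Pierson", "pp. 123–163"),  # refinement of the work "Pierson"
    ("Miller", None),            # hypothetical second work, cited as a whole
    ("Pierson", "p. 201"),       # hypothetical second refinement
]
print(footnote_labels(example))  # ['1.1', '2', '1.2']
```

Because the refinement is treated as opaque text, the same logic applies to page numbers, verses, or chapters.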

Input: What does this mean for referencing with …

… wikitext / <ref> tags

A mockup of how this solution could work in wikitext. Please note that the red highlights are only here for demonstration; they are not part of the technical implementation.

If you’re using wikitext for references, the solution would look like this: You define the work that is referenced multiple times once, with a name attribute, e.g. name="Pierson". Then, each time a part of this work is referenced, you use the new attribute, extends. So for example, <ref extends="Pierson">pp. 123–163</ref> would add the refinement “pp. 123–163” in the references section under the reference with the name “Pierson”. In the screenshot, the main reference is defined in the references section. The final version of the feature will also allow the main reference to be defined within the wikitext body without creating an unused jump mark. It’s on our to-do list to come up with a syntax for it.
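
To make the grouping concrete, here is a small Python sketch that reads a wikitext sample and collects the refinements under their main reference. It is a toy model, not the Cite extension's parser: the sample text, the placeholder bibliographic entry, the refinement "p. 201", and the helper names `group_references` and `render` are all hypothetical. The regular expression only handles a single double-quoted name or extends attribute per tag, and the attribute name extends itself had not been finalized.

```python
import re
from typing import Dict, List

# Hypothetical article source. The bibliographic text is a placeholder.
WIKITEXT = """
First claim.<ref extends="Pierson">pp. 123–163</ref>
Second claim.<ref extends="Pierson">p. 201</ref>

<references>
<ref name="Pierson">Pierson (full bibliographic details)</ref>
</references>
"""

# Toy pattern: one double-quoted attribute per tag, either name or extends.
REF_TAG = re.compile(r'<ref\s+(name|extends)="([^"]+)"\s*>(.*?)</ref>', re.DOTALL)


def group_references(wikitext: str) -> Dict[str, dict]:
    """Collect each named main reference together with its refinements."""
    groups: Dict[str, dict] = {}
    for attribute, key, body in REF_TAG.findall(wikitext):
        group = groups.setdefault(key, {"text": "", "refinements": []})
        if attribute == "name":
            group["text"] = body.strip()               # the main reference
        else:
            group["refinements"].append(body.strip())  # e.g. a page range
    return groups


def render(groups: Dict[str, dict]) -> List[str]:
    """Render the groups as an indented reference list."""
    lines: List[str] = []
    for number, group in enumerate(groups.values(), start=1):
        lines.append(f"{number}. {group['text']}")
        for sub, refinement in enumerate(group["refinements"], start=1):
            lines.append(f"    {number}.{sub} {refinement}")
    return lines


print("\n".join(render(group_references(WIKITEXT))))
```

With the sample above, the script prints the main reference as entry 1 and the two refinements indented beneath it as 1.1 and 1.2.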

… templates

If you’re using templates for references, whether anything changes depends on the template’s maintainers. Templates build on wikitext syntax, and since the wikitext syntax for our proposal is optional, template maintainers can decide whether they want to adapt their templates.

… the Visual Editor

A first mockup of what book referencing could look like in the Visual Editor

If you’re using the Visual Editor, you will be able to use the new model as well. For example, it could look as shown in this mockup.

You don’t have to change your working mode, but you can.

The suggested solution doesn’t force anyone to change their working mode. All previous methods for referencing are still possible without restriction.

Please note

General improvements to the Cite extension

While working on this feature, the Technical Wishes team made some general improvements to the Cite extension: for example, some broken code was removed, and the entire code base was restructured for better performance and maintainability.

Research

The Technical Wishes team researched how editors use references. Key insights were:

  • There are very different styles of referencing.
    • Some people prefer long references. Benefit: When you move parts of the text, the references are still defined. Also, when reading through the wikitext, you don't need to scroll anywhere to find out what is referenced.
    • Others prefer short references. Benefit: The wikitext syntax is much cleaner and shorter. Also, you don't need to repeat what you have written, which leaves fewer opportunities for error.
  • Oftentimes, referencing is done with templates.
  • The problem is not limited to page numbers; it also applies to blog articles, religious verses, etc.

Based on these insights, a solution was conceptualized and presented to the German-speaking audience in 2016 and to an international audience in 2018 (see User feedback).

Technical feasibility

Discussions with stakeholders at the Wikimedia Foundation showed that this solution would be doable from a technical viewpoint.

User feedback

Summary of the feedback round, May 2018

39 people from 12 different wikis participated in this feedback round; most of them (about 66%) are primarily active on English Wikipedia. The feedback showed a mixed picture: nearly half of the participants (~46%) were in favor of the suggested solution, and more than one quarter (~28%) opposed it. Another quarter (~25%) could not be clearly classified as Support or Oppose.

Would the proposed solution help you in your daily work?

A lot of the feedback that was given was based on preferences: For example, some participants find {{sfn}} easier to edit and its output easier to read, others consider {{sfn}} hard to edit and to read. The same can be said for other solutions, including the one proposed in this round.

Relation of pages with citation templates to all content pages of 10 different Wikipedias (Status: September 2018).
Example: English Wikipedia has a total of 5,717,161 content pages. 76,616 (1.34%) of them use the {{sfn}} template, 30,641 (0.54%) use {{rp}}.

Support votes

Preferences aside, the main reason given in support was that the proposed solution is an integrated functionality, not based on templates. These issues with templates were mentioned:

  • Templates are hard to learn for new editors.
  • Templates increase server load.
  • Templates aren’t machine-readable, which makes processing for bots harder.
  • Not all wikis use templates. An integrated solution would work in all wikis, independent of which templates they use. The graph shows the distribution of different templates in the primary wikis of the participants in this feedback round.

Other reasons

  • Several people just voted in support, without giving a reason.
  • The proposed solution is based on an existing system.
  • For readers, it’s more transparent how often one source was cited, both in the reference section and in the article text itself.

Oppose votes

The main reasons for opposing were:

  • Introducing yet another citation system makes citing even more complex.
  • Subsidiary footnotes like 1.1, 1.2 etc. are not known in the academic world.
  • A subsidiary numbering system can make the reference list inconsistent because there will be sources that were cited only once and others that were cited several times.
  • The proposal talks about grouping citations, but it is actually about grouping notes, which can also contain explanatory text, charts, etc. This mixup of terminology is common and might indicate an underlying conceptual problem.
  • The proposed solution is too shortsighted: You are concerned with individual trees, but there is a forest out there. Not having to repeat the reference in one article is nice, but what about other articles? And how would we be able to tell just how many articles cite the same book or paper? Look at CiteSeerX on how to do it right.

For wikitext users: Which name do you prefer for the attribute within the <ref> tag?

Comparatively few people (12) participated in the naming question. There was no clear majority for any of the suggestions.

Conclusion

Weighing the pros and cons from the feedback rounds, we think that the benefits of implementing an integrated functionality prevail. That’s why we decided to go ahead with the proposal.
One big benefit of the solution is that the definition of a name attribute reduces the need to repeat information. Furthermore, the bundling allows you to see at a glance if the article is based on only one source. What we especially like about the solution is that it does not keep anyone from using another style: Because it enhances an existing MediaWiki feature, it allows people who can’t or don’t want to use templates to group citations from the same source. At the same time, the existing templates can still be used, and the communities can (but don’t have to) adjust them to make use of the new formatting. Although the proposal changes the order in which notes are displayed, we believe that this will not be very problematic. Unlike readers of academic works, which are primarily made for print, readers of Wikipedia articles follow the links of the citations and therefore do not depend on a strict ordering of references.

Open tasks to be tackled next:

  • make a final decision on the name of the attribute,
  • find a way to define a reference and a subreference at the same time,
  • investigate how citations with many subreferences can be displayed clearly.