User:Okeyes (WMF)/Localising page curation

From Meta, a Wikimedia project coordination wiki

A consistent problem when building new software for a process is that different local solutions and policies already exist. Incorporating these solutions and policies into replacement software, even when feasible, can take up a lot of engineering time and produce vastly different versions of the same software from wiki to wiki.

Some possible solutions to this problem are discussed below. The argument concludes that the simplest way to work with community solutions, without taking on the onerous burden of constantly writing and maintaining custom software, is to introduce a new backend schema. This schema would identify the editor-modifiable elements of code that relate to community solutions and allow those elements to be modified in a user-friendly fashion.


The Editor Engagement (EE) team has scheduled a large number of projects to help address the editor retention problem. Many of them focus on core areas of a website's workflow where MediaWiki has either traditionally been deficient or simply hasn't kept pace with the rest of the internet. In both cases, we need to deploy the new software as widely as possible to see maximum impact: the English-language Wikipedia is one of over 400 projects we maintain. And in both cases, the weaknesses of the existing software mean that local communities have built process, policy, template and JavaScript-based frameworks around it to compensate for its gaps. These frameworks exist in subtly different forms on each project. The differences may be as small as different template names for what is functionally the same thing, or as large as entirely novel processes, or the absence of processes that other wikis have.

This makes localising our projects rather difficult. Instead of merely being about language, for which we have an excellent framework, it's about process and social differences. This is complicated by the relatively small number of languages spoken at the Wikimedia Foundation compared to the number of projects we support. As well as being difficult, this is also vital: we cannot build what is essentially core software for the English-language Wikipedia alone. We have a responsibility to focus our efforts on things that can help all of our projects in need. Nor can this mean merely working on things that do not interact with existing processes: we need to find a middle ground in which we can both fix broken parts of the existing setup and distribute these fixes widely.

As well as being an ethical problem, this is also a practical one. Our challenge is to raise editor retention rates overall. If enwiki has a pool of, say, 200 new editors a month who can be reached by Page Curation, Echo and the rest, we have a chance of converting some subset of those 200 into power users. If we have an easy way of releasing this software to French, Spanish, German or Japanese projects, that pool grows substantially because we're engaging with far more distinct users: more bang for our engineering buck.

High-risk projects

Page Curation

The Page Curation/Page Triage extension (currently deployed on the English-language Wikipedia) is an attempt to improve on the software and workflow used for triaging new pages. Before Page Curation was released, enwiki patrollers used only Special:NewPages, a part of the core MediaWiki software that was released in 2005 and has kept a fairly consistent form since then. Because of growing deficiencies in that software, and in MediaWiki generally, the enwiki community built a large amount of markup-based architecture and process around patrolling. This included policies governing what could be deleted and when, templates used both for notification and for marking what had been legitimately tagged, and JavaScript tools to semi-automate tagging and patrolling new pages.

Our solution was the Page Curation extension, which can be seen live on the English-language Wikipedia. Large numbers of enwiki users have praised it as at least a partial solution to the problems they were struggling with, and one that integrates well into local workflows. This is due less to a careful weighing of the costs of localisation, however, than to the fact that we built it to deal with precisely the situation enwiki faces. It does not integrate easily into other sites, being based at least in part on code that must be manually updated by developers. Localising it to other sites has proven tricky: we have several outstanding requests for deployment, all of which have gone unfulfilled because the engineering time needed makes them unattractive compared to the not-yet-done projects we have to work on.

Echo

The upcoming E2 project is Echo, an interwiki notification system intended to give editors real-time updates on the status of their contributions, their involvement in the wider Wikimedia movement and their place on the site they call home. With finely grained notification types, and classes of notification aimed at everyone from new users to the longest-serving administrators, it aims to provide a new way of staying engaged with Wikimedia sites day in, day out.

Some examples of the notifications it plans to provide are:

  1. Informing users of the Teahouse, a friendly venue for new editors to ask questions of experienced community members;
  2. Telling people when their files or pages have been nominated for deletion;
  3. Informing people of discussions on the Village Pump or Administrators' Noticeboard;
  4. Letting people know when their contributions have been classed as "good" or "featured";
  5. Links to "how to contribute" pages.

All of these features have something in common: they are highly dependent on (a) local processes and templates and (b) our knowing that a localised version of each process exists, and how to call it. This is another area where we risk stumbling into the same trap as with Page Curation: building something that could make a real difference to editor retention and to the workload of editing, but that is highly labour-intensive to localise.

Possible solutions

Build for localised workflows, manually localise on each wiki

The most obvious mechanism (and the one we've been sort of using so far) is to build a version around the local workflows on, say, enwiki, and then localise the code for other projects with different processes as and when that becomes possible. This has exactly one advantage: it reduces the burden on our communities, who don't have to do anything beyond answering a few questions and waiting for deployment. The disadvantages are substantial:

  1. The immediate burden on developers increases. Instead of writing one piece of software, they have to write that software plus potentially hundreds of localised fragments of it for future use;
  2. The long-term burden on developers increases. Their maintenance work moves from "something is uniformly broken; fix it" to "a specific project has changed part of its community workflow, which you are now responsible for";
  3. We lack language skills. We have projects in hundreds of languages, and the staff do not, collectively or individually, speak hundreds of languages, unless we expand the definition to include Clojure. As a result, localising can take an incredibly long time, because we're at least partly reliant on local volunteers getting back to us about which template they use for X and Y (or whether they use X at all);
  4. It artificially limits the projects to which we can push things. In practice we depend on local volunteers showing up to help translate their process into our way of working, which limits us to "those projects that have someone technically knowledgeable who can use Bugzilla and speak English". It also means we tend to wait for consensus to develop on specific projects, which is not appropriate for some of our software. And because of the burden localisation places on developers, the number of projects we can push to falls well short of the ideal of "all of them".

Build for localised workflows, produce a stripped-down version

Another approach is to build for, say, enwiki, and then release stripped-down versions for other projects. Examples would be Page Curation consisting only of the generalisable New Pages Feed elements, without the Curation Toolbar, or Echo with only those notification classes that are based on features in MediaWiki. This has some advantages:

  1. It means we can afford to deploy far more widely, theoretically to all of our projects. The only elements that need localising are those that need translating in the literal sense, and translatewiki can handle them;
  2. The long-term burden on developers remains where it is today: they are responsible only for the general software and one localised workflow, not for a plethora of different localised variants.

The disadvantages are twofold. First, it will take some effort to develop versions of existing and ongoing projects that contain nothing referring to local social norms or practices. Second, it dramatically reduces the quality and quantity of software we offer to the vast majority of our projects. In 90 percent of cases, they will get something inferior to the Platonic form of that software, with fewer features or editorial aids. In the remaining 10 percent, we will be developing something purely social (for example, Erik's plan to software-ify WikiProjects), and most sites will simply get none of the software. In both cases we are effectively creating two tiers of software support.

Build for general workflows, release widely

A third option is simply to stop building for local workflows altogether: produce one uniformly generalisable piece of software that depends on, and interacts exclusively with, MediaWiki and its extensions. For Page Curation this would mean stripping out the Curation Toolbar and some elements of the metadata; for Echo, all elements of the software that interact with community workflows, processes or templates. The advantages of this are:

  1. Developer burdens are actually reduced in the long term. At the moment developers handle a generalised version plus one localised variant; under this approach, they would handle only the generalised version.
  2. Our software development timetables can either be slimmed down or include more general-purpose features. Because we're only building generalisable software, we free up time that would otherwise be spent on localisable fragments, which we can use to pursue additional "nice to haves" or to move on to a new project.

The disadvantages are fairly obvious: we're writing software that sucks for everyone uniformly instead of software that sucks for most people. Page Curation without the Curation Toolbar is barely an improvement; Echo without the interactions with community workflows leaves out many of the things we should be pointing new users to. We won't see as much of an improvement as we would with uniformly or partially localisable software, and there will be large areas of the wikis that we cannot touch without either writing irrelevant software (see the WikiProjects example above) or trampling community processes, angering people, reducing take-up of the software and potentially introducing what is, locally, an inferior product.

Build for localised workflows, move localised elements into community-editable formats

One idea is to build for, say, enwiki, and then move localised (or localisable) elements into a community-editable place. For example, Page Curation could load the individual Curation Tool elements from a page in the MediaWiki namespace, which would list each entry, the template it links to and the description the JavaScript should display. Any administrator could then go in and tweak things when policy or social norms change and criteria need to be added or removed. When deploying on a new wiki, we would provide a commented-out example of how it works, which the translatewiki team would translate. Advantages of this:

  1. It moves the localisation burden onto the community, by making it their task to localise things;
  2. It means we can deploy widely while still writing localisable elements;
  3. It reduces the long-term and short-term burden on developers: as well as making initial installation the task of the users at the other end, it does the same for changes required by policy or template alterations.

The disadvantages, however, are substantial, simply because this relies on people who know JavaScript and are familiar with the core code. Such people are few and far between: we cannot expect to find them on every project (or even on most projects), and should they run into difficulties, the best-case scenario is that the extension is improperly deployed. The worst case is that they take up a lot of developer time setting it up. This approach has actually been tested before: Kaldari's WikiLove extension was designed with this kind of localisation in mind. JavaScript files covering things the community might want to replace were stored in the MediaWiki namespace, allowing individual wikis to set up WikiLove without Kaldari's direct involvement. In theory, this was elegant. In practice, the demands it placed on the community meant that Kaldari, instead of spending his time writing JavaScript, spent it explaining how to write JavaScript to people who didn't necessarily speak English very well.
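
As a rough illustration of the community-editable page described in this section, the sketch below parses a hypothetical plain-text format in which each entry line gives a tool key, the template it applies and the description to display. The format, the "#" comment convention and the names `CurationTool` and `parseToolDefinitions` are assumptions made for this example, not part of Page Curation.

```typescript
// Hypothetical entry format for a community-editable page (an assumption,
// not Page Curation's real format):
//   * key|Template name|Description shown in the toolbar
// Lines starting with "#" are comments, e.g. the translated example block.

interface CurationTool {
  key: string;
  template: string;
  description: string;
}

function parseToolDefinitions(pageText: string): CurationTool[] {
  const tools: CurationTool[] = [];
  for (const rawLine of pageText.split("\n")) {
    const line = rawLine.trim();
    // Only lines beginning with "*" are entries; "#" comments, blank lines
    // and any other text are ignored.
    if (!line.startsWith("*")) {
      continue;
    }
    const fields = line.slice(1).split("|").map((field) => field.trim());
    // Skip malformed entries instead of breaking the whole toolbar.
    if (fields.length !== 3 || fields.some((field) => field === "")) {
      continue;
    }
    const [key, template, description] = fields;
    tools.push({ key, template, description });
  }
  return tools;
}

// Usage: the commented-out example stays inactive until an admin uncomments it.
const examplePage = [
  "# * notability|Example notability template|No indication of importance",
  "* advert|Example advert template|Unambiguous advertising",
  "* this line is malformed",
].join("\n");

// Logs an array containing only the "advert" entry.
console.log(parseToolDefinitions(examplePage));
```

Skipping malformed lines limits the damage a mistaken local edit can do, but editors still have to learn a rigid syntax with no guidance beyond the example, which is the kind of burden described above.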

Build for localised workflows, introduce a more user-friendly modification process

So we can't build without localised workflows if we want our software to be truly generalisable and to maximise its impact on each project. On the other hand, we can't build with localised workflows if the localising process takes up a substantial amount of development time and leaves us responsible for a wide variety of future maintenance. The solution would seem to be to build with localised workflows in mind and put the localisation burden on the community. As explained above, past attempts at this have failed, not because nobody was willing to take on the burden, but because the burden was made artificially high by the complexity of the mechanism for modifying community-centric parts of the software.

Users were expected to know JavaScript and to implement it through a wiki page. Their only prompts for what each element consisted of were variable names, which referred to code they could not necessarily see. Localising a product under such a setup is both initially galling, which reduces the number of people who try it at all, and difficult to finish without external support: support that completely negates the point of running deployments like this in the first place.

The solution is simply to implement a better way of making code malleable by local communities: a nicer interface that hides the code and presents only the things they would actually want to tweak, namely adding new options, removing old ones and changing the strings used to describe options in the user interface. This would require three things. First, a schema that can be used to identify "alterable" strings, templates or elements of code. Second, an extension that can read in schema elements from other extensions featuring modifiable code and present them to the community in a user-friendly format. Third, refactoring Echo and Page Curation to make them schema-friendly (and designing future software projects with the schema in mind).
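
The following sketch shows one way such a schema might look, assuming each extension exports a list of alterable elements and a separate editing extension merges locally saved overrides over the declared defaults. Every type, function and element id here (`AlterableElement`, `resolveConfig`, `describeForm`, `deletion-tags` and so on) is hypothetical, and the template names are placeholder values.

```typescript
// Hypothetical schema for "alterable" elements. None of these names exist in
// MediaWiki; they illustrate editable strings and editable option lists.
type AlterableElement =
  | { kind: "string"; id: string; label: string; defaultValue: string }
  | { kind: "options"; id: string; label: string; defaultOptions: string[] };

// What an extension such as Page Curation might export.
interface ExtensionSchema {
  extension: string;
  elements: AlterableElement[];
}

// Values a local community has saved through the friendly editing interface.
type LocalOverrides = Record<string, string | string[]>;
type ResolvedConfig = Record<string, string | string[]>;

// Merge local overrides over the schema defaults. Ids not in the schema are
// ignored, and a value of the wrong shape falls back to the default.
function resolveConfig(schema: ExtensionSchema, overrides: LocalOverrides): ResolvedConfig {
  const config: ResolvedConfig = {};
  for (const element of schema.elements) {
    const local = overrides[element.id];
    if (element.kind === "string") {
      config[element.id] =
        typeof local === "string" && local.trim() !== "" ? local : element.defaultValue;
    } else if (Array.isArray(local) && local.every((option) => option.trim() !== "")) {
      // Adding or removing options is just editing this list.
      config[element.id] = [...local];
    } else {
      config[element.id] = [...element.defaultOptions];
    }
  }
  return config;
}

// Produce the labelled fields an editing form would show, with no code exposed.
function describeForm(schema: ExtensionSchema, overrides: LocalOverrides): string[] {
  const config = resolveConfig(schema, overrides);
  return schema.elements.map((element) => {
    const value = config[element.id];
    return `${element.label}: ${Array.isArray(value) ? value.join(", ") : value}`;
  });
}

// Usage with placeholder values.
const curationSchema: ExtensionSchema = {
  extension: "PageTriage",
  elements: [
    { kind: "string", id: "toolbar-title", label: "Toolbar title", defaultValue: "Curation" },
    {
      kind: "options",
      id: "deletion-tags",
      label: "Deletion templates",
      defaultOptions: ["speedy-deletion-tag", "advertising-tag"],
    },
  ],
};

const localOverrides: LocalOverrides = { "deletion-tags": ["local-deletion-tag"] };
console.log(describeForm(curationSchema, localOverrides));
```

Validating overrides against the schema, and falling back to defaults when a value has the wrong shape, keeps a mistaken local edit from breaking the extension. That is part of what would let communities maintain these settings without developer support.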

The advantages are fairly clear: it moves the burden of localising onto the host wikis without making it insurmountable. It also moves the long-term maintenance burden onto them for every project, including enwiki, which is actually an improvement over the status quo in which we remain responsible for enwiki. It allows us to write localisable and localised software and deploy it far more widely than we currently can. The disadvantages are the initial engineering work to write the schema and update existing extensions, and the need to make future extensions compatible with the schema. Kaldari and Ori have talked this through and probably have more to add on its scope and time requirements. This approach is recommended as the preferred way of resolving the issue, being the only one that solves the localisation problem without requiring a massive and endless outpouring of engineering effort.