Learning patterns/Proofreading large amounts of text

From Meta, a Wikimedia project coordination wiki
A learning pattern for wiki design
Problem: Proofreading large amounts of text is a very demanding task. The process should be planned intelligently and operate in an "always learning" mode.
Solution: The best advice is to classify the errors we want to correct into different types as early as possible, according to the kind of supervision they need.
Creator: Jaumeortola
Created on: 10:50, 5 June 2016 (UTC)
Status: DRAFT

What problem does this solve?

Catalan Wikipedia needed a thorough linguistic review (spelling and grammar), which was a daunting task. With the help of the proofreading software LanguageTool we have made some progress. The most important lesson we have learned in the process may seem trivial, but the more strictly you follow it, the better the results.

What is the solution?

The errors should be classified according to the kind of supervision they need.

  • Errors that can always be corrected automatically. Of course, you must be absolutely sure that applying the correction is always safe. This implies that you must not change words in other languages (in Catalan, we have to take special care with words in Spanish, Portuguese, French and Italian) or in non-standard language (old or dialectal).
  • Errors that need supervision. Looking at a few words around the error is enough to know whether the correction is appropriate.
  • Errors that need very careful supervision. You will probably need to read the whole paragraph or even the whole article. An example in Catalan is "hivernar/hibernar", where both words are valid and only the context shows which one is intended.
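The three classes above can be sketched as a small triage step that sorts detected errors into work queues. This is only an illustrative sketch, not the workflow actually used on Catalan Wikipedia: the `Match` record, the rule identifiers and the `RULE_LEVELS` table are hypothetical names invented for the example, and the sample text is made up.

```python
from dataclasses import dataclass
from enum import Enum


class Supervision(Enum):
    AUTOMATIC = "automatic"            # always safe to apply
    QUICK_REVIEW = "quick review"      # a few words of context are enough
    CAREFUL_REVIEW = "careful review"  # read the paragraph or the article


@dataclass
class Match:
    rule_id: str      # hypothetical identifier of the rule that fired
    text: str         # text in which the error was detected
    offset: int       # start of the error within text
    length: int       # length of the erroneous span
    replacement: str  # suggested correction


# Hypothetical table, maintained by hand and refined as reviewers learn
# which rules are really safe.
RULE_LEVELS = {
    "SIMPLE_TYPO": Supervision.AUTOMATIC,
    "AGREEMENT": Supervision.QUICK_REVIEW,
    "HIVERNAR_HIBERNAR": Supervision.CAREFUL_REVIEW,
}


def classify(match):
    # Rules not yet classified default to the most cautious level.
    return RULE_LEVELS.get(match.rule_id, Supervision.CAREFUL_REVIEW)


def triage(matches):
    """Group matches into one queue per supervision level."""
    queues = {level: [] for level in Supervision}
    for match in matches:
        queues[classify(match)].append(match)
    return queues


def context(match, width=30):
    """Return a few characters around the error for a quick review."""
    start = max(0, match.offset - width)
    end = min(len(match.text), match.offset + match.length + width)
    return match.text[start:end]


def apply_automatic(match):
    """Apply a correction; intended only for the AUTOMATIC queue."""
    end = match.offset + match.length
    return match.text[:match.offset] + match.replacement + match.text[end:]
```

Defaulting unknown rules to the most careful level reflects the advice that a correction should only be automated once you are absolutely sure it is always safe.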

Moreover, some simple errors can be found directly in the online Wikipedia, but errors that need a full morphosyntactic analysis have to be searched for in the Wikipedia dump.
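As a rough illustration of working from a dump, the sketch below streams pages out of a MediaWiki XML export without loading the whole file into memory. The `check` callback is a hypothetical stand-in for whatever analysis is run on each article (for example a LanguageTool pass), and the tiny `SAMPLE` document is made up for the example; a real dump is far larger and usually compressed.

```python
import io
import xml.etree.ElementTree as ET

# Made-up miniature export with the page/title/revision/text layout.
SAMPLE = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<page><title>A</title><revision><text>first text</text></revision></page>"
    b"<page><title>B</title><revision><text>second text</text></revision></page>"
    b"</mediawiki>"
)


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) pairs, one page at a time."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the page's children to limit memory use


def scan_dump(stream, check):
    """Run check(text) on every page and keep pages with findings."""
    for title, text in iter_pages(stream):
        findings = check(text)
        if findings:
            yield title, findings


# Usage with a trivial hypothetical check that flags the word "second".
def demo_check(text):
    return ["found 'second'"] if "second" in text else []


results = list(scan_dump(io.BytesIO(SAMPLE), demo_check))
```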

Things to consider

When to use

Endorsements

See also

Related patterns

External links

References