Web2Cit/Docs/Tests

From Meta, a Wikimedia project coordination wiki

Translation tests define a series of translation goals for specific target webpages, each indicating the expected translation output. They are one of the configuration files that the Web2Cit community defines on a per-website basis, along with translation templates and URL path patterns.

Translation tests may help Web2Cit collaborators write translation templates using a test-driven approach: tests are defined first, and templates are then written to make the tests pass.

In addition, translation tests are used by the Web2Cit monitor to regularly check the health of the Web2Cit system.

Translation goals

A translation goal is a list of values indicating the expected translation output for a given translation field and target webpage. Translation goals must follow the validation pattern of the corresponding translation field. See the Fields documentation for a list of supported translation fields and their validation patterns.

An empty translation goal (i.e., an empty list of goal values) explicitly indicates that no translation output is expected for the corresponding field. Note that empty translation goals are not allowed for a translation template's mandatory fields; if defined, they will be ignored.

If the expected output is unknown, the corresponding test field should be omitted instead. Note the difference between an unknown expected output (field omitted) and an empty expected output (empty goal).

Translation goals may include multiple values for translation fields that support multiple-value outputs (e.g., Author last names). For translation fields that do not support multiple-value outputs, any additional values will be ignored.
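
The goal rules above can be sketched in Python as a small normalization step over a test's goals. In this sketch an omitted key means "unknown" and an empty list means "nothing expected". This is an illustration only, not Web2Cit code: the function name `effective_goals`, the field names, and the sets `MULTI_VALUE_FIELDS` and `MANDATORY_FIELDS` are hypothetical. In practice, the mandatory fields depend on each translation template.

```python
from typing import Dict, List

# Hypothetical field metadata for illustration only; see the Fields
# documentation for the real fields and which accept multiple values.
MULTI_VALUE_FIELDS = {"authorLast"}
MANDATORY_FIELDS = {"title"}


def effective_goals(goals: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the goals that would actually be used for scoring.

    - An absent field means "expected output unknown": it stays absent.
    - An empty list means "no output expected", except for mandatory
      fields, where empty goals are ignored (dropped).
    - Single-value fields keep only their first value.
    """
    result: Dict[str, List[str]] = {}
    for field, values in goals.items():
        if not values and field in MANDATORY_FIELDS:
            continue  # empty goal for a mandatory field: ignored
        if field not in MULTI_VALUE_FIELDS:
            values = values[:1]  # additional values are ignored
        result[field] = list(values)
    return result
```

For example, `{"title": [], "date": ["2012-12", "2013"], "publishedIn": []}` would reduce to `{"date": ["2012-12"], "publishedIn": []}` under these hypothetical field sets.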

Score calculation

A translation score between 0 and 1 can be calculated for a target webpage by comparing the translation output that Web2Cit returns for that webpage with the expected output defined in the corresponding translation test.

For any given translation field, if no test goal has been defined, the score is undefined as well. If a test goal has been defined but the translation output is undefined (no translation procedures defined) or empty, the score is 0. The exception is when both the translation output and the translation goal are empty, in which case the score is 1.
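
These edge cases can be sketched in Python as a wrapper around the comparison that applies to the field. In the sketch, `None` stands for "undefined". The names `field_score` and `compare` are hypothetical. The text does not spell out how a non-empty output is scored against an empty goal, so this sketch assumes a score of 0 for that case.

```python
from typing import Callable, List, Optional

# Compares a non-empty output array against a non-empty goal array (0..1).
ArrayCompare = Callable[[List[str], List[str]], float]


def field_score(output: Optional[List[str]], goal: Optional[List[str]],
                compare: ArrayCompare) -> Optional[float]:
    """Score one translation field; None means "undefined"."""
    if goal is None:
        return None  # no test goal defined: score undefined
    if output is None:
        return 0.0  # no translation procedures defined, but a goal exists
    if not output and not goal:
        return 1.0  # both output and goal are empty
    if not output or not goal:
        # Empty output vs non-empty goal scores 0 per the text; non-empty
        # output vs empty goal is unspecified there and assumed to be 0.
        return 0.0
    return compare(output, goal)
```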

Given a pair of translation output and translation goal arrays, items from the first array are compared against items from the second array in the order they are given: the first item of the first array against the first item of the second array, the second against the second, and so on.[note 1]
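
The following is a minimal Python sketch of position-wise comparison. The text does not say how per-item scores are combined into a single field score. This sketch therefore assumes that items without a counterpart at the same position (when the arrays differ in length) score 0, and that the sum is divided by the length of the longer array. The names `ordered_score` and `boolean_compare` are hypothetical.

```python
from typing import Callable, List

# Compares two individual items, returning a value between 0 and 1.
ItemCompare = Callable[[str, str], float]


def boolean_compare(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def ordered_score(output: List[str], goal: List[str],
                  compare: ItemCompare) -> float:
    """Compare first vs first, second vs second, and so on.

    Assumption: items without a counterpart at the same position score 0,
    and the total is averaged over the length of the longer array.
    """
    length = max(len(output), len(goal))
    if length == 0:
        return 1.0  # both arrays empty
    total = sum(compare(o, g) for o, g in zip(output, goal))
    return total / length
```

Because pairing is purely positional, `["Smith", "Doe"]` scored against `["Doe", "Smith"]` yields 0 under this sketch, even though the two arrays contain the same names.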

There are three item-to-item comparison functions, depending on the specific translation field:

  • Edit distance: 1 - dist / maxLength, where dist is the Levenshtein edit distance between the two items, and maxLength is the length of the longer item.
  • Boolean: 1 for identical items, 0 for non-identical items
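  • Date: dates in YYYY(-MM(-DD)) format are split on "-", and their individual components are boolean-compared position by position. The average is returned, taken over the number of components in the longer date. Examples: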
    • 2012-12 vs 2012: (1 + 0) / 2 = .5
    • 2010-12 vs 2012: (0 + 0) / 2 = 0
    • 2012-12 vs 2012-12: (1 + 1) / 2 = 1
    • 2012-12-01 vs 2012-12: (1 + 1 + 0) / 3 ≈ .67
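
The three comparison functions might be sketched in Python as follows. The Levenshtein distance uses the standard dynamic-programming formulation. Treating two empty strings as a perfect match is an assumption, because the formula would otherwise divide by zero. All function names are hypothetical. The date comparison reproduces the examples above by dividing by the component count of the longer date.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def edit_distance_score(a: str, b: str) -> float:
    """1 - dist / maxLength."""
    max_length = max(len(a), len(b))
    if max_length == 0:
        return 1.0  # assumption: two empty strings match perfectly
    return 1 - levenshtein(a, b) / max_length


def boolean_score(a: str, b: str) -> float:
    """1 for identical items, 0 otherwise."""
    return 1.0 if a == b else 0.0


def date_score(a: str, b: str) -> float:
    """Compare YYYY(-MM(-DD)) components position by position."""
    parts_a, parts_b = a.split("-"), b.split("-")
    length = max(len(parts_a), len(parts_b))
    matches = sum(1 for x, y in zip(parts_a, parts_b) if x == y)
    return matches / length
```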

Notes

  1. Note that an additional "unordered" scoring strategy used to be available, in which items were compared irrespective of their order. To do so, the first array was compared against all possible permutations of the second array, and the highest score was kept. The "ordered" and "unordered" scores were then averaged and returned as the final score. However, this strategy had to be turned off because its computational cost grew quickly with array length (an array of n items has n! permutations). See T314198.
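
The retired strategy could be sketched in Python as shown below. The sketch reuses the positional averaging assumed for ordered scores, which the note itself does not describe. Trying every permutation of an n-item goal array means n! ordered comparisons, which is why the cost grew so quickly. The function names are hypothetical, and this is not Web2Cit's actual implementation.

```python
from itertools import permutations
from typing import Callable, List

# Compares two individual items, returning a value between 0 and 1.
ItemCompare = Callable[[str, str], float]


def ordered_score(output: List[str], goal: List[str],
                  compare: ItemCompare) -> float:
    # Assumption: positional pairing, averaged over the longer array.
    length = max(len(output), len(goal))
    if length == 0:
        return 1.0
    return sum(compare(o, g) for o, g in zip(output, goal)) / length


def unordered_score(output: List[str], goal: List[str],
                    compare: ItemCompare) -> float:
    # Try every ordering of the goal array (n! of them), keep the best.
    return max(ordered_score(output, list(p), compare)
               for p in permutations(goal))


def combined_score(output: List[str], goal: List[str],
                   compare: ItemCompare) -> float:
    # Average of the "ordered" and "unordered" scores.
    return (ordered_score(output, goal, compare)
            + unordered_score(output, goal, compare)) / 2
```

With boolean item comparison, `["Smith", "Doe"]` against `["Doe", "Smith"]` gets an ordered score of 0 and an unordered score of 1 in this sketch, for a combined score of 0.5.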