User:TMg/autoFormatter

From Meta, a Wikimedia project coordination wiki

Auto-Formatter is a user script that semi-automatically fixes more than 200 common errors in wiki markup. The script was originally created for the German Wikipedia but can be used in all languages. Note that the script may make mistakes; please report errors (you can write in English or German).

See de:User:TMg/autoFormatter for the full documentation (German).

Installation

Copy and paste the following line either to your local common.js subpage, e.g. your common.js on the English Wikipedia, or to your global.js on Meta to activate it on all Wikimedia wikis.

mw.loader.load( '//de.wikipedia.org/w/index.php?title=Benutzer:TMg/autoFormatter.js&action=raw&ctype=text/javascript' );

Additionally, I highly recommend my cleanDiff script. It makes it much easier to review the changes made by the script when pressing the "Show changes" button.

Configuration

Usage on other wikis

In other languages

The label and tooltip of the "Auto-Format" button can be localized:

var autoFormatterButtonLabel = 'Auto-Format';

On non-Wikimedia wikis

If you want to use the script on external wikis that don't support linking to Wikimedia wikis with the short syntax [[:en:Wikipedia]] you can disable the feature set:

var autoFormatWikimediaLinks = false;

Normalize keywords

If you don't want deprecated keywords like Image: replaced you can turn this feature set off:

var autoFormatLocalisation = false;

Uncover hidden link targets

Links like [[New York|New York City]] are most probably wrong or at least misleading and therefore replaced with [[New York]] City. You can turn this feature off:

var autoFormatMaskedLinks = false;
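
To illustrate the kind of rewrite this feature performs, here is a minimal sketch. It is not taken from the script's source; the function name uncoverMaskedLinks and the regular expression are assumptions made for this example.

```javascript
// Hypothetical sketch, not the script's actual code.
// Rewrites [[Target|Target more words]] to [[Target]] more words.
function uncoverMaskedLinks(text) {
  // Group 1 is the link target; the label must repeat it, then continue
  // with a space and additional words.
  return text.replace(/\[\[([^\[\]|]+)\|\1( [^\[\]|]+)\]\]/g, '[[$1]]$2');
}
```

Links whose label does not start with the target, such as [[Paris|the capital]], are left alone by this sketch.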

Year ranges

In the German Wikipedia year ranges like "2001–02" are expanded to "2001–2002". This feature is disabled in all other languages. Please tell me if your local manual of style recommends that too and I should turn it on for your language. If you want you can turn it on for yourself:

var autoFormatShortYearRanges = true;
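
The expansion itself can be pictured with the following sketch. The function name expandShortYearRanges and the rule that the short year must be larger than the last two digits of the first year are assumptions for this example, not details confirmed by the script's source.

```javascript
// Hypothetical sketch, not the script's actual code.
// Expands "2001–02" to "2001–2002" (the separator is an en dash, U+2013).
function expandShortYearRanges(text) {
  return text.replace(/\b(\d{2})(\d{2})\u2013(\d{2})\b/g, function (match, century, from, to) {
    // Assumption: only expand when the range stays within the same century.
    return Number(to) > Number(from)
      ? century + from + '\u2013' + century + to
      : match;
  });
}
```

Ranges that cross a century boundary (for example "1999–00") and ranges that are already written in full are left unchanged by this sketch.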

Drop default parameters from infoboxes

Example:

var redundantTemplateParameters = [
	'Infobox single|Name',
	'Infobox company|name'
];

To disable the feature:

var redundantTemplateParameters = [];

Align infoboxes and other templates

Example:

var autoFormatTemplates = [
  { name:   'Infobox example 1',
    format: '|_________________ = _\n'
  },
  { name:   'Infobox example 2',
    format: '|__________________ = _\n',
    parameters: {
            'Old parameter': 'New parameter',
            'Deprecated parameter': false
    }
  }
];

For now, see the German documentation for the full explanation.

User-defined replacements

Example:

var autoFormatReplacements = [
	['Ph.D.', 'PhD'],
	[/ +<ref\b/g, '<ref']
];

The first rule is a simple "string to string" replacement. It replaces all "Ph.D." abbreviations (including "Ph. D." with spaces and non-breaking spaces) with "PhD".

The second rule uses a regular expression to remove all spaces in front of footnotes (ref tags).
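
As a rough mental model of how such a rule list could be applied, consider the sketch below. The function applyReplacements is hypothetical and simplified: unlike the real script, its string rules match literally and do not cover variants such as "Ph. D." with spaces.

```javascript
// Hypothetical sketch, not the script's actual code.
// Applies [search, replacement] pairs in order.
function applyReplacements(text, rules) {
  rules.forEach(function (rule) {
    var search = rule[0];
    var replacement = rule[1];
    if (search instanceof RegExp) {
      // Regular expression rule: delegate to String.prototype.replace.
      text = text.replace(search, replacement);
    } else {
      // Plain string rule: replace every literal occurrence.
      text = text.split(search).join(replacement);
    }
  });
  return text;
}
```

With the two rules from the example above, "a Ph.D. &lt;ref&gt;" style input would lose both the dots and the space before the footnote.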

See the German documentation for more examples.

What does the script do?

You need to hit the "Auto-Format" button in the toolbar to run the script.

Preparation

  • Some rules depend on the content language (wgContentLanguage) of the current wiki.
  • Some rules are disabled if var autoFormatLocalisation = false; is set in your common.js.
  • Some rules are disabled on disambiguation pages.
  • Some rules are disabled if there is a selection in the editor window.

Protect sections

The following tags are protected from almost all replacements except for a few special character replacements: <code>, <hiero>, <html>, <includeonly>, <math>, <nowiki>, <pre>, <score>, <source>, <syntaxhighlight>, <timeline>.
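
A common way to implement this kind of protection is to swap the protected blocks for placeholders, run the replacements, and then restore the blocks. The sketch below shows that general technique; the function name withProtectedTags and the placeholder characters are assumptions, and the script itself may work differently.

```javascript
// Hypothetical sketch of the placeholder technique, not the script's code.
function withProtectedTags(text, tagNames, transform) {
  var stash = [];
  var pattern = new RegExp(
    '<(' + tagNames.join('|') + ')\\b[^>]*>[\\s\\S]*?<\\/\\1\\s*>', 'gi');
  // Replace each protected block with a private-use placeholder.
  var masked = text.replace(pattern, function (block) {
    stash.push(block);
    return '\uE000' + (stash.length - 1) + '\uE001';
  });
  // Run the transformation, then put the original blocks back.
  return transform(masked).replace(/\uE000(\d+)\uE001/g, function (m, index) {
    return stash[Number(index)];
  });
}
```

For example, a transformation that collapses double spaces would leave the content of a nowiki block untouched.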

Protect file names

File names are protected:

  • In regular image inclusions like in [[File:…]]
  • In template parameters if the file name ends with .jpg, .ogg, .pdf or another common extension.
  • In galleries (with and without the File: prefix).

File names are cleaned:

  • Localize the namespace (if the localization is not disabled).
  • Replace all underscores, non-breaking spaces and stuff like %20 with single spaces.
  • Remove spaces like in File : Example.jpg.
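
The last two cleaning steps could look roughly like the following. This is a sketch under the assumption that the namespace prefix consists of word characters only; cleanFileName is a hypothetical name, and namespace localization is left out.

```javascript
// Hypothetical sketch, not the script's actual code.
function cleanFileName(name) {
  return name
    // Underscores, non-breaking spaces, %20 and whitespace runs become one space.
    .replace(/(?:_|\u00A0|%20|\s)+/g, ' ')
    // "File : Example.jpg" becomes "File:Example.jpg".
    .replace(/^\s*(\w+)\s*:\s*/, '$1:')
    .trim();
}
```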

General

  • Remove newlines from the start of the text.
  • Remove spaces, non-breaking spaces and other invisible characters from the end of all lines (no exceptions).
  • Remove control characters and undefined Unicode numbers (U+0000 to U+0008, U+000C, U+000E to U+001F, U+007F to U+009F).
  • Remove the byte order mark (U+FEFF).
  • Replace all U+00AD, &#xAD; and such with &shy;.
  • Remove the zero width space (U+200B) if it is between two Latin characters (U+0000 to U+024F).
  • Remove the left-to-right mark (U+200E) if it is next to a left-to-right character ([A-Z\]ªµºÀ-ÖØ-öø-\u02B8]).
  • Replace all remaining left-to-right marks (U+200E) with &lrm;.
  • Reduce multiple empty lines.
  • Change the prettytable CSS class name to wikitable.
  • Remove spaces from HTML attributes like attribute = "value".
  • Add non-breaking spaces to section signs (§).
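
A few of the rules above can be sketched as a chain of regular expression replacements. The function basicCleanup is hypothetical, covers only a subset of the list, and assumes that "reduce multiple empty lines" means collapsing them to a single empty line.

```javascript
// Hypothetical sketch covering a subset of the rules, not the script's code.
function basicCleanup(text) {
  return text
    // Remove the byte order mark.
    .replace(/\uFEFF/g, '')
    // Remove control characters (tab, newline and carriage return are kept).
    .replace(/[\u0000-\u0008\u000C\u000E-\u001F\u007F-\u009F]/g, '')
    // Remove newlines from the start of the text.
    .replace(/^\n+/, '')
    // Remove spaces, tabs and non-breaking spaces from the end of all lines.
    .replace(/[ \t\u00A0]+$/gm, '')
    // Assumption: collapse runs of empty lines to a single empty line.
    .replace(/\n{3,}/g, '\n\n');
}
```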

Character entity references

  • Decode decimal- and hexadecimal-encoded characters, except for spaces and control characters.
  • Decode some commonly used named character references, most notably single and double quotation marks (e.g. &quot;), dashes (e.g. &mdash;, &ndash;), and daggers (e.g. &dagger;).
  • Replace all possible notations of non-breaking spaces with &nbsp;.
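
Decoding numeric character references while leaving spaces and control characters encoded could be sketched as follows. The function name decodeNumericEntities and the exact set of excluded characters are assumptions for this example.

```javascript
// Hypothetical sketch, not the script's actual code.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
    var code = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave invalid code points untouched.
    if (code > 0x10FFFF) {
      return match;
    }
    var ch = String.fromCodePoint(code);
    // Keep whitespace (including non-breaking spaces) and control characters encoded.
    return /^[\s\u0000-\u001F\u007F-\u009F]$/.test(ch) ? match : ch;
  });
}
```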

HTML and XML tags

  • Replace <source> with <syntaxhighlight>.
  • Replace all <strike> with <s>.
  • Simplify empty <nowiki /> tags.
  • Drop <font family="…"> if it contains default fonts only (Arial, Helvetica, Helvetica Neue and sans-serif).
  • Drop <font size="…"> if it's the default font size.
  • Remove all HTML and XML tags with no content except for <br>, <hr>, <nowiki> and tags with style="clear:…;".
  • Remove useless inline elements <font> and <span> with no attributes.
  • Replace <font color="…"> with <span style="color:…;">.
  • Replace <font size="…"> with <small> or <span style="font-size:larger;"> if applicable.
  • Merge nested <span style="…"><span style="…">.
  • Unify the syntax of <br /> including optional attributes.
  • Remove useless <br /> if there is a break anyway.
  • Remove <small> tags inside and outside of <ref>, <sub>, <sup> and other <small> tags.
  • Drop the navigation bar wrapper class="BoxenVerschmelzen" if it contains only one navigation bar.

Headlines

  • Attempt to fix broken headlines with different numbers of equal signs.
  • Format headlines with spaces inside the equal signs.
  • Replace all non-breaking spaces in headlines with regular spaces.
  • Remove bold formatting from headlines.
  • Remove colons from the end of headlines.
  • Unify the headline "External Weblinks".
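
Several of the headline rules can be combined in one pass over each headline line, as in this sketch. The function cleanHeadlines is hypothetical and only handles headlines whose equal signs are already balanced.

```javascript
// Hypothetical sketch, not the script's actual code.
function cleanHeadlines(text) {
  return text.replace(/^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$/gm, function (match, level, title) {
    title = title
      .replace(/\u00A0/g, ' ')   // non-breaking spaces become regular spaces
      .replace(/'''/g, '')       // remove bold formatting
      .replace(/\s*:$/, '')      // remove a trailing colon
      .trim();
    // Put single spaces inside the equal signs.
    return level + ' ' + title + ' ' + level;
  });
}
```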

Localization

  • Localize most keywords and namespaces like DISPLAYTITLE, File:, thumb and so on.
  • Revert localization of CSS keywords like (vertical-align:) baseline, middle and so on.
  • Remove useless right.
  • Set the order of upright|thumb to thumb|upright if it's not the only change in the line.
  • Change miniatur to mini if it's not the only change in the line.

Templates

  • Remove the namespace from template inclusions.
  • Remove underscores from all template names.
  • Remove empty lines from infobox templates.
  • Switch the template Commons with Commonscat if applicable.
  • Unify redirects and capitalization for many frequently used templates. See the "cleanTemplates" section in the source for a full list.
  • Switch the template B to Bibel if applicable.
  • Drop useless navigation bar wrapper if it contains only one navigation bar.
  • Fix outdated Normdaten templates.
  • Simplify Waybackarchiv template if applicable.
  • Remove useless /00 from the end of coordinates.

References

  • Unify the capitalization and spaces in all <ref> and <references> tags.
  • Simplify empty <ref /> and <references /> tags.
  • Close <references> if the closing tag is missing.
  • Remove empty lines between a headline and an empty <references /> tag.
  • Force an empty line after a <references> block in some specific cases.
  • Remove whitespace at the start and end of <ref> tags, but not inside a <references> block.
  • Remove whitespace between punctuation marks and references or between two references.
  • Fix duplicate punctuation marks before and after a reference.

Categories and sorting

  • Localize the DEFAULTSORT keyword (if localization is not disabled).
  • Replace many special characters in DEFAULTSORT and category lines with the proper ASCII replacement (most Latin languages, Greek, Russian).
  • Remove duplicate spaces from DEFAULTSORT.
  • Drop DEFAULTSORT if it's identical to the article name (not case-sensitive).
  • Upper case the first character in every category.
  • Add an empty line between navigation templates and DEFAULTSORT or the first category.
  • Split all categories to separate lines.
  • Remove empty lines between DEFAULTSORT and the first category.
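
Splitting categories to separate lines and capitalizing their first character might be sketched like this. The function tidyCategories is hypothetical and assumes the English namespace name "Category".

```javascript
// Hypothetical sketch of two rules, not the script's actual code.
function tidyCategories(text) {
  return text
    // Start every category on its own line.
    .replace(/\]\][ \t]*(?=\[\[Category:)/g, ']]\n')
    // Upper case the first character of every category name.
    .replace(/\[\[Category:\s*(.)/g, function (match, first) {
      return '[[Category:' + first.toUpperCase();
    });
}
```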

External links

  • Remove double brackets from external links.
  • Remove pipes from external links, but only if there is no space in the link.
  • Add slashes to the end of domains.
  • Lower case domain names.
  • Use protocol relative URLs for all remaining internal links.
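
Lower-casing the domain and adding a trailing slash to a bare domain can be sketched as below. The function normalizeDomain is a hypothetical helper that works on a single URL string and ignores special cases such as user names or ports in the URL.

```javascript
// Hypothetical sketch, not the script's actual code.
function normalizeDomain(url) {
  return url.replace(/^((?:https?:)?\/\/)([^\/?#\s]+)(\/?)/i, function (match, scheme, host, slash) {
    // Lower case the host and make sure it is followed by a slash.
    return scheme + host.toLowerCase() + (slash || '/');
  });
}
```

The path after the domain is left as it is, because paths can be case-sensitive.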

Internal links

  • Replace weblinks with the Special:PermanentLink/… syntax if possible.
  • Replace weblinks to other projects and languages with prefixed internal links if possible.
  • Replace the permalink template with a link to the special page.
  • Shorten some fullurl: links.
  • Remove useless prefixes to the own local wiki from internal links.
  • Decode encoded anchors.
  • Decode encoded internal links.
  • Remove underscores from internal links.
  • Change [[Link|Label]]s to [[Link|Labels]] because it's more readable.
  • Change [[Link|Links]] to [[Link]]s because it's shorter.
  • Change possibly misleading links like [[New York|New York City]] to [[New York]] City (can be disabled by the user).
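
The two link trail rules can be sketched with a pair of regular expressions. The function tidyLinkTrails is hypothetical and, as a simplifying assumption, treats only lower case ASCII letters as a link trail.

```javascript
// Hypothetical sketch of the two link trail rules, not the script's code.
function tidyLinkTrails(text) {
  return text
    // [[Link|Label]]s becomes [[Link|Labels]].
    .replace(/\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]([a-z]+)/g, '[[$1|$2$3]]')
    // [[Link|Links]] becomes [[Link]]s.
    .replace(/\[\[([^\[\]|]+)\|\1([a-z]+)\]\]/g, '[[$1]]$2');
}
```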

Remove duplicate links

  • Remove links from dates that start with a year, e.g. [[2001-01-01]] (ISO format) or [[2001/1/1]].
  • Remove all links from dates in the Persondata template (currently only in the German template).
  • Remove duplicate links to years except for linked years in infobox templates.
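
The first rule, unlinking dates that start with a year, could be approximated as follows. The function unlinkYearFirstDates and its pattern are assumptions for this example.

```javascript
// Hypothetical sketch of the first rule, not the script's actual code.
function unlinkYearFirstDates(text) {
  // Matches [[2001-01-01]] and [[2001/1/1]] and keeps only the date.
  return text.replace(/\[\[(\d{4}[-\/]\d{1,2}[-\/]\d{1,2})\]\]/g, '$1');
}
```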

Typography

The following rules do not apply to interwiki links.

  • Replace wrong double quotes with the proper typographic characters (only in the German Wikipedia).
  • Replace double quotes in citation templates with single quotes.
  • Replace three dots with the proper character.
  • Replace the comma with a semicolon in life dates (birth and death dates) if applicable.
  • Put the proper character in page ranges.
  • Replace the ASCII dash with the proper en dash if applicable.

Dates

The following rules do not apply to interwiki links.

  • Change the bad German date format 1.1.2000 to 1 January 2000 (depends on the language).
  • Use a spaced dash or the German "bis" in date ranges.
  • Use the proper en dash in year ranges.
  • Expand 1901–02 to 1901–1902 (only in the German Wikipedia by default, can be enabled by the user).
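
The first rule, rewriting numeric dates with a spelled-out month, could be sketched as below for English. The function expandNumericDates and the MONTHS list are assumptions; the real script chooses the output depending on the content language.

```javascript
// Hypothetical sketch for English output, not the script's actual code.
var MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
  'August', 'September', 'October', 'November', 'December'];

function expandNumericDates(text) {
  return text.replace(/\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b/g, function (match, d, m, year) {
    var day = Number(d);
    var month = MONTHS[Number(m) - 1];
    // Leave anything that is not a plausible day and month unchanged.
    return month && day >= 1 && day <= 31 ? day + ' ' + month + ' ' + year : match;
  });
}
```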

Units

  • Always add a non-breaking space in front of the following units: CHF, cm, EUR, g, GB, GHz, GiB, Hz, JPY, KB, kB, kg, kHz, KiB, km, m, MB, MHz, MiB, ml, mm, TB, THz, TiB, US$, USD, €, ¥. This does not apply to "US$1", only to "1 US$". The unit $ is not included because of too many false positives.
  • Replace non-breaking spaces in percentage values with regular spaces (only in the German Wikipedia).
  • Replace the superscripted digits <sup>2</sup> and <sup>3</sup> with the Unicode characters ² and ³ (only in the German Wikipedia).
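
The first rule can be sketched for a subset of the units. The function addUnitSpaces is hypothetical; it assumes that a regular space between number and unit is replaced with &nbsp;, and it leaves out units such as € or US$ that do not end in a letter.

```javascript
// Hypothetical sketch for a subset of the units, not the script's actual code.
var UNITS = ['cm', 'g', 'GHz', 'Hz', 'kg', 'kHz', 'km', 'm', 'MHz', 'ml', 'mm'];

function addUnitSpaces(text) {
  // A digit, a regular space, then a unit that ends at a word boundary.
  var pattern = new RegExp('(\\d) (' + UNITS.join('|') + ')\\b', 'g');
  return text.replace(pattern, '$1&nbsp;$2');
}
```

The word boundary keeps ordinary words such as "meters" from being mistaken for the unit "m".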

ISBN numbers

Format ISBN numbers, simplify the prefix and add dashes in the proper places depending on the language (currently only English and German books). This works for both the ISBN magic word and template parameters called ISBN = or similar.

Remove redundant template parameters

Remove useless "name" or "title" parameters from many frequently used infoboxes and other templates if the parameter is empty or equal to the article name. See the "cleanRedundantTemplateParameters" section in the source for a full list.

Clean templates by user-defined rules

This can be used to make infoboxes and other templates more readable in the article source, to remove deprecated parameters and to rename parameters. Check the German documentation for the required syntax. By default only the German Persondata template is processed.

Apply user-defined replacements

This can be used to apply all kinds of user-defined replacements either by using simple string-to-string replacements or complex regular expressions with optional replacement functions. You can find some examples above and in the German documentation.