Mix'n'match/Manual

Mix'n'match is a tool by Magnus Manske that contains lists of entries from external sources. It lets you match these with Wikidata items, identifying which exist in Wikipedia and which still don't have articles: "think of a red-link list on steroids".

It currently contains over 1000 catalogues, such as the Oxford Dictionary of National Biography (completed), the Australian Dictionary of Biography (completed), or the National Portrait Gallery's catalogue (42.5% matched).

This makes it easy to see which items are missing from a particular Wikipedia, or which language has the best coverage of a particular topic.

How does it work?

Mix'n'match divides items into five categories:

An example of the statistics for one of the catalogues.
  1. Manually matched: a user has matched this catalogue entry to a Wikidata item (this includes entries imported from Wikidata);
  2. Automatically matched: the system has guessed a possible match for the entry in Wikidata, but a person needs to verify or reject it;
  3. Not on Wikidata: this catalogue entry is known to have no matching Wikidata entry;
  4. Not applicable to Wikidata (N/A): the entry has been marked as not relevant to Wikidata (for example, it is a duplicate, a placeholder, a redirect, or simply not an appropriate topic);
  5. Unmatched: this entry has not yet been matched, and there is no automated suggestion available.

The aim is, of course, to mark as many entries as possible as manually matched (or to confirm that there is no possible Wikidata match). To use the tool, you need an account on any Wikimedia project, and you must authorise the WiDaR tool.

Once you have authorised WiDaR, you can choose between two modes: semi-automatic or manual.

Semi-automatic mode (Game mode)

An example of game mode.

If you choose the semi-automatic mode, the top of the page shows the ID of the catalog (Catalog ID), the title of the catalog (Catalog Name) and, where available, a short description supplied by the catalog (Catalog description). This should help you figure out who or what the entry is.

Below you have four choices:

  • Set Q (blue button): If you have identified which Wikidata item matches the catalogue entry, you can paste the Q-number in this box.[1]
  • No Wikidata entry (orange button): if you are confident there is no matching Wikidata item.
  • N/A (red button): for cases where there will never be an appropriate Wikidata item for this entry.
  • Skip (grey button): in case of doubt or uncertainty, just go to the next element.
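
As a rough illustration of the input handling described for Set Q (note 1 says the box accepts "Q123" or "123" and tolerates stray characters such as parentheses or commas), the following Python sketch normalises pasted text to a canonical Q-number. The function name and the regular expression are assumptions made for illustration; the tool's actual parsing rules may differ.

```python
import re


def normalise_q_number(raw: str) -> str:
    """Return a canonical 'Q<digits>' ID from loosely formatted input.

    Hypothetical helper, not part of Mix'n'match itself.
    """
    # Optional 'Q' (either case), optional spaces, then a number
    # that does not start with zero.
    match = re.search(r"[Qq]?\s*([1-9]\d*)", raw)
    if match is None:
        raise ValueError(f"no Q-number found in {raw!r}")
    return f"Q{match.group(1)}"


if __name__ == "__main__":
    for text in ["Q123", "123", "(Q384941),"]:
        print(text, "->", normalise_q_number(text))
```

Raising an error when no number is present, rather than guessing, mirrors the "Don't guess" advice given later in this manual.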

Further down are some suggested articles from en.wikipedia, each with a link to its item on Wikidata. If the correct item is among them, you can just click on the link to the right (e.g. "Q384941") and this will register a match. If the correct item is not among the suggestions, you can still search via Google across all versions of Wikipedia or Wikisource, or on Wikidata.

Whenever you make a connection between a catalogue entry and a Wikidata item, the system will automatically update Wikidata. This will show up as an edit in your contributions.

(Please note that a few entries on Mix'n'match may not have a property already set up. If you're working on one of these, the match will be saved and updated later, if appropriate.)

Manual mode (Manual)

An example of manual mode.

If you choose the manual mode, you'll see a list of fifty entries from the catalogue that you have selected. On the first line, you will see the name and (where available) the description of the entry. Each card will feature a colour that indicates the status of the item:

  • red: the item is currently unmatched.
  • lilac: the system has proposed a possible match, but a user has to approve or reject this suggestion.
  • green: the item has been confirmed by a user.[2]

Items needing to be validated manually (red)

For items with no suggested match, the second line presents links that let you run a search on Wikipedia, Wikidata or Google (limiting the results to Wikipedia or Wikidata), or create the item. In the right column, you will have three choices:

  1. Set Q (green link): clicking here brings up a dialog box where you can enter the number of the Wikidata item (with or without the Q in front of the number).
  2. New item (red link): clicking here will create a new item on Wikidata for that entry, which will automatically receive the name, description (if present) and ID from the catalogue.
  3. N/A (yellow link): clicking here will confirm that the entry should not exist on Wikidata, and can be discarded.

In all three cases, once you have made your choice, the colour will change from red to green. If you have provided a Wikidata item number, the system will automatically update the corresponding Wikidata entry using WiDaR, as in game mode.

Items matched automatically (lilac)

For items with an automatically-suggested match, the second line will have a link to Wikidata along with an auto-generated summary of the Wikidata entry. In the right column, you will have three choices:

  1. Confirm (green link): clicking here confirms that the proposed entry is correct.
  2. Remove (red link): clicking here will confirm that the entry does not exist on Wikidata (but it might be appropriate in future)
  3. N/A (yellow link): clicking here will confirm that the entry should not exist on Wikidata, and can be discarded.

Again, the system will make the corresponding edit via WiDaR on Wikidata, if you have made a confirmed match.
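
The choices available for red and lilac entries can be read as a small state machine. The Python sketch below is only a summary of the behaviour described in this manual, not the tool's code: the `Status` names, the action strings and the `apply_action` helper are all hypothetical, and treating "New item" as leading to a manual match is an assumption.

```python
from enum import Enum


class Status(Enum):
    MANUALLY_MATCHED = "manually matched"
    AUTO_MATCHED = "automatically matched"
    NOT_ON_WIKIDATA = "not on Wikidata"
    NOT_APPLICABLE = "N/A"
    UNMATCHED = "unmatched"


# (current status, action) -> resulting status, as described in this manual.
TRANSITIONS = {
    (Status.UNMATCHED, "set_q"): Status.MANUALLY_MATCHED,
    (Status.UNMATCHED, "new_item"): Status.MANUALLY_MATCHED,  # assumption
    (Status.UNMATCHED, "not_applicable"): Status.NOT_APPLICABLE,
    (Status.AUTO_MATCHED, "confirm"): Status.MANUALLY_MATCHED,
    (Status.AUTO_MATCHED, "remove"): Status.NOT_ON_WIKIDATA,
    (Status.AUTO_MATCHED, "not_applicable"): Status.NOT_APPLICABLE,
}


def apply_action(status: Status, action: str) -> Status:
    """Return the status an entry has after a user action."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(
            f"action {action!r} is not offered for {status.value} entries"
        ) from None


if __name__ == "__main__":
    print(apply_action(Status.AUTO_MATCHED, "confirm").value)
```

Only transitions that end in a manual match trigger an edit on Wikidata via WiDaR; the other outcomes are recorded in Mix'n'match alone.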

Items validated manually (green)

For items which have already been matched, the second line will have a link to Wikidata along with an auto-generated summary of the Wikidata entry.

In the right column you will see the name of the user who made the match, along with a red "Remove match" link. Use this link only if you believe that the match made by someone else is wrong. If the match is correct, leave everything as it is and move on.

Note that while making a match causes the Wikidata item to be updated, removing a match (currently) does not. If you remove a match on an item, you may want to open that Wikidata item in a new tab and remove the property there as well. Otherwise, the match may find its way back into Mix'n'match in the future.

Creation candidates

Many entries from catalogs are not (yet!) on Wikidata. Some may not meet the criteria for a Wikidata item, but others are listed in several catalogs, and thus have several external sources, which helps their "noteworthiness" significantly. Entries that have the same name in three or more catalogs, but have no associated Wikidata item, can be found via Creation candidates.

An example of creation candidates.

The listed entries have the usual search options, to ensure that no item already exists on Wikidata. One can then create a new Wikidata item, with the (English) label pre-filled. Then, the new item can be matched to the applicable entries via Set Q. One can also search Commons for that label; sometimes, an image of that person already exists there!

Caution: The fact that these entries share a name does not mean they all refer to the same entity. Please check carefully with the individual catalogs!
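
The selection rule for creation candidates (same name in at least three catalogs, no associated Wikidata item) can be sketched as a simple grouping step. This Python example is an illustration only: the tuple layout, the function name and the sample data are invented, and the real tool may compare names differently (for example, ignoring case or punctuation).

```python
from collections import defaultdict


def creation_candidates(entries, min_catalogs=3):
    """Find names that appear unmatched in at least `min_catalogs` catalogs.

    `entries` is an iterable of (catalog_id, name, q_number) tuples,
    where q_number is None for entries without a Wikidata item.
    Hypothetical data layout, chosen for this sketch.
    """
    catalogs_by_name = defaultdict(set)
    for catalog_id, name, q_number in entries:
        if q_number is None:
            catalogs_by_name[name].add(catalog_id)
    return {
        name: sorted(catalog_ids)
        for name, catalog_ids in catalogs_by_name.items()
        if len(catalog_ids) >= min_catalogs
    }


if __name__ == "__main__":
    sample = [
        (1, "Jane Doe", None),
        (2, "Jane Doe", None),
        (3, "Jane Doe", None),
        (1, "John Roe", None),
        (2, "John Roe", None),
    ]
    print(creation_candidates(sample))
```

As the caution above says, a shared name is only a hint: the grouping finds candidates, and a person still has to check that the catalogs describe the same entity.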

Matching tips

When matching entries to Wikidata items please bear the following tips in mind:

  • Don't guess: guessing will introduce errors into the data. If in doubt, follow the link on the catalogue entry, and check the other catalogs at the bottom of the entry or other information (e.g. coordinate location). You can always skip entries and let someone else match them, or move to a different catalogue that you know more about.
  • Don't be afraid to create new items: if it isn't exactly the same concept, please create a new item. It is much easier to merge two items after the matching has finished than to separate one item into two. E.g. a World Heritage Site for a city often does not cover the same area as the city itself, so a new item should be made.
  • Don't match to disambiguation items: Wikidata items exist for Wikipedia disambiguation pages. These items act as a list of links, rather than a concept to be matched to. E.g. Bambaia (Q4853316) should not be matched; Agostino Busti (Q395600) should be.
  • Don't match from disambiguation items: some authority databases have disambiguation or alias pages.
    • E.g. RKD Artists used to have an entry for "Bambaia" that was wrongly mapped to Wikidata. (Now RKD Bambaia properly redirects to RKD Augustino Busti.)
    • Never match to GND "undifferentiated names"
  • Check the automatic matches: Whilst the automatic matching is often correct it can still get confused between similarly named items.

Sorting the catalog list

By default, the catalog list is sorted alphabetically. The sort_mode parameter can take one or several keywords to alter this:

  • sort_mode=groups: groups catalogs by type/subject area, largest groups first, sorted alphabetically within each group. Completed catalogs have their own group at the end.
  • sort_mode=groups,by_easiest: same as above, but with the "easiest" catalogs to complete (lowest value of #auto-matched + 2 × #unmatched) first.
  • sort_mode=by_easiest,no_complete: ungrouped sorting, with the "easiest" to complete first, hiding completed catalogs (as they would be "easiest" by default).
  • sort_mode=groups,complete_inline: grouped, but with completed catalogs in their respective subject area.
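
To make the "easiest" score concrete, here is a minimal Python sketch of the by_easiest ordering using the formula quoted above (#auto-matched + 2 × #unmatched). The `Catalog` class, the function names and the sample figures are invented for illustration, and treating a score of zero as "completed" is an assumption that follows from the remark that completed catalogs would be easiest by default.

```python
from dataclasses import dataclass


@dataclass
class Catalog:
    name: str
    auto_matched: int  # entries with an unconfirmed automatic match
    unmatched: int     # entries with no match or suggestion


def easiest_score(catalog: Catalog) -> int:
    """Lower score means less work left; unmatched entries count double."""
    return catalog.auto_matched + 2 * catalog.unmatched


def sort_by_easiest(catalogs, no_complete=False):
    """Order catalogs by remaining work, optionally hiding completed ones."""
    if no_complete:
        # Assumption: a catalog is complete when nothing is left to review.
        catalogs = [c for c in catalogs if easiest_score(c) > 0]
    return sorted(catalogs, key=easiest_score)


if __name__ == "__main__":
    sample = [
        Catalog("A", auto_matched=10, unmatched=5),
        Catalog("B", auto_matched=0, unmatched=0),
        Catalog("C", auto_matched=3, unmatched=1),
    ]
    for catalog in sort_by_easiest(sample, no_complete=True):
        print(catalog.name, easiest_score(catalog))
```

Weighting unmatched entries twice as heavily reflects that they need a search from scratch, whereas automatic matches only need to be confirmed or rejected.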

If your favourite catalog is "unknown" or in the wrong group, please let Magnus Manske (talk) know.

Creating new catalogs

You can create a new catalog and either provide a list of mapping candidates (best to paste them from a spreadsheet) or create a scraper to automatically harvest mapping candidates. Otherwise, ask Magnus Manske (talk) to import a catalog for you.

Tips

  • The field Wikidata property is for when a property exists for external identifiers. You can propose an external identifier property at Wikidata:Property proposal.
  • Write detailed descriptions in the Entry description field where possible. This often makes it much easier for people to match the catalogue, leading to fewer incorrect matches and higher data quality.

References

  1. You may paste the Q-number as "Q123" or as "123". The software also accepts other characters, such as parentheses or commas, as long as the Q-number you provide is valid.
  2. This means that the item has been matched to a Wikidata item (in this case, a link to the latter will appear on the row below), OR has been defined as absent from Wikidata (in this case, a red "Not on Wikidata" notice will appear on the row below), OR has been marked as not relevant to Wikidata (in this case, a "This entry is not relevant to Wikidata" notice will appear on the row below).

Links