Mix'n'match/Manual

From Meta, a Wikimedia project coordination wiki

Mix'n'match is a tool by Magnus Manske, which contains various lists of topics from outside sources. It lets you match these with Wikidata entries, identifying which exist in Wikidata and which still don't have items - "think of a red-link list on steroids".

It currently contains over 2500 catalogues, such as the Oxford Dictionary of National Biography (completed), the Australian Dictionary of Biography (completed), or the National Portrait Gallery's catalogue (52.5% matched).

In this way, it will be easy to see which items are missing from a particular Wikipedia, or which language has the best coverage of a particular topic.

How does it work?

Mix'n'match divides items into five categories:

An example of the statistics for one of the catalogues.
  1. Fully matched (formerly Manually matched): a user has matched this catalogue entry to a Wikidata item (this includes entries imported from Wikidata);
  2. Preliminarily matched (formerly Automatically matched): the system has guessed one or more possible matches for the entry in Wikidata, but a person needs to verify or reject them;
  3. Not on Wikidata (deprecated): this catalogue entry is known to have no matching Wikidata entry;
  4. Not applicable to Wikidata (N/A): the entry has been marked as not relevant to Wikidata (for example, it is a duplicate, a placeholder, a redirect, or simply not an appropriate topic);
  5. Unmatched: this entry has not yet been matched, and there is no automated suggestion available.

The aim is, of course, to mark as many entries as possible as fully matched (or to confirm that there is no possible Wikidata match). To use the tool, you need to register an account on any Wikimedia project and authorise the WiDaR tool.

When you open Mix'n'match, a list of catalogs is shown (you may also select a specific catalog). You may then:

  • Search for a specific name using the search box in the header bar. This will bring you to a search result page.
    • See also List mode below for how to use the list of results.
    • In the search result page, you can also limit the search to a specific catalog.
    • You may also search for a QID; this will return all entries that the item is matched to. Searching by external ID is not supported.
    • The search result page is not guaranteed to contain all entries matching a specific name; in particular, the list may be truncated if there are too many results.
  • Select a specific catalog and then go to a catalog page.

On a catalog page, you will see the number of entries in each category and a history of the number of matches. Clicking a specific category brings you to List mode. You will also see an "Action" menu, which includes the following:

  • Fully matched, Preliminarily matched, Unmatched, No Wikidata, Not applicable to Wikidata - links to List mode for all entries in this specific category.
  • Multiple matches - links to List mode for all preliminarily matched entries with multiple automatically-suggested matches.
  • Site stats
  • Download
  • Match mode - see below.
  • Recent Changes in this catalog
  • Aliases
  • Jobs
  • Search only in this catalog
  • Names in other catalogs
  • Manually sync catalog
  • Catalog editor
  • Mobile matching
  • Visual tool
  • Find images
  • Changes last week
  • Catalog report



Match mode

An example of game mode.

In match mode (formerly known as semi-automatic mode or game mode), the top of the page shows the ID of the catalog (Catalog ID), the title of the catalog (Catalog Name) and possibly a minimal description supplied from the catalog (Catalog description). This should help you figure out who or what the entry is.

If the entry is unmatched, you have three choices:

  • Set Q (blue button): If you have identified which Wikidata item matches the catalogue entry, you can paste the Q-number in this box.[1]
  • New item (green button): if you are confident there is no matching Wikidata item. This will create a new Wikidata item for this entry. The description of the newly created item may be inappropriate for Wikidata and may need to be improved manually.
  • N/A (red button): for cases where there will never be an appropriate Wikidata item for this entry.

In case of doubt or uncertainty, or if there is no matching Wikidata item but you do not want to create one immediately, you may skip the entry and go to the next one by clicking "Next entry".

If the entry is preliminarily matched, you have two choices:

  • Confirmed (green button): Confirms that the proposed entry is correct.
  • Remove (red button): Confirms that the proposed entry is incorrect. The entry will then become unmatched and may be matched to another (potentially new) item.

If there are multiple automatically-suggested matches, only the first of them is shown, and will be used if "Confirmed" is clicked. You may browse or select other matches using the link to the right of the entry name.

Further down are some suggested links from en.wikipedia, each with a link to the corresponding item on Wikidata. If the correct item is present there, you can just click on the link to the right (e.g. "Q384941") and this will register a match. If the correct item is not among the suggestions, you can still search through Google on all versions of Wikipedia or Wikisource, or on Wikidata.

Whenever you make a connection between a catalogue entry and a Wikidata item, the system will automatically update Wikidata. This will show up as an edit in your contributions.

(Please note that a few entries on Mix'n'match may not have a property already set up - if you're working on one of these, the match will be saved and updated later, if appropriate)

List mode

An example of manual mode.

A list of entries will be shown when:

  • You click a specific category (e.g. "Unmatched") in a catalog page - all entries in this category will be shown with fifty entries per page.
  • You browse a search result page.

List mode was formerly known as manual mode, and it could show fifty entries from across all categories; this option has been removed.

On the first line of each entry in the list, you will see the name and (where available) the description of the entry. Each card will also show the status of the entry.

Unmatched

For items with no suggested match, the second line will present various links that will allow you to make an automatic search on Wikipedia, on Wikidata or Google (limiting the results only to Wikipedia or Wikidata), or even create the item. In the right column, you will have three choices:

  1. Set Q (green link): clicking here brings up a dialog box where you can enter the number of the Wikidata item (with or without the Q in front of the number).
  2. New item (red link): clicking here will create a new item on Wikidata for that entry, which will automatically get the name, description (if present) and ID from the catalogue.
  3. N/A (yellow link): clicking here will confirm that the entry should not exist on Wikidata, and can be discarded.

If you have provided a Wikidata item number, the system will automatically update the corresponding Wikidata entry using WiDaR, as in match mode.
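The Set Q dialog accepts the item number with or without the leading Q, and stray characters such as parentheses or commas are tolerated. The following is a minimal sketch of that kind of normalisation, not the tool's actual code: the function name `normalise_qid` and the rule "take the first run of digits" are assumptions made for illustration.

```python
import re
from typing import Optional


def normalise_qid(raw: str) -> Optional[str]:
    """Return a canonical "Q123"-style ID, or None if no usable number is found.

    Assumption: the first run of digits in the input is the item number.
    This is an illustrative rule, not Mix'n'match's documented behaviour.
    """
    match = re.search(r"\d+", raw)
    if match is None:
        return None
    number = int(match.group())
    if number == 0:
        # Wikidata item numbers start at 1.
        return None
    return f"Q{number}"
```

With this sketch, "Q123", "123" and "(Q123)," would all be treated as the same item, while input without any digits would be rejected.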

Preliminarily matched

For items with an automatically-suggested match, the second line will have a link to Wikidata along with an auto-generated summary of the Wikidata entry. In the right column, you will have two choices:

  1. Confirm (green link): clicking here confirms that the proposed entry is correct.
  2. Remove (red link): clicking here will confirm that the proposed entry is incorrect. The entry will then become unmatched and may be matched to another (potentially new) item.

Sometimes, a list of alternative matches is available.

Again, the system will make the corresponding edit via WiDaR on Wikidata, if you have made a confirmed match.

Matched

For items which have already been matched, the second line will have a link to Wikidata along with an auto-generated summary of the Wikidata entry, or have "Not applicable to Wikidata" shown.

In the right column will be the name of the user who made the match, along with a red "Remove" link. This link should be used only if you believe that the match made by someone else is wrong. If the match is correct, leave everything as it is and move on.

Note that while making a match causes the Wikidata item to be updated, removing a match (currently) does not. If you remove a match on an item, you may want to open that Wikidata item in a new tab and remove the property there as well - otherwise, it may find its way back into mix'n'match in the future.

Creation candidates

Many entries from catalogs are not (yet!) on Wikidata. Some may not meet the criteria for a Wikidata item, but others are listed in several catalogs, and thus have several external sources, which helps their "noteworthiness" significantly. Entries that have the same name in multiple (>=3) catalogs, but have no associated Wikidata item, can be found via Creation candidates.
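The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the `(name, catalog_id, qid)` tuple layout and the function name `creation_candidates` are hypothetical, and names are compared by exact string equality.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

# Hypothetical entry layout: (name, catalog ID, matched QID or None).
Entry = Tuple[str, int, Optional[str]]


def creation_candidates(entries: Iterable[Entry], min_catalogs: int = 3) -> List[str]:
    """Return names that appear without a Wikidata match in at least min_catalogs catalogs."""
    catalogs_by_name = defaultdict(set)
    for name, catalog_id, qid in entries:
        if qid is None:  # only entries with no associated Wikidata item count
            catalogs_by_name[name].add(catalog_id)
    return sorted(
        name
        for name, catalogs in catalogs_by_name.items()
        if len(catalogs) >= min_catalogs
    )
```

Because the grouping is by name only, the result is a list of candidates to check by hand, not a list of confirmed identical entities.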

An example of creation candidates.

The listed entries have the usual search options, to ensure that no item already exists on Wikidata. One can then create a new Wikidata item, with the (English) label pre-filled. Then, the new item can be matched to the applicable entries via Set Q. One can also search Commons for that label; sometimes, an image of that person already exists there!

Caution: Just because these entries have the same name, does not mean they all refer to the same entity. Please check carefully with the individual catalogs!

Matching tips

When matching entries to Wikidata items please bear the following tips in mind:

  • Don't guess: guessing will introduce errors into the data. If in doubt, follow the link on the catalogue entry, or check other catalogs at the bottom of the entry or other information (e.g. coordinate location). You can always skip entries and let someone else match them; you can even move to a different catalogue that you know more about.
  • Don't be afraid to create new items: If it isn't exactly the same concept, please create a new item. It is much easier to merge two items after the matching has finished than to split one item into two. E.g. a World Heritage site for a city often does not cover the same area as the city itself, so a new item should be made.
  • Don't match to disambiguation items: Wikidata items exist for Wikipedia disambiguation pages. These items act as a list of links, rather than a concept to be matched to. E.g. Bambaia (Q4853316) should not be matched; Agostino Busti (Q395600) should be.
  • Don't match from disambiguation items: some authority databases have disambiguation or alias pages.
    • E.g. RKD Artists used to have an entry for "Bambaia" that was wrongly mapped to Wikidata. (Now RKD Bambaia properly redirects to RKD Augustino Busti.)
    • Never match to GND "undifferentiated names"
  • Check the automatic matches: Whilst the automatic matching is often correct, it can still get confused between similarly named items.
  • N/A status is exclusively for entries that can never, ever be a Wikidata item, or for known duplicates within the same catalog.
  • Use the 'jobs' option: The 'action' drop-down menu on any catalogue has a 'jobs' option. This gives you a list of tasks that will help with matching. For example, 'auxiliary matcher' will check the dataset for additional identifiers such as VIAF IDs and check them against existing records in Wikidata. If the automatching process has thrown up a lot of low-quality matches, there is the option to 'purge automatches'.
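To give a feel for what an auxiliary-identifier check does, here is a rough sketch. It is not the code behind the 'auxiliary matcher' job: the data layout, the function name `auxiliary_matches` and the rule of skipping entries whose identifiers point to more than one item are all assumptions. P214 is the Wikidata property for VIAF IDs; the identifier values and QIDs used with it would be whatever your data contains.

```python
from typing import Dict, Tuple


def auxiliary_matches(
    entries: Dict[str, Dict[str, str]],
    wikidata_index: Dict[Tuple[str, str], str],
) -> Dict[str, str]:
    """Suggest a QID for each entry whose auxiliary identifiers agree on one item.

    entries: catalog entry ID -> {property ID: identifier value} (hypothetical layout).
    wikidata_index: (property ID, identifier value) -> QID already holding that value.
    """
    suggestions = {}
    for entry_id, identifiers in entries.items():
        qids = {
            wikidata_index[(prop, value)]
            for prop, value in identifiers.items()
            if (prop, value) in wikidata_index
        }
        # Assumption: only unambiguous hits are suggested; conflicts are left for a person.
        if len(qids) == 1:
            suggestions[entry_id] = qids.pop()
    return suggestions
```

Entries with no hit, or with identifiers that point to different items, are left out so that they can be checked manually.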

Sorting the catalog list

By default, the catalog list is sorted alphabetically. The sort_mode parameter can take one or several keywords to alter this:

  • sort_mode=groups groups catalogs by type/subject area, largest groups first, sorted alphabetically within the respective group. Completed catalogs have their own group at the end
  • sort_mode=groups,by_easiest same as above, but "easiest" (#auto-matched+2*#unmatched) to complete first
  • sort_mode=by_easiest,no_complete ungrouped sorting, but "easiest" to complete first, hiding completed catalogs (as they would be "easiest" by default)
  • sort_mode=groups,complete_inline grouped, but with completed catalogs in their respective subject area.
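The "easiest" measure above is #auto-matched + 2*#unmatched, with lower values sorted first. A minimal sketch of the ungrouped by_easiest and no_complete behaviour follows. The `Catalog` class, the function names, the alphabetical tie-break and the reading of "completed" as a score of zero are assumptions for illustration, not the tool's code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Catalog:
    name: str
    auto_matched: int
    unmatched: int


def easiest_score(catalog: Catalog) -> int:
    """Score from the manual: #auto-matched + 2 * #unmatched (lower is easier)."""
    return catalog.auto_matched + 2 * catalog.unmatched


def sort_by_easiest(catalogs: Iterable[Catalog], no_complete: bool = False) -> List[Catalog]:
    """Sort catalogs easiest first; optionally hide completed ones.

    Assumption: a catalog counts as completed when its score is zero.
    Ties are broken alphabetically by name, which is also an assumption.
    """
    selected = list(catalogs)
    if no_complete:
        selected = [c for c in selected if easiest_score(c) > 0]
    return sorted(selected, key=lambda c: (easiest_score(c), c.name))
```

For example, a catalog with 10 auto-matched and 5 unmatched entries scores 20, while one with 3 auto-matched and 1 unmatched scores 5 and is listed earlier.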

If your favourite catalog is "unknown" or in the wrong group, please let Magnus Manske (talk) know.

Creating new catalogs

You can create a new catalog and either provide a list of mapping candidates (best to paste them from a spreadsheet) or create a scraper to automatically harvest mapping candidates. Otherwise, ask Magnus Manske (talk) to import a catalog for you.

Tips

  • The field Wikidata property is for when a property exists for external identifiers. You can propose an external identifier property at Wikidata:Property proposal.
  • Create detailed descriptions for the Entry description field where possible. This often makes it much easier for people to match the catalogue, leading to fewer incorrect matches and higher data quality.
  • You can add aliases to items to help with the matching process. To import aliases, go to the catalogue and use the drop down 'action' menu in the top right. The 'aliases' option takes you to a page where you can import alternative labels for entries in the mix'n'match dataset. It will need to be in a tab separated format, and will use the dataset's external IDs for matching.
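If your aliases live in a script or spreadsheet export, a tab-separated file can be produced along these lines. The manual only states that the import is tab separated and keyed on the dataset's external IDs; the two-column layout used here (external ID, then one alias per row) and the function name `aliases_to_tsv` are assumptions, so check the import page for the exact format it expects.

```python
import csv
import io
from typing import Dict, List


def aliases_to_tsv(aliases: Dict[str, List[str]]) -> str:
    """Serialise {external ID: [alias, ...]} as tab-separated text.

    Assumed layout: one row per alias, external ID in the first column.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for external_id, names in aliases.items():
        for name in names:
            writer.writerow([external_id, name])
    return buffer.getvalue()
```

The resulting text can then be pasted into the aliases page reached from the catalogue's 'action' menu.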

Managing catalogs

There is a catalog editor, accessible at mix-n-match/#/catalog_editor/<id> for the catalog creator and a subset of users (“catalog editors”). There it is possible to change some of the catalog properties (name, description, URL, type, language and Wikidata property) and to disable a catalog.

Scraper-based catalogs can be updated by following the catalog creation process and entering an existing “Catalog ID”.

References

  1. You may paste the Q-number as "Q123" or as "123". The software also accepts other characters, such as parentheses or commas, as long as the Q-number you provide is valid.

Links