Talk:Mix'n'match/Archive 1

From Meta, a Wikimedia project coordination wiki

Before using

You have sent messages about Mix'n'match to over 172 Wikipedias, but with practically no explanation. I have some questions, but first of all I would like to ask whether this tool will be maintained and developed for a sufficient period of time. Otherwise it is not worth the bother. — Ace111 (talk) 18:35, 10 October 2014 (UTC)[reply]

I've not added many explanations because those can be added here (and translated, if necessary). There's also a blog post, linked from the page. Your question is interesting because I would never have thought of writing something about that, no matter how long a message I had written. :-)
Can you explain better what makes you think a tool is not worthwhile using if it's not maintained for long? We have hundreds of tools on Toolserver and Labs, there are no promises on their maintenance or they wouldn't be tools. If you specify what you want maintenance for, then perhaps Magnus can tell you whether he can promise that; or we can look for more maintainers to be able to reach that level.
If I were a user worried about the tool disappearing tomorrow, I'd just use it for the pretty statistics (how many women scientists are in my wiki? how can it lack hundreds of articles on medieval authors? etc.) and, as the blog post says, for the "lists of red links on steroids". If you create an article following a red link, you usually don't ask about preservation of the red link, only of the article. :) --Nemo 21:24, 10 October 2014 (UTC)[reply]

Updating Wikidata from catalogs

Once we've added a Q number to a catalog item, does a bot follow along and add the catalog statement to the Wikidata item, or do we need to do that by hand? - PKM (talk) 21:02, 12 November 2014 (UTC)[reply]

Not all the catalogs even have a property on Wikidata, so the answer certainly can't be "yes" in general. I have no idea about the catalogs which do have a Wikidata property for their identifiers. --Nemo 21:13, 12 November 2014 (UTC)[reply]
Okay, thanks. - PKM (talk) 01:09, 15 November 2014 (UTC)[reply]
Such props are "instance of" Wikidata property for authority control, eg see ULAN identifier. Most of them also have "subject item" pointing to the respective catalog, eg Union List of Artist Names. But either it's not possible to query for properties in WDQ, or my WDQish is not good enough yet.
Here's the list that Reasonator uses.
Yes, it's possible to update Wikidata from Mix-n-match: press "Y" next to the catalog name. Eg for ODNB the link is: --Vladimir Alexiev (talk) 12:57, 14 January 2015 (UTC)[reply]

Now Mix-n-match adds the coreferenced id as claim immediately. --Vladimir Alexiev (talk) 18:50, 13 February 2015 (UTC)[reply]

British Museum person-institution thesaurus

@Magnus Manske: The British Museum person-institution thesaurus has 176461 entries that are not coreferenced to anything in the world. I think they'd see it as a major win if the community helps them to coreference.

This could be followed by importing the 2.5M cultural objects of the BM.

Cheers! --Vladimir Alexiev (talk) 15:40, 25 January 2015 (UTC)[reply]

  • Use the data above, not the query below
  • SPARQL query returns url name otherNames type gender profession nationality birthDate deathDate note
prefix ecrm: <>
  select ?x ?name ?otherNames ?type ?gender ?profession ?nationality ?birth ?death ?note {
    ?x skos:inScheme id:person-institution.
    {select ?x ?name (group_concat(?other; separator="; ") as ?otherNames)
      {?x skos:inScheme id:person-institution; skos:prefLabel ?name.
        optional {?x ecrm:P131_is_identified_by/rdfs:label ?other filter(?other != ?name)}}
      group by ?x ?name}
    bind(if(exists{?x a ecrm:E21_Person},"Person","") as ?type)
    optional {?x bmo:PX_gender ?gender1. 
      bind(strafter(str(?gender1),"gender/") as ?gender2) 
      bind(if(?gender2="m","male",?gender2) as ?gender)}
    {select ?x (group_concat(?prof1; separator="; ") as ?profession)
      {?x skos:inScheme id:person-institution; bmo:PX_profession [rdfs:label ?prof1]} group by ?x}
    {select ?x (group_concat(?nation1; separator="; ") as ?nationality)
      {?x skos:inScheme id:person-institution; bmo:PX_nationality [skos:prefLabel ?nation1]} group by ?x}
    {select ?x (group_concat(?birth1; separator="; ") as ?birth)
      {?x skos:inScheme id:person-institution; ecrm:P92i_was_brought_into_existence_by [ecrm:P4_has_time-span [rdfs:label ?birth1]]} group by ?x}
    {select ?x (group_concat(?death1; separator="; ") as ?death)
      {?x skos:inScheme id:person-institution; ecrm:P93i_was_taken_out_of_existence_by [ecrm:P4_has_time-span [rdfs:label ?death1]]} group by ?x}
    optional {?x ecrm:P3_has_note ?note}
  }
 "nationality": { "type": "literal", "value": "" }, 
 "name": { "type": "literal", "value": "Francisco Pizarro" }, 
 "profession": { "type": "literal", "value": "" }, 
 "gender": { "type": "literal", "value": "male" }, 
 "type": { "type": "literal", "value": "Person" }, 
 "deathDate": { "type": "typed-literal", "datatype": "http:\/\/\/2001\/XMLSchema#date", "value": "1541-01-01" }, 
 "note": { "type": "literal", "value": "Conquistador; born Trujillo, Castile, conqueror of Peru and founder of Lima (1533). \n\nHe is subject of ..." }, 
 "otherNames": { "type": "literal", "value": "Pizarro, Francisco" }, 
 "x": { "type": "uri", "value": "http:\/\/\/id\/person-institution\/113757" }

Cheers! --Vladimir Alexiev (talk) 18:09, 16 January 2015 (UTC)[reply]

A CSV option

The nice folks at have provided CSV dumps of the thesauri at

  • bmPerson-InstitutionThesauri.csv provides fields URI,Label,ScopeNote,Gender,Nationality while the above provides more, in particular otherNames and years.
  • Nationality is a multivalue field, which is not handled properly. We can find instances with multiple values on the SPARQL endpoint:
        select * {?x bmo:PX_nationality ?n1,?n2 filter(str(?n1)<str(?n2))}
        grep 142680 bmPerson-InstitutionThesauri.csv
@Vladimir Alexiev: I have imported the CSV set into Mix'n'match as "BMT" (British Museum Thesaurus). Dumb automatching finds ~15% of entries. Does this have a Wikidata property? If not, should it? --Magnus Manske (talk) 16:15, 20 January 2015 (UTC)[reply]
Ah, I saw (and supported) your property proposal! :-) --Magnus Manske (talk) 16:32, 20 January 2015 (UTC)[reply]
@Magnus Manske: So quick, great! But looking in MnM, many entries have just a name, so it's a poor basis for making decisions. Don't you need also at least the years and otherNames? --Vladimir Alexiev (talk) 15:47, 21 January 2015 (UTC)[reply]
Well, I imported all of the CSV, and the JSON is invalid format, so... --Magnus Manske (talk) 10:53, 22 January 2015 (UTC)[reply]

Tweeting and Blogging

"#wikidata has the power to make large-scale #coreferencing between Authority Files work"

Tweeted about this affair, please retweet:

Filter out Disambiguation entries and Un-notable Persons

I wrote about this to Jane:

Posted as

@Magnus Manske: We should match persons to persons, not disambiguation pages to persons or to disambiguation pages.

  • RKD artists has disambiguation pages: both RKD artists:Il Bambaia and RKD artists:Bambaia say "See: Busti, Agostino". Don't know if you can recognize them, but if you can: please filter them out.
  • The matching algorithm should not select Wikimedia disambiguation pages as candidates
  • The Mix-n-match UI should not allow coreferencing to Wikimedia disambiguation pages (or at least should warn "Please read the disambiguation page and select one of the persons there")

The BM Thesaurus has a bunch of entries where the description contains "Issued tokens", eg

  • Henry Turner: Person; male; Issued tokens. Baker and possible member of the Bakers' Company.; retailer/tradesman; English
  • H Tuttle: Issued tokens (Lowestoft).; English
  • Simon Turner: Person; male; Issued tokens. Possible member of the Grocers' Company. Possibly associated and possibly neighbouring a tavern called The Pie or The Magpie.; retailer/tradesman; English

These are minor tradesmen or pub owners that coined some tokens. 100% of the 20-30 ones I checked are not on WD, not on Wikipedia, and unlikely to have any notability. So please filter them out from this catalog.

— The preceding unsigned comment was added by Vladimir Alexiev (talk) 19:15, 13 February 2015 (UTC)[reply]

This feature has "always" existed: each entry has a button to mark it unsuitable for Wikidata. --Nemo 06:42, 14 June 2015 (UTC)[reply]
@Nemo bis: the problem is that a big percent of BM entries are such minor tradesmen. Would be great if @Magnus Manske: can filter out entries having substring "Issued tokens" automatically --Vladimir Alexiev (talk) 11:36, 20 January 2016 (UTC)[reply]
Well, there is an issue specifically with RKD artists: their xrefs (not: disambiguation) have IDs of their own, and they may well sum up to more than 50% of all identifiers. Fortunately tens of thousands of xrefs are already marked as N/A, but their sheer number makes it impossible to leaf through the N/As in order to spot and inspect entries declared N/A for any other reason. So specifically for RKDartists one would wish that these xref entries had been kept out of the original import of the data set, or that there were an "xref" tag different from "N/A" but identical in functionality...
WRT disambiguation pages on Wikidata: Many of them are wrongly tagged in Wikidata and some datasets like DMNES have a strong affinity to disambiguation pages and I don't see any means of "solving" that. -- Gymel (talk) 13:09, 14 June 2015 (UTC)[reply]
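The automatic "Issued tokens" filter requested above could look roughly like the following. This is only a minimal sketch: the `Entry` structure, its field names, and the identifiers `1` to `3` are hypothetical and do not reflect how Mix'n'match actually stores catalog entries. It splits entries on a case-insensitive substring test, so the filtered ones can still be reviewed rather than silently dropped.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    ext_id: str        # catalog identifier (placeholder values below)
    name: str
    description: str


def split_by_marker(entries, marker="Issued tokens"):
    """Return (kept, filtered): entries without / with the marker in the description."""
    kept, filtered = [], []
    needle = marker.lower()
    for entry in entries:
        if needle in entry.description.lower():
            filtered.append(entry)
        else:
            kept.append(entry)
    return kept, filtered


entries = [
    Entry("1", "Henry Turner",
          "Person; male; Issued tokens. Baker and possible member of the "
          "Bakers' Company.; retailer/tradesman; English"),
    Entry("2", "H Tuttle", "Issued tokens (Lowestoft).; English"),
    Entry("3", "Francisco Pizarro", "Person; male; Conquistador"),
]

kept, filtered = split_by_marker(entries)
print("kept:", [e.name for e in kept])
print("filtered:", [e.name for e in filtered])
```

Keeping the filtered list separate would make it possible to mark those entries N/A in bulk while still allowing a spot check.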

Coreference AAT

@Magnus Manske: Can we coreference AAT, which is a concept (not person) thesaurus?

Oh, I now see AAT is already added! I can swear it wasn't there 2 days ago :-)

  • 11% are auto-matched, I will look at how good they are
  • Parent levels in brackets are omitted, probably because you don't escape the brackets (Done). Eg AAT:dimidiating rhyta
    • has: "rhyta, drinking vessels, <vessels for serving and consuming food>, <containers for serving and consuming food>, culinary containers, <containers by function or context>, containers (receptacles), Containers (Hierarchy Name), Furnishings and Equipment (Hierarchy Name), Objects Facet"
    • but you only show: "rhyta, drinking vessels, , , culinary containers, , containers (receptacles), Containers (Hierarchy Name), Furnishings and Equipment (Hie"
    • Maybe don't cut-off the parents?
  • it would be too much to show the Scope Note at the first go. But we need it in a tooltip or something!
    • Eg for AAT:underlayments, it's not enough to see parents "floor components, , surface elements (architectural), "
    • To figure out which item (if any) on the disambiguation page is a match, we need the scope note.
    • It's way too inconvenient to get the scope note ("Plywood, hardboard, or other material placed on a subfloor to provide a smooth, even surface for applying the finish.") from Getty's site because it's 3 clicks away, and the language is not indicated (so I first hit the Chinese note that is no use to me).
    • So please load the scope notes from my export file

--Vladimir Alexiev (talk) 10:56, 2 April 2015 (UTC)[reply]
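To illustrate the bracket problem described above: if the parent string is written into HTML unescaped, a browser treats `<vessels for serving and consuming food>` as an unknown tag and shows nothing for it. A minimal sketch using Python's standard `html.escape`; this is not the tool's actual code, and the parent string is shortened from the AAT example.

```python
import html

parents = (
    "rhyta, drinking vessels, <vessels for serving and consuming food>, "
    "<containers for serving and consuming food>, culinary containers"
)

# Escape <, > and & so the guide terms survive being embedded in HTML.
escaped = html.escape(parents)
print(escaped)
```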

@Magnus Manske: said adding the scope note as a tooltip would require major changes. Then please show it in addition to the parents, it really is important for accurate matching --Vladimir Alexiev (talk) 13:46, 9 April 2015 (UTC)[reply]

I made an evaluation of the current automatic matches at

  • Looked at the first 25 matches, which seem to be randomly distributed
  • Precision is about 50%
  • 2/3 of the incorrect matches (8 of 12) are due to WD named entities (albums, locations, books). AAT includes no named entities, so such matches are outright impossible. WD has no explicit class for "named entity" but one could try to filter by high-level classes such as Human, Location, Work.
  • In addition to Wikipedia, correct matches include Wikipedia categories and Wikisource (a dictionary article)
  • Recall estimation: if half of the 11% WD auto-matched are correct, that makes 5.5% or 2.2k
    • That's 15% of AAT-Wordnet corefs (15k)
    • It's 7-9% of the potential matches (I believe that 25-30k of all AAT concepts are present in WD)
  • Could also add alternative labels, and labels in other languages: will this help the matching?
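The arithmetic behind the estimates above can be re-checked with a few lines. The total AAT size is not stated in the post; the value below is only what "5.5% or 2.2k" implies and should be read as an assumption.

```python
# Figures quoted in the evaluation above
sampled = 25                 # auto-matches inspected
incorrect = 12               # judged wrong
named_entity_errors = 8      # wrong because the WD item is a named entity

precision = (sampled - incorrect) / sampled
named_entity_share = named_entity_errors / incorrect

correct_matches = 2_200      # "5.5% or 2.2k"
aat_size = correct_matches / 0.055   # ASSUMPTION: implied total, not given in the post
wordnet_corefs = 15_000      # AAT-Wordnet corefs
potential_low, potential_high = 25_000, 30_000   # AAT concepts believed present in WD

share_of_wordnet = correct_matches / wordnet_corefs
recall_low = correct_matches / potential_high
recall_high = correct_matches / potential_low

print(f"precision: {precision:.0%}")
print(f"errors due to named entities: {named_entity_share:.0%}")
print(f"implied AAT size: {aat_size:,.0f}")
print(f"share of AAT-Wordnet corefs: {share_of_wordnet:.0%}")
print(f"estimated recall: {recall_low:.0%} to {recall_high:.0%}")
```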

I'll try to salvage AAT-Wordnet-Wikipedia corefs through BabelNet: . --Vladimir Alexiev (talk) 14:35, 3 April 2015 (UTC)[reply]

Faulty autodescribe?

Mix-n-match describes Q190928 shipyard (in match against AAT:shipyards) as "Construction site, dock, and organization in Russia; places where ships are repaired and built".

But only the second of these descriptions is in Wikidata. So where does "Construction site, dock, and organization in Russia" come from?

It is auto-generated from the statements. If it's wrong, fix the statements :-) --Magnus Manske (talk) 18:29, 5 April 2015 (UTC)[reply]

@Magnus Manske: Margaret Busby is auto-described as "born in 2000" or "*2000". But her birthdate statement says "20. century". 2000 is the last year of the 20th century, to be sure, but it's quite misleading. Can you auto-describe using '*20. century'? Runner1928 (talk) 17:49, 20 November 2015 (UTC)[reply]

Just another example where the auto-describe of a century date makes things look funky: Leocadia. Born 300 (really 3rd century), died 304. --Dcheney (talk) 12:25, 5 November 2018 (UTC)[reply]
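A sketch of how a description generator could respect date precision. In the Wikidata data model a time value carries a precision code (7 = century, 9 = year); the functions below are hypothetical and not taken from the autodescribe code, and they only handle positive (CE) years.

```python
PRECISION_CENTURY = 7   # Wikidata precision code for "century"
PRECISION_YEAR = 9      # Wikidata precision code for "year"


def ordinal(n: int) -> str:
    """English ordinal for a positive integer: 1st, 2nd, 3rd, 4th, 11th, 21st ..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def describe_year(year: int, precision: int) -> str:
    """Render a CE year according to its precision code.

    Century precision yields e.g. "20th century"; anything else falls
    back to the plain year.
    """
    if precision == PRECISION_CENTURY:
        century = (year - 1) // 100 + 1
        return f"{ordinal(century)} century"
    return str(year)


print(describe_year(2000, PRECISION_CENTURY))
print(describe_year(300, PRECISION_CENTURY))
print(describe_year(304, PRECISION_YEAR))
```

With this, the two examples above would read "20th century" and "3rd century" instead of "2000" and "300".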

Problem with Catholic Hierarchy's catalogues?

It seems the two CH catalogues - the one about bishops and the one about dioceses - do not automatically transfer the manually-confirmed data to Wikidata, while this happens with other catalogues. Can/Should it be fixed? --Sannita - not just another sysop 00:12, 3 April 2015 (UTC)[reply]

I didn't even know of such a sync. Is that what d:user:Reinheitsgebot does? --Nemo 06:45, 3 April 2015 (UTC)[reply]
If you hover over a specific catalogue, there should be a sync option somewhere. It simply gives a filled-in QuickStatements page. You can also sync values from Wikidata. Sjoerd de Bruin (talk) 18:16, 3 April 2015 (UTC)[reply]
@Nemo bis: It seems that it isn't. :/
@Sjoerddebruin: Believe me or not, if you hover on those two particular catalogues there's no "sync" link at all. --Sannita - not just another sysop 20:17, 3 April 2015 (UTC)[reply]
The sync link only appears for catalogs that have a "direct" property, and for those that have a "hacked property" (via qualifier, e.g. CE1913). I try to not use the hacked one anymore; if there is a property for CH (bishops or dioceses), please let me know! --Magnus Manske (talk) 18:28, 5 April 2015 (UTC)[reply]
@Magnus Manske: Thank you very much for your answer. :) Actually there's P1047 for the bishops, for the dioceses I'll check if there's any property proposal, and if not, I'll ask it. --Sannita - not just another sysop 09:32, 7 April 2015 (UTC)[reply]
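For catalogues without a sync link, confirmed matches could in principle be turned into QuickStatements input by hand. A minimal sketch, assuming the tab-separated QuickStatements syntax (item, property, quoted string value); the item IDs and identifier values below are placeholders, not real matches, and identifiers containing double quotes are not handled.

```python
import re


def quickstatements_lines(matches, property_id):
    """Build one tab-separated QuickStatements line per (qid, external_id) pair."""
    if not re.fullmatch(r"P\d+", property_id):
        raise ValueError(f"not a property ID: {property_id}")
    lines = []
    for qid, ext_id in matches:
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"not an item ID: {qid}")
        lines.append(f'{qid}\t{property_id}\t"{ext_id}"')
    return "\n".join(lines)


# Placeholder pairs; P1047 is the property mentioned above for the bishops catalogue.
confirmed = [("Q4115189", "example-id-1"), ("Q13406268", "example-id-2")]
print(quickstatements_lines(confirmed, "P1047"))
```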

Sort the catalogs

@Magnus Manske: 54 catalogs, wow that is great progress! But please sort them, since it becomes quite hard to see what's there. --Vladimir Alexiev (talk) 13:48, 9 April 2015 (UTC)[reply]

They are sorted by the catalog ID; AAT is first, YourPaintings last :-) --Magnus Manske (talk) 13:53, 9 April 2015 (UTC)[reply]

Search in a few catalogs

Hi Magnus, it's great that we can now exclude certain catalogs from the search, but could you also add a similar link that defaults to checking only one catalog instead of all of them? For some names, like "Smith", it would be nice to be able to search in a single catalog at a time. Thx Jane023 (talk) 06:53, 5 May 2015 (UTC)[reply]

On the catalog page, in the top row with all the links, there is now "search in this catalog only". Example. --Magnus Manske (talk) 14:04, 6 May 2015 (UTC)[reply]

SBN identifier

@Magnus Manske and Sannita: On Wikidata we already have the SBN identifier, the authority code from the Italian national library service. However, there are still few items with that code, probably all added manually. How about adding this dataset to Mix'n'match? All existing records can be extracted from here by putting 2015 in the second form (the one with "a:") of "Morte/Fine". An example is the Aafjes Bertus record, which has a little bio plus the ID: IT\ICCU\SBLV\026326. --AlessioMela (talk) 13:54, 12 May 2015 (UTC)[reply]

Importing them now. Will take a while, as I have to scrape every page individually, and there are ~65K... --Magnus Manske (talk) 17:16, 12 May 2015 (UTC)[reply]
Yes, no API and thousands of pages to scrape, but you're great! Thanks! --AlessioMela (talk) 17:41, 12 May 2015 (UTC)[reply]
Note, that's actually only 4% of the total SBN identifiers, i.e. those which are public (and that VIAF is supposed to have as well). See also [2] (in Italian). --Nemo 19:15, 12 May 2015 (UTC)[reply]
True, but I haven't found the full list of identifiers (is there one?). The site exposes only authority files: our 65k records, the 4%. Moreover, the WD property works well only with "the 4%" because it links to those records. The other 96% doesn't really have a front view on the SBN site (the trick of showing book lists was rejected on WD). --AlessioMela (talk) 19:38, 12 May 2015 (UTC) I mean, as I read in the ML you linked, that is more of an SBN problem, because they expose just a small part of their identifiers (and in such an unfriendly way).[reply]
@Magnus Manske: Thank you very much. I'm working to convince ICCU to release more of those identifiers. This will definitely work on our side. ;) --Sannita - not just another sysop 12:26, 13 May 2015 (UTC)[reply]


Would it be possible to add to mixnmatch? I have been in contact with John Alroy and he welcomes the incoming links. One can get the XML information by using:

The ID that is inside of the XML tags already has a property: d:Property:P842. Currently 8538 of 319948 taxa on Fossilworks are matched. But we probably only have a fraction of that number on Wikidata. --Tobias1984 (talk) 19:12, 12 May 2015 (UTC)[reply]

Hmm. They have an extensive download form, but I can't seem to get a simple list of all their taxa with IDs. Ideas? --Magnus Manske (talk) 15:52, 28 May 2015 (UTC)[reply]

NNDB links to others authorities, including Wikipedia

@Magnus Manske: I noticed that NNDB has a "bibliography" for each entry. For example, Patrick Stewart has an entry and a bibliography. At the bottom of the latter there is a list of authority links, including Wikipedia. Could we use this information both for matching with Wikidata and for obtaining matches with other IDs in one shot? --AlessioMela (talk) 13:55, 20 May 2015 (UTC)[reply]

Could do, though that means scraping >40K pages... I'll put it on the list. --Magnus Manske (talk) 15:46, 28 May 2015 (UTC)[reply]

Issue with Dictionary of Art Historians sync

The matching is now reported as complete again; however, #2 of the duplicates shown at the DAH sync page is Q84305, which should therefore have two instances of property P1343 but hasn't got any! -- Gymel (talk) 05:53, 29 May 2015 (UTC)[reply]

Feature Request: Assignment "coupled" entries

If you search for "Amy Katherine Browning", there are currently three entries from three different sources, quite obviously the same person, and whether a Wikidata item exists or has to be created, they'll hopefully end up at the same item. When an item does already exist, I can assign them individually to that item; that's a mechanical repetition of an easy operation. It would now be nice to also have an easy way at this place to create a corresponding Wikidata item, like under "Creation Candidates" (or to have a method to steer "Creation Candidates" to a specific name): Marking the entries as "Not in Wikidata" would substantially increase the risk of creating duplicates in Wikidata (different catalogues will be synced at different times, and of course one does not re-check all the entries already marked as "not in Wikidata" over and over again), which may go unnoticed for a long time, and creating the appropriate item by hand is quite an effort that one is not always inclined to make right at that moment. Actually, I decided to search for the name from the example while sifting through the RKD artists list of entries marked N/A, resetting that name to "unmatched": The name seemed so significant and the RKD information so rich that I tried the search in other catalogues to see whether someone had happened to find a matching Wikidata item. Thus my "main context" was working through a specific list, and during that I didn't want to spend too much time creating even "important" new WD items or deciding which of the catalogues would be in shape for a "spontaneous" item creation run. Besides, ULAN also knows her as "Amy Katherine Dugdale", equally unmatched. So perhaps what's really needed is a tool to "connect" entries from different catalogues first and then finish with "instant creation" - or to equip entries with identifiers from additional catalogues on creation - as I understand these will turn into matches upon the next sync of the respective catalogues.
-- Gymel (talk) 13:42, 31 May 2015 (UTC)[reply]

Trying to extract from the large blob of text above: You suggest a "create item" link under search results, for an item with the name you just searched. I've just added that, it's straightforward enough. As a remark, I usually use "Search Wikidata" as the final step before creating an item, to ensure it doesn't already exist, and there is a "create item" link on the search results page. I'm not quite sure how you want to identify identical items across catalogs to "connect" them before creation though. --Magnus Manske (talk) 14:31, 31 May 2015 (UTC)[reply]
Thanks, this already helps a bit. What I was aiming at, however (and sorry for the blabber), was the possibility to create a Wikidata item for one of the entries I'm seeing in the search result. The steps were: I tried all the search possibilities and could not match. But additionally I have the strong suspicion that other catalogues should also contain this entry, and a (quite loosely formulated) text search supports this: Several other catalogues have that item, either still unmatched or marked as "not in WD" (or sadly "N/A"). If creating a meaningful item in that situation is made simple enough, I would be able to perform several parenthetic "cross-catalogue" assignments from the result list at hand. -- Gymel (talk) 18:10, 31 May 2015 (UTC)[reply]
So what exactly would you want me to build? A link with every search result, to search for that entry's full name? Or something more generic, like just the last name (if it is, indeed, a name; there are other entry types in the catalogs)? How could I automatically collate "Johann Sebastian Bach", "Johann Bach", and "J. S. Bach"? --Magnus Manske (talk) 18:52, 31 May 2015 (UTC)[reply]
Probably more in the lines of "Create item from this" for each result entry which is either not matched or marked "not in WD". My expectation is that this would prefill English label and description from the entry and insert the identifier of the selected entry into the item created: That would be a good starting point to manually connect other results from the query. I see that this might be a potentially dangerous offer to the casual user... -- Gymel (talk) 21:55, 31 May 2015 (UTC)[reply]
Done. Warning in "hover title". --Magnus Manske (talk) 22:04, 31 May 2015 (UTC)[reply]
Sorry for the delay. Testing it with RKDartists, it turned out that having Dutch phrases in English labels is not perfect, but changing the prefilled label is still easier than memorizing things and filling out blank forms, and personally I find the new solution really helpful. Thanks a lot. -- Gymel (talk) 22:25, 3 June 2015 (UTC)[reply]
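On the question raised above of how "Johann Sebastian Bach", "Johann Bach", and "J. S. Bach" might be collated automatically, one possible heuristic is sketched below. It is an illustration only, not something the tool is known to do: it requires the same last token (taken to be the surname) and treats given names as compatible when they are equal or one is the other's initial, allowing missing middle names. It over-matches (e.g. "James Bach" against "J. S. Bach") and ignores inverted forms such as "Pizarro, Francisco", so it could at most suggest candidates for a human to confirm.

```python
def tokens(name):
    """Lower-cased name parts with trailing dots and commas removed."""
    parts = (t.strip(".").lower() for t in name.replace(",", " ").split())
    return [p for p in parts if p]


def compatible(a, b):
    """Given-name tokens match if equal, or if one is the initial of the other."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def may_be_same_person(name_a, name_b):
    """Heuristic: same surname, and the shorter given-name list fits into the longer."""
    ta, tb = tokens(name_a), tokens(name_b)
    if not ta or not tb or ta[-1] != tb[-1]:
        return False
    given_short, given_long = sorted((ta[:-1], tb[:-1]), key=len)
    matched = 0
    for tok in given_long:  # greedy in-order (subsequence) matching
        if matched < len(given_short) and compatible(given_short[matched], tok):
            matched += 1
    return matched == len(given_short)


print(may_be_same_person("Johann Sebastian Bach", "J. S. Bach"))
print(may_be_same_person("Johann Bach", "J. S. Bach"))
print(may_be_same_person("Johann Sebastian Bach", "Carl Philipp Emanuel Bach"))
```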

Syncing credits or responsibility

Earlier today I was quite surprised to be suddenly marked as responsible for a triplet in the NPG matching for "W. Allen" (Q6840743), because I still remembered having removed those matches the day before. It turned out to be a kind of tracking glitch: Mix'n'match seems to have missed my changes on Wikidata. When I synced the "connections on Wikidata, but not here", Mix'n'match "read" them back and marked me as responsible for the "match". Removing the matches again in Mix'n'match and performing an arbitrary edit on the Wikidata item resolved the issue, but wouldn't it make more sense to credit matches created by syncing Wikidata to M'n'm to a "Wikidata sync" pseudo user rather than to the user who happened to perform the sync? I mean, there is no chance to assess those matches before the sync, and blaming Wikidata instead of the executing user might be more on the spot when it comes to cleaning up some mess. -- Gymel (talk) 22:36, 3 June 2015 (UTC)[reply]

Unsyncable entries, please stand up

RKDartists has tons of cross reference entries (Peter Miller see Miller, Peter), each with an identifier of its own. Fortunately, more than 75,000 are already marked as "N/A". However, some of them seem to have been matched to Wikidata before they were reassessed and marked as N/A. I have worked through the constraint report ("single value" violations) for the corresponding Property P650; the residue should correspond to the 78 "double Q's in this catalogue". However, the announcement "287 items not in this catalogue but in wikidata" stays constant when trying to sync.

My theory is that these are cases where only one identifier is stored in Wikidata and this identifier is marked as N/A in Mix'n'match (because it's a cross reference identifier, or an identifier yielding a 404 error page, or someone made an error). More generally perhaps, because syncing would create a conflicting assignment in Mix'n'match (unlikely in this case, because then syncing in the other direction would provoke a constraint violation on Wikidata). If so, it would be nice to have a list of these identifiers, so the values on Wikidata can be replaced by the appropriate ones one gets when following the cross reference. Alternatively one can just wait until subsequent independent matching of the appropriate entry makes itself visible by triggering a constraint violation message.

To get this list one would have to dump the Mix'n'match data for the given catalogue, export a beacon file for the corresponding property, and diff them, cancelling out all the entries with coinciding values, to get a list of those Q-items with "surplus" values on Wikidata, which could then be pasted into autolist to yield something clickable to operate on (here I'm rather thinking of inspecting the items, not bulk-deleting the property). I imagine it would be quite easy to provide an "inspect" or "preview" link next to the "sync" button in Mix'n'match - activating that link would simply show a list of Q-items (and corresponding values) for the entries Mix'n'match announces as "syncable". -- Gymel (talk) 23:05, 3 June 2015 (UTC)[reply]
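The diffing step described above amounts to a set difference over (item, identifier) pairs. A minimal sketch, assuming both the Mix'n'match dump and the beacon export have already been parsed into such pairs; the parsing itself is left out, and the Q numbers and identifiers below are made-up placeholders.

```python
def surplus_on_wikidata(mnm_pairs, wikidata_pairs):
    """Pairs present on Wikidata that have no coinciding match in Mix'n'match."""
    return sorted(set(wikidata_pairs) - set(mnm_pairs))


# Placeholder data: (Wikidata item, catalogue identifier)
mnm_pairs = [("Q100", "1001"), ("Q200", "2002")]
wikidata_pairs = [("Q100", "1001"), ("Q200", "2002"), ("Q300", "9999")]

for qid, ext_id in surplus_on_wikidata(mnm_pairs, wikidata_pairs):
    print(qid, ext_id)
```

The resulting Q numbers are the ones that would be worth inspecting by hand before any further sync.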

Manual for Mix'n'match

Just to let you all know that I wrote a manual for using Mix'n'match, since I received some requests for a how-to. Just a couple of notes:

  1. unfortunately, for the moment it's only in Italian, because it is my mother tongue and all the requests I had for a manual were from Italian-speaking users, but I promise I'll do my best in the next few days to start translating it into English. :) I also marked it for translation, so that more languages may follow;
  2. it only deals with the semi-automatic and manual games; all the remaining stuff is still to be done, so any help is welcome. :)

Let me know your thoughts about it! --Sannita - not just another sysop 15:46, 11 June 2015 (UTC)[reply]

I've put an English translation at Mix'n'match/en - I don't know if we want to keep this at a subpage or include it in the main entry, though. Andrew Gray (talk) 18:11, 19 June 2015 (UTC)[reply]

Merged items

What happens when a Wikidata item matched to a mix'n'match entry is merged to another item? Does mix'n'match follow redirects, once created? Does it continue believing in the deprecated ID? Does it report the change in some way? While working on BEIC ids I regularly find a few duplicates and merge them, but I worry that I'm "losing" them; same (and worse) for items I split, but that's rarer. --Federico Leva (BEIC) (talk) 06:35, 14 June 2015 (UTC)[reply]

Adding the property to the redirected item will yield an error on syncing from M'n'M to Wikidata. Syncing from Wikidata to M'n'M for the identifier moved to the new target item won't happen as long as the old association exists in Mix'n'Match. Thus merging items will increment both the "not here" and "not in Wikidata" counters without any direct indication of what the items or identifiers involved might be, or that these are coupled in the sense of concerning the same id. There definitely is room for improvement, see my comment #Unsyncable entries, please stand up above.
However, I'm not sure if Mix'n'Match should implement some automatic remedy feature: How long should it wait (an erroneous merge could be reverted ~shortly~ after)? Who should be recorded as the person or process responsible for the match: The original match has been performed by X, the merge has been executed by some Wikidata user Y with no record on Mix'n'Match, the merge then has been tracked by some automated Mix'n'Match subcomponent Z: There is no clear answer (cf. my comment #Syncing credits or responsibility above).
Repairing associations in other cases may be tedious and frustrating too: Usually they will be visible in either "double Q's in this catalog" or the corresponding constraint report in Wikidata. But assume the constraint violation is resolved by some Wikidata user not aware of Mix'n'match: there is the danger of the next sync re-inserting the same value into Wikidata, triggering the problem report for the same problem again. Generally, removing associations in Mix'n'Match does leave the id in the corresponding Wikidata item (you won't know what this was after the removal) and the next sync may import it back to Mix'n'match. So perhaps (for logged in users) on removal of a match the tool should ask "remove from Wikidata, too?" or even make this the default action... -- Gymel (talk) 13:31, 14 June 2015 (UTC)[reply]
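One way to make merges visible without deciding the credit question would be a report that only lists matches pointing at redirected items, leaving the update itself to a human. A sketch under stated assumptions: `matches` and `redirects` are hypothetical in-memory dictionaries, and how the redirect information would be obtained from Wikidata is not covered here.

```python
def resolve_redirect(qid, redirects):
    """Follow a chain of redirects to its end, stopping if a cycle is detected."""
    seen = set()
    while qid in redirects and qid not in seen:
        seen.add(qid)
        qid = redirects[qid]
    return qid


def report_merged(matches, redirects):
    """List (identifier, old item, current item) for matches that point at redirects."""
    report = []
    for ext_id, qid in sorted(matches.items()):
        target = resolve_redirect(qid, redirects)
        if target != qid:
            report.append((ext_id, qid, target))
    return report


# Placeholder data: catalogue identifier -> matched item; old item -> merge target
matches = {"a": "Q10", "b": "Q20"}
redirects = {"Q10": "Q11", "Q11": "Q12"}

for row in report_merged(matches, redirects):
    print(row)
```

Because nothing is rewritten automatically, a merge that gets reverted shortly afterwards would simply disappear from the next report.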

Adding the USDA NDB catalog for matching

Would it be possible to add the USDA NDB catalog for matching on Mix N'Match ? The Excel catalog is available at (column A and B are relevant)

The USDA NDB property (P1978) to match the entries has just been created. --Teolemon (talk) 23:18, 6 July 2015 (UTC)[reply]

Importing now. --Magnus Manske (talk) 10:39, 7 July 2015 (UTC)[reply]
Thank you so much :-) Now, if they could have chosen more legible labels :-S --Teolemon (talk) 18:10, 29 July 2015 (UTC)[reply]

Adding the O*NET-SOC 2010 catalog for matching jobs and occupations

Thanks a lot for the USDA NDB catalog: the Open Food Facts community (including me) has started the matching work :-)

Would it be possible to add the O*NET-SOC 2010 catalog for matching on Mix N'Match ?

The SOC Occupation Code (2010) property (P919) to match the entries has existed for a while.
thank you so much, --Teolemon (talk) 09:50, 12 July 2015 (UTC)[reply]

Adding the Klassifikation der Berufe 2010 (KldB 2010) for matching jobs and occupations

Would it be possible to add the "Klassifikation der Berufe 2010" catalog for matching on Mix N'Match ?

thank you so much, --Teolemon (talk) 10:16, 12 July 2015 (UTC)[reply]

@Teolemon: Done. --Magnus Manske (talk) 19:45, 5 August 2015 (UTC)[reply]
@Magnus Manske:: Thanks Magnus. I'm trying to get the major job code systems (I've proposed the US for Mix N'Match above, and I will try to get ISCO, which is the international one). They come in really handy for statistical comparison between countries, and to break down barriers in job searches.--Teolemon (talk) 12:05, 21 August 2015 (UTC)[reply]

UK Parliament bio

Could you please add d:Property:P1996 to the tool? A basic list of all MPs is available at and Lords at and although there are many more pages for former members of the legislature which it would be useful to link to, eg. Gordon Brown. Unfortunately I can't find a single comprehensive index of all of the pages but there may be a way of extracting such a list. Google finds around 1800 pages with URLs including the stem Thanks, Rock drum (talk · contribs) 16:36, 17 July 2015 (UTC)[reply]

@Rock drum: Imported ~1400 here. --Magnus Manske (talk) 08:56, 16 September 2015 (UTC)[reply]
@Magnus Manske: Thank you. Rock drum (talk · contribs) 15:52, 17 September 2015 (UTC)[reply]

ORCID in Mix'n'match

What kind of ORCID-indexed people are selected for inclusion in Mix'n'match? I suppose it is not the entire ORCID that is in Mix'n'match? — Finn Årup Nielsen (fnielsen) (talk) 12:35, 4 August 2015 (UTC)[reply]

Just a tiny ORCID subset. Pre-selected to match names in Wikidata, IIRC. --Magnus Manske (talk) 19:46, 5 August 2015 (UTC)[reply]

Bug: Search for labels gets cut off at apostrophe

I’m working on the FRS list, and some of the fellows have an apostrophe in their name (for example, William O'Shaughnessy Brooke). For these names, the Search links are all broken: they’ll cut off at the apostrophe (in this case, search for William O). Can this be fixed? —Galaktos (talk) 11:13, 3 September 2015 (UTC)[reply]

Should be fixed now. --Magnus Manske (talk) 13:14, 16 September 2015 (UTC)[reply]
Great, thanks! —Galaktos (talk) 21:51, 16 September 2015 (UTC)[reply]
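
The thread above does not say what caused the truncation. A minimal sketch, assuming the label was placed into a search link without escaping, shows how percent-encoding keeps an apostrophe from ending the link early. The base URL and the function name are hypothetical and are not taken from Mix'n'match.

```python
from urllib.parse import quote, unquote

# Hypothetical search endpoint, used only for illustration.
SEARCH_BASE = "https://example.org/search?q="


def build_search_link(label: str) -> str:
    """Return a search URL in which the whole label is percent-encoded."""
    # safe="" makes quote() encode every reserved character, including "/".
    # The apostrophe is not in quote()'s always-safe set, so it becomes %27.
    return SEARCH_BASE + quote(label, safe="")


link = build_search_link("William O'Shaughnessy Brooke")
print(link)
```

Decoding the query part with unquote() gives back the full name, so nothing after the apostrophe is lost.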

Open Library identifier (d:Property:P648)

Please add support for Open Library on Mix'n'match. Their data is accessible within API or data dumps. Maybe Mix'n'match will be useful also on OL/IA front, since there are many duplicated entries for each person (some listed at d:Wikidata:Database_reports/Constraint_violations/P648#.22Single_value.22_violations). Lugusto 17:00, 13 September 2015 (UTC)[reply]

The entire set is a little, well, large... I'll start with the authors. --Magnus Manske (talk) 09:14, 16 September 2015 (UTC)[reply]
OK, there are 6.8M authors in OLID. That's more entries than in all other mix'n'match catalogs combined. Too many. I have imported ~333K entries that have both birth and death dates, for now. --Magnus Manske (talk) 11:17, 16 September 2015 (UTC)[reply]

Recognise URLs with explicit protocol on game mode

It would be nice if addresses starting with "http://" or "https://" could also be recognised as valid in game mode, so that the Q number could be set easily. --abián 19:21, 20 September 2015 (UTC)[reply]
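
As a rough illustration of this request, the sketch below pulls a Q number out of user input whether it is a bare id or a full address with an explicit protocol. The accepted input forms and the function name are assumptions; the tool's actual parsing is not described here.

```python
import re
from typing import Optional

# A Q number at the very end of the input, either alone or after "/", ":" or "=".
_QID_AT_END = re.compile(r"(?:^|[/:=])(Q[1-9][0-9]*)$")


def extract_qid(text: str) -> Optional[str]:
    """Return the Q number found at the end of the input, or None if there is none."""
    cleaned = text.strip()
    # Drop a fragment or query string, then any trailing slash.
    cleaned = cleaned.split("#", 1)[0].split("?", 1)[0].rstrip("/")
    match = _QID_AT_END.search(cleaned)
    return match.group(1) if match else None


for candidate in ("Q42", "https://www.wikidata.org/wiki/Q42", "http://www.wikidata.org/entity/Q42"):
    print(candidate, "->", extract_qid(candidate))
```

Input that does not end in a Q number yields None, so the caller can reject it instead of guessing.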

Paris street code

Would it be possible to add Paris digital street code (d:P:P630) to mix'n'match? There is a list of entries at Sadly, it appears that it is no longer updated, as there is a slightly longer list at but still it would be useful (a little more than 10% of entries are still missing in Wikidata, based on numbers returned by wdq). --Zolo (talk) 15:35, 10 October 2015 (UTC)[reply]


Would it be possible to add the UNSPSC catalogue to Mix'N'Match? See P2167. If it would be helpful, I could convert the catalogue into your preferred format, but I cannot find any documentation of the Mix'N'Match upload format. See also this conversation. Cheers, Bovlb (talk) 19:06, 25 October 2015 (UTC)[reply]


Could [3] be added? It corresponds to P2263. I would be willing to help, if needed. Popcorndude (talk) 15:06, 13 November 2015 (UTC)[reply]

Feature request : specify a set of candidate matching

Hi, I recently added this catalogue :

I may have made a few mistakes myself in preparing the dataset, but users (especially User:Tinm, thanks to him) have a few complaints which could help make mix'n'match better.

A few complaints :

  1. the search does not work.
    • Partly it's my fault and the labels I generated are bad for it. Do we have good guidelines like "the generated label should follow the Wikidata label conventions" ?
    • Partly it seems to be because the dataset is a multilingual one, with English as the main language but with names taken from all over the world. This makes the default search inefficient, and systematically doing the Google "all wikipedias" search was too tedious for Tinm.
  2. The second big complaint is related to the fact that the dataset has far more ids than there are items. I'll suggest two things to handle this use case:
    • First: I think it would help to restrict the candidates for matching Wikidata items to, say, a PagePile result set. Clearly it's inefficient to search in all wikipedias if we already have a good candidate set to search in.
    • Second: a randomly chosen id is likely to have no item. So I think that when we have a good set of candidates for matching, it would be useful to reverse the way Mix'n'match currently works, for example in game mode: picking one existing item, not one random id, and presenting a set of candidate ids ...

TomT0m (talk) 10:48, 12 December 2015 (UTC)[reply]

Remove match doesn't work?


Sometimes I make a mistake when entering the Q number, but I notice right away that the article name is wrong. Then I do "remove match". But today I noticed that the removal hadn't really been performed. I have been using the tool only this month. Could you please check it out? --Joutbis (talk) 17:41, 24 December 2015 (UTC)[reply]

Feature request: jump to an article number


Would it be possible to jump to a given article number? I find I'm only doing words with an "A".--Joutbis (talk) 17:49, 24 December 2015 (UTC)[reply]

FAST catalog

Hello, I would like to add the FAST catalog (property FAST-ID (P2163) on Wikidata) to mix'n'match. From their latest data dump, I've created a tsv-file for import into mix'n'match containing all persons from this catalog: [4] (about 60.5 MiB, 783.076 lines). Because that's quite big, here is the same file split into four chunks of max. 20 MiB: [5] [6] [7] [8]

The rows contain three tab-separated columns with the FAST-ID, the person's name and a description. The description consists of the gender, birth- and death-dates, affiliation, Wikidata-ID as found in the FAST database, LCAuth-ID (P244) as found in FAST, VIAF-ID (P214) as found in FAST.

In the future I might add more types of data (geographic, organizations, …) from the FAST database, but for now the person data should be enough work ;-). --Floscher (talk) 18:17, 8 January 2016 (UTC)[reply]
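
The three-column layout described above (ID, name, description, separated by tabs) and the splitting into size-limited chunks can be sketched as follows. This is only an illustration under assumptions: the sample values are made up, the helper names are hypothetical, and nothing here is taken from the actual FAST dump or from the Mix'n'match importer.

```python
def tsv_row(entry_id: str, name: str, description: str) -> str:
    """Join three fields with tabs, collapsing stray tabs and newlines inside each field."""
    fields = (entry_id, name, description)
    # " ".join(s.split()) replaces every run of whitespace (tabs and newlines
    # included) with a single space, so a field can never add an extra column.
    return "\t".join(" ".join(str(field).split()) for field in fields)


def chunk_lines(lines, max_bytes):
    """Group lines into chunks whose UTF-8 size (with newlines) stays within max_bytes."""
    chunks, current, size = [], [], 0
    for line in lines:
        line_size = len((line + "\n").encode("utf-8"))
        # Start a new chunk when adding this line would exceed the limit.
        if current and size + line_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        chunks.append(current)
    return chunks


# Made-up sample entries, not real catalog data.
rows = [
    tsv_row("0000001", "Doe, Jane", "female; 1900-1980"),
    tsv_row("0000002", "Roe, Richard", "male; 1850-1910"),
]
print("\n".join(rows))
```

With max_bytes set to 20 * 1024 * 1024, chunk_lines would produce pieces of at most 20 MiB each, unless a single line is longer than the limit.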

NCES District ID

Magnus, we now have P2483 on Wikidata, the NCES District ID. I believe most of the school districts in the United States already exist in Wikidata. Would you be able to get a recent dataset from Local Education Agency (School District) Universe Survey Data to put into Mix-n-Match for P2483? That ID is also known as LEAID in the flat files available from NCES. The columns LSTREE, LCITY, LSTATE, and LZIP are Location address columns (as opposed to Mailing address columns that begin with M) and can be used to disambiguate school districts. Thanks in advance for considering. Runner1928 (talk) 19:46, 11 February 2016 (UTC)[reply]

6DEG has now a Property

Catalogue 107 (6 degrees of Francis Bacon) now has a Wikidata property: d:Property:P2401 and matches could be transferred. -- Gymel (talk) 21:46, 29 February 2016 (UTC)[reply]

profile ID property now has a property in Wikidata (d:Property:P2600) so maybe should be added to Mix'n'match. The majority of the profiles are not notable people (it's a general-purpose genealogy site so everybody can create a profile), thus only items from Wikidata should be searched in, and not the reverse (trying to match all profiles into Wikidata). —surueña 20:41, 16 March 2016 (UTC)[reply]

There are currently >3.1 million people in Wikidata. That would mean 3.1 million search requests to I don't think they'd be too happy about that. Maybe they could do that internally, or on a geni dump, but as it is... --Magnus Manske (talk) 10:12, 8 April 2016 (UTC)[reply]

BVMC person ID

It would be great to have the property P2799 included in Mix'n'match. Please feel free to let me know if I can help with this in some way. --abián 18:44, 4 May 2016 (UTC)[reply]

I had a brief look at their website, but couldn't find a data download, and didn't have time to figure out the SPARQL. If you can get me a simple file with their ID, person name, maybe a short description, and a URL or URL pattern, that would be great. You can also take that data and import it yourself! --Magnus Manske (talk) 08:48, 6 May 2016 (UTC)[reply]
@Magnus Manske: You can download data for an ID (for example, 273) in:
In particular, this tag could be helpful:
<nameOfThePerson rdf:datatype="">García Lorca, Federico</nameOfThePerson>
IDs go from 3 to 99999. Null entries don't contain tags <identifierForThePerson> and <nameOfThePerson>. --abián 09:04, 6 May 2016 (UTC)[reply]
Thanks! Do you think it is OK to hit their site 100K× to get all entries? --Magnus Manske (talk) 09:20, 6 May 2016 (UTC)[reply]
I sent them an email warning that their site is being linked from Wikidata. So far I haven't received a reply, but I think you can start with at least the first few thousand IDs. Thank you in advance! --abián 10:00, 6 May 2016 (UTC)[reply]
@Magnus Manske: Confirmed, you can go on with the process. :D --abián 09:35, 7 May 2016 (UTC)[reply]

Took a while, but now here. Some already matched via VIAF, name/date-based matching running now. Will sync initial matches to Wikidata. --Magnus Manske (talk) 09:46, 10 May 2016 (UTC)[reply]

Thank you very much, Magnus. --abián 21:24, 10 May 2016 (UTC)[reply]


It would be great to add ECARTICO to the Mix'n'match catalog database: a very valuable and useful collection of "structured biographical data concerning painters, engravers, printers, book sellers, gold- and silversmiths and others involved in the ‘cultural industries’" of the Dutch and Flemish Golden Ages. See more in . This project is supported by the University of Amsterdam and freely licensed under CC-BY-SA. Currently the database contains about 500 people, but these are "gold pages" (in my opinion).

See also Wikidata property proposal.

--Kaganer (talk) 17:39, 9 June 2016 (UTC)[reply]

Now here. --Magnus Manske (talk) 09:08, 10 June 2016 (UTC)[reply]
Thanks! --Kaganer (talk) 20:08, 13 June 2016 (UTC)[reply]
Dear Magnus, I have one question: in the "auto-matched" mode, the dates of birth and death of auto-matched Wikidata items are not displayed, although they are in Wikidata. Is this a known bug or a feature? If it is a feature, it is a very inconvenient one. --Kaganer (talk) 00:16, 14 June 2016 (UTC)[reply]
AutoDesc was down, restarted. --Magnus Manske (talk) 08:17, 14 June 2016 (UTC)[reply]
I saw, thank you! --Kaganer (talk) 11:55, 15 June 2016 (UTC)[reply]
As far as I know, ECARTICO already contains 1332 (and counting) links to Wikidata. See . Were these links used for auto-matches? If not, maybe the matches could be re-checked (I'm ready to watch and check the list of conflicts)? --Kaganer (talk) 13:57, 15 June 2016 (UTC)[reply]
I associated the ones I have with Wikidata. But I didn't import the missing ones. Why are there >1000 in that list, when the website only lists ~500? --Magnus Manske (talk) 22:17, 15 June 2016 (UTC)[reply]
I'm sorry, this was my mistake... The figure of 503 covers only people whose surname starts with "A" ;) "The database currently contains biographical data on 24 912 persons. Painters: 7 671, Engravers: 1 292, Booksellers, printers and publishers: 865, Gold- and silversmiths: 1 953, Sculptors: 209" --Kaganer (talk) 22:44, 15 June 2016 (UTC)[reply]
By the way, this DB contains many insignificant persons (e.g. this): relatives of significant persons, included to keep the genealogical connections complete. Maybe this dataset needs to be filtered on whether the "Occupation(s)" field is filled in? --Kaganer (talk) 22:54, 15 June 2016 (UTC)[reply]

OK, I now have ~21K entries, they are auto-matching now. Have not filtered them; they should get N/A status in mix'n'match. --Magnus Manske (talk) 14:44, 17 June 2016 (UTC)[reply]

OK, thanks! Wikidata property also has been created. --Kaganer (talk) 21:18, 19 June 2016 (UTC)[reply]
Added, and synced. --Magnus Manske (talk) 09:03, 20 June 2016 (UTC)[reply]
How is data re-checked against the source database? Is this done manually? Some items were redirected at my request (3031 > 2921, 8807 > 8806). And this will happen again... --Kaganer (talk) 13:34, 22 June 2016 (UTC)[reply]

[ECARTICO] update from source

At the moment, mix'n'match does not update from the original source. Some catalogs could be updated with new entries, but there is no mechanism to change/remove entries in mix'n'match automatically. --Magnus Manske (talk) 14:04, 22 June 2016 (UTC)[reply]

Ok, but shouldn't there be a working process for manually adding or deleting some items? Maybe the form of requests about this issue needs to be standardized? Such cases occur frequently, IMHO. --Kaganer (talk) 16:46, 23 June 2016 (UTC)[reply]


Also one question: with "Search only in this catalog", some existing items are impossible to find. For example:

  1. "Thomas Allen" (16292) is found successfully
  2. but his daughter "Mary Allen" (16291) cannot be found

--Kaganer (talk) 17:44, 23 June 2016 (UTC)[reply]

Completing Great Aragonese Encyclopedia (GEA)

I think that the first time that this tool automatically matched Wikidata items with GEA entries, it only checked a reduced number of entries.

Could it run over the complete GEA? Thanks in advance. --abián 15:26, 21 July 2016 (UTC)[reply]

Done. --Magnus Manske (talk) 21:00, 21 July 2016 (UTC)[reply]

Smithsonian Museum of American History

Any chance you could add this to Mix n' Match --HCShannon (talk) 14:54, 30 July 2016 (UTC)[reply]

I'd be happy to, if you can find me a way to download or scrape (e.g. automatically browse a complete list) their data. --Magnus Manske (talk) 10:12, 1 August 2016 (UTC)[reply]

Upload more to a catalogue

I did a test and created the catalogue WikiTree. Question: can I add more to this catalogue, or do I need to create a new catalogue? Salgo60 (talk) 11:04, 18 August 2016 (UTC)[reply]

As it is now, you will have to create a new catalog. The idea is that you would only upload catalogs that are complete (or as complete as possible at the time). --Magnus Manske (talk) 11:52, 18 August 2016 (UTC)[reply]

Russian encyclopedias

Can you add encyclopedias in Russian? Lists of articles: --Ctac (talk) 18:05, 31 August 2016 (UTC)[reply]

In principle, sure. However, (1) mix'n'match is designed to map entries on external sites to Wikidata (and thus, Wikipedia), and the pages I found following your link seem to be mostly linked names, without external IDs or URLs to point to. (2) Importing all of that would keep someone busy for a long time, and I am already busy ;-) If there is one you particularly would like in mix'n'match you can import them yourself, here. --Magnus Manske (talk) 11:12, 1 September 2016 (UTC)[reply]

UAI code (code for french schools)


Is it possible to add these two files: Établissements d'enseignement secondaire (secondary schools) and Établissements d'enseignement supérieur (higher education) (Code UAI)? The property in Wikidata is P3202.

Tubezlob (talk) 15:22, 23 September 2016 (UTC)[reply]

Done. --Magnus Manske (talk) 16:20, 25 September 2016 (UTC)[reply]

Onisep occupation ID


Is it possible to add this file: Liste des métiers ONISEP? The ID is $1 in this URL:$1 and the property is P3214. Thank you! --Tubezlob (talk) 09:35, 1 October 2016 (UTC)[reply]

Now here. Please note that this could have been imported by anyone using the import function. --Magnus Manske (talk) 08:39, 4 October 2016 (UTC)[reply]
Thank you Magnus, I did not know. --Tubezlob (talk) 12:05, 5 October 2016 (UTC)[reply]

Sorry, this was created twice. '' is the one to delete. 'Supermodels' is the correct one. I hope this can be set. Thierry Caro (talk) 23:40, 6 November 2016 (UTC)[reply]

There are also two small changes that should be made in the 'Réserves Naturelles de France' catalog. The alphanumeric identifier for 'réserve naturelle nationale de la grotte du T.M. 71' should be grotte-du-t.m.-71 and the one for 'réserve naturelle régionale du lac de Grand-Lieu' should be lac-de-grand-lieu-rnr. I wonder, by the way, whether there are regular checks for differences that may appear between external IDs stored in Mix'n'match and those stored in Wikidata. In any case, thank you for everything. Thierry Caro (talk) 13:31, 7 November 2016 (UTC)[reply]
@Magnus Manske: 'New York magazine' can also be deleted because 'Model Manual' is the exact equivalent, and the latter has been completed. And then 'Model Manual' also needs some editing. The external IDs have a URL-pattern problem, with every / replaced by %2F. Can you change this back? Thanks for your help and sorry for all the issues with my importing catalogs. Thierry Caro (talk) 14:56, 11 November 2016 (UTC)[reply]
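
The %2F problem described above can be illustrated with a small sketch: a stored ID in which "/" was percent-encoded is decoded before it is substituted for $1 in a URL pattern. The pattern, the sample ID and the function names are hypothetical, and this is not how the catalog was actually repaired.

```python
def restore_slashes(external_id: str) -> str:
    """Turn percent-encoded slashes (%2F or %2f) back into plain "/" characters."""
    return external_id.replace("%2F", "/").replace("%2f", "/")


def expand_url_pattern(pattern: str, external_id: str) -> str:
    """Substitute the cleaned external ID for the $1 placeholder in a URL pattern."""
    return pattern.replace("$1", restore_slashes(external_id))


# Hypothetical pattern and ID, for illustration only.
url = expand_url_pattern("https://example.org/$1", "models%2Fjane-doe")
print(url)
```

Only the slash is decoded here, so any other percent-encoded characters in an ID are left untouched.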


  • "Supermodels".nl deactivated
  • "New York magazine" deactivated
  • I can't find the two "Réserves Naturelles de France" entries you mentioned
  • I have fixed the "Model Manual" URLs

--Magnus Manske (talk) 16:18, 11 November 2016 (UTC)[reply]

Thank you very much. You saved the day! Thierry Caro (talk) 21:46, 12 November 2016 (UTC)[reply]
@Magnus Manske:. Model Manual may now be associated to the newly created P3379. Can you add this to your tool and export to Wikidata the data that results from the matches already established on Mix'n'match? That would be awesome. Thierry Caro (talk) 03:07, 2 December 2016 (UTC)[reply]
Eventually, I have downloaded the matches and imported them to Wikidata myself. The only thing that remains to be done is adding the property to your tool so that future matches will be automatically reported there. Thanks again. Thierry Caro (talk) 03:37, 2 December 2016 (UTC)[reply]
Sorry, busy. Added the property now. --Magnus Manske (talk) 14:38, 15 December 2016 (UTC)[reply]

LEI file

Hi Magnus - I've downloaded the openly available dump from GLEIF that lists over 400,000 corporations and other legal entities from around the world, and run a process to generate the tab format file that Mix n Match handles. However it's about 50 MB in size. Also it's not really a single language (the entity type can be in a number of different languages for instance, though usually English). Any suggestions on how best to handle this? I could split it up into smaller chunks by country if that would help. The associated property for the id is 1278 (Legal Entity ID) and only has 77 values currently set in wikidata. ArthurPSmith (talk) 18:32, 14 November 2016 (UTC)[reply]

  • Ok, I uploaded a US-only portion (about 120,000 records) and working with that for now. Let me know if I should do something different though. Thanks for this tool! ArthurPSmith (talk) 22:03, 15 November 2016 (UTC)[reply]

303 and 304 can be deleted

I seem to have a hard time encoding files so that they appear correctly here. I'm very sorry about this. Would you be OK to simply delete those two catalogs? Thierry Caro (talk) 07:44, 4 December 2016 (UTC)[reply]

@Magnus Manske: The correct one, eventually, is 306. Thierry Caro (talk) 16:48, 4 December 2016 (UTC)[reply]
And then 301 now has its own property, which is P3401. The matches have been exported to Wikidata. Thierry Caro (talk) 18:05, 11 December 2016 (UTC)[reply]
You may also add P3404 to catalog 300. The matches are already on Wikidata. If possible, you may intervene on the database so that the links stored and to-be-generated will be such as instead of This is an encoding problem again. Thank you for all. Thierry Caro (talk) 23:40, 12 December 2016 (UTC)[reply]

Deleted 303 and 304, added properties to 300 and 301. --Magnus Manske (talk) 14:44, 15 December 2016 (UTC)[reply]

Thank you. Thierry Caro (talk) 18:47, 21 December 2016 (UTC)[reply]

Add a property corresponding to the "FAO Races" catalog

The property d:Property:P3380, which was intended to be the match for this catalog in Wikidata, has at last been created. I don't know whether (or how) I can add it afterwards, so please point me to a procedure or be so kind as to do this :) TomT0m (talk) 10:50, 4 December 2016 (UTC)[reply]

Added, synced. --Magnus Manske (talk) 14:48, 15 December 2016 (UTC)[reply]

Kelvinator stove fault codes ???

Hi Magnus - currently every label (at least for the catalogs I tried - for example GRID) is displaying as "Kelvinator stove fault codes" and the description is always "0" in Mix n Match. Not good! Something broken ??? ArthurPSmith (talk) 19:41, 26 December 2016 (UTC)[reply]

[9]. --Magnus Manske (talk) 18:03, 5 January 2017 (UTC)[reply]

FundRef dataset does not load

I would like to use this wonderful tool to fill wikidata:Property:P1905, which according to Wikidata should be done at , but this page keeps "Loading..." forever… Any idea why? − Pintoch (talk) 00:22, 19 January 2017 (UTC)[reply]

I also have another question: say I have a dataset that not only contains identifiers, but also other useful information that could be added to the matched items. Is there a way to specify these statements in the catalogue, so that they are added to the matching item (if any)? Otherwise it is of course possible to do that with a bot, but it seems a bit overkill to write a bot from scratch for each dataset. − Pintoch (talk) 00:42, 20 January 2017 (UTC)[reply]

500 Server Error

Mix n Match seems to be broken! I hope it wasn't me! ArthurPSmith (talk) 21:37, 19 January 2017 (UTC)[reply]

Sync functionality

Hi, I tried to synchronize the OpenISNI-1 catalog with Wikidata, as I found ways to add identifiers from other sources (importing from VIAF and GRID). I get an error message: 'Error:Unknown action ""'. Here is a screenshot: Thanks a lot for your work! − Pintoch (talk) 11:57, 3 February 2017 (UTC)[reply]

It looks like datasets are automatically sync'd after some time. That's even better, thanks for that! − Pintoch (talk) 09:01, 14 March 2017 (UTC)[reply]

Improved search?

Hi Magnus, I find myself using the "search only in this catalog" link a lot - which is great, but what would be even more helpful would be a couple of little changes:

  • allow filtering of the search results by match status (for example filter out all manually matched entries)
  • allow searching of descriptions as well as labels

Any chance this could be done? Thanks! ArthurPSmith (talk) 16:33, 14 February 2017 (UTC)[reply]

@Magnus Manske: One year on, any progress on either of these, especially of the latter? Mahir256 (talk) 20:21, 28 February 2018 (UTC)[reply]

Obsolete base

Hello. And sorry. #384 should be totally dropped, as #385 is the good one. Can you have a look? Thierry Caro (talk) 15:59, 17 February 2017 (UTC)[reply]

Can we add VICNAMES database?

This is to import this CSV file. (Don't know why the extension says .json when it is actually CSV, but that doesn't matter.) We want to import column 9 "Place Id" into the VICNAMES Place ID (P3472) property. The Wikidata label will be matched against column 4 ("Place Name"), although there are going to be various ambiguities, which is why mix'n'match is needed rather than a more automated approach. Columns 2 ("Municipality"), 6 ("Feature Type"), 7 ("Longitude") and 8 ("Latitude") may be useful in trying to resolve some of those ambiguities. (Please ignore column 3 "Name Id"; that column is a different ID number from P3472 and is not currently in use by Wikidata.) Note that the CSV file is released under the Creative Commons Attribution 3.0 license. To confirm that yourself, go to VICNAMES, press the Download button (green downward pointing arrow), pick a municipality and a feature type randomly, click "Download", you will see the license agreement links to – also, BTW, I have zero affiliation with the operators of this DB, and haven't discussed this with them, but since they are offering a database download with CC-BY licensing, we don't strictly speaking have to do that. (The linkage of the Wikidata entry to their DB should be sufficient attribution for CC-BY purposes.) Thanks, SJK (talk) 12:29, 10 March 2017 (UTC)[reply]

Probably put the data of the other columns in the "Catalog Description" field. So for example:
"State","Municipality","Name Id","Place Name","Place Name Status","Feature Type","Longitude","Latitude","Place Id"
"VIC","EAST GIPPSLAND SHIRE","17990","MOUNT BULLA BULLA","REGISTERED","MT","148.4184722","-37.0604167","11866"
You could set description as something like "Municipality: EAST GIPPSLAND SHIRE; Place Name Status: REGISTERED; Feature Type: MT; Longitude: 148.4184722; Latitude: -37.0604167". No need to include "State" because it is always "VIC" nor "Name Id" since it is pretty useless (and people might confuse it with the "Place Id" mistakenly.) SJK (talk) 13:00, 10 March 2017 (UTC)[reply]
Oh, and by the way, the URL for each entry is just + "Place Id". SJK (talk) 13:06, 10 March 2017 (UTC)[reply]
I found this I'm getting ready to do it myself now. SJK (talk) 05:00, 11 March 2017 (UTC)[reply]
Okay, I did it. SJK (talk) 08:59, 11 March 2017 (UTC)[reply]
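
The mapping proposed in this thread (ID from "Place Id", name from "Place Name", the other useful columns folded into the description) can be sketched with the sample row quoted above. The three-field output shape and the function name are assumptions for illustration; this is not the script that was actually used for the import.

```python
import csv
import io

# Header and sample row as quoted in the thread above.
SAMPLE = (
    '"State","Municipality","Name Id","Place Name","Place Name Status",'
    '"Feature Type","Longitude","Latitude","Place Id"\n'
    '"VIC","EAST GIPPSLAND SHIRE","17990","MOUNT BULLA BULLA","REGISTERED",'
    '"MT","148.4184722","-37.0604167","11866"\n'
)

# Columns folded into the description; "State" and "Name Id" are left out on purpose.
DESCRIPTION_COLUMNS = [
    "Municipality", "Place Name Status", "Feature Type", "Longitude", "Latitude",
]


def to_import_rows(csv_text: str):
    """Return (id, name, description) tuples built from the VICNAMES-style CSV text."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        description = "; ".join(f"{col}: {record[col]}" for col in DESCRIPTION_COLUMNS)
        rows.append((record["Place Id"], record["Place Name"], description))
    return rows


for entry_id, name, description in to_import_rows(SAMPLE):
    print(entry_id, name, description, sep="\t")
```

Reading by column name rather than by position avoids mixing up "Name Id" and "Place Id".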

Catholic Encyclopedia 1913 encoding issue

If you look at this catalog, some of the catalog entries show signs of a bad character encoding / mojibake issue. For example, "Bartholomeu Lourenço de Gusmí£o" which links to Bartholomeu Lourenço de Gusmão when it should be "Bartholomeu Lourenço de Gusmão" and it should link to Bartholomeu Lourenço de Gusmão. Can this be fixed? SJK (talk) 13:13, 10 March 2017 (UTC)[reply]

How to back-import properties into mix'n'match

I've created or linked various items to VICNAMES Place ID (P3472) outside of mix'n'match, mainly by using QuickStatements. Is there a way to back-import the matches done outside of mix'n'match into mix'n'match? SJK (talk) 00:02, 12 March 2017 (UTC)[reply]

I worked it out myself. Click the "Sync" link and then there is a button on that screen to do it. SJK (talk) 04:37, 12 March 2017 (UTC)[reply]

"Accept" control missing?

Hi, I'm relatively new to using Mix n Match. I recently setup the Parks and Gardens UK list. I've just looked at it yesterday and the tool has changed, but now I cannot see the "Accept" link in the automatically matched list (or in other lists). I can only see the "Remove" link. Any suggestions what's going on are appreciated! Pauljmackay (talk) 08:05, 17 March 2017 (UTC)[reply]


The linked user names below the "Users" heading on are broken.

The markup is:

<tbody><tr u="[object Object]"><td><a href="//Auxiliary data matcher">Auxiliary data matcher</a></td> <td class="num">44829</td></tr><tr u="[object Object]"><td><a href="//Pigsonthewing">Pigsonthewing</a></td> <td class="num">1640</td></tr><tr u="[object Object]"><td><a href="//MistressData">MistressData</a></td> <td class="num">9</td></tr><tr u="[object Object]"><td><a href="//Magnus Manske">Magnus Manske</a></td> <td class="num">1</td></tr></tbody>

including, for example:

<a href="//Pigsonthewing">

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:52, 30 March 2017 (UTC)[reply]

Distinguishing the type of provenance of mappings when querying edit history logs

Hi Magnus, Is there a way to distinguish (via tags or comments)

  • items matched automatically by Mix'n'match
  • items matched manually from scratch and
  • items matched manually after running Mix'n'match

when one queries Wikidata's edit history? If not, is there any data source that can be queried programmatically to get this information? Thanks a lot in advance. --Criscod (talk)

Manual matches should always be done by the matching user (unless there was some issue at the time, in which case a bot will perform the match on Wikidata later). Mix'n'match "automatic matches" are not pushed to Wikidata, unless a user manually confirms one, which is then done under the respective user name. There are "thorough automatic matches", usually done when a catalog is first created; they rely on name, birth date, etc. and are attributed to a bot. --Magnus Manske (talk) 08:11, 15 May 2017 (UTC)[reply]

Please delete two catalogs

Hi Magnus, can you please delete town catalogs? 306 is replaced by 440 and 441 is a duplicate I created by error. Thanks. --Fralambert (talk) 00:00, 15 May 2017 (UTC)[reply]

I have deactivated 306 and 441. --Magnus Manske (talk) 08:08, 15 May 2017 (UTC)[reply]

Please delete 443

I made some mistakes in the first catalog I created, and found it easier to create a second one, so catalog 443 can be deleted as it's basically the same as 450. Jon Harald Søby (WMNO) (talk) 09:13, 23 May 2017 (UTC)[reply]

443 has been deactivated. --Magnus Manske (talk) 09:25, 23 May 2017 (UTC)[reply]

Sorting by completeness

Is it possible to have this sorting done by the percentage of unmatched entries in each catalog, as opposed to just moving all completed catalogs to the bottom of the page and leaving the rest not in any particular order? Mahir256 (talk) 19:58, 5 June 2017 (UTC)[reply]

The "completeness" sort takes into account the total number of unmatched and auto-matched entries. Sorting by "percent unmatched" seems a little pointless if a catalog with 200 entries has 50% (=100 entries) unmatched, but is sorted below another with 100000 total, 20% unmatched (=20000 entries). The top catalog in the "completeness" sort is easiest to finish. I can add a "percent unmatched" sort if you really like, but what would be the point? --Magnus Manske (talk) 13:03, 6 June 2017 (UTC)[reply]
Okay, thanks for clarifying the sort order. I suppose I was thinking of something aesthetically pleasing, but I suppose adding a (not unmatched / total) indicator would make the interface similar to s:Special:IndexPages. Mahir256 (talk) 04:54, 13 June 2017 (UTC)[reply]
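
The difference between the two orderings discussed above can be shown with the figures from Magnus's example. This is only a sketch: the exact formula behind the "completeness" sort is not given in the thread, so ranking by the sum of unmatched and auto-matched entries is an assumption, as are the catalog names and the zero auto-matched counts.

```python
# Figures from the example above; names and auto-matched counts are made up.
catalogs = [
    {"name": "small", "total": 200, "unmatched": 100, "automatched": 0},
    {"name": "large", "total": 100000, "unmatched": 20000, "automatched": 0},
]


def remaining(catalog):
    """Entries still needing attention: unmatched plus auto-matched (assumed formula)."""
    return catalog["unmatched"] + catalog["automatched"]


def percent_unmatched(catalog):
    """Share of the catalog that is unmatched, as a percentage."""
    return 100.0 * catalog["unmatched"] / catalog["total"]


by_remaining = [c["name"] for c in sorted(catalogs, key=remaining)]
by_percent = [c["name"] for c in sorted(catalogs, key=percent_unmatched)]

print("fewest remaining first:", by_remaining)
print("lowest percentage first:", by_percent)
```

Ranking by remaining entries puts the 200-entry catalog first because only 100 entries are left to do, while ranking by percentage puts the 100000-entry catalog first even though 20000 entries remain.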

Please delete 493 and 494

Hello Magnus, I managed to import a CSV for the BNA author ID (P3788) but I made some mistakes (the first time I messed up the dates with OpenRefine, the second time I saved the CSV with "). Catalogs 493 and 494 can be deleted, the good one is 495. Sorry for the mess! Thanks! --Mauricio V. Genta (talk) 07:34, 2 July 2017 (UTC)[reply]

493 and 494 have been deactivated. --Magnus Manske (talk) 21:44, 1 November 2017 (UTC)[reply]

Mix'n'match truncates geo coordinates

Not sure this is the right place for a bug report but... this sudden burst of incorrect "coordinate location" (P625) in WD has its source in an improper import of these coordinates upon creation of items using catalog 440, BNLQ (tracked thanks to User:Fralambert).

I reproduced it here: the imported latitude gets its initial digit truncated: coord (-47.215000, -71.861944) gets imported as (-7.215000, -71.861944)! See the same issue on Q30557727 and about a hundred more items that required manual correction.

Let me know if I need to report this elsewhere. Thanks --Laddo (talk) 01:44, 5 July 2017 (UTC)[reply]

Thanks for letting me know. I have fixed the coordinates in Mix'n'match, looks much better now! --Magnus Manske (talk) 08:07, 5 July 2017 (UTC)[reply]

Link in username

Small bug: please change the links on the catalog description page, like "Framawiki" should point to my userpage on wikidata, not https://framawiki. --Framawiki (talk) 12:39, 14 July 2017 (UTC)[reply]

And link for "automatic" should not be present --Framawiki (talk) 12:45, 14 July 2017 (UTC)[reply]
Thanks, fixed! --Magnus Manske (talk) 13:43, 14 July 2017 (UTC)[reply]
Thanks :) --Framawiki (talk) 23:34, 15 July 2017 (UTC)[reply]

UK National Archives ID

The National Archives ID property (P3029) is already used on some items on Wikidata, but I would like to start matching these more systematically. They have a straightforward API, and I wonder whether it might be possible to extract more of the data from their details of record creators: for example, for any monastic houses (cf. Cirencester Abbey, Ramsey Abbey), the religious order (P611) could be automatically extracted, but I am not sure how to go about doing this. AndrewNJ (talk)

It's not entirely clear what data to use; the property seems to be limited to people, not including buildings. I have imported the people here, running name/date-based matching now. --Magnus Manske (talk) 21:43, 1 November 2017 (UTC)[reply]


Could the URL which entries in catalogue 49 (TGN IDs) use be changed to "$1"? This URL, as opposed to the one ("$1") which is used at the moment, actually displays synonyms for the given place name (indicating its level of administrative division) and the appropriate position in the geographic hierarchy of the place. To find these with the existing URL requires two more clicks to get to the URL I'm suggesting, which slows down identification of the place. @Magnus Manske: Mahir256 (talk) 05:39, 22 September 2017 (UTC)[reply]

I seem to have done this recently, though I don't remember... let me know if it still needs fixing! --Magnus Manske (talk) 15:57, 1 November 2017 (UTC)[reply]

Entries missing from Statues Vanderkrogt

Several entries on seem to be missing from Mix'n'match, e.g. this bust of Anne Frank. (I've already added the Statues ID property manually to Bust of Anne Frank (Q41688543).) This seems to be the case with several other entries in the Greater London section of the website. Ham II (sgwrs / talk) 07:40, 11 October 2017 (UTC)[reply]

The Strong Museum of Play collection

Can you make some categories based on the collections from, if you can? I don't know what property they would correspond to?--MisstressD (talk) 02:28, 29 October 2017 (UTC)[reply]

Wikidata property for Women's Manuscripts

Hi, I uploaded this women's manuscripts collection, but I didn't include a Wikidata property. I've since made this one for it. Could you please add it for me? Thank you Rachel Helps (BYU) (talk) 21:10, 20 November 2017 (UTC)[reply]

Ah, you linked to an item, not a property? A property (like this one) needs proposal, community discussion, and consensus on creation. --Magnus Manske (talk) 23:14, 20 November 2017 (UTC)[reply]
Oops. Thanks for being patient while I figure this out. I'm not sure if the collection is "notable" enough to be a property, since it would only apply to about 20 items. Rachel Helps (BYU) (talk) 17:50, 22 November 2017 (UTC)[reply]

The Smithsonian Museum Collections ID

I can't scrape the items from this page, which features objects from all the Smithsonian museums. --MisstressD (talk) 01:40, 2 December 2017 (UTC)[reply]

Running now for this catalog, but "only" the 2.8M entries with picture/video. 13M is too many; 2.8M already is... --Magnus Manske (talk) 10:05, 6 December 2017 (UTC)[reply]

Deleting catalog 388


We want to migrate FundRef ids from DOIs to their own property wikidata:Property:P3153, so it would be great if catalog 388 could be deleted as it uses DOIs. I will migrate the existing claims to the new property once this is done.

Thanks a lot! − Pintoch (talk) 18:55, 11 December 2017 (UTC)[reply]

I have deactivated 388. --Magnus Manske (talk) 10:36, 12 December 2017 (UTC)[reply]


Isn't and about the same catalogue? --Anvilaquarius (talk) 10:16, 2 January 2018 (UTC)[reply]

You are correct, thanks! I have deactivated #794, and set its autoscraper to update #95 instead. --Magnus Manske (talk) 21:09, 2 January 2018 (UTC)[reply]

Issues with Unicode?

@Magnus Manske: It appears that User:Александр Сигачёв's name appears as question marks in recent changes (yielding an equally incorrect URL for his user page in the process). Also, in the TGN catalog, non-ASCII code points get translated into escape sequences ("Santarém" becomes "Santar\u00E9m").

Can these issues with non-ASCII code points be fixed somehow? Mahir256 (talk) 17:37, 24 January 2018 (UTC)[reply]

Fixed the TGN catalog. Will look into the username, different issue apparently. --Magnus Manske (talk) 15:19, 25 January 2018 (UTC)[reply]
Thank you, @Magnus Manske:, and happy you day! Mahir256 (talk) 06:23, 26 January 2018 (UTC)[reply]
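
For illustration, escape sequences like the one reported in this thread can be turned back into the characters they stand for. This is a minimal sketch assuming the names are stored with literal `\uXXXX` sequences; the function name is hypothetical, and how the catalog was actually fixed is not stated above.

```python
import re

# Matches a literal backslash, 'u', and exactly four hexadecimal digits.
ESCAPE_PATTERN = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_unicode_escapes(text):
    """Replace literal \\uXXXX sequences with the characters they encode.

    Surrogate pairs (characters outside the Basic Multilingual Plane) are
    not recombined by this sketch.
    """
    return ESCAPE_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_unicode_escapes(r"Santar\u00E9m"))
```

Text without escape sequences passes through unchanged, so the function can be applied to a whole column of names.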

Library of Wales duplicates

I find that the Library of Wales catalog has a large number of duplicate entries. For example, I came across [10], [11], [12] and [13], which are all about the same person. And that is not an extreme case at all, there are several with even more links. Could these extraneous links be removed, or at least have an 'n/a' automatic matching? (in the case mentioned here, the first link is the correct one) - Andre Engels (talk) 10:22, 8 February 2018 (UTC)[reply]

I have manually culled a lot of them into N/A status. Might have missed some, or mis-identified others. No good single pattern to the "bad" ones. --Magnus Manske (talk) 13:35, 8 February 2018 (UTC)[reply]

Please delete 1009

Matching with "lastname, surnames" does not seem to work properly. Please delete catalog 1009.

1009 deactivated. --Magnus Manske (talk) 13:24, 23 February 2018 (UTC)[reply]

What is the right tool to find entities in wikidata

I'd like to match persons only by their names and wonder if Mix'n'Match is the right tool for this purpose?

Mix'n'match would certainly work for that, especially if the list is static (doesn't change), or can be auto-updated from a website. Other options are OpenReconcile and a Google Spreadsheet plug-in; see here for what the Bodleian Library (Oxford) have looked at. --Magnus Manske (talk) 18:55, 23 February 2018 (UTC)[reply]

Enhanced download options

Hi there, I've just started playing with Mix'n'match and have a few queries. Firstly, the download link in the dropdown action menu only downloads the matched items, not the unmatched items. I've been looking at this catalog, which has 135 pages of unmatched items. Most items are not notable and will never have a Wikipedia page. But there are over 750 Wikipedia articles that should be matched within those 6700 unmatched items. If I could download the full list, I could probably match quite a few fairly quickly with some data manipulation in Excel, but I'm not going to download, manually review or game-mode 135 pages of data. Can the download button please either download all items, with a field for each item's Mix'n'match status, or default to downloading whichever subset (manually, auto, unmatched, n/a) you are viewing at the time?

Also, when you are in game mode, the action button isn't shown to switch back to another mode. The-Pope (talk) 16:23, 21 February 2018 (UTC)[reply]

Translation; Mix'n'match/6/en needs fixup

I wish to put Mix'n'match/6/en or "w:en:Manual for small and new Wikipedias" as w:ja:すべての言語版にあるべき項目の一覧 to link to the jawiki page. Presently, it outputs erroneously. Can you have a look and correct it, please? --Omotecho (talk) 17:43, 25 February 2018 (UTC)[reply]

Hi, I'm not sure what you mean. All the links you give are broken. There is a Japanese page here, if that's what you mean? What's the problem with it? (note: I don't speak Japanese) --Magnus Manske (talk) 08:49, 26 February 2018 (UTC)[reply]
Hello, I mean, two points were mixed up. When I tried to replace the English page link with the following links in the translation window, it did not work, so I thought I would ask for your help.
Now, out of curiosity, how do you use $-parameters and Special/MyLanguage, and under what rules? May I replace links in English source pages with either of them, so that translators just translate the words after the pipe (|), and the link will point to the local-language page (if a translated page exists)? That would save much time when translating a page; at the moment I am going back and forth from tab to tab, checking whether we have ja pages for the English page links in the paragraph. Especially with Help pages, I find it rather disappointing to jump to English pages, even if the link is written in Japanese. (See the case with “List of articles every Wikipedia should have.”) Cheers, --Omotecho (talk) 19:04, 26 February 2018 (UTC) edited. Omotecho (talk) 18:48, 26 February 2018 (UTC)[reply]

Deutsche Biographie scraper help

Hi All, I was trying to make an automated scraper on Mix'n'Match for an important database called the Deutsche Biographie ( but I wasn't able to (not much of a professional). It seems easy though, as the links seem simple ( Could anyone help? Also this might help: Thanks, Adam Harangozó (talk) 11:17, 2 March 2018 (UTC)[reply]

Made a scraper, running now, will be here. I'll have to pull descriptions separately. Any Wikidata property for this yet? --Magnus Manske (talk) 11:37, 2 March 2018 (UTC)[reply]
Update: 2014 is NDB, 1042 is ADB. --Magnus Manske (talk) 11:49, 2 March 2018 (UTC)[reply]
Amazing, thanks! No wikidata property, only Q1202222. NDB and ADB are combined on this site, so they are separate things on their own. --Adam Harangozó (talk) 12:14, 2 March 2018 (UTC)[reply]
So I think it should be one database (Deutsche Biographie), NDB and ADB can just be reached as scanned books through these profiles ( --Adam Harangozó (talk) 14:31, 2 March 2018 (UTC)[reply]
I can only have one scraper per catalog. I can merge them, but need to deactivate one scraper. Are both ADB and NDB complete, or do they add entries over time? --Magnus Manske (talk) 14:45, 2 March 2018 (UTC)[reply]
I don't know, but to me it seems that we need a single scraper looking at all the identifiers starting with sfz, as it does not matter if they are in both ADB and NDB or only in one. For example: " Collinus" They still have the same identifier, which I think should simply be called Deutsche Biographie. (So ADB and NDB are just sub-categories here, no need to worry about them) --Adam Harangozó (talk) 13:08, 3 March 2018 (UTC)[reply]

User:Adam Harangozó, User:Magnus Manske - the official recommendation by the Deutsche Biographie is to use the GND, not the SFZ.

Please create a catalog based on GND. 21:26, 28 July 2018 (UTC)[reply]

Now here. Still loading, as it needs to check each URL for the details, which are not in the BEACON files. --Magnus Manske (talk) 07:59, 30 July 2018 (UTC)[reply]

"Improved search?" part 2

Apologies if you happened to miss my ping about the subject among the rest of the above subsections, but a year ago ArthurPSmith asked about possible improvements to the search feature, namely filtering by match status and searching through descriptions. Do you have any updates about those? Mahir256 (talk) 17:52, 2 March 2018 (UTC)[reply]

@Magnus Manske: I am genuinely curious as to whether these are being worked on, and I'm sure Arthur probably still is as well. My apologies if the pings are getting annoying. Mahir256 (talk) 02:41, 6 June 2018 (UTC)[reply]
I am not working on those, and I am not considering them a priority. --Magnus Manske (talk) 10:14, 14 June 2018 (UTC)[reply]

Update a non-scraper catalogue?


I could swear I had read a documentation about that, but cannot find it anymore :-(

How would I go about updating a CSV-backed Mix’n’match catalogue? I had made a mistake when scraping 989 (all games are set to Mega Drive). I re-scraped the website and have a CSV ready − what would be the process to update the catalogue?

Thanks! Jean-Fred (talk) 12:31, 4 March 2018 (UTC)[reply]

Can you just import it and I'll close the old catalog? --Magnus Manske (talk) 10:47, 19 March 2018 (UTC)[reply]

Retrieve the config of a scraper-backed catalogue?

I have loaded several catalogues using the Scraper tool − works fine :)

In several cases, I notice a discrepancy with Wikidata, which clearly hints at a mistake I made when scraping (eg in User:Magnus_Manske/Mix%27n%27match_report/789, I must have done something wrong with the 'Z').

I would thus be keen on fixing my scraper ; however I have not backed up the config (especially the regexes) I used ; and they were such a pain to craft in the first place, that I really hope I don’t have to start from scratch again :)

Is there any way to look up the underlying config?

Jean-Fred (talk) 12:34, 4 March 2018 (UTC)[reply]

Creation candidates comes out empty

I used to work with Creation candidates a lot, but now it doesn't work at all. For example gives "No results, parameters might be too restrictive" - hasn't been working for a longer time, trying to get the empty name, then grinding the whole browser to a halt. - Andre Engels (talk) 07:03, 2 June 2018 (UTC)[reply]

/human works again - Andre Engels (talk) 09:30, 4 June 2018 (UTC)[reply]
No, still not working correctly - I now get the same entry over and over again, there does not seem to be anything else available. - Andre Engels (talk) 12:01, 4 June 2018 (UTC)[reply]
I should probably turn that off. I have a bot creating these automatically now... --Magnus Manske (talk) 15:06, 5 June 2018 (UTC)[reply]

Multilingual matching and value addition

I regularly add multilingual CC-0 datasets for matching (the latest being EUROVOC). Multilingual matching and multilingual label addition would be a boon for those cases. --Teolemon (talk) 15:14, 4 July 2018 (UTC)[reply]

Problem to create item with Visual tool


Practically every time I try to create an item with the Visual tool, I get this message: "Wikidata error: Must be no more than 250 characters long". Is it possible to fix this? Simon Villeneuve 11:09, 30 July 2018 (UTC)

Extracting country from description

Resolved.

I know there's a script especially made to extract birth and death dates, but is it possible to extract P27 from description field in catalog #1538?--HeavyTony (talk) 01:41, 3 August 2018 (UTC)[reply]

Can do, but are you sure that "Country of origin: X" is P27 and not "country of origin (P495)"? --Magnus Manske (talk) 08:17, 3 August 2018 (UTC)[reply]
I'm generating P27 ones now. --Magnus Manske (talk) 08:36, 3 August 2018 (UTC)[reply]
That's perfect.--HeavyTony (talk) 11:49, 3 August 2018 (UTC)[reply]

Adding property P434 to catalog 1486

Resolved.

So, there's more than 57 000 entries, but "This catalog has no Wikidata property!". --HeavyTony (talk) 04:24, 4 August 2018 (UTC)[reply]

Had a look, looks like 1486 was fixed in the meantime :) Jean-Fred (talk) 18:34, 2 October 2018 (UTC)[reply]

Search problems

For the last few days I have been unable to have a search complete at my home and work. Is there something up? William Graham (talk) 22:13, 19 August 2018 (UTC)[reply]

Looks like it's resolved. William Graham (talk) 18:16, 20 August 2018 (UTC)[reply]

Hi, I have the issue that the wikidata search in the visual tool only works sometimes. Some example queries that don't work and just show "Searching...": Chang and Eng ; John Kasich ; L H Myers ; Haakon II Sigurdsson ; Thomas Hartman FS100 (talk) 12:18, 19 September 2018 (UTC)[reply]

Critical Condition Film Ids

The films featured on these lists (talk) 22:24, 1 October 2018 (UTC)[reply]

Modify scraper after saving

@Magnus Manske: Is it possible to edit (or at least view) the scraper after having saved it? --Malore (talk) 00:54, 3 October 2018 (UTC)[reply]

Please deactivate catalogs 1525 & 1526

Could you also rename 1527 to have 1525 title? thanks--HeavyTony (talk) 15:32, 5 December 2018 (UTC)[reply]

Adding ISSN and BFI level to catalog 2028

@Magnus Manske: Could you extract ISSN and BFI from description in catalog 2028?--HeavyTony (talk) 16:02, 15 December 2018 (UTC)[reply]

Done --Magnus Manske (talk) 09:52, 15 October 2019 (UTC)[reply]

Replacing a catalog

@Magnus Manske: I created the original PCGamingWiki catalog ( about a month ago, but it had some encoding issues that were causing problems. I've now uploaded a "better" version of the catalog here:

Can you delete the old catalog? Thanks for this awesome tool :) Nicereddy (talk)

I marked the old catalogue as inactive :)
@Magnus Manske: On both the old catalogue and the new one that Nicereddy imported, Mix’n’match made Zero automatic matches, which is weird − before Nicereddy went on mass-linking with a script, there were hundreds of easy matches that mix’n’match would have typically picked up. Even now, I think there should be some more potential matches − eg this to Agricultural Simulator 2012 (Q11849886).
Jean-Fred (talk) 21:37, 7 February 2019 (UTC)[reply]
Actually, Nicereddy clarified to me that there were auto-matches on that second catalogue (he was just too quick to accept the good ones and reject the bad ones before I could notice ^_^). Jean-Fred (talk) 09:14, 8 February 2019 (UTC)[reply]

Please deactivate catalog 2217 and 2220

@Magnus Manske: Please deactivate catalog 2217 and 2220. IDs of catalog 2217 are not decoded. Property of catalog 2220 is wrong.--本日晴天 (talk) 11:00, 5 March 2019 (UTC)[reply]

Please delete catalog 2362

@Magnus Manske: I created that NGMDb catalog in error. Can you please remove it so I can use mix'n'match without conflating that catalog with NGMDb ID? Thank you, Trilotat (talk) 18:42, 8 May 2019 (UTC)[reply]

Done. Disabled. Jean-Fred (talk) 21:58, 10 May 2019 (UTC)[reply]

When item matches multiple WikiData entries

How should one handle an entry that should map to multiple entries in WikiData? (Perhaps a future enhancement to the tool.) An example, set is CathEn 1913, entry 25390652, catalog id 12344b (article title Praxedes and Pudentiana) should match both Q268087 and Q676485. --Dcheney (talk) 06:35, 11 August 2019 (UTC)[reply]

Please delete catalog 2737

@Magnus Manske: I created the lokalhistoriewiki catalog with a lot of errors. It was my first time creating a catalog; I will make a new one. I'm sorry for making a mess. - Premeditated (talk) 21:36, 21 August 2019 (UTC)[reply]

Please delete catalog 3325

@Magnus Manske, Pigsonthewing, Jean-Frédéric, Adam Harangozó, Harmonia Amanda, Thierry Caro, Ash Crow, Salgo60, and Gerwoman: (since last I heard you all are catalog admins--indeed there should be a nicer and shorter way of pinging you all): I set up a scraper for a website hoping that it would skip numeric identifiers that didn't yield matches (e.g. if "56" and "58" yielded matches but "57" didn't, I thought the scraper would simply skip "57" rather than terminate at "56"). Since that scraper evidently failed, I opted instead to just upload a bunch of IDs manually (which, as it turns out, can't be used to update existing catalogs per the current import form). I thus humbly request that catalog #3325 be marked inactive (until, perhaps, the behavior of the autoscraper in this case could be adjusted). Thank you! Mahir256 (talk) 03:13, 26 January 2020 (UTC)[reply]
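
For illustration, the behaviour hoped for in this request (skip numeric identifiers with no result instead of stopping at the first gap) could look like the loop below. This is a minimal sketch: `scrape_id_range` and the `fetch` callback are hypothetical names, and nothing here reflects how the autoscraper is actually implemented.

```python
def scrape_id_range(first_id, last_id, fetch):
    """Collect entries for every ID in [first_id, last_id] that yields a record.

    fetch is a caller-supplied function (hypothetical) that returns the
    record for an ID, or None when the site has nothing for that ID.
    """
    entries = []
    for numeric_id in range(first_id, last_id + 1):
        record = fetch(numeric_id)
        if record is None:
            # A gap in the numbering: move on instead of ending the run.
            continue
        entries.append((numeric_id, record))
    return entries


# Stand-in for the remote site: IDs 56 and 58 exist, 57 does not.
fake_site = {56: "first entry", 58: "second entry"}
print(scrape_id_range(56, 58, fake_site.get))
```

A real scraper would probably also want an upper bound on consecutive misses so that it can tell a gap from the end of the catalog.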

Please delete catalog 3345

@Magnus Manske, Pigsonthewing, Jean-Frédéric, Adam Harangozó, Harmonia Amanda, Thierry Caro, Ash Crow, Salgo60, and Gerwoman: Catalog 3345 is a duplicate of 3344. Please delete it. Thanks

Nikola Tulechki (talk) 11:03, 30 January 2020 (UTC)[reply]

Done. Marked as deactivated. Jean-Fred (talk) 11:07, 30 January 2020 (UTC)[reply]

Polish catalogues

I see that several Polish catalogues are not in the group country_polska. How can I add them to it? Matlin (talk) 22:46, 15 February 2020 (UTC)[reply]

Scraper problem?

So I set up a scraper for a mix'n'match catalog (#3407) that, when I tested it in the catalog creation screen, worked fine (it gave two well-formatted results), but somehow didn't capture anything when I actually created it (every time I try to run the job, it returns nothing). Is something wrong with either the scraper or the way I created the catalog? Did my use of a very long regex to identify matches or a list of 20,000 URLs to scrape cause a problem? Mahir256 (talk) 21:40, 17 February 2020 (UTC)[reply]

@Magnus Manske: While I could try manually importing this catalog, I don't want to do so without knowing why the scraper I set up does not work, especially given the previous scraper I tried to set up (described above) also failed for (possibly different?) reasons. Mahir256 (talk) 22:34, 19 February 2020 (UTC)[reply]
Looks like it worked after all? --Magnus Manske (talk) 09:25, 27 February 2020 (UTC)[reply]
@Magnus Manske: Indeed it has; if it just took a long time (getting 22k pages is a lot), then it should be marked as "doing" in the jobs list whenever I check on it rather than just go straight to "done" after a few seconds. Mahir256 (talk) 04:44, 29 February 2020 (UTC)[reply]
I think I am running into the same problem with (#3448) – the job status indicates that it has finished but there are no results. There is a large range to work through, so perhaps we'll see in another day or so? AndrewNJ (talk) 22:09, 20 March 2020 (UTC)[reply]

Disabled catalog inaccessible

I deactivated (in the catalog_editor) 3411, as the underlying property needed to be auto-fixed. I would now want to re-enable it, however does not work anymore − the JS console throws `TypeError: "catalog is undefined"`

Jean-Fred (talk) 12:28, 21 February 2020 (UTC)[reply]

I have reactivated the catalog. --Magnus Manske (talk) 09:23, 27 February 2020 (UTC)[reply]


@Magnus Manske: Since the current data set is quite outdated, I've created a web scraper for Could you please add it since you're the owner of the catalogue?

URL pattern:$1
RegEx entry:	{.+?"id":([0-9]+),"name":"([^"]+)","slug":"([0-9A-Za-z_]+(\-[0-9A-Za-z_]+)*)","other_names":"([^"]*)","description":"([^"]*)",.*?,"classification":(null|{.*?"name":"([^"]+)").*?"jurisdiction":{.*?"name":"([^"]*)
id:	$3
name:	$2
desc:	$9, TYPE: $8, ALT NAME: $5, DESCRIPTION: $6, ID: $1
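
For illustration, the way the `$n` placeholders in the id, name and desc fields refer to the regex's capture groups can be sketched as follows. This uses a deliberately simplified pattern and made-up sample data, not the full expression above; `ENTRY_PATTERN`, `fill_template` and `scrape` are hypothetical names, and the real scraper's substitution rules may differ.

```python
import re

# Simplified, hypothetical pattern: group 1 = numeric id, 2 = name, 3 = slug.
ENTRY_PATTERN = re.compile(
    r'"id":([0-9]+),"name":"([^"]+)","slug":"([0-9A-Za-z_]+(?:-[0-9A-Za-z_]+)*)"'
)


def fill_template(template, match):
    """Replace each $n in the template with capture group n of the match."""
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), template)


def scrape(page_text):
    """Build one entry per regex match, using $n templates for each field."""
    return [
        {
            "id": fill_template("$3", match),
            "name": fill_template("$2", match),
            "desc": fill_template("ID: $1", match),
        }
        for match in ENTRY_PATTERN.finditer(page_text)
    ]


# Made-up sample response, shaped loosely like the JSON the regex targets.
SAMPLE = '[{"id":12,"name":"Example Agency","slug":"example-agency"}]'
print(scrape(SAMPLE))
```

Using a non-capturing group `(?:...)` for the repeated slug segment keeps the group numbers contiguous, which makes the `$n` references easier to follow.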

Apart from that, I've noticed a PHP warning which occurred when clicking on "Test this Scraper". It led to an invalid JSON response for the AJAX request and was displayed as "unknown error" in the user interface. Unfortunately I don't remember my input.

<br />
<b>Warning</b>:  preg_match_all(): Unknown modifier 'b' in <b>/data/project/mix-n-match/</b> on line <b>733</b><br />
<br />
<b>Warning</b>:  preg_match_all(): Unknown modifier 'b' in <b>/data/project/mix-n-match/</b> on line <b>733</b>

Thank you very much (especially for your tools) and best wishes --Nw520 (talk) 00:50, 28 February 2020 (UTC)[reply]

Found the issue with the unknown modifier. I had a RegExp with slashes which weren't properly escaped. Maybe it would be better to define a custom error handler to prevent these warnings from being output and therefore breaking the JSON. --Nw520 (talk) 23:32, 25 March 2020 (UTC)[reply]
Added an issue for that. --Nw520 (talk) 23:49, 25 March 2020 (UTC)[reply]
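
For illustration, the idea of catching a bad regular expression and reporting it inside the JSON response, rather than letting a warning corrupt the output, can be sketched like this. The tool itself is written in PHP (per the warning quoted above); this is only a Python analogy, and `run_scraper_test` and the response fields are hypothetical.

```python
import json
import re


def run_scraper_test(pattern, sample_text):
    """Apply a user-supplied regex and always return a valid JSON string."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # Report the problem as data instead of emitting it into the output.
        return json.dumps(
            {"status": "error", "message": f"invalid regular expression: {exc}"}
        )
    matches = [list(match.groups()) for match in compiled.finditer(sample_text)]
    return json.dumps({"status": "ok", "matches": matches})


print(run_scraper_test(r"(\d+)", "a1b22"))
print(run_scraper_test("(", "a1b22"))
```

Either way the caller receives something it can parse, so the interface can show the actual message instead of "unknown error".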

"Automatically matched" and "automatched" (compared to "manually matched")

Please see d:Topic:Vjres76a1f43dhs2 about a possible change of labels. Jura1 (talk) 22:19, 2 April 2020 (UTC)[reply]

  • "Automatically matched" is now "Fully matched"
  • "Manually matched" is now "Preliminarily matched"

Jura1 (talk) 12:53, 18 April 2020 (UTC)[reply]

Where can I find info about catalogue changes?

I am unable to find info about catalog changes. Where can I find them? For some reason, I am unable to find info about Rachel C. Thomson (Q58874674); I tried this search, but perhaps I am wrong. By the way, great tool. Leela52452 (talk) 06:34, 25 April 2020 (UTC)[reply]

Please deactivate catalog 2698

@Magnus Manske: Please deactivate catalog 2698. The identifiers are totally obsolete. See also d:Property talk:P3231#ID change. 本日晴天 (talk) 08:10, 27 April 2020 (UTC)[reply]

Done.--Magnus Manske (talk) 08:40, 27 May 2020 (UTC)[reply]

catalog 2528 redirecting to

Hello @Magnus Manske: catalog 2528 is redirecting to a dummy website. The website is for sale. Leela52452 (talk) 09:51, 28 April 2020 (UTC)[reply]

Fixed. --Magnus Manske (talk) 08:44, 27 May 2020 (UTC)[reply]

'Remove' link is not working

The 'Remove' link (for removing preliminary, and in rarer cases also other matches) is not working. The item is grayed out, and remains grayed out without anything happening. In a perhaps related issue, the 'Create new item' button at the bottom of creation candidates pages has the same issue, and already had it longer (that was less of an issue because it could be circumvented by creating an item from one link, then add it to the others). - Andre Engels (talk) 08:24, 15 May 2020 (UTC)[reply]

Should be fixed now. --Magnus Manske (talk) 08:41, 27 May 2020 (UTC)[reply]

WD descriptions

After the migration to Toolforge, the tool doesn't load the Wikidata descriptions anymore ("Could not load description for [Q-ID]"). Is it temporary behaviour? --INS Pirat (talk) 20:52, 4 June 2020 (UTC)[reply]

  • Everything is working well now. Thanks. --INS Pirat (talk) 20:43, 8 June 2020 (UTC)[reply]

Polish catalogues

Please clean up the Polish catalogues. There are these ones to add:

And to remove:

--Matlin (talk) 17:17, 5 June 2020 (UTC)[reply]

@Matlin: What do you mean by cleaning?
As far as I understand, the grouping per country is inferred from the property linked to the catalog − so you’d need to tag the properties accordingly.
Jean-Fred (talk) 12:43, 16 June 2020 (UTC)[reply]

Visual tool not working?

I've tried to use the visual matching tool in both Firefox and Chrome and it is showing as blank. I also tried this with different catalogs, same results.

--Nashona (talk) 16:27, 11 June 2020 (UTC)[reply]

I have been having the same problem for over 2 weeks and submitted a ticket to @Magnus Manske: here on June 3 but haven't seen any updates yet. --Infopetal (talk) 19:05, 16 June 2020 (UTC)[reply]

Usability issue

In Match mode, would it be possible to consider the replacement of [↑] with something easier (bigger) to click? I would suggest something like [Match!] or [Match it]. Thanks --Luckyz (talk) 06:31, 15 June 2020 (UTC)[reply]

Auto-generated descriptions use wrong pronouns for transgender people

Pinging tool maintainers (as listed on Toolforge): User:MaxFrax96, User:Magnus Manske, User:Hjfocs

When Mix'n'match suggests possible Wikidata items which correspond to a given external item, it automatically generates a description for that item. When browsing preliminary matches for Politifact IDs, I found that it generated this description for Christine Hallquist (Q56167585): "Christine Hallquist is a US-American politician. He was born on April 11, 1956 in Baldwinsville. He studied at Mohawk Valley Community College." Hallquist is a transgender woman, and referring to her as "he" is likely to cause offense.

In Wikidata, her gender is set to transgender female (Q1052281), so this seems to be a code error and not an issue with the data. Whatever code generates these descriptions needs to be corrected to use feminine pronouns here. I'm guessing that what might have happened here is that the code didn't recognize transgender female (Q1052281), and fell back to he/him/his pronouns as a default. If this is the case, it would probably be a good idea to change the fallback pronouns to they/them/their to avoid causing unintended offense in the future. –IagoQnsi (talk) 17:59, 21 July 2020 (UTC)[reply]

Ah, just noticed there's Bitbucket issue reporting for this project; I've just opened issue #63 for this bug. –IagoQnsi (talk) 18:03, 21 July 2020 (UTC)[reply]
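
For illustration, the fallback suggested in this report could be sketched as a lookup from the item's gender value to a pronoun set, defaulting to they/them/their for anything unrecognized. This is not the tool's actual description code, which is not shown here; the table and function names are hypothetical, and apart from Q1052281 (named above) the item IDs are my assumption of the usual Wikidata gender items.

```python
# Pronoun sets as (subject, object, possessive). The QIDs other than
# Q1052281 are assumed, not taken from this discussion.
PRONOUNS_BY_GENDER = {
    "Q6581097": ("He", "him", "his"),    # male (assumed QID)
    "Q2449503": ("He", "him", "his"),    # transgender male (assumed QID)
    "Q6581072": ("She", "her", "her"),   # female (assumed QID)
    "Q1052281": ("She", "her", "her"),   # transgender female (from the report)
}
FALLBACK_PRONOUNS = ("They", "them", "their")


def pronouns_for(gender_qid):
    """Return the pronoun set for a gender item, or the neutral fallback."""
    return PRONOUNS_BY_GENDER.get(gender_qid, FALLBACK_PRONOUNS)


def subject_pronoun(gender_qid):
    """Return only the subject pronoun, e.g. for starting a sentence."""
    return pronouns_for(gender_qid)[0]


print(subject_pronoun("Q1052281"))
print(subject_pronoun(None))
```

A neutral fallback also changes verb agreement ("They were born" rather than "He was born"), so the sentence templates would need a matching adjustment.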

Please delete catalog 3880

@Magnus Manske, Pigsonthewing, Jean-Frédéric, Adam Harangozó, Harmonia Amanda, Thierry Caro, Ash Crow, Salgo60, and Gerwoman:

Please delete my catalog 3880 (or delete data). Sorry for my mistake in IDs. --Manu1400 (talk) 15:08, 6 October 2020 (UTC)[reply]

Done. Jean-Fred (talk) 17:35, 6 October 2020 (UTC)[reply]
I don't have those privileges, but I would like to reload catalog/1223, as it contains items with Show False that should not be in the catalogue (see API call). Does anyone know how to do that? - Salgo60 (talk) 15:25, 6 October 2020 (UTC)[reply]