Community Wishlist Survey 2021/Wikidata/Duplicates and merge candidates

From Meta, a Wikimedia project coordination wiki

Duplicates and merge candidates

  • Problem: There is an increasing number of items that are empty or possible duplicates
  • Who would benefit: Wikidata editors
  • Proposed solution: Improve on prior art such as Projectmerge so that duplicates are detected not only by labels but also by comparing properties and links to other items; migrate the WD:DNM "do not merge" lists to something more usable (for example, as suggested on the discussion page, to P1889 statements).
  • More comments:
  • Phabricator tickets:
  • Proposer: Sabas88 (talk) 12:38, 20 November 2020 (UTC)


  • Removed the Phabricator task as it's not relevant. --Matěj Suchánek (talk) 15:54, 20 November 2020 (UTC)
  • @Sabas88: Thanks for your proposal. Is there code for the mentioned projects that we can take a look at? We'd like to have a better understanding of how the projects detect duplicates. Thanks again! Harumi Monroy 19:35, 23 November 2020 (UTC)
    Sorry, I can't find it... Help:Merge has a list of tools, but I didn't see a relevant git repository. --Sabas88 (talk) 12:45, 25 November 2020 (UTC)
  • A good idea. Improve on existing tools so they can better predict whether two items are duplicates. Simplistic example: same name, different description, but both are populated places (or have a similar property: city, village) at very similar geographic locations (within 2 km of each other). --FocalPoint (talk) 05:58, 24 November 2020 (UTC)
    Or, if the names are not identical, perhaps compare them with some other string metric and also compare properties. --Sabas88 (talk) 12:45, 25 November 2020 (UTC)
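
The heuristic discussed in the two comments above could be sketched roughly as follows. This is only an illustration, not code from Projectmerge or any existing tool: the `Item` structure, the function names, the 0.85 label-similarity threshold, and the use of `difflib.SequenceMatcher` as the string metric are all assumptions made for the example. Only the 2 km radius and the "same kind of place" check come from the discussion.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Item:
    """Hypothetical, simplified view of a Wikidata item (not a real API type)."""
    qid: str
    label: str
    instance_of: frozenset  # values of "instance of" (P31), as strings
    lat: float              # latitude in degrees
    lon: float              # longitude in degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, using a mean Earth radius of 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def label_similarity(a, b):
    """Case-insensitive similarity in [0, 1]; one possible string metric among many."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def is_merge_candidate(a, b, max_km=2.0, min_label_sim=0.85):
    """Flag a pair for human review; thresholds are illustrative assumptions."""
    # The two items must share at least one "instance of" value (e.g. both villages).
    if not (a.instance_of & b.instance_of):
        return False
    # Labels must be identical or close under the chosen string metric.
    if label_similarity(a.label, b.label) < min_label_sim:
        return False
    # Coordinates must lie within the given radius of each other.
    return haversine_km(a.lat, a.lon, b.lat, b.lon) <= max_km
```

A result of `True` would only mark the pair as a candidate to show to an editor; pairs already recorded as "different from" (P1889) would presumably be filtered out before this check.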