Community Wishlist Survey 2021/Wikidata/Creation of new objects resp. connecting to existing objects while avoiding duplicates

From Meta, a Wikimedia project coordination wiki

Creation of new objects resp. connecting to existing objects while avoiding duplicates

  • Problem: Hundreds of articles are newly created per day in the different language versions, and it is still unresolved when, how, and by whom they should be connected to existing objects or given new objects, and how to avoid duplicates among the currently 90 million objects. This has been discussed again and again for years without a real solution, for example at d:Wikidata:Requests for permissions/Bot/RegularBot 2
  • Who would benefit: Improved data quality, i.e. fewer duplicates
  • Proposed solution:

At d:Wikidata:Contact_the_development_team/Archive/2020/09#Connecting_newly_created_articles_to_existing_objects_resp._creating_new_object_-_additional_step_when_creating_articles,_categories,_etc. a possible solution has been discussed:

An additional step after saving a newly created article etc. would present the user with a list of possibly matching Wikidata objects (e.g. a list of persons with the same name; this could use an algorithm similar to the duplicate check / suggestion list in PetScan, duplicity example), or the option to create a new object if none matches (depending on the type of the object, some values could already be pre-filled, pulled from the article, e.g. from categories or infoboxes). From my point of view, one current problem is that many creators of articles, categories, navigational items, templates, disambiguations, lists, commonscats, etc. are either not aware of the existence of Wikidata or forget to connect a newly created article etc. to an already existing object, or to create a new one if none exists yet. If this creation or connection is then done not manually but by a bot, it might lead to (more) duplicates, which have to be merged manually afterwards.
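The suggestion step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how PetScan or Wikidata implement it: the `Item` class, the `suggest_candidates` function, the `Q-example-…` identifiers and the rule "same label after stripping a trailing disambiguator, and no sitelink to this wiki yet" are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """Hypothetical in-memory stand-in for a Wikidata object."""
    qid: str
    labels: Dict[str, str]  # language code -> label
    sitelinks: Dict[str, str] = field(default_factory=dict)  # wiki id -> page title


def normalize(title: str) -> str:
    """Drop a trailing disambiguator such as ' (chemist)' and ignore case."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", title).strip().casefold()


def suggest_candidates(title: str, wiki: str, items: List[Item]) -> List[Item]:
    """Return objects that might describe the newly created page.

    Assumed rule: a candidate has a label equal to the page title in some
    language and is not yet connected to a page on `wiki`.
    """
    wanted = normalize(title)
    return [
        item
        for item in items
        if wiki not in item.sitelinks
        and any(normalize(label) == wanted for label in item.labels.values())
    ]
```

An empty result would correspond to the "create a new object" option; a non-empty one would be shown to the user, who makes the final choice.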

In addition, there could be specialized bots (depending on the type of the objects, e.g. one bot for humans, one for films, one for buildings, etc.) that are, for example, able to check various IDs (like GND, VIAF, LCCN, IMDb, ...) in order to avoid creating duplicates, and that create new items or connect matching items based on those IDs.
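A minimal sketch of such an ID-based check, assuming the external IDs have already been extracted from the article and from the existing objects. The function names, the `Q-example-…` identifiers and the three outcomes ("create", "connect", "review") are assumptions made for illustration; in particular, sending conflicting matches to manual review rather than resolving them automatically is a design choice of this sketch.

```python
from typing import Dict, Iterable, Set, Tuple

# (identifier scheme, value), e.g. ("VIAF", "some-value")
ExternalId = Tuple[str, str]


def build_index(items: Dict[str, Iterable[ExternalId]]) -> Dict[ExternalId, Set[str]]:
    """Map every external ID to the set of objects that carry it."""
    index: Dict[ExternalId, Set[str]] = {}
    for qid, ids in items.items():
        for ext in ids:
            index.setdefault(ext, set()).add(qid)
    return index


def decide(article_ids: Iterable[ExternalId],
           index: Dict[ExternalId, Set[str]]) -> Tuple[str, Set[str]]:
    """Decide what a bot should do for one newly created article."""
    matches: Set[str] = set()
    for ext in article_ids:
        matches |= index.get(ext, set())
    if not matches:
        return ("create", matches)   # no object carries any of these IDs
    if len(matches) == 1:
        return ("connect", matches)  # exactly one object matches
    return ("review", matches)       # IDs point to different objects
```

A type-specific bot (humans, films, buildings, ...) would differ mainly in which identifier schemes it extracts before calling `decide`.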

Also, if someone uses the "translation function" to create a translated article in another language version, the new translated article could be connected automatically to the object of the original article. In addition, after a version import (following a translation), the link to the Wikidata object currently often gets lost, and the article has to be reconnected manually a second time.


  • Just to give the scope of the problem if nothing is done: I needed several months to integrate 2,000 items without P31/P279 that had accumulated in biochemistry from freshly created en-WP articles. Wikipedia editors are left alone with the task of creating WD items for their articles. I am now facing 1,000 more items from de-WP articles. Any guidance that can be given to WP editors will be helpful. --SCIdude (talk) 08:05, 17 November 2020 (UTC)
  • We already have constraints that flag items with non-unique IDs. I don't think that those cases should be automatically resolved by bots and in any case bots are created by the community and don't need to be created by the community wishlist team.
I don't think the way to make Wikidata more popular is to force Wikipedia editors to do the work of interfacing with Wikidata when they create new articles. I don't think either dewiki or enwiki would activate such a feature if it were available to them. ChristianKl 15:33, 17 November 2020 (UTC)
On the contrary, a whole heck of a lot of opposition at en.WP is precisely because the integration is so loose. Integration rolled out before support for client watchlists was provided (and then afterwards the number of changes was too much, so the devs scaled that back). --Izno (talk) 18:14, 17 November 2020 (UTC)
  • This is problematic - how can the system say "this article should be connected with this item"? But an easy solution would be: hey, on Wikidata there is an item with the same label as your article and without a sitelink to this wiki. Is it the same? This system would also need some modifications, e.g. for Wikisource, where "The Book" is usually not the same thing as "The Book" on Wikipedia, but only an edition. JAn Dudík (talk) 15:11, 18 November 2020 (UTC)