Community Wishlist Survey 2022/Wikidata/Creation of new objects resp. connecting to existing objects while avoiding duplicates/Proposal
- Problem: Hundreds of articles are created every day in the different language versions, and each of them has to be either connected to an existing Wikidata object or given a new object. When, how, and by whom this should be done, and how to avoid duplicates among the currently 96 million objects (d:Special:Statistics), has been discussed again and again for years without a real solution, for example at d:Wikidata:Requests for permissions/Bot/RegularBot 2.
- Proposed solution: A possible solution has been discussed at d:Wikidata:Contact_the_development_team/Archive/2020/09#Connecting newly created articles to existing objects resp. creating new object - additional step when creating articles, categories, etc.:
An additional step after saving a newly created article etc. would present the user with a list of possibly matching Wikidata objects (e.g. a list of persons with the same name; this could use an algorithm similar to the duplicate check / suggestion list in PetScan, duplicity example), or with the option to create a new object if none matches (depending on the type of the object, some values could be pre-filled from the article, e.g. from categories or infoboxes). From my point of view, one current problem is that many creators of articles, categories, navigational items, templates, disambiguations, lists, commonscats, etc. are either not aware of the existence of Wikidata or forget to connect a newly created page to an already existing object, or to create a new one if none exists yet. If the creation or connection is then done by a bot instead of manually, this can lead to (more) duplicates, which have to be merged manually afterwards.
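The suggestion step described above could be sketched roughly as follows. This is only an illustration, not the PetScan algorithm: it ranks candidate objects by label similarity using Python's standard `difflib`. The function name `suggest_candidates`, the threshold, and the sample objects (including their Q-IDs) are assumptions made up for this example.

```python
from difflib import SequenceMatcher

# Made-up stand-in for existing Wikidata objects (IDs and labels are invented).
EXISTING = [
    {"id": "Q900001", "label": "John Smith", "description": "English footballer"},
    {"id": "Q900002", "label": "John Smith", "description": "American painter"},
    {"id": "Q900003", "label": "Jon Smyth", "description": "Irish poet"},
    {"id": "Q900004", "label": "Jane Doe", "description": "Canadian physicist"},
]


def suggest_candidates(title, items, threshold=0.8, limit=5):
    """Return up to `limit` items whose label resembles `title`, best match first."""

    def score(item):
        # Case-insensitive similarity ratio between 0.0 and 1.0.
        return SequenceMatcher(None, title.casefold(), item["label"].casefold()).ratio()

    scored = [(score(item), item) for item in items]
    hits = [(s, item) for s, item in scored if s >= threshold]
    # sort() is stable, so items with equal scores keep their input order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in hits[:limit]]


if __name__ == "__main__":
    candidates = suggest_candidates("John Smith", EXISTING)
    if candidates:
        for c in candidates:
            print(f'{c["id"]}: {c["label"]} ({c["description"]})')
    else:
        print("No match found: offer to create a new object")
```

An empty result would correspond to the "create a new object" option; a non-empty one would be shown to the user, who makes the final choice.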
In addition, there could be specialized bots (depending on the type of the objects, e.g. one bot for humans, one for films, one for buildings, etc.) that are able to check various IDs (like GND, VIAF, LCCN, IMDb, ...) in order to avoid creating duplicates, and that create new items or connect matching items based on these IDs.
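The ID-based check such a bot might perform could look roughly like the sketch below. It assumes the external IDs have already been extracted from the article and that existing objects are available as plain dictionaries; the helper names `build_index` and `decide` and all sample values are hypothetical. The property numbers are the Wikidata properties for GND (P227), VIAF (P214), LCCN (P244), and IMDb (P345).

```python
# Wikidata properties checked by this sketch: GND, VIAF, LCCN, IMDb.
ID_PROPERTIES = ("P227", "P214", "P244", "P345")


def build_index(items):
    """Map each (property, value) pair to the set of object IDs that carry it."""
    index = {}
    for item in items:
        for prop in ID_PROPERTIES:
            value = item["ids"].get(prop)
            if value:
                index.setdefault((prop, value), set()).add(item["id"])
    return index


def decide(article_ids, index):
    """Decide what to do with a new article, given its external IDs.

    Returns ("connect", qid) for exactly one match, ("create", None) for no
    match, and ("review", [qids]) when the IDs point to several objects,
    which hints at existing duplicates that a human should look at.
    """
    matches = set()
    for prop, value in article_ids.items():
        matches |= index.get((prop, value), set())
    if not matches:
        return ("create", None)
    if len(matches) == 1:
        return ("connect", next(iter(matches)))
    return ("review", sorted(matches))


if __name__ == "__main__":
    # Made-up objects and identifier values, for illustration only.
    existing = [
        {"id": "Q900001", "ids": {"P227": "gnd-0001", "P214": "viaf-0001"}},
        {"id": "Q900002", "ids": {"P345": "imdb-0002"}},
        {"id": "Q900003", "ids": {"P214": "viaf-0001"}},
    ]
    index = build_index(existing)
    print(decide({"P345": "imdb-0002"}, index))
    print(decide({"P244": "lccn-9999"}, index))
    print(decide({"P214": "viaf-0001"}, index))
```

Returning a separate "review" outcome rather than picking one of several matches keeps the bot from silently connecting an article to the wrong object.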
Also, if someone uses the "translation function" to create a translated article in another language version, the new translated article could be connected automatically to the object of the original article. In addition, after a version import (following a translation), the link to the Wikidata object currently often gets lost, and the article has to be reconnected manually.
- Who would benefit: Improved data quality, i.e. fewer duplicates
- More comments: Also see:
- Community Wishlist Survey 2021/Wikidata/Creation of new objects resp. connecting to existing objects while avoiding duplicates
- de:Wikipedia:Technische_Wünsche/Wunschparkplatz#Verbinden/Anlegen_von_bestehenden/neuen_Wikidata-Objekten_mit_neu_angelegten_Artikeln/Kategorien_unter_Vermeidung_von_Dubletten
- Phabricator tickets:
- Proposer: --M2k~dewiki (talk) 18:17, 10 January 2022 (UTC)