

From Meta, a Wikimedia project coordination wiki
Continue the development of the Harvest Templates tool that allows importing data to Wikidata (status: Open)


Description

Currently, there is often more data in English Wikipedia than on Wikidata.

The Infobox Export Gadget is a related tool.

For Wikidata to be useful, it needs at least as much data as English Wikipedia. At present, it can make more sense to query data from English Wikipedia via its APIs than from Wikidata, and many items lack a lot of data, even though data is the whole point of Wikidata.

When people use templates on Wikipedia, data can often be imported from them.

The Harvest Templates tool does exactly this.

For example, one can import the IMDb IDs of films from the IMDb template, or spoken-text audio files from the Spoken Wikipedia template. A template call may look like {{IMDb title|ID|description=xyz}}, and the ID can be read from its first unnamed parameter.
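To illustrate the kind of extraction involved, here is a minimal Python sketch that reads the parameters of flat template calls such as the one above. It is not Harvest Templates' actual code, and the function names are hypothetical. It assumes the call contains no nested templates and no piped links (a `|` inside `[[...]]` would be split incorrectly); a real importer would be better served by a proper wikitext parser such as mwparserfromhell.

```python
import re

# Matches one template call that contains no nested braces, e.g. {{IMDb title|...}}.
# Assumption: no nested templates inside the call.
TEMPLATE_CALL = re.compile(r"\{\{([^{}]*)\}\}")


def normalize_name(name: str) -> str:
    """Simplified MediaWiki title comparison: underscores count as spaces
    and the first letter is case-insensitive."""
    name = name.strip().replace("_", " ")
    return name[:1].upper() + name[1:]


def extract_template_params(wikitext: str, template_name: str) -> list[dict[str, str]]:
    """Return one parameter dict per call of template_name in wikitext.

    Hypothetical helper for illustration, not Harvest Templates' implementation.
    Positional parameters are keyed "1", "2", ...; named ones by their name.
    Whitespace is stripped from all values (MediaWiki itself only strips
    named parameters).
    """
    wanted = normalize_name(template_name)
    calls = []
    for match in TEMPLATE_CALL.finditer(wikitext):
        name, *parts = match.group(1).split("|")
        if normalize_name(name) != wanted:
            continue  # a different template
        params: dict[str, str] = {}
        position = 1
        for part in parts:
            key, sep, value = part.partition("=")
            if sep:  # named parameter, e.g. description=xyz
                params[key.strip()] = value.strip()
            else:  # positional parameter, e.g. the ID
                params[str(position)] = part.strip()
                position += 1
        calls.append(params)
    return calls


if __name__ == "__main__":
    sample = "External links: {{IMDb title|0000001|description=xyz}}"
    print(extract_template_params(sample, "IMDb title"))
    # [{'1': '0000001', 'description': 'xyz'}]
```

Positional parameters are keyed "1", "2", and so on, as in MediaWiki, so the ID in the example template ends up under "1"; the placeholder ID "0000001" is made up.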

However, the tool is no longer being developed, and it has several issues. They may not be very complicated to fix, and fixing them would make the tool very useful, at least compared with editing Wikidata manually:

  • It cannot set any qualifiers, not even something as simple as setting the language qualifier to English when importing spoken-text audio files (issue #210)
  • After an import starts, it fails within about 30 seconds, after around 17 items, and the results pane simply goes blank (issue #209)
  • Only about half of the items were imported; the others failed with the error "Constraint violation: commonslink violation", even though the value was fine and I could add it manually (issue #207)
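Issue #210 concerns qualifiers. As an illustration of what such an import would need to produce, here is a minimal sketch of a statement in the Wikibase JSON data model carrying a "language of work or name" (P407) qualifier set to English (Q1860). The helper name and the file name are hypothetical, P989 (spoken text audio) is assumed as the main property, and the exact shape should be checked against the Wikibase data model documentation before use.

```python
import json


def statement_with_language(prop: str, filename: str, language_qid: str) -> dict:
    """Build a Wikibase statement whose main value is a Commons file name,
    qualified with P407 (language of work or name).

    Hypothetical helper for illustration; not part of Harvest Templates.
    """
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            # Commons media values are plain strings: the file name without "File:".
            "datavalue": {"value": filename, "type": "string"},
        },
        "qualifiers": {
            "P407": [
                {
                    "snaktype": "value",
                    "property": "P407",
                    "datavalue": {
                        "value": {
                            "entity-type": "item",
                            "numeric-id": int(language_qid.lstrip("Q")),
                            "id": language_qid,
                        },
                        "type": "wikibase-entityid",
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Assumed: P989 = spoken text audio, Q1860 = English; "Example.ogg" is a placeholder.
    claim = statement_with_language("P989", "Example.ogg", "Q1860")
    print(json.dumps(claim, indent=2))
```

A statement of this shape could, for example, be included in the `claims` list of the `data` parameter sent to the `wbeditentity` API module.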

A very incomplete list of things that can be imported (these can also be reused for other language editions) is at pltools.toolforge.org/harvesttemplates/share.php.

MisterSynergy wrote: "This tool is not being actively developed anymore. Technically, I have become the de facto owner, after Pasleim handed over his tools and bots to me shortly before he stopped participating in Wikidata in late 2022. While I have worked a lot on the bots and made them much more robust, the few tools with a web user interface are not really within my skill set. Thus, I am not touching them. As long as they run, they run; when they are gone, I will not do anything about it. That said, most if not all code from Pasleim is CC0-licensed, so another maintainer can fork the repos and make improved tools from it. If there is a clear and stable successor, I would also consider redirecting the current tool URL to the successor tool in order to avoid unnecessary competition between forks. Some of Pasleim's code is available on GitHub, and some more in the tools "pltools" and "plnode" on Toolforge."

Note that once Wikidata has more data than Wikipedia, that data could in turn be used to add content to Wikipedia at scale, for example to infoboxes.

This may not seem like an overly important issue, but for Wikidata I think it is. IMDb IDs arguably should not need to be imported from Wikipedia templates in the first place, since they could come from a massive bulk import of film data, but other data can also, or only, be imported this way.

Assigned focus area

Unassigned.

Type of wish

Feature request

Wikidata

Affected users

Wikidata users, Wikidata contributors

Other details

  • Created: 13:48, 14 April 2025 (UTC)
  • Last updated: 12:55, 17 April 2025 (UTC)
  • Author: Prototyperspective (talk)