Community Wishlist Survey 2023/Larger suggestions/Tool for searching infobox parameter values

From Meta, a Wikimedia project coordination wiki

Tool for searching infobox parameter values

  • Problem: Some template parameters should accept only a few values or value types, e.g. | foo = <number_only> or | bar = [ABC|DEF|GHI]. For template maintenance it would be useful to have a list and count of the values actually used. Some of this can be done by monitoring tracking categories (see cs.wikisource), but that is suitable only for small projects, or with a database query, which only a few geeks can write.
  • Proposed solution: Build a tool like PetScan or the SPARQL query service, where the user can choose a template, parameter(s), and the type of output. For example:
    • Template:<Infobox - foo>
    • Parameter:<state>
    • Output:<count|values|regex>
      • Count -> parameter state is used 1234x
      • Values -> Albania 5x, Albanie 2x, Andorra 1x, ... (table)
      • Regex -> 1231x yes, 3x no
    Beyond that, it would be useful to search for articles containing a specific value in a specific infobox. This can be done with insource:, but there are false positives (a page has template foo, but parameter bar belongs to another template).
  • Who would benefit: Editors who work on maintenance.
  • More comments: There was a project called TemplateTiger, but it has been out of service for more than 5 years. It produced some useful output from dumps.
  • Phabricator tickets: T120767
  • Proposer: JAn Dudík (talk) 21:25, 26 January 2023 (UTC)
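The count / values / regex outputs listed in the proposal can be sketched in a few lines of Python over page wikitext (for example, pages read from a dump). This is a minimal sketch under stated assumptions, not the proposed tool: the wikitext is assumed to be available as strings, and all function names are hypothetical. It tracks brace depth so that a parameter belonging to a different or nested template is not counted, which is the false-positive problem the proposal mentions for insource: searches. It ignores comments, <nowiki>, triple-brace parameters and positional parameters.

```python
import re
from collections import Counter


def normalize(name):
    """Approximate MediaWiki title rules: '_' equals ' ', extra whitespace is
    collapsed, and the first letter is case-insensitive."""
    name = " ".join(name.replace("_", " ").split())
    return name[:1].upper() + name[1:]


def find_templates(text, name):
    """Return the inner text of every {{name|...}} call, matching nested braces."""
    target = normalize(name)
    found = []
    start = text.find("{{")
    while start != -1:
        depth, j = 0, start
        while j < len(text):
            if text.startswith("{{", j):
                depth += 1
                j += 2
            elif text.startswith("}}", j):
                depth -= 1
                j += 2
                if depth == 0:
                    break
            else:
                j += 1
        if depth == 0:  # unbalanced calls are skipped
            inner = text[start + 2:j - 2]
            if normalize(inner.split("|", 1)[0]) == target:
                found.append(inner)
        # Keep scanning inside this call too, so nested templates are seen.
        start = text.find("{{", start + 2)
    return found


def split_params(inner):
    """Split on '|' characters that are not inside {{...}} or [[...]]."""
    parts, buf = [], []
    braces = brackets = 0
    i = 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("{{", "}}", "[[", "]]"):
            if pair == "{{":
                braces += 1
            elif pair == "}}":
                braces -= 1
            elif pair == "[[":
                brackets += 1
            else:
                brackets -= 1
            buf.append(pair)
            i += 2
            continue
        if inner[i] == "|" and braces == 0 and brackets == 0:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(inner[i])
        i += 1
    parts.append("".join(buf))
    return parts


def named_params(inner):
    """Map named parameters to their stripped values (the last duplicate wins)."""
    params = {}
    for part in split_params(inner)[1:]:  # part 0 is the template name
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def param_report(pages, template, param, pattern=None):
    """Count, value table and optional regex yes/no split for one parameter."""
    values = Counter()
    for text in pages:
        for inner in find_templates(text, template):
            params = named_params(inner)
            if param in params:
                values[params[param]] += 1
    count = sum(values.values())
    report = {"count": count, "values": values}
    if pattern is not None:
        rx = re.compile(pattern)
        yes = sum(n for v, n in values.items() if rx.fullmatch(v))
        report["regex"] = (yes, count - yes)
    return report


if __name__ == "__main__":
    pages = [
        "{{Infobox foo|state=Albania|name={{lang|sq|Shqipëria}}}}",
        "{{Infobox foo\n| state = Albanie\n}} {{Other|state=Andorra}}",
        "{{infobox foo|state=[[Albania]]}}",
    ]
    print(param_report(pages, "Infobox foo", "state", r"[A-Z][a-z]+"))
```

In the demo, state=Andorra is not counted because it belongs to {{Other}}, and the regex split flags [[Albania]] as a value that does not match the plain-name pattern. Running this over a whole wiki is exactly the preprocessing cost discussed below; the sketch only shows the per-page logic.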

Discussion

English Wikipedia has something that does a lot (I think) of what you're asking for: Template Parameters. It has some restrictions to reduce processing time (and only runs once a month). No more than 50 values are displayed for any parameter, and for highly used templates (more than ~50,000 transclusions), the ability to show exactly which articles contain a particular parameter or value is turned off (but insource regex searches can be used to find a particular value; if I understand correctly, the problem you're having is at least partly not even knowing what anomalous values might be present in order to search for them). Plantdrew (talk) 23:54, 30 January 2023 (UTC)

  • Notably, this was already part of the 2015 wishlist. —TheDJ (talk · contribs) 13:20, 2 February 2023 (UTC)
  • @JAn Dudík: I want to make sure we're understanding the problem, and not focusing as much on the solution. It sounds like the main use case is deprecating parameters of a template. Is that correct? I see you mention tracking categories, and I wonder why you feel that isn't workable for other projects as well. Tracking categories are used when this situation happens on English Wikipedia, for example en:Category:Deprecated parameters. Template editors can add checks for what parameters and values are passed in, and then categorize accordingly. Are there other use cases you have that aren't solved by categorization? The proposed solution as currently written would be very complex and likely too large for our team, as it requires preprocessing an entire wiki's template namespace and all transclusions. Doing this for every wiki doesn't seem feasible with the current architecture of MediaWiki. MusikAnimal (WMF) (talk) 21:27, 3 February 2023 (UTC)
    An example: | parameter = can be written as foo, [[Foo]]-Bar Foo, [[Foo]] or [[Bar|Foo]], and I want to know which format is most common so I can unify them.
    The main problem with tracking categories is that some people hate red-linked categories and create them even if they are only needed for a few days, and populating these categories takes a long time or needs a bot. JAn Dudík (talk) 19:45, 6 February 2023 (UTC)
    @JAn Dudík Okay, thank you! I think I understand now. This still sounds like a massive project. I had never used the old TemplateTiger tool, but according to the German documentation it took years for it to go through large wikis like English and German Wikipedias. Surely that's not satisfactory.
    Did you see the tool that @Plantdrew mentioned above? It has a lot of limitations, but it sounds like it may offer some of what you need. However, I see it only supports a handful of wikis.
    Overall, I think there may be something doable here for our team, but I hate to build yet another difficult-to-maintain external tool that essentially has to go through every transclusion. Better would be to somehow solve this in MediaWiki itself, which is a huge project. So I'm still leaning towards moving this to Larger suggestions. A hacky Toolforge tool doesn't sound like a suitable answer, in my opinion, especially if we know it can't ever run off of recent, live data. MusikAnimal (WMF) (talk) 17:24, 7 February 2023 (UTC)
    I was afraid this would be too large. But some users can produce similar output from a database query in a short time. Maybe an interface for less experienced users could do the same? JAn Dudík (talk) 19:18, 7 February 2023 (UTC)
    @JAn Dudík Can you give an example? I'm not aware of template parameters or their values being stored in the MediaWiki database. MusikAnimal (WMF) (talk) 01:53, 8 February 2023 (UTC)
  • I think DBpedia does this, but it only gets updated a few times a year. --Tgr (talk) 05:32, 5 February 2023 (UTC)
  • Question: @MusikAnimal (WMF) I agree about the complication: we are after database functionality in a wiki, and are maintaining the system with ad hoc maintenance. What if we bit the bullet and used SQL functionality at the time of entry and update, and Wikidata schemas to validate infoboxes at data entry? OpenStreetMap has some sort of validation using Wikidata, but it's regex-based. A SQL structure is far simpler, and would be better suited to dynamic links, so that we could get updates from external official databases. Wakelamp (talk) 04:07, 7 February 2023 (UTC)
    @Wakelamp Something along those lines would be better, yes! As far as validation goes, I was thinking mw:Extension:TemplateData could handle this better, as that's what the community is used to using for specifying each parameter and its values. At any rate, I think the project is too big for our team, as much as I'd love to work on it. MusikAnimal (WMF) (talk) 18:29, 7 February 2023 (UTC)
    I misread the original proposal: it's about a dump, not live data.
    As for the other issue, it's sad that it is too big. I was thinking that we may have some hidden constraints in the way we view Wikipedia:
    1. Everything must be done in Wikipedia using wikis.
    2. Everything involving an article should be updated on the article.
    Wakelamp (talk) 10:36, 9 February 2023 (UTC)
  • @Bamyers99, Hello! How difficult is it to extend this tool (https://bambots.brucemyers.com/TemplateParam.php) to all wikis (or just some of them)? :) Your tool is really cool! Iniquity (talk) 23:10, 8 February 2023 (UTC)
    The tool can be extended to support other wikis (not all). It is hosted on my personal server (long story as to why I moved it from Toolforge), so there are data size limitations and processing time constraints. A main purpose of the tool is to find template usage that does not conform to the TemplateData schema defined for a template. Plantdrew is correct about the tool's limitations; however, the tool does report all non-conforming template usage, even for highly used templates. — The preceding unsigned comment was added by Bamyers99 (talk) 02:05, 10 February 2023 (UTC)
    @Bamyers99, thanks for the answer! Where can I request the addition of a new wiki? :) Iniquity (talk) 07:20, 10 February 2023 (UTC)
    @Iniquity: New wiki requests at en:User talk:Bamyers99. --Bamyers99 (talk) 14:43, 10 February 2023 (UTC)
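JAn Dudík's example in the discussion above (one parameter written as foo, [[Foo]]-Bar Foo, [[Foo]] or [[Bar|Foo]]) is a question about the shape of the values rather than their exact text. A minimal Python sketch of that grouping follows. The shape labels and function names are hypothetical, it assumes the parameter values have already been extracted from the wikitext, and it only recognises simple [[target]] and [[target|label]] links.

```python
import re
from collections import Counter

# Matches [[target]] or [[target|label]]; nested links/templates are not handled.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def value_shape(value):
    """Classify a parameter value by its wikitext shape, not by its text."""
    v = value.strip()
    if not v:
        return "empty"
    whole = LINK.fullmatch(v)
    if whole:
        # group(2) is None when the link has no "|label" part.
        return "piped link" if whole.group(2) is not None else "plain link"
    if LINK.search(v):
        return "link mixed with text"
    return "plain text"


def shape_counts(values):
    """Count how often each shape occurs; use .most_common() to rank them."""
    return Counter(value_shape(v) for v in values)


if __name__ == "__main__":
    sample = ["foo", "[[Foo]]-Bar Foo", "[[Foo]]", "[[Bar|Foo]]", "[[Foo]]"]
    print(shape_counts(sample).most_common(1))  # [('plain link', 2)]
```

Once the dominant shape is known, it can be turned into a regular expression and fed to a yes/no split like the "regex" output in the proposal, which lists the pages still needing unification.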

Voting