Talk:Toolhub/Data model

From Meta, a Wikimedia project coordination wiki

1.1.0 feedback

Overall, it would be helpful if each property had a description associated with it. Most I could guess from their names, but it would help to document, for example, whether the "description" property accepts any kind of markup.

  • Could you use a uri type instead of string for URLs?
  • Some code hosting sites use one URL for access with a browser and another for access with git. Debian handles this by having separate Vcs-Browser and Vcs-Git/Vcs-Svn/etc. fields. Would it make sense to include that distinction here? Or at least make it clear which one the repository field should contain.
  • What about tools that are wiki-agnostic and support non-Wikimedia wikis?
  • A help URL for user-facing documentation?
  • Screenshots/gallery for images? Maybe it could point to a Commons category?
  • Should the author field support URLs? MediaWiki does by using wikitext, for example. Composer has properties for each author.
  • A separate "maintainer" field to reflect the current maintainer(s), who may not be among the authors?
  • Depending on how it gets used, it might make sense to have both a short and long description. Both Python and Debian (and probably others) do this.
  • Why is keywords a string?
  • Will name need to be globally unique across all tools? I think regardless there should be some unique id/way to refer to tools.
  • If some tool becomes obsolete and/or breaks, could a tool indicate that it replaces that functionality? Debian and Composer support a "replaces" field for this purpose.

Legoktm (talk) 07:55, 8 May 2018 (UTC)

Hello Legoktm, thank you for the feedback. Property descriptions are a good idea; I will add them for the next draft.
  • I've made the change; it will be included in draft 2.
  • In general I've noticed that when tool developers link to repositories, they just include one link, and it's only ever one type of repo – it's not like they have separate Git and Subversion repos. The repository field is a holdover from version 1.0.0 and it doesn't officially state what kind of repository it should be. For now I'm inclined to leave it as it is, but it would be great to know of use cases where having differentiated repository fields would be useful.
  • On the wiki page it states (but the schema should also be annotated to note) that * can be used to note tools that work on all wikis. (Likewise, you can have hostnames like *.wiktionary.org to refer to all Wiktionaries, for example.) Question: do you know of any tools that work only with non-Wikimedia wikis and not with Wikimedia wikis at all?
  • In addition to the JSON Schema there are additional annotations that can be made to tools; these are described at Toolhub/Data model#Annotations. One of these fields is for user-facing documentation. The idea is to support both official and unofficial documentation for tools, since in some instances the docs are written by volunteers. Should there also be a schema field for "official" documentation?
  • Screenshots and videos are also a part of the annotation specification.
  • author is a holdover from version 1.0.0, and as far as I can tell the author strings have to be just plain text with no links. Since 1.1.0 is supposed to maintain full backwards compatibility I don't think we can arbitrarily just decide to support links, but it's definitely something we could do in 2.0.0.
  • There is an annotation for official maintainer.
  • The description field, a required field from 1.0.0, is the long description to subtitle's short description (or should subtitle be renamed to something like summary?).
  • keywords is a string because it is a holdover from 1.0.0. I did not change it to support arrays of strings because it's deprecated in favor of Toolhub's more robust annotation system, so I decided to just leave it as-is.
  • As with the current version of the schema, name will need to be unique across all tools. I recommend prefixing/namespacing tool names to lower the risk of conflict.
  • A "replaced by" field is a good idea. Out of curiosity, how would you address such a replacement tool? By its name attribute? By a URL?
Harej (WMF) (talk) 16:18, 8 May 2018 (UTC)
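
The wildcard convention mentioned in this reply (* for all wikis, *.wiktionary.org for all Wiktionaries) could be evaluated along these lines. This is a minimal sketch: the function name and the idea of passing the patterns as a list are assumptions for illustration, and shell-style matching from Python's standard `fnmatch` module is swapped in here because nothing in the thread says how Toolhub itself matches hostnames.

```python
from fnmatch import fnmatchcase


def tool_applies_to(wiki_patterns, hostname):
    """Return True if any pattern (e.g. "*" or "*.wiktionary.org") matches.

    Hypothetical helper; uses shell-style wildcards, where "*" matches
    any run of characters, including dots.
    """
    host = hostname.lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in wiki_patterns)
```

Note that fnmatch also gives special meaning to "?" and "[...]", which should not occur in hostnames.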

Service usage

I'd love to see also some info on WMF service usage - e.g. if the tool uses Wikidata Query Service, I'd like to be able to locate such tools. Smalyshev (WMF) (talk) 19:52, 10 May 2018 (UTC)

Titles and description in different languages

One comment is to add support for more than one language in the Toolhub/Data_model. I would suggest something like

  {
    "title": {
      "en": "Wikimedia Tool",
      "fr": ""
    },
    "description": {
      "en": "A tool is a piece of software that helps facilitate contribution toward, or consumption of, a Wikimedia project, not including the core wiki software and its extensions",
      "fr": ...
    }
  }

John Samuel 07:22, 19 May 2018 (UTC)

Multilingual support is very important, but the way you suggest it would break backwards compatibility. I think we could probably do one of two things:
  1. Introduce a syntax like this for the 2.0 version
  2. Introduce a separate key that includes multilingual labels. The current software ignores extra keys that are not in the spec.

For solution 2 I could imagine something like:

{
    "title" : "Wikimedia tool",
    "description" : "A tool is a piece of software that helps facilitate contribution toward, or consumption of, a Wikimedia project, not including the core wiki software and its extensions.",
    "i18n" : {
        "title": {
            "fr": "Outil Wikimedia"
        },
        "description" : {
            "fr" : "Un outil est un logiciel qui contribue à faciliter la contribution ou la consommation d'un projet Wikimedia, à l'exclusion du logiciel wiki principal et de ses extensions."
        }
    }
}

Husky (talk) 08:52, 19 May 2018 (UTC)
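
A consumer of a record shaped like the solution 2 example above might resolve a label roughly like this. It is only a sketch under stated assumptions: the function name is hypothetical, and falling back to the top-level value when a translation is missing or empty is an illustrative choice rather than agreed behaviour.

```python
def localized(record, field, lang):
    """Return the field in the requested language, else the top-level value.

    Assumes the "i18n" layout from the example above; the fallback rule
    is an assumption made for illustration.
    """
    translations = record.get("i18n", {}).get(field, {})
    # A missing or empty translation falls back to the untranslated field.
    return translations.get(lang) or record.get(field)


record = {
    "title": "Wikimedia tool",
    "i18n": {"title": {"fr": "Outil Wikimedia"}},
}
```

Software that ignores unknown keys would keep reading "title" directly, which is what keeps this layout backwards compatible.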

Husky, John Samuel, would it be adequate if tool records were translatable through a standard translation platform like Translatewiki, with the JSON files themselves remaining in one (configurable) language? My thinking is that it helps keep the records themselves to a reasonable size while also letting you tap in to an active translator community, but I would be eager to hear if my impression is mistaken. (This would also help establish which version is the "true" version in the event of a conflict.) Harej (WMF) (talk) 18:08, 24 May 2018 (UTC)
Harej (WMF), I think what you are suggesting is also fine. If I got it right, you will have multiple folders like fr/ en/ pl/ ml/ etc., and each of these folders will hold the tool's translation. Thus, instead of one single large file (with all the translations), you will have multiple folders, each with a small translation file. John Samuel 20:03, 25 May 2018 (UTC)

Experimental: Yes/no

Something you may want to do is introduce a bit more than a "yes/no" -- for example: purely experimental; alpha, which is unstable but fairly functional; or beta (most Magnus tools), where you shouldn't rely on it but know that it's fairly rigorous. Sadads (talk) 17:53, 24 May 2018 (UTC)

Sadads, how would you operationally distinguish between experimental and alpha, or alpha and beta? I like the idea of more granular levels of stability but I think it's important that the words used mean the same (or close enough) things to different people. Harej (WMF) (talk) 18:17, 24 May 2018 (UTC)

Wikidata

Many tools have items on Wikidata; QIDs should therefore feature in the data model. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:13, 28 May 2018 (UTC)

Pigsonthewing, good recommendation. I have added it to the data model. Thank you! Harej (WMF) (talk) 23:29, 28 May 2018 (UTC)

Tool type

Choice of web app, desktop app, bot, gadget, other

Other types of tools include tool-making tools such as Pywikibot, written procedures such as [1], and command-line tools such as [2]. -- GreenC (talk) 23:24, 3 June 2018 (UTC)

GreenC, I've added command line tool and coding framework as tool type options for draft 7. I'm holding off on adding written documentation as a tool type for now, since the scope of Toolhub is currently software tools, but it may be an option in the future. Harej (WMF) (talk) 02:31, 14 June 2018 (UTC)

Languages

"Supported languages: ISO 639 language strings like "zh" and "scn". If not defined, it is assumed the tool is only available in English."

So there is currently no way to specify that a tool supports all languages. That is not that uncommon, since some tools just add icons and/or change something with JS/CSS but don't add any text of their own. -- MichaelSchoenitzer (talk) 00:17, 6 June 2018 (UTC)

You can use "*" to refer to every language; I've made this clearer on the page. Harej (WMF) (talk) 19:40, 13 June 2018 (UTC)
I also realized that the schema technically did not allow "*" as a value; I've updated that for draft 7. Harej (WMF) (talk) 03:29, 14 June 2018 (UTC)
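
The rule described in this thread (no declared languages means English only, and "*" means every language) could be checked roughly as follows. This is only a sketch: the `languages` key and the function name are assumptions made for illustration, not names taken from the schema.

```python
def supports_ui_language(tool, lang):
    """Return True if the tool record claims support for the language code.

    Assumes a hypothetical "languages" key holding a list of ISO 639 codes.
    """
    declared = tool.get("languages")
    if not declared:
        # Per the page text: if not defined, the tool is assumed English-only.
        return lang == "en"
    # "*" stands for every language.
    return "*" in declared or lang in declared
```

For example, a record with `"languages": ["*"]` would match any code, while a record without the key would match only "en".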

Main URL is for docs/homepage/actual tool?

I think for web-based tools, the URL field is the direct link to the actual tool, i.e. https://tools.wmflabs.org/toolname — what about e.g. CLI or desktop tools that are downloaded? I think the URL would be for the homepage of the tool, but then there's the "Documentation link" which would be the same URL (and which by the way perhaps should be called "Documentation URL" to match the other URL attributes? I dunno). Or am I just confused? :-) Sam Wilson 03:14, 14 June 2018 (UTC)

Sam, I updated "Documentation Link" to be "Documentation URL" for consistency. As for just plain "URL," it would link to the tool itself or a download page or some way of accessing the tool. A "documentation" link would be a link to guidance on how to use the tool, but not necessarily to the tool itself. It's something that I think should be judged on a case-by-case basis where there's no clear automatic answer. Harej (WMF) (talk) 03:45, 14 June 2018 (UTC)
Cool, yeah makes sense. So the two could be the same, for a download-only tool? Sam Wilson 05:48, 14 June 2018 (UTC)
The two could end up having the same value in some cases but I think a documentation link should link directly to a how-to page or a manual whereas the generic URL could just be to a home page. Harej (WMF) (talk) 21:08, 18 June 2018 (UTC)

Comments related to languages and translation

1) I don't think it makes sense to store the information about supported interface languages here. It changes too often, so it is not reasonable to maintain it by hand. It would be better just to link to a place that shows up-to-date information, for example Special:MessageGroupStats if the tool is translated on translatewiki.net.

2a) Already discussed a bit above, but the only case where a tool can claim to support all languages would be that it doesn't have any linguistic elements. I bet this will be very rare. If there are such tools, I would recommend using the code zxx, which is defined as "no linguistic content, not applicable", instead of *.

2b) There can be confusion between having the tool translated to language X vs. being able to use it on the Xish-language Wikipedia. A rewording to something like "Available user interface languages" can help to avoid such confusion.

3) Also already discussed above, I would recommend against putting translations into the individual toolinfo.json files. We don't do that for extensions.json either. My recommendation would be that if a tool is built to explore the tools (haha), then that tool would be responsible for collecting all the titles, subtitles and descriptions together into one big JSON file that could be translated separately, for example on translatewiki.net. The other option would be to have those strings translated separately within each project, with toolinfo.json only holding pointers to where those translations can be found. --Nikerabbit (talk) 13:01, 14 June 2018 (UTC)

  • @ 2a) There are two cases where a gadget can claim to support “all” languages:
    • The interface is graphical only, or uses other non-linguistic icon tricks, and the functionality is probably quite simple.
    • The gadget borrows all its messages from regular MediaWiki core/extensions, as e.g. lintHint does, and can rely on the translations available in some 200 languages plus the English fallback mechanism for the gaps, and perhaps receives some standard phrases from translatewiki (could be a CORS issue depending on the REST API).
    The promised filter options will later need to let a searching user specify that they speak no language other than, say, French or Arabic, so that only results with an appropriate user interface are offered.
  • @ 2b) The "For wikis..." item suggests using *.wikisource.org or he.*.org to make clear that a gadget's functionality and purpose is limited to Wikisource or to a Hebrew or Thai project, e.g. because it offers language-dependent scripting or translating, like Chinese into Pinyin or whatever.
  • Not yet addressed is the issue that there may be more than one tool homepage URL, and that the best match for the querying user's language should be offered rather than the one and only English URL.
    • The same goes for accompanying material such as the URLs of screenshots, videos or the feedback channel, which are assumed to be English only, so that nothing but the English URL is presented.
  • For the people who have to update the tool description when functionality is extended or capabilities are discontinued, it seems to be a mess when pieces of the definition are spread over many places and are not available to the tool author, who knows best what changed in functionality or which URL was added.
The multilingual support still seems very unclear to me, so I worry that Toolhub would be useful only for people with good English skills, and therefore the announced tools as well.
Greetings --PerfektesChaos (talk) 16:07, 14 June 2018 (UTC)
Hello Nikerabbit, thank you for your feedback. Regarding manually listing supported user interface languages, do you think it would make sense to discourage people from providing that field manually if they also link to Translatewiki, where the data could be automatically extracted? What should the behavior be in a conflict – should the manually inputted field be ignored, or should it act as a manual override? As for the name, I've updated it to available_ui_languages for the next draft. Harej (WMF) (talk) 03:22, 19 June 2018 (UTC)
It doesn't feel like a big concern to me. If an API URL exists, I would use it over the static list. --Nikerabbit (talk) 14:37, 19 June 2018 (UTC)
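
The precedence suggested in this reply (prefer the API over the static list) could look roughly like the following. Everything here is a sketch: `translation_stats_url` is a hypothetical key, `fetch_languages` is a caller-supplied function standing in for whatever would query translatewiki.net, and falling back to the static list when the fetch fails is an added assumption, not something stated in the thread.

```python
def resolve_ui_languages(tool, fetch_languages):
    """Pick the list of UI languages for a tool record.

    fetch_languages is a callable that takes a URL and returns language
    codes; it is injected so the sketch makes no network calls itself.
    """
    api_url = tool.get("translation_stats_url")  # hypothetical key
    if api_url:
        try:
            # An API URL, when present, wins over the static list.
            return list(fetch_languages(api_url))
        except OSError:
            pass  # assumption: fall back to the static list on failure
    return list(tool.get("available_ui_languages", []))
```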