Toolhub/Data model

From Meta, a Wikimedia project coordination wiki

The goal of Toolhub is to make it easier for Wikimedians to find tools to use in their work. This data model describes which pieces of information Toolhub collects and organizes to assist with that goal.

Summary

Below is a table of the various pieces of information that can be used to describe a tool in Toolhub. Note that this table omits some metadata fields that don't aid tool discovery, and in general glosses over the implementation details. The #Technical details section below describes how this works in greater detail.

Name | Description | Note
Basic information
Name (Required) | Unique identifier | The name needs to be unique across all tools. Tool developers are encouraged to use namespaces/prefixes to reduce the risk of clashes.
Title (Required) | User friendly name |
Description (Required) | Tool description | This should be around 3-5 sentences describing the tool, what it's used for, etc.
URL (Required) | Link to where the tool is found |
URL alternates | Alternate links to the tool, e.g. for differing tool translations |
Author | Name of the tool developer |
Official maintainer | User(s) with the ability to communicate official updates about the tool | Subject to a verification process
Sponsor | Organization that sponsored this tool's development | Example: Wikimedia Sverige has sponsored the development of several tools.
Subtitle | Brief pitch for the tool | Longer than a title but shorter than a description. Currently limited to 250 characters.
Bot username | If the tool is a bot, the username of the bot |
Tool type | Which type of tool? Web app, bot, gadget? | Choice of web app, desktop app, bot, gadget, user script, command line tool, coding framework, other
Icon | Small icon that represents the tool | Should be a link to a Wikimedia Commons file description page
Wikidata item | The Wikidata item ID for the tool | Example: Q4063270 for AutoWikiBrowser.
OpenHub ID | The OpenHub ID for the tool | Given a project URL "https://openhub.net/p/foo", the project ID is foo.
When and where to use the tool
For wikis... | Which wikis should this be used on? | Use hostnames. Examples: * (all wikis), fr.wikipedia.org (French Wikipedia), *.wikisource.org (all Wikisources), www.wikidata.org (Wikidata)
Use cases | Different tasks that can be performed with this tool | See #Tool use cases
Audiences | Applicable audiences of people | Must be one of the standard Wikimedia Resource Center audiences; see #Audiences.
Related topics | Concepts that are related to this tool | Use Wikidata items
Collections | This tool is a part of the following collections | Tools can be organized into arbitrary groupings for ease of reference
Supported languages | Languages the tool interface supports | ISO 639 language codes like "zh" and "scn". Use "*" if a tool is effectively available in all languages. If not defined, it is assumed the tool is only available in English.
Feedback URL | Where people should go to leave feedback |
Broken | Yes/no flag for community members to indicate whether a tool is broken |
Experimental | Yes/no flag to mark that the tool is unstable and can change at any time |
Deprecated | Yes/no flag to mark that use of the tool is officially discouraged | This allows tools to be indexed while also helping to filter them out of search results
Replaced by | If a tool is deprecated, a link to the replacement tool |
Tool use guidance
Documentation URL | Link to documentation | Can include official and user-generated docs
Screenshots | Commons filename of screenshot | Still-image preview of the tool
Video | Commons filename for video | Video related to the tool, including an introduction or a tutorial
Privacy policy URL | Link to applicable privacy policy |
Additional information | Supplementary descriptive text |
For developers
License | Description of software license | Use SPDX identifiers
Repository | Git repository where the code lives |
Technologies used | Technological concepts used in implementing the tool | Includes programming languages, data standards, etc.
API URL | Link to the tool's API if it exists |
Developer documentation URL | Link to developer documentation |
Bug tracker URL | Link to bug tracker on Phabricator, Trello, GitHub, etc. |
Volunteering
Volunteer (user assistance) | These users offer to help people use the tool |
Code maintainer | These users offer to help with maintaining code |
Testers | People who have signed up to test new versions of tools |
Translate URL | Link to translation workflow |
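
The wildcard convention used by the "For wikis" field can be illustrated with a short sketch. It assumes that shell-style matching from Python's standard `fnmatch` module is an acceptable stand-in for the wildcard rules; the helper name `tool_applies_to` is hypothetical and not part of Toolhub.

```python
from fnmatch import fnmatchcase


def tool_applies_to(for_wikis, hostname):
    """Return True if hostname matches any "For wikis" pattern.

    Hypothetical helper: accepts one pattern or a list of patterns,
    where "*" acts as a wildcard as described in the table above.
    """
    patterns = [for_wikis] if isinstance(for_wikis, str) else list(for_wikis)
    host = hostname.lower()
    # Lower-case both sides so the comparison ignores letter case.
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


print(tool_applies_to("*", "fr.wikipedia.org"))
print(tool_applies_to(["*.wikisource.org", "www.wikidata.org"], "en.wikisource.org"))
print(tool_applies_to("fr.wikipedia.org", "de.wikipedia.org"))
```

Under these assumptions the first two calls report a match and the third does not, because an exact hostname only matches itself.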

Controlled vocabularies

Some data fields are represented by controlled vocabularies. A controlled vocabulary limits you to a predetermined set of options, which helps ensure that tools are described consistently. The vocabularies can change as needed, but ideally should not change too often.

Audiences

This refers to the audience categories in the Wikimedia Resource Center, which currently are:

  • For program coordinators
  • For contributors
  • For developers
  • For affiliate organizers

Tool use cases

Use cases for tools are represented by a controlled vocabulary meant to represent different purposes a tool may serve. Tools can have multiple use cases.

To put it briefly, tools can be used for developing or consuming content, facilitating interactions among community users, writing code, and organizing projects. With respect to content-related tools, the type of content is treated separately from the thing done with the content; appropriate Wikimedia projects to use a given tool on are represented through a separate tool attribute.

  • Content format
    • Content pages (encyclopedia articles, original texts)
    • Media (images, videos, sound recordings)
    • Data (Wikidata items, structured file data)
    • Code
    • Templates
    • Documentation
  • Contributors
    • Prepare
      • Research
      • Collection curation (curating datasets, curating image sets)
    • Create
      • Page creation
      • Uploading
      • Drafting
    • Change
      • Annotating
      • Expanding
      • Copyediting
      • Formatting
      • Illustrating
      • Renaming
      • Merging
      • Splitting
      • Categorizing
      • Format conversion (e.g. OCR, video conversion)
    • Quality assurance
      • Copyright management
      • New page patrolling
      • Recent changes patrolling
      • Maintenance tagging
      • Assessment
    • Destroy
      • Reverting
      • Deleting
      • Suppressing
  • Interacting with users
    • Socializing users
      • Welcoming
      • Training and mentoring
      • Counseling and social support
    • Conduct
      • Reverting
      • Warning
      • Blocking
      • Dispute resolution
    • Other
      • Assistance (solving specific problems)
      • Talk page discussion
      • User rights (admin, rollback, etc.)
      • User activity analysis
  • Developers
    • APIs
    • Coding environments
    • Data services
    • Productivity tools
    • Tool development kits
    • Wikimedia operational tools
  • Organizers
    • Online project planning (WikiProjects, etc.)
    • Event planning
    • Contest organizing
    • Governance
    • Learning and evaluation
    • Worklist development
    • Project communication
    • Partnership development
  • Consumers
    • Reading
    • Data and metrics
    • Visualization and remixing
    • Large-scale content analysis

Technical details

This section mostly serves to document technical implementation details. You don't need to know most of this stuff for day-to-day use.

Toolhub's data model is split into two parts: the tool record and its annotations. Every tool described in Toolhub has a tool record containing basic information such as the tool name and author. Tool developers can either submit tool records directly through Toolhub's interface or compile the information in a toolinfo.json file, which Toolhub then crawls. The file can be hosted anywhere on the web, including on the tool itself or in a Git repository. A tool record can only be edited where it originated: a record stored externally in a toolinfo.json file can only be edited there, and a record submitted through Toolhub can only be edited through Toolhub. Allowing only one or the other avoids the risk of conflicts. We encourage tool developers to host their toolinfo.json files in a Git repository so that volunteers can submit pull requests.
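
As a rough illustration of the toolinfo.json route, the sketch below builds a minimal record and serializes it. Every value is invented for the example (the tool name, the author and the example.org URLs are hypothetical); only the field names come from the toolinfo schema documented on this page.

```python
import json

# Hypothetical record: every value below is invented for illustration.
record = {
    "name": "examplenamespace-word-counter",
    "title": "Word Counter",
    "description": "Counts the words on a wiki page.",
    "url": "https://example.org/word-counter",
    "author": "Example Author",
    "repository": "https://example.org/word-counter.git",
}

# Serialize the record the way it might be saved as toolinfo.json.
toolinfo_json = json.dumps(record, indent=2, ensure_ascii=False)
print(toolinfo_json)
```

The resulting text could be committed to the tool's repository or served by the tool itself, as long as Toolhub can reach it at a stable URL.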

Once Toolhub receives a tool record, volunteers can submit annotations to it. Annotations help make it easier to find tools by supplying additional information. Annotations cannot be stored in toolinfo.json files; they are meant to always be editable by community members.
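
The split between a tool record and its annotations can be sketched with two small data classes. This is not Toolhub's actual implementation: the class names, the `origin` field and its two values are assumptions chosen to mirror the edit rules described in this section.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRecord:
    name: str
    title: str
    description: str
    url: str
    origin: str = "toolhub"  # assumed values: "toolhub" or "toolinfo.json"

    def editable_in_toolhub(self):
        """A record is only editable where it was originally submitted."""
        return self.origin == "toolhub"


@dataclass
class AnnotatedTool:
    record: ToolRecord
    annotations: dict = field(default_factory=dict)

    def annotate(self, key, value):
        """Annotations stay community-editable regardless of the record's origin."""
        self.annotations[key] = value


# Hypothetical crawled tool: the record is read-only in Toolhub,
# but volunteers can still attach annotations to it.
crawled = AnnotatedTool(
    ToolRecord("example-tool", "Example", "An example tool.",
               "https://example.org/tool", origin="toolinfo.json")
)
crawled.annotate("audiences", ["developers"])
print(crawled.record.editable_in_toolhub(), crawled.annotations)
```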

Some parts of the data model rely on controlled vocabularies, where a field can only be defined using one of several pre-defined terms. Those are described above in #Controlled vocabularies.

Toolinfo schema

Version 1.0.0

Hay's Tool Directory established a de facto standard for describing Wikimedia tools using JSON files. This standard has been retroactively established as version 1.0.0 of the toolinfo JSON schema.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Wikimedia Tool",
  "description": "A tool is a piece of software that helps facilitate contribution toward, or consumption of, a Wikimedia project, not including the core wiki software and its extensions",
  "version": "1.0.0",
  "authors": [
    "Hay Kranen",
    "James Hare"
  ],
  "definitions": {
    "tool": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "title": {
          "type": "string"
        },
        "description": {
          "type": "string"
        },
        "url": {
          "type": "string"
        },
        "keywords": {
          "type": "string"
        },
        "author": {
          "type": "string"
        },
        "repository": {
          "type": "string"
        }
      },
      "required": [
        "name",
        "title",
        "description",
        "url"
      ]
    }
  },
  "oneOf": [
    {
      "type": "array",
      "items": {
        "$ref": "#/definitions/tool"
      }
    },
    {
      "type": "object",
      "$ref": "#/definitions/tool"
    }
  ]
}
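
The rules in the 1.0.0 schema are small enough to approximate by hand. The sketch below is a simplified stand-in for a real JSON Schema validator, and the function name `validate_v1` is hypothetical. It accepts either a single tool object or an array of them, checks the four required fields, and checks that the seven known fields are strings.

```python
REQUIRED = ("name", "title", "description", "url")
OPTIONAL = ("keywords", "author", "repository")


def validate_v1(document):
    """Return a list of error strings; an empty list means the document passed."""
    tools = document if isinstance(document, list) else [document]
    errors = []
    for index, tool in enumerate(tools):
        if not isinstance(tool, dict):
            errors.append(f"tool {index}: expected an object")
            continue
        for key in REQUIRED:
            if key not in tool:
                errors.append(f"tool {index}: missing required field {key!r}")
        for key in REQUIRED + OPTIONAL:
            if key in tool and not isinstance(tool[key], str):
                errors.append(f"tool {index}: field {key!r} must be a string")
    return errors


# A record with only "name" is missing the other three required fields.
print(validate_v1([{"name": "example-tool"}]))
```

Like the schema itself, this sketch does not reject unknown extra fields.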

Version 1.1.1

Version 1.1.0, published on 30 June 2018, updated the schema with new fields while maintaining full backwards compatibility with the previous schema.

Version 1.1.1, published on 13 October 2018, corrects a typographical error from 1.1.0.

The JSON Schema is below.

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "https://tools.wmflabs.org/toolhub/schema/1.1.1",
  "title": "Wikimedia Tool",
  "description": "A tool is a piece of software that helps facilitate contribution toward, or consumption of, Wikimedia projects and associated data, not including the core wiki software and its extensions",
  "version": "1.1.1",
  "authors": [
    "Hay Kranen",
    "James Hare"
  ],
  "definitions": {
    "tool": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "Unique identifier for tools. Must be unique for every tool. It is recommended you prefix your tool names to reduce the risk of clashes."
        },
        "title": {
          "type": "string",
          "description": "Human readable tool name. Recommended limit of 25 characters."
        },
        "subtitle": {
          "type": "string",
          "maxLength": 250,
          "description": "Longer than the full title but shorter than the description. It should add some additional context to the title."
        },
        "openhub_id": {
          "type": "string",
          "description": "The project ID on OpenHub. Given a URL https://openhub.net/p/foo, the project ID is `foo`."
        },
        "description": {
          "type": "string",
          "description": "A longer description of the tool. The recommended length for a description is 3-5 sentences. Future versions of this schema will impose a character limit."
        },
        "url": {
          "type": "string",
          "format": "uri",
          "description": "A direct link to the tool or to instructions on how to use or install the tool."
        },
        "url_alternates": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/url_multilingual"
          }
        },
        "keywords": {
          "type": "string",
          "description": "Comma-delimited list of keywords. This parameter is deprecated and will be removed in the next version."
        },
        "author": {
          "type": "string",
          "description": "The primary tool developer."
        },
        "repository": {
          "type": "string",
          "format": "uri",
          "description": "A link to the repository where the tool code is hosted."
        },
        "bot_username": {
          "type": "string",
          "description": "If the tool is a bot, the Wikimedia username of the bot. Do not include 'User:' or similar prefixes."
        },
        "deprecated": {
          "type": "boolean",
          "default": false,
          "description": "If true, the use of this tool is officially discouraged. The `replaced_by` parameter can be used to define a replacement."
        },
        "replaced_by": {
          "type": "string",
          "format": "uri",
          "description": "If this tool is deprecated, this parameter should be used to link to the replacement tool."
        },
        "experimental": {
          "type": "boolean",
          "default": false,
          "description": "If true, this tool is unstable and can change or go offline at any time."
        },
        "for_wikis": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/wiki"
              }
            },
            {
              "$ref": "#/definitions/wiki"
            }
          ],
          "default": "*",
          "description": "A string or array of strings describing the wiki(s) this tool can be used on. Use hostnames such as `zh.wiktionary.org`. Use asterisks as wildcards. For example, `*.wikisource.org` means 'this tool works on all Wikisource wikis.' `*` means 'this works on all wikis, including Wikimedia wikis.'"
        },
        "icon": {
          "$ref": "#/definitions/commons_file",
          "description": "A link to a Wikimedia Commons file description page for an icon that depicts the tool."
        },
        "license": {
          "$ref": "https://tools.wmflabs.org/spdx/schema/licenses.json#/definitions/license",
          "description": "The software license the tool's code is available under. Use a standard SPDX license keyword."
        },
        "sponsor": {
          "$ref": "#/definitions/string_or_string_array",
          "description": "Organization that sponsored the tool's development."
        },
        "available_ui_languages": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/language"
              }
            },
            {
              "$ref": "#/definitions/language"
            },
            {
              "type": "string",
              "enum": [
                "*"
              ]
            }
          ],
          "default": "en",
          "description": "The language(s) the tool's interface has been translated into. Specify this field manually only if the tool does not handle interface translation through translatewiki.net. Use ISO 639 language codes like `zh` and `scn`. If not defined it is assumed the tool is only available in English."
        },
        "technology_used": {
          "$ref": "#/definitions/string_or_string_array",
          "description": "A string or array of strings listing technologies (programming languages, development frameworks, etc.) used in creating the tool."
        },
        "tool_type": {
          "type": "string",
          "enum": [
            "web app",
            "desktop app",
            "bot",
            "gadget",
            "user script",
            "command line tool",
            "coding framework",
            "other"
          ],
          "description": "The manner in which the tool is used. Select one from the list of options."
        },
        "api_url": {
          "type": "string",
          "format": "uri",
          "description": "A link to the tool's API, if available."
        },
        "developer_docs_url": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/url_multilingual"
              }
            },
            {
              "type": "string",
              "format": "uri"
            }
          ],
          "description": "A link to the tool's developer documentation, if available."
        },
        "feedback_url": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/url_multilingual"
              }
            },
            {
              "type": "string",
              "format": "uri"
            }
          ],
          "description": "A link to where tool users can leave feedback."
        },
        "privacy_policy_url": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/url_multilingual"
              }
            },
            {
              "type": "string",
              "format": "uri"
            }
          ],
          "description": "A link to the tool's privacy policy, if available."
        },
        "translate_url": {
          "type": "string",
          "format": "uri",
          "description": "A link to the tool translation interface."
        },
        "bugtracker_url": {
          "type": "string",
          "format": "uri",
          "description": "A link to the tool's bug tracker on GitHub, Bitbucket, Phabricator, etc."
        },
        "toolinfo_version": {
          "type": "integer",
          "default": 1,
          "description": "The major version number of the Toolinfo schema used. The default value assumed is 1, referring to versions 1.0.0 and 1.1.0."
        },
        "toolinfo_language": {
          "$ref": "#/definitions/language",
          "default": "en",
          "description": "The language the toolinfo record is written in, if not the default value of English. Use ISO 639 language codes."
        }
      },
      "required": [
        "name",
        "title",
        "description",
        "url"
      ]
    },
    "wiki": {
      "type": "string",
      "pattern": "^(\\*|(.*)?\\.?(mediawiki|wiktionary|wiki(pedia|quote|books|source|news|versity|data|voyage|tech|media|mediafoundation))\\.org)$"
    },
    "commons_file": {
      "type": "string",
      "format": "uri",
      "pattern": "^https://commons.wikimedia.org/wiki/File:.+\\..+$"
    },
    "language": {
      "type": "string",
      "pattern": "^(x-.*|[A-Za-z]{2,3}(-.*)?)$"
    },
    "url_multilingual": {
      "type": "object",
      "properties": {
        "language": {
          "$ref": "#/definitions/language"
        },
        "url": {
          "type": "string",
          "format": "uri"
        }
      }
    },
    "string_or_string_array": {
      "oneOf": [
        {
          "type": "string"
        },
        {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      ]
    }
  },
  "oneOf": [
    {
      "type": "array",
      "items": {
        "$ref": "#/definitions/tool"
      }
    },
    {
      "$ref": "#/definitions/tool"
    }
  ]
}
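
The `wiki` and `language` patterns and the `string_or_string_array` definition can be tried out with Python's standard `re` module. The sketch below assumes the patterns escape the literal `*` and `.` characters with backslashes, and that Python's regex dialect behaves the same as the schema's for these particular expressions. The names `WIKI`, `LANGUAGE` and `as_list` are hypothetical.

```python
import re

# Patterns transcribed from the "wiki" and "language" definitions, assuming
# backslash escapes for the literal "*" and "." characters.
WIKI = re.compile(
    r"^(\*|(.*)?\.?(mediawiki|wiktionary|wiki(pedia|quote|books|source|news"
    r"|versity|data|voyage|tech|media|mediafoundation))\.org)$"
)
LANGUAGE = re.compile(r"^(x-.*|[A-Za-z]{2,3}(-.*)?)$")


def as_list(value):
    """Normalize a string_or_string_array value to a list of strings."""
    return [value] if isinstance(value, str) else list(value)


for candidate in ["*", "fr.wikipedia.org", "*.wikisource.org", "example.com"]:
    print(candidate, bool(WIKI.fullmatch(candidate)))

for candidate in ["zh", "scn", "zh-hans", "english"]:
    print(candidate, bool(LANGUAGE.fullmatch(candidate)))

print(as_list("Python"), as_list(["Python", "PHP"]))
```

Normalizing single strings to one-element lists up front keeps later code from having to handle both shapes that the schema allows.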

Annotations

These are additional pieces of data that can be used to describe tools. Annotations cannot be submitted through toolinfo files; they are meant to be submitted through Toolhub only.

Currently planned annotations include:

  • Additional info – expands on the tool description.
  • Audiences – Wikimedia Resource Center audiences.
  • Broken – yes/no flag to indicate that a tool is no longer working, with the username of the person making that report and an associated report.
  • Collections – community-curated groupings of tools.
  • Documentation URL – link to user documentation, including both official documentation and user-generated documentation.
  • Official maintainer – the people who are currently responsible for maintaining the tool's code.
  • Related topics – links between tools and Wikidata items as another way of describing tools.
  • Screenshots – visual aids showing the tool in use.
  • Testers – people who have signed up to test new versions of the tool.
  • Use cases – controlled vocabulary outlining different uses for tools.
  • Video – tutorials and other such audio-visual guides.
  • Volunteer (user assistance) – people who have volunteered to help other users with using the tool.
  • Wikidata item ID – the Wikidata item ID for the tool.

Automated data inputs

Automatically generated data will factor into tool relevance. Note that not all of these inputs will be available right away, nor will they be available for every tool.

  • Tool availability – is the tool up? When was the last time it was up? How often is the tool down?
  • Translators – credits for translation, based on translatewiki.net statistics.
  • Total gadget users – based on data from the wikis
  • Active gadget users – based on data from the wikis
  • Web hits – for Toolforge tools, based on data from Toolforge
  • Unique devices – nice to have, but would probably be harder to accomplish in practice
  • Last updated – based on changes to git repository, probably?
  • Wikis where used – for gadgets
  • Toolforge maintainers
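
The three questions under "Tool availability" could be answered from a simple history of health checks. The sketch below is purely illustrative: this page does not say how Toolhub would collect or store such checks, so the `(timestamp, is_up)` input format and the function name `availability_summary` are assumptions.

```python
from datetime import datetime


def availability_summary(checks):
    """Summarize (timestamp, is_up) health checks given in chronological order.

    Returns None for an empty history.
    """
    if not checks:
        return None
    up_times = [when for when, is_up in checks if is_up]
    return {
        "currently_up": checks[-1][1],                       # is the tool up?
        "last_seen_up": up_times[-1] if up_times else None,  # when was it last up?
        "downtime_ratio": 1 - len(up_times) / len(checks),   # how often is it down?
    }


# Invented sample history: four hourly checks.
history = [
    (datetime(2020, 1, 1, 0), True),
    (datetime(2020, 1, 1, 1), False),
    (datetime(2020, 1, 1, 2), True),
    (datetime(2020, 1, 1, 3), False),
]
print(availability_summary(history))
```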