Wiktionary Cognate Dashboard

From Meta, a Wikimedia project coordination wiki

Welcome to the Wiktionary Cognate Dashboard documentation!

The Dashboard enables Wiktionary editors and the wider community to determine what is missing from their projects and where it can be found.

Overview

The Wiktionary Cognate Dashboard is developed by Wikimedia Germany to let Wiktionary editors and the community:

  • detect the most interlinked Wiktionary entries that have no page on a particular Wiktionary;
  • determine the number of interlinks between each possible pair of Wiktionaries;
  • determine which entries have no page on a particular Wiktionary, and on how many other Wiktionaries those entries do have a page;
  • compare one Wiktionary to another and retrieve the entries that have a page in the latter but not in the former;
  • visualize essential information about the relationships between different Wiktionaries.

All of the dashboard's features are organized in tabs (the dashboard sections that are always present in the navigation on the left-hand side). The following sections describe each tab in turn, along with its intended use and functionality. Concise instructions along the same lines are provided on the Dashboard itself.

My Wiktionary tab

Upon selecting a particular Wiktionary from the drop-down menu, the dashboard will generate three outputs: two charts to the left (in a vertical arrangement), and one table to the right.

The first chart (blue line) presents the 25 Wiktionaries with which the selected Wiktionary shares the highest number of links. The Wiktionaries are placed on the horizontal axis and represented by their two-letter language codes. The vertical axis represents the number of links shared with the selected Wiktionary.

Wiktionary Cognate Dashboard: Figure 1 (decreasing curve).

The second chart (red line), generated immediately below the first one, presents the 25 Wiktionaries with which the selected Wiktionary shares the lowest number of links. The Wiktionaries are again placed on the horizontal axis, and the vertical axis again represents the number of links shared.

Wiktionary Cognate Dashboard: Figure 2 (decreasing curve).

The Wiktionary Links Dataset table, generated right next to the two charts, encompasses three columns: (1) Source, which is always the selected Wiktionary, (2) Target, which presents all other Wiktionaries, and (3) Num. Links, which reports the number of links shared between the Source and the Target. The Download (csv) button enables the user to download the full table.

The table can be sorted by any of its columns and searched by Wiktionary or by the numerical values of the Num. Links column. It initially presents 25 Wiktionaries sorted in descending order by the number of links they share with the Source (selected) Wiktionary. The user can browse the table page by page (using the buttons placed right below the table's bottom edge) or choose to show more than 25 items per page (the Show entries button placed right above the table).
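The two charts can be reproduced offline from the downloaded table. The following is a minimal sketch, assuming the CSV uses the displayed column names (Source, Target, Num. Links) as its header. The sample values and the helper names `load_links` and `ranked_targets` are hypothetical and not part of the Dashboard.

```python
import csv
import io

# Hypothetical sample in the shape of the downloaded table; the real CSV
# header names and values may differ.
SAMPLE_CSV = """Source,Target,Num. Links
en,fr,120
en,de,95
en,mg,300
en,sw,4
"""


def load_links(text):
    """Parse CSV text into a list of (source, target, num_links) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Source"], row["Target"], int(row["Num. Links"])) for row in reader]


def ranked_targets(rows, n=25, highest=True):
    """Return up to n (target, num_links) pairs.

    With highest=True the most-linked targets come first; with
    highest=False the least-linked targets come first.
    """
    ordered = sorted(rows, key=lambda r: r[2], reverse=highest)
    return [(target, links) for _, target, links in ordered[:n]]


rows = load_links(SAMPLE_CSV)
top = ranked_targets(rows, n=2)                   # data for the first chart
bottom = ranked_targets(rows, n=2, highest=False)  # data for the second chart
print(top)
print(bottom)
```

With the default `n=25`, the two calls yield the data behind the first and second chart respectively. The ordering within the lowest group is an assumption of this sketch.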

Hubs tab

The tab will generate a network of Wiktionaries where each node represents a Wiktionary and is marked by the respective two-letter language code. Each node (Wiktionary) points to the three Wiktionaries with which it shares the most links. The thicker the line connecting one Wiktionary to another, the more links are shared between them.

The resulting network helps us recognize important hubs: the Wiktionaries that attract the most arrows are those with which many other Wiktionaries share their highest numbers of links. The size of each node represents exactly that information: the larger the node representing a particular Wiktionary, the more Wiktionaries point towards it because it is among the three with which they share the most links.
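The construction described above can be sketched in a few lines. This is an illustration only, assuming a symmetric table of link counts. The data and the helper names `top_k_edges` and `in_degree` are hypothetical, and the Dashboard's own implementation (written in R) may differ.

```python
from collections import Counter
import heapq

# Hypothetical link counts: links[a][b] = number of links shared by a and b.
links = {
    "en": {"fr": 90, "de": 80, "ru": 70, "sw": 5},
    "fr": {"en": 90, "de": 60, "ru": 40, "sw": 3},
    "de": {"en": 80, "fr": 60, "ru": 50, "sw": 2},
    "ru": {"en": 70, "fr": 40, "de": 50, "sw": 1},
    "sw": {"en": 5, "fr": 3, "de": 2, "ru": 1},
}


def top_k_edges(links, k=3):
    """Map each Wiktionary to the k partners it shares the most links with."""
    return {
        source: heapq.nlargest(k, partners, key=partners.get)
        for source, partners in links.items()
    }


def in_degree(edges):
    """Count how many Wiktionaries point at each node (used here as node size)."""
    return Counter(target for targets in edges.values() for target in targets)


edges = top_k_edges(links)
sizes = in_degree(edges)
print(edges["sw"])
print(sizes.most_common())
```

In this sample, "sw" points at three other Wiktionaries but receives no arrows itself, so it would be drawn as the smallest node.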

The network is complex because of the number of nodes and links it represents, so users can interact with it. Nodes can be dragged and moved around to isolate a particular Wiktionary and determine its "neighborhood" (the Wiktionaries pointing towards it, or receiving links from it). Clicking a particular Wiktionary selects only its immediate "neighborhood" and puts the remaining parts of the network in a shadowed background. Use the mouse wheel to zoom the network visualization in or out. A specific Wiktionary can also be selected for inspection from the Select by label drop-down menu.

Wiktionaries that are represented by the same node color are found to have a similar pattern of linkage with other Wiktionaries.

Wiktionary Cognate Dashboard: Figure 3.

Anti-Hubs tab

The tab offers exactly the same functionality as the Hubs tab, except that this time each Wiktionary points towards the three other Wiktionaries in the network with which it shares the fewest links. Thus we can recognize the Wiktionaries that are least developed in this sense.

Links Dataset tab

The tab enables the user to browse, sort, search and download the full Interlinks dataset. The table has three columns: Source, Target, and Num. Links, with the same meaning as described in the My Wiktionary tab section of this documentation. Each possible pair of Wiktionaries is listed twice, once with each Wiktionary in the pair as the Source and the other as the Target. By using the search fields placed immediately above the table's column headers, users can narrow the data to inspect a particular Wiktionary as a Source of links to other Wiktionaries, or as a Target of links from them.
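The layout of the table can be illustrated as follows. This is a rough sketch, assuming the link count of a pair is the same in both directions. The sample data and the helper names `expand_pairs` and `filter_rows` are hypothetical.

```python
# Hypothetical undirected counts, one entry per unordered pair.
pair_counts = {("en", "fr"): 90, ("en", "de"): 80, ("de", "fr"): 60}


def expand_pairs(pair_counts):
    """List every pair twice, once per direction, as (source, target, num_links)."""
    rows = []
    for (a, b), n in pair_counts.items():
        rows.append((a, b, n))
        rows.append((b, a, n))
    return rows


def filter_rows(rows, source=None, target=None):
    """Keep rows matching the given source and/or target; None matches anything."""
    return [
        r for r in rows
        if (source is None or r[0] == source) and (target is None or r[1] == target)
    ]


rows = expand_pairs(pair_counts)
as_source = filter_rows(rows, source="fr")  # "fr" as the Source of links
as_target = filter_rows(rows, target="fr")  # "fr" as the Target of links
print(as_source)
print(as_target)
```

Filtering on one column here plays the role of typing a language code into the corresponding search field.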

I Miss You tab

The user selects a particular Wiktionary from the drop-down menu. The dashboard then generates a table of the top 1,000 entries that are found in other Wiktionaries but are absent from the selected project. Once again, the Download (csv) button above the table saves the table to your computer as a comma-separated file (it can be opened by any spreadsheet software, for example LibreOffice Calc).
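The selection logic can be sketched as below. It assumes that "top" means ranked by the number of other Wiktionaries that have a page for the entry, which follows the Overview but is not confirmed for this tab. The sample data and the function name `missing_entries` are hypothetical.

```python
# Hypothetical data: entry title -> set of Wiktionaries that have a page for it.
pages = {
    "water": {"en", "fr", "de", "ru"},
    "maison": {"fr", "en"},
    "Haus": {"de", "en", "fr"},
    "вода": {"ru"},
}


def missing_entries(pages, selected, limit=1000):
    """Return (title, n_wiktionaries) for entries absent from `selected`.

    The most widespread entries come first; ties are broken by title.
    """
    missing = [
        (title, len(wikis))
        for title, wikis in pages.items()
        if selected not in wikis
    ]
    missing.sort(key=lambda item: (-item[1], item[0]))
    return missing[:limit]


result = missing_entries(pages, "ru")
print(result)
```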

Compare tab

The user first selects a Source and then a Target Wiktionary from the respective drop-down menus, and then clicks the Generate button. The Dashboard generates a table of all entries that are found in the Target but not in the Source Wiktionary. Again, Download (csv) lets the user work with the result locally as a comma-separated file; using this option is strongly advised for any comparison that includes a large Wiktionary (i.e. one maintaining a large number of entries).
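Conceptually, the comparison is a set difference between the entry titles of the two projects. The sketch below shows that idea and a CSV export. The function names `compare` and `to_csv` and the `entry` column header are assumptions made for illustration, not the Dashboard's actual code or file format.

```python
import csv
import io


def compare(source_titles, target_titles):
    """Return entries present in the Target but not in the Source, sorted by title."""
    return sorted(set(target_titles) - set(source_titles))


def to_csv(titles):
    """Serialize the titles as a one-column CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["entry"])
    writer.writerows([title] for title in titles)
    return buffer.getvalue()


source = ["apple", "bread"]
target = ["bread", "cheese", "date"]
only_in_target = compare(source, target)
print(only_in_target)
print(to_csv(only_in_target))
```

The cost of the set difference grows with the number of entries in both projects, which fits the longer waiting times reported for large Wiktionaries.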

This feature is still experimental. It delivers accurate results and is fast for small to moderately sized Wiktionaries, but comparing large Wiktionaries takes some time. Until this is improved, users will need to be patient and follow the Dashboard's feedback on the comparison, which is shown in the bottom-right corner of the screen once the Generate button is clicked. We are actively working to improve the efficiency of the comparison operations.

Most popular tab

This tab presents a table of the entries that have the largest number of links in Cognate. For performance reasons, the list is restricted to entries that appear in at least 10 Wiktionaries.
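The threshold can be illustrated with a short sketch. It assumes the input is a list of (Wiktionary, title) pairs and uses the number of Wiktionaries holding an entry as a stand-in for its number of links. The data and the function name `most_popular` are hypothetical.

```python
from collections import Counter


def most_popular(page_rows, min_wiktionaries=10):
    """Rank entries by the number of Wiktionaries that have a page for them.

    page_rows is an iterable of (wiktionary, title) pairs. Duplicated pairs
    are counted once, and entries below the threshold are dropped.
    """
    counts = Counter(title for _, title in set(page_rows))
    popular = [(t, n) for t, n in counts.items() if n >= min_wiktionaries]
    popular.sort(key=lambda item: (-item[1], item[0]))
    return popular


# Hypothetical sample: 12 placeholder Wiktionary codes.
codes = [f"w{i}" for i in range(12)]
page_rows = (
    [(c, "water") for c in codes]
    + [(c, "fire") for c in codes[:10]]
    + [(c, "rare") for c in codes[:3]]
)
ranking = most_popular(page_rows)
print(ranking)
```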

More information

The Wiktionary Cognate Dashboard is developed in the R programming language and built on RStudio Shiny, with the free, open source version of RStudio Shiny Server running the Dashboard's front-end. The dashboard draws essential information on Wiktionaries from the database of the MediaWiki Cognate extension, hence the name Wiktionary Cognate Dashboard. Some of the dashboard's features still need improvement.

You can also access the public Cognate datasets directly.

Feedback

If you have a problem, a question, or a suggestion for a new feature, please use the talk page to provide your feedback.