Grants:IdeaLab/Datamining from ZooBank & automation for synonyms

Datamining from ZooBank & automation for synonyms
Get all taxon names from ZooBank. Notify editors about missing synonyms and about missing redirects from synonyms.
contact email: Snek01 (talk) 21:13, 26 March 2016 (UTC) (email on request; then feel free to contact me via Skype or Facebook for discussion)
idea creator
created on: 21:13, Saturday, March 26, 2016 (UTC)

Project idea

What is the problem you're trying to solve?


ZooBank (en:ZooBank) has the names of all new animal taxa since 2008. It is a very reliable resource. Officially, all new taxon names must be included there to be valid. When a new species is published, the following data are recorded in ZooBank:

  • Scientific name, including the authority and year of description
  • Rank: whether it is a species, a genus, or something else
  • Parent: the genus to which the species belongs
  • Specific Name: the specific name
  • Authorship: who described the species
  • Publication: the reference
  • Page: the page on which the new species is first mentioned
  • Figure(s):
  • Type Specimen(s):
  • Type Locality:
  • Fossil: "no" if the species is extant; "yes" if the species is extinct.

ZooBank is open access, so there is presumably some way to mine its data. I do not know of one. Somebody should look into it.


When a species is moved to another genus, its original name becomes a synonym and the authority is written in parentheses.

For example, Linnaeus described the tiger as Felis tigris Linnaeus, 1758. The tiger was later moved to the genus Panthera and obtained the scientific name Panthera tigris (Linnaeus, 1758). Its original name has become a synonym.
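The parenthesis rule can be written down in a few lines. This is only a minimal sketch: the function names `format_authority` and `original_combination` are made up for illustration and do not come from ZooBank or from any existing bot.

```python
def format_authority(author: str, year: int,
                     original_genus: str, current_genus: str) -> str:
    """Return the authority string for a species in its current genus.

    The authority is put in parentheses only when the species is no
    longer in the genus in which it was originally described.
    """
    citation = f"{author}, {year}"
    if original_genus != current_genus:
        return f"({citation})"
    return citation


def original_combination(original_genus: str, specific_name: str) -> str:
    """Return the original name, which becomes a synonym after a genus change."""
    return f"{original_genus} {specific_name}"


if __name__ == "__main__":
    # The tiger example from the text above.
    print("Panthera tigris", format_authority("Linnaeus", 1758, "Felis", "Panthera"))
    print("Synonym:", original_combination("Felis", "tigris"))
```

Read in the other direction, this is what the bot would rely on: an authority in parentheses implies that at least one earlier combination exists and should appear as a synonym.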

  • Find an automatic way to notify editors when a synonym is missing from the taxobox.
  • Find an automatic way to notify editors when a synonym is in the taxobox but there is no redirect from the synonymous name.

What is your solution?


All new species should be included in Wikipedia as separate articles. Fossil species should be included in the article about their genus. Wikipedians should therefore be informed about which new species have been described. There are manually created lists, such as en:List of gastropods described in 2015, but it would be very helpful to have automatically created working lists: ideally a list of new gastropods for WikiProject Gastropods, a list of new birds for WikiProject Birds, and so on.

Imagine a new species is published in a closed-access journal to which the majority of Wikipedians have no access. We would then not know that the new species exists, and ZooBank would be the only resource for its name and its reference.
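A working list of this kind could be generated roughly as follows. This is a sketch under assumptions: the `NewTaxon` record and its field names are hypothetical and only mirror the ZooBank fields listed earlier, not ZooBank's real data format, and "Aus bus" and "Aus cus" are placeholder names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NewTaxon:
    """One newly described species (illustrative fields, not ZooBank's schema)."""
    name: str         # full binomial, e.g. "Aus bus"
    parent: str       # genus the species belongs to
    fossil: bool      # True if the record says "Fossil: yes"
    publication: str  # reference to the original description


def target_article(taxon: NewTaxon) -> str:
    """Extant species get their own article; fossil species go into the genus article."""
    return taxon.parent if taxon.fossil else taxon.name


def working_list(taxa: Iterable[NewTaxon]) -> List[str]:
    """Build wikitext list lines, sorted by name, linking to the article to write or expand."""
    return [
        f"* ''{t.name}'' -> [[{target_article(t)}]] ({t.publication})"
        for t in sorted(taxa, key=lambda t: t.name)
    ]


if __name__ == "__main__":
    sample = [
        NewTaxon("Aus cus", "Aus", True, "ref B"),
        NewTaxon("Aus bus", "Aus", False, "ref A"),
    ]
    print("\n".join(working_list(sample)))
```

Links to articles that do not exist yet would show up as redlinks on the list page, which is what a WikiProject needs in order to see what is still unwritten.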

Task 1: Examine how the data from ZooBank could be used semi-automatically or automatically (for Wikipedia or its WikiProjects, for Wikidata, or for Wikispecies).


Example 1: A user starts an article about a species and writes the authority like this:

| binomial_authority = (Linnaeus, 1758)

but writes nothing in the synonyms row:

| synonyms =

The bot scans the new article and recognizes both the parentheses in the binomial authority and the empty synonyms row. If the authority is in parentheses and the synonyms row is empty, the bot concludes that at least one synonym is missing. The bot then informs the user that a synonym is missing from the new article (in a similar way to how DPL bot informs editors about disambiguation links).
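The check in Example 1 could look roughly like this. It is a sketch, not a finished bot: the helper names `taxobox_param` and `probably_missing_synonym` are invented here, and the code assumes each taxobox parameter sits on its own line, as in the snippets above.

```python
import re
from typing import Optional


def taxobox_param(wikitext: str, name: str) -> Optional[str]:
    """Return the value of a one-line '| name = value' parameter, or None if absent."""
    # [ \t]* after '=' keeps the match on one line, so an empty value stays empty.
    pattern = rf"^\s*\|\s*{re.escape(name)}\s*=[ \t]*(.*)$"
    match = re.search(pattern, wikitext, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def probably_missing_synonym(wikitext: str) -> bool:
    """Guess whether a synonym is missing: authority in parentheses, synonyms empty."""
    authority = taxobox_param(wikitext, "binomial_authority")
    if not authority:
        return False
    in_parentheses = authority.startswith("(") and authority.endswith(")")
    synonyms = taxobox_param(wikitext, "synonyms")
    return in_parentheses and not synonyms


if __name__ == "__main__":
    article = (
        "{{Taxobox\n"
        "| binomial = ''Panthera tigris''\n"
        "| binomial_authority = (Linnaeus, 1758)\n"
        "| synonyms =\n"
        "}}\n"
    )
    print(probably_missing_synonym(article))
```

The function only returns a guess, so its result would be used for a notification to the editor and not for an automatic edit.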

Example 2: A user adds a synonym. The bot scans the article and sees that there is at least one synonym, so there should be at least one redirect from a synonymous name. If the bot finds no redirect to the article, it informs the author that a redirect from the synonym is missing.
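The check in Example 2 might be sketched like this. The assumptions are loud ones: synonyms are taken to be written in wiki italics (`''Name''`), the helper names are invented, and the set of existing redirects is passed in as plain data, because how a real bot would fetch it (for example from the MediaWiki API) is left open here.

```python
import re
from typing import List, Set


def synonym_names(synonyms_field: str) -> List[str]:
    """Extract names written in wiki italics (''Name'') from the synonyms row."""
    return [name.strip() for name in re.findall(r"''(.+?)''", synonyms_field)]


def missing_redirects(synonyms_field: str, existing_redirects: Set[str]) -> List[str]:
    """Return the synonyms for which no redirect to the article is known."""
    return [
        name for name in synonym_names(synonyms_field)
        if name not in existing_redirects
    ]


if __name__ == "__main__":
    # Illustrative synonyms row and an assumed set of redirects to the article.
    field = (
        "''Felis tigris'' <small>Linnaeus, 1758</small><br />"
        "''Tigris striatus'' <small>Severtzov, 1858</small>"
    )
    redirects = {"Felis tigris"}
    for name in missing_redirects(field, redirects):
        print(f"No redirect found from: {name}")
```

If synonyms are written differently, for example as plain text or inside a list template, the extraction step would have to change, which is why the result is better reported to the editor than acted on.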

Those are very simple tasks for a bot. There are probably a few more ways to improve it, but that probably depends on how the synonyms are written. Informing editors is therefore the preferred approach: a bot can only guess that something is missing. It cannot be sure, and it cannot make the correction by itself in the examples described above.

Project goals

As a result, we will have a list of new animal species, with redlinks to the articles about them that have not yet been written on Wikipedia.

Editors will be notified if synonyms are missing. They will learn that when they add synonyms, they should also create redirects from those synonyms.

Get involved


  • Advisor I can only test results, check out results for data related to gastropods. Snek01 (talk) 21:13, 26 March 2016 (UTC)
  • volunteer1= Only for data related to gastropods JoJan (talk) 14:54, 25 October 2016 (UTC)
  • volunteer2 - for the wiki-part --Martin Urbanec (talk) 06:07, 31 October 2016 (UTC)


Expand your idea

Would a grant from the Wikimedia Foundation help make your idea happen? You can expand this idea into a grant proposal.
