Grants:IEG/WikiBrainTools/Midpoint

Welcome to this project's midpoint report! This report shares progress and learnings from the Individual Engagement Grantee's first six months.

Summary

In a few short sentences or bullet points, give the main highlights of what happened with your project so far.

  • We have engaged the Wikipedia AI research community through email and IRC discussions to identify the WikiBrain features that are most valuable to the Wikimedia community of bots and researchers.
  • We have undertaken algorithmic software development in the WikiBrain open source software project to enhance aspects of the project that are most appealing to the Wikimedia community. In particular, we completed work to adapt existing algorithms to all 279 Wikipedia language editions.
  • We have launched a public, beta version of the WikiBrain Web API (shilad.github.io/wikibrain/tutorial/web-api.html).
  • We have traveled to present research that uses WikiBrain at IJCAI 2015, one of the top conferences in AI.

Methods and activities

How have you set up your project, and what work has been completed so far?

Our midpoint goal was to release a beta version of the WikiBrainTools web API. To do so we:

  • Engaged developers on the #wikimedia-ai IRC channel to identify WikiBrain features valuable to Wikipedia bots and researchers.
  • Adapted WikiBrain's natural language processing (NLP) algorithms to make them effective for languages that have no standard training data. To do so, we created simplified versions of several WikiBrain components that use supervised machine learning algorithms, and created an algorithm that generates synthetic NLP datasets based on co-occurrence patterns.
  • Created a web-server API for WikiBrain that exposes WikiBrain functionality.
  • Developed software in the open source WikiBrain project.
  • Optimized the performance of several parts of the WikiBrain API that are used by the WikiBrain web server.
  • Set up a dedicated WikiBrain server in the Wikimedia Labs environment.
  • Traveled to IJCAI 2015, one of the top Artificial Intelligence conferences, to present a paper by Sen et al. based on WikiBrain. Promoted WikiBrain as a platform to AI researchers.

Describe how you've set up your experiment or pilot, sharing your key focuses so far and including links to any background research or past learning that has guided your decisions. List and describe the activities you've undertaken as part of your project to this point.

  • When possible, we have used and extended the algorithms that already exist in WikiBrain, described in Sen et al., 2014.
  • We added a more accurate Wikification algorithm that identifies Wikipedia articles described in passages of free text.
  • We extended our semantic relatedness algorithms, which generate a score estimating the strength of the relationship between two articles or phrases, to a multilingual setting. Hecht et al., 2012 was our guiding reference for this work.
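
To illustrate what a semantic relatedness score is, the minimal sketch below computes the cosine similarity between two sparse concept vectors. It is not WikiBrain's implementation: the `relatedness` function, the concept names, and the weights are hypothetical and only show the general idea of mapping two phrases to a single score.

```python
import math


def relatedness(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts.

    Keys are concept names, values are non-negative weights.
    Returns 0.0 if either vector is empty or all zeros.
    """
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[key] * vec_b[key] for key in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical concept weights, invented for this example only.
jazz = {"Music": 0.9, "Saxophone": 0.7, "New Orleans": 0.4}
blues = {"Music": 0.8, "Guitar": 0.6, "New Orleans": 0.3}
tax = {"Finance": 0.9, "Government": 0.5}

print(relatedness(jazz, blues))  # phrases that share concepts score higher
print(relatedness(jazz, tax))    # phrases with no shared concepts score 0.0
```

With non-negative weights the score falls between 0 and 1, which makes scores comparable across phrase pairs.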


Midpoint outcomes

What are the results of your project or any experiments you’ve worked on so far?

  • We have added new code to the WikiBrain open source project, as described above.
  • We have trained natural language processing (NLP) algorithmic models needed for WikiBrain for all 279 language editions.
  • We have deployed a beta version of the WikiBrain Web API and developed documentation for it.

Finances

Please take some time to update the table in your project finances page. Check that you’ve listed all approved and actual expenditures as instructed. If there are differences between the planned and actual use of funds, please use the column provided there to explain them.

Then, answer the following question here: Have you spent your funds according to plan so far? Please briefly describe any major changes to budget or expenditures that you anticipate for the second half of your project.

Learning

The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you are taking enough risks to learn something really interesting! Please use the below sections to describe what is working and what you plan to change for the second half of your project.

What are the challenges

What challenges or obstacles have you encountered? What will you do differently going forward? Please list these as short bullet points.

  • Not exactly a challenge, but we were delayed by work on related projects that we hoped to roll into the WikiBrain Web API. We've now put these projects on hold while completing the work for this grant, but we are excited to integrate them when they are ready!
  • It has been a challenge to fit WikiBrain into the resource constraints of the Wikimedia Labs environment. This has required restructuring and optimizing the WikiBrain API.

What is working well

What have you found works best so far? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your finding in the form of a link to a learning pattern.

Next steps and opportunities

What are the next steps and opportunities you’ll be focusing on for the second half of your project? Please list these as short bullet points. If you're considering applying for a 6-month renewal of this IEG at the end of your project, please also mention this here.

  • We will publicize the beta API service and seek feedback from researchers and bot developers.
  • We will add two additional API methods based on feedback.
  • We will improve the speed of one slow API call (mostSimilar).
  • We will deploy the service within the Wikimedia Labs environment. To do so we will need to partition the service by language.
  • We will create a Python library that connects to the API service.
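
Partitioning the service by language, as mentioned in the list above, could be sketched as a small routing table that maps each language edition to the backend instance holding its models. This is only an illustration under assumed names: the host names, the `assign_languages` and `backend_for` helpers, and the round-robin assignment are hypothetical and do not describe the project's actual deployment.

```python
def assign_languages(languages, backends):
    """Assign each language code to a backend in round-robin order.

    Languages are sorted first so the assignment is deterministic.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    return {
        lang: backends[i % len(backends)]
        for i, lang in enumerate(sorted(languages))
    }


def backend_for(routing, lang):
    """Look up the backend that serves a given language edition."""
    if lang not in routing:
        raise KeyError(f"no backend configured for language {lang!r}")
    return routing[lang]


# Hypothetical host names, invented for this example only.
routing = assign_languages(
    ["en", "de", "simple"],
    ["wikibrain-a.example", "wikibrain-b.example"],
)
print(backend_for(routing, "en"))
```

A real deployment would more likely balance by model size than by count, since large language editions need far more memory than small ones.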

Shilad is in the midst of a related project to develop a cross-lingual semantic relatedness model that jointly models all languages. This will overcome sparsity issues in smaller Wikipedia language editions. We may request a 6-month extension for this work.

Grantee reflection

We’d love to hear any thoughts you have on how the experience of being an IEGrantee has been so far. What is one thing that surprised you, or that you particularly enjoyed from the past 3 months?

I've been delighted at the helpfulness and camaraderie I've been shown in the #wikimedia-labs and #wikimedia-ai IRC channels!