Grants:Project/ContentMine/WikiFactMine/Midpoint



Report accepted
This midpoint report for a Project Grant approved in FY 2016-17 has been reviewed and accepted by the Wikimedia Foundation.
  • To read the approved grant submission describing the plan for this project, please visit Grants:Project/ContentMine/WikiFactMine.
  • You may still review or add to the discussion about this report on its talk page.
  • You are welcome to email projectgrants(_AT_)wikimedia.org at any time if you have questions or concerns about this report.



Welcome to this project's midpoint report! This report shares progress and learning from the grantee's first 3 months.

Summary

Thousands of new scientific papers are published each day. Even within a single topic area, this is more than any human can read. The WikiFactMine project uses machines to 'read' a fraction of these papers and extract facts that are connected to Wikidata items. Editors can then use these facts to improve articles or items manually. In certain cases the facts can even be suggested as improvements automatically, so that a human only needs to review them before they are included.

WikiFactMine is well on the way to supporting Wikipedia and Wikidata by mining the Open Access literature. A large number of small tools have been written to support this effort. They have been built for the community and are the result of feedback from and outreach to it. A lot of effort has also gone into engaging a variety of communities both on and off wiki. At this midpoint of the grant, no statements have yet been semi-automatically added to Wikimedia projects, but there are already tools that let people use the mined facts for manual improvement. The infrastructure is also in place for other developers to build tools on the work we're doing.

Mining has been underway for some months now, with thousands of papers mined and tens of thousands of facts available via the API.
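
To make the idea of a mined 'fact' more concrete, here is a minimal sketch of dictionary-based matching. It assumes a dictionary that maps search terms to Wikidata item IDs and a paper available as plain text. The `Fact` record, the `extract_facts` function, the paper identifier, and the two example dictionary entries are all hypothetical illustrations and are not taken from the project's actual code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One occurrence of a dictionary term in a paper (hypothetical record shape)."""
    paper_id: str
    term: str
    wikidata_id: str
    snippet: str


def extract_facts(paper_id, text, dictionary, context=20):
    """Find whole-word, case-insensitive matches of each dictionary term.

    `dictionary` maps a term to a Wikidata item ID. Each match is returned
    with up to `context` characters of surrounding text on either side.
    """
    facts = []
    for term, qid in dictionary.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            facts.append(Fact(paper_id, term, qid, text[start:end]))
    return facts


# Illustrative two-entry dictionary; a real one would have many more entries.
dictionary = {"lion": "Q140", "dog": "Q144"}
facts = extract_facts(
    "paper-1",  # hypothetical paper identifier
    "The lion chased a dog; another Lion watched.",
    dictionary,
)
for fact in facts:
    print(fact.wikidata_id, repr(fact.snippet))
```

Each resulting record links a paper to a Wikidata item together with the surrounding text, which is the kind of information an editor would need in order to judge whether the match supports an improvement.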

Methods and activities

How have you set up your project, and what work has been completed so far?

Describe how you've set up your experiment or pilot, sharing your key focuses so far and including links to any background research or past learning that has guided your decisions. List and describe the activities you've undertaken as part of your project to this point.

Software Started/Developed/Set Up

  • canary-perch - This is a port of an existing ContentMine tool so that it can be run on WMF Tool Labs
  • karoo - This is a new tool that provides a command-line interface for running jobs on the Tool Labs GridEngine
  • wikifactmine-pipeline - This is the tool that runs the daily extraction on Tool Labs and keeps statistics about each run
  • WikiFactMine ElasticSearch Cluster on Labs
  • wikifactmine-api - This is the interface to query the facts we've mined
  • wikifactmine Swagger UI - This is the UI you see if you use your web browser to go to http://tools.wmflabs.org/wikifactmine-api/
  • FactVis - This is a simple way for a human to glance at facts from the API
  • Wikidata User Script - This is a way to see facts as you browse Wikidata
  • fatameh - This is a tool to make Wikidata items about papers (so eventually we can add statements to them)

Community Outreach/Engagement

Much of this work has been done by our Wikimedian in Residence, User:Charles_Matthews, who is a long-time and prolific Wikimedian [1].

Blog Posts:

Date | Post
25th April | The Trouble with Oxen
2nd May | Blobs and Trees
9th May | Distances Between Drugs
16th May | Into The Unknown
23rd May | Metadata Merge
30th May | Learned Machines
6th June | Extract, Transform, Load
13th June | Turning 14 and Encountering an Old Acquaintance
20th June | Can You Recall Precisely?
27th June | Reliability Ecology

In person events. We've either hosted or sent people to:

Midpoint outcomes

As of 2017-06-30:

  • 2,802,924 facts are available via the API
    • These were made from approximately 500k dictionary entries
  • We've attended 2 conferences
  • We've run 2 talks and training sessions
  • We've run 1 Wikimedian meetup

Finances

Please take some time to update the table in your project finances page. Check that you’ve listed all approved and actual expenditures as instructed. If there are differences between the planned and actual use of funds, please use the column provided there to explain them.

Then, answer the following question here: Have you spent your funds according to plan so far? Please briefly describe any major changes to budget or expenditures that you anticipate for the second half of your project.

Funds have been spent according to plan so far. There have been some additional costs/overheads that weren't anticipated in the grant but ContentMine Ltd. has been able to support these.

Learning

The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you are taking enough risks to learn something really interesting! Please use the below sections to describe what is working and what you plan to change for the second half of your project.

What are the challenges

What challenges or obstacles have you encountered? What will you do differently going forward? Please list these as short bullet points.

  • We've had a variety of technical problems, which are discussed on the timeline
  • We had so many great applicants for the Wikimedian in Residence role that hiring was a challenge. We were very grateful for the support WMUK provided to us in the hiring process. If doing this again, I'd definitely reach out to a local WM chapter for help.

What is working well

What have you found works best so far? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your findings in the form of a link to a learning pattern.

Next steps and opportunities

What are the next steps and opportunities you’ll be focusing on for the second half of your project? Please list these as short bullet points. If you're considering applying for a 6-month renewal of this grant at the end of your project, please also mention this here.

  • Lots more technical work to be done:
    • More gadgets
    • Some tools to ease dictionary creation for less technical people. We hope this will lead to more dictionaries.
    • Integration with Primary-Sources tool
  • Continued Outreach for both awareness and feedback:
    • More training sessions in Cambridge
    • More Wikimedian Meetups
    • Wikimania 2017 Lecture
    • Possibly present at WikidataCon

Grantee reflection

Being a Wikimedia Grantee has been a great experience. We've had a lot of freedom to work on this project to achieve the maximum possible impact. We're all excited about the work to come.