Timeline for ContentMine
We're working on putting times to tasks/milestones. See Grants:Project/ContentMine/WikiFactMine/Planning for tasks without times.
|Milestone||Target date|
|Advertise for Wikimedian in Residence||31/01/2017|
|Port Canary to Tool Labs (T5 from Proposal)||28/02/2017|
|1000 new dictionary entries from WiR outreach||30/04/2017|
|Graphical tool to ingest weekly feeds and present to editors||15/05/2017|
|Gadget to suggest journal articles relevant to a Wikipedia article/Wikidata item||30/06/2017|
|100 people attended in-person or online events relating to linking the scientific literature, Wikidata and Wikipedia||30/07/2017|
|Gadget to suggest references for unreferenced Wikidata statements||30/08/2017|
|Suggested "Main Topic" statements via the Primary Sources Tool||30/10/2017|
Monthly updates
This month we've just been getting started:
- Development of the API: an initial version of date querying with dummy data is now available on Tool Labs.
- Getting permissions to use the ES cluster on Tool Labs; see T149709.
- Working towards WiR details
Progress on the software development front:
- Using the ES cluster on Tool Labs as the back end for the API (sample data is available under the date 2016-12-09)
- Searching by 'ingestion date' using the API is now supported
- A basic GUI for date searching is now available in the 'FactVis' demo (see https://tarrow.github.io/factvis/)
- Began porting Canary to standalone commands to run on Tool Labs
You can follow this progress in more detail at https://github.com/contentmine/wikifactmine-api
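As a rough illustration of what an 'ingestion date' search involves, the sketch below builds an Elasticsearch range query restricted to a single day. The field name `ingestion_date` is an assumption for illustration; the real mapping lives in the wikifactmine-api repository.

```python
from datetime import date

def ingestion_date_query(day: date) -> dict:
    """Build an Elasticsearch query body that matches facts ingested on
    one specific day. The field name 'ingestion_date' is hypothetical."""
    iso = day.isoformat()
    return {
        "query": {
            "range": {
                "ingestion_date": {
                    "gte": iso,
                    "lte": iso,
                    "format": "yyyy-MM-dd",
                }
            }
        }
    }
```

A query built this way for the sample date would filter on "2016-12-09" at both ends of the range.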
Wikimedian in Residence: good progress made on arranging details such as desking, insurance and internet access.
We attended the WMF Developer Summit. We started work on the Karoo tool, a port of Canary to a command-line application so that it is suitable for running on WMF Tool Labs. Code was written to implement:
- Correctly mapping and clearing the Elasticsearch indices for papers and facts
- Loading papers from the disk to the Elasticsearch paper index
We also fixed a bug in Canary that produced fewer facts than expected because we weren't using the Elasticsearch scroll API correctly.
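To make the scroll bug concrete, here is an illustrative sketch of the scroll pattern (not the actual Canary code): you must keep calling `scroll` until a page comes back empty, otherwise you silently get only the first page of results. A tiny in-memory stand-in for the Elasticsearch client is included so the sketch runs without a cluster.

```python
def scroll_all(client, index, query, scroll="2m", size=500):
    """Drain every hit from an index using the Elasticsearch scroll API.
    Stopping after the first search() call, instead of looping until an
    empty page, yields at most `size` results: the bug described above."""
    page = client.search(index=index, body=query, scroll=scroll, size=size)
    scroll_id = page["_scroll_id"]
    hits = page["hits"]["hits"]
    while hits:
        yield from hits
        page = client.scroll(scroll_id=scroll_id, scroll=scroll)
        scroll_id = page["_scroll_id"]
        hits = page["hits"]["hits"]

class FakeClient:
    """Minimal in-memory stand-in for an Elasticsearch client, used only
    so this sketch is runnable."""
    def __init__(self, docs):
        self._docs = docs
        self._pos = {}
    def search(self, index, body, scroll, size):
        self._pos["s"] = (size, size)
        return {"_scroll_id": "s", "hits": {"hits": self._docs[:size]}}
    def scroll(self, scroll_id, scroll):
        start, size = self._pos[scroll_id]
        batch = self._docs[start:start + size]
        self._pos[scroll_id] = (start + size, size)
        return {"_scroll_id": scroll_id, "hits": {"hits": batch}}
```

With 1200 fake documents and a page size of 500, the loop returns all 1200, where a single `search` call would have returned only 500.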
The news this month is that we have just advertised the WiR position! Please see the ContentMine website (http://contentmine.org/jobs) for further details and how to apply.
Applications closed at the end of this month and we're now preparing for interviews.
We had a great deal of success in advertising for a WiR, attracting a large number of very well-qualified candidates. We've conducted two rounds of online interviews and have now made an offer to the most suitable candidate. We're just organising the administrative details before work can start. It's a very exciting time indeed!
Development work has also made good progress after stalling for a little while due to some problems getting everything running on Labs.
We had problems using the Elasticsearch cluster available to Tool Labs. Unfortunately it seems likely that the repeated queries we were making to extract facts occasionally caused the cluster to fall over. You can read about this on Phabricator.
To debug this, an Elasticsearch node was set up under the wikifactmine Labs project. Although this took quite a while, it turns out that even a single small node on our own Labs project did not fall over during fact extraction.
Thankfully this means the pipeline is currently working, and you can get facts that we extracted today (from all open-access papers published on EuropePMC exactly 30 days ago).
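The "exactly 30 days ago" window mentioned above is a simple date offset; a minimal sketch of the computation:

```python
from datetime import date, timedelta

def publication_window(today: date) -> date:
    """Return the publication day whose papers are processed today:
    exactly 30 days before `today`."""
    return today - timedelta(days=30)
```

`timedelta` handles month boundaries, so the offset is always exactly 30 calendar days regardless of month length.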
The main advance in development work this month is the ability to search for facts by related Wikidata item. Again, this can be seen in the API's Swagger interface. The next step is to use this to create a gadget that displays a selection of facts as you browse different items (for which we have facts) on Wikidata.
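A client of such a gadget would simply build a request URL keyed on the item's QID. The host, path and parameter names below are assumptions for illustration; the authoritative routes are documented in the API's Swagger interface.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical base URL; the real one is given in the Swagger interface.
BASE = "https://wikifactmine.example.org/api/"

def facts_for_item_url(qid: str, limit: int = 20) -> str:
    """Build a request URL for facts related to one Wikidata item.
    Path and parameter names are illustrative, not the real API."""
    params = urlencode({"wikidata_item": qid, "limit": limit})
    return urljoin(BASE, "facts") + "?" + params
```

For example, a gadget viewing Douglas Adams (Q42) would request the `facts` endpoint with `wikidata_item=Q42`.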
We also now have a dashboard where you can see the number of facts extracted each day.
A user script for Wikidata has been developed to showcase the facts we have for each item. It is backed by the WikiFactMine API. You can find both the code and instructions for installing it in your common.js on GitHub.
We also attended the WMF Hackathon and WikiCite in Vienna. An important proof of concept was put together there with help from User:Tobias1984. Fatameh is an OAuth tool that can be used to create Wikidata items for academic papers that have a PMID (and associated items such as authors where appropriate) on demand, in a very frictionless fashion. We hope to use this tool to create items onto which we can then offer statements for addition via the Primary Sources Tool. The code is available on Phabricator.
5 Blog posts written:
- 2 May: Blobs and Trees
- 9 May: Distances Between Drugs
- 16 May: Into The Unknown
- 23 May: Metadata Merge
- 30 May: Learned Machines
Arranged a Cambridge Wikimedians' Meetup, on Meta at Meetup/Cambridge/34.
Continued work on Fatameh: fixed bugs and enabled its use for PMCIDs as well as PMIDs. Enabled command-line usage through API keys so it can be used in a highly automated way. It had made (or attempted to make) over 50,000 Wikidata items by around the end of June. Its Wikidata landing page is here.
4 Blog posts written:
- Extract Transform Load
- Turning 14 and Encountering an Old Acquaintance
- Can You Recall Precisely?
- Reliability Ecology
Ran Cambridge Meetup 34
Delivered issue 1 of Facto Post by mass delivery to ca. 50 stakeholders on en-wiki
Also had meeting with Marti Johnson from WMF. The slides are here.
Most development work this month focused on producing aaraa, a tool to help Wikimedians create their own WikiFactMine dictionaries. More documentation can be found by following the link; the tool itself can be found here. Work also went into supporting the Fatameh tool, which saw increased usage.
2 Blog Posts Written:
The July issue of FactoPost
Much work then focused on supporting the development of aaraa, as well as using it to produce a wide range of dictionaries from SPARQL queries. These queries can be seen here; the dictionaries made from them can be seen on GitHub.
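Turning SPARQL results into a dictionary is essentially a projection from query bindings to (term, QID) pairs. The sketch below works on the standard SPARQL JSON results format returned by the Wikidata Query Service and assumes the query selects `?item` and `?itemLabel`; the actual dictionary format aaraa produces is the one on GitHub, not this simplified pair list.

```python
def bindings_to_entries(results: dict) -> list:
    """Convert standard SPARQL JSON results (Wikidata Query Service
    format) into (label, QID) pairs suitable for a dictionary.
    Assumes the query selected ?item and ?itemLabel."""
    entries = []
    for row in results["results"]["bindings"]:
        # Item values are full entity URIs; the QID is the last segment.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        entries.append((row["itemLabel"]["value"], qid))
    return entries
```

Given a binding for `http://www.wikidata.org/entity/Q12187` labelled "influenza", this yields the pair `("influenza", "Q12187")`.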
August saw us send two people to Wikimania in Montreal, as well as to the hackathon beforehand. We ran a workshop at the hackathon and gave a talk at the main conference. We also ran a stand for most of the conference, giving Wikimedians the chance to find out about the project and to have a go at making a dictionary for a topic that interests them.
We also went to the IFLA World Library and Information Congress to present the project to librarians.
Time not spent travelling was used for bug fixing in the WFM API/pipeline, aaraa and Fatameh.
Lots of documentation was written under our Wikidata 'Landing page', and more dictionaries were created.
The August issue of Facto Post was written and can be read here