Grants:Project/ContentMine/ScienceSource/Midpoint

From Meta, a Wikimedia project coordination wiki


Report under review
This Project Grant midpoint report has been submitted by the grantee, and is currently being reviewed by WMF staff. You may add comments, responses, or questions to this report's discussion page.



Welcome to this project's midpoint report! This report shares progress and learning from the grantee's first 6 months.

Summary

During the first 6 months of this project, we achieved the following milestones of our project plan:

1) Successfully developed and tested our first pilot.
2) Built a community of champions who will test the platform and provide feedback, helping us to improve our platform, tools, algorithms, dictionaries and corpus.
3) Put into practice all 19 activities of our Communication, Dissemination and Engagement plan, some of which have already reached "achieved" status.
4) Overcame technical challenges associated with the technology, and agreed on the next steps to complete our pipeline.
5) Disseminated the project at conferences, meetups, talks and workshops, reaching a wide audience of Wikimedians and interest groups.
6) Expanded the boundaries of our project to other communities (the Global South and other wikiprojects).
7) Managed, controlled, documented and planned all the steps of our project, leaving us ready for the second half.
8) Applied the lessons learned from previous experience and the feedback received from the community.
9) Planned the second half of this project and the sustainability of ScienceSource once the project ends.

For all these reasons, we are very excited about the progress made so far and the work ahead in the second half of ScienceSource.

Methods and activities

Technical

  • In the first half of the project, a major focus was on creating the wiki platform, software tools and bots to support our model of annotation of scientific papers. The data schema relating annotation to the Wikibase format is innovative, and custom software was designed to extract full information from papers in XML form. The software needs two stylesheets: one to render the XML version into Parsoid HTML (the type used on Wikipedia), which is posted to the ScienceSource wiki, and one that produces a plain form of text suitable for text mining. This work has been completed for a pilot of 100 downloaded papers.
  • All this work built on the experience of the preceding WikiFactMine project, which used one stylesheet and did not render XML in a human-readable form. The extracted data now includes precise information about where in the text each search term was found. This approach, combined with storage in Wikibase, allows SPARQL queries to retrieve fine-grained information. The use of ContentMine dictionaries compiled from Wikidata to organise sets of search terms, as in WikiFactMine, means that such searches can include sophisticated side-conditions set by users.
  • The approach therefore makes advanced search of the scientific literature accessible to anyone who has learned some basic functionality of Wikidata queries. The project has applied for federation with query.wikidata.org. Another main thrust was data work, which underlies the drive to make the MEDRS guideline on Wikipedia algorithmic. There is useful data that can be scraped from major websites, but the correct approaches are not necessarily the obvious ones, and some trial and error has been needed.
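The idea of recording where in the text each dictionary term is found can be illustrated with a minimal Python sketch. It assumes a dictionary is a plain mapping from search term to an identifier; the `Annotation` class, the `annotate` function and the placeholder identifiers are hypothetical and are not taken from the project's actual text-mining tool or data schema.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    term: str     # dictionary term that matched
    item_id: str  # identifier the dictionary maps the term to (placeholder values here)
    start: int    # character offset of the first matched character
    end: int      # character offset one past the last matched character


def annotate(text, dictionary):
    """Return one Annotation per whole-word, case-insensitive dictionary match."""
    found = []
    for term, item_id in dictionary.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            found.append(Annotation(term, item_id, match.start(), match.end()))
    # Sort by position so annotations follow reading order.
    return sorted(found, key=lambda a: (a.start, a.end))


# Hypothetical dictionary: the identifiers are placeholders, not real Wikidata items.
EXAMPLE_DICTIONARY = {"malaria": "ITEM-1", "artemisinin": "ITEM-2"}
EXAMPLE_TEXT = "Artemisinin is used against malaria. Malaria remains common."

if __name__ == "__main__":
    for a in annotate(EXAMPLE_TEXT, EXAMPLE_DICTIONARY):
        print(a.start, a.end, a.term, a.item_id)
```

Storing the start and end offsets alongside the matched term is what would let a later query ask not only whether a paper mentions a term, but where it does so. How such records are mapped onto Wikibase statements is left out here, since the report does not describe the schema in detail.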

Community

As part of the ScienceSource project, we developed a Communication, Dissemination and Engagement plan. The goal is to create an active community of editors and to ensure the sustainability of the project once it is finished. We divided the first half of the project into three main activities: online and social media presence, in-person presentations, and external media and communications.

The activities we have carried out during this period include:

Community engagement (from monthly report)

Area Specific goals Metrics June-18 July-18 August-18 September-18 October-18 November-18
Community engagement Meetups (Cambridge) Organise and deliver a three-monthly meetup, at or near ContentMine offices. (10-20 people per session) Meetups calendar planning 2018-2019 Meetups calendar planning 2018-2019 Confirmed date, 29th September. Preparation of meet up. Meetup and workshop 29 September, 15 attended, training session report for the first meetup and workshop Meetup arranged for Dec-18, March-19 and May-19
Community engagement Workshops Organise and deliver four workshops for users and developers, e.g. at Wikimania 2018, Wikicite, Mozfest and FORCE11, follow-up on results (20-50 people per session) Planning Wikimania hackathon workshops (50 people) and to engage with potential editors. Workshop delivered at Wikimania, (+80 people) Confirmed date, +20 people invited for October Preparation of SS conference and workshop Report conference and workshop 50% workshops achieved. Other 2 workshops already arranged for Jan-19 and March-19
Community engagement Conference presentation Deliver presentation during Wikimania 2018 Wikimania talk not accepted (before grant finalised), but opportunities at the hackathon; stall in the community village. (See below for community village numbers.) Fair data presentation +50 people Vilnius hackathon (+30 people) Gosh presentation (+100 people), Shuttleworth gathering (+50 people), Mozfest presentation (+50 people). EBI presentation (+40 people) Conference presentation achieved. Other 4-6 presentation arranged to disseminate the outputs of the project
Community engagement Newsletter Deliver a monthly newsletter, reaching Wikimedians, volunteers and community members (100 people on newsletter list). June newsletter to 120 people, delivery taken of a video on annotations (>1000 views expected). Newsletter posted at Wikimania, to 137 pages. Delivery of a video on the focus list. Newsletter, with video, delivered to 136 pages Newsletter delivered to 141 pages Newsletter delivered to 147 pages Monthly Newsletter to (+100 people) achieved and on going
Community engagement Project webpage and social media Ensuring development work and results are communicated through ContentMine’s own site, wiki page and social media to interested communities at each step in the process. Twitter analysis (>800 views first video) + implementation of a ScienceSource section on ContentMine's own website. Focus list launch on Twitter 0.75K impressions; focus list video on Twitter 16 July, 2K impressions CM webpage ScienceSource project added. Tweet impressions 14.9K ScienceSource page completed and information up to date. Tweet impressions 26.6K ScienceSource web information updated. Wikimedia UK video about ScienceSource (+500 views). Tweet impressions 38K achieved and ongoing activities in our social media network
Communications plan ScienceSource editors 25 active people in the platform, with five selected "project champions". Champion engagement plan and community village engagement Preparation of engaging material for project champions Preparation of engaging material for medical schools in LATAM engaging the first team of 15 champions engaging the second team of 10 champions Community of 25 active project champions engaged waiting for working on SS
Communications plan Wikipedia community 200 people. Convert some awareness to serious interest, by personal demonstration and presentations. Creation of MEDRS page on the English Wikipedia. (At Med Day lightning talks at Wikimania, did not present.) Mentoring page creation and update Preparation of ScienceSource article Wikimedia article presentation Wikidata meetup
Communications plan Broader Wikimedia community 1000 people. Inform biomedical audiences on ScienceSource scope and workflows Community awareness plan preparation Community village at Wikimania, serious contacts about 40 engagement with WMUK WMUK ScienceSource interview and video Contacting other wiki projects (i.e India) Contacting Medical wiki community members
Communications plan General community Wider public interested in ScienceSource, 150 people. August presentation in London planned. This would be a wider community event in London for ~100 people, with a scholarly communications slant. Mentoring page created on Wikiversity, 10 July. Mailchimp campaign, +300 on a monthly basis Contacting the first batch of 10 medical schools LATAM Contacting the second batch of 10 medical Schools LATAM Arranging first set of interviews with wide coverage media
Communications plan Non-Wikimedia conferences Attendees at conferences, 500 people. Wikidata (Berlin, organised by the European Research Council), Bioscience conference (Lisbon), Sci-Foo (San Francisco). Total people reached > 200. Open Plant Forum (Norwich), poster preparation Open Plant Forum (Norwich), about 100 attended (31 July/1 August) Attendance at GOSH 2018. +40 people reached Presentation at Mozfest. +50 people reached Bioinformatics hackathon, Synbio forum, University of Edinburgh presentation
Communications plan ContentMine/Wikimedia community Attendees at ContentMine, Wikidata and WMUK events, 600 people Showcase prepared by and with Wikimedia Foundation. Video clip now in commons:Category:ContentMine videos. ScienceSource Newsletter Mailchimp to +400 people within ContentMine community Preparation of ContentMine SS conference, SS newsletter MailChimp ContentMine community CM conference and workshop. +16 people Wikicite attendance Open Science Day presentation Planning second-half project Community activities
Communications plan Social media communities ContentMine community, 10,000 people Twitter 4800 impressions this month. Twitter impressions related to Wikimania Cape Town 3.3K Twitter impressions logged 6.1k Twitter impressions 26.6K Twitter impressions 38k Twitter impressions 19K
Communications plan Outside world 100,000 people. Press campaign and aim to place an op-ed in a major medical journal (target is the BMJ), on the state of the open access medical review literature. Preparing press campaign plan Contacting the first list of journals for interviews Contacting the second list of journals for interviews Contacting local press and related magazines for interviews First interview confirmed Jan-19 Planning second-half project communication plan

Midpoint outcomes

Our main technical and community outputs for the first half of the ScienceSource project include:

  • Pilot run completed on the wiki with the first set of 100 papers
  • Text-mining tool posted to GitHub
  • Engagement of 25 editors who will test the ScienceSource (SS) pipeline
  • Number of content pages created or improved across Wikimedia projects achieved
  • Two workshops run (Wikimania hackathon in July, Wikidata 6th birthday event in October), and one more booked (Cambridge Science Festival in March)
  • One meetup run (June), with others scheduled for December, January and May
  • Set of 4 introductory videos posted, on Wikiversity
  • Wikipedia Signpost article in October, 1.3K pageviews
  • Monthly newsletters on Wikipedia, growing circulation list now over 140
  • Attendance at Wikimania and WikiCite 2018 conferences
  • Attendance at non-Wikimedia conferences (Bioscience Lisbon, Sci-Foo, Open Plant Forum, GOSH and Mozfest, among others)
  • Twitter campaigns, reaching over 50k
  • Mail list campaign, reaching over 2k
  • Account creation on ScienceSource wiki
  • Work and results communicated through ContentMine’s own site
  • Annotations posted to the wiki, and first facts extracted and posted to Wikidata

Finances

The changes to the budget that we anticipate for the second half of the project include:

  • Communication, Dissemination and Engagement effort is expected to be higher than planned. The main reason is the long list of tasks in our communication plan, which are needed to build the community of users that will ensure sustainability once the project is completed.
  • Travel expenditure is expected to be lower than planned. The main reason is that we reduced the number of people attending overseas conferences to optimise the budget allocation. We also received financial support to attend conferences.
  • Non-charged costs: as part of the project, the ContentMine team is providing an "in-kind" contribution of time (technical and management). This allows us to keep the project under budget. So far we have contributed >$10k, and we expect to contribute the same amount in the second half of this project.

Learning

What is working well

  • Team spirit has been great. We managed to assemble a diverse team of Wikimedians, software developers and community members around this project.
  • Expanding the boundaries of this project to non-Wikimedia organizations and interest groups (e.g. medical schools).
  • Preparation of Communication, Dissemination and Engagement material. Videos, newsletters and workshop materials have been of great help in communicating our project so far.
  • Workshops and meetups. We have delivered regular sessions during this project, involving a variety of people. By the end of the project, we expect more than 100 people to have attended one of our events.
  • Participation in conferences. During the first 6 months of this project we took part in >10 conferences (both related to the topic and more general), allowing us to test our ideas, increase our network of collaborators and reach a wider audience (inside and outside the Wikimedia community).
  • Software and project management methodologies. The use of APMP and agile methodologies has helped us to work efficiently through the project stages, including requirements, analysis, technical development, communication, and risk and budget management.

What are the challenges

  • Readiness and documentation issues with Wikimedia software (Wikibase containerised by Docker, MediaWiki API). For non-Wikimedian developers coming into this area, conceptual foundations must be laid, and code discussed in the broader context.
  • Delivering an ambitious Communication, Dissemination and Engagement plan within time and budget, especially defining and reaching the audience outside Wikimedia that may be interested in participating.

Next steps and opportunities

Planned work by work package, Dec-18 to May-19:

WP1 Project Management
  • Dec-18: 6-month task planning and KPIs
  • Jan-19: Full pipeline test management
  • Feb-19 to April-19: Comms plan coordination and monitoring
  • May-19: Comms plan and close the project

WP2 Dictionaries
  • Dec-18: Review and create dictionaries
  • Jan-19: Create dictionaries
  • Feb-19: NA
  • March-19: Dictionaries review and update based on user feedback
  • April-19: Update dictionaries as needed based on user feedback
  • May-19: NA

WP3 Corpus
  • Dec-18: Define rest of corpus
  • Jan-19: Upload corpus into SSwiki (ready-to-use format)
  • Feb-19: Update corpus as needed based on user feedback
  • March-19 to May-19: Update corpus as needed

WP4 Software Development
  • Dec-18: Finalise corpus technology, MEDRS algorithm
  • Jan-19: Test full pipeline, UI update
  • Feb-19: MEDRS algorithm improvement as needed
  • March-19: UI update as needed
  • April-19: Technical closeout report and GitHub final upload
  • May-19: NA

WP5 Communication, Dissemination and Engagement plan
  • Dec-18: 6-month task planning and KPIs
  • Jan-19: Formal and periodic communication with team of champions
  • Feb-19 to April-19: Developing community engagement KPIs (meetups, workshops, conference presentations, newsletter, webpage and social media); developing comms plan activities (SS editors >25, 1000 people reached in the Wikimedia community, 150 med people reached outside the Wikimedia med community, +500 people reached through conference attendance and presentations, +10K social media engagements, +100k general med public engagement through publications and interviews)
  • May-19: Comms plan close-out report and sustainability action plan

Grantee reflection

We are very pleased with the progress we have made on this project, both technically and in community building. The time spent planning and documenting our activities has been entirely beneficial for us. We are already achieving some of the 19 KPIs we established for this project. Our technical pipeline is clear, agreed and achievable. We ran the first pilot (100 papers) weeks ago, and we are now focusing all our efforts on bringing our champion editors on board. As usual, we enjoyed going to WikiCite and Wikimania and being able to feel part of a world community.

We continue to promote Wikidata in all our projects and at conferences. The dictionary structure we developed in previous projects has been beneficial for ScienceSource, and we aim to provide technical maintenance once the project is finished. We have applied for another project grant, DiversiTech, to create a portal giving access to a community-curated multilingual corpus of high-quality articles on LGBT issues. The portal is intended to encourage existing and new Wikipedia editors to create and improve articles by offering rapid referencing suggestions and prompts for related content, starting from their Wikipedia articles and reference sources of interest.