Wikimedia Foundation Annual Plan/2017-2018/Draft/Programs/Technology

From Meta, a Wikimedia project coordination wiki

About Technology

The Technology department supports global access to the Wikimedia projects that is reliable, fast, and secure. This team supports performance, availability, development infrastructure, technical operations, security, architecture, release management, analytics engineering and Research. One of the larger teams, they support most of the core operations to make sure our projects, services and development pathways are available to as many people as possible, on as many devices as possible, in a manner that safeguards users’ privacy and trust.

We support progress by providing tooling and infrastructure that make it possible for product developers to enhance and augment the capabilities of our software. We create pathways for creative and motivated individuals to translate their ideas into working software that is reliable, easy-to-use, secure, and scalable.

We work closely with the Product team to support ongoing initiatives and cover development dependencies. We provide counsel by assisting product teams and other units within the organization and the movement to make good choices on technology by assessing costs, analyzing usability, anticipating failures, evaluating privacy, reviewing security, projecting impact, and suggesting alternatives.

We also work with the Product audiences, providing research before new products or features are built. We conduct, enable, and review research to validate and iterate on concepts, and to ensure that products are usable and built around users’ needs. We conduct research and explore services so that we can surface the best ways to end barriers to contribution. We use formal collaborations with industry and academia to scale the efforts of the organization.

Our new initiatives for 2017-18 include a renewed focus on the MediaWiki platform, the launch of the Wikimedia Cloud Services team, and the expansion of the machine learning capability of ORES to continue helping our editors create high-quality content faster and more easily. Also for this year, all of Technology's work is programmatic. Highlights of our programs include a concerted effort to reduce technical debt and to strengthen the technical community inside and outside the Foundation. A key initiative in our Research team will be an effort to increase content in multiple languages using recommendation technology to help our editors prioritize their work.

Program 1: Availability, performance, and maintenance

Team: TechOps, Cloud Services, Performance, Analytics, Release Engineering, Services

Strategic priorities: This applies strongly to all strategic priorities: Reach, Community, Knowledge. This is the baseline work needed so that all Wikimedia sites keep running reliably for editors and readers all over the world 24/7. In the absence of this work, no other program at the Foundation (or in the community) can be executed.

Time frame: Perpetual

Summary

The Wikimedia Foundation operates one of the world’s most popular web site properties, and it continues to expand with deployments of additional features and services as part of its programmatic work. These resources need to be maintained with high levels of availability, reliability, security, and performance.

Goal

We will maintain the availability of Wikimedia’s sites and services for our global audiences and ensure they’re running reliably, securely, and with high performance. We will do this while modernizing our infrastructure and improving current levels of service when it comes to testing, deployments, and maintenance of software and hardware.

Outcomes, Objectives, and Milestones

Outcome 1: All production sites and services maintain current levels of availability or better.

  • Objective 1: Deploy, update, configure, and maintain production services (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, and miscellaneous sites & services)
  • Objective 2: Assist in the architectural design of new services and making them operate at scale
  • Objective 3: Maintain data center infrastructure and equipment lifecycle from procurement through break-fix to decommissioning
  • Objective 4: Incident response, diagnosis, and follow-up on system outages or alerts across our stack

Outcome 2: All our users consistently experience systems that perform well.

  • Objective 1: Maintain a comprehensive toolset to measure the performance of our platforms
  • Objective 2: Catch and address performance regressions in a timely fashion through automation
  • Objective 3: Modernize our performance toolset. We will measure performance metrics that are closer to what users experience.
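One way to read "catch performance regressions through automation" is a scheduled check that compares recent measurements against a baseline. The sketch below is only an illustration under stated assumptions: the function name `is_regression`, the 10% tolerance, and the sample timings are all hypothetical and do not describe the Foundation's actual tooling.

```python
from statistics import median


def is_regression(baseline_ms, recent_ms, tolerance=0.10):
    """Return True when the recent median is more than `tolerance`
    (a fraction, e.g. 0.10 for 10%) above the baseline median.

    Hypothetical helper for illustration; medians are used so a single
    slow sample does not trigger an alert on its own.
    """
    if not baseline_ms or not recent_ms:
        raise ValueError("both sample lists must be non-empty")
    return median(recent_ms) > median(baseline_ms) * (1 + tolerance)


# Hypothetical page-load timings in milliseconds.
baseline = [410, 395, 402, 420, 398]
steady = [405, 412, 399, 408, 401]
slower = [480, 470, 495, 465, 488]

print(is_regression(baseline, steady))
print(is_regression(baseline, slower))
```

A real pipeline would likely draw samples from measurements closer to what users experience, as Objective 3 describes, rather than from a fixed list.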

Outcome 3: We have scalable, reliable and secure systems for data transport.

  • Objective 1: Consolidation of Kafka infrastructure to tier-1 requirements, including TLS encryption
  • Objective 2: Maintenance and expansion of current Hadoop cluster to support new use cases that require more computational resources
  • Objective 3: Software, hardware upgrades, and maintenance on analytics stack to maintain current level of service

Outcome 4: Wikimedia Cloud Services users can leverage a reliable and public Infrastructure as a Service (IaaS) product ecosystem for VPS hosting.

  • Objective 1: Maintain existing OpenStack infrastructure and services
  • Objective 2: Pay down technical debt and allow upgrading of the core OpenStack platform to modern, supported releases by replacing the current network topology layer with OpenStack Neutron, which has become the standard for most OpenStack deployments.
  • Objective 3: Increase availability of compute resources for the IaaS product by expanding deployment of physical resources beyond the current single broadcast domain

Outcome 5: We have effective and easy-to-use testing infrastructure and tooling for developers.

  • Objective 1: Maintain existing shared Continuous Integration infrastructure
  • Milestone 1: Develop and migrate to a JavaScript-based browser testing stack

Outcome 6: Engineering teams can effectively plan, track, and complete their work.

  • Milestone 1: Maintain and improve existing shared code-review platform (Gerrit)
  • Milestone 2: Maintain and improve existing shared project management platform (Phabricator)

Program 2: MediaWiki

Team: MediaWiki, TechOps

Strategic priorities: Knowledge, Reach & Community

Time frame: 24 months

Summary

This program represents some of the main activities of the new MediaWiki team.

Goal

We will strive for a refreshed, performant core platform by bringing renewed focus on MediaWiki.

Outcomes, Objectives, and Milestones

Outcome 1: Stakeholders in MediaWiki development will have a sense of progress and direction in MediaWiki.

  • Objective 1: Develop a MediaWiki roadmap
  • Milestone 1: Hire a product manager for MediaWiki by 2017-08-31

Outcome 2: MediaWiki code quality will be improved.

  • Objective 1: Increase measured unit test coverage
  • Objective 2: Break up large classes and source files

Outcome 3: MediaWiki security and stability will be improved.

  • Objective 1: Address the backlog of action items that arise from security and downtime post-mortems

Program 3: Addressing technical debt

Team: Release Engineering, Team Practices Group

Strategic priorities: Knowledge, Reach, Community

Time frame: This program is intended to create an ongoing process.

Summary

Over the last decade and a half the Wikimedia Foundation has accrued what is termed “technical debt”: historical choices and technical limitations that limit the velocity of development. The primary goal of this program is to develop and implement practices that help the entire organization to identify and properly prioritize the resolution of technical debt. This program will have a positive multiplying effect on the speed and quality of all other programs the Foundation implements.

Goal

Wikimedia developers are able to create and release new features that integrate cleanly with the rest of the technical stack in a reasonable amount of time.

Outcomes, objectives, and milestones

Outcome 1: The amount of orphaned code that is running Wikimedia “production” services is reduced.

  • Objective 1: Define a set of code stewardship levels (from high to low expectations)
  • Objective 2: Identify high-priority, high-use orphaned code segments and find stewards for them
  • Objective 3: Define and steward a light-weight process for adopting or orphaning/sunsetting products and infrastructure.

Outcome 2: Organizational technical debt is reduced.

  • Objective 1: Define a “Technical Debt Project Manager” role that regularly communicates with all Foundation engineering teams regarding their technical debt
  • Objective 2: Define and implement a process to regularly address technical debt across the Foundation
  • Objective 3: Promote and surface important technical debt topics at large gatherings of Wikimedia developers (e.g., DevSummit and Hackathon(s))

Program 4: Technical community building

Team: Cloud Services, Research and Data, Design Research, Scoring Platform (ORES), MediaWiki, Community Engagement, Resources

Strategic Priorities: Communities, Reach

Time Frame: 12 months

Summary

Wikimedia's software products and platforms have a diverse collection of technical communities including code contributors, documentation contributors, bug reporters, API consumers, volunteers who build innovative solutions to on-wiki workflow issues, researchers who examine the data generated by the Wikimedia projects, value-added vendors who provide services and support based on Wikimedia free and open-source software products, and true 'third parties' who install and use FLOSS software produced by the Wikimedia movement on their own computers for various reasons. These audiences contribute directly and indirectly to the broadest goal of the movement: to collect and disseminate knowledge. However, they have not always been well recognized for these contributions and supported in their work. The technical community support project will attempt to begin to address this shortcoming by providing better documentation, facilitating community building, and establishing better pathways for communication between these communities and the Foundation.

Goal

We will expand and strengthen our technical communities, focusing on understanding their needs and measuring the progress and outcome of our efforts. In particular, we will focus on three traditionally underserved communities: tool and bot developers; API and data consumers; and third-party users of our software.

Outcomes, Objectives, and Milestones

Outcome 1: Becoming a technical contributor to the Wikimedia movement by creating and maintaining 'tools' (bots, webservices, etc) and other innovative solutions is easier than it has been historically because documentation is easier to find, more comprehensive, and descriptive of start to finish steps needed to solve common problems. Cloud Services product users feel comfortable sharing their knowledge with others as part of a community with a culture of sharing via documentation and mutual support.

  • Objective 1: Collaborate with community to find volunteers willing to form a documentation Special Interest Group to update documentation of existing Cloud Services products
  • Objective 2: Create tutorial content for common issues including but not limited to: creating initial account, deploying a functional web service, deploying a functional bot, and running periodic jobs with variations. Where applicable, produce variants for more than one implementation language (e.g. PHP, Python, etc).
    • Milestone 1: Hire a technical writing contractor
    • Milestone 2: Users are able to find documentation they need. Agree/Disagree answer ratio for "Documentation is easy to find" annual developer survey question improves compared to prior surveys.
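The Agree/Disagree ratio in Milestone 2 is simple arithmetic, and a short sketch can make the comparison across survey years concrete. The function names and the response counts below are hypothetical and are not real survey results.

```python
def agree_disagree_ratio(agree, disagree):
    """Ratio of 'Agree' to 'Disagree' answers for one survey question."""
    if disagree == 0:
        raise ValueError("ratio is undefined when there are no 'Disagree' answers")
    return agree / disagree


def ratio_improved(current, previous):
    """True when the current (agree, disagree) pair has a higher ratio
    than the previous one."""
    return agree_disagree_ratio(*current) > agree_disagree_ratio(*previous)


# Hypothetical counts for "Documentation is easy to find".
previous_survey = (40, 60)   # 40 agree, 60 disagree
current_survey = (55, 45)    # 55 agree, 45 disagree

print(ratio_improved(current_survey, previous_survey))
```

Neutral or skipped answers are left out of this sketch. How they are treated would need to stay consistent between survey years for the comparison to be meaningful.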

Outcome 2: The adoption of Wikimedia technology can be reliably measured

  • Objective 1: Design a set of formal KPIs (key performance indicators) to measure the growth and diversity of our technology audience

Outcome 3: Value-added vendors who provide services and support based on Wikimedia software and true 'third parties' who install and use software produced by the Wikimedia movement on their own computers are more confident in recommending, deploying, and extending Wikimedia FLOSS projects.

  • Objective 1: Establish canonical point of contact for third-parties by promoting the existence of a dedicated technical liaison for software projects with support for third-party users
  • Objective 2: Clarify the Foundation’s short- and long-term commitments to third-party users. Create, publish, and promote a multi-tiered, third-party support level system for Wikimedia software projects. Document the support level of existing FLOSS projects and ensure that the documented levels of support are delivered.

Outcome 4: Collaboration with researchers in industry and academia is further scaled and supported, so that more findings and datasets are published and disseminated under an open license. This helps us solve strategically important questions.

  • Objective 1: Organize and host the annual Wiki Research Workshop to help align the interests of the academic community to issues of strategic importance for the movement. Continue to successfully run a research workshop at a major conference, as we have for the past 3 years.
  • Objective 2: Maintain the current capacity for formal research collaborations with industry and academia to reduce the overall cost for the organization to conduct research projects. As of March 2017, the Wikimedia Research department works with 30 collaborators under the terms of our Open Access policy.

Outcome 5: Organize Wikimedia Developer Summit as a three day meeting of ~50 senior technical contributors focusing on one strategic theme announced before the call for participation and scholarship requests start.

  • Objective 1: Developer Summit web page published four months before the event includes dates and location (at least nearest airport), main theme, call for participation, call for scholarship requests, and calendar with deadlines. A good representation of non-WMF stakeholders related to the main theme are invited and participate at the event (preferred) or online.
  • Objective 2: A process allows prospective participants to submit statements and proposals about the main theme, and allows the Program Committee to review them and communicate its decisions. Discussions start before the event with the involvement of all the relevant stakeholders, in order to identify the points that need to be addressed at the event.
  • Objective 3: Activities during the Summit are well documented, especially outcomes and actions, which will be compiled in a systematic way for better evaluation and followup.

Program 5. Scoring Platform (ORES)

Team: Scoring Platform (ORES), Research, Operations, Services

Strategic Priorities:

  • Knowledge: We’re working with communities to use machine prediction to increase the quality and coverage of content in our projects.
  • Reach: We’re working with Community Engagement to target support for emerging communities.
  • Communities: ORES supports the developer community directly, and editors indirectly, by helping bring ease and efficiency to burdensome curation processes.

Time Frame: 6-9 years. By then, we’ll have built out the missing components of the platforms and published best practices documents that will enable others to follow in our footsteps.

Summary

Artificial Intelligence (AI) has great potential to help our projects scale by reducing the work that our editors need to do and enhancing the value of our content to readers. However, AIs also have the potential to perpetuate biases and silence voices in novel and insidious ways. ORES is a high-capacity machine learning prediction service that is already heavily adopted within and outside the Wikimedia Foundation. By expanding the service to support new wiki processes and implementing auditing tools, we will help identify and mitigate the effects of prediction bias.

Goal

In the next fiscal year, we’ll create a dedicated team to further develop ORES and related technologies to balance efficiency and accuracy with transparency, ethics, and fairness.

Outcomes, Objectives, and Milestones

Outcome 1: Tool developers and Product teams can innovate tools that use machine prediction to make wiki-work more efficient.

  • Objective 1: Expand vandalism & good-faith detection models to more wikis (focus on Emerging Communities)
  • Objective 2: Work with CE and a community liaison to develop and implement better processes for supporting new wikis with a focus on emerging communities
  • Objective 3: Improve documentation around ORES service, related tools, and contribution processes
  • Objective 4: Work with a professional tech writer to increase coherence around AI system documentation

Outcome 2: Volunteers are empowered to track trends in prediction bias and other failures of AI in the wiki.

  • Objective 1: Develop best practices for using community input to improve/correct predictions
  • Objective 2: Use re-judgement from humans to experiment with (1) retraining models, (2) reporting changes in model fitness to users, and (3) learning how users come to understand prediction models. Publish relevant reports and process documentation.

Program 6. Streamlined service delivery

Team: Technical Operations, Release Engineering, Services

Strategic Priorities: Community

Time Frame: This program will take longer than 12 months. The objectives below represent the work expected to complete within FY17-18.

Summary

We will streamline and integrate the delivery of services, by building a new production platform for integrated development, testing, deployment and hosting of applications.

Goal

We will build a new production platform for integrated development, testing, deployment, and hosting of applications. This will greatly reduce the complexity, and increase the speed, of delivering a service and maintaining it throughout its lifecycle, with fewer dependencies between teams and greater automation and integration. The platform will offer more flexibility through support for automatic high-availability and scaling, abstraction from hardware, and a streamlined path from development through testing to deployment. Services will be isolated from each other for increased reliability and security.

Wikimedia developers, as well as third-party users, benefit from the ability to easily replicate the stack for development or their own use cases.

This work also represents an investment in the future; although this will not yet significantly materialize within FY17-18, this project will eventually result in significant cost savings on both capital expenditure (through consolidation of hardware capacity) and staff time (by streamlining development, testing, deployment and maintenance).

Outcomes, Objectives, and Milestones

Outcome 1: We have seamless productization and operation of (micro)services.

  • Objective 1: Set up production-ready Kubernetes cluster(s) with adequate capacity
  • Objective 2: Create a standardized application environment for running applications in Kubernetes

Outcome 2: Developers are able to develop and test their applications through a unified pipeline towards production deployment.

  • Objective 1: Create guidelines and abstractions for building and testing applications in containers
  • Objective 2: Set up a continuous integration and deployment pipeline to publish new versions of an application to production via testing and staging environments that reliably reproduce production
  • Objective 3: Provide a lightweight integrated development environment that lets developers test their code against a local miniature copy of the production stack

Program 7. Smart tools for better data

Team: Analytics, Cloud Services, Ops, Services and Research and Data

Strategic Priorities: Provide some of the tools and data needed to measure progress on strategic priorities. Also address specific data needs of our community, such as updating http://stats.wikimedia.org (the community’s main source of metrics for Wikimedia projects) and revamping infrastructure in the Cloud Services environment for better data access.

Time Frame: 12 months

Summary

Our data is not as discoverable and accessible as it should be for both the Foundation and our communities. This is most notable for data in the edit ecosystem. This program aims to make data of higher quality and to improve data access; the more accessible that data is, the more impact it can have. Most of the focus of this program is on infrastructure and tools for better public data access; however, we also include some improvements to private datasets.

Goal

Make Wikimedia data easily available for both the Foundation and the different Wiki communities by providing better tools, infrastructure, and access to data for editors, communities, and Foundation staff.

Outcomes and Objectives

Outcome 1: Foundation staff and community have better tools to access data.

  • Objective 1: Wikistats 2.0 redesign. Wikistats is the de facto source of statistics on the Wikimedia projects for the community; this objective includes developing a basic and an advanced frontend and an API-powered backend.
  • Objective 2: Better visual access to EventLogging data
  • Objective 3: Experiments with real-time data and community support for new datasets available
  • Objective 4: Invest in Jupyter Notebook setup for Hadoop, Data Lake, and other data sources

Outcome 2: Foundation staff and community have access to Wikimedia content and data with scalable APIs.

  • Objective 1: Develop a scalable and cost-effective storage solution backing an API exposing the full wiki edit history as structured data
  • Objective 2: Expand the REST API to cover high-volume content access needs

Outcome 3: Wikimedia Cloud Services users have easy access to public data.

  • Objective 1: Provide reliable and available access to Wikimedia database dumps by upgrading the hardware used and consolidating access by internal teams, Cloud Services users, external mirrors, and HTTPS downloaders to the new canonical location.
  • Objective 2: Complete migration of production database replica access for Cloud Services customers to the new high-availability cluster, which uses 'row based' replication technology to provide a more consistent view of production data.
  • Objective 3: Collaborate with Wikimedia Cloud Services customers to publish new applicable data sets
  • Objective 4: Provision a cluster for public Data Lake access in labs that can be used as a Quarry backend. In this iteration the Data Lake will include historical data about editing (revisions, pages, users) for all Wikimedia projects since the beginning. Data is optimized to be queried in an analytics-friendly way that allows for simple and fast queries.
  • Objective 5: Deploy visual exploration tool for Data Lake for labs community

Outcome 4: Users see improvements on data computing and data quality.

  • Objective 1: Vetting and release of new metrics that measure content consumption
  • Objective 2: More efficient bot filtering on pageview data
  • Objective 3: Build a prototype for MediaWiki content processing. For example: ingest and process text on every Wikipedia page to use later for analytics-style computations.
  • Objective 4: (carry over from last year) Experiment with real-time processing of pageview data to avoid costly batching computations.

Outcome 5: Foundation staff and Wikimedia communities have an objective measure to talk about impact of Foundation products and projects.

  • Objective 1: Pilot study on 1 wiki to measure the community backlog of work
  • Objective 2: Implement system to measure community backlog in wikis that wish to have it

Program 8. Multi-datacenter support

Team: TechOps, MediaWiki, Services, Performance

Strategic Priorities: Reach, Communities

Time Frame: 12 months

Summary

Although Wikimedia currently operates two data centers each independently capable of serving our core sites and services, many of our services – including our most important core platform component (MediaWiki) – are only active in a single data center at any point in time, with the other data center being on standby. Switching between the two data centers is currently a very involved manual process with significant impact to the availability of our services for our users and substantial risk of failure. By extending existing services (and MediaWiki in particular) with support for serving requests from multiple data centers concurrently, this impact can be minimized and currently unused performance benefits can be leveraged.

Goal

We will improve availability and performance for our users, while also minimizing the impact from fail-over testing and catastrophes. We will do this by expanding our multi-data center capabilities to serve requests from multiple data centers simultaneously.

Outcomes and Objectives

Outcome 1: Our audiences enjoy improved MediaWiki and REST API availability and reduced wiki read-only impact from data center fail-overs.

  • Objective 1: MediaWiki support for routing read-only requests (GET/HEAD) to other data centers
  • Objective 2: Test an active/active deployment for read-only requests of the MediaWiki application platform and REST APIs
  • Objective 3: Integrate MediaWiki with dynamic configuration or service discovery, in order to reduce the time required for a master switch from one data center to another
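The routing idea behind these objectives can be sketched in a few lines: read-only HTTP methods may be served by whichever data center is closest to the user, while everything else still goes to the single data center that accepts writes. This is a minimal sketch, not MediaWiki's or the traffic layer's actual logic; `choose_datacenter` and the names "dc-a" and "dc-b" are hypothetical.

```python
# HTTP methods treated as read-only in this sketch, per the objective above.
READ_ONLY_METHODS = frozenset({"GET", "HEAD"})


def choose_datacenter(method, local_dc, primary_dc):
    """Pick the data center that should handle a request.

    Read-only requests (GET/HEAD) stay in the data center nearest the
    user; all other methods are sent to the primary, which is the only
    one assumed to accept writes.
    """
    if method.upper() in READ_ONLY_METHODS:
        return local_dc
    return primary_dc


# "dc-a" is the hypothetical primary; the user is closest to "dc-b".
print(choose_datacenter("GET", local_dc="dc-b", primary_dc="dc-a"))
print(choose_datacenter("POST", local_dc="dc-b", primary_dc="dc-a"))
```

In this model a data center switch-over amounts to changing which name is passed as `primary_dc`, which is one reason dynamic configuration or service discovery (Objective 3) could shorten a master switch.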

Outcome 2: Backend infrastructure works reliably across data centers.

  • Objective 1: Set up a robust multi-data center event & job processing infrastructure, and migrate all job queue use cases
  • Objective 2: Full support for serving REST API requests from both core data centers simultaneously

Program 9. Growing Wikipedia across languages via recommendations

Team: Research, Editing, Reading, Services, Security

Strategic Priorities: Knowledge, Reach, Community

Time Frame: 12 months. Some initiatives are a continuation of work started in FY 2016-2017 and may continue beyond the next fiscal year.

Summary

There are significant gaps of knowledge in Wikipedia today, both in terms of the articles available in different languages as well as the depth of content available in existing articles. Recommendation systems that can help editors identify prioritized missing content across Wikipedia editions and contribute towards closing the gaps are key for accelerating the article creation rate.

Goal

Use machine learning to build recommendation algorithms that can help editors identify what to edit, in order to close the content gaps on Wikipedia and other Wikimedia projects.

Outcomes and Objectives

Outcome 1: Interested editors will be able to use recommendation services that give them relevant information about the articles they want to edit, immediately at their disposal. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors.

  • Objective 1: Build, improve, and expand algorithms that can provide more detailed recommendations to editors about how an article could be expanded. This step will require running natural or controlled experiments and will involve recommendations at different levels of granularity (from section recommendations, to reference and image recommendations all the way to potentially providing guidance on how to expand, for example, sections, by offering statistics about typical section features).
  • Objective 2: Develop and gather design requirements for how the algorithms’ results should be exposed to the editors. This objective requires the continuation of the work with the community of editors and editathon organizers started in FY16-17.
  • Objective 3: Evaluate the usefulness of article expansion recommendations for target users in typical usage scenarios
  • Objective 4: Build Labs API(s) that can be used by researchers and developers to use and surface the recommendations in other products and research initiatives. (Note that building the productionized API(s), when relevant, will be done in collaboration with Product teams and is not captured in this objective.)

Outcome 2: Editors can benefit from improved recommendations exposed via recommendation API, in Content Translation, and the Editor Dashboard tool.

  • Objective 1: Continue experimenting (and implementing when applicable) algorithmic improvements on article recommendation for creation (the service behind GapFinder and Suggestions feature in Content Translation tool)
  • Objective 2: Develop (personalized) recommendations for the editor dashboard tool, in collaboration with the Editing team. This objective may involve exploring new types of recommendations.

Program 10. Public cloud services & support

Team: Cloud Services, Community Engagement

Strategic Priorities: Knowledge, Communities

Time Frame: Perpetual

Summary

The 'services' in the Wikimedia Cloud Services team name encompasses a collection of products that build upon the utility of the core infrastructure as a service (IaaS) product to present a well rounded and useful platform for volunteers. This helps solve the technical problems of the Wikimedia movement.

Goal

Empower volunteers to create technical solutions to the problems of on-wiki communities with a minimal investment of time and low friction for transferring maintainership from one individual to another.

Outcomes, Objectives, and Milestones

Outcome 1: Members of the Wikimedia movement are able to develop and deploy technical solutions with a reasonable investment of time and resources on the Wikimedia Cloud Services Platform as a Service (PaaS) product.

  • Objective 1: Maintain existing Grid Engine and Kubernetes web services infrastructure and ecosystems.
  • Objective 2: Migrate Tool Labs account workflows from Wikitech to Striker where they are easier to integrate with the new user onboarding workflow and easier to maintain
    • Milestone 1: Maintain high overall customer satisfaction for the Tool Labs product as measured by the annual developer survey

Outcome 2: The 'Labs, labs, labs' branding confusion is eliminated. Branding is separated, so that the following are no longer all referred to as just ‘Labs’: the infrastructure as a service product, the platform as a service product, the team that manages those products, and the community that uses them to produce technical solutions.

  • Objective 1: Complete initial outlined rebranding activities and announcements by 2017-12-31

Outcome 3: Wikimedia community members, Foundation staff, and potential contributors are aware of the breadth of products and services offered by the Cloud Services team.

  • Objective 1: Promote available services and products at relevant conferences, hackathons, and within the Wikimedia communities

Outcome 4: Support requests from Cloud Services users are addressed in a best-effort manner without interrupting core operational and development work by the Cloud Services team towards other program goals.

  • Objective 1: Provide first line technical support resources to triage and respond to Cloud Services managed product support requests
    • Milestone 1: Hire first line tech support contractor

Program 11. Improving citations across Wikimedia projects

Team: Research

Strategic Priorities: Knowledge

Time frame: 12 months (FY 2017-2018). Some initiatives are a continuation of work started in FY 2016-2017 and may continue beyond the next fiscal year.

Summary

Wikimedia projects rely on verifiability as one of their core policies. There has been growing interest in building a stronger technological foundation for how sources are represented, stored and reused by contributors across Wikimedia projects. Sourcing of statements is a high priority in projects like Wikidata, and a range of technical and programmatic initiatives (such as Citoid, the Wikipedia Library, and OABot) have been designed to facilitate the creation of references. Despite over 10 years of community-driven efforts to design better ways to support citation-related work in Wikipedia, it is only with the advent of Wikidata that these efforts have started to coalesce. The present program aims to develop a deeper understanding of how Wikimedia contributors use sources and to lay the foundation for better technological support around sources and citations.

Goal

In the next fiscal year, we will conduct research aiming to: improve the user experience of editors and readers around sources and citations; quantify citation coverage across Wikimedia content, identifying gaps and areas of low citation quality; and help contributors identify topic areas of Wikimedia projects in greater need of sourcing work so that citation quality gaps are addressed. We will leverage our network to establish new formal collaborations and answer research questions related to the coverage, quality and accessibility of citations across Wikimedia projects. We will also continue to lead the WikiCite series, which started in 2016, in order to help align community and technical efforts related to citation data and infrastructure.

Outcomes and Objectives

Outcome 1: Quantitative research is available to help Wikipedia and Wikidata contributors focus and prioritize their sourcing efforts.

  • Objective 1: Estimate what proportion of content in Wikipedia or Wikidata is unsourced and in need of citations. Estimate what proportion of existing sources cited across Wikimedia projects are accessible by the general public.
  • Objective 2: Collect and analyze clickthrough data for footnotes and external links to understand how readers interact with them (after discussing and reviewing privacy and security implications)

Outcome 2: Readers’ and contributors’ needs around citations are better understood.

  • Objective 1: Learn about readers’ and contributors’ interactions, experiences and needs in referencing and evaluating sources through literature review, surveys, and interviews

Outcome 3: Outreach activities continue to ensure that community and technical efforts to improve the structure and quality of citations are aligned.

  • Objective 1: Fundraise for and host the third annual meeting in the WikiCite series (previous events in 2016 and 2017 were entirely funded via restricted grants)

Program 12: Grow contributor diversity

Team Participants: Research in conjunction with Community Resources

Strategic Priorities: Knowledge, Community https://meta.wikimedia.org/wiki/Research:Voice_and_exit_in_a_voluntary_work_environment

Time Frame: 12 months

Summary

Only 10-15% of Wikipedia editors are known to be female. The lack of gender diversity has long been acknowledged by the Wikimedia community. We aim to focus on specific drivers of the lack of gender diversity identified in the academic literature, design frameworks that can change such drivers, and measure the impact of such changes on contributor diversity in Wikipedia. (More details about the program are documented on Meta.)

Goal

Design and test socio-technical solutions that can help increase contributor diversity in Wikipedia.

Outcome and Objectives

  • Outcome 1: We improve Wikipedia’s contributor diversity after designing and testing potential intervention(s).
    • Objective 1: Identify the potential underlying causes of the lack of representative contribution from certain demographics
    • Objective 2: Design frameworks to change the current socio-technical infrastructure to address at least one of the underlying causes of the lack of representativeness (“lack of confidence” is considered one such underlying cause). This step, which has already started, will take place in collaboration with the community of editors already experienced in this area.
    • Objective 3: Run experiment(s) to assess whether the recommended design has the desired outcome