Wikimedia Foundation Annual Plan/2023-2024/Draft/Product & Technology/OKRs

This is part of the Wikimedia Foundation's Product and Technology departments' drafting process for the 2023-24 Annual Plan.

This document represents "part 2" of the 2023-24 Annual Planning process for the Wikimedia Foundation's Product and Technology departments. It focuses on the departments' draft "objectives and key results" (OKRs) for the 2023-24 Annual Plan. "Part 1" was an explanation of the draft work portfolios (nominally called "buckets") and the theory and planning behind this document.

Throughout April and May, you are invited to leave comments or questions on the talk page, under the subheading associated with each draft Key Result. Some of the individual draft Key Results will also have associated "focus group" meetings, where the relevant Wikimedia Foundation staff team would like to discuss the topic in detail with a small group of people who care about it in particular.

Also in April, the draft plan for the whole Wikimedia Foundation will be published. There will be several synchronous and asynchronous ways to provide feedback on that plan.

Objectives v2 (v1) Key Result Explanation

WE1: Contributor experience

Support the growth of high quality and relevant content within the world’s most linguistically diverse, trusted and comprehensive free knowledge ecosystem by enabling and supporting high quality and accessible experiences.

Context: In order to focus on one thing we need to trade off another. We want to focus on supporting content and content moderators, mobile contributions, supporting online campaigns, and reducing IP blocks. In order to focus on those things we have to deprioritize in-person event support and new editor productivity (with the exception of the IP block KR).


1. Increase unreverted mobile contributions in the main article namespace on Wikipedias by 10%, averaged across a representative set of wikis. This KR provides broad encouragement to promote mobile content editing, both through activities that support the other KRs (e.g. moderation and content coverage) and through activities primarily geared toward mobile contribution. Over the last three years, mobile web content contribution has increased by about 20%. If we can come up with ways to increase it by 10% more in just one year, that will be an acceleration over its natural rate of increase.

This KR is inclusive of both the web and apps. It is "averaged across a representative set of wikis" to ensure that we make improvements that are valuable for multiple wikis, and not just our largest ones. We will choose which wikis later on.

2. Complete improvements to four workflows that improve the experience of editors with extended rights (admins, patrollers, functionaries, and moderators of all kinds), extend their creativity, impact at least four different wikis, and meet KRs for each improvement set collaboratively with the volunteers. We don't yet know whether a specific goal such as reducing backlogs is in fact what editors doing moderation work need from us. Ultimately, we want to use our resources to increase their satisfaction and their ability to build and manage workflows. That is what "extend their creativity" means: these community members have built amazing things, and some of the best ways we can help are to enable that creativity through platforms, endpoints, templates, and other tools. The number "four workflows" reflects how many teams we imagine working on this KR, and "four different wikis" is meant to encourage us to generalize our impact across projects where possible. This KR will require us to work with the affected volunteers to set actual KRs for each improvement, so that we can agree with them on when we've had impact. Note that work on workflows used by people other than moderators can still improve moderators' experience. For instance, work that helps newcomers create stronger first articles may ease the burden on those who patrol new articles.
3. X% increase in the share of articles in high-impact topics in mid-size Wikipedias (of articles meeting shared quality standards) -- high-impact topics to be chosen as a collaboration between departments, potentially starting with gender and geography. This KR aligns with the Foundation-wide content metric and touches on both quantity and quality, which means we could affect it either by generating new articles or by improving existing ones. "High-impact topics" is a concept from Movement Strategy Recommendation #8: "Identify Topics for Impact". Gender and geography are both topic areas that our movement has highlighted as having important content gaps and that our Research team is equipped to measure.

Example of how this might sound (if successful): "It used to be that only 10% of XX Wikipedia's articles that meet YY quality criteria were about art, music, or film. But now it's 15%."

4. X% increase in the share of IP blocks that get appealed, with static or decreasing share of appeals that get unblocked. IP blocks are our movement's main tool for stopping abusers of our sites, but they have the unfortunate effect of blocking many users who are acting in good faith. This has a particularly negative impact on new editors and on community programs. There is no reliable way to measure how many people are blocked erroneously, but we can approximate it through how many of them appeal being blocked (i.e. request an exemption). A barrier to doing this, though, is that our appeal process is difficult for users to find and complete. Therefore, this KR attempts to guide us toward improving the IP block situation on two fronts. First, it calls on us to make the appeal process clear for users, such that we would expect to see more blocked people appealing. Second, it calls on us to reduce how many erroneous blocks happen in the first place, which we track through the share of appeals that result in an unblock. In other words, if we are able to block only exactly the right users, then we'll see very few of them getting unblocked. This KR may precipitate deep community and technical discussions about the nature of IP addresses and how we use them, and about the workloads and workflows of the functionaries who manage these processes. As we work with community members, we may discover that there are better ways to measure progress on the IP block issues, and we can refocus on other metrics.
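
As a rough illustration of how KRs 1 and 4 above could be computed, here is a minimal Python sketch. The function names, the wiki labels, and every number are assumptions made up for illustration; the plan does not yet name the set of wikis, the data sources, or the value of X.

```python
from __future__ import annotations

from statistics import mean


def relative_change(baseline: float, current: float) -> float:
    """Relative change from baseline to current, e.g. 0.10 for +10%."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


def average_change_across_wikis(baseline: dict[str, int], current: dict[str, int]) -> float:
    """KR 1 sketch: average the per-wiki change so the largest wikis do not dominate."""
    return mean(relative_change(baseline[wiki], current[wiki]) for wiki in baseline)


def appeal_shares(blocks: int, appeals: int, unblocked: int) -> tuple[float, float]:
    """KR 4 sketch: (share of blocks that are appealed, share of appeals that are unblocked)."""
    if blocks <= 0 or appeals <= 0:
        raise ValueError("blocks and appeals must be positive")
    return appeals / blocks, unblocked / appeals


# Hypothetical unreverted mobile edits per wiki (not real data).
baseline_edits = {"wiki_a": 1000, "wiki_b": 200, "wiki_c": 50}
current_edits = {"wiki_a": 1050, "wiki_b": 230, "wiki_c": 55}
kr1 = average_change_across_wikis(baseline_edits, current_edits)
print(f"KR 1 average change across wikis: {kr1:.1%}")

# Hypothetical IP block counts for two periods (not real data).
before = appeal_shares(blocks=10_000, appeals=200, unblocked=100)
after = appeal_shares(blocks=10_000, appeals=300, unblocked=120)
# The KR asks for more appeals per block, without a higher unblock share.
kr4_direction_met = after[0] > before[0] and after[1] <= before[1]
print(f"KR 4 direction met: {kr4_direction_met}")
```

With these invented numbers, averaging the per-wiki changes gives 10%, whereas pooling all edits together would give only 6.8% because the largest wiki grew the least. That difference is the reason the KR is phrased as an average across wikis.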

WE2: Reading and media experience

Produce a modern, relevant and accessible reading and media experience for our projects.

Context: We want to focus on increasing unique devices, increasing internal discovery, and non-editing engagement. In order to do that, we have to deprioritize engagement with images and audio and inbound issues with accessibility. The KRs below reflect that focus as well.


1. Ensure a quality reading experience for all users by adapting the default experience for 15% of pageviews, based on the individual needs and constraints of the user. This KR is focused on allowing our interface to adapt to individual needs when necessary. The theory here is that people will feel more engaged with a website and interface that can adapt to their needs. This can include work such as dark mode, text and page density, and font size customizations. Some of this adaptation can be done automatically by the interface, for example by creating responsive versions of a feature or tool, or by ensuring that dark mode turns on based on the user's browser or device settings. In other cases, this adaptation can be done through intentional customization, allowing users to select non-default states in specific (but limited) cases. From an accessibility perspective, it will focus on the features that need to be built as standalone to allow for more accessibility, or to allow for setting defaults that are more accessibility friendly, while leaving the opportunity for customization to users who have different preferences. To set the specific number “15%”, we looked at how users adapt the default experience in the Wikipedia iOS app. 59% of users of the app are using a non-default theme (dark, black, or sepia). We used this number as a baseline, but factored in our assumption that habitual users of Wikipedia on the web are more likely than sporadic users to take the time to adapt their reading experience.
2. Interested readers will discover and browse more content, measured via a 10% increase in internally referred page interactions in representative wikis. This KR is focused on making it easier for interested readers to discover content by exploring different content discovery methods or entry points. The goal is to provide readers with these options at specific moments of their journey or after specific actions which indicate that they’re interested in learning more. "Page interactions", in this context, is inclusive of all the ways that a user can interact with content beyond just looking at a page (page previews are an example). "Internally referred" means that we'll only be counting those page interactions that happen after a user has already started a session on our property (i.e. excluding the first time they land on the site, which usually happens through a search engine referral).
3. Deepen reader engagement with Wikipedia via 0.05% of unique devices engaging in non-editing participation. This KR focuses on deepening reader engagement, while also exploring ways in which readers can contribute to our projects other than editing pages. We hypothesize that there are people who are interested in getting involved with the wikis but for whom editing of any kind is too big of a leap. We want those people to have a way to get more deeply involved, perhaps becoming more committed readers, or eventually becoming comfortable enough to edit. "Non-editing participation" refers to any actions users can take on the wikis besides editing (we are also counting edits to discussions as 'editing'). While our websites don't have any of this, our apps do, in the form of reading lists or sharing content to social media. This work could include letting users configure their own personal reading experience, or could also focus on sharing content across the wiki, curating, and suggesting content to others. The KR is inclusive of work on the mobile and desktop websites and the apps. For mobile and desktop it may include the adoption of some non-editing participation functionality that exists on the apps. For the apps, it may include improving on existing functionality or building out new ideas. The number 0.05% is approximately the ratio of editors to unique devices, so in the first year of this feature set we might see a similar ratio for non-editing participants, which would eventually grow to exceed the number of editors.
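
The arithmetic behind KRs 2 and 3 above can be sketched in Python as follows. This is only an illustration under stated assumptions: the event format, the "referrer" field and its values, the helper names, and all counts are hypothetical, and the plan does not specify how page interactions or participating devices will actually be counted.

```python
from __future__ import annotations

TARGET_INTERACTION_GROWTH = 0.10     # KR 2: 10% increase
TARGET_PARTICIPATION_SHARE = 0.0005  # KR 3: 0.05% of unique devices


def internally_referred(interactions: list[dict]) -> int:
    """Count interactions that happened after the session had already started on-site.

    The "referrer" field and its "internal" value are hypothetical.
    """
    return sum(1 for event in interactions if event["referrer"] == "internal")


def relative_growth(baseline: int, current: int) -> float:
    """Relative growth from baseline to current, e.g. 0.10 for +10%."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


def participation_share(participating_devices: int, unique_devices: int) -> float:
    """Share of unique devices that took at least one non-editing action."""
    if unique_devices <= 0:
        raise ValueError("unique_devices must be positive")
    return participating_devices / unique_devices


# Hypothetical event samples (not real data).
baseline_events = [{"referrer": "internal"}] * 200 + [{"referrer": "search"}] * 800
current_events = [{"referrer": "internal"}] * 230 + [{"referrer": "search"}] * 790
growth = relative_growth(internally_referred(baseline_events),
                         internally_referred(current_events))
kr2_met = growth >= TARGET_INTERACTION_GROWTH
print(f"KR 2 growth in internally referred interactions: {growth:.1%}, met: {kr2_met}")

# Hypothetical device counts (not real data).
share = participation_share(participating_devices=6_000, unique_devices=10_000_000)
kr3_met = share >= TARGET_PARTICIPATION_SHARE
print(f"KR 3 participation share: {share:.3%}, met: {kr3_met}")
```

Note that search-referred landings are excluded before growth is measured, so a rise in search traffic alone would not move the KR 2 number in this sketch.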

WE3: Knowledge Platform

Increase collaboration and efficiency among software developers by improving the development process for MediaWiki.


1. Reduce fragmentation in developer workflows, measured by X% increase in adoption of officially supported developer tools. The goal of this key result is to provide standard development tools that meet the needs of most Wikimedia developers. We also aim to be able to replicate production-like environments for a wider range of components at the development, testing and deployment stages. By accomplishing this, we will provide a better developer experience. This experience will allow engineers to onboard more quickly, assist each other when running into difficulties and deploy new features to production with greater confidence. This work is not intended to serve all developer workflows in the first year, but to make improvements in the areas that most impact developer productivity.
2. Increase by 20% the number of authors that have committed more than 5 patches across a specific set of MediaWiki repositories that are deployed to production. Increasing the number of people willing and able to contribute to the MediaWiki code base will make it less likely that a team gets blocked when changes to MediaWiki core are needed. It also makes it less likely that workarounds are created that add technical debt. In addition, this metric shows that the code base is becoming easier and safer to contribute to without unexpected effects.
3. Resolve and document 4 major points of technical strategic direction/policy/process. Product and Technology leadership has identified key areas where strategic direction is needed to increase the impact of technical work. Examples include defining an approach to support for MediaWiki outside Wikimedia and creating a policy for open-source software. Defining a strategic direction for these topics will mean increased efficiency and more cohesion in Wikimedia’s technical direction.
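
One way KR 2 above could be measured is sketched below in Python. The repository names, author names, and patch counts are invented, and the sketch assumes that "more than 5 patches" is counted in total across the tracked repositories within one measurement period; the plan does not spell out either detail.

```python
from __future__ import annotations

from collections import Counter

MIN_PATCHES = 5       # authors must have MORE than this many patches
TARGET_GROWTH = 0.20  # 20% increase in qualifying authors


def qualifying_authors(patches: list[tuple[str, str]], tracked_repos: set[str]) -> set[str]:
    """Return authors with more than MIN_PATCHES patches in the tracked repositories.

    Each patch is an (author, repository) pair.
    """
    counts = Counter(author for author, repo in patches if repo in tracked_repos)
    return {author for author, n in counts.items() if n > MIN_PATCHES}


def make_patches(counts: dict[str, int], repo: str) -> list[tuple[str, str]]:
    """Helper for building hypothetical sample data."""
    return [(author, repo) for author, n in counts.items() for _ in range(n)]


tracked = {"repo_a", "repo_b"}  # hypothetical repository names

# Hypothetical baseline period: a6 has exactly 5 patches and does not qualify.
baseline = make_patches({"a1": 6, "a2": 7, "a3": 10, "a4": 6, "a5": 8, "a6": 5}, "repo_a")

# Hypothetical current period: a6 now has 6 patches; a7 only works in an untracked repository.
current = make_patches({"a1": 6, "a2": 7, "a3": 10, "a4": 6, "a5": 8, "a6": 6}, "repo_a")
current += make_patches({"a7": 9}, "repo_untracked")

before = len(qualifying_authors(baseline, tracked))
after = len(qualifying_authors(current, tracked))
growth = (after - before) / before
kr_met = growth >= TARGET_GROWTH
print(f"Qualifying authors: {before} -> {after} ({growth:.0%}), KR met: {kr_met}")
```

The threshold is strict (`n > MIN_PATCHES`), matching the wording "more than 5 patches", and patches to repositories outside the tracked set are ignored.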
Objectives v2 (v1) Key Results Explanation

SDS1: Defining essential metrics

Each metric and dimension in our essential metric data set is scientifically or empirically supported, standardized, productionized, and shared across the Foundation.

Context: Effective use of metrics to make strategic decisions at the Foundation requires us to measure and assess the impact of work using a common, reliable, and well-understood set of metrics. Ensuring that different teams working on different projects are using the same metrics with the same definitions to understand the impact of their work will allow us to align efforts across the Foundation, with affiliates, and with communities. These metrics will allow Foundation staff and communities to evaluate proposals for programs and product features, and to monitor and evaluate results. And they enable the engineers who support the tools used in data preparation and analysis to deliver a higher standard of service by more precisely defining the scope of their work, making the effort more tractable with our current resourcing. Data is only as useful as it is accessible to users. Our metrics must have maximum accessibility for us to maximize their utility to all audiences. We will gather, organize, and make available the necessary information to guide appropriate use and prevent misuse.


1. For three out of the four core metric areas, provide at least 1 scientifically or empirically supported metric which has a clear definition, calculation, data provenance, versioning, and designated data steward. If we are to use metrics to make strategic decisions, we must have a broadly shared understanding of how they are defined, how we measure them, when they have changed, and who is accountable for guiding the definition and maintenance of each metric. This will allow us to know that when teams work on moving a metric, they are working toward the same goal.
2. For three out of the four core metric areas [content, contributors, relevance, sustainability], at least 1 dataset is fully and publicly documented with clear guidance on how to use it to guide strategic decisions. In order for staff and volunteers to be able to understand and use our metrics, we must share public documentation. Without this, the metrics will have limited utility.

SDS2: Making empirical decisions

Wikimedia staff and leadership make data-driven decisions by using essential metrics to evaluate program progress and assess impact.

Context: By using essential metrics to evaluate program progress and assess impact, we can ensure that we are making informed decisions that are backed by evidence. This allows us to stay focused on our most important goals, make adjustments as needed, and track our progress over time.


1. 100% of our defined and produced essential metrics data is consistently described in a data catalog, including provenance and means of production. A data catalog is an essential piece of data management infrastructure that stores metadata in a consistent, searchable, and discoverable way. It supports proper use of data in other tools, and provides a baseline for compliance, for example with the privacy policy. Describing our metrics in a data catalog unlocks other capabilities down the road.
2. Four cross-department Wikimedia initiatives adopt a core metric as a measure of progress or impact. If we want to make empirical decisions as an organization, then we must use these essential metrics for more than on-platform product decisions. Having them used by other departments in a coordinated way will show us that adoption is real.
3. For three out of the four core metric areas, publish data reports that display measurements and trends based on core metrics, made available to the public. A data report provides a summary view of a metric area, and for many consumers of metric information, this will be the starting point.
4. Establish and implement a process to ensure that our essential metrics continually evolve to support data-informed decision making. The world in which we operate and the technological environment around us are continually changing. Consider the range of changes in consumer electronics and information consumption in the last ten years, or the changes in what data may or may not be available to us. Our core metrics together represent a theory about how we make impact in the world. To ensure we continue to make impact, we must re-evaluate whether our theory remains true and whether we can continue to measure in the way we have, and make adjustments as needed.

SDS3: Using and distributing data

Users can reliably access and query Wikimedia content at scale.

Context: Search and discovery experiences are critical to how users experience our content. We must be able to deliver those experiences in a reliable, sustainable, scalable fashion to meet the needs of free knowledge distribution and discovery.


1. Reduce the number of unsatisfied requests for Wikidata by 50%. Right now, the infrastructure powering the Wikidata Query Service (WDQS) has well-documented technical issues. These can cause user queries to the endpoint to fail in multiple ways, sometimes taking Blazegraph down. While the team has worked on ways to manage the data in Blazegraph (namely, reducing the graph size) to handle the situation, that is likely not a long-term solution. In the long run, we need to find other ways to satisfy user needs for querying and retrieving Wikidata.
2. Identify and implement a way to measure editor and reader satisfaction with search, evaluate satisfaction, and use the evaluation to inform at least 1 product decision. One of the challenges in understanding what is worth improving about our on-wiki search experience is that we lack a baseline measure of search user satisfaction, with editors and readers being two important categories of search users. We need to establish this baseline before seeking to improve it. Using it to inform a product decision closes the loop on an empirical path forward for search improvement.
3. For each of the four core metric areas, at least one dataset is systematically logged and monitored, and staff receive alerts for data quality incidents as defined in data steward-informed SLOs. Logging and monitoring are standard operational practices to ensure reliability and quality of a service. In this case, we must implement these same practices for the core metrics. This will allow us to sustain data quality standards by understanding what qualifies as a deviation from the standards and knowing when there has been a deviation, so we can address it.
4. 100% of productionized non-privacy-sensitive essential metric datasets are publicly available. As is our practice with data distribution in general, we intend to make our essential metric data freely available, as much as our privacy policy and practices can allow. This is how we ensure access to affiliates, chapters, user groups, volunteers, and other interested parties in the public.
Objectives v2 (v1) Key Results Explanation Research

FA1: Describe multiple potential strategies

Through which Wikimedia could satisfy our goal of being the essential infrastructure of the ecosystem of free knowledge


1. Participants in Future Audiences work are equipped with at least three candidate strategies for how Wikimedia projects (especially Wikipedia and Wikimedia Commons) will remain the “essential infrastructure of free knowledge” in the future, including the audiences they would reach, the hypotheses they test, and approaches for testing them. Before the Future Audiences bucket digs in to investigate possible future work, we want to lay out the different strategies that we'll be investigating, and think through the questions that need to be answered to detect their viability.

Commons community members have explicitly asked us to think about the strategy for the future of Commons -- this KR ensures that we do, but that it also fits in with the larger product strategy thinking of the bucket.

The Wikimedia External Trends 2023 overview highlighted a number of changes to technology and user behavior in search and content creation that pose potential risks to our movement's sustainability. This track of work will be aimed at diving deeper into how our projects and communities can continue to thrive in the face of different potential future challenges.

Contact: User:MPinchuk (WMF)

FA2: Test hypotheses

To validate or invalidate potential strategies for the future, starting with a focus on third party content platforms


1. Test a hypothesis aimed at reaching global youth audiences where they are on leading third party content platforms, to increase their awareness of and engagement with Wikimedia projects as consumers and as contributors. One of the strategic directions we're sure we want to investigate is the spread of free knowledge on other platforms, like YouTube, Instagram, etc. A tremendous amount of knowledge is consumed in these places for free, and we don't yet do anything to facilitate that, nor do we yet have theories on how to gain participants and revenue from those places.
  • The 2022 Brand Health Survey looked at how Wikipedia is seen by different age groups. It noted especially low scores among 18-24 year olds in some markets (US, Germany, South Africa), who gave Wikipedia a negative Net Promoter Score. Per the survey: "This poses a high risk for the future of the project and the movement as a whole."
  • The New York Times reported on evidence that global youth are increasingly spending time on social apps and less time using traditional search engines (which typically bring the bulk of new audiences to our projects).

Contact: User:MPinchuk (WMF)

2. Test a hypothesis around conversational AI knowledge seeking, to explore how people can discover and engage with content from Wikimedia projects. Another strategic direction we're sure we want to investigate is conversational AI, a technology that looks like it will be transformative in the free knowledge ecosystem. Not all work using large language models and chatbots would fall under this KR, only work that investigates conversational AI as a way to bring free knowledge to audiences that otherwise would not experience Wikimedia content.
  • Reuters reported that as of February 2023, 2 months after launching, ChatGPT had 100 million active users, indicating its large appeal and fast growth.
  • GPT-4 and other LLMs are now being used to power many new tools including search and content creation online. Many in our movement are interested in and concerned about how our work and projects can continue to thrive in a world of increasingly sophisticated AI tools.

Contact: User:MPinchuk (WMF)