Wikimedia Enterprise/Essay

From Meta, a Wikimedia project coordination wiki
This essay about the Wikimedia Enterprise API was written by its team and represents its views and those of the Wikimedia Foundation. Published in March 2021.


Libre and Gratis are the two meanings of “free,” commonly phrased as free as in speech, or free as in beer.

Wikimedia projects are, have always been, and will always remain libre. The principles of free cultural works mean that anyone can use Wikimedia without restriction, including commercially. As a movement, we embrace this. It is why we reject ‘non-commercial’ licenses, as they would limit the kinds of reuse possible. And it is why we consider commercial reuse an important means of distributing knowledge to audiences.

Equally, Wikimedia projects are, have always been, and will always remain gratis. The ability to freely access the knowledge available across all Wikimedia projects has always been core to the mission of the Foundation and the movement. We provide this access not only to individuals visiting our websites but also programmatically to machines so that our content can be repurposed in other environments. The full corpus of Wikimedia content always has been, and will continue to be, made available for reuse in various forms (including but not limited to database dumps, APIs, and scraping) at no cost.

As a result, our content is often repurposed by for-profit organizations that rely on it to support their business models, and which consequently earn revenue from it. Outside of voluntary corporate donations to the Wikimedia Foundation, the movement has never benefited from any of this revenue through reinvestment. In acknowledgement of this, under the heading “Increase the Sustainability of Our Movement”, the Movement Strategy process asked the Wikimedia Foundation to explore, among other things, “enterprise-level APIs...models for enterprise-scale for-profit reusers, taking care to avoid revenue dependencies or other undue external influence in product design and development.” Furthermore, under the heading “Improve User Experience”, a further recommendation stated, “Make the Wikimedia API suite more comprehensive, reliable, secure, and fast, in partnership with large scale users where that aligns with our mission and principles, to improve the user experience of both our direct and indirect users, increase the reach and discoverability of our content and the potential for data returns, and improve awareness of and ease of attribution and verifiability for content reusers.”

The Enterprise project team is developing a new resource aimed at for-profit content reusers, who have product, service, and system requirements that go beyond what we freely provide. Use of this offering will not be required for for-profit content reuse; companies can continue to use the current tools available at no cost. All Enterprise API revenue will unequivocally be used to support the Wikimedia mission—for example, to fund Wikimedia programs or help grow the Wikimedia Endowment.

This project represents a new kind of activity at the Foundation. The project is at a very early stage that should be considered a learning period. We will have successes, we will make mistakes, and we will need to adapt our strategies. The team is committed to listening, engaging, and where possible, integrating the feedback we get on our work. This document is organic and is reflective of the team's current thinking; we are attempting to document as much work as possible in the open. Up until now, our work has been shaped by a series of initial interviews with community members, Wikimedia Foundation Board and staff, researchers, and reusers.

Commercial reusers of Wikimedia

Beyond normal reading of pages, access to Wikimedia content by reusers is currently achieved through three broad means: scraping of web pages, data dumps, and APIs. These services are provided freely to all reusers of Wikimedia content. They are and will remain free, both libre and gratis, to everyone.

High-volume for-profit entities, independent smaller initiatives, and individual volunteer reusers rely on the same services and the same bandwidth, accessed at the same time and with the same rate-limits and update frequency. What many of the largest commercial technology organizations require in order to effectively utilize Wikimedia content goes beyond what we currently provide. Consequently, each of these large companies independently re-builds Wikimedia projects internally to address their very similar use-cases. This significant investment is not only duplicated effort, but also represents resources spent within each company rather than in support of Wikimedia itself, or the broader free knowledge ecosystem.

Some well-known examples of high-volume commercial reuse of Wikimedia content include:

  • The ‘infoboxes’ or knowledge graphs shown in search engine results
  • Voice-operated virtual assistants such as Siri and Alexa
  • Augmented information provided on digital maps, such as in-flight entertainment systems or smartphones

The Wikimedia Enterprise API is a new service focused on the use cases of high-volume for-profit reusers of Wikimedia projects, which those entities can use at scale, and for which they will be charged.

Why charge

Sustainability

The Wikimedia movement strategy process has generated the strategic direction that lays out the ultimate challenges we want to try to solve. The movement aims to provide a platform that provides open knowledge to the world, in whatever medium, by removing the social, political, and technical barriers that prevent the creation of and access to free knowledge. That is a huge challenge. There are technological gaps we need to solve, knowledge gaps we need to fill, and gaps in knowledge access to address, as well. Complementing the strategic direction are the movement strategy recommendations that many hundreds and possibly thousands of people have poured their time and energy into, which address the ways we hope to tackle the challenges we face in working towards the strategic direction.

From a resource perspective, this is about setting up the movement to thrive for decades to come, to weather any storm, and to genuinely stand a chance at achieving the mission first conceived 20 years ago. We’re going to need more resources, more partners, and more allies if we are going to achieve the goals implicit in our vision statement and 2030 strategic direction. The key will be making sure that support is diverse, unrestricted, and removed from direct program influence. That’s why it’s important to make sure the movement can sustain itself both now and for the future in perpetuity.

Consequently, one of the movement strategy recommendations specifically requests the creation of what is now known as the Wikimedia Enterprise API:

Explore new opportunities for both revenue generation and free knowledge dissemination through partnerships and earned income [...] Building enterprise-level APIs [...] Engage partners in the development wherever appropriate, incorporating the needs of a spectrum of small, non-commercial, and larger commercial reusers. Explore fees or sustainability models for enterprise-scale commercial reusers, taking care to avoid revenue dependencies or other undue external influence in product design and development. Develop appropriate safeguards to ensure continued free, unrestricted access for non-commercial, research, and small to moderate commercial use. — Strategy Recommendations, Increase the Sustainability of Our Movement

Self-funding

Serving the needs of a group of highly intensive reusers of Wikimedia content is ambitious. Those needs are valid. However, using the Wikimedia Foundation's existing financial resources to respond to these needs would mean subsidizing the software development needs of some of the world's largest commercial organizations with donor money. The Wikimedia Enterprise API avoids this through self-funding.

Making the Enterprise API service self-funding allows for the hiring of dedicated support for those customers without needing to take financial resources away from supporting existing volunteer editors’ and readers’ needs. In the long term, this frees up the existing Wikimedia infrastructure and staff to focus on community and movement needs. The costs of development of the Enterprise platform, ongoing maintenance, and any additional expenses resulting from it will be fully covered by that revenue.

Maintaining our Independence

The Wikimedia Foundation is primarily funded by readers from around the world who give an average of $15 in response to appeals in banners and email. This funding model has supported the Foundation's growth while maintaining our independence. Approximately 8 million readers will contribute to the Wikimedia Foundation this year. We want to be very clear: This is the best and most important support that the movement receives. It gives us independence and keeps us aligned to serving our readers. We will not let, nor do we expect, revenue from the Wikimedia Enterprise API to eclipse the generous support we receive from our donors. If it grows to be a significant source of revenue, we will come back to the community to discuss ways to insulate the Wikimedia Foundation from the influence that might come from it.

It is also critical to realize that the small donation model is partially dependent on desktop and mobile traffic. Even as global access to the internet continues to grow, Wikimedia readership has remained effectively static for the last several years. One of the biggest changes is that an increasingly significant proportion of interactions with Wikimedia content is no longer on the Wikimedia websites themselves. Since 2015 the Wikimedia Foundation has identified this change as something that could severely impact the movement’s ability to support itself in its long-term and ongoing work. As more people access Wikimedia content beyond our own websites (often through services the Enterprise API will aim to support), it is important to diversify the movement's funding sources. This will increase the resilience of the Wikimedia movement in the event that traffic to wikipedia.org decreases. The project therefore helps to ensure the financial sustainability of the movement.

Ensuring commercial investment in free knowledge

It is important to ensure that large for-profit organizations recognize the value that Wikimedia brings to their product. High-volume reusers increasingly rely on Wikimedia projects, as well as on the Wikimedia volunteer community that creates and curates that content, while becoming increasingly profitable. Speaking of corporate donations in 2019, Katherine Maher stated, “We want people all over the world to use, share, add to, and remix Wikipedia...At the same time, we encourage companies who use Wikimedia’s content to give back in the spirit of sustainability.” Enabling large-scale for-profit reusers to have a contractual relationship with Wikimedia Enterprise means that as their reliance on Wikimedia increases, their investment in the Wikimedia movement will increase commensurately. This will increase the revenue available for the Wikimedia movement to invest in the movement strategy recommendations, our 2030 strategic direction and the Wikimedia Endowment, which ensures the long-term sustainability of the Wikimedia projects. It will also ensure that donations from our readers will not be used to cover the expenses of large corporate reusers. They will be paying their own way, and also contributing back to the cultural and intellectual commons of humanity.

What services do commercial reusers need

The focus of the Enterprise API is on Wikimedia content reusers that wish to repurpose all or most of our content in a for-profit environment. Our current hypothesis is that these reusers have four immediate needs from a service that supports large-scale content reuse: system reliability, high-frequency or real-time access, content integrity, and machine readability. At present, we offer some of these services, but in a piecemeal, disconnected way. Bringing them together into a single platform that offers a better user experience is the immediate goal of the Wikimedia Enterprise project.

System reliability

High-volume reusers tend to use our content in ways that are critical to the function of their services. This means that the reliability of their systems and services depends to some extent on the reliability of ours. Currently, many of our APIs and data services (EventStreams API and Dumps) are not designed with the large-scale use cases of for-profit reusers in mind. For-profit reusers expect not only an extremely high volume of content to be available with high system reliability, but most importantly, a contractual guarantee of that reliability. The Wikimedia Enterprise API aims to provide these service guarantees, offering a way for for-profit entities and services to be more confident when incorporating Wikimedia content in business-critical settings.

At this early stage of the project, where long-term direction and success remain uncertain, we are building this service on an externally owned and operated cloud infrastructure (AWS) in conjunction with contracted engineers. This ensures that our own infrastructure, and staff, are not being burdened with or disrupted by the contractual requirement of system reliability that only affects a very small number of for-profit reusers. It also ensures donor money is spent on Wikimedia’s own infrastructure and not used to subsidize major companies’ technical requirements.

High frequency or real-time access

Access to bulk data services in Wikimedia is currently available through our SQL/XML dumps on a fortnightly basis, through HTML scraping performed directly by the user, and through querying of Wikimedia APIs. Select immediate updates, such as recent changes, can also be accessed through the EventStreams API. Providing access to the full Wikimedia dataset at a faster cadence would allow content reusers more flexibility in using our data to suit the needs of their specific use case.
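
As a rough illustration of consuming immediate updates of the kind EventStreams provides, the sketch below parses server-sent-event text lines and keeps only the changes for one wiki. It makes no network requests. The sample lines, the function names, and the `wiki` and `title` field names are assumptions made for illustration, not a specification of the service.

```python
import json
from typing import Iterable, Iterator, List


def parse_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each server-sent event found in `lines`.

    An event is terminated by a blank line; its payload is the
    concatenation of its `data:` fields.
    """
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":
            # A blank line closes the current event, if it carried data.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Drop the single optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        # Other fields (event:, id:) and ':' comments are ignored here.
    if data_parts:
        # Lenient choice for this sketch: flush an unterminated final event.
        yield json.loads("\n".join(data_parts))


def changes_for_wiki(lines: Iterable[str], wiki: str) -> List[dict]:
    """Return only the events whose (assumed) `wiki` field matches."""
    return [event for event in parse_sse_events(lines) if event.get("wiki") == wiki]


# Hypothetical sample input, shaped like a server-sent-event stream.
SAMPLE_LINES = [
    "event: message",
    'data: {"wiki": "enwiki", "title": "Example"}',
    "",
    ": heartbeat comment",
    'data: {"wiki": "dewiki", "title": "Beispiel"}',
    "",
]

if __name__ == "__main__":
    for change in changes_for_wiki(SAMPLE_LINES, "enwiki"):
        print(change["title"])
```

In real use, the `lines` argument would come from a streaming HTTP response rather than a list. The reuser would still have to handle reconnection and back-pressure, which is part of the reliability burden described above.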

Content integrity

For certain types of Wikimedia content, there can be tension between content that is recent and content that has been reviewed by the community. It is sometimes the case that content which is more recent is more susceptible to vandalism, misinformation, or disinformation, in comparison with content that has been exposed to several hours or days of visibility and review from the community.

Depending on their context of reuse, some reusers have a preference for recency (such as a researcher looking to examine the state of a particular project at a specific point in time), whereas others have a preference for accuracy (such as a search engine looking to provide biographical summaries of notable people). Providing a methodology by which content reusers can choose to access the type of content they need is critical to support a wide range of content reuse cases.
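
One way to picture this choice is as a selection rule over an article's revision history. A recency-focused reuser takes the newest revision, while an accuracy-focused reuser takes the newest revision that has already been visible for some minimum window. The sketch below is purely illustrative. The `Revision` type, the `pick_revision` function, the `min_age` parameter, and the use of elapsed time as a stand-in for community review are all assumptions, not a description of how the Enterprise API works.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    """Hypothetical minimal view of one revision of a page."""
    rev_id: int
    timestamp: datetime


def pick_revision(
    revisions: Sequence[Revision],
    now: datetime,
    min_age: timedelta = timedelta(0),
) -> Optional[Revision]:
    """Return the newest revision that is at least `min_age` old.

    With the default `min_age` of zero this is simply the latest revision
    (preference for recency). A larger `min_age` skips very recent edits
    (preference for content that has had time to be reviewed).
    """
    eligible = [r for r in revisions if now - r.timestamp >= min_age]
    return max(eligible, key=lambda r: r.timestamp, default=None)


if __name__ == "__main__":
    now = datetime(2021, 3, 1, 12, 0)
    history = [
        Revision(1, datetime(2021, 2, 27, 9, 0)),
        Revision(2, datetime(2021, 3, 1, 8, 0)),
        Revision(3, datetime(2021, 3, 1, 11, 30)),
    ]
    print(pick_revision(history, now))                      # newest revision
    print(pick_revision(history, now, timedelta(hours=2)))  # newest one older than 2 hours
```

Elapsed time is only a crude proxy. A real service would more plausibly combine several signals, but the shape of the choice (the reuser states a preference and the service selects accordingly) stays the same.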

Structure

Content reusers already make significant use of the content of all Wikimedia projects, but the way in which each project (and language edition) is used, processed, and integrated by full-corpus reusers is unique in each case: different methodologies, different formats, and different frequencies pertain to each case. This is partly the result of the unstructured nature of many of our projects, but is also due to editorial practices and presentation choices fundamentally differing from wiki to wiki. While this aspect makes Wikimedia amazingly useful, it also creates challenges for full-corpus reusers.

Augmenting Wikimedia’s content and data to put additional structure behind our unstructured content will allow content reusers to adapt more easily to their individual requirements, while allowing us to provide more inputs, including attribution, licensing, and content quality—all in one place.

The Wikimedia Enterprise API will not directly affect Wikidata or the Wikidata Query Service (WDQS). Also, at this stage of development, the Enterprise API does not serve data from Wikidata or Wikimedia Commons. While WDQS is a significant service for bulk Wikidata reusers for baselining their knowledge graphs, currently the goals of the Enterprise API are focused on streaming near-real-time content, which is a different service from WDQS. Eventually, some information that Enterprise API customers currently obtain via WDQS might be obtained via the API instead, which may decrease the amount they use the WDQS service.

What will the Enterprise API users be charged for

Based on interviews with current users and potential customers, we are identifying what reusers need most, what they are willing to pay for, and what we are able to deliver. Most elements are likely to be provided as part of a commercial contract known as a Service Level Agreement (SLA) with Wikimedia Enterprise API's users. This contract will relate to things like the frequency of data updates, the reliability (uptime) of the service, and the availability of technical support.
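
To make the uptime element concrete, an uptime percentage translates directly into a downtime allowance per period. The short sketch below does that arithmetic. The figures used (99.9% over a 30-day month) are assumed examples only, since this essay does not state any actual SLA target, and the function name is hypothetical.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Return the minutes of downtime permitted by an uptime commitment.

    `uptime_percent` is the promised availability (for example 99.9) and
    `period_days` is the length of the measurement period.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # Assumed example target, not a figure from the essay.
    minutes = allowed_downtime_minutes(99.9)
    print(f"99.9% uptime over 30 days allows about {minutes:.1f} minutes of downtime")
```

Each additional "nine" in the percentage cuts the allowance by a factor of ten, which is why a contractual guarantee of reliability is a materially different commitment from best-effort service.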

The format of the Wikimedia content provided to Wikimedia Enterprise API customers will be more tailored to the specific needs of large-scale usage. This could include the grouping and filtering of multiple (publicly available) API outputs into a single “package” of content, and the re-parsing of Wikimedia content into a different output (for example HTML as opposed to Wikitext). Other factors could include the duration of the contract to access the service.

As this project is still at an early stage, the specific business model and the most appropriate pricing levels are still being investigated. Similarly, we are still exploring the most efficient and effective way for the Wikimedia Foundation Board to provide oversight as part of its standard governance responsibilities.

As we develop and adjust based on feedback, we have produced a list of Wikimedia Enterprise operating principles to guide its activities as a project, and help determine what its for-profit customers can do.

Free access for some users

The Enterprise API is designed to be used by high-volume for-profit customers but will allow for free access by some users, in cases where existing APIs do not meet specific needs. Other technology organizations offer similar kinds of free-access exemptions to their paid services; for example, GitHub Premium is available gratis to nonprofits.

The Wikimedia Enterprise team is working with Wikimedia Technical Engagement to add free community support through cloud services by June 2021. In the meantime, access to the Enterprise API services will be provided at no charge in use cases that are both highly related to the Wikimedia mission (especially where the use is non-commercial or open-access in nature) and in need of high-frequency data services not served by the existing APIs or database dumps. Academic research and mission-aligned non-profit services are two such potential cases. Volunteers and researchers wishing to request free access should contact the team directly.

How this will be structured legally

For now, with the approval of the Wikimedia Foundation Board, the Wikimedia Foundation has set up a single-member, US limited liability company (LLC) to provide these services. The use of subsidiaries by mature non-profits is common even within FLOSS and free knowledge:

  • Creative Commons owns a Canadian subsidiary for contractual purposes when operating in Canada.
  • Mozilla Foundation has a wholly-owned for-profit subsidiary in the form of the Mozilla Corporation focused on revenue generation.
  • Linux Foundation owns a number of wholly-owned subsidiaries for a variety of purposes, including revenue generation through the for-profit provision of training services.
  • Open Data Institute and Open Knowledge Foundation both use for-profit revenue to fund their activities.

Based on advice from the Foundation’s legal team, we settled on the simple LLC structure to allow us to test this service model. This approach should limit startup costs and avoid unnecessarily complex government reporting requirements through the early phases. The LLC structure will also insulate the Foundation from liabilities generated by the service. That said, the Foundation is still required under US law to publicly disclose the LLC’s revenues and expenses in our annual tax filings (find the previous audited financial reports here). As the project matures, we may change the specific legal structure, but we will always retain the same operating principles. This might include moving to a more permanent, legally robust structure in the long term. The LLC will operate under the auspices of the Wikimedia Foundation, all its staff will be Wikimedia Foundation staff, and it is ultimately subject to the governance of the Wikimedia Foundation Board of Trustees.

Regardless of the legal structure, all Wikimedia Enterprise revenue will unequivocally be used to support the Wikimedia mission—for example, to fund Wikimedia programs or help grow the Wikimedia Endowment.


To facilitate discussion of any of the issues raised in this essay, the Wikimedia Enterprise team will host regular open “office hours” meetings for at least the initial development phase, and will continue to be available through asynchronous communication channels (such as the project’s Meta talk page).