Wikimedia Enterprise/Essay

From Meta, a Wikimedia project coordination wiki
This essay about the Wikimedia Enterprise API was written by its team and represents its views and those of the Wikimedia Foundation. Published in March 2021.

Wikimedia projects are free (libre), have always been free, and will always remain free. As a movement, we espouse the principles of free cultural works, under which anyone can use Wikimedia without restriction, including for commercial purposes. That is why we reject “non-commercial” licenses, as they would limit the types of reuse that are possible. And that is why we consider commercial reuse to be an important way of disseminating knowledge to audiences.

Likewise, Wikimedia projects are free of charge (gratis), have always been free of charge, and always will be. The ability to freely access the knowledge available across all Wikimedia projects has always been at the heart of the mission of the Foundation and the movement. We provide this access not only to people who visit our websites, but also to programs that query our servers so that our content can be reused in other environments. The complete corpus of Wikimedia content has always been made available for reuse in a variety of forms (including database dumps, APIs, and web scraping) at no cost, and will continue to be.

Consequently, our content is often reused by commercial organizations that depend on it to support their business models and that earn revenue from it. Apart from corporate donations made voluntarily to the Wikimedia Foundation, the movement has never received any return from those revenues. In recognition of this, under the heading “Increase the Sustainability of Our Movement”, the movement strategy process asked the Wikimedia Foundation to explore, among other things, “enterprise-level APIs [and] models for enterprise-scale commercial reusers, taking care to avoid revenue dependencies or other undue external influence in product design and development.” In addition, under the heading of user experience, another recommendation calls on us to “Make the Wikimedia API suite more comprehensive, reliable, secure and fast, in partnership with large-scale users where that aligns with our mission and principles, to improve the user experience of both our direct and indirect users, increase the reach and discoverability of our content and the potential for data returns, and improve awareness of and ease of attribution and verifiability for content reusers.”

The Enterprise project team is developing a new resource for commercial reusers of content who have product, service, and system requirements that go beyond what we provide free of charge. Use of this offering will not be required for commercial reuse of content; companies can continue to use the currently available tools at no cost. All revenue from the Enterprise API will unequivocally be used to support the Wikimedia mission, for example to fund Wikimedia programs or help grow the Wikimedia Endowment.

This project represents a new kind of activity at the Foundation. It is at a very early stage, which should be considered a learning period. We will have successes, we will make mistakes, and we will need to adapt our strategies. The team is committed to listening, engaging, and, wherever possible, incorporating the feedback we receive on our work. This is a living document that reflects the team's current thinking; we try to document our work publicly as much as possible. So far, our work has been shaped by a series of initial interviews with community members, the Board of Trustees and staff of the Wikimedia Foundation, researchers, and reusers.

Commercial reusers of Wikimedia

Beyond normal reading of pages, reusers currently access Wikimedia content through three broad means: scraping of web pages, data dumps, and APIs. These services are provided freely to all reusers of Wikimedia content. They are and will remain free, libre and gratis, to everyone.

High-volume for-profit entities, independent smaller initiatives, and individual volunteer reusers rely on the same services and the same bandwidth, accessed at the same time and with the same rate-limits and update frequency. What many of the largest commercial technology organizations require in order to effectively utilize Wikimedia content goes beyond what we currently provide. Consequently, each of these large companies independently re-builds Wikimedia projects internally to address their very similar use-cases. This significant investment is not only duplicated effort, but also represents resources spent within each company rather than in support of Wikimedia itself, or the broader free knowledge ecosystem.

Some well-known examples of high-volume commercial reuse of Wikimedia content include:

  • The ‘infoboxes’ or knowledge graphs shown in search engine results
  • Voice-operated virtual assistants such as Siri and Alexa
  • Augmented information provided on digital maps, such as in-flight entertainment systems or smartphones

The Wikimedia Enterprise API is a new service focused on the use cases of high-volume for-profit reusers of Wikimedia projects. Those entities can use it at scale, and they will be charged for it.

Why charge?

Sustainability

The Wikimedia movement strategy process has generated the strategic direction that lays out the ultimate challenges we want to try to solve. The movement aims to provide a platform that offers open knowledge to the world, in whatever medium, by removing the social, political, and technical barriers that prevent the creation of and access to free knowledge. That is a huge challenge. There are technological gaps we need to solve, knowledge gaps we need to fill, and gaps in knowledge access to address as well. Complementing the strategic direction are the movement strategy recommendations that many hundreds and possibly thousands of people have poured their time and energy into, which address the ways we hope to tackle the challenges we face in working towards the strategic direction.

From a resources perspective, this means preparing the movement to thrive for decades to come, to weather any storm, and to have a real chance of achieving the mission conceived 20 years ago. We will need more resources, more partners, and more allies if we are to achieve the goals implicit in our vision statement and our 2030 strategic direction. The key will be to ensure that this support is diversified, unrestricted, and without direct influence on programs. This is why it is important to ensure that the movement can sustain itself in perpetuity, today and in the future.

Consequently, one of the movement strategy recommendations specifically requests the creation of what is now known as the Wikimedia Enterprise API:

Explore new opportunities for both revenue generation and free knowledge dissemination through partnerships and earned income [...] Building enterprise-level APIs [...] Engage partners in the development wherever appropriate, incorporating the needs of a spectrum of small, non-commercial, and larger commercial reusers. Explore fees or sustainability models for enterprise-scale commercial reusers, taking care to avoid revenue dependencies or other undue external influence in product design and development. Develop appropriate safeguards to ensure continued free, unrestricted access for non-commercial, research, and small to moderate commercial use. — Strategy Recommendations, Increase the Sustainability of Our Movement

Self-funding

Serving the needs of a group of highly intensive reusers of Wikimedia content is ambitious. Those needs are valid. However, using the Wikimedia Foundation's existing financial resources to respond to these needs would mean subsidizing the software development needs of some of the world's largest commercial organizations with donor money. The Wikimedia Enterprise API avoids this through self-funding.

Making the Enterprise API service self-funding allows for the hiring of dedicated support for those customers without needing to take financial resources away from supporting existing volunteer editors’ and readers’ needs. In the long term, this frees up the existing Wikimedia infrastructure and staff to focus on community and movement needs. The costs of development of the Enterprise platform, ongoing maintenance, and any additional expenses resulting from it will be fully covered by that revenue.

Maintaining our independence

The Wikimedia Foundation is primarily funded by readers from around the world who give an average of $15 in response to appeals in banners and email. This funding model has supported the Foundation's growth while maintaining our independence. Approximately 8 million readers will contribute to the Wikimedia Foundation this year. We want to be very clear: this is the best and most important support that the movement receives. It gives us independence and keeps us aligned with serving our readers. We will not let, nor do we expect, revenue from the Wikimedia Enterprise API to eclipse the generous support we receive from our donors. If it grows to be a significant source of revenue, we will come back to the community to discuss ways to insulate the Wikimedia Foundation from the influence that might come from it.

It is also critical to realize that the small donation model is partially dependent on desktop and mobile traffic. Even as global access to the internet continues to grow, Wikimedia readership has remained effectively static for the last several years. One of the biggest changes is that an increasingly significant proportion of interactions with Wikimedia content no longer takes place on the Wikimedia websites themselves. Since 2015 the Wikimedia Foundation has identified this change as something that could severely impact the movement’s ability to support itself in its long-term and ongoing work. As more people access Wikimedia content beyond our own websites (often through services the Enterprise API will aim to support), it is important to diversify the movement's funding sources. This will increase the resilience of the Wikimedia movement in the event that traffic to wikipedia.org decreases. The project therefore helps to ensure the financial sustainability of the movement.

Ensuring commercial investment in free knowledge

It is important to ensure that large for-profit organizations recognize the value that Wikimedia brings to their product. High-volume reusers increasingly rely on Wikimedia projects, as well as on the Wikimedia volunteer community that creates and curates that content, while becoming increasingly profitable. Speaking of corporate donations in 2019, Katherine Maher stated, “We want people all over the world to use, share, add to, and remix Wikipedia...At the same time, we encourage companies who use Wikimedia’s content to give back in the spirit of sustainability.” Enabling large-scale for-profit reusers to have a contractual relationship with Wikimedia Enterprise means that as their reliance on Wikimedia increases, their investment in the Wikimedia movement will increase commensurately. This will increase the revenue available for the Wikimedia movement to invest in the movement strategy recommendations, our 2030 strategic direction and the Wikimedia Endowment, which ensures the long-term sustainability of the Wikimedia projects. It will also ensure that donations from our readers will not be used to cover the expenses of large corporate reusers. They will be paying their own way, and also contributing back to the cultural and intellectual commons of humanity.

What services do commercial reusers need?

The focus of the Enterprise API is on Wikimedia content reusers that wish to repurpose all or most of our content in a for-profit environment. Our current hypothesis is that these reusers have four immediate needs from a service that supports large-scale content reuse: system reliability, high-frequency or real-time access, content integrity, and machine readability. At present, we offer some of these services, but in a piecemeal, disconnected way. Bringing them together into a single platform that offers a better user experience is the immediate goal of the Wikimedia Enterprise project.

System reliability

High-volume reusers tend to use our content in ways that are critical to the function of their services. This means that the reliability of their systems and services depends to some extent on the reliability of ours. Currently, many of our APIs and data services (such as the EventStreams API and Dumps) are not designed with the large-scale use cases of for-profit reusers in mind. For-profit reusers expect not only an extremely high volume of content to be available with high system reliability, but most importantly, a contractual guarantee of that reliability. The Wikimedia Enterprise API aims to provide these service guarantees, giving for-profit entities and services more confidence when incorporating Wikimedia content in business-critical settings.

At this early stage of the project, where long-term direction and success remain uncertain, we are building this service on an externally owned and operated cloud infrastructure (AWS) in conjunction with contracted engineers. This ensures that our own infrastructure, and staff, are not being burdened with or disrupted by the contractual requirement of system reliability that only affects a very small number of for-profit reusers. It also ensures donor money is spent on Wikimedia’s own infrastructure and not used to subsidize major companies’ technical requirements.

High-frequency or real-time access

Bulk access to Wikimedia data is currently available through our SQL/XML dumps on a fortnightly basis, through HTML scraping performed directly by the user, and through querying of the Wikimedia APIs. Select immediate updates, such as recent changes, can also be accessed through the EventStreams API. Providing access to the full Wikimedia dataset at a faster cadence would give content reusers more flexibility in using our data to suit the needs of their specific use case.
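
To make the EventStreams access path mentioned above more concrete, here is a minimal sketch in Python of how a reuser might decode recent-change events. EventStreams delivers events in the Server-Sent Events format, in which each event's JSON payload arrives on `data:` lines and a blank line ends the event. The helper name `parse_sse_events` and the sample payloads are assumptions made for illustration; a real client would read lines from the live HTTP stream rather than from a string.

```python
import json
from typing import Iterable, Iterator


def parse_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per Server-Sent Event.

    An event ends at a blank line; its payload is the concatenation of
    its `data:` lines. Other SSE fields (`event:`, `id:`) are ignored here.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):  # one optional space after the colon
                value = value[1:]
            data_parts.append(value)
        elif line == "" and data_parts:
            yield json.loads("\n".join(data_parts))
            data_parts = []
    if data_parts:  # stream ended without a trailing blank line
        yield json.loads("\n".join(data_parts))


# Hypothetical sample, shaped like a recent-changes stream.
SAMPLE_STREAM = """\
event: message
id: 1
data: {"wiki": "enwiki", "title": "Example", "type": "edit"}

event: message
id: 2
data: {"wiki": "frwiki", "title": "Exemple", "type": "new"}

"""

events = list(parse_sse_events(SAMPLE_STREAM.splitlines()))
edits_on_enwiki = [e["title"] for e in events if e["wiki"] == "enwiki"]
print(edits_on_enwiki)
```

A stream like this only carries changes as they happen, so a reuser who needs the full dataset still has to combine it with a baseline such as a dump.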

Content integrity

For certain types of Wikimedia content, there can be tension between content that is recent and content that has been reviewed by the community. It is sometimes the case that content which is more recent is more susceptible to vandalism, misinformation, or disinformation, in comparison with content that has been exposed to several hours or days of visibility and review from the community.

Depending on their context of reuse, some reusers have a preference for recency (such as a researcher looking to examine the state of a particular project at a specific point in time), whereas others have a preference for accuracy (such as a search engine looking to provide biographical summaries of notable people). Providing a methodology by which content reusers can choose to access the type of content they need is critical to support a wide range of content reuse cases.
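
One way to picture this choice is as a selection rule over a page's revision history. The sketch below is purely illustrative and does not describe how the Enterprise API works: the `Revision` type, the `reverted` flag, the `pick_revision` helper, and the 24-hour threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    rev_id: int
    timestamp: datetime  # when the revision was saved (UTC)
    reverted: bool = False  # hypothetical flag: later undone by the community


def pick_revision(history: Sequence[Revision], now: datetime,
                  min_age: timedelta = timedelta(0)) -> Optional[Revision]:
    """Return the newest non-reverted revision that is at least `min_age` old.

    min_age=0 expresses a preference for recency; a larger value trades
    freshness for time exposed to community review.
    """
    candidates = [r for r in history
                  if not r.reverted and now - r.timestamp >= min_age]
    return max(candidates, key=lambda r: r.timestamp, default=None)


now = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)
history = [
    Revision(100, now - timedelta(days=3)),
    Revision(101, now - timedelta(hours=5)),
    Revision(102, now - timedelta(minutes=10)),
]

latest = pick_revision(history, now)  # recency-first reuser
settled = pick_revision(history, now, min_age=timedelta(hours=24))  # review-first reuser
print(latest.rev_id, settled.rev_id)
```

The same history yields different answers for the two reusers, which is why the choice is best left to the reuser rather than fixed by the service.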

Structure

Content reusers already make significant use of the content of all Wikimedia projects, but the way in which each project (and language edition) is used, processed, and integrated by full-corpus reusers is unique in each case: different methodologies, different formats, and different frequencies apply. This is partly the result of the unstructured nature of many of our projects, but is also due to editorial practices and presentation choices differing fundamentally from wiki to wiki. While this diversity makes Wikimedia amazingly useful, it also creates challenges for full-corpus reusers.

Augmenting Wikimedia’s content and data to put additional structure behind our unstructured content will allow content reusers to adapt more easily to their individual requirements, while allowing us to provide more inputs, including attribution, licensing, and content quality—all in one place.

The Wikimedia Enterprise API will not directly affect Wikidata or the Wikidata Query Service (WDQS). Also, at this stage of development, the Enterprise API does not serve data from Wikidata or Wikimedia Commons. While WDQS is a significant service for bulk Wikidata reusers baselining their knowledge graphs, the goals of the Enterprise API are currently focused on streaming near-real-time content, which is a different service from WDQS. Eventually, some information that Enterprise API customers currently obtain via WDQS might instead be obtained via the API, which may decrease their use of WDQS.

What will the Enterprise API users be charged for?

Based on interviews with current users and potential customers, we are identifying what reusers need most, what they are willing to pay for, and what we are able to deliver. Most elements are likely to be provided as part of a commercial contract known as a Service Level Agreement (SLA) with Wikimedia Enterprise API's users. This contract will relate to things like the frequency of data updates, the reliability [uptime] of the service, and the availability of technical support.
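
To give a sense of what a reliability (uptime) commitment means in practice, the following sketch converts an uptime percentage into the downtime it permits over a period. The targets shown are common illustrative figures, not terms taken from any Wikimedia Enterprise contract, and the function name is our own.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in a period for a given uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# Illustrative targets only; each extra "nine" cuts the budget tenfold.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime -> {minutes:.1f} min per 30 days")
```

Over a 30-day period (43,200 minutes), a 99.9% target leaves roughly 43 minutes of permitted downtime.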

The format of the Wikimedia content provided to Wikimedia Enterprise API customers will be more tailored to the specific needs of large scale usage. This could include the grouping and filtering of multiple [publicly available] API outputs into a single “package” of content, the re-parsing of Wikimedia content into a different output (for example HTML as opposed to Wikitext), and the duration of the contract to access the service.
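
The idea of grouping and filtering several API outputs into a single “package” can be sketched as a simple merge. Everything in this example is hypothetical: the source names, the field names, the sample values, and the `build_package` helper are assumptions chosen for illustration and do not reflect the actual Enterprise API schema.

```python
from typing import Any, Dict, Iterable


def build_package(title: str, sources: Dict[str, Dict[str, Any]],
                  keep_fields: Iterable[str]) -> Dict[str, Any]:
    """Merge several per-page API responses into one record.

    `sources` maps a hypothetical source name to that source's response
    for the page. Only fields listed in `keep_fields` are kept, and the
    first value seen for a field wins.
    """
    wanted = set(keep_fields)
    package: Dict[str, Any] = {"title": title}
    for response in sources.values():
        for field, value in response.items():
            if field in wanted and field not in package:
                package[field] = value
    return package


# Hypothetical outputs of three separate public API calls for one page.
sources = {
    "content": {"html": "<p>Example</p>", "revision": 102},
    "metadata": {"license": "CC BY-SA 3.0", "revision": 102, "watchers": 7},
    "links": {"categories": ["Examples"]},
}

package = build_package("Example", sources,
                        keep_fields=["html", "revision", "license", "categories"])
print(sorted(package))
```

The reuser then receives one record per page instead of having to call and reconcile several endpoints.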

As this project is still at an early stage, the specific business model and the most appropriate pricing levels are still being investigated. Similarly, we are still exploring the most efficient and effective way for the Wikimedia Foundation Board to provide oversight as part of its standard governance responsibilities.

As we develop and adjust based on feedback, we have produced a list of Wikimedia Enterprise operating principles to guide its activities as a project, and help determine what its for-profit customers can do.

Free access for some users

The Enterprise API is designed to be used by high-volume for-profit customers but will allow for free access by some users, in cases where existing APIs do not meet specific needs. Other technology organizations offer similar kinds of free-access exemptions to their paid services; for example, GitHub Premium is available gratis to nonprofits.

The Wikimedia Enterprise team is working with Wikimedia Technical Engagement to add free community support through cloud services by June 2021. In the meantime, access to the Enterprise API services will be provided at no charge in use cases that both (1) are highly related to the Wikimedia mission (especially where the use is non-commercial or open-access in nature) and (2) require high-frequency data services not served by the existing APIs or database dumps. Academic research and mission-aligned non-profit services are two such potential cases. Volunteers and researchers wishing to request free access should contact the team directly.

How will it be structured legally?

For now, with the approval of the Wikimedia Foundation Board, the Wikimedia Foundation has set up a single-member, US limited liability company (LLC) to provide these services. The use of subsidiaries by mature non-profits is common even within FLOSS and free knowledge:

  • Creative Commons owns a Canadian subsidiary for contractual purposes when operating in Canada.
  • Mozilla Foundation has a wholly-owned for-profit subsidiary in the form of the Mozilla Corporation focused on revenue generation.
  • Linux Foundation owns a number of wholly-owned subsidiaries for a variety of purposes, including revenue generation through the for-profit provision of training services.
  • Open Data Institute and Open Knowledge Foundation both use for-profit revenue to fund their activities.

Based on advice from the Foundation’s legal team, we settled on the simple LLC structure to allow us to test this service model. This approach should limit startup costs and unnecessarily complex government reporting requirements through the early phases. The LLC structure will also insulate the Foundation from liabilities generated by the service. That said, the Foundation is still required under US law to publicly disclose the LLC’s revenues and expenses in our annual tax filings (find the previous audited financial reports here). As the project matures, we may change the specific legal structure, but we will always retain the same operating principles. This might include moving to a more permanent, legally robust structure in the long term. The LLC will operate under the auspices of the Wikimedia Foundation, all its staff will be Wikimedia Foundation staff, and it is ultimately subject to the governance of the Wikimedia Foundation Board of Trustees.

Regardless of the legal structure, all Wikimedia Enterprise revenue will unequivocally be used to support the Wikimedia mission—for example, to fund Wikimedia programs or help grow the Wikimedia Endowment.


To facilitate discussion of any of the issues raised in this essay, the Wikimedia Enterprise team will host regular open “office hours” meetings for at least the initial development phase, and will continue to be available through asynchronous communication channels (such as the project’s Meta talk page).