Here are some deliberately short answers to common questions. For more detail and context on any of these topics, please read the essay, the principles, and the technical documentation. This project was previously known as “Okapi”.
What is Wikimedia Enterprise?
Wikimedia Enterprise is first and foremost an application programming interface (API) for Wikimedia content. It is designed for the requirements of very large organizations that use Wikimedia content in their commercial services and have considerable needs in terms of volume, speed, and service reliability. This service will come with a contractual guarantee (a “Service Level Agreement”, or SLA) for paying customers.
The service includes the content of all language editions of all Wikimedia sister projects except Wikimedia Commons and Wikidata. Including Wikidata information is a high priority on the development roadmap. You can learn more about the development progress via the monthly technical updates on the project's MediaWiki.org homepage.
Specifically, the API service is provided in three products:
- Snapshot: Retrieve an entire Wikimedia project, updated daily
- On-Demand: Retrieve any article from any Wikimedia project at any time
- Realtime: Stream real-time updates from any Wikimedia project
Will this affect me as a contributor or bot operator?
No. This will not change the experience of contributing as a user (whether human or bot). All current APIs will remain available.
Will the Enterprise API affect the current dumps and APIs?
The system of freely-provided database dumps and APIs remains in place and continues to be supported. They are not being removed or restricted and will continue to receive support and development. Part of the reason the Enterprise API is being built separately is in order not to disrupt the existing ecosystems.
In fact, the existing API ecosystem is currently being revamped under the "API Gateway" initiative. Until now, the APIs and services provided for re-use of Wikimedia content had to be able to support the needs of individuals and also extremely large companies such as search engines.
This resulted in a practical example of the tragedy of the commons, where making a single service equally available to all allowed the largest users to dominate, which reduced the quality of the service for everyone else. The existence of the Enterprise API, designed for the needs of the larger users, will allow the new API Gateway to be built for individual users, with restrictions (e.g. rate limits) on the larger commercial users.
Why is this project called “Enterprise”?
The project, the team, and the API were all previously called “Okapi”; this was a temporary code name used until a permanent official name was decided. An okapi is an African mammal whose name happens to contain the letters “A P I”. The name “Wikimedia Enterprise” (and “Enterprise API”) is intended to make clear who the intended users of the service are: commercial organizations. Important criteria in choosing this name were that it should not imply that the API's content is commercial or exclusive, or that the existing APIs were changing. The phrase “Enterprise API” also appears in the movement strategy, so it is consistent with previous usage in the movement. Finally, it was important to find a name that did not conflict with the existing names of Wikimedia websites, affiliates, projects, and teams.
The project and the API should not be confused with the “MediaWiki Stakeholders group”, an independent Wikimedia affiliate organization that advocates for the needs of MediaWiki users outside Wikimedia, including commercial enterprises. Wikimedia Enterprise is also distinct from the “Enterprise MediaWiki Conference”, a conference series for that community.
Will this directly affect Wikimedia content?
No. The API enables high-volume, high-speed access to and reuse of Wikimedia content. It has no technical or editorial control over the content of the Wikimedia projects. Of course, in accordance with the rights granted under Wikimedia's free license, reusers of Wikimedia content are permitted to create derivative works from it.
By accessing Wikimedia content through this new single ingestion method, and by signing a contractual SLA for it, we will be able to ensure that large-scale reusers are more consistent and accurate in displaying attribution and copyright licensing for Wikimedia content. Any reduction in the inadvertent re-publication of vandalism by large-scale reusers benefits the community: it strengthens our community's reputation for curating reliable content, and it reduces the stakes for those community members dedicated to fighting vandalism. Over time, the Wikimedia Enterprise team hopes to build mechanisms that help reusers reduce the likelihood of ingesting vandalized content into their products. If this work results in better vandalism detection, any lessons learned and/or code developed will be shared back with the community to improve tools and workflows and, consequently, knowledge integrity.
Will this stop errors/vandalism from appearing in search engine results?
It will help.
By making the Wikimedia content-ingestion process more consistent for third-party organisations that operate at high scale and high speed, it will reduce the likelihood that they display vandalism and/or the length of time it is displayed. The API feeds will not include exclusive vandalism-detection features unavailable to the public, but they will make existing signals (such as ORES scores and the frequency with which an article is currently receiving edits) more accessible to our reusers. This will give Enterprise's customers more tools for deciding what to display and when.
Consistent with the principle of free cultural works, the Wikimedia Foundation does not control how reusers display Wikimedia projects' content, in what context it is displayed, or with what other datasets it is combined. If you find Wikimedia content being used in an inappropriate context in a search engine result, that search engine's operator will have a procedure for providing feedback about it. For example, Google has a policy for "how to report a featured snippet".
How does this relate to the movement strategy?
The movement strategy recommendations "Increase the sustainability of our movement" and "Improve User Experience" respectively recommend to: “Explore new opportunities for both revenue generation and free knowledge dissemination through partnerships and earned income - for example...Building enterprise-level APIs” and "Make the Wikimedia API suite more comprehensive, reliable, secure and fast, in partnership with large scale users.... and improve awareness of and ease of attribution and verifiability for content reusers."
At the same time, improving our API contributes significantly to our progress towards our Strategic Direction and vision, in particular Knowledge as a Service and Knowledge Equity. In the words of the recommendation, making “the Wikimedia API suite more comprehensive, reliable, secure and fast, in partnership with large scale users where that aligns with our mission and principles” improves “the user experience of both our direct and indirect users, increase the reach and discoverability of our content and the potential for data returns, and improve awareness of and ease of attribution and verifiability for content reusers.”
In addition to the two recommendations above, to which Enterprise is explicitly connected, it also has a role to play in several of the Strategy Initiatives. These include: "3. Increased awareness about the Wikimedia Movement", "36. Identify the impact of Wikimedia projects & content" and "misinformation", and "45. Adaptive policies". Many of the strategy recommendations imply increased revenue across the movement: the strategy is ambitious and ultimately expensive to enact. Building the Enterprise API over the next several years therefore allows us to develop a new revenue stream that will help sustainably support the rest of these recommendations, which connects it to initiative "7. Revenue generation for the movement" as well.
We recognize that in the community vote to prioritize the order in which the recommendations should receive movement-wide attention, these API-specific recommendations were low on the list. We acknowledge, and fully expected, that these recommendations would not be of popular interest: this is an activity that does not directly affect the editing community. However, this is one of the few recommendations that sits entirely within the WMF's responsibility. This means the WMF can start this project immediately and independently of any other strategy activities, without interrupting, diverting attention from, or deprioritizing any of the rest.
Where has this been discussed before?
The Wikimedia Foundation offered paid data services from shortly after its inception, providing feeds that enabled third parties to host their own local databases. The creation of this service was what led to the initial hire of Brion Vibber, and it was used to bootstrap the Wikimedia Foundation in its early years. The service was closed to new customers in 2010 and finally decommissioned in 2014, mainly due to lack of maintenance.
Revisiting large-scale data services to help ensure the success of the movement, irrespective of changing discovery methods of Wikimedia content, was discussed as a possible avenue for exploration in 2015 and again on Wikimedia-l in 2016. The idea was put forward by two working groups during phase 2 of the movement strategy process, and work on improving third-party API usage was identified twice in the final strategic recommendations (1, 2). The start of work on the Enterprise API project specifically was raised on Wikimedia-l in mid-2020.
Note: This FAQ was published in March 2021. At that time a Wikimedia blog post was published, notices were placed on various mailing lists and on wiki, and many mainstream media outlets covered it, most notably WIRED. This resulted in significantly more community discussion on this talk page, on central discussion hubs on many wikis, and on social media. A comprehensive list of independent media that wrote stories about this topic is published at Wikimedia_Enterprise#Press. In October 2021, a WMF press release announcing that the product was commercially available, and a report by the Open Future institute, were published. This was followed in June 2022 by another press release announcing the project's first customers and the self-signup system. A new "news" page for the project has been created at https://enterprise.wikimedia.com/news/ and all future announcements from the project will be published there.
Is this “selling” or “forcing big tech to pay” for Wikipedia?
No. All Wikimedia content is available under free licenses and can be used by anyone for any purpose. That will not and cannot be changed. The Enterprise API service is a new method of delivering that content at a volume and speed designed specifically for the needs of major for-profit organizations that are already using Wikimedia content commercially. The Enterprise API is selling the service of this new method of access, but it does not stop anyone (including those potential customers) from using the existing free methods of access.
Many governments and professional sectors (such as journalism) around the world are currently debating how to build a financially sustainable model while working with "big tech." Building the Wikimedia Enterprise API creates a way for those for-profit organisations that have built business models from the use of freely-available Wikimedia content to also invest in the Wikimedia movement in a reliable and ongoing manner.
Will the community be able to access the Enterprise API without paying?
Yes. For bulk access, a copy of the API output is provided via the public database dumps service, updated fortnightly. This is the same frequency at which the other XML dumps are already provided.
A "trial" version of the live service will also be available via the product's website at no cost. This version is primarily designed to allow potential commercial customers to investigate the service, and therefore it has a restricted maximum rate/usage. Nonetheless, Wikimedians are allowed (and indeed encouraged!) to register and use this service for themselves too. People with a mission-relevant use case for the paid version of the service that is not addressed by the above, or by other Wikimedia services, can be provided with ongoing free access.
How will the money be spent?
The strategic direction we aim to reach by 2030 requires, among other goals, large-scale expansion into underserved languages around the world, and this will require significant revenue growth. Beyond covering the costs of the project itself, all the funds generated from Enterprise customers will be used to support the Wikimedia mission. This includes investment in the Wikimedia projects, the community, our movement organizations, and the Wikimedia Endowment. All revenue received from Enterprise customers is treated in the same way as other unrestricted revenue received by the Wikimedia Foundation. That is, the revenue goes into the same "pot" as donations made via email or fundraising banners, and is allocated according to the Wikimedia Foundation Annual Plan.
In these early days, it is difficult to predict when Wikimedia Enterprise will reach profitability, and even more difficult to accurately predict how much profit it will produce over the next few years. Once we have a clearer picture of timing and profitability, the Board of Trustees can plan how it wants to invest the profits to support the mission. That is likely to be at least a year away.
How much money will this make?
The first financial report is published here. It covers the 2022 calendar year, which represents the project's first year in business.
Unsurprisingly, this is one of the most important questions from a business-model perspective, and it is also impossible to answer in advance. Significant research has been undertaken to learn what the Enterprise API's potential customers need and want, which has informed the product development and, consequently, the estimates of potential revenue over time. One thing is clear: This will not replace our need to be funded by reader donations. In accordance with the Wikimedia Enterprise operating principle of financial independence and associated Wikimedia Foundation Board Statement on Wikimedia Enterprise revenue principles, unrelated business income from Wikimedia Enterprise and other sources will not exceed 30% of the Wikimedia Foundation's total revenue. That means that at least 70% of funding will always come from donations and grants etc.
In accordance with the Wikimedia Enterprise operating principle of honesty and transparency, we will publish overall revenue and expenses, differentiated from those of the Wikimedia Foundation in general, at least annually. Furthermore, as per the Wikimedia Foundation Board's statement, the Board will be notified in advance of all agreements expected to generate revenue in excess of $250,000 USD annually, allowing time for any concerns to be raised. This is consistent with how the Wikimedia Foundation treats large corporate donations.
As per the project's financial goals that were initially defined during the development phase, the 2021-22 Annual Plan predicts "$10.2 million in contractual revenue and approximately $3.6 million in expense for Wikimedia Enterprise...".
Will this affect fundraising donations?
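As a rough illustration of the two numeric rules described above (the 30% ceiling on unrelated business income and the $250,000 USD annual threshold for advance Board notification), the sketch below encodes them as simple checks. It is a hypothetical example only: the function names and the assumption that amounts are whole US dollars are ours, and it is not an accounting procedure used by the Wikimedia Foundation.

```python
# Hypothetical sketch of the two revenue rules described in this FAQ.
# The thresholds come from the text; function names are illustrative only.
# Amounts are assumed to be whole US dollars (integers).

UBI_CAP_PERCENT = 30                  # unrelated business income cap, as % of total revenue
BOARD_NOTICE_THRESHOLD_USD = 250_000  # expected annual revenue that requires advance Board notice


def within_income_cap(unrelated_income: int, total_revenue: int) -> bool:
    """Return True if unrelated business income is at most 30% of total revenue.

    Uses integer arithmetic so the comparison is exact.
    """
    if total_revenue <= 0:
        raise ValueError("total_revenue must be positive")
    return 100 * unrelated_income <= UBI_CAP_PERCENT * total_revenue


def needs_board_notice(expected_annual_revenue_usd: int) -> bool:
    """Return True if an agreement is expected to earn more than $250,000 USD per year."""
    return expected_annual_revenue_usd > BOARD_NOTICE_THRESHOLD_USD
```

For example, `within_income_cap(30, 100)` holds while `within_income_cap(31, 100)` does not, and an agreement worth exactly $250,000 per year would not trigger notification because the rule applies to revenue *in excess of* that amount.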
No, the Wikimedia Foundation will continue to receive the vast majority of its support from readers. We believe this is important in order for Wikipedia to remain independent. Funding derived from millions of reader donations averaging $15 aligns us to the public interest. Revenue from Wikimedia Enterprise will supplement our reader support, but it will not eclipse it. The Enterprise API is a way for the corporate users who already profit from their reuse of Wikimedia content to contribute to the projects, as well.
Is it open source?
Yes. It is published as "stable versions".
The specific purpose of this API's code, and of this service, is to be useful to very large commercial organizations and their unique infrastructural, legal, and metadata requirements. Those organizations are unique not just in sheer scale but also relative to each other: each has its own mutually incompatible way of dealing with similar problems. Given that one of the stated principles of the project is "no exclusivity" of the API (either by contract or by features), we need to ensure that no user (free or paid) is unintentionally excluded from being able to use it. Therefore, it was considered preferable to publish stable versions. This ensures that no one builds upon, or has expectations for, code that is not yet fit for everyone's purposes.
Meanwhile, all of the development work itself is tracked publicly and “live” on Phabricator, as per Wikimedia standard practice.
Why are you using externally-operated cloud infrastructure/AWS?
A major need for Wikimedia Enterprise is the ability to rapidly prototype and build solutions that could scale to the needs of the Enterprise API's intended customers. To do this, we have optimized for fast iteration, infrastructural separation from critical Wikimedia projects, and utilization of downstream Service Level Agreements (SLAs). At the start, external cloud services provide us with these capabilities. While there are many advantages to using an external cloud for our use case, we acknowledge there are also fundamental tensions, given the culture and principles of how applications are built at the Foundation. The needs of the Enterprise API's potential customers are important for achieving our mission of making knowledge available to all people. However, using the Wikimedia Foundation's existing resources to develop products that respond to those needs would subsidize the hardware requirements of some of the world's largest for-profit organizations.
The Wikimedia Enterprise API is hosted on Amazon Web Services (AWS) – a very commonly used system for this kind of purpose. Nonetheless, it is not contractually, technically, or financially bound to use AWS infrastructure. We are storing publicly available Wikimedia content, general logging data, and lightweight usage data on AWS. We are looking to provide Service Level Agreements (SLAs) to customers with guarantees similar to those of Amazon. We don't have equivalent uptime information from the Wikimedia Foundation's existing infrastructure. However, this is something we are exploring with Wikimedia Site Reliability Engineering.
In the meantime, we are researching alternatives to AWS for when this project is more established and we are confident about its real infrastructure needs, and we remain open to ideas that might fit our use case. Meanwhile, the WMF hosting infrastructure remains wholly owned, independent, and unaffected by the Enterprise API.
Why is it a .com website?
The homepage of the service is enterprise.wikimedia.com, rather than .org like other websites operated by the Wikimedia Foundation, for the following reasons:
- Data Privacy and Security Boundaries. DNS domains act as technical boundaries for policies on data privacy and security. Since Wikimedia Enterprise operates on separate infrastructure, with separate policies and controls, it is more secure to not blur any of these technical boundaries by hosting Wikimedia Enterprise on a domain such as "wikimedia.org" where the Wikimedia Foundation operates existing sites. The Wikimedia Foundation does not operate any other sites within "wikimedia.com", so this provides a clean boundary.
- Authenticity. It is permitted for a for-profit project owned by a non-profit organisation to use a .org domain. However, the Wikimedia Enterprise team felt that it is more accurate and honest that the website should be .com since it is a for-profit project.
How will this affect Wikidata or the Wikidata Query Service?
The Wikimedia Enterprise API will not directly affect Wikidata or the Wikidata Query Service (WDQS). Also, at this stage of development, the Enterprise API does not serve data from Wikidata (or Wikimedia Commons). While WDQS is a significant service for bulk Wikidata reusers baselining their knowledge graphs, the goals of the Enterprise API are currently focused on streaming near real-time content, which is a different service from WDQS. Eventually, some information that Enterprise API customers currently obtain via WDQS might instead be obtained via the API, which may reduce their use of WDQS.
Why don't they build this for themselves?
All of the Enterprise API's initial potential customers are already using Wikimedia content in their products to varying degrees. Independently of each other, they invest in extracting, restructuring, and standardizing our content for their needs. However, what they cannot do internally is ensure the speed, consistency, and reliability with which Wikimedia services provide that content. This is something only the Wikimedia Foundation can provide. Furthermore, by providing a product available to any customer, the Enterprise API creates a level playing field for smaller businesses that wish to use Wikimedia content in their services but lack the internal resources their larger competitors have for the necessary data conversions.
What are Credibility Signals?
This is the name for a feature of the API dataset. It helps to make the contextual information that Wikimedia editors use to make their editorial decisions more understandable within an API feed. For a full essay documenting the nature and purpose of this, see its dedicated documentation page.
When reviewing the edit histories of articles, editors will often take note of many factors. These include whether an article has: suddenly received a lot of new edits; been recently edited by several newly created accounts or by different ‘anonymous’ editors; had edits frequently reverted; had its protection status or quality rating changed; or suddenly received more pageviews than normal. Credibility Signals turns this contextual information into data points in the API, which third-party reusers can apply to make their own decisions about how to treat new revisions – in real time. For example: if an article is identified as being related to “breaking news”, some reusers may wish to respond to this “signal” by updating their copy as rapidly as possible, while others may wish to temporarily pause their updates.
This feature does not consist of scores, filters, rankings, or value judgements of “good vs. bad edits”. Nor is it an AI making decisions about content accuracy, truth, or quality. A full list of the fields that can be drawn on to create any given “signal” is available at https://www.mediawiki.org/w/api.php. In advancing the strategic goal of "Knowledge as a service", we hope that Credibility Signals will expand the number of third parties that incorporate real-time information from Wikimedia, lower the barrier to entry for doing so, and reduce the instances in which vandalism and similar problems proliferate, in turn engendering trust in Wikimedia content and the movement's work.
What is Breaking News?
Just like "Credibility signals", this is the name for a feature of the API. It identifies new and likely "newsworthy" events as they are being written about across Wikipedia language editions at any given moment. These events are then marked with a boolean field, allowing API users to easily identify this kind of content within their copy of the dataset. For a full essay documenting the nature and purpose of this feature, and how you can access it yourself, see its dedicated documentation page.
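To make the idea of reusers "applying" signals more concrete, here is a minimal Python sketch of one possible reuser-side policy, including the "update fast" versus "pause" choice for breaking news described above. Everything in it is an assumption for illustration: the `RevisionSignals` fields, the `decide` function, and its thresholds are hypothetical and do not reflect the Enterprise API's actual schema. Each reuser would define its own rules.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    UPDATE_NOW = "update_now"  # refresh the reuser's copy immediately
    PAUSE = "pause"            # keep the previous copy until the article settles


@dataclass
class RevisionSignals:
    # Hypothetical signal fields for illustration; NOT the real API schema.
    recent_edit_count: int    # edits in some recent time window
    new_account_editors: int  # newly created accounts among recent editors
    recent_reverts: int       # reverts in the same window
    breaking_news: bool       # article appears to be about a breaking-news event


def decide(signals: RevisionSignals, prefer_freshness: bool) -> Action:
    """Return what one hypothetical reuser does with a new revision.

    The thresholds below are arbitrary examples, not recommendations.
    """
    if signals.breaking_news:
        # Fast-moving topic: a freshness-first reuser updates, a cautious one waits.
        return Action.UPDATE_NOW if prefer_freshness else Action.PAUSE
    unstable = (
        signals.recent_reverts >= 3
        or signals.new_account_editors >= 5
        or signals.recent_edit_count >= 20
    )
    return Action.PAUSE if unstable else Action.UPDATE_NOW
```

The design point is that the signals themselves carry no verdict; the same `RevisionSignals` value can lead two reusers to different actions depending on their own `prefer_freshness` setting and thresholds.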
Consistent with the Wikimedia Enterprise principles (in particular that of "no exclusive content"), the information this API feature is built upon is already public information that Wikimedia editors commonly use in their content moderation workflows, for example "does this article have a sudden increase in the number of pageviews, or of unique editors?" or "was this article recently created/moved, and does it have a 'current event' template?". The feature turns that kind of information into a feed of articles that API users can treat differently if they wish: for example, re-indexing these articles more rapidly, or pausing re-indexing entirely until the content becomes more stable. This feature does not imply a change in Wikipedia editorial policy, in particular regarding notability and reliable sources, as summarised in the English Wikipedia policy "NOTNEWS".
What is Structured Contents?
Within the Enterprise API suite, Structured Contents refers to the features that make Wikimedia data more machine-readable: these efforts focus both on pre-parsing Wikipedia snippets and on connecting the different projects more closely together. The Structured Contents endpoint (beta) was released on the On-demand API in September 2023, both to allow more frequent updates and to improve transparency in the development process. This first version includes pre-parsed Wikipedia summaries, main images, and infoboxes. Further features will continue to be added to this endpoint; see the MediaWiki Updates page.
Why is this run by a subsidiary?
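As a small illustration of how an API user might act on a boolean "breaking news" marker, the Python sketch below splits a batch of article records into flagged and unflagged lists, so that the flagged ones can be re-indexed first or held back. The record shape (a dict with a `"name"` key) and the key name `"breaking_news"` are hypothetical assumptions; the actual field name and location are described on the feature's documentation page.

```python
from typing import Iterable


def split_by_breaking_news(records: Iterable[dict]) -> tuple[list[str], list[str]]:
    """Split article names into (flagged, regular) lists.

    Assumes each record is a dict with a "name" key and an optional boolean
    "breaking_news" key; both key names are hypothetical. Records without
    the flag are treated as not flagged.
    """
    flagged: list[str] = []
    regular: list[str] = []
    for record in records:
        target = flagged if record.get("breaking_news", False) else regular
        target.append(record["name"])
    return flagged, regular
```

A reuser favouring freshness might re-index the `flagged` list immediately, whereas a more cautious one might skip those articles until the flag clears; the same boolean supports both choices.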
The Wikimedia Foundation has created a single-member limited liability company (LLC), and it is this LLC that will sign contracts with the Enterprise API's customers. The LLC structure will insulate the Foundation from liabilities generated by the service. This is a standard approach when a non-profit organization operates a for-profit activity, and it will help us both manage risk and promote transparency. That said, the Foundation is still required under US law to publicly disclose the LLC's revenues and expenses in our annual tax filings (view the audited financial reports here). The LLC operates under the auspices of the Wikimedia Foundation, its staff are Wikimedia Foundation staff, and it is ultimately subject to the governance of the Wikimedia Foundation (WMF) Board of Trustees. The members of the LLC's board, which oversees the project, are drawn from Wikimedia Foundation leadership and represent their WMF staff roles, and the LLC's "president" is the Business Development manager of the WMF.
You can view the contracts that form the legal relationship between the Wikimedia Foundation and this LLC on the Governance Wiki. They are the:
- LLC operating agreement which formally establishes the LLC and the WMF as its sole member
- Inter-company license agreement which discusses the right of the LLC to use Wikimedia trademarks etc.
- Cost-sharing agreement which discusses how the LLC's revenues and expenses are accounted for with the WMF
The LLC's legal registration can be found at the State of Delaware, Division of Corporations, Entity name: Wikimedia, LLC, File number: 7828447. In the United States, establishing a legal entity in the State of Delaware is common because the body of corporate law in Delaware is well-developed and easily understood. Using the LLC to operate Wikimedia Enterprise will help insulate the Wikimedia Foundation from exposure. The clarity of Delaware corporate law furthers that objective and also reduces legal costs in both the short and long term. For a non-profit, registering the LLC in Delaware makes no difference to our United States federal tax liability, or to our financial transparency and tax reporting requirements, compared with registering it in any other state.
The assessment of appropriate tax treatment of the LLC activities has been coordinated with the Wikimedia Foundation auditors KPMG.
Who will the “customers” be?
The Enterprise API has been initially designed for the needs of a very small number of technology organisations who are some of the world’s largest and richest companies, commonly referred to as "Big Tech". As there will be no exclusive contracts nor exclusive content, developing this product will also help provide the ability for smaller for-profit organizations to benefit from the use of Wikimedia content in their products. This is also described by the Open Futures project's description of this project as "lowering the playing field", and is consistent with the Strategic Direction's discussion of "Knowledge as a Service".
By interviewing many organisations across many commercial sectors, it became clear that there are many more potential customers than just "big tech". Their needs are different and so the product roadmap will be adjusted over time to meet this demand – focusing on making it easier to integrate and understand Wikimedia's complex ecosystem of information (through things like credibility signals and subsets of information – see roadmap).
As stated in the original press release, Google and the Internet Archive are the first to receive paid and free access (respectively), but we have not publicized the subsequent customers (paid or free) who have signed up to the service. Maintaining a public and comprehensive list of paying and free/trial customers would look like advertising or promotion of those customers. It would also introduce a new privacy (and potentially security) problem: in the same way that it would be inappropriate to make a public list of "all individuals who have used the Wikidata Query Service this month" (for example), making a public list of all organizations that have used this service goes against our privacy culture. Nonetheless, we do intend to publish "use case" blog posts, which will describe how some users (either general categories or individual cases, with their permission) are benefiting from the service in the real world. No one is required to publish whether, or how, they read or reuse Wikimedia content, and this approach is consistent with that practice.
As per the Wikimedia Foundation Board's statement, the Board will be notified in advance of all potential customers expected to generate revenue in excess of $250,000 USD annually, allowing time for any concerns to be raised. This is consistent with how the Wikimedia Foundation treats large corporate donations. As an organization based in the USA, the Wikimedia Foundation is not legally allowed to do business with organizations based in certain proscribed countries, as determined by the Office of Foreign Assets Control.
What is in the contracts?
Customer contracts will typically include terms governing the duration of the engagement, the type of customer support and uptime expected, the cost, mechanisms for resolving disputes, assurances on context-appropriate attribution and licensing information, and restrictions on reusing the API to create a competing business (while affirming the content’s underlying free culture license). As described in the principles document, the contract will not grant exclusive content, exclusive access, private/user data, or editorial influence; and it will not include restrictions on how the content can be used which are contrary to the copyleft licenses of the content itself.