Wikimedia Enterprise/FAQ

From Meta, a Wikimedia project coordination wiki
Through partnerships and earned income, create new opportunities for both revenue generation and free knowledge dissemination.
enterprise.wikimedia.com

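These Wikimedia Enterprise API FAQs were published in March 2021.

The following are intentionally brief answers to frequently asked questions. For more detail and context on these topics, please see the essay, the principles, or the technical documentation. This project was previously known as "Okapi".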

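General

What is it

Wikimedia Enterprise is a wholly-owned LLC that provides services for third-party reuse of content, delivered through API services. It offers high-volume, reliable access to Wikimedia content and is designed for commercial organisations such as search engines, voice assistants, and technology startups. For paying customers, the service is guaranteed by contractual Service Level Agreements (SLAs).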

At launch, the service will include the content of all language editions of all Wikimedia sister projects except Wikimedia Commons and Wikidata. Being able to also include Wikidata information is a high priority on the development roadmap. You can learn more about the development progress via the monthly technical updates on the project's MediaWiki.org homepage.

Does this affect editors or bot operators

No. It does not change the editing experience of editors (human or bot). The existing APIs will continue to be available.

Will the Enterprise API affect the current dumps or APIs

The freely provided database dumps and APIs will continue and will remain supported. They are not being removed or restricted, and their development and support will continue. One of the reasons for building the Enterprise API as a separate service is to avoid disrupting the existing ecosystem.

In fact, the existing API ecosystem is itself being overhauled under an initiative called the "API Gateway". Until now, the APIs and services provided for reusing Wikimedia content have had to cater both to individuals and to extremely large organisations such as search engines. This resulted in a practical example of the tragedy of the commons: making a single service equally available to all allowed the largest users to dominate, which reduced the quality of the service for everyone else. The existence of the Enterprise API, designed for the needs of the larger users, will allow the new API Gateway to be built for individuals and with restrictions against the larger commercial users (e.g. rate limits).

Why is it called "Enterprise"

The project, team, and API were previously all called “Okapi”; this was a temporary code name used until a final official name was determined. An okapi is a cute mammal from Africa that conveniently happened to include the letters a-p-i in its name. The name "Wikimedia Enterprise" (and "Enterprise API") is meant to make it clear who the intended users of the service are: for-profit organisations. Important criteria for selecting this name were that it does not imply the content of the API is commercial or exclusive, or that the existing APIs were changing. The phrase "Enterprise API" also appears in the movement strategy and thus it is consistent with previous usage in the movement. Finally, it was important to find a name that did not interfere with any existing names of Wikimedia websites, affiliates, projects, and teams.

The project, and the API, should not be confused with the MediaWiki Stakeholders group or the Enterprise MediaWiki Conference – respectively, an independent Wikimedia affiliate organisation which advocates for the needs of MediaWiki users outside the Wikimedia Foundation including for-profit enterprises, and a conference series for that community.

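Does this directly affect Wikimedia content

No. The API exists to provide high-speed, high-volume access to data so that Wikimedia content can be reused. It has no ability to exert any control, technical or editorial, over the Wikimedia projects. Of course, reusers of Wikimedia content are permitted to create derivative works from it under Wikimedia's free-culture licensing framework.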

When large-scale reusers access Wikimedia content through this new single ingestion method, and sign a contractual SLA for it, we will be able to ensure that they are more consistent and accurate with the display of attribution and copyright licensing for Wikimedia content. Any reduction in the inadvertent re-publication of vandalism by large-scale reusers benefits the community: it strengthens our community’s reputation for curating reliable content, and it reduces the stakes for those community members dedicated to fighting vandalism. Over time, the Wikimedia Enterprise team hopes to build mechanisms to help reusers reduce the likelihood that they would ingest content vandalism into their products. If this work results in better vandalism detection, any lessons learned and/or code developed will be shared back to the community in order to improve tools and workflows and, as a consequence, to improve knowledge integrity.

Longer-term, the Wikimedia Enterprise team also hopes to explore methods by which new information (e.g. "microcontributions") can be fed back to Wikimedia projects from the general public who are using products made by the Wikimedia Enterprise customers. This is in accordance with the movement strategy recommendation Improve User Experience, which speaks of using APIs for “...the potential for data returns”. At that time, appropriate community consultation will be made to ensure that such contributions could be sought in response to actual community needs, and in a manner that is compliant with Wikimedia editorial culture, privacy policy, terms of use, etc.

Will this prevent errors or vandalism from appearing in search engine results

It should help.

A more consistent Wikimedia content-ingestion process for third-party organisations that operate at high scale and high speed will reduce the likelihood that they display vandalism and/or reduce the duration for which it is displayed. The API feeds will not include exclusive vandalism detection features unavailable to the public, but they will make existing signals more accessible to our reusers (such as ORES scores and the frequency with which an article is currently receiving edits). This will give Enterprise's customers more tools at their disposal for deciding what to display and when.
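
As a rough illustration of how a reuser might act on such signals, the sketch below holds back a revision when either a damage score is high or the article is being edited unusually often. Everything in it is an assumption made for illustration: the field names, the thresholds, and the idea of a single damage probability are hypothetical and do not describe the Enterprise API's actual payload or any customer's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevisionSignals:
    """Hypothetical per-revision signals; not the real API schema."""
    damaging_probability: float  # assumed ORES-style score in [0, 1]
    edits_last_hour: int         # assumed measure of current edit frequency


def should_display(signals: RevisionSignals,
                   max_damaging: float = 0.8,
                   max_edits_per_hour: int = 30) -> bool:
    """Return True if the revision looks safe enough to show right away.

    The thresholds are illustrative defaults a reuser would tune themselves.
    """
    if signals.damaging_probability > max_damaging:
        return False  # likely vandalism: keep showing the previous revision
    if signals.edits_last_hour > max_edits_per_hour:
        return False  # article is in flux: wait for it to settle
    return True


calm = RevisionSignals(damaging_probability=0.05, edits_last_hour=2)
suspect = RevisionSignals(damaging_probability=0.93, edits_last_hour=2)
busy = RevisionSignals(damaging_probability=0.10, edits_last_hour=120)
print(should_display(calm), should_display(suspect), should_display(busy))
```

Only the first revision passes both checks; the other two are held back, one for its score and one for the edit rate. How to weigh such signals remains entirely the reuser's decision.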

Consistent with the principle of free-cultural-works, the Wikimedia Foundation does not control how reusers display Wikimedia projects' content, in what context it is displayed, or with what other datasets it is combined. If you find an instance of Wikimedia content being used in an inappropriate context in a search engine result, its operator will have a procedure for providing feedback about it. By way of example, Google has a policy for "how to report a featured snippet".

How does this relate to the movement strategy

In the movement strategy recommendations Increase the sustainability of our movement and Improve User Experience there are recommendations to, respectively: “Explore new opportunities for both revenue generation and free knowledge dissemination through partnerships and earned income - for example...Building enterprise-level APIs” and "Make the Wikimedia API suite more comprehensive, reliable, secure and fast, in partnership with large scale users.... and improve awareness of and ease of attribution and verifiability for content reusers."

At the same time, improving our API contributes significantly to our progress towards our Strategic Direction and our vision, with significant contributions to Knowledge as a Service and Knowledge Equity. In the words of the recommendation, making “the Wikimedia API suite more comprehensive, reliable, secure and fast, in partnership with large scale users where that aligns with our mission and principles” improves “the user experience of both our direct and indirect users, increase the reach and discoverability of our content and the potential for data returns, and improve awareness of and ease of attribution and verifiability for content reusers.”

On top of the two aforementioned recommendations which Enterprise is explicitly connected to, it also has a role to play in several of the Strategy Initiatives. These include: "3. Increased awareness about the Wikimedia Movement", "36. Identify the impact of Wikimedia projects & content" and "misinformation", and "45. Adaptive policies". Many of the strategy recommendations imply increased revenue across the movement: it is an ambitious and ultimately expensive strategy to enact. Building the Enterprise API over the next several years therefore allows us to develop this new revenue stream, which will help to sustainably support the rest of these recommendations. It is thus also connected to initiative "7. Revenue generation for the movement".

We recognize that, in the community vote to prioritize the order in which the recommendations should receive movement-wide attention, these API-specific recommendations were low on the list. We acknowledge and fully expected that the recommendations would not be of popular interest. This is an activity that does not directly impact the editing community. However, this is one of the few recommendations which sits entirely within the responsibility of the WMF to respond to. This means the WMF can start this project immediately and independently of any other strategy activities without interrupting, diverting attention from, or deprioritizing any of the rest.

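Where has this been discussed before

Early in its existence, the Wikimedia Foundation had a paid data service that provided a feed for third parties to operate their own locally hosted databases. The development of this service led to the hiring of Brion Vibber, and the service helped to bootstrap the Foundation's early operations. It stopped accepting new customers in 2010 and was eventually discontinued in 2014, mainly because there was no longer the capacity to maintain it.

The idea of revisiting a large-scale data service, as one way of ensuring the movement's sustainability given the changing ways in which consumers find Wikimedia content, was raised on the Wikimedia-l mailing list in 2015 and 2016. The idea was considered by two working groups during the second phase of the movement strategy process, and the final strategy recommendations mention improving APIs for third-party use twice (1, 2). Work on the Enterprise API project specifically was raised on Wikimedia-l in mid-2020.

Note: These FAQs were published in March 2021. At that time, a Wikimedia blog post was published, announcements were made on-wiki and on various mailing lists, and the project was covered in mainstream media, most prominently WIRED. This prompted very active community discussion on this page's talk page, which also spread to the main discussion hubs of many wikis and to social media. Independent media coverage of the topic is listed on the Press tab. In October 2021, a Wikimedia Foundation press release announced that the product was commercially available, and a report by the Open Future institute was published. A second press release followed in June 2022, introducing the first customers and a self-signup system. A "News" page has been created at https://enterprise.wikimedia.com/news/, where future project announcements will be published.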

Financial

Is this “selling” or “forcing big tech to pay” for Wikipedia

No. All Wikimedia content is available under free licenses and can be used by anyone for any purpose. That will not and cannot be changed. The Enterprise API service is a new method of delivering that content at a volume and speed designed specifically for the needs of major for-profit organizations that are already using Wikimedia content commercially. The Enterprise API is selling the service of this new method of access, but it does not stop anyone (including those potential customers) from using the existing free methods of access.

Many governments and professional sectors (such as journalism) around the world are currently debating how to build a financially sustainable model while working with "big tech." Building the Wikimedia Enterprise API creates a way for those for-profit organisations that have built business models from the use of freely-available Wikimedia content to also invest in the Wikimedia movement in a reliable and ongoing manner.

Will the community be able to access the Enterprise API without paying

Yes. For bulk access, a copy of the API output is provided via the public database dumps service, updated fortnightly. This is the same frequency at which other XML dumps are already provided.

Daily dumps + hourly diffs are provided via the Data Services portal, available to anyone with a Wikimedia cloud services account.

A "trial" version of the live service will also be available via the product's website at no cost. This version is primarily designed to allow potential commercial customers to investigate the service and therefore it has a restricted maximum rate/usage. Nonetheless, it is allowed (and indeed encouraged!) for Wikimedians to register and use this service for themselves too. People with a mission-relevant use-case for the paid version of the service that is not addressed by the above, or by other Wikimedia services, can be provided with ongoing free access.

How will the money be spent

The strategic direction we aim to reach by 2030 requires large-scale expansion into underserved languages around the world, among other goals, and this will require significant revenue growth. Beyond covering the costs of the project itself, all the funds generated from Enterprise customers will be used to support the Wikimedia mission. This includes investment in the Wikimedia projects, the community, our movement organizations, and the Wikimedia Endowment. In these early days, it is difficult to predict when Wikimedia Enterprise will reach profitability and even more difficult to accurately predict how much profit it will produce over the next few years. Once we have a clearer picture of timing and profitability, the Board of Trustees can plan for how they want to invest the profits to support the mission. That is likely to be at least a year away.

How much money will this raise

Unsurprisingly, this is one of the most important questions from a business-model perspective, and it is also impossible to answer in advance. Significant research has been undertaken to learn what the Enterprise API's potential customers need and want, which has informed the product development and, consequently, the estimates of potential revenue over time. One thing is clear: This will not replace our need to be funded by reader donations. In accordance with the Wikimedia Enterprise operating principle of financial independence and associated Wikimedia Foundation Board Statement on Wikimedia Enterprise revenue principles, unrelated business income from Wikimedia Enterprise and other sources will not exceed 30% of the Wikimedia Foundation's total revenue. That means that at least 70% of funding will always come from donations and grants etc.
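
The 30% cap has a slightly non-obvious consequence, because business income counts toward the very total it is measured against. The sketch below works through that arithmetic, assuming revenue splits into just two buckets (donations and grants on one side, unrelated business income on the other). The function name and the example figure are illustrative and are not taken from the Foundation's accounts.

```python
from fractions import Fraction

# Cap stated in the FAQ: unrelated business income <= 30% of total revenue.
CAP = Fraction(30, 100)


def max_business_income(donations_and_grants: Fraction) -> Fraction:
    """Largest business income B satisfying B <= CAP * (donations_and_grants + B).

    Rearranging gives B <= CAP / (1 - CAP) * donations_and_grants,
    i.e. 3/7 of donations and grants when the cap is 30%.
    """
    return CAP / (1 - CAP) * donations_and_grants


# Hypothetical figure, in millions of USD, purely for illustration.
donations = Fraction(140)
business = max_business_income(donations)
share = business / (donations + business)
print(f"max business income: {float(business):.1f}M, share of total: {float(share):.0%}")
```

With 140 in donations and grants, the ceiling works out to 60, which is exactly 30% of the resulting total of 200. In other words, under this two-bucket assumption the cap limits business income to about 43% of donations and grants, not 30% of them.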

In accordance with the Wikimedia Enterprise operating principle of honesty and transparency we will publish overall revenue and expenses, differentiated from those of the Wikimedia Foundation in general, at least annually. Furthermore, as per the Wikimedia Foundation Board's statement, it will be notified in advance of all agreements expected to generate revenue in excess of $250,000 USD annually, allowing time for any concerns to be raised. This is consistent with how the Wikimedia Foundation treats large corporate donations.

As per the project's financial goals that were initially defined during the development-phase, the 2021-22 Annual Plan predicts "$10.2 million in contractual revenue and approximately $3.6 million in expense for Wikimedia Enterprise...".

Will this affect fundraising donations

No, the Wikimedia Foundation will continue to receive the vast majority of its support from readers. We believe this is important in order for Wikipedia to remain independent. Funding derived from millions of reader donations averaging $15 aligns us to the public interest. Revenue from Wikimedia Enterprise will supplement our reader support, but it will not eclipse it. The Enterprise API is a way for the corporate users who already profit from their reuse of Wikimedia content to contribute to the projects, as well.

Technical

Is it Open Source

Yes. Here it is: https://github.com/wikimedia/OKAPI

Why are you using externally-operated cloud infrastructure/AWS

A major need for Wikimedia Enterprise is to have the ability to rapidly prototype and build solutions that could scale to the needs of the Enterprise API's intended customers. To do this, we have optimized for fast iteration, infrastructural separation from critical Wikimedia projects, and utilization of downstream Service Level Agreements (SLAs). At the start, external cloud services provide us with these capabilities. While there are many advantages of using an external cloud for our use case, we acknowledge there are also fundamental tensions, given the culture and principles of how applications are built at the Foundation. The needs of the Enterprise API's potential customers are important for achieving our mission of making knowledge available to all people. However, using the Wikimedia Foundation's existing resources to develop products to respond to those needs would subsidize the hardware requirements of some of the world's largest for-profit organizations.

The Wikimedia Enterprise API is hosted on Amazon Web Services (AWS) – a very commonly used system for this kind of purpose. Nonetheless, it is not contractually, technically, or financially bound to use AWS infrastructure. We are storing publicly available Wikimedia content, general logging data, and lightweight usage data on AWS. We are looking to provide Service Level Agreements (SLAs) to customers with guarantees similar to those of Amazon. We don't have equivalent uptime information from the Wikimedia Foundation's existing infrastructure. However, this is something we are exploring with Wikimedia Site Reliability Engineering.

In the meantime, we are researching alternatives to AWS (and remain open to ideas that might fit our use case) for when this project is more established and we are confident in knowing what the infrastructure needs are in reality. The WMF hosting infrastructure remains wholly owned, independent, and unaffected by the Enterprise API.

For the technical documentation, see mw:Special:MyLanguage/Wikimedia Enterprise#Application Hosting

Why does the site address end in .com

The homepage of the service is enterprise.wikimedia.com, rather than .org like other websites operated by the Wikimedia Foundation, for the following reasons:

1) Data Privacy and Security Boundaries. DNS domains act as technical boundaries for policies on data privacy and security. Since Wikimedia Enterprise operates on separate infrastructure, with separate policies and controls, it is more secure to not blur any of these technical boundaries by hosting Wikimedia Enterprise on a domain such as "wikimedia.org" where the Wikimedia Foundation operates existing sites. The Wikimedia Foundation does not operate any other sites within "wikimedia.com", so this provides a clean boundary.

2) Authenticity. It is permitted for a for-profit project owned by a non-profit organisation to use a .org domain. However, the Wikimedia Enterprise team felt that it is more accurate and honest that the website should be .com since it is a for-profit project.

At present the DNS for all of "wikimedia.com", including "enterprise.wikimedia.com", is served by the Wikimedia Foundation's DNS servers. We are aware this creates a Service Level Agreement (SLA) dependency issue that will need to be resolved before Wikimedia Enterprise can offer SLAs to customers. The plan for achieving DNS independence for Wikimedia Enterprise is a work in progress.

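What is the impact on Wikidata or the Wikidata Query Service

The Wikimedia Enterprise API has no direct impact on Wikidata or on the Wikidata Query Service (WDQS). In addition, at this stage of development, the API does not ingest data from Wikidata (nor from Wikimedia Commons). WDQS is certainly an important service for large-scale reusers of Wikidata, who use it to establish baselines in their knowledge graphs, but the current goal of the API is to stream content in near real time, which makes it a separate service from WDQS. To the extent that customers currently obtain some information via WDQS, that information may eventually become available via the API, which might reduce the load on WDQS.

Why don't customers build this in-house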

All of the Enterprise API's initial potential customers are already using Wikimedia content in their products to varying degrees. Independently of each other, they invest in extracting, restructuring, and standardizing our content for their needs. However, what they cannot do internally is ensure the speed, consistency, and reliability of how Wikimedia services provide that content. This is something only the Wikimedia Foundation can provide. Furthermore, by providing a product available for any customer, the Enterprise API makes a level playing field for smaller businesses wishing to use Wikimedia content in their services, but which do not have the internal resources of their larger competitors to do the necessary data conversions.

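Legal

Why is it operated by a subsidiary

The Foundation has established a single-member limited liability company (LLC), and it is this LLC that enters into contracts with Enterprise API customers. The LLC structure insulates the Foundation from legal liability arising from providing the service. This is a standard approach for non-profit organisations that carry out for-profit activity, and it helps us both to manage risk and to promote transparency. To that end, the Foundation is obliged under United States law to publicly report the LLC's revenue and expenses, which are appended to its annual tax filings (for details, see the audit reports). The LLC's activities are sponsored by the Wikimedia Foundation, its staff belong to both organisations, and it is ultimately subject to the governance of the Wikimedia Foundation Board of Trustees. The board that oversees the LLC's project is made up of Wikimedia Foundation (WMF) executives, each representing their role as WMF staff, and the "President" of the LLC is also the WMF Business Development manager.

The legal relationship between the Wikimedia Foundation and this newly created LLC can be viewed in the form of contracts (on the Governance wiki). They are defined as follows:

  • The LLC operating agreement formally has only the LLC and WMF as its parties
  • The inter-company license agreement describes the rights granted to the LLC, including the use of Wikimedia trademarks
  • The cost-sharing agreement describes how the LLC's revenue and expenses are settled with WMF

The LLC's registration is listed with the State of Delaware, Division of Corporations, under the following entity name: Wikimedia, LLC, File number: 7828447. In the United States it is common to choose Delaware for registering a legal entity, because Delaware's corporate law is well developed and easy to understand. Using an LLC to run Wikimedia Enterprise helps to insulate the Wikimedia Foundation from exposure. The clarity of Delaware corporate law furthers that purpose and reduces legal costs in both the short and the long term.

KPMG, the Wikimedia Foundation's auditor, has evaluated the appropriate tax treatment of the LLC's activities.

Who are the "customers"

The Enterprise API was primarily designed for the needs of a very small number of the world's largest and best-resourced technology organisations, commonly referred to as "Big Tech". Because there are no exclusive contracts and no exclusive content, this product development may also help to open the way for smaller organisations and non-profits to use Wikimedia content in their own products. The Open Future project's write-up about this project makes the same point, describing it as bringing such reuse within closer reach, and it is consistent with the discussion of "Knowledge as a Service" in the strategic direction (see the strategy question in this FAQ).

Interviews with many organisations across various commercial sectors showed that there may be many more potential customers than just "Big Tech". Their needs differ, and meeting them will mean adjusting the product roadmap over time, with a focus on making it easier to integrate and make sense of Wikimedia's complex information ecosystem (for example, credibility signals and subsets; see the roadmap).

According to the Wikimedia Foundation Board's statement, the Board will be notified in advance of all prospective customers expected to generate more than $250,000 USD in annual revenue, allowing time for any concerns to be raised. This is consistent with how the Wikimedia Foundation treats large corporate donations. As an organisation based in the United States, we are prohibited by law from doing business with organisations based in certain countries designated as hostile, under regulations set by the Office of Foreign Assets Control (OFAC, part of the US Department of the Treasury).

What are the contract terms

In general, the terms of a customer contract cover the contract duration, the type of customer support, expected system uptime, fees, dispute-resolution mechanisms, and provisions ensuring contextually appropriate attribution, as well as a prohibition on reusing the API to run a competing business (even where the underlying free-culture content licence is complied with). As detailed in the principles document, the contract does not grant exclusive content or exclusive access, does not disclose private or user data, and does not permit any influence over editing. Nor does it place restrictions on use of the content that would contradict its copyleft licence.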