Wikimedia Enterprise/Essay
Wikimedia Enterprise (enterprise.wikimedia.com)
This essay was written by the Enterprise team and represents its views and those of the Wikimedia Foundation. Published in March 2021.
In English, the word “free” has two meanings, which in Spanish are two very different words: Libre and Gratis.
The Wikimedia projects are, have been, and always will be free as in libre. The principles of free cultural works mean that anyone can use Wikimedia without restriction, including commercially. As a movement, we embrace this. That is why we reject "non-commercial" licenses, since they would limit the kinds of reuse that are possible, and why we consider commercial reuse an important means of distributing knowledge to the public.
Likewise, the Wikimedia projects are, have been, and always will be free as in gratis. The ability to freely access the knowledge available across all Wikimedia projects has always been at the core of the mission of the Foundation and the movement. We provide this access not only to people who visit our websites but also programmatically to machines, so that our content can be reused in other settings. The complete corpus of Wikimedia content has always been, and will continue to be, available for reuse in various forms (including, but not limited to, database dumps, APIs, and scraping) at no cost.
As a result, our content is frequently reused by commercial organizations that rely on it to support their business models and that, consequently, earn revenue from it. Apart from voluntary corporate donations to the Wikimedia Foundation, the movement has never received any share of this revenue in return. In recognition of this, under the heading Increase the Sustainability of Our Movement, the Movement Strategy process asked the Wikimedia Foundation to explore, among other things, "enterprise-level APIs ... models for enterprise-scale commercial reusers, taking care to avoid revenue dependencies or other undue external influence in product design and development". In addition, under the heading Improve User Experience, another recommendation read: "Make the Wikimedia API suite more comprehensive, reliable, secure and fast, in partnership with large scale users where that aligns with our mission and principles, to improve the user experience of both our direct and indirect users, increase the reach and discoverability of our content and the potential for data returns, and improve awareness of and ease of attribution and verifiability for content reusers."
The Enterprise project team is developing a new resource aimed at commercial content reusers whose product, service, and system requirements go beyond what we offer for free. Using this offering will not be required for commercial reuse of content; companies can continue to use the currently available tools at no cost. All revenue from the Enterprise API will unequivocally be used to support the Wikimedia mission, for example, to fund Wikimedia programs or to help grow the Wikimedia Endowment.
This project represents a new type of activity for the Foundation. The project is at a very early stage, which should be considered a learning period. We will have successes, we will make mistakes, and we will need to adapt our strategies. The team is committed to listening, engaging in dialogue, and, as far as possible, integrating the feedback we receive on our work. This is a living document that reflects the team’s current thinking; we are trying to document as much of our work as possible in the open. So far, our work has been shaped by a series of initial interviews with community members, the Board and staff of the Wikimedia Foundation, researchers, and reusers.
Commercial reusers of Wikimedia
Beyond normal reading of pages, reusers currently access Wikimedia content through three broad means: scraping of web pages, data dumps, and APIs. These services are provided freely to all reusers of Wikimedia content. They are, and will remain, free (both libre and gratis) to everyone.
High-volume for-profit entities, smaller independent initiatives, and individual volunteer reusers rely on the same services and the same bandwidth, accessed at the same time and with the same rate limits and update frequency. What many of the largest commercial technology organizations require in order to use Wikimedia content effectively goes beyond what we currently provide. Consequently, each of these large companies independently rebuilds Wikimedia projects internally to address very similar use cases. This significant investment is not only duplicated effort; it also represents resources spent within each company rather than in support of Wikimedia itself or the broader free knowledge ecosystem.
Some well known examples of high volume commercial reuse of Wikimedia content include:
- The ‘infoboxes’ or knowledge graphs shown in search engine results
- Voice-operated virtual assistants such as Siri and Alexa
- Augmented information shown on digital maps, such as those in in-flight entertainment systems or on smartphones
The Wikimedia Enterprise API is a new service focused on the use cases of high-volume for-profit reusers of Wikimedia projects: a service those entities can use at scale, and for which they will be charged.
Why charge
Sustainability
The Wikimedia movement strategy process has generated the strategic direction that lays out the ultimate challenges we want to try to solve. The movement aims to provide a platform that delivers open knowledge to the world, in whatever medium, by removing the social, political, and technical barriers that prevent the creation of and access to free knowledge. That is a huge challenge. There are technological gaps we need to close, knowledge gaps we need to fill, and gaps in access to knowledge to address as well. Complementing the strategic direction are the movement strategy recommendations, into which many hundreds and possibly thousands of people have poured their time and energy, and which address the ways we hope to tackle the challenges we face in working towards the strategic direction.
From a resource perspective, this is about setting up the movement to thrive for decades to come, to weather any storm, and to genuinely stand a chance of achieving the mission first conceived 20 years ago. We’re going to need more resources, more partners, and more allies if we are going to achieve the goals implicit in our vision statement and 2030 strategic direction. The key will be making sure that support is diverse, unrestricted, and removed from direct program influence. That’s why it’s important to make sure the movement can sustain itself both now and in perpetuity.
Consequently, one of the movement strategy recommendations specifically requests the creation of what is now known as the Wikimedia Enterprise API:
Explore new opportunities for both revenue generation and free knowledge dissemination through partnerships and earned income [...] Building enterprise-level APIs [...] Engage partners in the development wherever appropriate, incorporating the needs of a spectrum of small, non-commercial, and larger commercial reusers. Explore fees or sustainability models for enterprise-scale commercial reusers, taking care to avoid revenue dependencies or other undue external influence in product design and development. Develop appropriate safeguards to ensure continued free, unrestricted access for non-commercial, research, and small to moderate commercial use. — Strategy Recommendations, Increase the Sustainability of Our Movement
Self-funding
Serving the needs of a group of highly intensive reusers of Wikimedia content is ambitious. Those needs are valid. However, using the Wikimedia Foundation's existing financial resources to respond to these needs would mean subsidizing the software development needs of some of the world's largest commercial organizations with donor money. The Wikimedia Enterprise API avoids this through self-funding.
Making the Enterprise API service self-funding allows for the hiring of dedicated support for those customers without taking financial resources away from supporting existing volunteer editors’ and readers’ needs. In the long term, this frees up the existing Wikimedia infrastructure and staff to focus on community and movement needs. The costs of developing the Enterprise platform, its ongoing maintenance, and any additional expenses resulting from it will be fully covered by the revenue it generates.
Maintaining our Independence
The Wikimedia Foundation is primarily funded by readers from around the world who give an average of $15 in response to appeals in banners and email. This funding model has supported the Foundation's growth while maintaining our independence. Approximately 8 million readers will contribute to the Wikimedia Foundation this year. We want to be very clear: this is the best and most important support that the movement receives. It gives us independence and keeps us aligned with serving our readers. We do not expect, nor will we allow, revenue from the Wikimedia Enterprise API to eclipse the generous support we receive from our donors. If it grows to be a significant source of revenue, we will come back to the community to discuss ways to insulate the Wikimedia Foundation from the influence that might come with it.
It is also critical to recognize that the small-donation model partly depends on desktop and mobile traffic. Even as global internet access continues to grow, Wikimedia readership has remained effectively flat for the last several years. One of the biggest changes is that an increasingly significant share of interactions with Wikimedia content no longer happens on the Wikimedia websites themselves. Since 2015, the Wikimedia Foundation has identified this change as something that could severely affect the movement’s ability to support its long-term and ongoing work. As more people access Wikimedia content beyond our own websites, often through services the Enterprise API will aim to support, it is important to diversify the movement's funding sources. This will make the Wikimedia movement more resilient if traffic to wikipedia.org decreases, and the project therefore helps to ensure the financial sustainability of the movement.
Ensuring commercial investment in free knowledge
It is important to ensure that large for-profit organizations recognize the value that Wikimedia brings to their products. High-volume reusers increasingly rely on Wikimedia projects, and on the Wikimedia volunteer community that creates and curates that content, even as they become increasingly profitable. Speaking of corporate donations in 2019, Katherine Maher stated, “We want people all over the world to use, share, add to, and remix Wikipedia...At the same time, we encourage companies who use Wikimedia’s content to give back in the spirit of sustainability.” Enabling large-scale for-profit reusers to have a contractual relationship with Wikimedia Enterprise means that as their reliance on Wikimedia increases, their investment in the Wikimedia movement will increase commensurately. This will increase the revenue available for the Wikimedia movement to invest in the movement strategy recommendations, our 2030 strategic direction, and the Wikimedia Endowment, which ensures the long-term sustainability of the Wikimedia projects. It will also ensure that donations from our readers are not used to cover the expenses of large corporate reusers. They will be paying their own way while also contributing back to the cultural and intellectual commons of humanity.
What services do commercial reusers need
The focus of the Enterprise API is on Wikimedia content reusers that wish to repurpose all or most of our content in a for-profit environment. Our current hypothesis is that these reusers have four immediate needs from a service that supports large-scale content reuse: system reliability, high-frequency or real-time access, content integrity, and machine readability. At present, we offer some of these services, but in a piecemeal, disconnected way. Bringing them together into a single platform that offers a better user experience is the immediate goal of the Wikimedia Enterprise project.
System reliability
High-volume reusers tend to use our content in ways that are critical to the functioning of their services. This means that the reliability of their systems and services depends, to some extent, on the reliability of ours. Currently, many of our APIs and data services (such as the EventStreams API and the Dumps) were not designed with the large-scale use cases of for-profit reusers in mind. For-profit reusers expect not only an extremely high volume of content to be available with high system reliability but, most importantly, a contractual guarantee of that reliability. The Wikimedia Enterprise API aims to provide these service guarantees, giving for-profit entities and services a way to be more confident when incorporating Wikimedia content in business-critical settings.
At this early stage of the project, when long-term direction and success remain uncertain, we are building this service on externally owned and operated cloud infrastructure (AWS), working with contracted engineers. This ensures that our own infrastructure and staff are not burdened with or disrupted by contractual reliability requirements that affect only a very small number of for-profit reusers. It also ensures that donor money is spent on Wikimedia’s own infrastructure and is not used to subsidize major companies’ technical requirements.
High frequency or real-time access
Bulk access to Wikimedia data is currently available through our SQL/XML dumps, which are published fortnightly, through HTML scraping performed by the user, and through queries to the Wikimedia APIs. Selected immediate updates, such as recent changes, can also be accessed through the EventStreams API. Providing access to the full Wikimedia dataset at a faster cadence would give content reusers more flexibility to use our data in ways that suit their specific use cases.
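As an illustration of the existing real-time channel, the sketch below reads the public EventStreams recent-changes feed, which is delivered as Server-Sent Events (SSE). It is a minimal sketch using only the Python standard library, not Enterprise API code: it ignores SSE reconnection (`Last-Event-ID`), and the helper names `parse_sse` and `follow_edits` and the User-Agent string are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Public EventStreams endpoint for the recent-changes stream (Server-Sent Events).
STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each Server-Sent Event.

    Only "data:" fields are collected; "event:", "id:", "retry:" and ":" comment
    lines are skipped. A blank line ends an event, and multi-line data is joined
    with newlines. An unterminated event at end of input is discarded, as in SSE.
    """
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


def follow_edits(wiki: str = "enwiki", limit: int = 5) -> list[str]:
    """Return the titles of the next `limit` edits seen on one wiki (e.g. "enwiki")."""
    request = urllib.request.Request(
        STREAM_URL,
        # Wikimedia asks clients to identify themselves; this User-Agent is a placeholder.
        headers={"User-Agent": "example-reuser/0.1 (hypothetical contact address)"},
    )
    titles: list[str] = []
    with urllib.request.urlopen(request) as response:
        # Iterating over the HTTP response yields raw byte lines.
        lines = (chunk.decode("utf-8") for chunk in response)
        for change in parse_sse(lines):
            if change.get("wiki") == wiki and change.get("type") == "edit":
                titles.append(change["title"])
                if len(titles) >= limit:
                    break
    return titles
```

Calling `follow_edits("enwiki", 3)` opens a live connection and returns once three matching edits have arrived. Keeping the parser separate from the network code means `parse_sse` can be tested offline on recorded stream lines.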
Content integrity
For certain types of Wikimedia content, there can be tension between content that is recent and content that has been reviewed by the community. More recent content is sometimes more susceptible to vandalism, misinformation, or disinformation than content that has had several hours or days of visibility and review by the community.
Depending on their context of reuse, some reusers have a preference for recency (such as a researcher looking to examine the state of a particular project at a specific point in time), whereas others have a preference for accuracy (such as a search engine looking to provide biographical summaries of notable people). Providing a methodology by which content reusers can choose to access the type of content they need is critical to support a wide range of content reuse cases.
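One way a reuser could express the recency-versus-review preference described above is as a selection rule over a page's revision history. The sketch below is purely illustrative: the `Revision` type, the `pick_revision` helper, and the use of a minimum visibility window as a proxy for community review are assumptions for this example, not a documented Enterprise API feature.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    """Hypothetical minimal view of one page revision."""
    rev_id: int
    timestamp: datetime  # when the revision was saved (timezone-aware)


def pick_revision(
    history: Sequence[Revision],
    min_age: timedelta = timedelta(0),
    now: Optional[datetime] = None,
) -> Optional[Revision]:
    """Return the newest revision that has been public for at least `min_age`.

    min_age = 0 favours recency (always the latest revision); a larger value
    favours content that has had more time to be seen and reviewed.
    Returns None if no revision is old enough.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - min_age
    eligible = [rev for rev in history if rev.timestamp <= cutoff]
    return max(eligible, key=lambda rev: rev.timestamp, default=None)
```

A search engine favouring accuracy might call `pick_revision(history, timedelta(hours=24))`, while a reuser favouring recency would keep the default of zero. The single `min_age` parameter is a simplification; real review signals could be far richer.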
Structure
Content reusers already make significant use of the content of all Wikimedia projects, but the way each project (and language edition) is used, processed, and integrated by full-corpus reusers is unique to each case: different methodologies, formats, and frequencies apply. This is partly the result of the unstructured nature of many of our projects, but it is also because editorial practices and presentation choices differ fundamentally from wiki to wiki. While this diversity makes Wikimedia remarkably useful, it also creates challenges for full-corpus reusers.
Augmenting Wikimedia’s content and data with additional structure will allow content reusers to adapt it more easily to their individual requirements, while allowing us to provide additional information, including attribution, licensing, and content quality, all in one place.
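To make the idea of additional structure concrete, the sketch below bundles one article's content with its attribution, licensing, and quality signals in a single machine-readable record. The `ArticleRecord` and `License` types, their field names, and any example values are hypothetical and do not describe the actual Enterprise API schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class License:
    name: str
    url: str


@dataclass
class ArticleRecord:
    """Hypothetical structured record combining content with its metadata."""
    project: str            # e.g. "enwiki"
    title: str
    revision_id: int
    url: str
    html: str               # rendered article body
    license: License
    contributors_url: str   # where the full attribution history can be found
    quality_flags: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses such as License.
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)
```

The point of the design is that a reuser receives attribution and licensing alongside the content itself, instead of reconstructing them from separate sources for each wiki.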
The Wikimedia Enterprise API will not directly affect Wikidata or the Wikidata Query Service (WDQS). Also, at this stage of development, the Enterprise API does not serve data from Wikidata or Wikimedia Commons. While WDQS is a significant service for bulk Wikidata reusers who use it to baseline their knowledge graphs, the current goals of the Enterprise API focus on streaming near-real-time content, which is a different service from WDQS. Eventually, some information that Enterprise API customers currently obtain via WDQS might be obtained via the Enterprise API instead, which may reduce their use of WDQS.
What will the Enterprise API users be charged for
Based on interviews with current users and potential customers, we are identifying what reusers need most, what they are willing to pay for, and what we are able to deliver. Most elements are likely to be provided as part of a commercial contract known as a Service Level Agreement (SLA) with Wikimedia Enterprise API users. This contract will cover things like the frequency of data updates, the reliability (uptime) of the service, and the availability of technical support.
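SLA uptime commitments are usually expressed as a percentage, and it helps to translate them into allowed downtime. The essay does not state any uptime target; the 99.9% and 99.99% figures and the 30-day period used below are assumptions chosen only for illustration, and `allowed_downtime_minutes` is a hypothetical helper.

```python
def allowed_downtime_minutes(uptime_percent: float, days: float = 30) -> float:
    """Maximum downtime (in minutes) permitted by an uptime target over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100
```

Over a 30-day period (43,200 minutes), a 99.9% target allows about 43.2 minutes of downtime, while 99.99% allows about 4.3 minutes. Each extra "nine" cuts the allowance tenfold.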
The format of the Wikimedia content provided to Wikimedia Enterprise API customers will be tailored more closely to the needs of large-scale usage. This could include grouping and filtering the outputs of multiple publicly available APIs into a single “package” of content, re-parsing Wikimedia content into a different output (for example, HTML as opposed to wikitext), and the duration of the contract to access the service.
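Both representations mentioned above are already available from public endpoints. The sketch below only builds request URLs for a page's raw wikitext (`index.php?action=raw`) and its rendered HTML (the REST API `page/html` route) and groups them per title into one "package". The helper names `page_sources` and `bundle` and the package format are hypothetical; a real client would also need rate limiting and a descriptive User-Agent.

```python
from urllib.parse import quote, urlencode


def page_sources(title: str, host: str = "en.wikipedia.org") -> dict[str, str]:
    """Return public URLs for one page's raw wikitext and rendered HTML."""
    db_key = title.strip().replace(" ", "_")  # MediaWiki URLs use underscores for spaces
    return {
        "wikitext": f"https://{host}/w/index.php?"
        + urlencode({"title": db_key, "action": "raw"}),
        # quote(..., safe="") also encodes "/" so subpage titles stay one path segment.
        "html": f"https://{host}/api/rest_v1/page/html/" + quote(db_key, safe=""),
    }


def bundle(titles: list[str], host: str = "en.wikipedia.org") -> dict[str, dict[str, str]]:
    """Group the per-page source URLs into a single hypothetical 'package' keyed by title."""
    return {title: page_sources(title, host) for title in titles}
```

For example, `page_sources("Ada Lovelace")["html"]` yields `https://en.wikipedia.org/api/rest_v1/page/html/Ada_Lovelace`.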
As this project is still at an early stage, the specific business model and the most appropriate pricing levels are still being investigated. Similarly, we are still exploring the most efficient and effective way for the Wikimedia Foundation Board to provide oversight as part of its standard governance responsibilities.
As we develop and adjust based on feedback, we have produced a list of Wikimedia Enterprise operating principles to guide its activities as a project, and help determine what its for-profit customers can do.
Free access for some users
The Enterprise API is designed to be used by high-volume for-profit customers but will allow for free access by some users, in cases where existing APIs do not meet specific needs. Other technology organizations offer similar kinds of free-access exemptions to their paid services; for example, GitHub Premium is available gratis to nonprofits.
The Wikimedia Enterprise team is working with Wikimedia Technical Engagement to add free community support through cloud services by June 2021. In the meantime, access to the Enterprise API services will be provided at no charge for use cases that both are highly related to the Wikimedia mission (especially where the use is non-commercial or open-access in nature) and require high-frequency data services not served by the existing APIs or database dumps. Academic research and mission-aligned non-profit services are two such potential cases. Volunteers and researchers who wish to request free access should contact the team directly.
How this will be structured legally
For now, with the approval of the Wikimedia Foundation Board, the Wikimedia Foundation has set up a single-member US limited liability company (LLC) to provide these services. The use of subsidiaries by mature non-profits is common, even within the FLOSS and free knowledge communities:
- Creative Commons owns a Canadian subsidiary for contractual purposes when operating in Canada.
- Mozilla Foundation has a wholly-owned for-profit subsidiary in the form of the Mozilla Corporation focused on revenue generation.
- The Linux Foundation has a number of wholly owned subsidiaries for a variety of purposes, including revenue generation through the for-profit provision of training services.
- Open Data Institute and Open Knowledge Foundation both use for-profit revenue to fund their activities.
Based on advice from the Foundation’s legal team, we settled on a simple LLC structure to allow us to test this service model. This approach should limit startup costs and avoid unnecessarily complex government reporting requirements during the early phases. The LLC structure will also insulate the Foundation from liabilities generated by the service. That said, the Foundation is still required under US law to publicly disclose the LLC’s revenues and expenses in our annual tax filings (the Foundation’s previous audited financial reports are publicly available). As the project matures, we may change the specific legal structure, for example by moving to a more permanent, legally robust structure in the long term, but we will always retain the same operating principles. The LLC will operate under the auspices of the Wikimedia Foundation; all of its staff will be Wikimedia Foundation staff, and it is ultimately subject to the governance of the Wikimedia Foundation Board of Trustees.
Regardless of the legal structure, all Wikimedia Enterprise revenue will unequivocally be used to support the Wikimedia mission—for example, to fund Wikimedia programs or help grow the Wikimedia Endowment.
To facilitate discussion of any of the issues raised in this essay, the Wikimedia Enterprise team will host regular open “office hours” meetings for at least the initial development phase, and will continue to be available through asynchronous communication channels (such as the project’s Meta talk page).