Talk:OWID Gadget

From Meta, a Wikimedia project coordination wiki

Disappointment is not the word

The WMF breaks things. The WMF doesn't unbreak those things. Volunteers ask for features. The WMF doesn't deliver any solution. Volunteers work on solutions. The WMF blocks those solutions. Volunteers solve the issue. The WMF tries to break those again. Disappointment is not the word for that. We are dying. This is why. -Theklan (talk) 19:55, 26 April 2024 (UTC)

I don't feel like the issue has been solved. There is a major difference between sending users to a third party and bringing a third party's tech into the movement. -- Sleyece (talk) 22:20, 26 April 2024 (UTC)
For sure, but the WMF doesn't want to bring third-party tech into the movement, while breaking all other options to make interactive graphs. They don't even have a plan for working on interactive content in the next fiscal year. Theklan (talk) 12:10, 27 April 2024 (UTC)
We can bring this tech into the movement, and that was the previous strategy. We might need to look at it again to go multilingual. Doc James (talk · contribs · email) 17:36, 27 April 2024 (UTC)
No, we can't. We have tried, but we can't. It's pretty clear that the WMF doesn't want interactivity. It's not disappointment, nor frustration. It's way more depressing. Theklan (talk) 18:26, 27 April 2024 (UTC)
If the WMF won't allow gadgets to come in, then we have to be allowed to build them internally. There are only two outcomes. -- Sleyece (talk) 19:28, 27 April 2024 (UTC)
Yes, but both are blocked now. Theklan (talk) 19:49, 27 April 2024 (UTC)
We have now built both methods. Hopefully one will be permitted :-) There are benefits and drawbacks to each, however, that will need to be balanced. Hopefully demands for perfection will not stop all progress. Doc James (talk · contribs · email) 20:18, 27 April 2024 (UTC)
They have answered clearly: they won't do one, and they won't permit the other. The only thing left is platform death. Theklan (talk) 21:15, 27 April 2024 (UTC)
I can code an internal, scalable gadget that uses cited material to generate a facsimile interactive graph of the third-party data. It'll take me until Monday, and I'll link to the Test Wiki gadget in a new section of this talk page. If anyone is pleased with my work, I'm running in the U4C election, and I'd be honored to potentially earn some votes by solving a systemic issue with my own coding skills; I'll build the gadget regardless of whether anyone votes for me, as long as it may help the movement. -- Sleyece (talk) 23:47, 27 April 2024 (UTC)

┌─────────────────────────────────┘
Excellent. Interested to see what you can build. These were our internal-ish graphs.[1] Doc James (talk · contribs · email) 02:42, 28 April 2024 (UTC)

Change of plans; as a precaution I won't be posting a build until the question period is closed for all U4C candidates. I'm not sure I can help until May 9th. -- Sleyece (talk) 02:29, 29 April 2024 (UTC)
I can start work on this gadget after the end of UTC today. -- Sleyece (talk) 21:55, 9 May 2024 (UTC)
I can start this gadget after the scrutinizing process is complete. -- Sleyece (talk) 03:18, 13 May 2024 (UTC)
I will start the work on this Gadget tomorrow. However, with the first U4C currently only having 7 members, a lot is up in the air. I will roll out the fix slowly and link to periodic milestones in a new section; I'll be inviting comments at each significant step in the ongoing process. -- Sleyece (talk) 22:53, 4 June 2024 (UTC)

This is not the first gadget to use a consent pop-up within our movement. This pathway was based on what has been done for years on Wikivoyage with respect to topographical overlays for maps. Doc James (talk · contribs · email) 20:12, 26 April 2024 (UTC)

Hi @Doc James. Two notes here. First, a consent pop-up is an important part of privacy compliance, but it's not the whole thing. Kartographer, and Wikivoyage maps more generally, are part of a service that the Foundation hosts and that staff monitor and maintain. It has policies and security and legal review associated with it. It's not that Wikimedia projects can never send people to third parties, it's that not all third parties are the same. Some of them may collect more data than just IPs, or pose a greater privacy and security risk to users. In that context, a consent pop-up alone doesn't do the trick: we have to review the terms of how the third party uses data and either agree to them as an organization or execute a custom contract with them to handle data more carefully if their terms aren't good enough. The consent pop-up then becomes one piece of properly informing users of how their data is being used by a third party service provider in line with the Foundation's privacy policy. If you just do the pop-up without the review, it doesn't get there as far as privacy compliance is concerned. Second, I'm actually not sure about the quality of the particular gadget here. It looks like that got into place circa 2016, which was a wildly different privacy and security landscape than 2024. Expectations were much lower in terms of hosting a website around the world. If anything, pointing to this example is a second reason why we need a policy now to make sure we're not creating privacy and security risk going forward and are handling issues like this consistently. Jrogers (WMF) (talk) 21:57, 26 April 2024 (UTC)
Then host the software. Theklan (talk) 22:01, 26 April 2024 (UTC)
Their privacy policy is here. We have the software hosted on the wmcloud[2] but from what I understand it needs to be on production servers and it needs to have a technical team within the WMF dedicated to it before it becomes usable on WP. Doc James (talk · contribs · email) 22:08, 26 April 2024 (UTC)
Hosting it would solve the graphs issue by and large. But the WMF approach here is to just let the projects die. Theklan (talk) 22:13, 26 April 2024 (UTC)
I certainly hope not. After mobile functionality, "multimedia and rich content" was the second highest request from our readers in a 2015 strategic consultation done by the WMF. Doc James (talk · contribs · email) 01:58, 27 April 2024 (UTC)
And still it is. It is part of our strategy, and what @MIskander-WMF and @SDeckelmann-WMF call a "multi-generational pursuit". Still, even if this is strategic, there are exactly zero bytes dedicated to it in the Annual Plan. (Wikimedia_Foundation_Annual_Plan/2024-2025/Goals/Infrastructure, Wikimedia Foundation Annual Plan/2024-2025/Product & Technology OKRs, Talk page where @MMiller (WMF) says that they don't have plans for improving interactive content next year).
It would be great to have the foundation recognizing that we solved one of the problems they have forgotten for months. Instead of that, what we have is a call to stop and refrain from improving our infrastructure, our users' experience, and the re-use of free knowledge material. Theklan (talk) 08:28, 27 April 2024 (UTC)
I think the better solution than asking for consent before loading graphs would be to just host the open source/content OWID stuff on WMF servers. I haven't followed the prior conversations, could you please briefly explain why this is not/cannot be done here or elsewhere? Maybe this issue could be included in what I proposed here for efficacy. Prototyperspective (talk) 11:56, 7 June 2024 (UTC)
You have part of the discussion here: task T324989. There's no path to interactivity, every option has been closed, banned or not funded. Theklan (talk) 12:37, 7 June 2024 (UTC)
The main objection there seems to be privacy risks due to "This mitigation is limited since a maintainer or any malicious actor with access to the WMCS instance could alter the code", but I wasn't suggesting this to be hosted on some WMCS instance but on official WMF-maintained Wikimedia servers. Has this option been considered and more devs drawn in to think about how to solve this? Other than that, see the link for what I'd suggest to be done to get more people aware and working on this alongside changing WMF attitude on this. Interactive OWID graphs on Wikipedia are a quite significant, report-worthy subject that is likely interesting to many.

Also, the concern raised by Sleyece isn't really constructive: just add info that the IP will be shared and that this has or may have privacy implications. "Because they are reading an Encyclopedia" is not an impactful con, as nearly nothing should be assumed about readers, not even that they can read; what matters instead is that this info is there. I think from there it would be much easier to get a WMF-hosted version that doesn't require consent before loading. Note that one could use this to get more people to sign up by adding an option to always load OWID charts without having to consent each time, so that people sign up to have the charts always load right away. Prototyperspective (talk) 10:52, 8 June 2024 (UTC)
Yes, I have met with multiple people within the WMF about them helping to support this. We were interested in moving to production servers but OWID graphs were not viewed as a priority last time I met with folks about a year ago. BTW with the current consent pop-up, you only need to agree once and then it remembers for a bit. Doc James (talk · contribs · email) 16:38, 8 June 2024 (UTC)
User:Prototyperspective, one needs staff involvement to host content on WMF production servers. We already have the material on wmcloud here. This was one of the prior approaches we tried. Doc James (talk · contribs · email) 07:21, 8 June 2024 (UTC)

┌─────────────────────────────────┘

I agree. Users explicitly consent to doing so when utilising these features (many of which I find incredibly useful), which nullifies any privacy concern. --SHB2000 (talk | contribs) 02:08, 27 April 2024 (UTC)
I don't think we would be having this conversation if all privacy concerns were nullified with a pop-up. How do you know the user understands the consent pop-up? -- Sleyece (talk) 03:46, 27 April 2024 (UTC)
Because they are reading an Encyclopedia. Theklan (talk) 05:23, 27 April 2024 (UTC)
People routinely "consent" to absurd terms (check the fine print of Google's). There is no realistic chance for a user to make an informed decision to agree to the terms of most websites, other than just deciding to surrender their rights. I very much like WMF's stance, to defend their users, and that's a major reason for me to contribute to these projects. And, unlike ordinary users, WMF has the resources and the power to choose what terms to agree with. –LPfi (talk) 09:53, 27 April 2024 (UTC)
The WMF is not "protecting their users"; it is forbidding their users from getting knowledge in new and interactive ways. Theklan (talk) 11:32, 27 April 2024 (UTC)
LPfi, regarding "other than just deciding to surrender their rights", there is always the option to just hit disagree and then ignore the feature in question. --SHB2000 (talk | contribs) 07:29, 28 April 2024 (UTC)
The same way WMF is "forbidding their users to get knowledge in new and interactive ways" by not using some other useful services. We choose in what way we help our users, and we do that in a way that respects our principles, which include respecting privacy.
There is no such thing as making a well-informed decision to agree to e.g. the Google terms. The only way to agree is "whatever, do as you please". The alternative is to disagree, as you say, but that's not what we are discussing.
LPfi (talk) 08:04, 2 May 2024 (UTC)

Jrogers: You write "we need a policy now to make sure we're not creating privacy and security risk going forward" -- of course, we could use a consistent policy. But please don't use this as a reason to ratchet up the iron law of increasing bureaucracy. A very salient risk is that people stop using our sites and go elsewhere because we no longer meet normal expectations of usability. This is also a wildly different web in terms of access to interactive knowledge. When we don't have OWID-level visuals, even I send people directly to OWID for an overview of a topic: which exposes them to many of the same putative risks and never introduces them to a wiki at all.

As we intend to be a pillar of support for the whole free knowledge ecosystem, our benchmark should be the security of that whole ecosystem, not 'security of people who come to our sites, even if that means having fewer people come to our sites'. In the latter case, the surest way to minimize security and legal risks is to stop having readers altogether. –SJ talk  03:52, 28 April 2024 (UTC)

We want to work with the community to strike the right balance between contributions and innovation while making sure there are checks and balances in place to ensure this happens in a way that is safe and legal. Our current request was to PAUSE on the further rollout of the OWID gadget while we could consider privacy, security, legal and performance issues. As we assess, we are taking into account the benefit this service would provide to readers and current interest among editors, which is why we are looking into whether there are options that might allow this work to continue in a safe and sustainable manner. ACooper-WMF (talk) 12:08, 3 May 2024 (UTC)

Risks

How is the risk any different than having a reference for a graph that includes a url linking to OWID? When one clicks on such a url it brings you to OWID and shares your IP address and machine details with them. We have millions of references that include urls without warnings. Doc James (talk · contribs · email) 20:12, 26 April 2024 (UTC)

Clicking an external link is at your own risk. The gadget is a potential threat to the movement because OWID is in control of tech embedded on our turf. A simple consent form is extremely inadequate risk mitigation. -- Sleyece (talk) 22:23, 26 April 2024 (UTC)
The privacy and security teams are currently conducting an in-depth look into the design and code of the OWID gadget. A lot of care is taken to ensure that our users are protected and this is one reason why we have dedicated security and privacy specialists who have in-depth knowledge of web technologies and vulnerabilities. Even when you click on an external link in a Wikipedia, that experience has been carefully designed to ensure you remain as protected as possible with modern browsers. In a world of third-party content advertising where there are many threats to privacy and the disclosure of information, we want to be a place that people can trust to be different and to protect your personal information and data. This is especially important to some users where the disclosure of information to third parties such as what articles they are reading and interacting with could be a safety concern. There are specific legal, privacy and security risks when content is embedded in a browser IFrame compared to an external link especially as this is new code and does not necessarily adopt the same protections that we have built in elsewhere in the platform. ACooper-WMF (talk) 10:37, 2 May 2024 (UTC)
And that's precisely why the WMF should be adding functionalities and not trying to restrict novelty. There's no solution if there are staff hours used to evaluate improvements instead of developing them. Theklan (talk) 12:10, 2 May 2024 (UTC)

may be required to disallow loading content from ourworldindata.org

The whole point of this approach is that there is no content loaded from ourworldindata.org, only a static image loaded from commons. As Doc James says there is a reference link to the original source of that image and a warning about viewing it, which is actually more risk averse than the normal practice of allowing exit without warning to the reference url, which has no sandboxing. Tim-moody (talk) 21:29, 26 April 2024 (UTC)

The static images from commons are a welcome step as they mitigate the privacy/legal/security concerns, but those concerns are introduced at the point ourworldindata.org is loaded within the frame that is embedded within the page, despite the click-through warning, and that has been the target of our ongoing discussions. If there was a use-case which included only the static image, and not the click-through with the embedded frame, that would address the current concerns, but it seems that trades off useful functionality as well, so we are looking into the embedding aspects in detail to see what the risks are and what might be possible to do about them. ACooper-WMF (talk) 10:45, 17 May 2024 (UTC)

OWID Gadget vs Graph extension

To be clear the Graph extension relied on third party libraries and was, I would guess, at least two orders of magnitude greater in scope and complexity than the OWID Gadget, which I expect is less than 100 lines of code. This code has already had three contributors who agree to collaborate on a central source. So reviewing this code should not prove difficult, and it speaks well to the subject of maintenance. Tim-moody (talk) 21:38, 26 April 2024 (UTC)

In terms of coding complexity, the concern is less that the gadget is difficult to code and more about how it would be designed and maintained. As I shared in my update, the OWID gadget raises broader security, privacy, and legal concerns around future globally deployed gadgets. These potential risks, which we’re working to understand better, also mean that maintenance is important to plan for if and when gadget creators and users drift away. In the case of the OWID gadget specifically, we hope to have a clearer picture of the risks soon which we can share with all of you here. ACooper-WMF (talk) 12:09, 3 May 2024 (UTC)

Not a workaround

I realize that this is frustrating for people here who have been working on OWID and are excited about it as a work around while graphs are disabled.

I want to clarify that this is not a work around while graphs are disabled. This is a solution. Not a workaround. The so-called solution for graphs is light years behind this. If it is solved in the future (it has been broken for a year, and will take at least one more year), it will be non-interactive images, and only lines. If the WMF decides to go once more against our strategic goals and close the OWID solution, the interactivity it brings won't come with the proposal mentioned on the graphs discussion. It won't even be near that. Theklan (talk) 21:59, 26 April 2024 (UTC)

I agree, this is a strategic discussion. We will achieve the sum of all human knowledge more rapidly by collaborating with like-minded organizations than just going it alone. We need to balance our ideals with the reality of the world around us. Doc James (talk · contribs · email) 22:11, 26 April 2024 (UTC)
The sum of all knowledge will be pay-walled by for-profit third parties if we're not careful. -- Sleyece (talk) 00:25, 28 April 2024 (UTC)

Three ideas

Hi! As one of the developers of the OWID gadget, I must admit that after sleeping over the initial setback, I'm glad to see how the security team cares for our users, since I'm also one of them, and that means the care extends to me. Thanks! <3

I have three ideas that may help handle, if not the entire issue of gadgets loading external content, at least the concerns raised for this particular OWID gadget:

  • Add an Attention or Warning title to the consent popup, to make it even more conspicuous.
  • Modify the code to remove the cookie that stores consent, so that users have to give consent every time they open the gadget.
  • Replace the play button overlaid on the image with a standard Codex button in the image caption, with a label like "Open chart from Our World in Data" or similar, so that users may anticipate what will happen even before they see the consent popup.

I think it would be awesome to reach a consensus about this gadget (and the broader issue) that somehow allows it, since it could really help our shared vision to bring all knowledge to all people! Sophivorus (talk) 11:21, 27 April 2024 (UTC)

Thanks for the thoughtful feedback and the understanding. We are taking this into consideration in our planning and discussions. ACooper-WMF (talk) 11:00, 2 May 2024 (UTC)

Not a security issue but still a growing concern about OWID CC licensing policy

Hello, this is not a security issue, but I want to bring up something that I've realized with OWID. It's very good to see an organization releasing their graphs under a CC license compatible with Wikimedia projects, but it has come to my attention that they are normally just compiling other data from third-party sources and then re-licensing it. Example: OWID has compiled these charts on pesticide use, where they claim that the source for this is FAO stats "with major processing by OWID". However, if you go to the original source, FAO STATS, the licensing policy for their data is CC BY NC SA (as described here; I think the graphs are dynamic but it's just two steps of doing this same search - "total pesticide use FAO"). In certain cases (and countries), compiling data might not be protected under copyright, but some of these data might indeed have database rights. This means that if they did any "major processing" to the data provided by FAO, then they should use a CC BY NC SA, not a CC BY. To me, their copyright clearance process is less than clear and I think downstream re-users (including Wikipedia) might need to be extremely careful on how they can actually use and re-use these graphs. (On a personal note, I don't think FAO's legal department will ever go after someone for not respecting their CC licensing statement, but that's an entirely different discussion).

This is not to say that it's cool to break the hearts of people that have been working hard on bringing interactive graphs (and I think that issues raised in most of these threads bear some truth), but these copyright issues keep on appearing when re-using third-party content where you don't have control of the copyright clearance process. My two cents. Scann (talk) 19:31, 27 April 2024 (UTC)

Yes, so the question is: did they get the FAO to drop the NC for their reuse, or have they just used the FAO's facts to create their own separate dataset, with facts themselves not being copyrightable? Have asked... A bigger issue is the Flickr upload tools we have, where paid editors upload works of others to Flickr, claim to release them under an open license, and then move them to Commons. Doc James (talk · contribs · email) 20:17, 27 April 2024 (UTC)
Data cannot be copyrighted in the USA so I don't think FAO's license matters. Bawolff (talk) 08:57, 28 April 2024 (UTC)
Hello Bawolff, no, it actually does matter because that same criterion should then be applied to all the things that can be fair-used in the US. While in general I'm in favor of adopting some more flexible criteria in certain cases when it comes to copyright, you might create a problem for downstream reusers outside of the US. Scann (talk) 03:15, 1 May 2024 (UTC)
I suppose some legal expertise is needed. Does the UK (where it appears OWID is based) have database rights? If OWID can legally use the data to create a public CC-BY-licensed database, then I assume reusers are allowed to use their licence, not underlying licences. Having to check sources of underlying data every time you include public data in your works seems unworkable. –LPfi (talk) 08:24, 2 May 2024 (UTC)
@Scann Are you also opposed to using PD-Art images? Because it is basically the same argument we use for them. Bawolff (talk) 18:20, 2 May 2024 (UTC)
Hi all, replying here. Database rights tend to always be a difficult question because they are regionally varied and therefore pose different risks to different people depending on where they are located. They are also a bit different than PD-art, which is about what does and doesn't receive copyright in different countries, whereas database rights are a separate right from copyright and can be enforced separately even where no copyright exists. Looking at OWID, we have a couple things. First, they themselves aren't helpful here. Their legal disclaimer explicitly disclaims anything about infringement, so they're not promising to have done any sort of checking or confirmation that what they have is okay. That puts the burden on us (using "us" broadly here to mean community, Foundation, or reusers, whoever might need to do the review or be best positioned to do it). Second, from the Foundation's perspective, this kind of risk is usually okay for us to still host something. Because the US doesn't have database rights and the Foundation is protected by the Digital Millennium Copyright Act, we could accept OWID's representation that the material is CC licensed until such time as someone contacted us with enough information to do a takedown. If that happened, we'd be obligated to remove under our normal DMCA policy. Third, from a reuser perspective, there might be some risk here. The UK does have database rights, which further vary in how broadly they apply based on their timing with respect to Brexit. See https://www.gov.uk/guidance/sui-generis-database-rights. The risk of violating database rights varies and depends on how much of someone else's database was used relative to the whole, so this would most likely be a case by case risk for each different dataset. -Jrogers (WMF) (talk) 17:38, 3 May 2024 (UTC)
Some generalities can be said, on the ground of which case-by-case assessment can be done. It seems that the OWID does not need to honour EEA database rights that have emerged since 2021. Does "laundering" data through the OWID (in the UK) free a reuser from respecting database rights of an original EU data set? What about the cases where the data is from the US (which does not recognize database rights) but the reuser is in the EEA? Can the US owner sue the EU user in an EU court? Are reusers obligated to determine the origins of data in OWID works? –LPfi (talk) 07:03, 4 May 2024 (UTC)
In any case, we are downloading those data and uploading them to Commons. We have been doing that for some years now. If there's any copyright issue, we are saying where the data comes from. Theklan (talk) 11:09, 4 May 2024 (UTC)

This page gets the issues at stake exactly backwards

This is a strangely bureaucratic and risk-averse take on a loving exercise in trying to find a way to integrate valuable knowledge at scale from a partner community into our projects. Can we please stop stepping on our own feet in this way?

Let's start from the end. Are there other concerns or risks not listed above? What else should we be thinking about?

The first thing we should be thinking about is the benefits here to the sum of all knowledge. That's why we exist. Sharing knowledge is a risky business, and we exist to find ways to do it all the same. We should take care not to develop a culture that forgets this and tries to simply minimize risk by doing as little as possible.

Benefits

OWID is an extraordinary, award-winning source of the best visualizations in the world for a certain class of knowledge. They have some 10,000 charts providing overviews of available data on important current issues. It is our mission to empower communities to help that knowledge reach people, including by collaborating with them, integrating their work into ours, and vice-versa.

Where communities are trying to do this and failing, we should support them, offer improvements, show ways to succeed. Including in scalable, secure ways; but also in temporary jury-rigged ways until scalable ways exist.

The benefits to reaching a new network with strong visualization and data skills, and an interest in holistic overviews about the state of the world, are significant. And the benefits of supporting the work of dozens of our existing community groups for which this would improve upon and replace their current hand-updated statistics, in time and joy and accuracy, are just as great. If we had a way to directly integrate OWID data on COVID for instance, we could have avoided duplicating thousands of hours of work in recent years.

Risks 2

Primary risks are that we fail to become infrastructure of any sort for the free knowledge ecosystem, become known as a place where creative ways to share free knowledge go to get tied up in red tape, and become less useful than alternatives to readers, editors, and reusers. Two of the greatest self-imposed obstacles to the adoption of free-software were the weaknesses of its user interfaces, and the unfriendly dogmatism of the FSF during its heyday. Being dogmatic about privacy and security in a way that preemptively blocks experiments in better interfaces to knowledge, rather than helping make such experiments work, combines a bit of both.

  1. Partners who develop other world-class free knowledge may stop trying to collaborate with us, and come to see us as a time sink that is hard to work with and in the end cannot help advance the breadth or reach of free knowledge unless it fits into a narrow and increasingly old-fashioned box.
  2. New sources of knowledge, new developers, and new ideas for presenting knowledge will stop thinking of us as a natural place to invest time and energy, and stop thinking of our approach as the gold standard for sharing knowledge. We have already weakened once-strong relationships with other technical communities which are pillars of free knowledge by being hard to work with. Here we have no existing relationship with OWID, but this could be the beginning of one, as we have an active community of interest. In a healthy ecosystem we would support 100s of these. Asking 'how can we do this well + efficiently at scale' is a great question, but framing it as 'do we want to do this / is this feasible and a good use of resources' is not.
  3. The above can naturally result in our projects becoming less relevant, new generations working on new projects which are as open to creative approaches as we once were, not bound by increasingly arcane restrictions. To the extent that maintaining such restrictions is in line with our mission, it is on us to make that work for our community members and for their collaborations.

We are already on a slippery slope to becoming insular and isolated, and I would be interested to see any metrics that indicate we are becoming infrastructure for anyone other than our own wiki contributors. But we don't have staff or community groups dedicated directly to ameliorating risks around this sort of knowledge expansion, partnership, and community building. So while these seem to me the clear primary risks, we mainly talk about secondary risks with named associated tasks.

Secondary risks include increasing internal demotivation and friction within our existing communities, risks to privacy and security of readers and editors, reduced article quality, and less maintainable workflows.

  1. Internal demotivation: the desire to include OWID data is not new. The extensive and thoughtful efforts of a dozen people to make this possible have not been well summarized or honored on this page, nor has the importance of the end goal been acknowledged. The fact that many hours of staff time are going into risk assessment while none are going into building a better integration may reflect where we have spare capacity more than the best way to achieve our mission.
  2. Internal friction: recently there has been good momentum around this initiative, which from the perspective of advancing public knowledge is fantastic. This momentum includes specific tasks and efforts currently planned for the Wikimedia Hackathon, where many interested people will be in person. A request to pause all work, without suggesting a new direction to focus technical energies, for 'two weeks' covering the week just before and after the hackathon, is perfectly timed to kill that momentum.
  3. Privacy and security risks: Mainly the relative increased risk of cross-project integration, over the combination of accessing both projects separately. These concerns seem to be the primary source of the roadblocks recommended by staff, and the threat to 'disallow loading content from [owid]'.
  4. If all integration is blocked, other risks include having outdated information, or separate silos for parallel libraries and workflows, which would each be less robustly maintained than shared infrastructure.

Thanks ACooper and Quiddity for thinking about this, and for developing your thoughts here in public. –SJ talk  06:17, 28 April 2024 (UTC)Reply

Hi all - I’m Mark Bergsma and I’ve been involved in the TechOps/Site Reliability Engineering (SRE) space at Wikimedia as a technical volunteer and then staff member since 2004. For the past ten years my role has been to manage the Site Reliability Engineering (SRE) teams, and currently I’m Vice President of SRE & Security at the Wikimedia Foundation (WMF).
Thank you, SJ and others, for engaging on these topics, and for a reminder of what's also important: of what we are here to do. Projects like ours rely on a certain can-do attitude to come to life and for innovation. I remember that being on full display in the early days of the project, when it was the only way we could get things done, and I have internally argued for it and the importance of preserving it in front of all staff at WMF. This OWID project, driven by community members, is also an example of that, and in many ways it's lovely to see.
In thinking about how Wikimedia has worked from the beginning, it’s also important that we talk more openly about what has changed. In short, as the size, scale, and global prominence of Wikimedia projects have grown, so have our security and maintenance needs. The Internet as a whole has evolved as well, and plays a bigger role in people’s lives, positively and negatively, raising the stakes. Some solutions that were acceptable (or even the norm!) 20 years ago aren’t anymore today, as the potential negative impact on people’s wellbeing is more severe, or more likely. As noted above, I do understand your points and concerns about limiting momentum, innovation, collaboration with partners, progress towards our mission - but at the same time I feel that dismissing security and privacy concerns as less important and not worthy of careful consideration is not entirely fair. These aren’t necessarily of equal concern to everyone, but several concrete examples have already been mentioned of what can happen if we simply ignore them, with potentially very serious consequences for some users, and/or the projects as well. The bar for privacy is now higher, the security challenges are bigger, all while the Internet also depends on us more. While the role of Wikimedia volunteers remains as essential as ever, it has become more complicated.
This means that for considering projects, like OWID embedding, the question we must ask is not “Does this project have benefits?” (yes, it does!) but “Is this project the best use of our time and resources over the long term?”, and “How can we minimize any downsides?” The development of new tools and gadgets presents both direct risks, such as the security challenges we’ve outlined above, and opportunity costs for what other things we forgo doing when we take on new work. This is a very real challenge in our open source movement, and at WMF we face that every day. As I recently wrote, we already spend over half of WMF engineering staff time on supporting and sustaining existing technical projects, while the footprint of new features, services, and code development - many of them created by volunteers without a long term plan for maintenance - grows faster than our staffing resources to support them. These considerations apply to both projects driven by WMF staff as well as volunteers. The tradeoff I’ve seen mentioned here and elsewhere (“staff time spent on reviewing volunteer code instead of developing it themselves”) is therefore not a thing - if only it was. :)
Beyond our planned work on graphs, interactive content (referring to visuals like 3D images or timelines) will indeed not be a focus of the Wikimedia Foundation’s next annual plan. This is because we believe that there are other efforts that are more impactful to spend time on right now, also including things like sustaining and evolving MediaWiki. One notable example is a set of experiments around how existing Wikipedia content can be remixed and presented in new ways - pursuing new generations similarly to the “interactive content” concept, but with substantially less effort required.
Volunteer technical contributions and open source workflows are the lifeblood of the Wikimedia platforms. This leaves us with some important questions: How can volunteers and WMF staff navigate the challenges of scale, security, and sustainability together? To me, the way forward is to continuously evolve how Foundation staff and Wikimedia volunteers collaborate and plan together around meaningful, long term work that makes space for volunteer creativity and contributions alongside these considerations. We believe that our revamped wishlist process will be one way, and we’d love more volunteer developer participation there as well. What other steps could we take to better collaborate in a sustainable way? Mark Bergsma (WMF) (talk) 15:21, 6 June 2024 (UTC)Reply
Come on, please. We deserve a better answer than this. Theklan (talk) 15:43, 6 June 2024 (UTC)Reply
It was a thorough answer; you didn't understand it. -- Sleyece (talk) 18:36, 6 June 2024 (UTC)Reply
So my fix needs to be quantum software? I was planning on that anyway. I refuse to make a derivative bot that does ChatGPT calls, which would be its own problem. Also, I don't want the Foundation's money, not for this, and I never asked for it. You don't need to manually check my fix. It will be self-replicating and send reports of its own work to whoever has rights and wants them. -- Sleyece (talk) 18:35, 6 June 2024 (UTC)Reply

Not sure what you mean? No one is "dismissing security [or] privacy concerns"; however, these are impossible to eliminate completely. For example, URLs on WP may lead to viruses, etc. If we want to improve privacy, could we not simply create an intermediary layer which pulls the content from OWID and then have that intermediary layer serve it to our readers? If we are worried about this appearing under a Wikipedia URL, we could place it under a different URL to emphasize the risks. Or are you simply saying "no"? Best Doc James (talk · contribs · email) 19:02, 6 June 2024 (UTC)Reply

What you're basically asking for would require placing an eggshell of encryption around commons data before it's fed to the reader. That takes a ton of server space to pull off; I'm guessing the notebook would be $4/hr, 24/7 and 365. It would be impossible without money, and Mark has said we will get no money for this fix even if we need some. -- Sleyece (talk) 19:22, 6 June 2024 (UTC)Reply
Hi Mark, thanks for weighing in. Our technical ops are some of the best-scaled and -quantified parts of the wikiverse, so I particularly appreciate your assessment. However please note that these technical assessments of requests to enable new knowledge formats rarely try to quantify advances to knowledge breadth or depth, or expansions to our communities of practice. This is where I see the greatest gaps between how {editors and reusers} and {technical planners} currently see proposals and opportunities. (Also why we currently accept so few file formats...)
My comment above was getting at this sort of quantitative question: what benefits does integration have, not a binary "are there any benefits?"
Specifically: OWID is a well-defined and constrained visualization library, used by 100 million people a year, maintained by an organization that would make a worthy partner. Learning how to integrate well with their updates and drive visibility and support to their work strengthens our shared ecosystem. This would also support and streamline the work of many highly active editors who currently maintain those data connections by hand. When a proposal has this sort of immediate impact and built-in support, it should ideally qualify for consideration on its own merits, rather than as a messy theoretical tradeoff with all other possible priorities.
A less self-sustaining aspect of this discussion is about the benefits to readers and to our mission from editors finding new uses for visualization. That's also important but harder to quantify, and might be discussed alongside other technical and aesthetic wishes.
You mention "dismissing security and privacy concerns as less important and not worthy of careful consideration" -- as Doc James said, I haven't seen anyone suggesting that in this instance. That seems an unfair characterization. The years-long discussion about how to integrate OWID charts has been characterized by people asking how best to do this in a secure way. There are obvious solutions that are specific to OWID; these keep being deprecated in favor of more general ideas that have not materialized.
Please consider separating discussions about cleanly scoped ideas with quantified benefits and maintenance plans, from those about interesting new features with unknown impact, or finding solutions to unsolved problems. While the latter two might share a single global wishlist with capped investment and uncertain timelines, the former are rare and worth catching. They could be supported by a separate opportunity fund dedicated to being able to act quickly when they appear, as long as they cross some threshold for impact. "Support" can range from something as simple as "helpful, prompt, internally consistent technical recommendations for self-maintained community efforts" to "proposing a checklist of tasks to advance those recommendations" to actually knocking off or owning items on that checklist. –SJ talk  21:29, 12 June 2024 (UTC) 22:58, 18 June 2024 (UTC)Reply

Similar conversation 10 years ago

We had a similar conversation in 2014/2015 regarding external map layers, where the solution was to "anonymize (proxy) it through our servers, and/or show a privacy warning". Further discussion was here. User:Slaporte (WMF) among others did a legal review in July 2015 at the request of Lila Tretikov. Doc James (talk · contribs · email) 15:34, 28 April 2024 (UTC)Reply

Thanks, we are bearing this in mind in looking into what is the best approach here and also the most expeditious use of resources. We are looking into this as a potential option. ACooper-WMF (talk) 11:02, 2 May 2024 (UTC)Reply

Why not copy OWID content to our own platform?

Why pull content from OWID when that incurs risks of users leaving our platform's security?

The content is open, right? I see the concerns about dubious provenance but we can work through that. Can we mirror their openly-licensed, copyright compatible data here, then serve this content entirely from the wiki platform?

Bluerasberry (talk) 16:41, 17 May 2024 (UTC)Reply

Yes, it can be done. I'll do it later, but not yet, per above. -- Sleyece (talk) 15:39, 18 May 2024 (UTC)Reply
The data is mirrored completely at Commons. The WMF didn't accept the software mirroring approach. Theklan (talk) 16:03, 18 May 2024 (UTC)Reply
Yeah, no, the WMF isn't going to accept copy-paste data. It's going to have to be a little more complex. -- Sleyece (talk) 17:27, 18 May 2024 (UTC)Reply
The data is available in Commons. Theklan (talk) 19:38, 18 May 2024 (UTC)Reply
Yes, I've seen the data. -- Sleyece (talk) 19:40, 18 May 2024 (UTC)Reply

┌─────────────────────────────────┘
We also mirrored it all here and you can see this method functioning here. Doc James (talk · contribs · email) 01:54, 21 May 2024 (UTC)Reply

Are there any pointers to where you discussed with WMF the mirroring approach in which data is loaded from Wikimedia Commons? (At least https://mdwiki.org/wiki/WikiProjectMed_talk:OWID/Archive_1 loads data from https://owidm.wmcloud.org. ) --Zache (talk) 18:21, 6 June 2024 (UTC)Reply
Not sure what you mean? Doc James (talk · contribs · email) 20:03, 6 June 2024 (UTC)Reply

Mid-May update

Thanks @ACooper-WMF for the update. As I understand, you are proposing to just shut down the OWID gadget. Am I right? Theklan (talk) 19:45, 18 May 2024 (UTC)Reply

Why are you tagging WMF to ask if they'll do a full shut down? It's very dismissive of my intent to fix the gadget; it ignores other users interested in a solution. -- Sleyece (talk) 15:15, 19 May 2024 (UTC)Reply
I haven't seen any intent yet. Anyway, I suspect that any intent will be dismissed the same way, as any innovation is halted by the WMF. Theklan (talk) 07:58, 20 May 2024 (UTC)Reply
There's a lot of irons in the fire currently; I encourage you to have patience. Look at the big picture. -- Sleyece (talk) 01:59, 21 May 2024 (UTC)Reply
I have been working on this for around 10 years... No delusions that things move quickly :-) Doc James (talk · contribs · email) 19:35, 21 May 2024 (UTC)Reply
The big picture is even worse. Theklan (talk) 06:26, 22 May 2024 (UTC)Reply

@ACooper-WMF: re "Disclosure of personally identifiable information to OWID such as articles a user is reading, their unique IP address, and location. We now have identified targeted code changes that would reduce some of the data sharing..." - Can you be more specific about what this is referring to? I don't see any edits to the gadget, public proposals for changes or discussion in public phab tasks (presumably private discussion happened on T364033, which I of course do not have access to). If there are ways to make things better, I think we should talk about that publicly. Even if the end result ends up not being sufficient for use on Wikimedia, I still think it is valuable to be public about potential improvements so that other re-users who have less strict requirements can use them. That said, I find it hard to believe there is anything that can be done to prevent the sharing of IP address and broad location (perhaps the allow attribute could prevent fine-grained geo-location). Is the concern about leaking what article the user is reading based solely on the idea that OWID could scrape all Wikipedia articles and figure out which graphs are used on which articles, and connect the dots that a user iframing a specific graph can only come from a specific set of articles? Or is it based on something else? Bawolff (talk) 11:31, 25 May 2024 (UTC)Reply
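
To make the iframe-attribute idea in the comment above concrete, here is a minimal Python sketch that renders embed markup with restrictive attributes. The function name `build_owid_iframe` and the choice of `sandbox="allow-scripts"` are assumptions for illustration, not the gadget's actual code, and whether OWID charts run correctly under that sandbox is untested. None of these attributes hides the reader's IP address, since the browser still connects to the third-party host directly.

```python
from html import escape


def build_owid_iframe(chart_url: str) -> str:
    """Render iframe markup with restrictive attributes (hypothetical helper)."""
    attrs = {
        "src": chart_url,
        # Permissions Policy: explicitly deny the Geolocation API to the frame.
        "allow": "geolocation 'none'",
        # Do not send the embedding article's URL in the Referer header.
        "referrerpolicy": "no-referrer",
        # Assumption: scripts are needed for interactivity; every other
        # sandbox permission stays off.
        "sandbox": "allow-scripts",
        # Defer the third-party request until the frame is near the viewport.
        "loading": "lazy",
    }
    # Escape every value so a crafted URL cannot break out of the attribute.
    rendered = " ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attrs.items()
    )
    return f"<iframe {rendered}></iframe>"
```

With `referrerpolicy="no-referrer"` the embedded host no longer receives the article URL in the request itself, which leaves only the inference route described above (matching a graph to the set of articles that use it).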

One could prevent sharing IP information with a two-step process. The user requests the content from WMF servers. The WMF servers get the content from OWID. And then the WMF servers give it to the user. We had this basically by hosting the content on the wmcloud servers, but that was insufficient. Doc James (talk · contribs · email) 16:35, 29 May 2024 (UTC)Reply
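
The two-step process described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the wmcloud service's code: the class name `CachingProxy`, the slug pattern, the one-day cache lifetime, and the upstream base URL supplied by the caller are all hypothetical.

```python
import re
import time
import urllib.request
from typing import Callable, Dict, Tuple

# Assumption: chart identifiers are simple lowercase slugs.
SLUG_PATTERN = re.compile(r"^[a-z0-9-]+$")


def fetch_upstream(url: str) -> bytes:
    """Fetch a URL from the proxy server; the reader's browser never contacts it."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "owid-proxy-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


class CachingProxy:
    """Serve upstream content from a local cache, refetching after a TTL."""

    def __init__(
        self,
        base_url: str,
        fetch: Callable[[str], bytes] = fetch_upstream,
        ttl_seconds: float = 86400.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._base_url = base_url
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bytes]] = {}

    def get(self, slug: str) -> bytes:
        # Reject anything that is not a plain slug, so the proxy cannot be
        # steered to arbitrary upstream paths.
        if not SLUG_PATTERN.match(slug):
            raise ValueError(f"invalid chart slug: {slug!r}")
        now = self._clock()
        cached = self._cache.get(slug)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        body = self._fetch(self._base_url + slug)
        self._cache[slug] = (now, body)
        return body
```

Because only the proxy talks to the upstream host, that host sees the proxy's address rather than the reader's, and the cache means most reader requests cause no upstream traffic at all. The maintenance and reliability questions raised elsewhere in this thread still apply to whoever runs such a service.
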
@Bawolff: do you know if we have any current proxy wrappers hosted on WM servers for content regularly requested from outside servers?
As to phishing -- can we make "viewing OWID data on a wiki page" safer than first viewing the wiki page, then viewing the OWID page? –SJ talk  20:09, 6 June 2024 (UTC)Reply
WMF just declared up the page the fix has to be no budget and from scratch without taxing Foundation resources. No money; no server space. -- Sleyece (talk) 21:09, 6 June 2024 (UTC)Reply
The upload by url feature, citoid, content translation and openstreetmap (kartographer) are examples of features that request content from the outside world and do something with it on behalf of the user. While doing something along those lines is the most proper solution to implementing what is desired here, there are some reasonable concerns that stand in the way:
  • this is a bunch of moving parts that someone has to take care of. Not just create once, but take care of over the long term. While in principle these moving parts could be taken care of by volunteers, practically speaking it seems doubtful. Depending on the details of what is proposed (there are many variations on this idea with different levels of complexity, ranging from just straight proxying to having a significant amount of pre-processing) there is probably only a limited pool of people who have demonstrated they are qualified to do so via a track record of doing so at the level of WMF (none of whom are involved in this discussion). Even if that was worked out, there is the question of how long such parties would be interested. All this taken together probably makes the proposal to WMF sound less like, "we will build this feature", and more like, "if we do the first step, will you do the rest?", which isn't the most compelling of requests.
  • this suddenly adds a new thing that page rendering depends on. If gadgets break, the gadget author gets blamed. When an extension breaks, WMF gets blamed. The expectations around quality and reliability for extensions are much higher than for gadgets. Worst of all, a lot of this is outside WMF control. If 5 years from now, OWID suddenly changes their website breaking the thing, WMF gets blamed, etc
All of these things have ways of mitigating risk. A lot of it comes down to a cost vs benefit question. Which is the other factor here that is not spoken about all that much - how much would Wikipedians really benefit from embedded OWID graphs? How much of a benefit is there in embedding vs just linking? How many articles is this relevant to?
All this is to say, if we really want to push the server side version forward, then this should be treated in a more "corporate way". Create a joint development proposal. Be really clear on what the benefits to Wikimedia reasonably would be (and proposed metrics to verify). Be really clear on what parts WMF would be responsible for, and what parts non-WMF actors would be responsible for. Remove the uncertainty from the equation. I'm not sure how WMF sees all this right now, but they could very reasonably see it as a request that WMF take on unclear & unspecified responsibilities for very little gain, which sounds like a bad deal. The trick is to make WMF see the value while reassuring them that the extra work they would be signing up for is both minimal, but more importantly predictable (unknowns are scary).
Bawolff (talk) 22:02, 6 June 2024 (UTC)Reply
I'll take all the blame if my fix fails. It doesn't bother me because it won't fail. I don't need Foundation server space, and WMF can take all the praise when it works long term with little to no maintenance internally. I do however need the goalpost to stop shifting and for there to be some sort of agreement on what it's supposed to be. -- Sleyece (talk) 23:44, 6 June 2024 (UTC)Reply
I'm not sure who you see as the audience for this comment, but if you and Doc James agree on a fix it can be seen and used in action on other MediaWiki sites like mdwiki, as a way to get real feedback. –SJ talk  21:58, 12 June 2024 (UTC)Reply
@Bawolff: They have clearly stated that they are not going to work on anything that adds interactivity. Not now, not in the next year. The idea is to let the platforms die, and meanwhile try to justify it by claiming they are trying to improve TikTok or Flickr. Theklan (talk) 06:00, 7 June 2024 (UTC)Reply
Are we even allowed to fix the gadget if we do all the work and take up no foundation server space? If we give them everything in exchange for nothing will they still shut it down? -- Sleyece (talk) 14:21, 7 June 2024 (UTC)Reply
If you assume everyone is evil as a starting point, eventually it becomes a self-fulfilling prophecy and you will find the evil you seek. Bawolff (talk) 15:46, 7 June 2024 (UTC)Reply
The user's frustration is somewhat warranted, if overly verbose. The Foundation just declared their focus will be to spend the disposable income on adding a button on some social media platforms they don't even have a contract with. So, the budget through 2026 may or may not be flushed down the toilet. It feels like the business equivalent of spending rent money on shares of AMC. -- Sleyece (talk) 16:12, 7 June 2024 (UTC)Reply
I'm not assuming evil, I'm reading the comments. I have tried to discuss this for ages; the result is zero hours committed to improving users' experience, and one year of resources trying to add a button to a platform without even having permission to add it. Theklan (talk) 18:30, 7 June 2024 (UTC)Reply
Am hoping to convince folks at the WMF to meet and discuss this by voice to see what options would be available. Doc James (talk · contribs · email) 07:23, 8 June 2024 (UTC)Reply
Threading buttons into foreign social media API is far more risky than fixing this gadget. -- Sleyece (talk) 22:03, 8 June 2024 (UTC)Reply
One technical aspect is that Our World in Data is written using w:React_(JavaScript_library). However, Wikimedia uses mw:Vue.js for its JavaScript/web framework, which means that if it is maintained by WMF staff, adding React to the mix requires additional skills and increases complexity and maintenance. It would also permanently sit on the list of things that should be rewritten in Vue so that it integrates nicely with other systems. --Zache (talk) 01:27, 11 June 2024 (UTC)Reply
It definitely does not need to be maintained by WMF if coded properly. It's really a question of whether or not staff is going to allow us to fix it. They certainly don't need to build an entire React framework and maintain it through manual labor. No one is suggesting putting an official staff button masher on payroll. I don't want that.-- Sleyece (talk) 21:35, 11 June 2024 (UTC)Reply

Bawolff Removing uncertainty sounds right. What's an example of a good joint development proposal?

  1. Implementation: OWID have ~12k graphs, which do not change quickly. We should cache the graph-definitions and data for each on our own servers to remove half of the security and privacy issues. No "OWID website update" could break this, it would just become harder to update.
  2. Reach and impact: Each graph is relevant to a couple of different articles in big encyclopedias. There might be 10-20 per topic, and we might only include one or two in an article. Say 2,000 high-visibility pages would use them per language. These are popular topics; they get 100M visitors a year, and are referenced in 20k international reliable sources a year. Wiki articles include additional references where we rely on their data aggregations (in addition to the underlying sources they feature).
  3. Maintenance: Even without a working gadget, we currently have community members in many languages spending time maintaining hand-synchronized versions of their data on a range of pages, on Commons to support bespoke gadgets, and on cousin projects that do directly embed these visuals. These updates could be better automated.
  4. Viz library: To Zache's point, OWID's grapher library has its own maintainers. We don't need to independently maintain it, but can contribute to that open project.

SJ talk  21:58, 12 June 2024 (UTC)Reply

@Sj I think any such plan would include:
  • the objective (not just "embed owid" but something like, allow users to explore data related to high impact pages). Using language of OKRs (or similar frameworks) may make this more appealing to WMF management.
  • how this is complementary to existing WMF plans and goals.
    • one thing that might be worth emphasizing is that this might be a low cost way to gather data on what types of interactive graphs wikipedians find useful, which might feed into later plans around graphs.
  • a proposed architecture. With diagrams and the whole shebang
    • this should not be written in stone, but making it concrete allows people to talk about it, and either accept or reject it (and refine it based on objections). When it's just up in the air, it's hard to give feedback
    • my opinion, the most viable architecture given currently known requirements, would be to take the toolforge based work, and turn it into a production wmf service.
  • metrics to determine if the project is successful or not (things like X number of pages on Wikipedia use these graphs, users click on graph Y times, users spend Z minutes playing with graph widget)
  • A breakdown of what WMF would have to provide, and what wikimed (or whomever else) would be responsible for. This should be as concrete as possible. One of the big risks from a WMF perspective is that they get a half-finished product that they are then expected to drop everything and fix. You want to prove to WMF that you fully understand what is involved and there won't be surprise work for them. Anything that is ambiguous in terms of what WMF would have to provide significantly increases the likelihood they say no.
  • A post-deployment plan. E.g. this might mean wikimed (or someone) is responsible for fixing bugs that occur for 1 year after deployment. After a year's time we review how the project is doing against various metrics and decide what to do next, with the assumption being that if it has failed then it gets shut down. If it has succeeded it transitions to being maintained by WMF (presumably after a year the kinks are worked out so it would be low maintenance). I think it's very important to spell this part out, and especially to have an agreed-upon exit strategy if the project doesn't work out. One of the big risks from a WMF perspective is that they get stuck maintaining something nobody uses forever.
  • a list of potential risks, along with an explanation on why that risk is either acceptable or mitigated (or in theory transferred).
  • [optional] other options considered and rejected, along with why they were rejected.
To be clear, I don't think doing this will necessarily make WMF say yes, just that it is the bare minimum for WMF to treat the idea of a joint development plan seriously. Otherwise this whole thing is basically a community wishlist proposal. Bawolff (talk) 04:57, 13 June 2024 (UTC)Reply

Meeting via Voice

Have reached out to a number of folks who are involved from the community side and the WMF side to discuss by voice. If there are folks from WMF able to join please reach out. Doc James (talk · contribs · email) 06:16, 12 June 2024 (UTC)Reply

Please keep this talk page updated on the potential of such a meeting. -- Sleyece (talk) 15:31, 21 June 2024 (UTC)Reply