Product and Technology Advisory Council/August 2025 draft PTAC proposals for feedback/Experimentation
This document is an archive of the work done by the Experimentation working group in PTAC to come up with the recommendations.
Purpose
How WMF can build understanding and support for experimentation and product iteration.
Working Group Team Members
Problem statement
Clearly communicating with many audiences about experiments, with the goal of getting the most useful feedback out of an experiment (i.e., avoiding derailment). How do we collect feedback through experimentation that involves a representative part of the communities and has clear success and failure criteria?
Diagnostic
Experiments receive a lot of pushback, which might be due to:
- Lack of trust, due to past mistakes
- Lack of predictability (when does the experiment end, and what are the criteria for proceeding to the next phase versus cancelling on failure?).
- Institutional momentum: timelines slip, sunk costs accumulate, etc.
- Is this the right thing to build/test?
- Only dealing with one side of the experiment and not the other (classic example: focusing on newcomers while paying no attention to the extra work created for existing editors).
- Not having a full solution, because it is still an experiment.
Communities are sometimes not aligned in their feedback:
- Different priorities of different communities
- Different levels of experience of groups
- Differences between wikis in how disruptive an experiment can be
Communities are often surprised and have many questions surrounding the process:
- Information fragmentation and overload
- "I was not told"
- Too many places for staff and community to keep track of
- The experiment's documentation page can be overwhelming if you have to read all of it. Often, the page is very domain- and process-specific, and not always user-focused.
- The community tries to avoid disruption: disruption interferes with editors' ability to work and changes the carefully built status quo.
Experiments are often big and require lots of work before they can be shared with the community, opening us up to the sunk cost fallacy while still skipping steps that are important for a full solution.
The community is discussion-focused: everything becomes a discussion. This can make it difficult to summarize the answers to the questions the experiment was asking (lack of focus).
Hypotheses
If there is a central overview of all ongoing and upcoming experiments, people will be less surprised.
- Build a calendar keeping track of the major projects and the phases (but it’s different for every wiki…yikes)
A clearer definition of what type of experiment a community has to deal with will reduce surprise.
- Defining labeled phases for experiments: Exploration, testing, validation, iterative improvement, beta, graduation, termination/concluded.
- Set end dates for experiments and keep to those dates.
- What feedback are we seeking from which user? [not sure if this works as a hypothesis to test]
Smaller experiments are easier to validate, shorten the feedback loop, and avoid sunk costs.
- Smaller is easier to test and faster to course correct on.
- More JS gadgets, Toolforge tools and user scripts instead of full on-wiki dev?
Make a YouTube video introduction for every experiment, as it will be easier to digest than all the written content (though there are lots of languages in which to communicate the content; the monthly meetings conducted by some teams were a form of this to some degree).
A centralized place to gather the output of experiment data (A/B testing graphs).
Having a better-prepared feedback section results in clearer answers (would be interesting to validate).
Getting people involved earlier (but we have already tried many variations of this, I think).
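To illustrate what "clear success and failure criteria" could look like for an A/B test, here is a minimal sketch, using only the Python standard library, of evaluating a pre-registered criterion with a two-proportion z-test. All numbers, thresholds, and names here are hypothetical, not drawn from any actual Wikimedia experiment:

```python
# Hypothetical sketch: deciding an A/B experiment against a
# pre-registered success criterion. All figures are made up.
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Control: 120 of 1000 newcomers completed an edit; treatment: 156 of 1000.
z, p = two_proportion_z(120, 1000, 156, 1000)

# Pre-registered criterion, agreed before the experiment starts:
# declare success only if p < 0.05 AND the absolute lift is at
# least 2 percentage points.
lift = 156 / 1000 - 120 / 1000
success = p < 0.05 and lift >= 0.02
print(f"z={z:.2f}, p={p:.4f}, lift={lift:.3f}, success={success}")
```

The point of the sketch is not the statistics but the process: the decision rule is written down before data collection, so the community and the product team can agree in advance on what counts as success, failure, or cancellation.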
Things we wonder about
- Should the community have a say in the failure criteria, or in whether the experiment takes place in the first place? (Counterpoint: the community often might not have the domain knowledge to evaluate and understand the context behind the experiment, or might have inertia against certain features, for example Vector 2022.)
- Is what the community perceives as feedback the same as what the product teams perceive as feedback? If not, how can we close that gap?
- Giving communities data upfront might help, but often that data is exactly what we are still searching for. This is something we have tried, and it often seems to simply cause more distraction.
- Should the Wikimedia Foundation skew its development towards making smaller, iterative prototypes? (counterpoint: this is often not possible in the context of some features)
Experiment
- Building a central overview of all ongoing and upcoming experiments.
- Clearly defining and communicating labeled phases for experiments: Exploration, testing, validation, iterative improvement, beta, graduation, termination/concluded.
- Validate whether having a better-prepared feedback section results in clearer answers.
Appendix: Previous experiments
This table is meant to give a sense of the sorts of experiments we have run and that we want to run more of, more quickly and easily. It is a sample of about 10 experiments from the last couple of years, including some for readers, some for editors, some successes, and some failures.
Table: Analysis of 10 experiments