User:Danilo.mac/Evaluate initiative

This is a personal initiative to organize ideas and develop tools related to the Evaluate, Iterate, and Adapt recommendation, which is part of the Movement Strategy.

What and how to evaluate

Below are some topics and ideas about how to evaluate them. The majority of the topics were taken from the recommendation page.

Strategic plan implementation

The initial work is already done: the strategic plan was divided into recommendations and initiatives. We need to try to predict the implementation time and the hierarchy of each initiative; the initiatives that will take more time to implement, and those that need to be done before others can start, need to start earlier. We also need to periodically evaluate and update the progress of each initiative and, when possible, also the progress of the tasks inside each initiative.
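
A minimal sketch of the ordering idea, assuming a hypothetical dependency list between initiatives: a topological sort gives an order in which the initiatives need to start, putting the ones that block others earlier.

    import graphlib  # standard library, Python 3.9+

    # Hypothetical dependencies: initiative -> initiatives that must be done first.
    depends_on = {
        "Initiative C": {"Initiative A", "Initiative B"},
        "Initiative B": {"Initiative A"},
        "Initiative A": set(),
    }

    # static_order() respects the dependencies, so an initiative that blocks
    # others comes out, and should start, earlier.
    print(list(graphlib.TopologicalSorter(depends_on).static_order()))
    # ['Initiative A', 'Initiative B', 'Initiative C']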

Initial work was done in User:Danilo.mac/Movement Strategy progress.

Equity

Equity means that everyone who wants to participate in a specific task or decision can do so, even when their capacities on that matter are lower than those of other participants. So, to evaluate it we need to determine who wants to participate in a given task or decision, then verify whether the people with relatively lower capacities on that matter have all the information and help they need to make their contribution, and whether the people with more capacities are patient enough to wait for and help those with less capacity. That involves subjective assessments, which makes it very hard to automate.

Diversity

At first, we need to understand that we simply cannot have high diversity everywhere, due to language and educational barriers; what we need to verify is whether we are doing the best we can to reduce those barriers. One approach is to list all possible barriers and determine, for each one, whether what is being done is enough or whether we could do something better.

Inclusion of newcomers

Newcomers can be considered included if they continue to edit months after their first edit; that is what we call user retention, and I created the retention tool to show it. The graph in the tool is not very easy to understand, because user retention is not a very simple metric, so we still need some research on the data the tool shows to determine which activities had a positive impact on user retention, which had a negative impact, and which had no impact.
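
A minimal sketch of the metric, assuming a hypothetical list of (user, month) pairs extracted from the revision history: the retention of a cohort is the share of its users still editing some months after their first edit.

    from collections import defaultdict

    # Hypothetical input: one (user, month) pair for each month the user edited in.
    edits = [("Alice", "2023-01"), ("Alice", "2023-03"),
             ("Bob", "2023-01"), ("Carol", "2023-02"), ("Carol", "2023-03")]

    first = {}                  # user -> cohort (month of the first edit)
    active = defaultdict(set)   # month -> users who edited in that month
    for user, month in edits:
        first[user] = min(first.get(user, month), month)
        active[month].add(user)

    # Retention of the 2023-01 cohort two months after the first edit:
    cohort = {u for u, m in first.items() if m == "2023-01"}
    print(len(cohort & active["2023-03"]) / len(cohort))  # 0.5: Alice stayed, Bob left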

Distribution of resources

To evaluate it we first need to understand it better. In the current annual plan we can see the programmatic breakdown on page 36; that breakdown needs to be more detailed, which can be achieved by creating an extra breakdown graph for each sector. Once we have those details, we can separate what is fundamental, what is important but not fundamental, and what is not so important but worth a try. For what is not fundamental, we need to measure the impact and compare it with the amount of resources spent. It is also a good idea to try to identify whether some practice common to many sectors may be wasting money.
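
A minimal sketch of the extra breakdown graphs, with hypothetical budget figures only to show the shape of the output; matplotlib draws one bar chart per sector.

    import matplotlib.pyplot as plt

    # Hypothetical figures: sector -> {activity: spending in US$ millions}.
    budget = {
        "Technology": {"Site operations": 10.0, "New features": 6.5, "Tooling": 2.0},
        "Community": {"Grants": 8.0, "Events": 3.0, "Trust & Safety": 2.5},
    }

    fig, axes = plt.subplots(1, len(budget), figsize=(10, 4))
    for ax, (sector, items) in zip(axes, budget.items()):
        ax.bar(list(items), list(items.values()))  # one bar per activity
        ax.set_title(sector)
        ax.set_ylabel("US$ millions")
        ax.tick_params(axis="x", rotation=30)
    fig.tight_layout()
    plt.show()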

To understand the impact of activities that receive financial resources, it is important to create a culture of writing reports for all projects, events, and activities that received some financial resource or have WMF employees spending significant paid time on them.

Content growth and coverage

Content growth is trivial data; we can see it now in Wikistats. Content coverage is not so simple: we need some reference to compare against. For small wikis we can use as a reference the pages that have a high number of pageviews in many big wikis, which indicates that those subjects are requested in many languages, so the small wikis should probably have those pages too. We can create a tool to generate those lists. For the big wikis it is harder to find a reference; maybe the affiliates can help with this by consulting academic institutions about what kind of information should be added to our projects.
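
A minimal sketch of such a tool, assuming the public Wikimedia pageviews REST API; titles are mapped to Wikidata item ids so the same subject can be matched across languages.

    from collections import Counter
    import requests

    HEADERS = {"User-Agent": "coverage-sketch/0.1 (example)"}
    TOP = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/"
           "top/{wiki}/all-access/2024/01/all-days")

    def wikidata_ids(wiki, titles):
        """Map page titles to Wikidata item ids via the MediaWiki API."""
        r = requests.get(f"https://{wiki}.org/w/api.php", headers=HEADERS, params={
            "action": "query", "prop": "pageprops", "ppprop": "wikibase_item",
            "titles": "|".join(titles), "format": "json"})
        pages = r.json()["query"]["pages"].values()
        return {p["pageprops"]["wikibase_item"] for p in pages if "pageprops" in p}

    counts = Counter()
    for wiki in ("en.wikipedia", "de.wikipedia", "fr.wikipedia"):
        top = requests.get(TOP.format(wiki=wiki), headers=HEADERS).json()
        # Top 50 viewed pages of the month (a real tool would filter out
        # Main_Page and special pages).
        titles = [a["article"] for a in top["items"][0]["articles"][:50]]
        counts.update(wikidata_ids(wiki, titles))

    # Subjects that are popular in all three big wikis: coverage candidates.
    print([qid for qid, n in counts.items() if n == 3])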

Community health

We need a collection of metrics for that.

  • Number of blocks, separated by type of target (username, single IP, or IP range) and duration (see the sketch after this list)
  • Number of users that broke the three-revert rule
  • Number of users that stopped editing after being reverted
  • ...
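
A minimal sketch of the first metric in the list, using the MediaWiki API's list=blocks module; note it only sees currently active blocks, so historical counts would need the block log instead.

    import ipaddress
    from collections import Counter
    import requests

    def target_type(name):
        """Classify a block target as username, single IP, or IP range."""
        if "/" in name:
            return "IP range"
        try:
            ipaddress.ip_address(name)
            return "single IP"
        except ValueError:
            return "username"  # autoblocks, whose target is hidden, also land here

    r = requests.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query", "list": "blocks", "bklimit": "500",
        "bkprop": "user|expiry", "format": "json"},
        headers={"User-Agent": "health-sketch/0.1 (example)"})

    counts = Counter()
    for block in r.json()["query"]["blocks"]:
        duration = "indefinite" if block["expiry"] == "infinity" else "temporary"
        counts[(target_type(block.get("user", "")), duration)] += 1
    print(counts)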

Also, it would be good if we could develop an easy way to identify non-constructive discussions.

Equitable governance

All important decisions need to be in a list, indicating who took or will take the decision, where people can opine, and, when there is some restriction on who can opine on that matter, the justification for that restriction. The ideal would be for all stakeholders to be consulted in every decision that impacts the Wikimedia projects. But that is not always possible: sometimes the decision needs to be taken in a short period of time that is not enough to consult the whole Wikimedia movement, sometimes the decision requires specialized knowledge, and sometimes the information needed to take the decision includes private data that cannot be shared with all Wikimedians.
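
A minimal sketch of what one entry in that list could look like, with hypothetical field names only to make the required information concrete.

    from dataclasses import dataclass

    @dataclass
    class Decision:
        """One entry in the hypothetical list of important decisions."""
        title: str
        decided_by: str          # who took or will take the decision
        discussion_url: str      # where people can opine
        restriction: str = ""    # who cannot opine on this matter, if anyone
        justification: str = ""  # required whenever a restriction is set

    print(Decision(
        title="Example decision",
        decided_by="Board of Trustees",
        discussion_url="https://meta.wikimedia.org/wiki/Talk:Example",
        restriction="committee members only",
        justification="the decision involves private data that cannot be shared"))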

Skills development

First we need to define which skills need to be developed. For each skill we can try to get the number of active users that have it, and then repeat that evaluation once a year to verify the evolution. In my experience trying to develop technical skills, the people who want to develop a skill usually underestimate the time it takes, so my idea is to estimate the time it may take to develop each skill and put that information in the skills list.

Partnerships

First it is necessary to clarify the objectives of each partnership and how it adds value to the Wikimedia movement, and then to establish objective metrics to evaluate the results of the partnership. Different partnerships with different objectives will be evaluated in different ways; we need to define who will be in charge of the evaluation and ensure it is done periodically.

Infrastructure scalability and sustainability

The technical team that maintains the servers probably knows good metrics to evaluate this. It is important to ensure that those who have less technical knowledge can also understand the metrics.

Technology efficiency

Technology efficiency is about how fast and simply someone can find, read, edit, and interact with the content. There is a pitfall we need to be aware of when evaluating it: different users have different perceptions of what is efficient, and many volunteers (probably the majority) believe that flexibility is more important than efficiency. We can see that in what happened with Flow, which seems more efficient but lacks the flexibility that a traditional wiki page has, which led the majority of the movement to reject its use. So we also need to evaluate technology flexibility as something as important as, or more important than, efficiency; if something improves efficiency but reduces flexibility, it will probably annoy the volunteers (some can get mad), so it may not be a good idea.

Quality assurance

Some wikis have quality-predicting algorithms like ORES; if that can be expanded to all wikis, we can use it to evaluate the quality of all wikis. We can also use some simpler metrics, like the page size, the number of sections, paragraphs, and <ref> tags, templates that indicate problems, etc. All those automatic quality predictors can't assess the text quality and can't identify false information in the text, so research by sampling can be a complementary method.
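
A minimal sketch of the simpler metrics, taking raw wikitext as input and using a hypothetical list of problem templates.

    import re

    # Hypothetical list of maintenance templates that indicate problems.
    PROBLEM_TEMPLATES = ("citation needed", "unreferenced", "cleanup")

    def quality_signals(wikitext):
        """Cheap quality signals computable from raw wikitext."""
        return {
            "size_bytes": len(wikitext.encode("utf-8")),
            "sections": len(re.findall(r"^==+.*==+\s*$", wikitext, re.M)),
            "paragraphs": len([p for p in wikitext.split("\n\n") if p.strip()]),
            "refs": len(re.findall(r"<ref[ >]", wikitext)),
            "problem_templates": sum(
                wikitext.lower().count("{{" + t) for t in PROBLEM_TEMPLATES),
        }

    print(quality_signals("== History ==\nSome text.<ref>source</ref>\n\n"
                          "{{citation needed}} More text."))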

Bug fixing

That can be evaluated by consulting Phabricator data; the data needs to be processed and shown in graphs. We also need to evaluate whether the task priorities set in Phabricator follow correct criteria: for example, the #1 wish in this year's Community Wishlist is a task requested in 2006 that was also in the wishlists of other years, and despite that it is still marked as low priority, which is an indicator that the priorities are not considering the community's opinion.
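
A minimal sketch of pulling that data, assuming Phabricator's Conduit API and a hypothetical API token; maniphest.search returns each task with its priority and creation date.

    from datetime import datetime, timezone
    import requests

    # Hypothetical token; real ones are created in the Phabricator settings.
    API_TOKEN = "api-xxxxxxxxxxxxxxxxxxxxxxxxxxxx"

    r = requests.post("https://phabricator.wikimedia.org/api/maniphest.search",
                      data={"api.token": API_TOKEN,
                            "constraints[statuses][0]": "open",
                            "limit": 100})
    tasks = r.json()["result"]["data"]

    # Age in years and priority of each open task, ready to aggregate into
    # graphs (e.g. how many old tasks are still marked as low priority).
    now = datetime.now(timezone.utc).timestamp()
    for t in tasks:
        age = (now - t["fields"]["dateCreated"]) / (365.25 * 24 * 3600)
        print(t["id"], t["fields"]["priority"]["name"], round(age, 1))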

Platform usability and accessibility

This topic can be separated into two subtopics: readers and editors. Readers need to find and read the information they are looking for; editors need resources to make their collaboration more productive.

Iterate and adapt

While evaluating, we need to consider that the data must be shared with the communities/stakeholders in a way that lets them understand the lessons the data contains. We will also need a space for clarifications about the data and for planning actions when the data indicates some change is needed.

Tools

Retention

Status: Done

The retention tool is working. It shows data processed from all revisions in a wiki: each horizontal line represents the group of users that had their first edit in a specific month, and each "pixel" of that line is a month, starting from the month of the first edit until the most recent month; the color indicates how many of the users of that group edited in that month. The more users that keep editing months after their first edit, the higher the user retention.

That graph is not as simple as a bar graph because we cannot join all users in a single graph: we need to wait for months after the first edit to evaluate the user retention, and users that started to edit in different periods can have different retention. With this three-axis graph we can see more details and identify events that affected the user retention, and we can select a specific month to generate a bar graph.
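
A minimal sketch of that kind of graph, with random data standing in for a hypothetical cohort matrix: rows are cohorts (month of first edit), columns are months since the first edit, and the color is the share of the cohort still editing.

    import matplotlib.pyplot as plt
    import numpy as np

    # Hypothetical data: retention[i][j] = share of the cohort that started in
    # month i still editing j months after their first edit.
    rng = np.random.default_rng(0)
    retention = np.clip(rng.normal(0.3, 0.1, (24, 24)) *
                        np.linspace(1, 0.2, 24), 0, 1)

    plt.imshow(retention, aspect="auto", origin="lower")
    plt.xlabel("Months since first edit")
    plt.ylabel("Cohort (month of first edit)")
    plt.colorbar(label="Share of cohort still editing")
    plt.show()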

List of popular articles

Status: Done in User:Danilo.mac/Popular articles in big wikis

The pageviews indicate how much a subject is demanded by the readers, so a subject demanded in many languages can be seen as a universally high-demand subject; thus we can use that list as a reference to evaluate coverage, mainly for small wikis. It can also be an alternative to the list of articles every Wikipedia should have. The disadvantage is that the list selects articles by popularity instead of importance; the advantage is that it is an automatic list, with no need for manual selection or discussions to define what is important.

When I saw the list, I realized that it is not as useful as I initially thought: the great majority of the list consists of topics about entertainment, sports, global news of the period, and countries. There are only a few topics about history and almost none about science. In summary, there are too few encyclopedic topics for it to be useful; this worked more as a way to learn that it is not the best path.

Form tool

Status: Just an idea for now

It would be useful if we had an online form tool that uses OAuth to make surveys and other tasks that demand online forms. That would reduce our dependency on Google Forms and other external tools, some people may be more comfortable sharing information with an internal tool, and OAuth would make it easier to verify the user that is filling in the form.
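
A minimal sketch of the OAuth part, assuming the mwoauth Python library and hypothetical consumer credentials registered on Meta; each form submission could then be stored with a verified username.

    from mwoauth import ConsumerToken, Handshaker

    # Hypothetical credentials from Special:OAuthConsumerRegistration on Meta.
    consumer = ConsumerToken("consumer_key", "consumer_secret")
    handshaker = Handshaker("https://meta.wikimedia.org/w/index.php", consumer)

    # Step 1: send the user to redirect_url to authorize the form tool.
    redirect_url, request_token = handshaker.initiate()

    # Step 2: when the user comes back, complete the handshake and identify them.
    # response_qs is the query string MediaWiki appends to the callback URL.
    response_qs = "oauth_verifier=...&oauth_token=..."  # filled in by the callback
    access_token = handshaker.complete(request_token, response_qs)
    username = handshaker.identify(access_token)["username"]

    # The form submission can now be stored with a verified username.
    print("verified as", username)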

Natural language queries

Status: Initial development

Instead of having many tools to query and process data from the databases, plus the Quarry tool that can only be used by people with SQL knowledge, we could have a tool that accepts queries in natural language, translates them into SQL and other ways to collect data, and returns a list and/or a graph; that would make it easier to get different types of data. Data that are not in the database replicas (e.g. pageviews) could also be made available by adding them to a local database; when the data is too big, it can be preprocessed to reduce its size before storing it in the local db. That would work as an all-in-one tool to show a large variety of data.
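
A minimal sketch of the translation step, using a couple of hypothetical hand-written patterns instead of a real natural-language parser; an actual tool would need many more patterns and would run the generated SQL against the replicas.

    import re

    def to_sql(question):
        """Translate a natural-language question into (database, SQL query)."""
        q = question.lower()
        m = re.search(r"how many articles (?:in|on) (\w+)", q)
        if m:
            return (m.group(1) + "wiki",
                    "SELECT COUNT(*) FROM page "
                    "WHERE page_namespace = 0 AND page_is_redirect = 0")
        m = re.search(r"how many edits (?:in|on) (\w+) in (\d{4})", q)
        if m:
            return (m.group(1) + "wiki",
                    "SELECT COUNT(*) FROM revision "
                    f"WHERE rev_timestamp LIKE '{m.group(2)}%'")
        return None  # no pattern matched; the tool would ask the user to rephrase

    print(to_sql("How many articles in pt?"))     # ('ptwiki', 'SELECT COUNT(*) ...')
    print(to_sql("How many edits on en in 2023?"))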

That is an old idea of mine, but I have never tried to develop it due to its complexity. However, the advantages of having an all-in-one tool, which can be expanded by adding more data to the local database and by adding algorithms to process new types of queries, may compensate for the complexity of the development.