Research:Visualizing Wikiproject Activity

This page documents a completed research project.



Topic

Background

In the June 27 - July 1 sprint, Shawn and Jonathan presented a picture of new user participation in a set of large, popular wikiprojects. We discovered that the frequency with which new users joined these projects was lower than we expected (on the order of fewer than 10 newbies joining each project per year, in recent years). However, we also discovered that the proportion of joiners to these projects who were newbies seemed to increase slightly over time.

Overview: Current Research

We decided we wanted to get a better sense of new user participation in Wikiprojects, as well as overall Wikiproject activity. We have created a set of database tables that list various activity metrics for WikiProjects (e.g. number of joiners, pages claimed, and Wikiproject page activity). We use these tables to track how Wikiproject participation has changed over time. This may give researchers and community members a better sense of whether wikiprojects as a whole are still a vital mechanism for both onboarding new users and helping existing users coordinate their work. It may also provide the Foundation with tools for identifying Wikiprojects that are currently inactive or that are struggling to recruit new members.

During the course of this work, we decided it would be worthwhile to structure our research around the idea of a 'dashboard' for visualizing the various metrics we'd collected, at the level of individual WikiProjects. Many of these metrics had never been tracked before, at least not on such a grand scale, and for good reason.

However, before we started building a tool, we needed to know what kind of problems current Wikiprojects are facing--whether in recruiting new members, tracking progress towards goals, or measuring activity levels--in order to make sure the visualizer we created would actually be useful to the community. So we conducted a series of interviews with core contributors to major Wikiprojects (Military History, Feminism, Law, Albums, and the Wikiproject Council), and asked these Wikipedians what their Wikiproject experience was like, why they contributed, what they did, and what challenges they experienced.


We're still looking for more interview participants. If you're interested, contact User:Jtmorgan.

Motivation: Why study Wikiprojects?

Number of days it takes English Wikipedians to join WikiProjects vs. the number of edits to English Wikipedia after joining. Color coded by whether the joiner was a newbie (i.e. had less than 100 edits) when they joined.

Wikiprojects offer one of the best opportunities for new users to log facetime on the front lines with veterans and learn from them. Learning about the small minority of newbies who participate in Wikiprojects early on in their careers will give us important insights into not only what kind of mentorship is happening in these projects, but also what kind of work newbies do in these Wikiprojects. If wikiprojects offer newbies a better way of learning while contributing in a meaningful and enjoyable way without earning the ire of their peers, the community and the Foundation should consider making it a priority to direct more new users towards Wikiprojects that may be relevant to their interests.

Wikiprojects also provide communities within Wikipedia that can help forge productive work relationships and friendships, and give veteran editors a reason to continue contributing. Publishing a list of which Wikiprojects are currently active not only helps us socialize new users (for instance, by pointing them to active spaces where they can interact with real Wikipedians); it can also give the Foundation and the community a better sense of which wikiprojects are working best in 2011. This inquiry is not meant to make less active wikiprojects look bad; rather, we want to find out what it is about highly active wikiprojects that makes them active.

For instance, some features that might contribute to wikiproject success are (just guessing here):

  • a dedicated group of core contributors
  • regular communication among project members on the project talk page
  • outreach to new users
  • a high rate of templating/claiming new pages
  • presence of active task forces and regular sprints
  • regularly updated and well-maintained task lists

Research Questions

Wikiproject Status

  • what is the overall status of Wikiprojects in 2011?
    • how many people are joining Wikiprojects?
    • how many articles are Wikiprojects claiming?
    • how much coordinating activity are Wikiproject pages experiencing?
    • what are the 'top' Wikiprojects? (by these or other important measures?)

Wikiprojects and Newbies

  • which Wikiprojects are newbies joining?
  • which Wikiprojects have the greatest positive impact on newbie retention on Wikipedia?

Wikiproject Health

  • what is the optimal size for a Wikiproject?
  • why do Wikiprojects split (or 'bud') other Wikiprojects?
  • why do Wikiprojects become inactive?
  • what tools do Wikiproject members currently use to communicate, coordinate, advertise, track work?


(RQ2.15) (RQ3.10)

Process

Wikiproject Datasets: Activity, Membership and Scope

We created a set of MySQL database tables which can be used to evaluate Wikiprojects across a variety of features and factors. These tables will allow us to perform a variety of analyses, create compelling visualizations, and provide more robust sampling criteria for further qualitative analysis. We will also publish the queries and scripts used to compile these tables from the basic slave tables, so that they can be recompiled in future and potentially used to power WikiProject:Pulse or other visualization tools.

Challenges of measuring WikiProject activity. Because Wikiprojects track their membership in several different ways, it is difficult to come up with a comprehensive list of every Wikipedian who has joined a Wikiproject. One of the main contributions of the current collaboration has been to create such a comprehensive list and also define a methodology for automatically updating those datasets in future.

Wikiproject Membership: Some examples of how people add themselves as WikiProject members

  • Members add their name to a sub-page of the main project page: Wikiproject_Llamas/Members, Wikiproject_Javelinas/Participant_List
  • Member adds their name to the member list of a Wikiproject task force (but not the main WikiProject member list): Wikiproject_Llamas/Monty_Python_Task_Force/Participants
  • Members add their name to a section of the main Wikiproject page: Wikiproject_Jaguarundis#contributors
  • Member adds Wikiproject userbox to their own userpage: {{User_Slow_Loris}}, {{participant|Capybara}}
  • Member adds Wikiproject member category to their userpage: Category_Civet_Cat_Members
The number of editors who joined Wikiprojects between 2005 and 2011 via the three primary methods by which editors signal project membership: adding their username to the members section of the Wikiproject page, adding their username to a member subpage, and adding a member category to their own userpage.

As is evident above, consolidating these different membership mechanisms is difficult not only because of the different spaces where members signal their membership, but also because there is a range of possible syntactic options ("members" vs. "participants" vs. "contributors_list"). In some cases a member will add their name to a task force page, but NOT to the main member list. In addition, a member may have the Wikiproject_Llamas userbox on their user page, AND have added their name to the Wikiproject_Llamas/Monty_Python_Task_Force/Members page, AND have added the Wikiproject_Llamas_Participants category to their userpage. In cases like this, we choose the signal that they added first, and assign them a join date for that WikiProject based on that revision.
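The "earliest signal wins" rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scripts used in this project: the `MembershipSignal` record, its field names, the method labels, and the sample dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MembershipSignal:
    """One observed membership signal (hypothetical record layout)."""
    user: str
    project: str
    method: str        # e.g. "userbox", "task_force_list", "category"
    signaled_on: date  # date of the revision that added the signal


def assign_join_dates(signals):
    """Return {(user, project): earliest signal} across all signal methods."""
    earliest = {}
    for s in signals:
        key = (s.user, s.project)
        if key not in earliest or s.signaled_on < earliest[key].signaled_on:
            earliest[key] = s
    return earliest


# Made-up example: one editor signals membership in three different ways.
signals = [
    MembershipSignal("ExampleUser", "Wikiproject_Llamas", "userbox", date(2009, 3, 14)),
    MembershipSignal("ExampleUser", "Wikiproject_Llamas", "task_force_list", date(2008, 11, 2)),
    MembershipSignal("ExampleUser", "Wikiproject_Llamas", "category", date(2010, 1, 5)),
]

joined = assign_join_dates(signals)
first = joined[("ExampleUser", "Wikiproject_Llamas")]
print(first.method, first.signaled_on)
```

In this sketch the task force sign-up is the earliest of the three signals, so it supplies the join date, even though the editor never appears on the main member list.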

These difficulties are compounded by the way the database is structured: for instance, if Wikiproject_Llamas tracks membership through a category page, there is currently no key on the categorylinks tables that associates that category page with the Wikiproject Llamas page. You have to parse the string, which might be Wikiproject_Llamas_members OR Wikiproject_Llamas_participants... or some other variant. Nonetheless, we soldier on :)
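To make the string-parsing problem concrete, here is a minimal Python sketch that guesses the project page title from a member-category title. The suffix list, the function name `project_from_category`, and the assumed "Wikiproject_<name>_<suffix>" naming convention are illustrative assumptions drawn from the examples on this page; they are not the actual rules used to build the tables.

```python
import re

# Suffix variants taken from the examples in the text; a real list would be longer.
MEMBER_SUFFIXES = ("members", "participants", "contributors", "contributors_list")

# Longest suffixes first so "contributors_list" is tried before "contributors".
_SUFFIX_ALTERNATION = "|".join(
    re.escape(s) for s in sorted(MEMBER_SUFFIXES, key=len, reverse=True)
)
_CATEGORY_PATTERN = re.compile(
    rf"^(?P<project>wikiproject_.+?)_(?:{_SUFFIX_ALTERNATION})$",
    re.IGNORECASE,
)


def project_from_category(category_title):
    """Guess the project page title from a member-category title.

    Returns None when the title does not follow the assumed
    "Wikiproject_<name>_<suffix>" convention.
    """
    normalized = category_title.strip().replace(" ", "_")
    match = _CATEGORY_PATTERN.match(normalized)
    return match.group("project") if match else None


for title in (
    "Wikiproject_Llamas_members",
    "Wikiproject_Llamas_participants",
    "Wikiproject Llamas contributors list",
    "Category_Civet_Cat_Members",
):
    print(title, "->", project_from_category(title))
```

A category such as Category_Civet_Cat_Members does not match the assumed pattern and comes back as `None`, which is exactly the kind of case that would still need manual mapping.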

Interviews: Editor Experience and Design Requirements

We have conducted interviews with subject matter experts (especially people currently involved in Wikiprojects as creators, admins and regular members) to get a better sense of the current status of Wikiprojects as communities and persistent collaborations on the English-language Wikipedia. Our first interview (on July 27) was with Maggie Dennis, WMF Community Liaison and active member of Wikiproject Albums (interview questions below). We subsequently interviewed 11 other Wikiproject members (as of August 30), most of whom were members of WikiProject Military History, along with members of the Wikiproject Council, Wikiproject Feminism, and other projects.

Sample interview questions. Interviews were conducted in person, over IM/IRC, or over email (if necessary). A sample list of interview questions is given below. A fuller list is available by contacting User:Jtmorgan. Face-to-face, IM and IRC interviews were semi-structured, meaning that not all interview questions on the document were necessarily asked during the course of the interview, and additional follow-up, clarification or spontaneous questions were asked as appropriate. Interview transcripts were analyzed for recurring themes and important issues, especially those which might be tied to design requirements for WikiProject:Pulse.


  • Wikiproject Military History
    • What are you doing in your current role evaluating Wikiproject Military History?
    • What are its current challenges, activities and priorities?
    • How does it handle recruitment and retention?
    • Compare Military History to other Wikiprojects (on whatever criteria you think are important)?
  • Wikiprojects in general
    • What makes a healthy/successful/productive/fun/cool/good/sweet/righteous/hella tight Wikiproject to you?
    • What are ways that the community (proj. leaders, members, etc.) evaluate health/success of Wikiprojects?
    • Are there any themes, trends or changes you're seeing among Wikiprojects lately?
    • Tell me about a Wikiproject you can think of that has been successful in the past, but which is currently struggling. Why?
    • Tell me about some Wikiprojects that successfully attract new members?
    • What are some (other) examples of healthy Wikiprojects?
    • What role do you think that Wikiprojects currently play in Wikipedia? How important are they?
  • Follow-up
    • What roles have you played as a Wikiproject member in the past?
    • What questions do you have about Wikiprojects?
    • Are there any Wikiprojects that you are particularly interested in learning about (that we could research)?
    • We're interested in creating a resource (a kind of browsable guide to Wikiprojects) based on the data we've collected, and sharing that resource. Who would be interested in that resource? What kind of metrics should we include? What kind should we be careful about including?
    • What kind of resources (previous research studies, publications, or editors) already exist that relate to Wikiprojects, that we might not know about?
    • Are there any Wikiproject members you know who would be interested in chatting with us over voice/IM?

Results and discussion

Top Lists

Our new data tables allow us to analyze Wikiproject activities according to a variety of measures that were previously difficult to get at and/or combine. This lets us see which Wikiprojects are running full speed ahead, and which ones seem to have slowed or may be struggling to find new members, coordinate project activities or track their work. The lists below present data on the activities of the currently active Wikiprojects. Unless otherwise specified, all data are for the time period of January 1st through August 30th, 2011.

Top 20 Wikiprojects by newly joined, newbie member edits to Wikiproject-claimed articles, 1/1/2011 through 8/30/2011

| wikiproject | # of edits by newbies to articles claimed by the wikiproject |
|---|---|
| WikiProject Football | 21152 |
| WikiProject Film | 16396 |
| WikiProject Biography | 13685 |
| WikiProject India | 12404 |
| WikiProject Olympics | 9037 |
| WikiProject Television | 6089 |
| WikiProject Albums | 5765 |
| WikiProject United States | 5126 |
| WikiProject Indonesia | 5042 |
| WikiProject United States Public Policy | 4796 |
| WikiProject Death | 4334 |
| WikiProject Comics | 3627 |
| WikiProject Swimming | 3539 |
| WikiProject Jazz | 3321 |
| WikiProject Guild of Copy Editors | 3090 |
| WikiProject Lincolnshire | 2977 |
| WikiProject Trains | 2927 |
| WikiProject Science Fiction | 2882 |
| WikiProject Archaeology | 2731 |
| WikiProject Mixed martial arts | 2660 |


Top 20 Wikiprojects by number of new project members, 1/1/2011 through 8/30/2011

| wikiproject | new members who are newbies | new members who are Wikipedians | all new members |
|---|---|---|---|
| WikiProject Wikipedians against censorship | 458 | 23 | 481 |
| WikiProject United States Public Policy | 34 | 229 | 263 |
| WikiProject United States | 179 | 69 | 248 |
| WikiProject Guild of Copy Editors | 68 | 140 | 208 |
| WikiProject Popular Culture | 113 | 20 | 133 |
| WikiProject Biography | 36 | 69 | 105 |
| WikiProject Film | 33 | 61 | 94 |
| WikiProject India | 27 | 59 | 86 |
| WikiProject Wikify | 29 | 54 | 83 |
| WikiProject Pakistan | 54 | 15 | 69 |
| WikiProject Anime and manga | 43 | 25 | 68 |
| WikiProject Football | 25 | 42 | 67 |
| WikiProject Military history | 20 | 38 | 58 |
| WikiProject Television | 32 | 18 | 50 |
| WikiProject Aviation | 15 | 35 | 50 |
| WikiProject Video games | 13 | 37 | 50 |
| WikiProject Stub sorting | 44 | 2 | 46 |
| WikiProject Women's History | 26 | 20 | 46 |
| WikiProject Spaceflight | 22 | 21 | 43 |
| WikiProject Albums | 16 | 27 | 43 |

Top 20 Wikiprojects by edits to project page and talk page, 1/1/2011 through 8/30/2011

| Wikiproject name | total edits to Wikiproject page and associated talk page, 2011 |
|---|---|
| WikiProject Football | 8003 |
| WikiProject Video games | 3364 |
| WikiProject Birds | 3326 |
| WikiProject Military history | 3320 |
| WikiProject Mathematics | 2962 |
| WikiProject Ice Hockey | 2268 |
| WikiProject Film | 1779 |
| WikiProject Medicine | 1382 |
| WikiProject Anime and manga | 1374 |
| WikiProject Spam | 1365 |
| WikiProject Ireland | 1328 |
| WikiProject Cricket | 1296 |
| WikiProject National Register of Historic Places | 1237 |
| WikiProject Automobiles | 1217 |
| WikiProject Plants | 1092 |
| WikiProject Baseball | 1072 |
| WikiProject Aircraft | 1072 |
| WikiProject Ships | 1052 |
| WikiProject Women's History | 1045 |
| WikiProject United States | 1017 |

Top 20 Articles claimed by the most Wikiprojects as of August 2011

| Article | Number of Wikiprojects claiming the article |
|---|---|
| Borean languages | 32 |
| Burlington Northern Railroad | 24 |
| Western Asia | 23 |
| Union Pacific Railroad | 23 |
| Arabic language | 22 |
| Replicas of the Statue of Liberty | 22 |
| Judaism | 21 |
| Middle East | 20 |
| Indo-European languages | 19 |
| Ron Paul | 19 |
| Eurasiatic languages | 19 |
| Languages of Asia | 18 |
| Memory of the World Register - Asia and the Pacific | 17 |
| Derek Stingley | 17 |
| Union for the Mediterranean | 17 |
| Group of 9 | 16 |
| List of World War I flying aces | 16 |
| Plantations in the American South | 16 |
| Plantation complexes in the Southeastern United States | 16 |
| Proto-Slavic | 16 |

Decline in Wikiproject membership

Graph with trendlines showing number of newbies (<=100 edits), Wikipedians (>100 edits) and all editors (newbies + Wikipedians) joining all Wikiprojects, 2005-2011

The decline in Wikiproject membership roughly matches the decline in Wikipedia membership as a whole, with a substantial decline between the 2007 peak and mid-2009, and a more gradual (but still apparent) decline since then. The large spike in June 2009 seems to be due to an organic but transient increase in the number of Wikipedians joining Wikiprojects, with Wikiproject Turkey (70 new members) and Wikiproject Songs (66 new members) leading the pack. The large spike in December 2010 was due to a large number of editors (574) joining Wikiproject Anime and Manga. The large spike in March 2011 is mostly due to a large number of Wikipedians (471 new members) joining Wikiproject Wikipedians Against Censorship and, to a lesser extent, Wikiproject Popular Culture (108 new members).

Interestingly, none of these membership spikes had a very large effect on the rate at which newbies joined Wikiprojects. The number of newbies joining Wikiprojects has remained relatively stable since mid-2009.
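As a rough illustration of how monthly series like the ones discussed in this section could be tallied, the Python sketch below splits joiners into newbies and Wikipedians using the 100-edit threshold from the graph caption and counts joins per month. The input layout (join date plus edit count at the time of joining), the function names, and the sample values are assumptions made for the example, not the project's actual queries or data.

```python
from collections import Counter
from datetime import date

# Threshold from the graph caption: newbies have <=100 edits when they join.
NEWBIE_MAX_EDITS = 100


def editor_class(edit_count_at_join):
    """Classify a joiner as "newbie" or "wikipedian" by edit count at join time."""
    return "newbie" if edit_count_at_join <= NEWBIE_MAX_EDITS else "wikipedian"


def monthly_joins(joins):
    """Count joins per (year, month, editor class).

    joins: iterable of (join_date, edit_count_at_join) pairs (assumed layout).
    """
    counts = Counter()
    for joined_on, edits in joins:
        counts[(joined_on.year, joined_on.month, editor_class(edits))] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = [
    (date(2011, 3, 2), 12),
    (date(2011, 3, 9), 100),
    (date(2011, 3, 20), 4500),
    (date(2011, 4, 1), 101),
]
counts = monthly_joins(sample)
print(counts[(2011, 3, "newbie")], counts[(2011, 3, "wikipedian")])
```

Keeping the two classes as separate series is what makes it possible to see whether a spike in total joins is also a spike in newbie joins.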

Interview Results, Part 1: Requirements for Wikiproject Pulse (dashboard)

Based on the results of 11 interviews with Wikiproject members, here is a set of requirements for the creation of an integrated tool that provides metrics about Wikiproject activity. Some of these recommendations may also be useful for considering other interface improvements that help editors create, maintain, and join Wikiprojects. Possible challenges of implementation, as well as some additional thoughts and considerations, are listed at the end.

General Design Rationale

  • a tool for both tracking the health of a Wikiproject and helping project members track their work towards project goals
    • lots of current tools and templates exist for tracking these things, but there is no unified framework and it's costly to set up and maintain these existing mechanisms. Also, they don't cover everything that members want to see/do.
    • Stakeholders: who would be interested in a Wikiproject Dashboard?
      • members of the Wikiproject in question
      • members of other Wikiprojects
        • potential collaborators (between projects)
      • people interested in joining the Wikiproject
        • newbies
        • active Wikipedians with an interest or a history of activity in the topic/process space

Dashboard Feature Requirements

  • Represent Member activity
    • is the project active?
      • project talk pages are where a lot of this activity takes place
        • p2: "There wasn't much formalized interaction at that point [when milHist started to grow], mainly just conversations on the project talk page"
        • p2: "Probably [include] high-level activity metrics. For example, metrics related to featured content production, project size and growth (in articles and/or members), volume of traffic on project discussion pages, etc... It's probably the low-hanging fruit of those metrics... Non-members tend to be interested in whether a project is active enough to be worth becoming involved in."
        • p3: "In terms of signs [of a healthy Wikiproject], lots of active editors and 'busy' project talk pages are obviously the key factor."
        • p9: "...the WikiProject Talk page, that's were most of the action is."
      • who is active in the project?
    • Advertise Article Quality Achievements and Milestones
      • what are the FAs/GAs that are in progress in this project?
      • what is the percentage of FAs/GAs (recent and otherwise) that this project has produced or aided in?
      • Making project metrics available to interested outside parties, such as other projects that serve similar functions
        • p1: "some automated ways of announcing milhist article reviews at ACR, FA and GA would be useful... I mean, announcing that an article has been nominated for one of those processes. At the moment it's done by hand"
    • Support internal processes
      • p1 (Q.18): "From what I've seen in smaller projects the administrative overhead is comparatively larger because it falls on proportionally fewer editors. Anything that reduces that would no doubt be welcome, so I guess (but it is a guess) that things like tracking open tasks, reviews, progress towards targets, article metrics and the like would be more useful for them."
      • p1: "we've sort of put together the bare bones of an Academy (WP:MHA) at mihist, where we've gathered essays and articles from across the project. The intention is to develop it into a resource for project members (any anyone else) where they can find out how to do tasks like review an article, create a map, write decent prose, manage a project, be a coordinator, deal with POV pushing etc etc. Development work is one of those perennial jobs that never seems to get done because there's always more stuff to do... If you have any ideas in that regard - for example, developing a template or interface that could be used as the basis for self-study, that would be received with interest."
      • p2: "To be successful, a WikiProject needs to have a core of people doing administrative work, maintaining processes, building infrastructure, and so forth. This isn't necessarily very fun or very glamorous work. So most people won't do it, or will stop doing it after a brief stint."
    • Represent project growth and scope
      • Total Article count
        • p1: "Total article count might be useful - at the moment that's hard to get to"
        • p2: "One of the main reasons WikiProjects fail is because they choose an unsuitable scope. The proposal process provides an opportunity for someone to point that out, and ideally redirect the proposer's energy towards something more productive. In practice, though, people tend to create projects for their favorite topics regardless of any advice to the contrary."
      • Activity of project members within project scope
        • p2: "Statistics for the articles in the project scope, meanwhile, are interesting but not necessarily related. A project can be completely inactive even while the articles it covers are a hive of activity. If the editors involved do all their discussion one-on-one or on individual article talk pages. Overall, it's probably useful to look at both "direct" activity (i.e. participation within a project internally) and "indirect" activity (i.e. work on related articles) to get a full picture of how active a project is."
    • Facilitate Information-sharing between projects
      • example: milHist 'shares' its review process with other projects (Ships, Aviation).
        • p1: (Q.19) "Possibly where we share reviews, though in reality that means milhist is conducting the review and we just update their banner template when we're done. However a facility might be good. Also, it's difficult for us to track milhist articles at GA especially - anything around this area that could list nominations would be very handy."
        • p2: "After the first few months, [milHist] had started to develop a fairly sophisticated infrastructure, and we offered the target project the opportunity to leverage it"
    • Help with Editor Recruitment
      • who is editing in this topic/process space but is not a member of the project?
        • p1: "What I'd really like is some way to identify new editors we could invite to join up..."
        • does this person want to be contacted? (connection to "structured profiles" design idea). I would recommend that there be an "opt in" feature for this, since it could be perceived as invasive


Potential Challenges for Implementation

  • we need to be careful designing for people who are used to designing for themselves.
    • p1 (re: Q.16): "I'd need to take some more time to think about this to be honest - our current interface etc was designed by consensus and we're pretty happy with it"
    • p2: "Process statistics (articles undergoing different reviews, deletion nominations, RFCs, etc.), currently generated by a combination of manual work and the article alert bots. One potential difficulty here is that some of these items are different between projects. Some projects use non-standard assessment schemes, or have specialized review processes, and won't be happy with a tool that doesn't take those into account. These also tend to be the more active and well-developed projects, so getting them on board will probably be important to gain wide acceptance of the new tool."
  • one of the key factors in the success of a Wikiproject is strong leadership/dedication by a few key members, which is hard to 'measure' in the same way as other metrics.
  • we should avoid ranking, especially ranking individual project people
    • p2: "anything that ranks individual project members (e.g. by edit counts, by edit volume, by articles written, etc.) needs to be approached carefully. People tend to get upset over such things, particularly if the "wrong" people come out on top."


Other Notes/Considerations

  • Importance of assisting with bureaucratic overhead
    • Often, it seems like a few core people end up doing most of the work (esp. the bureaucratic overhead of maintaining the project page, adding templates, tracking process status, etc.). In general, assisting these core members with the bureaucratic overhead seems like it would be welcome.
  • Need to replicate (but not replace) current functionality
    • p2: "Depending on whether [a Wikiproject Dashboard] is meant to replace the existing methods or merely supplement them, it would presumably need to replicate the current functionality"
  • some things, like FAs and GAs, are considered to be Wikiprojects even though they don't have "Wikiproject" in the title.
    • This has ramifications for the Wikiproject Dashboard, since we're currently only looking at projects that CALL themselves Wikiprojects

Interview Results, Part 2: On The Importance of WikiProjects

We are in the process of conducting additional analysis (that is less software-oriented) on the interview transcripts. These results are geared towards representing why WikiProjects matter on Wikipedia in 2011, and will include more discussion and quotes related to general themes and trends that arose during the interviews themselves, such as the role of Wikiprojects in coordinating editing work, and the positive and negative experiences of Wikiproject editors.

Wikiprojects: What are they good for?

"In terms of the role of WikiProjects. I think they're quite important when they function well. A lot of the mid-level work (e.g. subsidiary reviews prior to the featured content system, topic-specific guideline and standard development, informal dispute resolution) takes places within WikiProjects. Most of these functions are too broad, and require too much manpower and infrastructure, to function on the level of individual articles. But there are too many different articles and areas involved for a centralized process to work. Groups split up by topic area are a natural solution. However, a lot of WikiProjects are inactive, so the effectiveness of the system varies greatly by topic area. Some areas have effective processes in place, while others have nothing significant happening at the WikiProject level, and are limited to article-level collaboration. This is something that needs to be fixed for WikiProjects to become really effective as a Wikipedia-wide system, rather than as a collection of isolated groups."

"Wikiprojects are particularly useful as a repository of corporate knowledge and encouraging a community of like-minded people. The corporate knowledge function is particularly important in helping new editors get started as the 'old hands' can offer advice from their experience and point to helpful resources."

"...many wikiprojects are inactive or largely inactive. Some serve as central points for discussion of their topic area, but only a few actively support editors and articles. So I would say the CONCEPT of wikiprojects is largely irrelevant, but there still are a few wikiprojects that are useful."

"In my view they are maybe THE primary engine for improving article quality (rather than just quantity). And as projects like en.wiki mature, they are going to have to shift focus from quantity to quality. WikiProjects are the most logical way to organize this work."

"I think they still have a part to play in the imploding Wikipedia. The projects can still marshal the organisation and resources needed to do things beyond the capacity of individuals."

Future work

Here at summer's end, the future of this project is uncertain. We believe that WikiProjects are still important for Wikipedia in 2011. In fact, given that the community and the Foundation are currently seeking ways to a) draw in new productive contributors, and b) focus on quality over quantity, WikiProjects might be even more important now than ever, since they provide ready-made interest communities for new members to become a part of as they learn the ropes, and since they focus so much effort on improving article quality and coverage within topics.

We also believe that WikiProject Pulse is a good idea, and that in general WikiProjects would benefit from better tools to help them coordinate and track their work (especially ones that replace or supplement difficult aspects of the interface, rather than adding levels of complexity) so that they can spend more time editing and having fun.

Anyone interested in helping push forward this work should contact the researchers listed above. Otherwise, we will continue to track progress on this page, as it happens.

Resources and Previous work

Research from previous sprints shows that new users don't have a lot of opportunities to interact positively with veteran Wikipedians. However, Wikiproject participants form close-knit groups of people at all levels of editorship who have shared interests, so they should provide ideal sites for this kind of interaction. This sprint follows up on previous research on how participating in Wikiprojects helps new users learn the ropes of Wikipedia.

Other resources: