Research:CSCW15 workshop/Notes

From Meta, a Wikimedia project coordination wiki

Open collaboration communities like Wikipedia, GalaxyZoo, Reddit, and Imgur support self-organized, cooperative work by distributed networks of volunteers. Both CSCW researchers who study these online communities and industry practitioners who support these communities through research need to understand phenomena related to governance, quality control, motivation, and socialization within these systems. However, there are few opportunities for academic and industry researchers to communicate and collaborate.

This workshop will bring researchers from these two arenas together to develop a framework for performing open collaboration research that supports the work of the communities being studied and also advances scientific understanding. Workshop participants will share methods, tools, data, and research findings in order to identify key research questions in open collaboration research and discuss best practices for academic/industry research partnerships.



Open collaboration system (OCS)

[1] defines an open collaboration system as an online environment that
  1. supports the collective production of an artifact
  2. through a technologically mediated collaboration platform
  3. that presents a low barrier to entry and exit, and
  4. supports the emergence of persistent but malleable social structures.


  • Identify key areas where research is necessary to support open collaboration
  • Develop shared understanding of open questions for practitioners and academics
  • Pilot a data repository by publishing a small set of well-documented and compelling OC datasets
  • Develop shared understanding of the kind of data that academic researchers have/need
  • Develop shared understanding of the kind of data that industry researchers have/need
Future collaboration
  • Propose a framework for collaboration between researchers in industry/academia
  • Academic researchers perform research that can provide actionable insights for industry data scientists and community managers

Title ideas

  • Researching Online Communities Workshop
  • Nature and Data of Open Collaboration Workshop
  • Open Collaboration Workshop
  • Towards Industry/Academic Partnership on Researching Open Collaboration
  • Sharing Data & Goals: Industry/Academic Partnership on Researching Open Collaboration





Open collaboration communities have inverted the traditional publishing model: Through the use of technologies built for the internet, information consumers can take part in (or even take ownership of) production itself. While removing barriers to participation dramatically boosts the number of person-hours that can be dedicated to production, it introduces a novel set of problems. These include <list topics from below here>

Allocation (of work)
  • Nature:
    • Natural patterns of work & collaboration in open source software [2]
    • Newcomers to Wikipedia naturally start by fixing mistakes & adding missing content [3]
    • The progressive reduction in work-rate over time for Wikipedians [4]
    • Natural clustering of editing activity around Wikipedia articles related to breaking news events [5]
    • In Wikipedia different distributions of work on an article lead to different outcomes. [6]
    • Work investment per contributor tends to follow a power-law distribution in user-generated content communities [7]
  • Technologies
    • The use of lists by Wikipedians to identify and direct work [8]
    • Wikipedia's SuggestBot helps editors find work to do that they're likely to find interesting [9]
    • Calling attention to areas in need of work was effective in increasing contribution to a geo-wiki [10]
    • (TODO) [11]
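
As a rough illustration of the concentration of work described under Allocation [7], the sketch below computes the share of total edits made by the most active contributors. The edit counts are made-up values chosen for illustration, not data from any community, and `top_share` is a hypothetical helper name.

```python
def top_share(counts, fraction):
    """Return the share of total work done by the top `fraction` of contributors."""
    if not counts or sum(counts) == 0:
        raise ValueError("counts must contain at least one non-zero value")
    ranked = sorted(counts, reverse=True)
    # Always include at least one contributor, even for tiny fractions.
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical edit counts for ten contributors (illustrative only).
edit_counts = [1, 1, 1, 2, 2, 3, 5, 8, 20, 57]

print(f"top 10% share: {top_share(edit_counts, 0.1):.2f}")  # 0.57
print(f"top 20% share: {top_share(edit_counts, 0.2):.2f}")  # 0.77
```

A skewed sample like this one shows a single contributor accounting for more than half of the work, which is the kind of pattern a shared top-line metric would need to account for.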
Regulation (of behavior)
  • Development of norms -- regulate people's activities to work together
    • The wiki structure lends itself to the development and enforcement of norms [12]
    • On Wikipedia, norms are developed and first used by admins, then power users, and later, newcomers. [13]
    • While the development of formalized norms in Wikipedia was open at first, the process has become increasingly calcified. [14]
Quality management (of product)
  • Dealing with spam/vandalism/damage
    • Wikipedia's distributed cognition system supports a stigmergic strategy of coordination around undesirable editor detection [15]
    • Wikipedia's quality control systems disproportionately affect newcomers [14]
  • Technologies
    • [16] designed an automated system that, based on implicit review of content, assigns a "trustworthiness" score to Wikipedia's content.
    • [17] developed a semi-automated system for identifying and removing spammy contributions to Wikipedia. See also en:WP:Huggle and en:User:ClueBot NG.
    • When Wikipedia's primary counter-vandalism bot went down, the time to revert damage doubled. Humans picked up the slack but were much slower than the bot. [18]
Community management
  • Onboarding/socialization process
    • Legitimate peripheral participation (LPP) provides a lens to view the introduction of new members to a community of practice [19]
    • Studies of Wikipedia suggest that new editors self-socialize in ways described by LPP [3]
    • Successful newcomers to open source software projects follow a "join script" where they participate peripherally (mailing list and bug reports) before trying more substantial activities (patches/pull requests/applying for committer) [20]
    • A general framework for patterns of moving from a lurker/consumer to producer/community organizer. [21]
  • Newcomer retention
    • In 2007, Wikipedia began to experience a sudden decline in the number of active contributors -- and the cause is decreased newcomer retention [22]
    • The reason for Wikipedia's declining newcomer retention is tied strongly to the use of automated counter-vandalism tools. [14]
    • The design of Wikipedia's counter-vandalism tools frames newcomers as suspicious, and this leads to bad treatment [23]
    • Differing socialization tactics lead to different outcomes in Wikipedia [24]
    • Response to inquiries leads to retention and increased activity. [25]
    • Newcomers struggle with the massive number of formalized and implicit norms in the mature English Wikipedia [14], so Wikipedians developed a safe space for them to ask questions and receive support [26]
  • Motivation of experienced contributors
    • Anthony, Smith, and Williamson, 2007. The Quality of Open Source Production: Zealots and Good Samaritans in the Case of Wikipedia. Rationality and Society.
    • Glott, Ruediger; Schmidt, Phillipp; Ghosh, Rishab. "Wikipedia Survey - Overview of Results". Wikipedia Study. UNU-MERIT.
    • Nov, Oded (2007). "What Motivates Wikipedians?". Communications of the ACM 50 (11): 60–64.
    • Yang, Heng-Li; Lai, Cheng-Yu (November 2010). "Motivations of Wikipedia content contributors". Computers in Human Behavior 26 (6): 1377–1383.
    • See also
  • Need for a core community
    • Discussion of the roles adopted by Wikipedians and the rates at which such roles manifest [27]
    • [21] argues that a core group of "Leaders" is necessary, and due to churn, must be regularly infused with new leaders.
  • Technologies
    • gives newcomers an introduction to norms and policies.
    • [26] makes use of a robot and implicit signals of desirability for newcomers to extend invitations to good newcomers.
    • [23] calls attention to the poor treatment of newcomers and provides a means for mentors to quickly identify and rescue the most desirable newcomers.
Reflection -- Identity, setting goals and observing progress.
  • Social translucence helps reflection.
  • Tracking and reacting to large-scale trends
  • Able to set and track goals


  • open data vs. user privacy
  • establishing shared definitions for top-line metrics across systems
  • sharing and documenting public datasets

Future collaboration

  • institutional review & community review
  • knowledge transfer (design implications)
  • open access
  • formal and informal frameworks for supporting research of open collaboration communities
  • supporting open collaboration communities through research partnerships


  • required. organizers prepare curated datasets for the workshop. Host on
  • required. participants submit short position papers: summarize relevant research they've done and/or pose questions, challenges, issues for group discussion
  • required. organizers prepare lightning talks on their communities, activities, challenges, & research needs
  • optional. participants share curated datasets on
  • optional. selected participants invited to prepare lightning talks about their position papers
  • optional. leaders of related initiatives (e.g. Digital Ecosystems Research Partnership, Open Collaboration Data Factory) invited to give lightning talks about their projects


  • produce awesome stuff on top of this data
  • ..?
  • profit :)


  • relationship to DERP? Join but maintain more focused separate initiative?
co-organized with DERP founders, focus is complementary: this event will help build out the DERP network, and help develop reviewing standards, common metrics, and identify Big Questions Jmorgan (WMF) (talk) 00:26, 2 July 2014 (UTC)
  • should we ask for position papers or other prep work to seed the conversation? Who from?
    • If the goal is to come up with open questions across OCS's, yes.
all participants will submit position papers, organizers will come prepared to present/articulate their own positions as well Jmorgan (WMF) (talk) 00:26, 2 July 2014 (UTC)
  • should we ask industry reps to give short presentations of their research needs?
    • This sounds interesting. I guess by "research needs" you mean "important ongoing open problems for which research might yield actionable insights"?
yes, that's a good articulation of it :) all organizers will give lightning talks, and industry participants with compelling position papers will also be invited to participate Jmorgan (WMF) (talk) 00:26, 2 July 2014 (UTC)
  • ask industry researchers to make their data resources available beforehand? (MVP: a summary page like Research:Data)
    • Do you want data to be available there? If so, why? Is that data to answer particular questions, or something more?
WMF, Reddit, Imgur (and possibly other organizers) will post some sample datasets on These datasets can help anchor group discussion about shared top-line metrics, common challenges & important questions for OCs. They also provide a novel value proposition for attendees: "here is some cool Reddit data to play with, and here are some things you could do with it" Jmorgan (WMF) (talk) 00:26, 2 July 2014 (UTC)
  • " two communities of practice " feels awkward. I don't know what you'd replace it with, though.
has been replaced with less jargon. Jmorgan (WMF) (talk) 00:26, 2 July 2014 (UTC)
  • What does it mean to "support the work of these online communities through research"? Be more specific, if possible.
not sure how specific we can get in the space of the abstract. Mostly talking about the work that folks in Data Science/Community Manager roles do to understand community health, evaluate feature usage, etc. Agree that the proposal itself will need to expand on this. Jmorgan (WMF) (talk) 00:26, 2 July 2014 (UTC)
  • Similarly "a framework that...supports the work of the communities being studied" doesn't make sense to me. If the goal is to support communities, shouldn't community managers (rather than just researchers at particular PP-communities) be the target? I've added them (remove if that's not your intention!)
Community managers are definitely a target audience. Even those who don't do data science themselves will have important insights to share. Jmorgan (WMF) (talk) 00:26, 2 July 2014 (UTC)
  • "Datasets and APIs will be prepared in advance and shared between participants" <-- that's not a goal, that's a procedure
good point. changed it to "Pilot a data repository by publishing a small set of well-documented and compelling OC datasets". Jmorgan (WMF) (talk) 00:41, 2 July 2014 (UTC)
  • I'm not sure what "Nature" means to you -- maybe something like "Real-world problems/challenges"? A different term would be better.
I agree... any ideas of how we would re-organize our themes? Jmorgan (WMF) (talk) 00:42, 2 July 2014 (UTC)
  • Can you give an example of how technical innovation might be needed to contribute to Grand Challenges? I guess this is something about social and data translucence (e.g. usage statistics from an article being shown, which then informs priorities)
struggled with how to articulate this, but then ended up dropping "Grand Challenge" language altogether. We probably won't push the technical innovation angle at all. Jmorgan (WMF) (talk) 00:41, 2 July 2014 (UTC)
  • For a one-day workshop, on top of a framework, this sounds like a lot: "Participants will also partner to share methods, tools and data, perform analyses, and develop best practices for open collaboration research." Are you foreseeing ongoing collaborations, and how will they be fostered?
Truth be told, we probably won't perform much analysis during the workshop: it won't be a true hackathon. Sharing methods, tools and data will happen through lightning talks and during afternoon breakout groups. Developing best practices (and identifying key research questions/needs) will happen through a roundtable discussion after the lightning talks, and probably also through breakout groups. Subsequent collaboration will be facilitated through partner initiatives (Digital Ecosystem Research Partnership and Open Collaboration Data Factory) and through personal relationships formed during the workshop. Also, we will create a mailing list :) Jmorgan (WMF) (talk) 00:41, 2 July 2014 (UTC)




  • Stuart Lynn (Zooniverse)

Other open organizations to contact

as co-organizers or participants

  • JM: Nathon Maton (product manager)
  • JM: Chris Lintott (founder)
  • DT: Jeff Atwood (co-founder)
  • AH: Kevin, Bhupendra & Mike from Analytics
  • DT: Arfon Smith (former Zooniverse CTO, now head of GitHub science)
  • ??
  • TH: has some contacts
  • ??
  • Max Goodman (done)
  • TH: via derp
  • TH: via derp

Academics to contact

probably as participants

  • Sean Goggins
  • Mako Hill
  • Kevin Crowston
  • Andrea Forte
  • Elizabeth Gerber
  • Brian Keegan
  • Stuart Geiger
  • Jodi Schneider
  • Giovanni Ciampaglia
  • Aaron Shaw
  • Mark Zachry
  • David McDonald
  • Alex Leavitt
  • Edith Law
  • Bluma Gelley
  • Gabe Mugar

Related past workshops

this is just fodder for proposal-creation


  1. Forte, Andrea, and Cliff Lampe. "Defining, Understanding, and Supporting Open Collaboration Lessons From the Literature." American Behavioral Scientist 57.5 (2013): 535-547.
  2. den Besten, M., Dalle, J. M., & Galia, F. (2008). The allocation of collaborative efforts in open-source software. Information Economics and Policy, 20(4), 316-322.
  3. Bryant, S. L., Forte, A., & Bruckman, A. (2005, November). Becoming Wikipedian: transformation of participation in a collaborative online encyclopedia. In Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work (pp. 1-10). ACM.
  4. Panciera, K., Halfaker, A., & Terveen, L. (2009, May). Wikipedians are born, not made: a study of power editors on Wikipedia. In Proceedings of the ACM 2009 international conference on Supporting group work (pp. 51-60). ACM.
  5. Keegan, B., Gergle, D., & Contractor, N. (2013). Hot off the wiki: Structures and dynamics of Wikipedia’s coverage of breaking news events. American Behavioral Scientist, 0002764212469367.
  6. Kittur, A., Chi, E., Pendleton, B. A., Suh, B., & Mytkowicz, T. (2007). Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie. World wide web, 1(2), 19.
  7. D.M. Wilkinson. Strong regularities in online peer production. In Ecommerce ’08, pages 302–309. ACM, 2008.
  8. Visualizing Activity on Wikipedia with Chromograms. Martin Wattenberg, Fernanda B. Viégas, and Kate Hollenbach. Interact 2007.
  9. Cosley, D., Frankowski, D., Terveen, L., & Riedl, J. (2007, January). SuggestBot: using intelligent task routing to help people find work in wikipedia. In Proceedings of the 12th international conference on Intelligent user interfaces (pp. 32-41). ACM.
  10. Priedhorsky, R., Masli, M., & Terveen, L. (2010, February). Eliciting and focusing geographic volunteer work. In Proceedings of the 2010 ACM conference on Computer supported cooperative work (pp. 61-70). ACM.
  11. Krieger, M., Stark, E. M., & Klemmer, S. R. (2009, April). Coordinating tasks on the commons: designing for personal goals, expertise and serendipity. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1485-1494). ACM.
  12. Viégas, F. B., Wattenberg, M., & McKeon, M. M. (2007). The hidden order of Wikipedia. In Online communities and social computing (pp. 445-454). Springer Berlin Heidelberg.
  13. Beschastnikh, I., Kriplean, T., & McDonald, D. W. (2008, March). Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.
  14. Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2012). The rise and decline of an open collaboration system: How Wikipedia’s reaction to popularity is causing its decline. American Behavioral Scientist, 0002764212469365.
  15. Geiger, R. S., & Ribes, D. (2010, February). The work of sustaining order in wikipedia: the banning of a vandal. In Proceedings of the 2010 ACM conference on Computer supported cooperative work (pp. 117-126). ACM.
  16. Adler, B. T., Chatterjee, K., De Alfaro, L., Faella, M., Pye, I., & Raman, V. (2008, September). Assigning trust to Wikipedia content. In Proceedings of the 4th International Symposium on Wikis (p. 26). ACM.
  17. West, A. G., Kannan, S., & Lee, I. (2010, April). Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata?. In Proceedings of the Third European Workshop on System Security (pp. 22-28). ACM.
  18. Geiger, R. S., & Halfaker, A. (2013, August). When the levee breaks: without bots, what happens to Wikipedia's quality control processes?. In Proceedings of the 9th International Symposium on Open Collaboration (p. 6). ACM.
  19. Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge university press.
  20. Von Krogh, G., Spaeth, S., & Lakhani, K. R. (2003). Community, joining, and specialization in open source software innovation: a case study. Research Policy, 32(7), 1217-1241.
  21. Preece, J., & Shneiderman, B. (2009). The reader-to-leader framework: Motivating technology-mediated social participation. AIS Transactions on Human-Computer Interaction, 1(1), 13-32.
  22. Suh, B., Convertino, G., Chi, E. H., & Pirolli, P. (2009, October). The singularity is not near: slowing growth of Wikipedia. In Proceedings of the 5th International Symposium on Wikis and Open Collaboration (p. 8). ACM.
  23. Halfaker, A., Geiger, R. S., & Terveen, L. G. (2014, April). Snuggle: designing for efficient socialization and ideological critique. In Proceedings of the 32nd annual ACM conference on Human factors in computing systems (pp. 311-320). ACM.
  24. Choi, B., Alexander, K., Kraut, R. E., & Levine, J. M. (2010, February). Socialization tactics in wikipedia and their effects. In Proceedings of the 2010 ACM conference on Computer supported cooperative work (pp. 107-116). ACM.
  25. Joyce, E., & Kraut, R. E. (2006). Predicting continued participation in newsgroups. Journal of Computer‐Mediated Communication, 11(3), 723-747.
  26. Morgan, J. T., Bouterse, S., Walls, H., & Stierch, S. (2013, February). Tea and sympathy: crafting positive new user experiences on wikipedia. In Proceedings of the 2013 conference on Computer supported cooperative work (pp. 839-848). ACM.
  27. Howard T. Welser, Dan Cosley, Gueorgi Kossinets, Austin Lin, Fedor Dokshin, Geri Gay, and Marc Smith. Finding social roles in wikipedia. In iConference ’11, pages 122–129. ACM, 2011.