Research:Breaking into new Data-Spaces

Breaking into new Data-Spaces
Infrastructure for Open Community Science
A CSCW 2016 workshop. Saturday, February 27, 2016 in San Francisco, CA, USA.
Apply Now
(before Dec. 31st)

Despite being easily accessible, open online community (OOC) data can be difficult to use effectively. In order to access and analyze large amounts of data, researchers must first become familiar with the meaning of data values. Then they must find a way to obtain and process the datasets to extract their desired vectors of behavior and content. This process is fraught with problems that are solved (with great difficulty) over and over again by each research team/lab that breaks into datasets for a new OOC.

In this workshop, we'll experiment with documentation protocols and technologies that are designed to make the process of “breaking into” a new dataset more tractable for researchers studying open online communities. This workshop’s purpose is to bring together researchers to test these systems and discover problems and missed opportunities to support iteration. Participants will also be given the opportunity to use state-of-the-art documentation and technologies to break into a new collection of datasets. This workshop is the direct result of a call to action to build infrastructure for data sharing between researchers from past CSCW workshops and related conferences.

Workshop details

Seacliff A
Start time
The workshop officially starts at 9am, and we'll get rolling then no matter what. But we really want y'all to show up by 8:30, so to sweeten the deal we will be serving coffee and pastries in the workshop room starting at about 8:15! So please come early if you can, consume some caffeine and sugar with us, and generally get to know each other a bit before diving into our action-packed day.
What to bring
Bring your laptop, and a pen/pad if you prefer to take notes by hand. There will be power strips in the room so that you can plug in as needed.
Food & drinks
Lunch will be provided by the Open Data Factories working group. Snacks will be provided by CSCW.
Attention & engagement
We're a small group and we will be working closely together throughout the day. We ask that you refrain from surfing, emailing, tweeting, etc. during the day's activities as much as possible. However, we understand that pressing issues sometimes arise, so we have built time into the schedule specifically for attendees to catch up on work and external communication.
The workshop ends at 5pm. We would love to get together afterward with any folks who are interested in continuing the conversation over food and/or drinks, at some TBD venue near the hotel.


Vision statement
A short presentation and extended discussion about the purpose of the workshop and the larger initiative towards better infrastructure for open community data science.
Hack session
Participants (split into teams) work on the replication/extension task. Participants will have a total of 4.5 hours of time on task, excluding the introduction, breaks, and reflection time. The workshop organizers will work with participants to both answer their questions and observe their work.
Reporting and reflection
Participant teams report on their progress and reflect on what did and did not work for them. We'll specifically ask whether the methods description, querying system, and metadata were helpful, and how.
  • 8:15-9:00: breakfast mingling
  • 9:00 (sharp!): AH intro to the day (process + brief overview of the task)
  • 9:10-9:30: Vision statement about Infrastructure for OOC studies
  • 9:30-10:15: Data introduction -- Each team/table reviews the task, documentation and infrastructure.
  • 10:15-10:30: coffee break, email breaktime
  • 10:30-12:00: Morning hack session breakouts (one team per table)
  • 12:00-12:30: Lunch serving, email breaktime
  • 12:30-3:15: Afternoon hack session breakouts (one team per table)
  • 3:15-3:30: coffee break, email breaktime
  • 3:30-4:30: Report-out and reflection (surveys)
  • 4:30: Wrap-up, Thanks & Next steps.
  • ~5:00: Victory! Food? Share contacts.


  1. identify common challenges and novel strategies for making open community research easier to replicate and extend -- specifically targeting protocols for documenting research methods (e.g. the ODD protocol[1])
  2. inform the design of data management/analysis infrastructures like Quarry, our experimental open querying service[2]
  3. inform the design of metadata indexes like the Open Collaboration Data Factory's wiki[3]



This workshop is being organized by the Wikimedia Foundation in partnership with the Open Collaboration Data Factories project.

See also


  1. cite the most recent incarnation of ODD
  2. footnote to quarry URL
  3. Footnote to OCDF metadata wiki URL