Research talk:Breaking into new Data-Spaces



Hey folks. Just thinking about what I'll say for the intro tomorrow morning and decided that I should just draft it here.

I think it's clear that we are currently experiencing a time of great change in the methodologies available to CSCW researchers. The growth in online communities, the wide availability of open datasets, and the growing practice of data science have changed the way we look at computers in the space of cooperative work. While the work around these new contexts, methods, and datasets has been a boon for the pursuit of knowledge, we've come to recognize some limitations in our work. Just because data is openly available does not mean that it is easy to work with. It turns out that the technical walls separating us from access to data are only one of the barriers to using it effectively. So, we have a proliferation of single-system papers that are hard to generalize. We develop expertise on specific datasets because the level of skill required to wield a dataset effectively to answer research questions is high. We regularly lament the lack of replication or replicability of past work. To say it plainly: working with data is harder than we thought -- even when we technically have access to it and we have a set of useful methodological tools to apply.

In this workshop, we'll be experimenting with breaking down some of the barriers to the kind of research practices we'd like to perform -- the barriers we think make working with open datasets and replicating the work of others so difficult. As CSCW researchers we are uniquely positioned to think critically about this issue -- to apply user-centered design, to think about infrastructures, and to think about how we'd like to use computer systems to coordinate our work as researchers. So, that's what we're going to do. Today we're running an experiment on ourselves to see where we have been successful in lowering barriers and to prioritize next steps. We'll ask you to replicate a past CSCW work (that you've all had a chance to read) using two technologies that we've identified as important for this purpose: a metadata index and an online query service. If you manage to replicate the bit of work from this paper that you find most interesting before we're done today, please feel free to extend the work or explore the tooling we've made available to you.

In the next 20 minutes, we'll give you an introduction to the metadata index and the Quarry querying service. We'll then have you split up into groups of 3-5 to review the replication task and the tools that we have provided to you. Yuvi will be working on making sure our systems stay online. AJ & Kristen will help with any questions you have about the metadata. I'll be available to answer methodological questions about the paper. Jonathan and Andrea will be observing your work, asking questions, and taking notes to help us iterate. Note that we've specifically inserted three breaks into the day -- two 15-minute breaks and lunch. Please use these breaks to do email or other non-workshop stuff that needs to get done. We've provided each table with an etherpad to take notes on. Please use this to capture thoughts about what works and what doesn't. Please make sure to note features that you'd like to see in the metadata index and Quarry. At the end of the day, we'll invite each table to choose a representative to report their progress and their notes.

Are there any questions? ..... OK, then I'll hand the mic over to Yuvi to give you an introduction to the Quarry querying system. --EpochFail (talk) 23:54, 26 February 2016 (UTC)