Research:Wikipedia Research Management

From Meta, a Wikimedia project coordination wiki

Note: the following analyses and recommendations focus on social science research projects on Wikipedia and its sister projects.

How can the management of Wikipedia-related research projects be improved? An attempt at practical thinking.

Introduction/Overview

These notes are the product of a (more or less) thorough review of existing research-related pages on Meta, the RCom-l archives, planned or completed Wikipedia-related research projects, and papers published in relevant journals and conference proceedings.

The goal is to formulate a set of practical guidelines based on a realistic assessment of goals and problems: which goals are currently out of reach because of their scope or complexity and which are attainable, what can be done immediately, and what will need to be learned and developed as particular research projects progress.

Recommendations for improving Wikipedia-related research projects are summarized in the following section. Detailed discussions of particular issues in Wikipedia-related research, and the rationale for the recommendations listed in the following section, are provided under Discussion.

Recommendations for Wikipedia Research Management

Credo

The best way to learn how to manage Wikipedia research projects is by doing it, where "doing it" means both doing research and managing research.

The message underlying the following analyses of how to manage Wikipedia research projects efficiently is this: managing Wikipedia research projects is most probably too complex to be planned ahead in a top-down manner. While some foresight and planning will help, most of the time we will have to learn by trial and error, by tracking and managing particular research projects.

Recommendation 1.

Most probably, the best way to start developing both technical and ethical guidelines, policies and procedures for Wikipedia-related research is to engage in long-term cooperation and communication with particular researchers who conduct particular research projects. This cooperation would act as a monitoring and interaction process on behalf of the community, RComm and the Wikimedia Foundation, one whose outputs are experience, case-by-case analysis and discussion, and a summary of what has been learned and what, therefore, needs to be formalized in the relevant guidelines, policies and procedures.

Recommendation 2.

To help researchers conduct Wikipedia-related research that involves human subjects, the Foundation, RComm and the community need to develop a Research Support Network (RSN). RComm should try to (i) establish a network of individuals from local chapters and Wikipedia communities, (ii) build awareness among them of the importance of research, and (iii) involve them in planning, discussing and evaluating research projects. Only an established network of editors, admins and users connected to the community will be able to help recruit participants for Wikipedia-related studies. Response rates for online studies are notoriously low, and few researchers will be motivated to make the effort needed to recruit a representative sample of Wikipedians unless they are supported by a network that provides direct contact with potential participants and active support in local communities. Developing an RSN could achieve two goals: (a) direct support for doing research on Wikipedia, which would in turn motivate the research community, and (b) awareness building and active participation of the Wikipedia community in formulating relevant research goals and projects.

Recommendation 3.

RComm, the WMF and the community need to formulate a set of Wikipedia Research Challenges that would motivate (especially young) researchers to start developing Wikipedia-relevant research projects. In contrast (but not unrelated) to research goals defined as directly relevant to the community and the Wikimedia Foundation, whose formulation RComm has already planned, these Research Challenges would encourage researchers to align their creativity and efforts with the development of Wikipedia and its sister projects. Some form of online promotion of the Wikipedia Research Challenges should prove effective here. The Research Challenges should be formulated, suggested and selected by the community in a process observed and catalyzed by RComm.

(Please note: the terms "Wikipedia Research Challenges" and "Research Challenges" are chosen tentatively and for the purpose of this discussion only; if the idea proves acceptable to the community, we should think carefully about the precise wording.)

Discussion

The list of RComm Areas of Interest and Core Functions encompasses RComm's core functions as defined at the beginning of the discussions on RCom-l. This classification of interests and functions maps easily onto relevant goals and activities, so we follow it as defined. We restrict the following discussion to those areas of interest where we hope our analysis can provide helpful insights, although many of the issues discussed below touch on almost all of the listed areas of interest and RComm functions.

Discussion 1: supporting the development of subject recruitment processes

Supporting the development of subject recruitment processes is anything but simple. We believe this goal is among those that cannot be fully planned in advance, because much will need to be learned while subject recruitment is carried out for particular projects.

To explain why, we discuss the SRAG initiative. We assume that no research project has yet been processed through the proposed SRAG procedures, since the list of approved applications, where approved projects would be listed, is empty (the page does not exist).

Discussion of the SRAG initiative

This initiative will most probably be able to produce all the relevant research policy guidelines once it starts discussing and processing particular research projects. Generally, research projects that involve subject recruitment (human subject participation) raise two types of concerns, which we term (1) technical and (2) ethical and discuss briefly below.

Technical considerations

Technical considerations relate to the development of a standardized subject recruitment system, presumably the one referred to as SOMETHING/SOMEONE on the SRAG page. The way in which this system supports the selection of representative samples will be of decisive importance for researchers, whereas the handling of ethical considerations (2) will be of great importance for the community. Any researcher (or research team) must be able to produce the sample design for the study they intend to conduct. Roughly, this means describing exactly which variables (for example: age, sex, language, experienced/new editor, experienced/new admin) should be represented, and in what proportions, in the study sample. Only with these data provided will SOMETHING/SOMEONE, once it is developed, be able to determine the scheme for collecting an adequate random sample of participants. Researchers with approved projects should be given a way to express the requirements of their planned samples formally, exactly and precisely, in terms of sample schemas (e.g. forms, scripts, standards) that SOMETHING/SOMEONE could process. Please note: random sampling quickly becomes a very complex problem as the complexity of the design increases. We believe particular importance should be placed on developing standardized ways of communicating the sample design of a particular study to (i) RComm, (ii) all editors involved in discussing the project, and (iii) SOMETHING/SOMEONE for processing. For the time being, we do not believe it is possible for RComm, or anyone from the community, to produce a complete, abstract sampling schema for every possible research project (one that could then be customized for particular projects); the task is simply too complex.
The technical system for subject recruitment will need to be designed and iteratively redesigned, step by step, to match the needs of upcoming research projects as they come into focus. This leads to a simple, practical recommendation: RComm needs to start working with particular researchers on particular research projects and develop the relevant technical procedures as each project advances. After a number of iterations, the technical system for subject recruitment will (hopefully) be able to meet the needs of the majority of research projects. RComm therefore needs to start actively monitoring some research projects and help develop the relevant technical systems, policies and procedures through long-term interaction with the researchers on these projects.
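To make the idea of a formally expressed sample design more concrete, the Python sketch below shows one way a sample schema could be written down and processed: the researcher states the total sample size and the proportion of each stratum (a combination of variables such as language and editor experience), the schema is converted into whole-number stratum sizes, and a simple random sample is drawn within each stratum (proportional stratified sampling). Everything here is an illustrative assumption: SampleDesign, allocate, draw_stratified_sample, the stratum labels and the candidate pool are hypothetical, and nothing in it describes how SOMETHING/SOMEONE or the SRAG process actually works. It also leaves out the hard parts discussed above, such as building the pool of eligible users in the first place and handling non-response.

```python
import random
from dataclasses import dataclass
from math import floor


@dataclass
class SampleDesign:
    """Hypothetical machine-readable sample schema a researcher might submit.

    `strata` maps a stratum label, e.g. ("en", "new editor"), to the
    proportion of the final sample that stratum should make up.
    """
    total_size: int
    strata: dict

    def validate(self) -> None:
        if self.total_size <= 0:
            raise ValueError("total_size must be positive")
        if any(p < 0 for p in self.strata.values()):
            raise ValueError("stratum proportions must be non-negative")
        share = sum(self.strata.values())
        if abs(share - 1.0) > 1e-9:
            raise ValueError(f"stratum proportions sum to {share}, not 1")


def allocate(design: SampleDesign) -> dict:
    """Convert proportions into whole-number stratum sizes.

    Uses largest-remainder rounding so the sizes always add up to total_size.
    """
    design.validate()
    exact = {s: p * design.total_size for s, p in design.strata.items()}
    sizes = {s: floor(x) for s, x in exact.items()}
    leftover = design.total_size - sum(sizes.values())
    # Hand the leftover slots to the strata with the largest fractional parts.
    ranked = sorted(exact, key=lambda s: exact[s] - sizes[s], reverse=True)
    for s in ranked[:leftover]:
        sizes[s] += 1
    return sizes


def draw_stratified_sample(design: SampleDesign, pool: dict, seed=None) -> dict:
    """Draw a simple random sample (without replacement) inside each stratum.

    `pool` maps each stratum label to a list of eligible (hypothetical) user names.
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, n in allocate(design).items():
        candidates = pool.get(stratum, [])
        if len(candidates) < n:
            raise ValueError(
                f"stratum {stratum!r} has {len(candidates)} candidates but needs {n}"
            )
        sample[stratum] = rng.sample(candidates, n)
    return sample
```

For example, allocating a design with total_size=7 and proportions 0.5, 0.3 and 0.2 for the strata ("en", "new editor"), ("en", "experienced editor") and ("sr", "new editor") gives sizes of 4, 2 and 1: the exact shares are 3.5, 2.1 and 1.4, and the one slot left after rounding down goes to the stratum with the largest fractional part. Largest-remainder rounding is only one reasonable choice; a real schema would also have to specify what to do when a stratum has too few eligible users.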

Another important technical consideration relates to the problem noted by user PiperNigrum [1], who tried to recruit participants from Wikipedia in cooperation with user EpochFail:

“One avenue that is often used is posting on various community forums such as Village Pumps, Centralized discussion, and mailing lists. While this method generates participants for studies, there are a number of problems.

  • The samples are non-random
  • The sample draws only from Wikipedians who read the forum
  • As a community forum, readers often ask questions, or comment on study design.”


Since we have already discussed the complexity of obtaining random samples, here we focus on the more administrative problem of reaching potential study participants. As noted above, forum posts reach only forum readers, which is of little help to a researcher who has a well-planned sample design. Adding direct e-mail contacts will of course improve the situation, but it carries the risk that users will treat these e-mails as disruptions or spam. A still more serious problem is that the response rate to online invitations in online studies is very low; anyone who has tried to conduct a study online is familiar with this fact. What we need in order to make Wikipedia an accessible research resource is a Research Support Network (RSN) that would extend from RComm and users specifically interested in research-related issues to the more active members of local communities and chapters, who could help motivate Wikipedians to take part in relevant research. At least in the beginning, not every research project will qualify for this kind of support, simply to avoid participant recruitment being perceived as aggressive advertising on behalf of the communities. This is where RComm, in collaboration with the community and the WMF, needs to provide policy guidelines and select the projects of higher significance to the community for this kind of support. In this way we could achieve two goals: (i) motivate researchers to apply with research projects that are relevant to the needs of the community, and (ii) help them with the notoriously difficult task of online subject recruitment.
More broadly, developing a Research Support Network could achieve the following: (a) direct support for doing research on Wikipedia, which would in turn motivate the research community, and (b) awareness building and active participation of the community in formulating relevant research goals and projects.
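The arithmetic behind the low-response-rate problem can be illustrated with a small Python sketch: if only a small percentage of invited users are expected to complete a study, the number of invitations per stratum has to be scaled up accordingly, and that number may exceed the pool of eligible users that a forum post or mailing list can reach. The function names, stratum labels, response rates and pool sizes are hypothetical and chosen only for illustration; they are not measured Wikipedia figures. Sending more invitations only raises the expected number of responses; it does not remove non-response bias, because those who choose to respond may differ systematically from those who do not.

```python
def invitations_needed(target_completes: int, response_pct: int) -> int:
    """Invitations needed so that, at an assumed response rate of
    `response_pct` percent, the expected number of completed
    responses reaches `target_completes`."""
    if target_completes < 0:
        raise ValueError("target_completes must be non-negative")
    if not 1 <= response_pct <= 100:
        raise ValueError("response_pct must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (target_completes * 100 + response_pct - 1) // response_pct


def invitation_plan(targets: dict, response_pct: dict, pool_sizes: dict):
    """Per-stratum invitation counts, plus the strata whose pool of
    eligible users is too small to reach the target at the assumed rate."""
    plan = {}
    too_small = []
    for stratum, target in targets.items():
        plan[stratum] = invitations_needed(target, response_pct[stratum])
        if plan[stratum] > pool_sizes.get(stratum, 0):
            too_small.append(stratum)
    return plan, too_small
```

With targets of 40 new and 60 experienced editors, assumed response rates of 5% and 20%, and eligible pools of 500 and 1000 users, invitation_plan returns 800 and 300 invitations and flags the new-editor stratum, since 800 invitations exceed a pool of 500. Rates are passed as whole percentages so that the ceiling division stays in exact integer arithmetic.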

Ethical considerations

Ethical considerations involve two subtypes of problems: (2a) does a particular research project satisfy Wikipedia community norms, and (2b) does it satisfy the broader recommendations and obligations of research ethics for academic research involving human subjects?

As for (2a), the question of whether a research project satisfies Wikipedia community norms, we have little to say. Wikipedia has well-developed policies that reflect the norms of the community, and we believe interested users will be able to discuss the adequacy of each research project just as they discuss and manage other efforts to improve Wikipedia.

As for (2b), the question of whether a research project satisfies the broader recommendations and obligations of research ethics for academic research involving human subjects is, in our view, extremely complex and cannot readily be answered for a given project, except in some rather obvious cases. The following reasons lead us to believe that the relevant guidelines and policies on this question will need to be developed by (i) actively monitoring research projects that are underway and (ii) communicating and interacting with the relevant researchers or research managers, rather than being planned in advance:

  • Policies for conducting ethically responsible research involving human subjects differ across countries and jurisdictions. If the Wikipedia community and RComm tried to plan a general policy that would let anyone conduct any research project whose participants span different communities, they would spend a lot of time on a (probably unsolvable) problem that, in the end, would still have to be handled through the researcher's or the community's direct contacts with academic institutions in particular countries, so that the project complies with locally relevant policies. Such problems call for legal expertise to be planned carefully, and we do not believe that even the best and most experienced experts in Internet governance, Internet law and related fields could produce a universal solution that would not conflict with any existing local policy. It is simply too complex.
  • In some academic environments nothing similar to the IRBs (Institutional Review Boards) in the USA, or similar rules and boards in other countries, exists at all. For example, in the Republic of Serbia (at least to the best knowledge of the author of these lines, a social scientist from Serbia), no such policy exists. Research must of course comply with the country's legal system, but, as far as I know, nothing addresses the specific aspects and details of social science research projects. In such cases, research ethics are in the hands of the researchers and subject to the judgment of the research community.
  • Existing policies for ethically conducting research with human subjects have themselves evolved through long-term interactions and the history of social research, rather than being planned in advance and set once and for all. Recapitulating such a process on Wikipedia, while understanding already existing policies, rules and practices, seems to be the only way to produce a universally valid, generally accepted set of ethical guidelines for doing research on Wikipedia.

Summary of the SRAG Discussion

  • The best way to start developing both technical and ethical guidelines, policies and procedures for Wikipedia research is to engage in long-term cooperation and communication with particular researchers who conduct particular research projects. On behalf of RComm and the community, this cooperation would act as a monitoring and interaction process whose outputs are experience, case-by-case analysis and discussion, and a summary of what has been learned and what, therefore, needs to be formalized in the relevant guidelines, policies and procedures. The complexity of the problems raised by all potential Wikipedia research projects calls for such an active approach. The fact that most existing research policies, guidelines and procedures that actually work were also developed in parallel with the relevant research processes further suggests that this is the way to go.


Discussion 2: Helping to formulate the key strategic research objectives of the Wikimedia movement

We suggest that the best course of action with respect to this RComm function is to help conceptualize two sets of research objectives through broad participatory processes open to all interested members of the community:

  1. Key strategic research objectives in line with the needs of the Foundation (what the WMF needs to know in order to help develop its projects) and with the needs related to already existing and accepted strategic actions (http://strategy.wikimedia.org/wiki/Main_Page);
  2. Wikipedia Research Challenges: a set of research objectives formulated by interested members of the community in a process motivated and catalyzed by RComm. These Research Challenges would help motivate young researchers interested in collaborative knowledge systems to develop innovative research projects and participate actively in the community. Some form of online promotion of the Wikipedia Research Challenges should prove effective here, particularly if linked to the proposed new design of the Meta research pages (http://meta.wikimedia.org/wiki/Research:2011_overhaul), where the Research Challenges could be presented, for example.

Discussion of the key strategic research objectives

The assumption is that the WMF and RComm will be aware of the immediate research needs that arise from previously defined key strategic actions and from the activities the WMF needs to accomplish within the scope of its work. This does not mean that these needs will match the types of research projects that researchers themselves would be interested in developing. That is why we believe these two motivations for doing research on Wikipedia should be kept conceptually separate, at least for management purposes. Some researchers will certainly propose projects of immediate interest to the community, while some needs and ideas of the community, the WMF and RComm will certainly match the interests of some researchers; we propose to exploit this existing overlap in Wikipedia research management.

If RComm manages to develop a Research Support Network, as proposed above, an interesting process could be started by asking all interested Wikipedians to contribute their ideas on questions such as the following:

  • What do you think are the most interesting questions about Wikipedia and the Wikipedia movement that require scientific research to be answered properly?
  • What are the most fascinating things about Wikipedia and our movement that should be studied scientifically?
  • What puzzles you most about Wikipedia and our movement that you believe scientific research could help explain?
  • and similar questions.

By helping RComm formulate the most interesting research questions, the community could learn more about Wikipedia research and research management processes, and its members might become more motivated to engage in research projects as researchers, evaluators, discussion participants or study participants, according to their interests and motivations.

On the other hand, by formulating important research objectives (which we tentatively call Wikipedia Research Challenges in this document), we could motivate researchers to develop Wikipedia-related research projects that attempt to meet them. The Wikipedia environment should naturally become the source that sets the trends and objectives in the study of online collaborative knowledge systems, the place it most naturally deserves as the greatest collaborative effort in the intellectual history of humanity.

Summary of the key strategic research objectives Discussion

RComm, together with the Wikimedia Foundation and the community, needs to formulate a set of Wikipedia Research Challenges that would motivate (especially young) researchers to start developing Wikipedia-relevant research projects. In contrast (but not unrelated) to research goals defined as relevant to the community and the Wikimedia Foundation, which should be stated in parallel with these Research Challenges (and which RComm had already planned to state), the latter would encourage researchers to align their creativity and efforts with the development of Wikipedia and its sister projects. Some form of online promotion of the Wikipedia Research Challenges should prove effective in this respect. The Research Challenges should be formulated, suggested and selected by the community in a process observed and catalyzed by RComm.