Learning patterns/Analysing effects of offline meetups
What problem does this solve?
Offline face-to-face meetings are an important part of the Wikimedia community. However, there are only a few systematic assessments of the effects of such meetings on online behaviour, as it can be difficult to obtain the meeting data. Once the data is collected, statistical methods can be used to assess whether offline meetups have an effect on the online behaviour of Wikipedians. In this learning pattern, I want to highlight some methods for analysing meeting data and discuss problems and limitations that must be kept in mind when working with it.
What is the solution?
This learning pattern provides guidelines on how to analyse the effects of offline meetups on online behaviour.
Effect on what?
As in any other research project, the first step is to identify what you are actually interested in and which research question you want to answer. Do you want to know...
- whether users make more edits after attending a meetup?
- whether users make higher quality edits after attending a meetup?
- whether users remain active for longer after attending an edit-a-thon?
- whether users taking part in introductory edit-a-thons feel more welcome in the community?
- whether users know the rules of Wikipedia better after attending an edit-a-thon?
- whether users are more likely to become administrators after having met other administrators face-to-face?
- whether users start (or stop) using the talk page to interact with those that they have met?
Many questions can be asked about the effects of offline meetups on the online behaviour of Wikipedians. Identifying these effects may require different data and different strategies of analysis. This learning pattern outlines approaches and relevant things to consider irrespective of the specific research question. Nevertheless, before starting any analysis, it is important to make your questions and aims clear.
Getting the data
To answer your research question, you need not only data on the offline meetups; you will probably also need to merge it with data capturing the online behaviour of Wikipedians. Depending on the research question, this online data might come from the Wikipedia data dump (see a learning pattern on using it for research), from collected requests for adminship, from additionally conducted (offline) interviews or surveys, or from other sources. When working only with meetup data, one can analyse their spatial and temporal distribution, the networks that have developed, etc. Only by merging them with other data sources, however, can one assess whether these offline meetups have an effect on other behaviours and attitudes.
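The merging step could look roughly like the following minimal sketch. It assumes that both sources identify users by the same username and that dates are ISO-formatted strings; the records, field names and the function `merge_on_user` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical meetup attendance records: (username, meetup date)
attendance = [
    ("Alice", "2015-03-14"),
    ("Bob", "2015-03-14"),
    ("Alice", "2015-06-20"),
]

# Hypothetical edit records, e.g. extracted from a data dump: (username, edit date)
edits = [
    ("Alice", "2015-03-01"),
    ("Alice", "2015-03-20"),
    ("Alice", "2015-07-01"),
    ("Bob", "2015-02-10"),
    ("Carol", "2015-03-15"),
]


def merge_on_user(attendance, edits):
    """Return one row per user: first meetup date and edit counts around it."""
    # Earliest meetup per user (ISO date strings sort chronologically).
    first_meetup = {}
    for user, date in attendance:
        if user not in first_meetup or date < first_meetup[user]:
            first_meetup[user] = date

    rows = defaultdict(lambda: {"first_meetup": None, "edits_before": 0,
                                "edits_after": 0, "edits_total": 0})
    # Make sure attendees without any edits still get a row.
    for user, date in first_meetup.items():
        rows[user]["first_meetup"] = date

    for user, date in edits:
        row = rows[user]
        row["edits_total"] += 1
        if row["first_meetup"] is not None:
            if date < row["first_meetup"]:
                row["edits_before"] += 1
            else:
                row["edits_after"] += 1
    return dict(rows)


merged = merge_on_user(attendance, edits)
for user, row in sorted(merged.items()):
    print(user, row)
```

Users who never attended a meetup (here the invented user "Carol") keep `first_meetup` set to `None`, so they remain available as a comparison group later on.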
Once the research question is set and the data is obtained, the next task is to assess whether meetup participation has an effect on behaviour and attitudes. Making claims about the effects of meetups requires a comparison with users who have not attended any. Depending on the research question and the computational possibilities, it might make sense to work with the full list of users and to statistically control for tenure and activity in the analyses (so that one can reliably say that it is the attendance at a meetup that makes people contribute more and not, for example, only the number of previous edits).
Another option is a matching approach, in which each user who attended a meetup is matched with a user who did not attend one but is comparable in other regards. For example, imagine users A and B, who registered on the same date and had a comparable level of activity in their first month. In their second month, user A attended a meetup while user B did not. Users A and B could now be matched and compared (this is the very basic idea of covariate matching).
Additionally, one can make use of the longitudinal nature of the data and compare activity levels before and after the meetup, for example using a difference-in-differences design (see this paper using diff-in-diff and covariate matching).
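The combination of covariate matching and a difference-in-differences comparison can be sketched as follows. This is a deliberately simplified illustration, not the method of the paper mentioned above: the user records are invented, the distance function is an unweighted assumption, and the greedy nearest-neighbour matching is only one of several possible matching strategies.

```python
# Hypothetical per-user records; all values are made up for illustration.
users = [
    {"name": "A", "attended": True,  "reg_month": 1, "edits_before": 20, "edits_after": 35},
    {"name": "B", "attended": False, "reg_month": 1, "edits_before": 22, "edits_after": 25},
    {"name": "C", "attended": True,  "reg_month": 5, "edits_before": 5,  "edits_after": 12},
    {"name": "D", "attended": False, "reg_month": 5, "edits_before": 6,  "edits_after": 7},
    {"name": "E", "attended": False, "reg_month": 9, "edits_before": 90, "edits_after": 80},
]


def distance(u, v):
    """Unweighted distance on two covariates: registration month and prior activity."""
    return (abs(u["reg_month"] - v["reg_month"])
            + abs(u["edits_before"] - v["edits_before"]))


def match_pairs(users):
    """Greedy nearest-neighbour matching without replacement.

    Each attendee is paired with the closest non-attendee not yet used.
    """
    attendees = [u for u in users if u["attended"]]
    pool = [u for u in users if not u["attended"]]
    pairs = []
    for attendee in attendees:
        if not pool:
            break  # no comparison users left
        best = min(pool, key=lambda candidate: distance(attendee, candidate))
        pool.remove(best)
        pairs.append((attendee, best))
    return pairs


def diff_in_diff(pairs):
    """Mean change among attendees minus mean change among matched non-attendees."""
    treated = [a["edits_after"] - a["edits_before"] for a, _ in pairs]
    control = [c["edits_after"] - c["edits_before"] for _, c in pairs]
    return sum(treated) / len(treated) - sum(control) / len(control)


pairs = match_pairs(users)
effect = diff_in_diff(pairs)
print([(a["name"], c["name"]) for a, c in pairs])
print(effect)
```

With these invented numbers, A is matched with B and C with D, and the estimate is 9 additional edits for attendees. A real analysis would additionally need uncertainty estimates (for example from a regression model) and a check that the matched groups are in fact comparable.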
Different statistical models and comparisons can be useful, depending on the research question and the specific data structure. In any case, it is important to keep in mind that meetup attendees are a self-selected group, so comparing their behaviour with that of comparable non-attendees is essential for identifying actual meetup effects.
When to use
- Use this pattern when you want to better understand offline meetups of a Wikipedia community.
- I have used this approach in my project.