Research talk:Participatory Motivation to Commons-based Peer Production

I would greatly appreciate any suggestions and input regarding the project, especially on the recruiting method. Thank you.

Why sample so many editors in such a desired subset?

This is a general question about recruitment that is relevant to this request. The top editors on Wikipedia are highly sought after for surveys and interviews. They also represent a crucial asset to the Wikipedia community and should be disrupted as little as possible. 1000 requests is a lot when selecting from the top editor pool. I'm worried that allowing this type of request at this scale with any regularity could overload these valuable editors with survey and interview requests. How do we allow recruitment requests like this to move forward and protect these editors from unnecessary amounts of disruption? --EpochFail 21:56, 27 May 2011 (UTC)

Sample size

I agree with EpochFail. The sample is large, all the more so for a team of one researcher. What is the total number of responses you desire to obtain (in order to justify the validity of your method) and can handle in the analysis afterwards? Consider that the response rate of more active Wikipedians might be higher than that of the general Wikipedian, and you then have to analyse the responses, which might require a substantial effort for a single researcher. The methodology needs to be justified beyond "more responses is just better" and has to strike a balance between data and analysis, and between data size and research resources. You should also argue why you consider it necessary to have a larger sample size or a different methodology than previous analyses of CBPP motivations. Rethinking your methodological plan on this basis might make your methodology more robust. --Lilaroja 09:56, 28 May 2011 (UTC)

Response 1

Thank you, EpochFail and Lilaroja, for your valuable feedback. Please allow me to clarify a few things regarding my study (which should answer some of the questions above).

As I mentioned on the study page, I am administering a questionnaire based on the Volunteer Functions Index (VFI), which is a 30-item questionnaire. I decided to adopt this theory because volunteerism has great conceptual similarities with CBPP, which is seldom quantified in scholarly studies. For this precise reason, I believe my study will make significant contributions to both Wikipedia and the scholarly community that is studying CBPP, since my study can reveal the motivational factors of online contribution using a well-grounded and frequently replicated psychological theoretical framework. I will analyze the questionnaire using confirmatory factor analysis, following the original VFI studies, to replicate their factor structure.

> What is the total number of responses you desire to obtain (in order to justify the validity of your method)
> The methodology needs to be justified beyond "more responses is just better"

My response to the first question would depend on how statistically strict I want to be. Ideally, in order to conduct a statistically sound and valid confirmatory factor analysis (CFA), my sample size would need to be about 10 or 5 times the number of variables (in my case, about 600 or 300 responses, respectively, since I have added some more variables to the VFI). However, I understand that this is extremely unrealistic and would cause an unacceptable amount of disturbance among the community of valuable Wikipedia contributors. I obviously do not want that, so I was thinking of a compromise where I would use one of the more manageable "rule-of-thumb" sample sizes for CFA, i.e., 100 or 250 responses (note that this is an absolute count, not a subject-to-variable ratio; hence a "rule of thumb"). Taking into consideration the reactions that I received from the Research Committee, I am willing to terminate my data collection after 100 responses to my survey. I currently have 66 responses, so I am already more than half-way there. I am also giving up on sending "follow-up" messages, a typical technique to increase survey response rate, for the same reason (not practical/realistic). [ADDED: 5/30] Also, to address EpochFail's concern about protecting the top editor pool, I could alter the sampling method to cover a wider scope of active editors. So, for example, I would contact 800 randomly selected editors from the top 8000 list (100 from each increment of 1000) rather than concentrating on just the top 1000.

Best Regards,

Yoshi Suzuki

Thanks Aaron and Mayo for your comments, and thank you Yoshi for your detailed response. Your revised recruitment strategy is definitely in line with the volume of requests we would expect for highly active editors. If you can use stratified sampling based on the 8000 list, that should also be less disruptive. Can you send your further requests in separate batches, so that you can stop as soon as you reach the number of complete responses you are still missing? I suggest that you mention in your recruitment message that this project was reviewed and supported by the Wikimedia Research Committee, and that you add backlinks to both your project description and this talk page so people can share their concerns (if any). --DarTar 17:21, 31 May 2011 (UTC)

I'm happy with randomly sampling from the larger (8000) group of top editors and stopping the requests once enough responses have been received. --EpochFail 20:53, 31 May 2011 (UTC)
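
For illustration, the procedure agreed on above (drawing 100 editors at random from each block of 1000 in the top 8000 list, sending requests in batches, and stopping once enough responses have arrived) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the researcher's actual script: the function names, the batch size of 100, the placeholder editor names, and the `send_batch` callback (which stands in for whatever mechanism delivers the requests and counts completed responses) are all hypothetical.

```python
import random


def stratified_sample(ranked_editors, stratum_size=1000, per_stratum=100, seed=None):
    """Draw per_stratum editors at random from each consecutive block of
    stratum_size entries in a list ordered by rank (most active first)."""
    rng = random.Random(seed)
    sample = []
    for start in range(0, len(ranked_editors), stratum_size):
        stratum = ranked_editors[start:start + stratum_size]
        sample.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
    # Shuffle so that every batch mixes editors from all strata
    # instead of contacting the very top editors first.
    rng.shuffle(sample)
    return sample


def recruit(sample, responses_needed, send_batch, batch_size=100):
    """Send requests batch by batch and stop once enough responses are in.

    send_batch is a hypothetical callback: it takes a list of editors,
    contacts them, and returns the number of completed responses received.
    """
    contacted = 0
    received = 0
    for i in range(0, len(sample), batch_size):
        if received >= responses_needed:
            break
        batch = sample[i:i + batch_size]
        received += send_batch(batch)
        contacted += len(batch)
    return contacted, received


# Placeholder list standing in for the top 8000 editors, ordered by rank.
ranked = [f"editor_{i}" for i in range(1, 8001)]
sample = stratified_sample(ranked, seed=0)

# With 66 responses already collected and a target of 100, 34 are still missing.
still_missing = 100 - 66
```

Shuffling the combined sample before batching is a design choice of this sketch: it keeps any single batch from falling entirely on the top 1000 editors, which is the group the discussion above is trying to protect.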