Research:Mapping Wikipedia in the Middle East and North Africa
| Mapping Wikipedia in the Middle East and North Africa | |
|---|---|
| main contact | Mark Graham |
| co-investigators | Ilhem Allagui, Kalina Bontcheva, Ali Frihida, Bernie Hogan, Ahmed Medhat Mohamed |
| WMF contact | Nimish Gautam, Dario Taraborelli |
| start | 2011-04 |
| end | 2013-03 |
| status | in progress |
| fields | sociology, geography, computer science |
| open data | |
| open access | |
| WMF support | |
Key personnel
- Ilhem Allagui, American University of Sharjah
- Kalina Bontcheva, University of Sheffield
- Richard Farmbrough, Oxford Internet Institute
- Ali Frihida, University Tunis El Manar
- Mark Graham, Oxford Internet Institute
- Bernie Hogan, Oxford Internet Institute
- Ahmed Medhat Mohamed, Oxford Internet Institute
Project summary
We are investigating Wikipedia's representations of the Middle East, North Africa and East Africa. In particular, we are interested in who is being represented, with what frequency, and who is doing the representing. To this end we are using data collected from [dumps.wikimedia.org/backup-index.html] to analyse patterns throughout Wikipedia's history.
We are capturing location-oriented data in two ways, and seek assistance for a third.
- Geocoded data. We have parsed the current versions of Wikipedia in seven languages of interest: English, French, Swahili, Arabic, Egyptian Arabic and Persian. For each one, we have identified articles about locations by parsing the geocodes within the articles (in the body text as well as in infoboxes).
- NLP-based location grammars. We are using GATE to assess, where possible, the self-identified locations of individuals, as well as their cultural heritage as indicated by Babel infoboxes.
- Locations of logged-in users. We would like to assess the locations of logged-in users while respecting their privacy. The Methods section below describes this more fully. We are keen to ensure that our data is open access, and to that end we are interested in APIs to private data that return anonymized data to us, rather than in downloading private data directly.
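As an illustration of the geocode parsing described in the first item, the following is a minimal Python sketch that pulls coordinates out of article wikitext. It is not the project's actual parser. It assumes geocodes appear as {{coord|...}} templates in either decimal form or degrees/minutes/seconds with hemisphere letters; other language editions use other templates and infobox fields, which this sketch ignores. The function names are hypothetical.

```python
import re

# Matches the parameter list of a {{coord|...}} template (no nested templates).
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^{}]*)\}\}", re.IGNORECASE)


def _is_number(text):
    """Return True if text can be read as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def _to_decimal(parts, hemisphere):
    """Convert degrees[, minutes[, seconds]] plus N/S/E/W to signed decimal degrees."""
    value = sum(float(p) / (60 ** i) for i, p in enumerate(parts))
    return -value if hemisphere in ("S", "W") else value


def parse_coord(body):
    """Parse one template's parameters; return (lat, lon) or None if unrecognised."""
    fields = [f.strip() for f in body.split("|")]
    # Drop empty fields and named parameters such as display=title.
    fields = [f for f in fields if f and "=" not in f]
    upper = [f.upper() for f in fields]

    # Hemisphere form: numbers, N or S, numbers, E or W.
    ns = next((i for i, f in enumerate(upper) if f in ("N", "S")), None)
    if ns is not None:
        ew = next((i for i, f in enumerate(upper) if i > ns and f in ("E", "W")), None)
        if ew is None:
            return None
        lat_parts, lon_parts = fields[:ns], fields[ns + 1:ew]
        if not (1 <= len(lat_parts) <= 3 and 1 <= len(lon_parts) <= 3):
            return None
        if not all(_is_number(p) for p in lat_parts + lon_parts):
            return None
        return (_to_decimal(lat_parts, upper[ns]), _to_decimal(lon_parts, upper[ew]))

    # Decimal form: the first two positional parameters are signed degrees.
    if len(fields) >= 2 and _is_number(fields[0]) and _is_number(fields[1]):
        return (float(fields[0]), float(fields[1]))
    return None


def extract_coords(wikitext):
    """Return a list of (lat, lon) tuples for every parseable coord template."""
    results = []
    for match in COORD_RE.finditer(wikitext):
        parsed = parse_coord(match.group(1))
        if parsed is not None:
            results.append(parsed)
    return results
```

Calling extract_coords on the wikitext of an article returns every coordinate pair it could parse, so an article with at least one result can be treated as a candidate location article. Templates the sketch does not recognise are skipped rather than guessed at.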
Alongside these location-oriented tasks is an interest in the following research topics:
- Patterns of conflict and collaboration: To what extent do geographical markers of identity play a role in the collective task of editing and maintaining articles?
- Geographic patterns of second-level metrics: At present, there are numerous maps indicating vast geographic and linguistic inequalities on Wikipedia, but these tend not to address differences in quality, contentiousness and scope as determined through extensive processing of database dumps.
- Location-based clustering of authors: To what extent do authors from a country edit together? Is this more than one would expect based on shared interests?
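One simple way to approach the clustering question is a permutation test, sketched below in Python. This is an illustrative baseline only, not the project's method: it compares the observed number of same-country co-editor pairs with the number obtained when country labels are shuffled among editors, and it does not control for shared interests, which would need a richer null model. The input structures (a mapping of article to editors and a mapping of editor to country) and the function names are assumptions made for the example.

```python
import random
from itertools import combinations


def same_country_pairs(article_editors, country_of):
    """Count co-editor pairs (per article) whose two members share a country."""
    count = 0
    for editors in article_editors.values():
        for a, b in combinations(sorted(set(editors)), 2):
            if country_of[a] == country_of[b]:
                count += 1
    return count


def permutation_baseline(article_editors, country_of, n_permutations=1000, seed=0):
    """Compare the observed count with counts under shuffled country labels.

    Returns (observed, mean count under shuffling, one-sided p-value).
    """
    rng = random.Random(seed)
    users = sorted(country_of)
    labels = [country_of[u] for u in users]
    observed = same_country_pairs(article_editors, country_of)

    null_counts = []
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # break any link between editor and country
        null_counts.append(same_country_pairs(article_editors, dict(zip(users, shuffled))))

    expected = sum(null_counts) / len(null_counts)
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (n_permutations + 1)
    return observed, expected, p_value
```

An observed count well above the shuffled mean, with a small p-value, would suggest that editors from the same country co-edit more often than random mixing predicts. Whether that excess reflects geography or simply shared topical interests is exactly the question the research item raises, and this sketch does not answer it.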
Finally, one of the key goals of our project is education and increased content creation. As such, we are holding four meetings across the Middle East and North Africa in the coming years, both to disseminate our results and to further engage individuals from the Arab world on Wikipedia.
Context
Significant effort has been invested (and continues to be invested) in understanding the self-identified location and origin of editors. This is likely to remain an incomplete measure: many editors do not, for various reasons, provide any such information, and even when they do it may not be structured in an easily recoverable way. Nonetheless, it is a key component in understanding the reality of representation on WMF projects and, by extension, on the Internet as a whole. Allied with this is the geolocation of IP edits, which again provides an incomplete (and, because of proxies, VPNs and so on, not necessarily accurate) picture of those editors who choose not to log in to a user account. The biggest lacuna is therefore the geolocation of logged-in users, which is where anonymised, aggregated data from the Foundation's database will close a gap.
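To make the idea of anonymised, aggregated data concrete, the Python sketch below shows one common pattern: reducing per-edit country codes to per-country counts and withholding small cells. It is an assumption-laden illustration, not a description of any Foundation API or policy; the function name, the "OTHER" bucket and the threshold of 10 are all hypothetical choices.

```python
from collections import Counter


def aggregate_by_country(edit_countries, min_count=10):
    """Aggregate per-edit country codes into counts, suppressing small cells.

    Countries with fewer than min_count edits are pooled into "OTHER".
    The pooled cell is itself withheld if it stays below the threshold.
    """
    counts = Counter(edit_countries)
    released = {c: n for c, n in counts.items() if n >= min_count}
    pooled = sum(n for n in counts.values() if n < min_count)
    if pooled >= min_count:
        released["OTHER"] = pooled
    return released
```

Only the returned dictionary would leave the private environment; the per-edit records never do. Small-cell suppression of this kind reduces, but does not eliminate, the risk of re-identification, so an appropriate threshold would have to be agreed with the data holder.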
Methods
- Analysis of WMF dumps
- Static analysis
- Data parsing
- History analysis
- Analysis of user pages
- NLP analysis to determine user self-identification
- Analysis of data aggregated by WMF
- Statistical analysis of aggregated geolocation information
- Comparison of intermediate statistics gathered above
- Editor surveys/interviews
Dissemination
Findings will be shared in various ways:
- Workshops with those engaged in outreach for WMF or other open projects in the region of interest
- Publication of project material and data via the OII's website
- Publication of academic papers via the most appropriate journals and other outlets
- Production of tools to assist in enhancing articles
Wikimedia Policies, Ethics, and Human Subjects Protection
Interviews with users are proposed as part of the data-gathering exercise.
- Approval for user interviews from the Oxford University Social Sciences and Humanities Inter-Divisional Research Ethics Committee: SSD/CUREC1A/11-253, 5 September 2011.
The project is committed to scrupulous ethical practices, and any additional requirements needing ethical approval will be subject to the Oxford University Social Sciences and Humanities Inter-Divisional Research Ethics Committee process.
Benefits for the Wikimedia community
Our project has three main outputs, all of which are likely to be of interest to the Wikipedia community. They are scholarly output, dynamic web resources and dissemination workshops with translated materials. In the end, all three are oriented towards increased content creation as well as a clear articulation of mechanisms that will help bring new members into Wikipedia.
- Scholarly outputs.
  - Specific research questions have been asked, and further questions are being asked and answered as the project proceeds. The results will be made available through traditional scholarly output, through more contemporary channels (the project web page, blog posts, mailing lists and wiki pages) and through mainstream media interviews and articles.
- Dynamic web resources.
  - A multilingual tool to provide assistance in improving article quality is part of the overall project.
- Dissemination workshops, including translated versions of both the scholarly output and the workshop material.
  - See next sub-section for more detail.
Workshops
Four workshops are scheduled, two in 2012 and two in 2013.
2012
The 2012 workshops will take place in Cairo and Jordan in mid-April. Each workshop will last two days. The planned attendance is approximately thirty Wikimedians.
The workshops will be both a data gathering exercise, in terms of open and closed enquiry into the barriers to editing experienced by MENA editors, and an opportunity to share knowledge of the social and technical aspects of the editing process between all participants.
Depending on attendance, the program for each event will include live material delivered in English and Arabic, discussions and round-tables, information and experience sharing, and written materials from the project. The detailed program is under development.
2013
Locations for the 2013 workshops are still to be decided.
Two workshops will be held, probably in March.
The planned attendance is approximately thirty Wikimedians or potential Wikimedians, or others with a specific stake in increasing local free content creation and curation.
Time line
| Date | Project goals |
|---|---|
| April 2011 | Project initiation |
| September 2011 | Draft article location results |
| November 2011 | NLP analysis on user pages complete |
| December 2011 | Extraction of data relating to user |
| April 2012 | Workshop on initial results and fact-finding |
| April 2013 | Workshop on web resources and scholarly outputs |
Funding
- International Development Research Centre grant # 106228
- John Fell Fund project 101/549
Contacts
- Mark Graham - immedium@gmail.com +44(0) 1865 287203
- Bernie Hogan - bernie.hogan@oii.ox.ac.uk +44(0) 1865 287198