Grants:IEG/Mapping History: Revision History Visualizer and Improvement Suggester using Geo-Spatial Technologies

From Meta, a Wikimedia project coordination wiki

status: not selected

Individual Engagement Grants

project:

Mapping History: Revision History Visualizer and Improvement Suggester using Geo-Spatial Technologies


project contact:

basilgeorge007(_AT_)gmail.com, rashadkm(_AT_)gmail.com

participants:


Basil George, Rashad KM



summary:

A tool for easy visualization of a page's revision history and edit trends, with suggestions for bridging geographical gaps among editors, built on freely available open source geo-spatial technologies.

engagement target:

Wikipedia

strategic priority:

Improving Quality?

total amount requested:

30,000 USD


2013 round 1

Project idea

Many Wikipedia topics undergo a large number of frequent revisions. While the History page gives an idea of these changes, its list format makes it very difficult to analyze edit trends such as the geographical spread of editors or the rate of topic updates. The geo-spatial information derived from the revision history of a page can provide valuable insights into the neutrality and comprehensiveness of an article. For example, articles on Palestine edited mostly by Israeli editors may show bias. A tool that visualizes the statistics of a page therefore not only makes revision history easy to analyze, but also helps spot biased articles, providing a monitoring mechanism for quality control across Wikipedia.

The project aims to build a tool for easily visualizing the geographical trends in the revision history of a topic, using maps to highlight the world regions where edits on the topic have been made. Along with the geographical locations, other statistics such as the number and frequency of edits from each region can also be shown on the map. While rendering the history statistics of a page onto a map is the primary aim of the project, the tool should also suggest to wiki users the world regions from which more edits are needed. This may encourage users in those regions to contribute more, thereby increasing participation and improving the comprehensiveness of Wikipedia as a whole.

Important Note:

  • This tool can be implemented as a standalone application without any hosting dependencies on WMF Engineering.

Project goals

Primary goals:

  1. Build a visual tool that renders revision history statistics on a map for easy visualization and analysis of edit trends, using map data from OpenStreetMap or imagery from other free Web Map Services (WMS) as the background layer.
  2. Extract spatial data from IP addresses using geolocation services and/or reverse geocoding.
  3. Allow researchers to download the spatial vector data under a suitable license (free GIS data for research).
  4. Use the OGC GML standard for storing vector data.
  5. Use only freely available open source GIS technologies such as OpenLayers/Leaflet to build the rendering engine.
  6. Build an interface for simple spatial queries and analysis.
  7. Provide a time slider to view the change history/edits over time.
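
As a rough illustration of goals 2 and 7, the sketch below maps editor IP addresses to regions and counts edits inside one time window. The lookup table, the region codes "AA", "BB" and "CC", and the sample edits are all hypothetical (the IP ranges are reserved for documentation); a real tool would presumably call a geolocation service instead.

```python
import ipaddress
from collections import Counter
from datetime import datetime

# Hypothetical lookup table: documentation-only IP ranges mapped to made-up
# region codes. A real deployment would query a geolocation service instead.
HYPOTHETICAL_RANGES = {
    ipaddress.ip_network("192.0.2.0/24"): "AA",
    ipaddress.ip_network("198.51.100.0/24"): "BB",
    ipaddress.ip_network("203.0.113.0/24"): "CC",
}


def region_for_ip(ip_text):
    """Return the region code for an IP address, or None if it is unknown."""
    address = ipaddress.ip_address(ip_text)
    for network, region in HYPOTHETICAL_RANGES.items():
        if address in network:
            return region
    return None


def edits_per_region(edits, start, end):
    """Count edits per region with start <= timestamp < end.

    `edits` is an iterable of (ip_text, datetime) pairs.
    """
    counts = Counter()
    for ip_text, timestamp in edits:
        if start <= timestamp < end:
            region = region_for_ip(ip_text)
            if region is not None:
                counts[region] += 1
    return counts


# Made-up sample edits, for illustration only.
sample_edits = [
    ("192.0.2.10", datetime(2013, 1, 5)),
    ("192.0.2.77", datetime(2013, 2, 1)),
    ("198.51.100.4", datetime(2013, 2, 20)),
    ("203.0.113.9", datetime(2013, 6, 1)),
]

# One "time slider" position: edits made in the first quarter of 2013.
first_quarter = edits_per_region(
    sample_edits, datetime(2013, 1, 1), datetime(2013, 4, 1)
)
print(dict(first_quarter))
```

Moving the slider would amount to calling edits_per_region again with a different start and end.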

Secondary goal:

  1. Based on the analysis output, provide automatic suggestions to improve the overall quality and neutrality of an article. For example, on noticing a lack of edits from Africa on a topic about the social impact of AIDS, the tool may suggest that more contributions from that region are needed.
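
One simple way such a suggestion could be derived is sketched below. It is a plain threshold heuristic, not a learning method, and the function name, the tolerance value, the edit counts and the baseline shares are all hypothetical. Choosing a meaningful baseline (for example, which regions a topic is most relevant to) is left open here.

```python
def underrepresented_regions(edit_counts, expected_share, tolerance=0.5):
    """Return regions whose share of edits is below tolerance * expected share.

    edit_counts:    region -> number of edits observed on the page
    expected_share: region -> hypothetical baseline share (values sum to 1)
    """
    total = sum(edit_counts.values())
    if total == 0:
        # No edits at all: every region in the baseline is under-represented.
        return sorted(expected_share)
    flagged = []
    for region, expected in expected_share.items():
        actual = edit_counts.get(region, 0) / total
        if actual < tolerance * expected:
            flagged.append(region)
    return sorted(flagged)


# Made-up numbers, for illustration only.
counts = {"Europe": 60, "North America": 35, "Africa": 5}
baseline = {"Europe": 0.3, "North America": 0.3, "Africa": 0.4}
print(underrepresented_regions(counts, baseline))  # ['Africa']
```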


Part 2: The Project Plan

Project plan

Scope:

Scope and activities

The project is planned in the following phases:

Phase 0 (February 16 - March 29, 2013)
  1. Get familiar with the MediaWiki APIs needed to build the visualizer.
  2. Understand possible challenges.
Phase 1 (March 30 - June 30, 2013)
  1. Prepare a Preliminary Design Report (PDR) comprising the flow documentation and system architecture.
  2. Finish Module 1: Locating geographical areas from IPs.
  3. Finish Module 2: Rendering history statistics on a world map.
Phase 2 (July 1 - September 15, 2013)
  1. Prepare the Critical Design Report (CDR) using the implementation experience to date and community feedback.
  2. Finish Module 3: AI system for unsupervised learning of the geographical spread of revisions of a page and suggesting improvements.
Phase 3 (September 16 - October 31, 2013)
  1. Make final refinements and launch the history visualizer tool.
  2. Write the documentation.
  3. Submit the final report.

Tools, technologies, and techniques

Service API

  • Geolocator Service.
  • Reverse Geocoder.
  • MediaWiki API (reading user info, page diffs, etc.)
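
A minimal sketch of how revision data might be requested from the MediaWiki API and filtered down to anonymous edits. It only builds the request URL and parses a hand-written sample response, so no network access is needed. The endpoint, the helper names and the sample data are assumptions for illustration, and the sketch assumes anonymous edits expose the editor's IP address as the user name.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; any MediaWiki site exposes a similar api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def revisions_url(title, limit=50):
    """Build a query URL for the most recent revisions of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user|timestamp",
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def anonymous_edits(response):
    """Yield (ip, timestamp) for revisions flagged as anonymous.

    Assumes the formatversion=2 response shape, where "pages" is a list
    and anonymous revisions carry "anon": true.
    """
    for page in response["query"]["pages"]:
        for rev in page.get("revisions", []):
            if rev.get("anon"):
                yield rev["user"], rev["timestamp"]


# Hand-written sample in the assumed response shape (not real data).
sample_response = json.loads("""
{"query": {"pages": [{"title": "Example", "revisions": [
  {"user": "192.0.2.10", "anon": true, "timestamp": "2013-02-01T10:00:00Z"},
  {"user": "ExampleUser", "timestamp": "2013-02-02T11:30:00Z"}
]}]}}
""")

print(revisions_url("Main Page"))
print(list(anonymous_edits(sample_response)))
```

Edits by registered users carry no IP address in this response, so a tool built this way would only see the geography of anonymous contributions.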

Libraries

  • Web map renderers (e.g. OpenLayers / Leaflet)
  • GDAL/OGR (spatial data handling)
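
To give a feel for the OGC GML vector data named in the project goals, the sketch below builds a single GML point using only the Python standard library. In practice GDAL/OGR would likely do this work; the function name and coordinates here are hypothetical, and the latitude-first axis order is an assumption that differs between tools.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML_NS)


def gml_point(lat, lon, srs_name="EPSG:4326"):
    """Build a gml:Point element holding one position.

    Axis order is assumed to be "lat lon" here; some software expects
    "lon lat" for EPSG:4326, so this must be checked against the consumer.
    """
    point = ET.Element(f"{{{GML_NS}}}Point", {"srsName": srs_name})
    pos = ET.SubElement(point, f"{{{GML_NS}}}pos")
    pos.text = f"{lat} {lon}"
    return point


# Made-up coordinates, for illustration only.
xml_text = ET.tostring(gml_point(8.5, 76.9), encoding="unicode")
print(xml_text)
```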

Programming Languages

  • PHP
  • JavaScript
  • HTML, CSS

System Overview

An overview of the system is shown below:

Flowchart showing the system overview

Budget:

Total amount requested

$30,000

Budget breakdown

Front & back-end engineers and product designers:

  • Hourly wage: $75
  • Working hours per week: 12
  • Estimated duration of project: 34 weeks (Feb 16 - Oct 31)
  • Total cost: $30,000 (approx.)
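
The figures above multiply out as follows, which shows why the total is stated as approximate (the variable names are illustrative only):

```python
HOURLY_WAGE_USD = 75
HOURS_PER_WEEK = 12
WEEKS = 34

# Weekly cost times number of weeks.
total_usd = HOURLY_WAGE_USD * HOURS_PER_WEEK * WEEKS
print(total_usd)  # 30600
```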

Intended impact:

Target audience

Wikipedia readers in general, and Wikimedia editors in particular.

Fit with strategy

Increasing quality is the main strategic priority that this project addresses. The tool acts as a quality monitor: it makes it easy to see the biases that can creep into an article when its content creators lack geographical diversity.

The project also aligns with the Increasing Participation objective. Improvement suggestions that ask editors from particular geographical regions to contribute may act as personalized, targeted recruitment notifications, encouraging more users to participate.

Sustainability

This project is part of our broad vision that technologies must be used creatively to bring the best possible knowledge to mankind. GIS technologies are beginning to revolutionize the way we interact with our natural surroundings; the location-based services provided by Google and similar technology companies are just small examples. After this project, we hope that more and more users will come together to build open source technologies that benefit mankind as a whole.

Measures of success

  1. Visualizer: Subjective user evaluations will determine how user-friendly the visualizer is.
  2. Improvement Suggester: The success of this module is to be measured not only by the accuracy of its suggestions, but also by checking whether users have actually acted on the suggestions and whether the geographical gap among editors has been bridged.
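
The proposal does not specify how a bridged geographical gap would be quantified. One possible measure, offered here purely as an assumption, is the normalized Shannon entropy of the per-region edit counts, compared before and after suggestions are shown. The function name and all counts below are hypothetical.

```python
import math


def evenness(edit_counts):
    """Normalized Shannon entropy of edit shares across regions.

    Returns 0.0 when all edits come from one region and 1.0 when edits
    are spread equally over every region in `edit_counts`.
    """
    counts = [c for c in edit_counts.values() if c > 0]
    total = sum(counts)
    if len(edit_counts) < 2 or total == 0:
        return 0.0
    entropy = sum((c / total) * math.log(total / c) for c in counts)
    return entropy / math.log(len(edit_counts))


# Made-up counts, for illustration only.
before = {"Europe": 80, "Asia": 15, "Africa": 5}
after = {"Europe": 60, "Asia": 20, "Africa": 20}
print(round(evenness(before), 3), round(evenness(after), 3))
```

A higher value after the suggestions than before would indicate a more even geographical spread, though it says nothing about whether the suggestions caused the change.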

Participant(s)

Basil George: I am a technology student currently pursuing my Master's program in signal processing. My technological interests lie in signal and data processing, particularly in building products of value to common users. I have worked extensively on many large-scale projects in diverse areas such as robotics, speech and EEG signal processing. My passions range from History, Sociology and Economics to Political Philosophy and public policy formulation. My wiki user page can be accessed here.

Rashad KM: I am a research student who has been working in spatial informatics for the past 3 years. I am an avid contributor to the Free and Open Source Software (FOSS) movement. My work is mainly on spatial data handling, WebGIS, spatio-temporal processing, and Remote Sensing. I have been an OSGeo member since 2009 and actively participate in FOSS4G communities such as GRASS GIS, OSSIM and QGIS.

Part 3: Community Discussion

Discussion

Community Notification:

Please paste a link to where the relevant communities have been notified of this proposal, and to any other relevant community discussions, here.

  • Notified Wikimedia developers through Wikitech-l

Endorsements:

Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. Other feedback, questions or concerns from community members are also highly valued, but please post them on the talk page of this proposal.

  • Community member: add your name and rationale here.