Grants:IEG/MediaWiki data browser/Midpoint

This project is funded by an Individual Engagement Grant

Welcome to this project's midpoint report! This report shares progress and learnings from the Individual Engagement Grantee's first 3 months.

Summary

In a few short sentences or bullet points, give the main highlights of what happened with your project so far.

Creation of a full-fledged web application to view and browse through an arbitrary set of structured data.
Importer scripts for extracting infobox data from Wikipedia, and data from Wikidata, into a format that the application can read.
A new lightweight syntax defined for specifying a data schema.
Finally, for what it's worth, a proof of concept that this sort of generic data-browsing interface is possible.

Methodology

How have you set up your project so far?

Describe the different parts of your experimental or pilot process, anything you decided was extra important to focus on, and any key concepts that are useful for understanding your work. Please also use this space to point to any background research or past learnings that have guided you in your decision-making process.

The project has been to create a JavaScript-based "data browser" that takes in structured tables of data, represented by one or more CSV files, and presents a navigable interface for viewing the data. The primary goal is to provide an interface for browsing structured data from Wikipedia and Wikidata, but there is a wide variety of other uses for such an application.

An important element of the project is that the data should be stored client-side, in the browser itself. This makes for faster browsing, reduces the load on the server, makes the application more mobile-friendly, and creates a path for turning the code into a mobile app. The mobile-friendliness of the app extends to the look-and-feel, which was designed to be lightweight and to resemble that of other mobile-friendly interfaces (one item per line, etc.).

Activities

What work has been completed so far?

Please list all activities you’ve undertaken as part of your project to this point, and provide a description for each activity.

Research into Web SQL, IndexedDB and LocalStorage - the three main browser-based data storage technologies. (I ended up using Web SQL and LocalStorage.)
Research into AngularJS, a framework for web apps (I ended up not using it, though it looks quite powerful).
Research into JavaScript software design - I had never created a full JS-based "web app" before, so I didn't know the best way to split functionality into classes, etc. This involved some discussions with people with more JS experience.
Research into data browsing interfaces - I already had experience with this, having created the Semantic Drilldown MediaWiki extension, and of course we're surrounded by browsing interfaces whenever we use a computer or mobile device, but it was good to really analyze the different interfaces available, to see what the patterns and commonalities are. There are volumes of research and commentary about data visualization (graphs, charts, heat maps and the like), but surprisingly little about data browsing.
Creation of the software - I wrote about 6,000 lines of JavaScript, PHP, CSS and HTML code.
Design of the settings syntax. The software uses the CSV format for reading data, but additionally uses .ini files for reading the data schema (i.e., the type of each field), and other settings around the data display. This .ini file mini-syntax had to be designed, and tweaked as the software progressed.
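To illustrate the CSV-plus-.ini idea described above, here is a minimal sketch of how a schema file could assign a type to each CSV field. The section name, field names, type names ("Text", "Number", "Date") and sample rows are all invented for this example; they are assumptions, not the actual syntax used by the software.

```javascript
// Hypothetical schema file: the section, keys and type names are invented
// for illustration and are not the project's real settings syntax.
const schemaText = `
[cities]
name = Text
population = Number
founded = Date
`;

// Made-up sample data matching the hypothetical schema above.
const csvText = `name,population,founded
Springfield,30720,1796-05-01
Shelbyville,28500,1801-03-15`;

// Parse a minimal .ini file into { section: { key: value } }.
function parseIni(text) {
  const result = {};
  let section = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.startsWith(";") || line.startsWith("#")) continue;
    const header = line.match(/^\[(.+)\]$/);
    if (header) {
      section = header[1];
      result[section] = {};
      continue;
    }
    const eq = line.indexOf("=");
    if (eq === -1 || section === null) continue;
    result[section][line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return result;
}

// Convert one CSV cell according to its declared type.
// Fields with no declared type are left as strings.
function convert(value, type) {
  if (type === "Number") return Number(value);
  if (type === "Date") return new Date(value + "T00:00:00Z");
  return value;
}

// Naive CSV reader: it does not handle quoted fields or embedded commas.
function readTable(csv, fieldTypes) {
  const [headerLine, ...rows] = csv.trim().split(/\r?\n/);
  const headers = headerLine.split(",");
  return rows.map((row) => {
    const cells = row.split(",");
    const record = {};
    headers.forEach((h, i) => {
      record[h] = convert(cells[i], fieldTypes[h]);
    });
    return record;
  });
}

const schema = parseIni(schemaText);
const cities = readTable(csvText, schema.cities);
```

Keeping the types in a separate file means the CSV itself stays a plain export that other tools can read, while the browser still knows which fields can be filtered as numbers or dates.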

Midpoint outcomes

What are the results of your project or any experiments you’ve worked on so far?

Please discuss anything you have created or changed (organized, built, grown, etc) as a result of your project to date.

The software is of course the main result, and this demo page shows how the code, in its current state, can be used for various types of data. Other than that, the big outcome is the knowledge that something like this is possible: a generic, client-side browsing interface for data that really looks and feels like an app, not like a spreadsheet or table, and is usable and fast.

Part of the software is a set of "importers" that can be used to generate CSV files from various sources. Currently three importer scripts exist: one for MediaWiki wikis (including Wikipedia), one for Semantic MediaWiki-based wikis, and one for Wikidata. The first of these, especially, required a lot of testing around different types of real-world data.

Finances

Please take some time to update the table in your project finances page. Check that you’ve listed all approved and actual expenditures as instructed. If there are differences between the planned and actual use of funds, please use the column provided there to explain them.

Then, answer the following question here: Have you spent your funds according to plan so far? Please briefly describe any major changes to budget or expenditures that you anticipate for the second half of your project.

This question seems less relevant to this project, since the project has no expenditures per se other than living expenses.

Learnings

The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you are taking enough risks to learn something really interesting! Please use the below sections to describe what is working and what you plan to change for the second half of your project.

What is working well

What has been successful so far? What will you do more of? Please list these as short bullet points.

The development process itself has been going great. jQuery, and JavaScript in general, are very robust and powerful technologies, and I rarely got stuck on any one part for very long. Web SQL works quite well too, though the fact that its API is asynchronous (query results arrive only through callbacks) has been a constant design challenge.
Similarly, I think the visual interface I've come up with is solid, internally consistent, and user-friendly. However, I've only done limited user testing with it, so the real test will come when it's released to the public.
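The asynchronous nature of Web SQL mentioned above can be made easier to work with by wrapping each query in a Promise. The following is only a sketch of that general technique, not the approach the software actually takes; the function name `runQuery` and the database, table and column names in the usage comment are hypothetical.

```javascript
// Wrap one Web SQL query in a Promise. `db` is expected to be a Database
// object as returned by openDatabase(). `runQuery` is a hypothetical helper.
function runQuery(db, sql, params = []) {
  return new Promise((resolve, reject) => {
    db.transaction((tx) => {
      tx.executeSql(
        sql,
        params,
        (_tx, result) => {
          // Copy the result set into a plain array.
          const rows = [];
          for (let i = 0; i < result.rows.length; i++) {
            rows.push(result.rows.item(i));
          }
          resolve(rows);
        },
        (_tx, error) => {
          reject(error);
          return true; // returning true rolls back the transaction
        }
      );
    }, reject); // transaction-level errors also reject the Promise
  });
}

// Usage in a browser that supports Web SQL (all names are hypothetical):
// const db = openDatabase("browserData", "1.0", "Data browser", 5 * 1024 * 1024);
// runQuery(db, "SELECT name FROM cities WHERE population > ?", [10000])
//   .then((rows) => rows.forEach((r) => console.log(r.name)));
```

With a wrapper like this, several dependent queries can be chained with `.then()` instead of nesting callbacks inside one another.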

What are the challenges

What challenges or obstacles have you encountered? What will you change to do differently going forward? Please list these as short bullet points.

One big challenge, surprisingly enough, was coming up with a name for this software - I may have run through 100 or more possible names over the course of the last few months, and rejected all of them for one reason or another. I think I finally have a name now, but finding one took quite a bit more time and effort than I expected it would.
Retrieving data from Wikipedia (through scraping of its infobox data) and from Wikidata (via its API) each posed a challenge, in different ways. For Wikipedia, there is simply no end to the variations that can occur within infoboxes, all of which ideally need to be handled in order to standardize the data. For Wikidata, the flexibility of the data structure means that it can be difficult to simply get a single value, given a combination of page name and field. This difficulty is understandable, but it brings up the opportunity to create some add-on functionality, perhaps via another extension, to make this process much more direct.
The biggest challenge, I think, is one that is yet to come: turning the code into a true mobile app, using PhoneGap or a technology like it. Having looked into it a little, this appears to be rather challenging, and not just a matter of running a little bit of code.
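As an illustration of the Wikidata point above, getting "a single value, given a combination of page name and field" takes two steps with the `wbgetentities` API module: looking up the item by its Wikipedia page title, then choosing one statement from the list of claims for a property. The sketch below is not the importer's actual code; the function names are hypothetical, the rule for choosing a statement (preferred rank first, otherwise the first non-deprecated one) is an assumption, and the sample entity is hand-written rather than real data.

```javascript
// Build a wbgetentities URL that looks up an item by Wikipedia page title.
function entityUrl(pageTitle, site = "enwiki") {
  const params = new URLSearchParams({
    action: "wbgetentities",
    sites: site,
    titles: pageTitle,
    props: "claims",
    format: "json",
  });
  return "https://www.wikidata.org/w/api.php?" + params.toString();
}

// Pick a single value for one property from an entity's claims.
// Assumed rule: ignore deprecated statements and "no value"/"unknown value"
// snaks, prefer a statement ranked "preferred", else take the first one left.
function singleValue(entity, propertyId) {
  const statements = (entity.claims && entity.claims[propertyId]) || [];
  const usable = statements.filter(
    (s) => s.rank !== "deprecated" && s.mainsnak && s.mainsnak.snaktype === "value"
  );
  const best = usable.find((s) => s.rank === "preferred") || usable[0];
  return best ? best.mainsnak.datavalue.value : null;
}

// Hand-written sample in the shape of one entry of the API's "entities"
// object; the amounts are made up.
const sampleEntity = {
  claims: {
    P1082: [
      {
        rank: "normal",
        mainsnak: { snaktype: "value", datavalue: { value: { amount: "+1000", unit: "1" } } },
      },
      {
        rank: "preferred",
        mainsnak: { snaktype: "value", datavalue: { value: { amount: "+2000", unit: "1" } } },
      },
    ],
  },
};

const population = singleValue(sampleEntity, "P1082");
```

Even in this small sample the property holds two statements, which is the kind of flexibility that makes a direct "page name plus field" lookup less simple than it first appears.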

Next steps and opportunities

What are the next steps and opportunities you’ll be focusing on for the second half of your project? Please list these as short bullet points.

All of these would be great to have, though I would say only the first two are essential:

A mobile app version of the software.
Support for languages other than English.
The ability to use a (server-side) standard relational database, instead of Web SQL, for larger amounts of data.
Some way to combine filtering with text search results.
A calendar display if the current set of dates has a range of several months.
A "locations near me" feature.
A quiz feature, that automatically creates questions based on the data.
A form within the app that allows for sending requests for changes or additions to the data, to be emailed to an administrator.

Grantee reflection

We’d love to hear any thoughts you have on how the experience of being an IEGrantee has been so far. What is one thing that surprised you, or that you particularly enjoyed from the past 3 months?

The monthly discussion calls have been helpful - they let me see the software from the perspective of someone from outside the Semantic MediaWiki milieu, and just talk about things like branding and usability that hadn't come up as much when discussing the software with others with a more technical background.