Jump to content

Grants:Project/Arc.heolo.gy

From Meta, a Wikimedia project coordination wiki
statusnot selected
Arc.heolo.gy
summaryA highly functional, concept-oriented approach for browsing and discovering knowledge within Wikipedia.
targetEnglish Wikipedia (initially)
type of granttools and software
amount13000
granteeianseyer
contact• ian.seyer@gmail.com
this project needs...
volunteer
affiliate
join
endorse
created on20:58, 10 October 2016 (UTC)

Project idea

[edit]

What is the problem you're trying to solve?

[edit]

Arc.heolo.gy is intended to visually impress and functionally engage people in the exploration of knowledge. One of the more familiar ways people utilize Wikipedia is through clicking - searching for an article, and clicking the articles until they find what they are looking for (and hopefully learning along the way). This project seeks to provide a functional way to approach wikipedia from a conceptual paradigm rather than looking for a single point of information by constructing a searchable interrelation network. This kind of approach can be used to find all sorts of interesting relationships that would not have otherwise been apparent. There would be several "modes" of exploration: article to article, i.e. find all paths between two articles and rate them by relevance or conceptual concision, or simple "ambulation": explore Wikipedia from a "birds eye view," allowing people to see what topics are connected to others.

As methods of knowledge production and dissemination expand, there is a notable lack of new interfaces that allow for functional exploration of the breadth and depth of information available.

In short: the current single-article user experience for Wikipedia is seriously limiting for getting a comprehensive understanding of how various concepts are intertwined with one another. E.g. if I wanted to learn about quantum logic gates, it could easily take me digging through multiple articles before realizing that I missed a vital concept, and need to traverse back to the original and then dig deeper in another direction.

What is your solution?

[edit]

Arc.heolo.gy will be a graph database created from the entirety of Wikipedia. By treating articles as nodes and linked articles as relations, various path finding and semantic parsing algorithms can be utilized to intelligently explore similar articles.

I personally have already created this graph database for English Wikipedia (a dataset of about 8 million "nodes" after filtering, and about 32 million relationships), albeit a beta version, that performs pathfinding and parses the wikipedia xml dataset. It is informative, fast, and novel.

An example query can be seen in the image on the right, expressing a few of the shortest paths between the article "C++" and "Java." (with a false user interface that is merely meant to reflect content, and does not in any way reflect desired design). From this example output you can see that immediately, you are given a quick, densely related network of potential avenues for exploration.

Included is an example query performed on actual data using the current beta graphing library. The design is meant only for conceptual purposes.

Project goals

[edit]

Within 8 months, our goal is to produce an intuitive and powerful tool for understanding the topography of information and reaching new understandings and conceptualizations based on already existing data. Further, we hope to provide a "birds-eye view" of Wikipedia, while also providing statistical analysis tooling to better understand sections of Wikipedia.

In addition, we would like to invigorate a community around beautiful representations of knowledge, while providing an open source graph visualization tool that is generic enough to be applied not only to other Wikimedia projects (e.g. Wikidata), but any graph database at all.

Project plan

[edit]

Activities

[edit]

Build a reliable deployment strategy for high volume graph databases, a semantic parsing library for wikipedia, and a visually expressive and concisely designed 2d (and hopefully 3d!) interface to query and explore sections of this database.

  • Lay a strategic, open source framework for deploying high volume graph databases at scale.
  • Build a semantic parsing library for Wikipedia which updates the dataset and reconstructs the graph database using intelligent parsing (mostly complete).
  • Build analytics software to analyze usage and relational data.
  • Construct a beautifully expressive 2d (and potentially 3d) user interface to query and explore sections of the database.
  • Target our audience through publicly usable installations either at libraries, universities, galleries, or museums.
  • Build community awareness through hack-a-thons, promotions within the tech community, blogs and journals, and various online communities.
  • Approach the WikiMedia community to best determine how to integrate the project (assuming its success) into the most appropriate project and in the most appropriate fashion.

Budget

[edit]
  • Project Manager/Lead Developer: $3,500
  • Devops/Hosting for one full year: $2,000
  • AngularJS 2D frontend: $1,500
  • Unity-based VR consulting, development, and materials: $4,000
  • Marketing/Events: $2,000 USD

Community engagement

[edit]

Our end goal in interacting with the community is to the inspire (and satiate) curiosity while building a profound awareness of the important work that Wikimedia and it's affiliates are doing.
As most of us are college graduates, artists, and professional programmers, we will speak with several universities, galleries, museums, and tech companies to set up installations of the product around campuses, while also doing online marketing to amplify outreach.
More specifically, we would reach out to student groups interested in computer science, database technology, server deployment optimization, data science, and visualization software to promote a more intelligent and powerful usage of the information available via Wikipedia.
We would also approach artists - we believe that there are several artistic applications for a network literally made of knowledge and its relations.
On the consumer end, we believe that this tool would be a sincerely functional tool for browsing Wikipedia, and would push for community adoption and feedback.
From a tooling standpoint, we would construct the software in a modular, reusable way to promote community developer adoption and maintenance by allowing our visualization libraries to be highly adaptable to nearly any graph application (e.g. usage upon a graph of Wikidata, academic references and sources, etc).

Sustainability

[edit]

The core graph library itself will be designed to be self-updating (in terms of updating datasets and optimizing relationships in the database), with both the graph and visualization libraries' codebase hosted publicly and following typical community development and contribution practices to allow for both broad community involvement and public awareness of the projects progress.

The most difficult (read: costly) part of the application once development is complete and we have a stable release candidate will be hosting. The current budget would allow for a year of hosting assuming we can reach a reasonable level of efficiency in our load balancing and resource consumption, which I believe our team is capable of. That gives us ample time to apply for hosting grants, or send out requests to various tech companies to ask for complimentary hosting for the project. In addition, since the entire codebase will be public, it is able to be hosted by anyone willing to do so.

Measures of success

[edit]

While we hope to exceed these numbers, we do believe that we can reasonably provide this level of usage and engagement for our project:

  • Getting over 100,000 unique users within 1 year of release
  • Hosting events at a minimum of 5 universities or libraries with a strong focus on our project within a year of release
  • Active development including forks and improvements on our publicly hosted codebase
  • Direct linkage to our project from Wikimedia/pedia itself for any duration of time upon community approval
  • At least 500 clicks to our "donate to wikipedia" link after 1 year

Get involved

[edit]

Participants

[edit]

Ian Seyer is a professional backend developer specializing in APIs, application infrastructure, and graph technology. He is also the lead designer for the project, with a background in fine arts.
Delwin Campbell is a phenomenal linguist and AngularJS developer/designer who is familiar with the goals of the product and has pledged to help.
Nick Shelton is a talented 3d programmer working in computer vision, and has worked on large social-graph visualizations before with Ian.
Steven Wilkinson is a machine learning programmer working for his own startup. He has extensive knowledge of semantic parsing via word2vec, and has pledged to assist in providing semantic parsing capabilities (to weight the graph by relevance).
Stella Tigre is a Unity employee who has pledged to assist in the development of the VR experience.

Between our 5 people (and their networks), we believe that we will be able to sustainable bring this project to life including: hosting, the core graph library, and both 2d and 3d visualizations thereof.

Community notification

[edit]

https://www.reddit.com/r/wikipedia/comments/56uojx/23d_graph_visualization_parsing_library_for/
https://www.reddit.com/r/dataisbeautiful/comments/56uqfm/critique_feedback_wanted_on_grant_proposal_for_a/
https://www.reddit.com/r/Neo4j/comments/56us85/critique_looking_for/

wiki-tech, wikidata, and wiki-research mailing lists also have threads on both the Round 1 and Round 2 proposal

Endorsements

[edit]

Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).