RENDER/Berlin summit 2012/backend


Notes taken during the Wikidata/RENDER summit 2012 - day 1 - concerning backend issues.

Overall architecture & org plans

(Last slide of http://commons.wikimedia.org/wiki/File:Hackathon2012-RENDER.pdf ) Discussion focused on whether Wikimedia Labs is suitable for this kind of project or not.

Kai, Angelika, Sam Reed

The WebAPI will live on the Toolserver and can provide two kinds of results. The first kind comes directly from the Wikimedia databases, as with the link extractor tool; others need some data pre-analyzed in advance, which will be kept in the RENDER database. The tools (shown in red on the slide) are currently just one, the ChangeDetector, which computes data on a regular basis and puts it into the database. The scheduler that triggers this regular analysis and pushes data into the database will probably live on the Toolserver, though we are not sure that is the right place for this kind of application.
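A rough sketch of that flow, with all names invented for illustration (the real ChangeDetector and WebAPI code are not shown here):

    # Illustrative sketch only: a scheduler periodically runs the
    # ChangeDetector, which reads from the Wikimedia databases and
    # writes pre-analyzed results into the RENDER database, where
    # the WebAPI can serve them. All names below are invented.
    import time

    POLL_INTERVAL = 3600  # assumed: run the analysis once an hour

    def detect_changes(wiki_db):
        """Placeholder for the ChangeDetector's analysis pass."""
        return wiki_db.fetch_recent_changes()

    def scheduler_loop(wiki_db, render_db):
        while True:
            results = detect_changes(wiki_db)
            render_db.store(results)  # precomputed data for the WebAPI
            time.sleep(POLL_INTERVAL)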


This is an early architecture draft. We are still having small problems with the ChangeDetector: it is not written well and needs performance optimization.


Sam suggests looking at the IRC feed of recent changes; there is one channel for each Wikimedia project, though the channel naming is not entirely uniform.
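A minimal sketch of following one of those channels, assuming the irc.wikimedia.org feed and the #en.wikipedia channel name (as noted, naming is not uniform across projects); the nick and user strings are placeholders:

    # Connect to the Wikimedia recent-changes IRC feed and print
    # the change lines for one project.
    import socket

    def follow_recent_changes(channel="#en.wikipedia"):
        sock = socket.create_connection(("irc.wikimedia.org", 6667))
        sock.sendall(b"NICK render-notes-demo\r\n")
        sock.sendall(b"USER demo 0 * :RENDER demo\r\n")
        sock.sendall(("JOIN %s\r\n" % channel).encode())
        buf = b""
        while True:
            buf += sock.recv(4096)
            while b"\r\n" in buf:
                line, buf = buf.split(b"\r\n", 1)
                if line.startswith(b"PING"):
                    # keep the connection alive
                    sock.sendall(line.replace(b"PING", b"PONG") + b"\r\n")
                elif b"PRIVMSG" in line:
                    print(line.decode("utf-8", "replace"))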

Kai: We need changes -- all changes. Maybe we should build up a database of changes from all Wikipedia language versions.

The databases for the language versions are hosted on different database servers, so the ChangeDetector is switching servers all the time; maybe we can optimize that.
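One possible shape for that optimization, sketched with made-up cluster assignments: group the wikis by the database server (cluster) that hosts them, then scan one server at a time instead of reconnecting per wiki.

    # Illustrative only; the wiki-to-cluster mapping below is invented.
    CLUSTER_OF = {
        "enwiki": "s1",
        "dewiki": "s5",
        "frwiki": "s6",
        "commonswiki": "s4",
    }

    def wikis_by_cluster(wikis):
        groups = {}
        for wiki in wikis:
            groups.setdefault(CLUSTER_OF[wiki], []).append(wiki)
        return groups

    # Connect once per cluster and scan all of its wikis before moving on:
    for cluster, wikis in wikis_by_cluster(list(CLUSTER_OF)).items():
        print(cluster, wikis)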

What are Johannes's plans?

(explanation of ChangeDetector)

(got Erik Zachte, Diederik van Liere, & David Schoonover to stop by)

Zachte talked about some tools for aggregated versions of the data. He has something in the works that aggregates over a month and is easier to download.

(Stephen LaPorte enters)

(discussion that Sumana missed about analytics)

(discussion of the same microtask/global profile/Athena stuff Brandon mentioned yesterday)

Erik Zachte mentioned WikiDashboard, an extension to MediaWiki made by Ed Chi (wikidashboard.appspot.com). It uses the page history directly; you can find the recent history of your actions.

There are lots of extensions that do basic stats, many of them on the Toolserver.

(Schoonover & van Liere left)

You can get almost anything via the API, but only at a certain rate. It is perfect for, say, all the articles in a given category. The per-request limit is something like 500 or 1000 revisions, and there are large articles with 20,000 revisions, so the real limit is transfer speed. For some analyses you have to have the entire page history before you can compute anything.
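A sketch of paging through a long history via the MediaWiki API at 500 revisions per request; the continuation parameters shown are the current API's defaults, so treat those details as an assumption:

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def iter_revisions(title):
        """Yield every revision of a page, 500 per request."""
        params = {
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvprop": "ids|timestamp|user|size",
            "rvlimit": 500,
            "format": "json",
        }
        while True:
            data = requests.get(API, params=params).json()
            page = next(iter(data["query"]["pages"].values()))
            for rev in page.get("revisions", []):
                yield rev
            if "continue" not in data:
                break
            params.update(data["continue"])  # carries rvcontinue forward

    # A 20,000-revision article takes around 40 such requests.
    for rev in iter_revisions("Coffee"):
        pass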

Toolserver: you can set it up to get the entire revision history of a page.
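For example, a sketch of pulling a full history from a Toolserver database replica with SQL; the host alias follows the usual <dbname>-p Toolserver convention and the columns match the MediaWiki schema of the time, both of which should be checked against your own setup:

    import os
    import MySQLdb  # available on Toolserver hosts

    conn = MySQLdb.connect(
        host="enwiki-p.db.toolserver.org",
        db="enwiki_p",
        read_default_file=os.path.expanduser("~/.my.cnf"),  # credentials
    )
    cur = conn.cursor()
    cur.execute(
        """SELECT rev_id, rev_timestamp, rev_user_text, rev_len
           FROM revision
           JOIN page ON rev_page = page_id
           WHERE page_namespace = 0 AND page_title = %s
           ORDER BY rev_timestamp""",
        ("Coffee",),
    )
    for rev_id, ts, user, size in cur.fetchall():
        print(rev_id, ts, user, size)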


Stephen mentions toolserver.org/~tparis/articleinfo (it was a tool of X!; now TParis is hosting it), which gets stats about the history of an article.


Other page history tools:

  • http://vs.aka-online.de/cgi-bin/wppagehiststat.pl
  • http://ortelius.toolserver.org:8088/history/en/wp/Coffee
  • Some data via JSON API: http://ortelius.toolserver.org:8088/revisions/Coffee (some source on GitHub: https://github.com/slaporte/revisions)

Article quality metrics collection tool: https://github.com/slaporte/qualityvis/blob/master/gadget_node.js


Review of article quality metrics: https://docs.google.com/spreadsheet/pub?key=0AiXyciD1QwGSdFVxVjlWM09lSS1UQ2dVS2NmeUROTVE&output=html

At least half of our articles have been assessed via ArticleAssessment. LaPorte is trying to collect data on every assessed page.

(enter Yuvi Panda)

discussion of Article Assessment data, Twinkle, Huggle

The tool to grab the data (shown by LaPorte) is written in node.js and available on GitHub under the user slaporte. It also gathers some external results, like Google results, to get things like the ratio of links to Google rank? (unclear)

Stephen is learning data analysis: github.com/slaporte

Angelika: we could use external sources to find news in the world that needs to be reflected on Wikipedia.

(Yuvi explanation of SelectionSifter; hopes to restart work on it in July)

(Yuvi talking about hackdays in India, using MediaWiki API, example of Chennai) https://www.mediawiki.org/wiki/Chennai_Hackathon_March_2012/Ideas

Suggestion: turn Brandon's UI suggestions into a wishlist for others to work on, e.g. https://www.mediawiki.org/wiki/Annoying_large_bugs . Problem: many of these cannot be done by volunteers, and some of the "see also" links have pretty old content. Yuvi suggests a central wishlist clearinghouse for tools.

https://meta.wikimedia.org/wiki/RENDER - to improve

Angelika started a page on the German Wikipedia to involve the community. They want usability feedback, and they want to spec out the needs of end users, especially editors.

Next step: end users, especially readers. Is a normal reader able to use this?

TURN READERS INTO EDITORS!


Article Feedback Tool

  • Usage of the article feedback tool and the API
  • applied to every article, with blacklists for disambiguation pages at the community's request
  • ~30k posts per day
  • works on the English, Chinese, and Spanish Wikipedias
  • Dario: what is driving readers to submit feedback? (key question)
  • articles with average traffic don't get enough ratings
  • Dumps are unfortunately incomplete
  • live data is on toolserver
  • http://en.wikipedia.org/wiki/User:Protonk/Article_Feedback
  • Ashton Anderson (Stanford)
  • much more feedback is given if users are allowed to comment
  • different designs lead to different types of feedback
  • many users want to consume, but they're not willing to contribute
  • useful comments should be used to improve the articles

  • Research: http://meta.wikimedia.org/wiki/Research:Article_feedback
  • API: http://en.wikipedia.org/w/api.php?action=query&list=articlefeedback&afpageid=21685577&afuserrating=1&format=json&afanontoken=01234567890123456789012345678912 (substitute the article ID)
  • Read AFTv5 feedback: http://en.wikipedia.org/wiki/User:Slaporte/FeedbackTab.js
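A small sketch of calling that API with the parameters from the URL above (the articlefeedback list module was part of the AFT extension and may no longer be available):

    import requests

    API = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "list": "articlefeedback",
        "afpageid": 21685577,  # substitute the page ID of interest
        "afuserrating": 1,
        "afanontoken": "01234567890123456789012345678912",
        "format": "json",
    }
    print(requests.get(API, params=params).json())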