Research talk:Understanding Wikidata's Value/Work log/2017-05-18

Thursday, May 18, 2017

Yesterday, I refactored the scripts that extract and aggregate Wikibase usage from wikis. These utilities should be (nearly) done.

Today, I'm moving on to a script that downloads Wikipedia page view data for a specific time range. The user of this utility should be able to specify a start time and an end time (each down to the hour) and download all page view logs for that period. The implementation will involve parsing the HTML of the dumps.wikimedia.org pages that list the page view logs and keeping only the logs that fall between the start and end times.
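As a rough sketch of the matching step (not the actual utility), the Python below pulls link targets out of a listing page and keeps the ones whose timestamp lies in the requested range. The filename pattern `pageviews-YYYYMMDD-HH0000.gz` is an assumption about how the hourly logs are named, and `LinkCollector` and `logs_in_range` are hypothetical names made up for this illustration. Fetching the listing page and downloading the matched files are left out.

```python
import re
from datetime import datetime
from html.parser import HTMLParser

# Assumed filename pattern for hourly logs, e.g. "pageviews-20170518-130000.gz".
LOG_NAME = re.compile(r"^pageviews-(\d{8})-(\d{2})0000\.gz$")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a directory-listing page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def logs_in_range(listing_html, start, end):
    """Return log filenames whose hour lies in [start, end], sorted by name."""
    collector = LinkCollector()
    collector.feed(listing_html)
    matched = []
    for href in collector.hrefs:
        m = LOG_NAME.match(href)
        if not m:
            continue  # skip parent-directory links and unrelated files
        stamp = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H")
        if start <= stamp <= end:
            matched.append(href)
    return sorted(matched)


# Hypothetical listing fragment, used only to exercise the function.
sample_listing = (
    '<a href="../">../</a>'
    '<a href="pageviews-20170518-120000.gz">pageviews-20170518-120000.gz</a>'
    '<a href="pageviews-20170518-130000.gz">pageviews-20170518-130000.gz</a>'
    '<a href="pageviews-20170518-140000.gz">pageviews-20170518-140000.gz</a>'
    '<a href="projectviews-20170518-130000">projectviews-20170518-130000</a>'
)
selected = logs_in_range(
    sample_listing, datetime(2017, 5, 18, 13), datetime(2017, 5, 18, 14)
)
```

Both bounds are treated as inclusive here, so a start of 13:00 and an end of 14:00 selects two hourly files. Whether the real utility should include or exclude the end hour is a design choice this sketch does not settle.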

For our own purposes, I've been using a shell script to download a year's worth of data needed for our study. It takes about 3 days to download that much data with that script. We've been aggregating each month's page views using mwviews aggregate. It takes about a day to aggregate one month of views.

Today, I'll also finish my report on the work presented at CHI 2017, which I attended last week.