Research talk:Wikipedia clickstream

From Meta, a Wikimedia project coordination wiki

Comments or feedback about this project are welcome on this page --Dario (WMF) (talk) 19:30, 11 February 2015 (UTC)

When will the other data become available?

Hoi, this is English only, right? Thanks, GerardM (talk) 19:28, 17 February 2015 (UTC)

This was a one-off project and has not been productionized or generalized to other language Wikipedias. If you have a request for a set of languages, please list them and we will take that into account during quarterly planning. Ewulczyn (WMF) (talk) 15:54, 24 February 2015 (UTC)

Not found

Hoi, does this include the articles people looked for but could not find? Thanks, GerardM (talk) 20:02, 17 February 2015 (UTC)

Do you mean clicks on redlinks? That would be good to include. Actually, for many of the stated purposes, the dataset is of questionable value if it doesn't include clicks on redlinks. John Vandenberg (talk) 21:01, 17 February 2015 (UTC)
The current release only includes requests for pages that were in the production table. The next release will include redlinks. Ewulczyn (WMF) (talk) 15:54, 24 February 2015 (UTC)

Clarification on other-wikipedia

Thanks so much for putting all of this together! Just to clarify -- am I correct that entries with a prev_title of 'other-wikipedia' could be referrals from either: 1) any page on any namespace in any * project other than enwiki, or 2) any page on enwiki outside the main namespace? Thanks! Staeiou (talk) 22:39, 17 February 2015 (UTC)

'other-wikipedia' includes referrers from the non-main namespaces of English Wikipedia and from all other language Wikipedias. Ewulczyn (WMF) (talk) 15:54, 24 February 2015 (UTC)
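For anyone who wants to see how much traffic falls into this bucket, here is a minimal sketch that sums referral counts per target article for rows whose prev_title is 'other-wikipedia'. Only prev_title comes from this thread; the curr_title and n column names, the tab-separated layout with a header row, and the toy rows are assumptions made for illustration, so check them against the actual file before relying on this.

```python
import csv
import io
from collections import Counter

# Invented toy rows for illustration only; this is not real clickstream data.
# Column names other than prev_title are assumptions about the file layout.
SAMPLE_TSV = (
    "prev_title\tcurr_title\tn\n"
    "other-wikipedia\tExample_A\t30\n"
    "Example_B\tExample_A\t12\n"
    "other-wikipedia\tExample_B\t5\n"
)


def other_wikipedia_referrals(tsv_text: str) -> Counter:
    """Sum referral counts per target article where prev_title is 'other-wikipedia'."""
    totals = Counter()
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if row["prev_title"] == "other-wikipedia":
            totals[row["curr_title"]] += int(row["n"])
    return totals


print(other_wikipedia_referrals(SAMPLE_TSV))
```

For a real file, the same function could be fed the decompressed file contents instead of SAMPLE_TSV; a streaming reader would be a better fit for a dump of that size.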

Top referrer stats

I ran some simple descriptive stats on referrers, which are up at Research:Wikipedia_clickstream_top_referrers. Staeiou (talk) 23:38, 17 February 2015 (UTC)

This is great

I saw it on Twitter.

Is it possible for a general reader/editor like me to generate an image like this for en:Parkinson's disease, or do I need arcane technical skills? (I'm very old and un-techy) --Anthonyhcole (talk) 01:27, 29 April 2016 (UTC)

I was asking myself the same thing... Doc James (talk · contribs · email) 02:36, 17 December 2017 (UTC)
Parkinson's disease – Dec 2017 clickstream
@Anthonyhcole and Doc James: you may have seen the recent announcement of the productized clickstream dataset, which is now available as a monthly dump for each of Wikipedia's 10 largest language editions. User:MPopov (WMF) wrote a nifty visualization app in R that allows you to explore this data. See more examples here. --Dario (WMF) (talk) 22:36, 10 February 2018 (UTC)

More details on other-internal

Hi, more details for "other-internal" would be very useful. For example, it could show the language ID and article name of the source Wikimedia project. That would help Wikipedia contributors understand when people switch languages on a Wikipedia article, which generally means that the existing article is not good enough and needs to be improved. Is it possible to do this? --Andy pit (talk) 14:43, 25 August 2020 (UTC)

Update frequency?

When does this typically get updated? Right now it's already December 14 and the November data is still not there. Should we be worried? :) --Joy (talk) 12:26, 14 December 2023 (UTC)

Looks like the December run went through on the 19th, while the January run already went through on the 3rd. It would be nice to be able to correlate this with some more information. --Joy (talk) 13:14, 10 January 2024 (UTC)
WikiNav is still stuck at October, though. --Joy (talk) 13:14, 10 January 2024 (UTC)
@Joy I only just saw this, so sorry for the late reply. Thanks for flagging this.
  • The clickstream dumps get updated at the beginning of each month. Typically, the latest monthly snapshot is available on the 3rd of the following month. The November snapshot was an exception: there seems to have been a problem that delayed publication until December 19. The December snapshot was published as expected on January 3.
  • The WikiNav tool checks on the 12th of each month for the latest snapshot to update the underlying data. Because of the delay with the November snapshot, there was no update in December. Once the December snapshot was available, the tool was updated on January 12 (using the December data). So we are back to normal.
I hope this answers your questions. Don't hesitate to reach out if you have follow-up questions. MGerlach (WMF) (talk) 09:45, 16 January 2024 (UTC)
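The timing described above can be sketched in a few lines. This is only an illustration of the schedule as described in the reply, not the actual WikiNav import script; the function name is hypothetical, and treating a snapshot published on the 12th itself as picked up that same day is an assumption.

```python
from datetime import date

# Day of the month on which WikiNav looks for a new snapshot (per the reply above).
CHECK_DAY = 12


def next_wikinav_check(published: date) -> date:
    """Return the first WikiNav check date on or after the publication date."""
    if published.day <= CHECK_DAY:
        # Published in time for this month's check (same-day pickup is assumed).
        return date(published.year, published.month, CHECK_DAY)
    # Published after this month's check, so it waits for next month's.
    if published.month == 12:
        return date(published.year + 1, 1, CHECK_DAY)
    return date(published.year, published.month + 1, CHECK_DAY)


# November snapshot, published late on December 19, 2023.
print(next_wikinav_check(date(2023, 12, 19)))
# December snapshot, published on schedule on January 3, 2024.
print(next_wikinav_check(date(2024, 1, 3)))
```

Both calls land on the January 12 check, which matches what happened: the late November snapshot missed the December check, and by January 12 the December snapshot was already the latest one available.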
Thanks! For the future, it would be great if the status of this monthly scheduled job would be transparent. Perhaps if it was published on some URL and linked somewhere from or similar? --Joy (talk) 12:17, 16 January 2024 (UTC)
I just happened to see this in passing, and I have sent a pull request to add a note about this to the WikiNav readme doc: TBurmeister (WMF) (talk) 21:05, 17 January 2024 (UTC)
@Joy There is now a note in the readme of the GitHub repository under Data update frequency. Thanks @TBurmeister (WMF) for sending the pull request. I was planning to add something along these lines too. MGerlach (WMF) (talk) 08:37, 18 January 2024 (UTC)
Thanks guys, but that doesn't actually do what I asked about :) Is the log from the scheduled job somehow private? --Joy (talk) 10:32, 18 January 2024 (UTC)
I see that I missed your original request, sorry for that. The WikiNav backend is hosted on a Cloud VPS instance, and this is where the scripts that import the latest clickstream data are run regularly. We currently don't have a pipeline to make the logs of those scripts publicly available. At the moment, there is no ongoing development of the tool (as a reminder, the tool was the outcome of an Outreachy internship that finished in 2021). You could create an issue in the GitHub repo with this request, describing your use case. That way, it might be picked up by someone in the future (e.g. at a hackathon); unfortunately, I don't currently have the capacity to work on this. Sorry for not being of more help. Thanks again for reaching out. MGerlach (WMF) (talk) 14:48, 18 January 2024 (UTC)