Research talk:Characterizing Wikipedia Citation Usage/First Round of Analysis

From Meta, a Wikimedia project coordination wiki

Definitions needed

Could you define the terms:

  • upClick
  • extClick
  • fnClick
  • fnHover

It would help to understand the data.--Sphilbrick (talk) 13:08, 3 September 2018 (UTC)

Thanks, added definitions from the Schema: Miriam (WMF) (talk) 16:46, 4 September 2018 (UTC)
Thanks.--Sphilbrick (talk) 20:49, 4 September 2018 (UTC)


Can you help me understand the distinction between the chart "top domains in English Wikipedia references" in the Dimensions of Analysis section and the chart "top clicked domains in English Wikipedia" in the Breakdown by domain section?

As an example, ranks high in both cases but is the top domain in the second chart but nowhere to be seen in the first chart.--Sphilbrick (talk) 13:24, 3 September 2018 (UTC)

Thanks for your comment. The first chart shows the most referenced domains in the whole of English Wikipedia: Google Books is the domain that appears most often across all articles. The second chart shows the most clicked domains in the reference section, according to the data we collected from readers' interactions with references. I added clearer explanations for both plots. Miriam (WMF) (talk) 11:29, 5 September 2018 (UTC)
Thanks, I now see the distinction.--Sphilbrick (talk) 15:27, 6 September 2018 (UTC)

Miscellaneous suggestions

External links

As you know, the term "external links" has a specific meaning in Wikipedia: en:Wikipedia:External links.

While I don't see a problem in this sentence, "Number of References in Page: we parse all pages to get the number of references with an external link.", because it is clear in context, the section title "Most Visited External Links" is potentially confusing: it is not about the most visited external links but about the most visited references. It becomes clear once one gets into the section, but I would suggest rewriting the section heading.

Good point! I changed the title to "Most Visited References". Miriam (WMF) (talk) 12:06, 5 September 2018 (UTC)

Chart format

In the chart with the title "top templates and English Wikipedia references", all template types beyond the first six are virtually impossible to distinguish. One option would be to split it into one chart for the top six and another for the remainder. A second option, which I slightly prefer, is to use a log scale for the number of references.

Here you actually spotted a mistake! That plot was supposed to show the bottom countries in terms of clickthrough rates. I changed it to the correct plot, and added a new subsection to describe the template plot you were referring to, which I changed to log scale as suggested :) Miriam (WMF) (talk) 12:06, 5 September 2018 (UTC)
Thanks, looks better.--Sphilbrick (talk) 15:28, 6 September 2018 (UTC)


"Clickthrough" is misspelled in the title and in the file name of "click through rate by country".

Thanks! Changed all plots and requested file renaming. Miriam (WMF) (talk) 12:06, 5 September 2018 (UTC)


You explained that you discarded requests potentially generated by bots, which is understandable. I think that means the preceding table excludes those bot requests, but it might be helpful to say so explicitly. While that may seem obvious, one could in theory have generated the table for all requests, and only then identified and removed the bot requests. I don't think that's what happened; I think the table was generated from the results after removing the bot requests, but making this clear would help.

Combining domains?

I realize this is a first pass, and a very interesting one, but as you continue the analysis, note that on the chart "top domains in English Wikipedia references" the New York Times is in second place, while the BBC is in both third and fourth places. My guess is that this is because the BBC distinguishes between news and other material while the New York Times doesn't. If there were a way to combine the two BBC domains, the results would show the BBC in second place, behind Google Books and ahead of the New York Times.--Sphilbrick (talk) 13:48, 3 September 2018 (UTC)
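A minimal sketch of such a merge in Python, assuming the per-hostname reference counts behind the chart are available as a dictionary. The function names and the small suffix set are hypothetical; a real analysis would take suffixes from the full Public Suffix List rather than this hand-picked subset. The hostnames and counts in the example are made up for illustration and are not taken from the chart.

```python
from collections import Counter

# Hypothetical, hand-picked multi-label suffixes. A real analysis would
# use the full Public Suffix List instead of this small set.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "ac.uk", "com.au"}


def registrable_domain(host: str) -> str:
    """Collapse a hostname such as 'news.bbc.co.uk' to 'bbc.co.uk'."""
    labels = host.lower().rstrip(".").split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    # Keep three labels when the last two form a known multi-label suffix,
    # otherwise keep the last two (e.g. 'nytimes.com').
    keep = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def combine_domain_counts(counts: dict[str, int]) -> Counter:
    """Sum per-hostname reference counts under their registrable domain."""
    combined = Counter()
    for host, n in counts.items():
        combined[registrable_domain(host)] += n
    return combined


# Made-up counts, for illustration only.
toy_counts = {"www.nytimes.com": 6, "news.bbc.co.uk": 4, "www.bbc.co.uk": 3}
print(combine_domain_counts(toy_counts).most_common())
# [('bbc.co.uk', 7), ('nytimes.com', 6)]
```

Collapsing every hostname this way would also fold books.google.com into google.com, so an explicit list of hostnames to keep separate may be preferable for the Google Books bar.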

Aggregate click rate

I see the rates by country; my guess is that the overall rate is about 5%, but that would be a useful item to include.--Sphilbrick (talk) 14:10, 3 September 2018 (UTC)
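An overall rate would come from pooling clicks and page views across countries rather than averaging the per-country rates, since a plain mean gives a small country the same weight as a large one. A minimal Python sketch, assuming per-country (clicks, page views) totals are available; the function names, the choice of page views as the denominator, and the country codes and numbers in the example are hypothetical.

```python
def pooled_clickthrough_rate(by_country: dict[str, tuple[int, int]]) -> float:
    """Overall rate = total clicks / total page views, summed over countries.

    by_country maps a country code to a (clicks, page_views) pair.
    Summing before dividing weights each country by its traffic.
    """
    total_clicks = sum(clicks for clicks, _ in by_country.values())
    total_views = sum(views for _, views in by_country.values())
    # Guard against an empty input or zero traffic.
    return total_clicks / total_views if total_views else 0.0


def mean_of_country_rates(by_country: dict[str, tuple[int, int]]) -> float:
    """Unweighted mean of per-country rates, shown for comparison only."""
    rates = [clicks / views for clicks, views in by_country.values() if views]
    return sum(rates) / len(rates) if rates else 0.0


# Made-up numbers: one high-traffic country and one tiny one.
toy = {"XX": (50, 1000), "YY": (9, 10)}
print(round(pooled_clickthrough_rate(toy), 4))  # 59 / 1010
print(round(mean_of_country_rates(toy), 4))     # (0.05 + 0.9) / 2
```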

Breakdown by domain

Is there a reason the mobile domain shows so prominently in this graph? Is that a bug that needs fixing? Jdlrobson (talk) 22:28, 4 September 2018 (UTC)

I was also wondering this. --Krenair (talk · contribs) 15:55, 2 October 2018 (UTC)

Normalised domain graph

"Despite Google Books being the most popular domain in English Wikipedia references, we find that the top-clicked domain is the Internet Archive's Wayback Machine" intrigued me. Would a version of this plot be possible showing the number of clicks per citation to each domain, i.e. which domains have the highest or lowest number of clicks per citation? Samwalton9 (talk) 09:12, 6 September 2018 (UTC)
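One way to sketch the suggested normalisation in Python, assuming per-domain click counts and per-domain citation counts are available as dictionaries. The function name, the min_citations cut-off, and the numbers in the example are hypothetical, not figures from the study.

```python
def clicks_per_citation(
    clicks: dict[str, int],
    citations: dict[str, int],
    min_citations: int = 1,
) -> list[tuple[str, float]]:
    """Return (domain, clicks / citations) pairs, highest ratio first.

    Domains cited fewer than min_citations times are skipped, since ratios
    based on only a few citations are noisy. Domains with no recorded
    clicks get a ratio of 0.0.
    """
    ratios = []
    for domain, n_cited in citations.items():
        if n_cited < min_citations:
            continue
        ratios.append((domain, clicks.get(domain, 0) / n_cited))
    ratios.sort(key=lambda pair: pair[1], reverse=True)
    return ratios


# Made-up counts, for illustration only.
toy_clicks = {"archive.org": 50, "books.google.com": 20}
toy_citations = {"archive.org": 100, "books.google.com": 400}
print(clicks_per_citation(toy_clicks, toy_citations))
# [('archive.org', 0.5), ('books.google.com', 0.05)]
```

The min_citations cut-off keeps rarely cited domains, whose ratios rest on only a handful of citations, from dominating either end of the ranking.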