Research talk:Characterizing Wikipedia Citation Usage/First Round of Analysis
Could you define the terms:
- Thanks, added definitions from the Schema: https://meta.wikimedia.org/w/index.php?title=Research%3ACharacterizing_Wikipedia_Citation_Usage%2FFirst_Round_of_Analysis&type=revision&diff=18353791&oldid=18350171 Miriam (WMF) (talk) 16:46, 4 September 2018 (UTC)
Can you help me understand the distinction between the chart "top domains in English Wikipedia references" in the Dimensions of Analysis section and the chart "top clicked domains in English Wikipedia" in the Breakdown by domain section?
As an example, books.google.com ranks high in both charts, but web.archive.org is the top domain in the second chart and nowhere to be seen in the first.--Sphilbrick (talk) 13:24, 3 September 2018 (UTC)
- Thanks for your comment. The first chart shows the most referenced domains in the whole English Wikipedia. It shows that Google Books is the domain that appears most often across all articles in English Wikipedia. The second chart shows the most 'clicked' domains in the reference section, according to the data we collected from readers' interactions with references. I added clearer explanations for both plots. Miriam (WMF) (talk) 11:29, 5 September 2018 (UTC)
As you know, the term "external links" has a specific meaning in Wikipedia: en:Wikipedia:External links.
While I don't see a problem with the sentence "Number of References in Page: we parse all pages to get the number of references with an external link", because it is clear in context, the section title "Most Visited External Links" is potentially confusing: the section is not about the most visited external links, it is about the most visited references. This becomes clear once one gets into the section, but I would suggest rewriting the section heading.
- Good point! I changed the title to "Most Visited References". Miriam (WMF) (talk) 12:06, 5 September 2018 (UTC)
In the chart with the title "top templates and English Wikipedia references", all template types beyond the first six are virtually impossible to distinguish. One option would be to break it out into one chart for the top six and another for the remainder. A second option, which I slightly prefer, is to use a log scale for the number of references.
- Here you actually spotted a mistake! That plot was supposed to show the bottom countries in terms of clickthrough rates. I changed it to the correct plot, and added a new subsection to describe the template plot you were referring to, which I changed to log scale as suggested :) Miriam (WMF) (talk) 12:06, 5 September 2018 (UTC)
"Clickthrough" is misspelled in the title and in the file name of "click through rate by country".
- Thanks! Changed all plots and requested file renaming. Miriam (WMF) (talk) 12:06, 5 September 2018 (UTC)
You explained that you discarded requests potentially generated by bots, which is understandable. I think that means that the preceding table excludes those bot requests, but it might be helpful to clarify. While that may seem obvious, one theoretically could have generated the table for all requests, then identified bot requests and removed them. I don't think that's what happened; I think the table was generated from the results after removing the bot requests, but making this clear would help.
I realize this is a first pass, and a very interesting one, but as you continue the analysis, note that on the chart "top domains in English Wikipedia references", the New York Times is in second place, while the BBC is in both third and fourth places. My guess is that this is because the BBC has a distinction between news and other material while the New York Times doesn't make that distinction. If there were a way to combine the two BBC domains, the results would show that the BBC is second, behind Google Books and ahead of the New York Times.--Sphilbrick (talk) 13:48, 3 September 2018 (UTC)
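One way the suggested combination of domains could be done is sketched below. This is only an illustration, not the method used in the analysis: the alias table, the host names in it, the function name `merge_domains`, and the counts are all hypothetical toy values chosen to show the mechanics, not figures from the study.

```python
from collections import Counter

# Hypothetical alias table: each host name maps to one canonical label.
# The actual host names in the chart may differ; these are assumptions.
DOMAIN_ALIASES = {
    "news.bbc.co.uk": "bbc.co.uk",
    "www.bbc.co.uk": "bbc.co.uk",
}


def merge_domains(counts, aliases=DOMAIN_ALIASES):
    """Sum reference counts for host names that share a canonical label."""
    merged = Counter()
    for domain, n in counts.items():
        # Hosts without an alias entry are kept under their own name.
        merged[aliases.get(domain, domain)] += n
    return merged


# Toy counts for illustration only (not data from the analysis).
toy_counts = {
    "books.google.com": 10,
    "www.nytimes.com": 6,
    "news.bbc.co.uk": 4,
    "www.bbc.co.uk": 3,
}

# Domains sorted by merged count, highest first.
ranking = merge_domains(toy_counts).most_common()
```

With an explicit alias table, each merge is a visible, reviewable decision. The trade-off is that the table has to be maintained by hand for every publisher that splits its content across several hosts.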
Aggregate click rate
Breakdown by domain
Normalised domain graph
"Despite Google Books being the most popular domain in English Wikipedia references, we find that the top-clicked domain is the Internet Archive's Wayback Machine" intrigued me - would a version of this plot be possible as the number of clicks per citation to that domain? i.e. which domains have the highest or lowest number of clicks per citation? Samwalton9 (talk) 09:12, 6 September 2018 (UTC)