Research talk:Improving link coverage
The word "coverage" in the title made me think this was about red links (e.g. Filling red links with Wikidata). Do I understand correctly that you're only studying blue links? --Nemo 08:39, 30 December 2014 (UTC)
Some research questions would be more practically actionable than others, without requiring revolutions in the system. For instance, there are rules/habits, often applied via bots, whose efficacy editors cannot verify and which, given serious data, could be changed, with articles updated relatively quickly.
- Linking (nearly) all dates and years: how many need those links?
- Not linking a page twice, even if it's a very obscure term last mentioned 100 KB above: how many would benefit from repeated links, and of what sort?
But see below. --Nemo 08:39, 30 December 2014 (UTC)
How frequently a link was clicked is not indicative of its usefulness, so we lack an easy measure. The references provided don't look useful for answering the question either. Surely there is past research on the topic? Ideally we'd need to measure:
- when a user clicked a link to A in article B, how often did they reach something they wanted;
- when a user clicked a link to A in segment C of article B, how often were they slowed down by neighbouring links;
- when a user went (or needed to go) to article C while reading article B, but didn't do so by clicking a link, how often was C not linked from B, and in those cases, how often could it have been?
Or something like that... No idea how one can guess such stuff. --Nemo 08:39, 30 December 2014 (UTC)
I would also be interested in a bibliography of studies on serendipity. Most scientific and corporate work revolves around semantic similarity: that's what all the search engines and advertising/ecommerce players in the world do, and there is no need for us to join the crowd. As  reminds us, the point of our wikis is serendipity (as in libraries, btw) and we must keep it that way, but there is little work being done in this area (cf. https://terra-incognita.co/). --Nemo 08:31, 31 December 2014 (UTC)
Link Extractor (by RENDER / WMDE): existing tool without additional user tracking
Link Extractor @tools.wmflabs: "The main idea of the Link Extractor is to measure the completeness of articles by comparing the links contained in different language versions of that article. Based on this analysis the Link Extractor deduces which concepts and terms should be covered by this Wikipedia article. In doing so, it indicates which information and links may be missing in the article." --Atlasowa (talk) 22:46, 12 January 2015 (UTC)
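The comparison described in the quote can be sketched roughly as follows. This is not the Link Extractor's actual code, which isn't shown here; it is a minimal illustration assuming each language version's links have already been mapped to language-independent concept IDs (e.g. Wikidata items), and the `min_versions` threshold is a hypothetical parameter.

```python
from collections import Counter


def suggest_missing_links(target_links, other_versions, min_versions=2):
    """Suggest concepts linked from other language versions but not from the target.

    target_links: set of concept IDs (assumed e.g. Wikidata IDs) linked from the target article.
    other_versions: dict of language code -> set of concept IDs linked from that version.
    min_versions: hypothetical threshold; how many other versions must link a concept.
    """
    counts = Counter()
    for links in other_versions.values():
        # Count each missing concept at most once per language version.
        counts.update(links - target_links)
    # Most widely linked missing concepts first; ties broken by ID for stable output.
    return sorted(
        (concept for concept, n in counts.items() if n >= min_versions),
        key=lambda c: (-counts[c], c),
    )


if __name__ == "__main__":
    others = {"de": {"Q1", "Q3", "Q4"}, "fr": {"Q3", "Q5"}, "es": {"Q3", "Q4"}}
    print(suggest_missing_links({"Q1", "Q2"}, others))
```

With the sample data, Q3 (linked from three other versions) and Q4 (two) are suggested, while Q5 (one) falls below the assumed threshold.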
Research without additional user tracking
- Kummer, Michael (2014), Spillovers in Networks of User Generated Content: Pseudo-Experimental Evidence on Wikipedia, ZEW Discussion Paper No. 14-132, Mannheim (... Wikipedia prominently advertises one featured article on its main site every day, which increases viewership of the advertised article. Shifts in the viewership of adjacent articles are due to their link from the treated article. Through this approach I isolate how the link network causally influences users' search and contribution behavior. ...) --Atlasowa (talk) 14:48, 13 February 2015 (UTC)
- Wulczyn, Ellery; Taraborelli, Dario (2015): Wikipedia Clickstream. figshare.
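As a rough illustration of what the clickstream data allows (and keeping in mind the caveat above that clicks are not a measure of usefulness), here is a sketch computing, for one source article, the share of its link clicks going to each target. The four tab-separated columns `prev`, `curr`, `type`, `n` are an assumption; column layouts have differed between releases, so check the documentation of the release actually used.

```python
import csv
import io
from collections import defaultdict


def link_click_shares(tsv_text, source):
    """Return {target: share} of 'link'-type clicks leaving `source`.

    Assumes headerless tab-separated rows: prev, curr, type, n.
    """
    clicks = defaultdict(int)
    # Titles may contain quote characters, so disable CSV quoting entirely.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for prev, curr, kind, n in reader:
        if prev == source and kind == "link":
            clicks[curr] += int(n)
    total = sum(clicks.values())
    return {target: count / total for target, count in clicks.items()} if total else {}


if __name__ == "__main__":
    data = (
        "Moon\tApollo_11\tlink\t30\n"
        "Moon\tTide\tlink\t10\n"
        "Moon\tLunar_eclipse\tother\t5\n"
        "Earth\tMoon\tlink\t50\n"
    )
    print(link_click_shares(data, "Moon"))
```

Rows of other types are ignored, so the shares only describe traffic through existing inline links.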
Will this study also research the effectiveness and potential of our existing aids to link coverage and topical browsing, namely categories and Special:WhatLinksHere? (Navigational templates are another tool, but I suspect they're being conflated with normal inline links because the HTML is hard to interpret.)
Both tools could use more exposure (for instance, StackExchange has a feature equivalent to Special:WhatLinksHere transcluded directly into the sidebar of each question), and it would be particularly useful to be able to measure their impact, so that we can try new things with them and assess the results. On the other hand, it would be silly to add inline links that would only overlap with existing features. --Nemo 17:24, 8 May 2015 (UTC)
Experimenting on the Wikipedia App
- Isn't that closer to task recommendations than to links? People can't really choose what to click; they are only given a few options (or just one) and can decide whether to follow them. --Nemo 16:52, 11 May 2015 (UTC)
- "Task recommendations" was a feature by WMF patronizing the editors.
- "Read more" is a feature by WMF patronizing the readers. (The "Read next" version was even worse.)
- I really hate it.
- WMF mobile team decides: You go read this. And: No, you can't see categories, and no sideboxes to sister sites for you, and no timelines for you, and no collapsed content, and no links to talk pages ... --Atlasowa (talk) 21:15, 11 May 2015 (UTC)
- mw:Analytics/Research_and_Data/Showcase#March 2015:
- mw:File:Bob_west_wikipedia_research_showcase_2015-03-25.pdf--Atlasowa (talk) 22:54, 20 June 2015 (UTC)
Given the resources available, it would be nice to first restore a tool that proved to work in the past, i.e. the Connectivity project. The tool identified orphan pages and isolated clusters of pages, helping to improve linkage and therefore access to relevant information. On the Italian Wikipedia this helped improve thousands of articles, before the tool was destroyed by the murder of Toolserver. Code: https://git.wikimedia.org/tree/labs%2Ftools%2Fconnectivity.git --Nemo 10:10, 12 July 2015 (UTC)
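The two checks the Connectivity tool performed can be sketched on a small link graph. This is not the tool's own code; it assumes "orphan" means a page with no incoming links from other articles, and "isolated cluster" means any weakly connected component other than the largest one. The function name and input shape are hypothetical.

```python
from collections import defaultdict, deque


def orphans_and_clusters(links):
    """links: dict page -> set of pages it links to (article namespace only).

    Returns (orphans, isolated_clusters): pages with no incoming links, and the
    weakly connected components other than the largest, each as a sorted list.
    """
    pages = set(links)
    for targets in links.values():
        pages |= targets
    incoming = defaultdict(set)
    neighbours = defaultdict(set)  # undirected view of the link graph
    for src, targets in links.items():
        for dst in targets:
            if dst != src:  # self-links don't count
                incoming[dst].add(src)
                neighbours[src].add(dst)
                neighbours[dst].add(src)
    orphans = sorted(p for p in pages if not incoming[p])

    # Breadth-first search over the undirected view to find components.
    seen, components = set(), []
    for start in sorted(pages):
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            page = queue.popleft()
            component.append(page)
            for nxt in neighbours[page]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    components.sort(key=len, reverse=True)
    return orphans, components[1:]


if __name__ == "__main__":
    graph = {"A": {"B", "C"}, "B": {"A"}, "C": set(), "D": {"E"}, "E": set(), "F": set()}
    print(orphans_and_clusters(graph))
```

Here D and F have no incoming links, and {D, E} and {F} are cut off from the main A-B-C cluster. On a real wiki, redirects and disambiguation pages would need special handling, which this sketch ignores.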
Re: Top 10 suggestions
These seem to be typical cases of disambiguation rather than of link coverage... they display a failure of disambiguation decisions and/or search. But they also seem very prone to recentism: how long a period was considered? --Nemo 11:00, 6 August 2015 (UTC)
I noticed that the "data" used for "improving link coverage" comes from the web server logs. Now that Wikipedia lets you hover over a link and get a summary, I am quite sure that readers often don't need to follow these links to gain the necessary insight into the linked content. I see that the hover popups are fetched over XHR; I just want to make sure that these requests are part of the logs as well. Differentiating between an actual visit and a hover may be an important metric too.
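A minimal sketch of that distinction, assuming hover previews show up in the logs as requests to the REST `page/summary` endpoint and full visits as `/wiki/<title>` requests. Both URL patterns are assumptions about how such requests might be logged, not a description of the actual log schema, and `classify`/`count_by_kind` are hypothetical helpers.

```python
from collections import Counter
from urllib.parse import unquote, urlsplit

# Assumed URL patterns; real logs may record previews differently.
SUMMARY_PREFIX = "/api/rest_v1/page/summary/"
ARTICLE_PREFIX = "/wiki/"


def classify(url):
    """Return (kind, title) where kind is 'preview', 'pageview' or 'other'."""
    path = urlsplit(url).path
    if path.startswith(SUMMARY_PREFIX):
        return "preview", unquote(path[len(SUMMARY_PREFIX):])
    if path.startswith(ARTICLE_PREFIX):
        return "pageview", unquote(path[len(ARTICLE_PREFIX):])
    return "other", None


def count_by_kind(urls):
    """Count previews and full page views per (title, kind) pair."""
    counts = Counter()
    for url in urls:
        kind, title = classify(url)
        if title:  # skip unrecognised requests and empty titles
            counts[(title, kind)] += 1
    return counts


if __name__ == "__main__":
    log = [
        "https://en.wikipedia.org/wiki/Moon",
        "https://en.wikipedia.org/api/rest_v1/page/summary/Moon",
        "https://en.wikipedia.org/api/rest_v1/page/summary/Moon",
        "https://en.wikipedia.org/w/index.php?title=Moon",
    ]
    print(count_by_kind(log))
```

Keeping previews and visits as separate counts would let a study ask how often a hover was enough and no click followed.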