Research talk:Improving link coverage/Release page traces

From Meta, a Wikimedia project coordination wiki

Special pages

I don't think special pages should be excluded altogether. Special:Whatlinkshere, for instance, is an integral part of the navigation experience of a MediaWiki wiki. What's the point of excluding them? The anonymization method should work equally for special pages, AFAICS. --Nemo 08:53, 1 August 2015 (UTC)

That's a good point. We should think more about this and see if we want to include some special pages.
  • I agree with you that the anonymization method can stay the same if we include a special page such as Special:Whatlinkshere.
  • One issue I can think of is that if the special pages are not used very frequently, we will lose traces when anonymizing the data if we include special pages in the trace. For example, consider a trace such as {a_1, a_2, wlh, a_3}, where a_i is article i and wlh is a Whatlinkshere page. If wlh is clicked very few times, we won't share the trace even if a_1, a_2, a_3 are each viewed more than the to-be-specified threshold. This may or may not be a problem depending on what question you want to answer with the data. If you are interested in learning how all links are used in Wikipedia, for example, excluding special pages altogether can be problematic. If, on the other hand, you are interested in article usage, removing special pages won't matter as much. --LZia (WMF) (talk) 18:57, 5 August 2015 (UTC)
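To make the trade-off in the example above concrete, here is a minimal sketch in Python. It assumes a simplified rule that is not confirmed on this page: a trace is released only if every page in it reaches a minimum view count. The names `releasable_traces`, `strip_special`, and `min_views`, as well as the toy data, are hypothetical and only illustrate the argument; the real anonymization method and threshold are still to be specified.

```python
from collections import Counter


def releasable_traces(traces, min_views):
    """Keep only traces in which every page reaches min_views.

    Assumption: view counts are taken from the traces themselves,
    and a single rare page is enough to withhold the whole trace.
    """
    views = Counter(page for trace in traces for page in trace)
    return [t for t in traces if all(views[p] >= min_views for p in t)]


def strip_special(traces, is_special):
    """Remove special pages from each trace before anonymizing."""
    return [[p for p in t if not is_special(p)] for t in traces]


# Toy data: "wlh" stands for a rarely visited Whatlinkshere page.
traces = [
    ["a_1", "a_2", "wlh", "a_3"],
    ["a_1", "a_2", "a_3"],
    ["a_1", "a_2", "a_3"],
]

# Including special pages: the trace containing "wlh" is withheld.
kept_with_special = releasable_traces(traces, min_views=2)

# Excluding special pages first: all three traces survive,
# but the Whatlinkshere step is no longer visible in the data.
kept_without_special = releasable_traces(
    strip_special(traces, lambda p: p == "wlh"), min_views=2
)

print(len(kept_with_special), len(kept_without_special))
```

Under these assumptions, keeping special pages costs one trace, while removing them keeps every trace but hides how the reader actually got from a_2 to a_3.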
Tracked in Phabricator:
Task T108085
Well, yes, I see your point. Adding more information, i.e. more value, means that one crosses the line of anonymisation more often. However, what matters is that the data is more valuable and more truthful, even if less vast.
Other comments have focused more and better on specific things to look at, including issues with many other special pages; but traces containing one of those linked in the sidebar, i.e. WhatLinksHere, RecentChangesLinked, SpecialPages, as well as action=info, are certainly interesting.
Moreover, stats:wikimedia/squids/SquidReportOrigins.htm reports a lot of people referred by sister projects. We have sister project templates and sidebars, as well as sister project search on Italian subdomains: traces can be interrupted and made meaningless by a very normal stop, say, on a Wiktionary definition, can't they? It would be better for traces to be cross-wiki. --Nemo 23:13, 22 August 2015 (UTC)

Staff feedback

I asked some of the staff to leave feedback here, and for some I did not specify whether to use the Research page or the Discussion page. Since Ellery and Aaron have already started leaving comments on the Research page, please enter your comments on that main page. Thank you! --LZia (WMF) (talk) 21:23, 14 August 2015 (UTC)