Research:Recommending links to increase visibility of articles
This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.
To support newcomers in their first edits, the Growth Team has been developing the Structured Tasks framework. Structured tasks break down the editing process into smaller steps that are easy to understand, easy to use on mobile devices, and can be guided by algorithms. The first structured task to be implemented was add-a-link, which has been deployed to 4 wikis (arwiki, bnwiki, cswiki, and viwiki). Results from those wikis have been encouraging (T277355): only 6.2% of edits from recommended links were reverted. Therefore, we would like to implement other types of tasks that are part of editors’ workflows.
One idea is to further develop the structured task on adding links. The current add-a-link framework is simple (it suggests the text and the link), and it prioritizes the action of adding links over the value of the added link. Here, we want to add new incoming links to articles in order to increase their visibility. For example, there are still many orphan articles, i.e., articles without any incoming links, which cannot be reached from any other Wikipedia page. This is a much more difficult editing task, since we have to add the link to our target article in the text of a different article, the source article.
Key aims of the project:
- Thrust 1: Understand better which articles require new links to address structural gaps on Wikipedia as well as assess the quantitative impact of addressing these gaps.
- Thrust 2: Develop an algorithm for a structured task to suggest links to orphan (or similar) articles to increase their visibility.
Background
Zhu et al. (2020) [1] show that improving articles as part of campaigns can lead to significant, substantial, and long-term increases in both content consumption and subsequent contributions. More importantly in this context, they also find significant spillover effects: attention to downstream hyperlinked articles increases as well.
Wagner et al. (2015, 2016) [2][3] investigated the gender gap in the content of Wikipedia articles. They showed that, in addition to an underrepresentation of women in the number of articles, there are also substantial structural biases in the way articles on women are connected in the hyperlink network. For example, women's biographies are less central in the network, as quantified by their consistently lower PageRank values. This results in lower visibility. Langrock & González-Bailón (2020) [4] systematically investigate how campaigns such as Art+Feminism are able to address these biases. They find that the campaigns are generally successful at improving the content of a target page, but fail to improve its visibility (number of inlinks).
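To make the connection between incoming links and PageRank concrete, here is a minimal sketch of PageRank by power iteration on a toy link graph. The graph, the function name `pagerank`, and the damping factor of 0.85 are illustrative assumptions; this is not the computation used in the cited studies.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank via power iteration over a list of (source, target) links."""
    nodes = sorted({node for edge in links for node in edge})
    outlinks = {n: [t for s, t in links if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            # A page without outlinks spreads its score evenly over all pages.
            targets = outlinks[n] or nodes
            share = damping * rank[n] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical graph: page "D" links out but has no incoming links.
toy_links = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A"), ("A", "C")]
rank = pagerank(toy_links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this toy graph the page without incoming links ends up with the lowest score, since it only receives the random-jump share.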
There is an {{Orphan}} maintenance template that tracks articles that are not linked from any other article, i.e., articles without incoming links. The category Category:Orphaned articles lists these articles. As of 2021-08-20, there are about 90k articles listed in this category. The template mentions the Find link tool, though for a few examples I tried, it did not yield any suggestions.
Takeaways:
- Incoming links are important for the visibility of articles
- There are many articles that lack incoming links, either as part of a structural bias or because they are simply orphans
- In contrast to other biases, existing campaigns are not as successful in addressing these biases
- Machine learning algorithms can help empower editors to address these issues by generating good recommendations
Methods
Recommending links to increase the visibility of articles can be broken down into 3 steps:
- Identify articles that are lacking incoming links (these are the target pages of the new links, for example orphan articles)
- Identify candidate articles from which to link to these articles (these are the source pages for the new links)
- Identify potential locations in the text of the source page at which to insert the link to the target page. These might be specific words, sentences, or sections where we assume the link should be added. This is most likely very challenging, as suitable anchor text for the link might not yet exist. Thus, adding the links will probably also involve adding some text.
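The first step can be sketched as follows: given the set of articles and the list of links between them, we collect the articles that have no incoming link from another article. The function name `find_orphans`, the edge-list representation, and the choice to ignore self-links are assumptions made for illustration.

```python
from collections import defaultdict


def find_orphans(articles, links):
    """Return the articles that have no incoming link from another article."""
    in_degree = defaultdict(int)
    for source, target in links:
        if source != target:  # assumption: a self-link does not make a page reachable
            in_degree[target] += 1
    return sorted(a for a in articles if in_degree[a] == 0)


# Hypothetical toy data: (source, target) pairs between four articles.
articles = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "A"), ("C", "A"), ("D", "D")]
orphans = find_orphans(articles, links)
print(orphans)  # ['C', 'D']
```

Steps 2 and 3 are not covered by this sketch, since they depend on how candidate sources and insertion locations are chosen.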
Timeline
- Exploratory analysis
- Developing a prototype model and evaluation
- Potential refinement
Proposed Approach: Link-translation for orphan articles
As a first prototype, we consider a simple approach to this problem:
- We restrict ourselves to orphan articles as target pages. Without any incoming links, those articles are not visible from within Wikipedia; thus, adding any incoming link will increase their visibility.
- We generate candidate links by inspecting all other language versions of Wikipedia. Specifically, we check whether there is an existing link to the target page in any of the other Wikipedias. If so, we identify the article in the language of interest that matches the source of that link, and recommend adding the link from there. This corresponds to "translating" an existing link from one language version to another.
- (optional) Recommend the translated section. Since we recommend existing links from other languages, we can recommend a suitable location for that link in the text. For example, we first identify the section title under which the existing link is located. Using, e.g., the section alignment tool, we can then identify a suitable section in the language of interest.
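The link-translation idea described above can be sketched as follows. We assume that articles are aligned across languages through a shared concept identifier (for example, a Wikidata item); the identifiers, titles, data layout, and the function name `translate_links` are all hypothetical. Ranking candidates by the number of languages in which the link exists is also an assumption of this sketch, not a description of the actual prototype.

```python
def translate_links(target_wiki, target_title, sitelinks, links_by_wiki):
    """Suggest source articles in target_wiki by translating links from other wikis.

    sitelinks:     concept id -> {wiki: title}   (hypothetical alignment data)
    links_by_wiki: wiki -> set of (source_title, target_title) pairs
    Returns a dict: candidate source title -> set of wikis where the link exists.
    """
    # Reverse index from (wiki, title) to the shared concept id.
    concept_of = {(wiki, title): concept
                  for concept, titles in sitelinks.items()
                  for wiki, title in titles.items()}
    target_concept = concept_of.get((target_wiki, target_title))
    if target_concept is None:
        return {}
    existing = links_by_wiki.get(target_wiki, set())
    support = {}
    for wiki, title in sitelinks[target_concept].items():
        if wiki == target_wiki:
            continue
        for source, target in links_by_wiki.get(wiki, set()):
            if target != title:
                continue
            source_concept = concept_of.get((wiki, source))
            if source_concept is None:
                continue
            # Map the foreign source article back to the language of interest.
            candidate = sitelinks[source_concept].get(target_wiki)
            if candidate is None or candidate == target_title:
                continue
            if (candidate, target_title) in existing:
                continue  # the link already exists, nothing to recommend
            support.setdefault(candidate, set()).add(wiki)
    return support


# Hypothetical toy data: "Ada Example" is an orphan in the "en" wiki.
sitelinks = {
    "Q1": {"en": "Ada Example", "de": "Ada Beispiel", "fr": "Ada Exemple"},
    "Q2": {"en": "Computing", "de": "Informatik", "fr": "Informatique"},
    "Q3": {"de": "Mathematik", "fr": "Mathématiques"},  # no "en" article
    "Q4": {"en": "History", "de": "Geschichte"},
}
links_by_wiki = {
    "en": set(),
    "de": {("Informatik", "Ada Beispiel"), ("Mathematik", "Ada Beispiel"),
           ("Geschichte", "Ada Beispiel")},
    "fr": {("Informatique", "Ada Exemple")},
}
suggestions = translate_links("en", "Ada Example", sitelinks, links_by_wiki)
ranked = sorted(suggestions, key=lambda s: (-len(suggestions[s]), s))
print(ranked)  # ['Computing', 'History']
```

A source article that has no counterpart in the language of interest (here the concept "Q3") cannot be translated and is skipped.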
More details with a first exploratory analysis can be found here: Research:Recommending links to increase visibility of articles/Link-translation#Exploratory analysis
Results
We evaluated the link recommendations for new incoming links to orphan articles (de-orphanization) via the link-translation approach and compared them with different baselines (editor tools, heuristics, embeddings). As ground truth, we consider all incoming links added to articles that were orphans in Jan 2022 but were de-orphanized in Feb 2022. We evaluate each method on its ability to predict the newly added incoming links, using Recall@k and Mean Reciprocal Rank (MRR) as metrics. Our link-translation approach considerably outperforms all the baselines. Specifically, it provides the best suggestions to de-orphanize articles in the following scenarios:
- low-resourced languages (the macro average is at least as high as the micro average, indicating that link-translation performs as well or better for languages with fewer resources)
- lower values of k, i.e., when considering only the few top suggestions (a recall@1 of 15% is a remarkably strong outcome)
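For reference, the metrics mentioned above can be sketched as follows. We assume here that the micro average pools all evaluated articles across languages, while the macro average first averages within each language and then across languages; the exact aggregation used in the evaluation may differ. All data and function names below are hypothetical.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the ground-truth source pages found among the top-k suggestions."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first correct suggestion, or 0 if none is correct."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def micro_macro(results, metric):
    """Aggregate a per-article metric over languages.

    results: language -> list of (ranked suggestions, ground-truth sources)
    """
    per_language = {lang: [metric(ranked, relevant) for ranked, relevant in cases]
                    for lang, cases in results.items()}
    pooled = [score for scores in per_language.values() for score in scores]
    micro = sum(pooled) / len(pooled)
    macro = sum(sum(s) / len(s) for s in per_language.values()) / len(per_language)
    return micro, macro


# Hypothetical evaluation data: one larger and one smaller language edition.
results = {
    "large": [(["a", "b", "c"], {"a"}), (["x", "y", "z"], {"z"}),
              (["p", "q"], {"r"}), (["m", "n"], {"n"})],
    "small": [(["u", "v"], {"u"})],
}
micro_mrr, macro_mrr = micro_macro(results, reciprocal_rank)
micro_r1, macro_r1 = micro_macro(results, lambda r, rel: recall_at_k(r, rel, 1))
print(f"MRR: micro={micro_mrr:.3f} macro={macro_mrr:.3f}")
print(f"Recall@1: micro={micro_r1:.3f} macro={macro_r1:.3f}")
```

In this toy data the macro average exceeds the micro average because the smaller edition scores higher than the larger one, which is the pattern the first bullet above describes.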
Detailed results can be found in Research:Recommending links to increase visibility of articles/Link-translation:Evaluation
- [1] Zhu, K., Walker, D., & Muchnik, L. (2020). Content Growth and Attention Contagion in Information Networks: Addressing Information Poverty on Wikipedia. Information Systems Research, 31(2), 491–509. https://doi.org/10.1287/isre.2019.0899
- [2] Wagner, C., Garcia, D., Jadidi, M., & Strohmaier, M. (2015). It's a man's Wikipedia? Assessing gender inequality in an online encyclopedia. Ninth International AAAI Conference on Web and Social Media. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/viewPaper/10585
- [3] Wagner, C., Graells-Garrido, E., Garcia, D., & Menczer, F. (2016). Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science, 5(1), 1–24. https://doi.org/10.1140/epjds/s13688-016-0066-4
- [4] Langrock, I., & González-Bailón, S. (2020). The Gender Divide in Wikipedia: A Computational Approach to Assessing the Impact of Two Feminist Interventions. https://doi.org/10.2139/ssrn.3739176