Research:Understanding Engagement with Images in Wikipedia
In this project, we study how readers engage with images in Wikipedia. Our aim is to leverage the server logs to provide a quantitative description of readers' interaction with multimedia content in Wikipedia articles, in particular with images. We will break down our analysis along several dimensions: we will explore how page and image types affect readers' engagement, and how images are accessed from different geographic areas. This project will be kicked off as part of a 12-week internship at Wikimedia Research.
How do we measure engagement with images? Given the available data and prior literature, our first goal is to identify a few metrics that are useful for measuring engagement. A good candidate is the number of pageviews that convert into clicks on images.
To what extent are readers engaging with images, and which images tend to be more engaging? Here, we will perform a large-scale analysis of how readers engage with images, broken down by page and image types. Computer vision technology will be employed to classify images into topics. We want to answer questions such as: What percentage of readers engage with images? Are certain topics or image types (quality, subjects) more engaging for readers? Do visual factors, such as image quality, affect readers' engagement with images? Do article factors, such as article "completeness", affect readers' engagement with images?
Are readers from certain locations or language communities more prone to engage with images? Here, we want to look more closely at the role of language and location in engagement with visual content. We will analyze the location of clicks versus the language edition, to see whether images are accessed predominantly by, for example, non-native speakers or people from certain geographic areas.
Are images useful for increasing (new) readers' engagement? Lastly, we want to run an experiment to see how comparable articles with and without images differ on engagement metrics such as dwell time, session length, and visit frequency.
We base our study on the server logs available in the webrequest logs table, from which we collect the pageviews and imageviews of each reading session. We identify reading sessions by concatenating client_ip and user_agent.
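The session definition above can be illustrated with a minimal Python sketch. It assumes each log record is a plain dict with `client_ip`, `user_agent`, a hypothetical `ts` timestamp field and a hypothetical `kind` field ("pageview" or "imageview"); these field names are illustrative and not the actual webrequest schema. The `|` separator and the one-hour inactivity gap used to split a reader's requests into separate sessions are also assumptions, not part of the method described above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def session_key(req):
    # Reader identifier: client_ip concatenated with user_agent.
    # The "|" separator (an assumption) avoids collisions such as
    # "1.2.3.4" + "5UA" vs "1.2.3.45" + "UA".
    return req["client_ip"] + "|" + req["user_agent"]


def build_sessions(requests, gap=timedelta(hours=1)):
    """Group requests per reader, then split on inactivity longer than `gap`.

    `gap` is a hypothetical parameter used only for illustration.
    Returns a list of sessions, each a time-ordered list of requests.
    """
    by_reader = defaultdict(list)
    for req in requests:
        by_reader[session_key(req)].append(req)

    sessions = []
    for reqs in by_reader.values():
        reqs.sort(key=lambda r: r["ts"])
        current = [reqs[0]]
        for prev, req in zip(reqs, reqs[1:]):
            if req["ts"] - prev["ts"] > gap:
                # Long silence: close the current session and start a new one.
                sessions.append(current)
                current = []
            current.append(req)
        sessions.append(current)
    return sessions
```

Splitting on inactivity is one common way to keep a long-lived IP and user-agent pair from merging unrelated visits; if sessions are defined by the key alone, `gap` can simply be set very large.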
First round of analysis
The first round of data analysis was performed from May to July 2020, as the start of a quantitative analysis of how readers engage with images in Wikipedia. To do so, we first defined two key metrics of reader engagement: the page-specific click-through rate and the image-specific click-through rate. We computed these metrics on two weeks of data collected for four Wikipedia language editions (English, French, Spanish, and Arabic), and broke down our analysis along several dimensions: country, topic, and access method (desktop or mobile web).
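The exact formulas for the two metrics are not spelled out here, so the following Python sketch rests on assumed definitions: the page-specific click-through rate of a page is taken as the share of its pageviews that include at least one image click, and the image-specific click-through rate of an image as the share of views of pages displaying it that include a click on that image. The input shapes (a `clicked_images` list per pageview and an `images_on_page` mapping) are hypothetical.

```python
from collections import Counter


def page_ctr(pageviews):
    """Assumed page-specific CTR: share of a page's views with >= 1 image click."""
    views, converted = Counter(), Counter()
    for pv in pageviews:
        views[pv["page"]] += 1
        if pv["clicked_images"]:
            converted[pv["page"]] += 1
    return {page: converted[page] / n for page, n in views.items()}


def image_ctr(pageviews, images_on_page):
    """Assumed image-specific CTR: clicks on an image per view of a page showing it.

    Repeated clicks on the same image within one pageview count once, so the
    result is the share of exposures that led to a click.
    """
    exposures, clicks = Counter(), Counter()
    for pv in pageviews:
        for img in images_on_page.get(pv["page"], ()):
            exposures[img] += 1
        for img in set(pv["clicked_images"]):
            clicks[img] += 1
    # Only images with at least one exposure get a rate.
    return {img: clicks[img] / n for img, n in exposures.items()}
```

For example, with four views of a page where one view includes an image click, `page_ctr` returns 0.25 for that page. Breaking the rates down by country, topic, or access method amounts to running the same functions on the corresponding subsets of pageviews.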
- The average page-specific click-through rate shows a weekly pattern, with a higher probability of clicking on images on weekends than on weekdays. It is 3.5% for English, 3.7% for French, 2.9% for Spanish, and 2.2% for Arabic Wikipedia. For English Wikipedia, it is ten times higher than the click-through rate on citations;
- The Main Page plays an important role in increasing image views: images placed on the Main Page are viewed 60 times more on average than the rest of the images;
- There are significant differences in the way readers engage with images based on the topic of interest.
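The weekly pattern and the Main Page effect listed above both reduce to simple aggregate comparisons. The Python sketch below shows one way to compute them, assuming daily totals of pageviews and of pageviews with an image click, and per-image view counts split into Main Page and other images; the input shapes and function names are hypothetical, and the sketch does not reproduce the study's actual figures.

```python
from datetime import date
from statistics import mean


def weekday_vs_weekend(daily):
    """Mean daily page-level CTR on weekdays and on weekends.

    `daily` maps a date to (pageviews, pageviews_with_image_click).
    Days with zero pageviews are skipped.
    """
    rates = {d: clicked / views for d, (views, clicked) in daily.items() if views}
    weekday = [r for d, r in rates.items() if d.weekday() < 5]   # Mon..Fri
    weekend = [r for d, r in rates.items() if d.weekday() >= 5]  # Sat, Sun
    return mean(weekday), mean(weekend)


def main_page_lift(main_page_views, other_views):
    """Ratio of average views: Main Page images vs. all other images."""
    return mean(main_page_views) / mean(other_views)
```

Averaging daily rates weights each day equally; pooling all pageviews within the weekday and weekend groups instead would weight busier days more, which is a design choice worth stating when reporting such comparisons.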
More details on this first round of analysis here.