Research:Evaluating article recommendations for content translation
The goal of this project is to inform the design of an article recommendation service that suggests articles to translate from one language edition of Wikipedia (or another Wikimedia project) to another. The study will focus first on how experienced content translators currently select articles to translate, and second on their response to the beta version of the article recommendation service, available through a simple web application hosted on wmflabs.org.
The study will consist of 3-6 study sessions of approximately 45-60 minutes apiece. There will be a facilitator and a participant in each session, and some sessions may also feature a note-taker. All sessions will be video and audio recorded and recordings will be archived publicly, with the participant's consent.
The target participants for this study are experienced content translators who regularly create or expand articles by translating content from one language to another. The study will take the form of semi-structured interviews with questions focused on participants' current practices (in particular, how they decide what to work on), supplemented with questions designed to capture their responses to the articles recommended through the article recommendation service. These questions will be based on existing survey and user study protocols developed to evaluate relevance and user satisfaction of recommender systems in other domains.
Participants will be recruited via messages posted on the translators-l mailing list, the Babylon and Translation of the Week portals, word of mouth, and/or messages on the talk pages of active translators.
- September-October: recruit participants
- October-November: conduct study sessions, perform analysis, and report results
- add a ‘close’ button to the dialog box
- add a direct link to article for people who don’t want to use contentTranslation
- some recommended articles have already been translated into the target language
Metrics used to drive recommendations
- pageviews are an important metric to some participants, but not all
- add a metric for the number of wikis that have this article (importance)
- add # of articles on source wiki that link to this article (importance)
- WikiProject importance ratings (where available)
- participants want to be able to filter by article length, because sometimes you just want to translate a short article
- some recommendations are clearly relevant to the user’s interests, but this is not consistent
- if a participant's edit history is used to drive recommendations (rather than seed), it will be important to filter edit history (for example, only use history of edits to articles where user contributed substantial content)
- participants want to see categories that the recommended article is in
- convert underscores to spaces, support URL encoding consistently
- provide a message when no recommendations are returned
- allow people to ‘re-run’ and get another set of recommendations for the same seed
- explain how these recommendations are made in the interface
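The item above about converting underscores to spaces and handling URL encoding consistently can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names `normalize_title` and `title_to_url_fragment` are hypothetical, and it assumes seed titles arrive either as display text or as the percent-encoded form copied from a URL.

```python
from urllib.parse import quote, unquote


def normalize_title(raw: str) -> str:
    """Return a display form of a title (hypothetical helper).

    Decodes percent-encoding, turns underscores into spaces, and
    collapses runs of whitespace.
    """
    decoded = unquote(raw)
    return " ".join(decoded.replace("_", " ").split())


def title_to_url_fragment(title: str) -> str:
    """Return a URL-safe form of a title (hypothetical helper).

    Spaces become underscores and other unsafe characters are
    percent-encoded as UTF-8.
    """
    return quote(normalize_title(title).replace(" ", "_"), safe="")


print(normalize_title("Albert_Einstein"))
print(normalize_title("S%C3%A3o_Paulo"))
print(title_to_url_fragment("São Paulo"))
```

Normalizing every title once on input, and encoding only when building a link, keeps the interface from showing a mix of `Albert_Einstein` and `Albert Einstein` for the same article.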
Metrics used to drive recommendations
- allow people to view multiple metrics of importance
- allow people to view/filter by other article stats (length, warning templates)
- allow people to view/filter categories used in the recommendation (if they can be used)
- maybe even geo-code?
- recommendation quality is inconsistent
- sometimes recommendations are relevant to interests and clearly related to the seed
- sometimes they are not relevant to interests but clearly related to the seed
- sometimes they are not clearly related to the seed at all
- detect redirects in target wiki: some recommended articles already exist on the target wiki, but under a different name (name used in source wiki is a redirect)
- it’s worth investing some more time in this as a freestanding tool, rather than focusing completely on CX integration
- working on the tool on its own provides more opportunities for testing that can improve the algorithm and also inform the design of the CX interface and other future interfaces
- run some unmoderated tests to determine which categories/topics perform best and which perform worst
- another round of moderated user tests would probably be good, but you might get more useful data at this point with unmoderated tests, which will yield more observations
- consider piloting the tool for a constrained set of articles/categories/topics where performance is best to increase user acceptance
- instrument the platform to support more granular user feedback on relevance of recommended items
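The redirect-detection item above (recommended articles that already exist on the target wiki under a different name) could be approached with the MediaWiki Action API, which resolves redirects when a `query` request includes the `redirects` parameter. The sketch below is one possible approach under stated assumptions, not a description of how the service works: the function names are hypothetical, the response is assumed to use `formatversion=2`, and the sample response uses made-up placeholder titles rather than real data.

```python
def build_query_params(titles):
    """Parameters for an Action API query that resolves redirects."""
    return {
        "action": "query",
        "titles": "|".join(titles),
        "redirects": 1,
        "format": "json",
        "formatversion": 2,
    }


def classify_titles(requested, response):
    """Label each requested title using a parsed API response (hypothetical helper).

    Labels: "missing", "exists", or "exists_via_redirect".
    """
    query = response.get("query", {})
    normalized = {e["from"]: e["to"] for e in query.get("normalized", [])}
    redirects = {e["from"]: e["to"] for e in query.get("redirects", [])}
    missing = {p["title"] for p in query.get("pages", []) if p.get("missing")}

    result = {}
    for title in requested:
        canonical = normalized.get(title, title)      # e.g. underscores -> spaces
        target = redirects.get(canonical, canonical)  # follow a redirect, if any
        if target in missing:
            result[title] = "missing"
        elif target != canonical:
            result[title] = "exists_via_redirect"
        else:
            result[title] = "exists"
    return result


# Illustrative response with placeholder titles (not real API output).
sample_response = {
    "query": {
        "normalized": [{"from": "Source_Name", "to": "Source Name"}],
        "redirects": [{"from": "Source Name", "to": "Local Name"}],
        "pages": [
            {"pageid": 1, "title": "Local Name"},
            {"pageid": 2, "title": "Existing Page"},
            {"title": "Untranslated Page", "missing": True},
        ],
    }
}

requested = ["Source_Name", "Existing Page", "Untranslated Page"]
statuses = classify_titles(requested, sample_response)
print(statuses)
```

Only titles labeled `missing` would then be offered as translation candidates; the other two labels indicate that the target wiki already has a page for the topic.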