Research talk:Characterizing Readers Navigation
Feedback on the first draft
@MGerlach (WMF): thanks to you and the collaborators for putting this page together. I did a pass over it (as a todo that came up in our recent 1:1 meeting) and here is my feedback:
- The research plan touches on some of the fundamental open questions we have about Wikipedia readership. These are the type of questions we sometimes have a hard time making time for, and it's great to see them being surfaced in this research. There are a few ways I encourage you and the team to improve the research plan (see below).
- Divide up the time horizon ahead of you into n chunks and commit to some deliverables for each chunk. Document this on the meta page. I understand that this research is currently at a highly exploratory stage and the commitments should reflect that. Also, the commitments can change over time as you all learn more. You can start by planning for the next 9-12 months, for example, and figure out with the team what makes sense to commit to delivering every 3 months.
- Related to what you commit to, I have a few suggestions beyond the direct outputs from the research:
- Please have a chat with Jkatz_(WMF) and see if there are intermediate outputs that can be helpful for them. Jon has specifically flagged that any effort that gives them access to reading session stats on a more regular basis, and with deeper information about the sessions, would be helpful. Please coordinate with KZimmerman_(WMF) to make sure she's in the loop and can consider ways the code you produce can be used more broadly in Product Analytics. (Note: some of your deliverables can specifically be: session code for team x.)
- Consider sharing your learnings broadly at regular milestones. If this is a project that will take 12 months or more to complete, I highly recommend writing posts on Medium or the Tech Blog about your learnings. This supports the Communications team in their efforts and makes sure what you learn is made available to a broader audience at shorter intervals.
- MNovotny_(WMF) and I are exploring ways to prototype research ideas/outputs together. Her team is in a really good position to help us bridge the gap between research outputs and product ideas. Keep me posted on what you learn and we can approach them when you're ready. (We do need to put this in the quarterly plans and communicate with them before the start of the quarter.)
- As for working with the webrequest logs: I highly encourage you to do the explorations on fresh copies of the webrequest logs and, as much as possible, not to copy the data from one period of time and keep it for longer than 90 days. Although we strip IP addresses and user agents (UA) from the data in these cases, it is still better practice not to keep this data for long, unless we go through a process similar to the one we went through for the COVID-19 related pageviews. I understand that working with fresh data is not always possible and you may need stability over a span of time. However, given the size of the team working on this project, I'd expect it to be possible to allocate bursts of time to the project in which specific questions are answered within a 90-day period and the data is then discarded. For the publication, of course, we need to work with data that we can keep for a while (until the work is published), but if you have the code and findings ready, you can re-run them on fresh data closer to the submission time.
- Linked articles, frequency of links
We do a lot of linking of articles in Wikipedia; we even stress about "orphan" articles that no other article links to and underlinked articles with few or no links to other articles. Our gender gap task force is concerned that articles on women tend to be linked to less from other articles than articles on men. All this is based on the assumption that our readers find these links useful and click on the links in Wikipedia articles. I'm not aware of research on this (perhaps someone has already done it), but it would be good to know:
- What proportion of our readers ever click on links to other articles? How many do so repeatedly?
- What sort of links do people find useful and are likely to click? On the English Wikipedia we already deprecate overlinking to some very common terms such as en:USA, en:2020 and other calendar years. Is this deprecation serving our readers well?
- We deprecate repeatedly linking to the same other article; again, is this in accord with our users' behaviour? Do people sometimes find it more useful to click on the link at the point where they can't understand a paragraph without reading the linked article, rather than the first time the link appears?
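As a rough illustration of how the first question above could be turned into a measurable quantity, here is a minimal Python sketch. It assumes a hypothetical input of (session ID, arrived-via-internal-link) pairs, one per pageview; the function name, the field layout and the sample values are invented for illustration and do not reflect the actual webrequest schema or any existing team code. Note that it counts sessions, which are only a proxy for readers.

```python
from collections import defaultdict


def link_click_shares(pageviews):
    """Return the share of sessions with >= 1 and >= 2 internal link clicks.

    `pageviews` is a hypothetical iterable of
    (session_id, came_from_internal_link) pairs, one per pageview.
    """
    clicks = defaultdict(int)
    for session_id, came_from_internal_link in pageviews:
        # Register every session, even those with no internal clicks.
        clicks[session_id] += 1 if came_from_internal_link else 0

    total = len(clicks)
    if total == 0:
        return {"any_click": 0.0, "repeat_click": 0.0}

    any_click = sum(1 for c in clicks.values() if c >= 1)
    repeat_click = sum(1 for c in clicks.values() if c >= 2)
    return {
        "any_click": any_click / total,
        "repeat_click": repeat_click / total,
    }


# Made-up sample: four sessions, two of which follow at least one link.
sample = [
    ("s1", False), ("s1", True), ("s1", True),
    ("s2", False),
    ("s3", False), ("s3", True),
    ("s4", False),
]
shares = link_click_shares(sample)
print(shares)
```

Answering the other two questions would additionally need the link target and its position in the article, which this sketch does not model.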
- Desktop v mobile
- There are really two sets of Wikipedias, the desktop view and the mobile view. Is our readership as different between these two as our editorship is?
- Article length both in bytes of text and bytes downloaded
- Some of our articles get very long, and sometimes we have them split up "for the sake of our readers". Do our readers actually have a problem with a given length of article? If so, what length should we aim for, and does this vary between the mobile and desktop platforms?
- Some of our readers have fast, reliable wifi; some are on pay-as-you-go or the apocryphal dial-up connection in an internet cafe in a refugee camp. What effect does the number of bytes downloaded for our largest articles have on such readers, and are there any rules on maximum article size or maximum section length that we could use to make our policies more evidence-based and reader-friendly?