Research talk:Characterizing Wikipedia Reader Behaviour/Demographics and Wikipedia use cases

From Meta, a Wikimedia project coordination wiki

Work log



20 %

20 % is a significant number for language switches in the same session. If it's anywhere near that, or even just above 10 %, I'd say it validates the multilingualism of Wikimedia projects. Nemo 08:33, 19 January 2019 (UTC)

Note that in contrast, mw:Universal_Language_Selector/Compact_Language_Links/metrics/data says "In June 2016 0.2125% of Wikipedia visitors in all languages clicked on interlanguage links. In June 2017 it was 0.4042%." It's not quite clear to me though what definition of "visitors" was used there.
Diego and I talked a bit recently about this difference, but haven't found a clear explanation yet.
Regards, Tbayer (WMF) (talk) 01:42, 4 February 2019 (UTC)

Interested languages

If you would like to see the study of reader demographics in your language, please fill out the information in the table below.

Language | Point of Contact | Documents to translate | Status
Arabic | Abbad | | Done
English | LZia_(WMF) | | Done
Hebrew | Amire80 | | Done
Polish | Aegis Maelstrom | | Evaluating
Russian | Kaganer | | Done
Danish | fnielsen | | Feasible - time permitting
Persian | Mardetanha | | Done
Armenian | Davit Saroyan (WMAM) | | Evaluating (sampling challenges due to monthly pageview counts)
French | Rémy Gerbet WMFr and Ilario | | Done
Chinese | Liang-chih ShangKuan (WMTW) | | Done
Romanian | Strainu | | Done
Ukrainian | Юрій Булка | | Done
Portuguese | Chicocvenancio (pt-br), DarwIn (pt-pt) | | Feasible (with one translation)
Norwegian Bokmål | Jon Harald Søby (WMNO) | | Done
Norwegian Nynorsk | Jon Harald Søby (WMNO) | | Very low pageviews which makes it hard to sample.
Basque | Theklan | | Sampling challenges due to monthly pageview counts
Italian | Ilario | | Feasible - time permitting
German | Ilario | | Done
Hungarian | Tgr | | Done
Sakha (sah) | HalanTul | | Pageviews are too low to make this specific research possible.
Spanish | | | Done
Japanese | | | Feasible, but missing a local volunteer
Hindi | | | Feasible, but missing a local volunteer
Indonesian | | | Feasible


@Strainu: @Tgr: @Amire80: @AWossink: @Antanana: @Lyzzy: @Shangkuanlc: @Whym: @Kaganer: @عباد ديرانية: @Satdeep Gill: @Racso: @Hasive: Hi. Given that you were the point of contact for the previous round of the study of readers in 14 languages, I am pinging you here to make sure you have seen my recent message to wikimedia-l and you can act on it if you wish.

@LZia (WMF): Thank you for the ping, I don't monitor Wikimedia-l regularly. I am glad this is moving forward & can't wait for the documents to translate to come out! --Liang(WMTW) (talk) 16:01, 7 March 2019 (UTC)

@Mardetanha: Let me know if I should sign up fawiki. I had a guilty conscience last time when I learned that you wanted to act on it but didn't get a chance to. I'd be happy to do the bulk of the translation work if you're generally up for it and you find it useful for the work you all do in fawiki. Of course, if there is someone else from the fawiki community who can help me, I'd love that. Let me know.

@Anthere: At Wikimania 2018, you raised a good question about why we don't include Africa-specific languages. Let's change that for this new round of research. Which languages would you recommend we consider? (I can check the traffic to those languages to see if running the survey in them would be possible). Arabic and English were present in the previous survey. I'm hoping we can at least add French and a couple more languages of your suggestion, though for French I'll need an indication of interest from the French-language Wikipedia community. :) --LZia (WMF) (talk) 21:37, 6 March 2019 (UTC)

Hi. I would say you could pick Hausa, Swahili, Igbo, and Yoruba. Other options: Lingala, Luganda, Amharic, Shona. Afrikaans would possibly be an option as well. Thanks Anthere (talk)
@Anthere: We looked into the languages above. At most we see ~2 million monthly pageviews which means we can't sample the readers for the survey if we want to have enough responses for reliable statistics. The debiasing step to make sure the results are reliable can also run into issues as we will be showing the surveys to everyone, and we know only a very small percentage will participate. We now do have fr, en, ar and we can sample by country to focus on specific countries in Africa and get more data from them. We have decided to go with those languages for this part of the study. If you see a point we have missed, please flag it in the coming week. --LZia (WMF) (talk) 18:24, 2 April 2019 (UTC)
Got you ! Thank you Anthere (talk)
@LZia (WMF): the perennial question - where are the translatable survey materials, for improvement and proofreading? --Kaganer (talk) 22:44, 6 March 2019 (UTC)
@Kaganer: The three questions you worked on for the previous study are at https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour/Taxonomy_of_Wikipedia_use_cases#Taxonomy_of_Wikipedia_readers . Feel free to improve them. I'd appreciate it if you sent me a summary of any improvements you make. The 5 new questions are not fully finalized yet. Once they're available, we will ping you here and share the link to those as well. I'm looking forward to working with you on this project. :)--LZia (WMF) (talk) 22:49, 6 March 2019 (UTC)

User:LZia (WMF): I can help User:Mardetanha so it would be faster and less work if that's okay for you two. Amir (talk) 15:37, 7 March 2019 (UTC)

@Ladsgroup: That's excellent. Thanks! Let's do it then. (And I see that Persian is already added to the table.) --LZia (WMF) (talk) 17:30, 7 March 2019 (UTC)
The more the merrier. Thanks for the offer, Amir. Mardetanha talk 10:05, 8 March 2019 (UTC)

@LZia (WMF): I can help with Arabic. Should we start by translating the page with the three questions, in the link you provided above? --Abbad (talk) 10:16, 8 March 2019 (UTC).

I have put three languages because for me (and for Wikimedia Switzerland) the community is split into four main languages (including English). If this study can be country-based, that would be best. --Ilario (talk) 19:13, 11 March 2019 (UTC)

Translations

Hi all -- thank you for volunteering and for your patience! We have now compiled a translation template for each language; they are listed below. Each one has the English text on the left-hand side and space for you to write in translations on the right-hand side. You will need to request access to the document with whatever email address you prefer. If you have any questions or concerns, please do not hesitate to reach out. We did our best to find translations from previous projects or external resources to reduce your burden, and these are pre-filled in the templates. You may leave them as they are or update them if something is obviously wrong.

We would like to have this stage completed within two weeks if at all possible, i.e. by 4 June 2019. When you finish, let me know and we can set up a call to talk through the translation and answer any questions.

Just a reminder of the project overview: https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour/Demographics_and_Wikipedia_use_cases#Reader_Surveys

Best and thanks again! --Isaac (WMF) (talk) 18:07, 21 May 2019 (UTC)

@Isaac (WMF): access requested. By the way, why wasn't TranslateWiki or the Translate extension on Meta used? --Kaganer (talk) 20:35, 21 May 2019 (UTC)
Yeah, good question @Kaganer:. Two reasons: 1) because we are seeking to use these questions for research, we want to be careful about having a single, agreed-upon translation. Similar to last time with the reader motivation questions, the thought is to do them through a more carefully controlled process and then add them via the translate extension once they are finalized, and 2) we were also aiming for simplicity (keeping all the work that had been done and needed to be done in one place for each language). The privacy policy is actually hosted on foundation.wikimedia.org and so the translations will be added there and the questions/survey language will remain here on meta. Furthermore, the parts of the privacy policy that are already translated for some languages do not break neatly into paragraphs. Given that this workflow worked pretty well last time and we were unaware of a good way to be both simple and careful via meta or translatewiki, we have opted to go this route. Hopefully that makes sense. If not, let me know. --Isaac (WMF) (talk) 21:26, 21 May 2019 (UTC)
@Isaac (WMF): Russian is Done. --Kaganer (talk) 23:52, 21 May 2019 (UTC)
Hebrew is done. I think it's OK, but it would be nice if anyone could review it. --Amir E. Aharoni (talk) 07:39, 22 May 2019 (UTC)
@Isaac (WMF): Arabic done. Kindly ping me if there's anything else needed --Abbad (talk) 16:18, 25 May 2019 (UTC).
Thanks @Amire80: and @عباد ديرانية:. I will follow up by email with further instructions. --Isaac (WMF) (talk) 12:08, 28 May 2019 (UTC)
@Isaac (WMF): (Traditional) Chinese done. However, I feel the same as @Amire80: that it would be nice if someone could review it. Can I invite someone I know who reads Traditional Chinese to review the google doc? Also, Chinese Wikipedia has two major variants of the writing system. Could I find a native speaker of Simplified Chinese to help with the translation document? --Liang(WMTW) (talk) 11:29, 28 May 2019 (UTC)
Thanks @Shangkuanlc: -- with the caveat that I am largely ignorant of these matters, the challenge we have is this: the survey extension only allows us to provide a single survey link so we need to choose between Traditional and Simplified but cannot provide both. It seems that last time we provided Traditional only. This also seems to line up with the fact that Taiwan is the largest source of page views for zhwiki. Does it make sense then to just retain Traditional and assume that while not ideal, we can expect that Traditional will be better for more readers than the Simplified? --Isaac (WMF) (talk) 12:11, 28 May 2019 (UTC)
You are welcome @Isaac (WMF):, I am fine with only Traditional Chinese in the survey due to research design limitation. Can I invite another reviewer into the google doc to double check my translation? --Liang(WMTW) (talk) 14:35, 28 May 2019 (UTC)
Thanks for the reminder @Shangkuanlc: yes, inviting another researcher to take a quick look would be perfectly fine. After that then, I will still ask for a short meeting to walk through the translation so I can make sure I'm aware of any caveats or differences I should be aware of when we analyze the results. --Isaac (WMF) (talk) 15:32, 28 May 2019 (UTC)
@Isaac (WMF):, I just invited a reviewer for my translation, @Ffaarr:. Please confirm whether he can review the Chinese document. Thank you. --Liang(WMTW) (talk) 12:54, 1 June 2019 (UTC)

Romanian is done. Strainu (talk) 06:27, 31 May 2019 (UTC)

Launching Surveys

An FYI for those who have not been following the Phabricator task or been in contact regarding specific language communities: here is the template I have been asking to have translated and posted on each language community's Village Pump:

The Wikimedia Foundation Research team is planning to run a follow-up survey of Wikipedia readers. You can read more about the first two parts of the study in the meta page linked below or in the following two papers: https://arxiv.org/abs/1702.05379 and https://arxiv.org/abs/1812.00474. We expect no disruptions in the workflow of editors during this study. The survey will ask readers about their motivation for reading as well as a few demographic questions (age, gender, education, place, native language). The survey aims to improve our understanding of the diversity of readers as well as how the needs and experiences of Wikipedia readers vary across different populations.
We plan to run the survey for a week starting on 2019-06-26. It will sample 1 out of every 2 readers. For questions, feel free to ping Isaac (WMF) or leave a comment on the meta page. Thank you!

We are currently planning to launch surveys in: Arabic, German, English, Spanish, Persian, French, Hebrew, Hungarian, Norwegian, Romanian, Russian, Ukrainian, Chinese.

If your community was not included and you're still interested in running the survey in your community, ping me and we can see about running a second round of these surveys later this summer.

Also, see this work log where I try to provide a bit more context around how we determine whether a language community has enough page views to be able to take advantage of these reader surveys while attempting to not just fully ignore certain regions / language communities.

Hi @Isaac (WMF):. You wrote: "If your community was not included and you're still interested in running the survey in your community, ping me and we can see about running a second round of these surveys later this summer." I submitted my community over 4 months ago, unfortunately to no avail, and no explanation was given. I don't know your precise criteria, but seeing the eligibility of similar language editions I expect Polish to be squarely eligible, and my community and chapter would actually appreciate more insight into our readership base. Thanks for your response, aegis maelstrom δ 18:07, 25 June 2019 (UTC)
@Aegis Maelstrom: thanks for sticking with us. Setting up and running each survey takes a substantial amount of time on our end so unfortunately we just couldn't include all the language editions we wanted to on this first round and stay sane. Sorry for not better communicating that upfront. Polish does indeed have sufficient page views so I'm happy to work with you to include it in a follow-up round. I'm hoping to begin that work in mid-July so you should hear from me then. --Isaac (WMF) (talk) 18:19, 25 June 2019 (UTC)
@Isaac (WMF): Thanks a lot for your prompt answer! Surely, I will wait. As a side note, we may have some resources to e.g. highlight the findings afterwards, and locally communicate the research on wikis better :), for instance like here. Best Wishes, aegis maelstrom δ 18:49, 25 June 2019 (UTC)
@Aegis Maelstrom: excellent! --Isaac (WMF) (talk) 21:39, 25 June 2019 (UTC)

Ratio of English Wikipedia readers per country

You state "In non-english speaking countries, the number of people visiting English Wikipedia is marginal." This is contradicted by the reported results. In India, where the most spoken language is Hindi (about 58% of the population) and only about 11% of the population counts English as one of its languages, almost all page views are to English Wikipedia. Bangladesh, with only Bengali as an official language, and English very much a minority language, also has almost all page views to English Wikipedia. It appears that at least 12 countries where English is *not* the primary language have greater than 20% pageviews on English Wikipedia. That does not seem "marginal" to me, as an outside eye; in fact, it seems pretty significant. Could you please explain why you used the term "marginal" for this situation, particularly when 2 of the 5 countries with highest percentage of pageviews being in English do not have a comparably large English speaking population? I would really like to understand this. Risker (talk) 00:16, 19 October 2019 (UTC)

Thanks for the ping @Winged Blades of Godric: and thanks for the comment/patience @Risker:. I'll caveat this by saying that I should probably move that content off the page to somewhere that is more pertinent (e.g., this project which has been stalled for a while but I hope to one day pick back up) as it's not directly related to the surveys and it's an analysis that was not done by me / written up by me. So I can't speak to specific word choice but I'm happy to work with you to update this in the meantime. In general I would agree that the statement is too general for the data -- i.e. while many countries where English is not a primary language have relatively few page views to English Wikipedia, there are also plenty of counterexamples as you raised. I also suspect, though, that "non-english speaking countries" was meant more broadly than you are interpreting it. For instance, with India I would say it's reasonable to say that English is a main language even though there are plenty of arguments to the contrary. The same goes for Bangladesh, where it looks like English is a compulsory subject in schools even if it's not a primary language. Would something like this feel like it reflects the data better: "While English is the dominant language edition read in countries where English is also a primary language -- e.g., Australia, Canada, United States -- there are several countries where English arguably is not the primary language but is taught in schools and the plurality of page views still go to English Wikipedia -- e.g., India, Bangladesh, The Netherlands. There are also, however, many countries with high readership where less than 20% of page views go to English Wikipedia." --Isaac (WMF) (talk) 20:46, 19 November 2019 (UTC)