Research talk:Increasing article coverage

Let's discuss this project here. :-)

Known problems
Some users received the experimentation email even though they do not speak French. We are sorry about that.
Concerning the Content translation tool, you can find out more about it and refer to its help page (in French).



Scraping Babel is a very ugly hack. ULS is our main tool for language selection; it also uses Translate's "assistant languages" preference. Babel should provide an API and the merge of various sources should be handled by MediaWiki. --Nemo 06:51, 16 June 2015 (UTC)

Thanks for the pointer. This can become useful in future iterations. Does ULS provide information about proficiency in a language? Proficiency is easily identifiable through Babel templates. The other advantage of Babel templates at this stage of the experiment is that they are cheap to process. We need a subset of editors who know both the source and destination languages. As long as Babel templates give us some subset of that editor population, and as long as enough editors use them, Babel templates are good enough for this stage. Again, later stages may require fancier and more accurate steps. Thank you. --LZia (WMF) (talk) 16:22, 18 June 2015 (UTC)
@LZia (WMF): why not at least make sure we have proficiency in the destination language too? I got asked to write articles in French... --99of9 (talk) 03:23, 26 June 2015 (UTC)
Hi 99of9. Thank you for your feedback.
Here are the things we want to change for the next round (we have not settled on the details, since we need to wait for more results to come in, but they can still give you a sense of what we're thinking about):
  • Require a minimum number of bytes added in at least one contribution in both the source and destination languages. For example, we would not consider any user who does not have at least one 200+ byte edit in both the source and the destination language.
  • We should not consider image link contributions.
  • We may want to ignore contributions that result in negative bytes added.
  • Fix the issue with the way we read Babel. If a user on the source-language wiki stated that they do not speak the destination language, we did not take that into account. We will do this in the next round.
If you have any measures you think we should consider, please share them here. --LZia (WMF) (talk) 19:10, 26 June 2015 (UTC)
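
For readers who want to see how the criteria listed above could fit together, here is a minimal sketch. It is not the team's actual code: the `Edit` record, the `is_image_link` flag, and the shape of the `babel` dictionary are all assumptions made for illustration. The only values taken from the discussion are the 200-byte threshold and the Babel level "0", which on Babel templates means no knowledge of the language.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical record of one contribution (not a real MediaWiki type)."""
    wiki: str            # e.g. "enwiki" or "frwiki"
    bytes_added: int     # net size change; negative when content was removed
    is_image_link: bool  # assumed flag: the edit only added an image link


def has_substantial_edit(edits, wiki, min_bytes=200):
    """True if at least one edit on `wiki` adds min_bytes or more.

    Image-link edits are skipped. Edits with negative bytes added can
    never reach the threshold, so they are ignored implicitly.
    """
    return any(
        e.wiki == wiki and not e.is_image_link and e.bytes_added >= min_bytes
        for e in edits
    )


def is_eligible(edits, babel, source_wiki, dest_wiki, dest_lang, min_bytes=200):
    """Apply the proposed next-round criteria to one user.

    `babel` is an assumed mapping from language code to Babel level,
    e.g. {"en": "N", "fr": "0"}.
    """
    # A user who declares level 0 in the destination language is excluded.
    if babel.get(dest_lang) == "0":
        return False
    return (
        has_substantial_edit(edits, source_wiki, min_bytes)
        and has_substantial_edit(edits, dest_wiki, min_bytes)
    )
```

With this sketch, a user holding one 250-byte edit on enwiki and one 300-byte edit on frwiki qualifies for en->fr unless their Babel box says fr-0.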

For reference

--Atlasowa (talk) 15:55, 16 June 2015 (UTC)

Thanks for the two pointers. These will help us when we work on the literature review or background section of the page, as well as learning from the past experiences. --LZia (WMF) (talk) 16:25, 18 June 2015 (UTC)

My feedback

Just received your mail. Nice idea. I so wanted to do something similar on itwiki. I suggested creating a "community project" to handle "initiatives" like that, but so far no luck... I am available to do something with itwiki users if you want.

Now, as far as I understand and have studied it, the method you are using to target users is not perfect: it produces "false positives" like me. I am active on frwiki and enwiki, but not very recently in ns0. You should focus on users with at least 100-200 edits in ns0 in the last 6 months, both in the destination wiki and in the wiki of the language you want the translation from. I see you use the "babel template"; that's not very effective IMHO. Edits in ns0 are much more effective at showing the real interest of a user. If you want, you can filter those with the babel template parameters, or you can just look for users who have a certain number of ns0 edits and have also created a new page in the source language. Creation of a page is the real funnel; there is no point in inviting users to create pages if they haven't done it in years. People don't change so often.

No idea how you selected the target articles. Whatever the method was (number of languages on Wikidata, visits...), that's probably OK as a starting point for a first test.--Alexmar983 (talk) 19:10, 25 June 2015 (UTC)

In any case, I think that a message on my talk page on frwiki would have been perceived as more "friendly". You did it for a community, so there is no reason why these messages shouldn't be "open". The few times we posted "mass" messages on user talk pages on itwiki, reactions were always positive, so I wouldn't use email.--Alexmar983 (talk) 19:13, 25 June 2015 (UTC)

I would also suggest a "reward". Examples: access to some free bibliographic database, a barnstar at the end for the most active translator, or free travel to a Wikimedia conference. That's how I would have done it if I had had the chance.--Alexmar983 (talk) 19:20, 25 June 2015 (UTC)

Hi Alexmar983.
Thank you for taking the time to try the recommendations and to write to us. This is valuable feedback. My responses are below.
  • Re itwiki test: If you'd like to give it a try, you have our full support. We would be happy to prepare recommendations from en->it or another language pair of the community's choice. We are improving the algorithm and how we do things each time we run a test. More feedback will certainly be valuable. Thank you for offering that.
  • How we chose users for this test is explained here. We will update the Evaluation section by the end of the day (PST) with what we have learned so far from this test and how we are considering changing the way users are identified in future tests. As we are thinking right now, we are not planning to filter users by a high number of edits (100-200). I have made a note of that for us to consider. The issue is that a 100-200 edit minimum would filter aggressively, excluding newer users and those who have not been editing regularly in the past year but may be willing to do a translation.
  • We explain how we choose the missing articles here.
  • Thank you for your feedback about the email versus talk page options. I am not sure if you've read my earlier comment on this page about why we chose emails. It's definitely not ideal, but privacy was the biggest item that sent us to a non-public venue. We had to make a call about which one to go with, and for this test the assessment was that it's better to go with emails.
  • The idea of testing for the impact of rewards is very interesting, for this and other things editors do on Wikipedia. I know there are other people interested in this topic as well, for example, Halfak (WMF). I'm happy to chat more about this once we have a better hold of how well the algorithm works.
Thank you again for your time and feedback. --LZia (WMF) (talk) 20:07, 26 June 2015 (UTC)
About the edits in ns0... Incola informed me a few minutes ago that the server is shutting down queries that run too long. A faster alternative is to focus on AV/AP (autopatrolled) users. Much faster, and they are a more qualified group in many cases. Some of them are inactive, but if they show up again to try the tool, that is OK, I guess. I am helping to set up the CT tool on itwiki today. We are in fact preparing a list of target users to inform about the new tool; that's similar as a target, just a step behind in the process. It is on Elitre's page here (hidden); I'll let you know the details once we have finished.--Alexmar983 (talk) 21:44, 26 June 2015 (UTC)
I am looking at the user-article match. Nice. In my case it makes no sense, but simply because of my non-linear contributions... I always do new things to experiment, so my 5 options are totally random and 4 out of 5 sound quite boring to me. For a generic user, focused on 2-3 main interests, I would expect 3 matches out of 5. Even if they don't translate anything, how do you check how many of the 5 options were at least interesting for the target users?--Alexmar983 (talk) 22:21, 26 June 2015 (UTC)

Choice of articles and possible authors

J'espère simplement que lorsque vous avez sélectionné un article, vous ne proposez pas sa rédaction à plusieurs auteurs possibles, ce qui risquerait de provoquer des quiproquos mortifères. Gilles MAIRET (talk) 00:13, 26 June 2015 (UTC)

Quick translation: "I just hope that when you have selected an article, you do not suggest it for translation to several possible users, which could cause deadly mix-ups." Trizek (WMF) (talk) 09:11, 26 June 2015 (UTC)
Thanks for the translation Trizek (WMF).
Hi Gilles MAIRET. Thank you for your comment. I confirm that each article is only recommended to one user. We share your concerns regarding multiple people starting to contribute to one article.
LZia (WMF) (talk) 15:44, 26 June 2015 (UTC)
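
As an illustration of the "each article is only recommended to one user" guarantee, here is a small sketch of one way it could be enforced. This is an assumed greedy approach, not a description of what the team ran: the function name `assign_unique` and the input shape (each user mapped to a ranked list of candidate articles) are hypothetical. The default of five picks per user mirrors the five suggestions mentioned in the emails.

```python
def assign_unique(ranked, per_user=5):
    """Give each user up to `per_user` articles, never reusing an article.

    `ranked` maps a user name to that user's candidate articles, best
    first. Users are processed in dictionary order, so earlier users get
    first pick (a simplifying assumption of this sketch).
    """
    taken = set()
    result = {}
    for user, candidates in ranked.items():
        picks = []
        for article in candidates:
            if len(picks) == per_user:
                break
            if article not in taken:
                picks.append(article)
                taken.add(article)
        result[user] = picks
    return result
```

A greedy pass like this is simple but order-dependent; a fairer variant could hand out one article per user per round.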

In your selection of 5 articles, none or only one could benefit from a French page

Hi, I am pleased to be selected in order to increase Wikipedia coverage. But in the examples that you sent to me, I would not have spontaneously created such a French page, either because the title is not appropriate or belongs to a broader title already treated on the French Wikipedia, or because the person is unknown in France. This person might be known in another French-speaking country, like Quebec for instance, which is close to the US. But then, I am not the right person.

  • I could give you more details on myself, for example, the country where I live. Or you could give us the possibility to try another run of 5 new articles.
  • If I was entering your creation process, I would be obliged to add the horrible template {{Traduction/Référence|en which gives Cet article est partiellement ou en totalité issu de l’article de Wikipédia en anglais intitulé « XXX XXX » (voir la liste des auteurs). See Wikipedia:fr:Modèle:Traduction/Référence. Fortunately, I am discovering that the English template (This article has been translated from the French...) has been banned in English, see Wikipedia:Template:Frenchtrans&action=history.
  • I think that the power of Wikipedia is to keep variety in the knowledge. If a new English Wikipedia was starting now, main articles would be presented differently. This is the gain that we have when reading the same page in different languages: diversity! Please keep diversity and banish this template first {{Traduction/Référence|en !
  • When I say obliged to, it means that as I believe that the subject does not merit a French page, my work will be reduced to a pure useless translation. Plenty of such pages did flourish recently on the French Wikipedia, and they are not natural at all and completely useless because you can't improve them. If it was: "Apple is a fruit", we could start improving it, but it is an already fixed page. A link to Google translate could be more appreciated because the form is not important here but the information itself.
  • In conclusion, I would be very interested to get information, say every month, of new pages to be created on the French Wikipedia using your algorithm, but don't be sad if only 5% could be retained. That's normal for me.--Nbrouard (talk) 07:28, 26 June 2015 (UTC)
You are welcome to get the useless templates deleted in more wikis. They have been superseded in 2009, when Terms of use#7c was introduced. --Nemo 09:44, 26 June 2015 (UTC)
Hi Nbrouard, I think that the task of determining which articles should be translated is one that cannot be done entirely algorithmically. It requires the judgement of an editor. There are 4 million articles in enwiki that do not have a corresponding article in frwiki. The vast majority of these have no place in frwiki. But some small fraction of them do. How do you find the needle in the haystack? The goal of our article selection algorithm is to create a smaller pool of missing articles, with a larger fraction of articles that would be useful in frwiki. We chose articles from this smaller pool to send to users. Even the creation of these articles must be subject to an editor's judgement, but hopefully enough of them qualify for translation for the recommendations to be more useful than manually going through the list of 4 million articles. I am glad to hear that you are still interested in receiving recommendations in the future. Our goal is to add a feature to the ContentTranslation tool itself that shows users possible articles for translation. The system will also incorporate editor feedback saying that an article is not worth translating. --Ewulczyn (WMF) (talk) 18:53, 27 June 2015 (UTC)
That feedback channel (on worthiness) should accommodate (technically) and encourage giving reasons: poor quality in the source wiki, low estimated interest in the target wiki, redundancy (with link). --Rainald62 (talk) 22:54, 27 June 2015 (UTC)

Cool email

The selection of translator recipients probably needs some more work, but I think asking Wikipedia editors or users to contribute is great. It would be even better to include some links to the content translation page to help the translators instead of freaking them out. Here are a few suggestions:

Hi Chimel31. Thank you for taking the time to go over the recommendations and for your feedback. It's greatly appreciated.
Re link to the CX tool, do you mean something like this link which was already in the email footnote (but it is in English), or a kind of tutorial page, for example? --LZia (WMF) (talk) 01:41, 27 June 2015 (UTC)

1) I want to be absolutely certain when I select an article that no other translator is working on it, or that no other translator will take it over for, say, one month after my last translation draft edit, because that's about the time it may take to perform some research on translated references to link, plus our own available time to translate. On the other hand, all translators would welcome some help, but it should be synchronized first with the first/main translator, so a new translator does not randomly work on a section that the original translator is working on. You may also want a system where the translator helper is submitting his draft to the on-going translator for review and for the finishing touch to maintain the same style and integrity, keeping all credits as due, of course. But all that seems a bit more complex than simply blocking the translations for a while, and 2 translators attempting to work on the same article at the same time will probably never happen, except maybe for timely articles related to something in the news, or a newly broadcasted TV show episode, etc. I assume you already lock the article translation to the first localizer that started it, but it's not explicitly mentioned, and I don't know for how long. For instance, if someone starts a new translation with good will but becomes discouraged or is killed (tough competition among localizers ;) before finishing it, the draft should be made available to someone else, with the possibility to revert it if the initial translation draft was not helpful, or if the source language article has changed since the first translation draft.

I understand this concern. Each article is recommended to only one editor for the purposes of this test. You can take as long as you wish to finish the translation/creation of the article of your choice. Just keep in mind that we have no control over other editors who, by accident and without a recommendation, choose the same article and start to create it in parallel. The chances of this happening are low given the total number of articles created in French Wikipedia every day, but it's still a possibility. If this happens, CX will allow you to publish the article under your user namespace.
I very much like your comments about possibilities of collaboration between translators. The CX team has a lot on their plates right now, but they would probably be interested in this, given that they have already thought about the issue of publishing an article that has already been created by someone else. Feel free to communicate this directly with them. --LZia (WMF) (talk) 01:41, 27 June 2015 (UTC)

2) It would be great to add missing articles to our own customized list of suggestions of articles to translate: If an article on one subject matter links to another related article, the same translator might want to translate that missing linked article too. Similarly, it would be great for existing translators to flag on the fly any missing article that they encounter for their language(s), so that they may localize them later if they want to.

Yes, this would be really nice. It requires more engineering and design effort to get it right, but if it works well, it's certainly great. This will allow humans to fill in the areas where machines/algorithms fall short and vice versa. --LZia (WMF) (talk) 01:41, 27 June 2015 (UTC)

3) How do we edit references in the content translation page? Could they at least all be changed to add the original language information at the end of the reference, like " (en anglais)" (meaning "in English" in French). If the translator manages to find an equivalent localized link, how do we add it, should we publish the translated article first and then edit the article with the usual edit tools? If a localized equivalent reference can be found, I would suggest not to replace the existing one, but to insert the localized reference before the English (or source language) one, as a second separate reference (with the wording " (en anglais)" as suggested above.)

4) Standard sections such as "References" are not automatically translated. I think you probably have these translated names somewhere, it would be great to use them for standardization, to avoid introducing translations that do not match the standard wikipedia structure for that language.

5) There are some tags at the bottom of the article, but it looks like only the ones that have a translated equivalent are present. All other tags are missing, and there is no way to add translated tags in the content translation page.

6) When clicking on a link in the left pane (English source article in my case), wikipedia tries to open the English article from the French wikipedia. It would be great to open the proper English linked article from the English article. I get by by opening the original article in a separate browser tab, but it's not as efficient.

7) The Contributions|Translations home page should not start as a "Start a new translation" project, I just expect to see my contributions, and there's already a big obvious "Start a new translation" button at the top.

8) I haven't yet finished my first article translation. Will I automatically receive notifications if the source or target language articles are modified? How do source text revisions appear in the content translation page? Is there a way to easily see what has changed since the last translation, and either edit the translation with the corresponding change or just flag it as correct if the change was only a typo fix that does not affect the translation?

Comments 3-8 are best answered by @Amir E. Aharoni since they are CX specific. --LZia (WMF) (talk) 01:41, 27 June 2015 (UTC)

9) Any way to like or unlike a translated article to kind of mark which articles might need not just a translation, but also a review if a translator regularly produces low quality work? Localization requires many linguistic, editorial and other knowledge skills, I'll be the first to admit I lack competence in some of these areas. Or maybe a checkbox for the localizer to mark the translation as requiring a review, which can be performed by other contributors from that language, not necessarily translators.

We want this to happen. The CX team has already done some groundwork in this area. Check out The idea from the research side would be to collect editors' feedback and feed it to the algorithm for it to learn more. --LZia (WMF) (talk) 01:41, 27 June 2015 (UTC)

10) I don't think new translators should be allowed in the content translation page without reading some localization guidelines page first. Edit: There are some localization guidelines links, but they are not (all) accessible when translating.

11) Is there a way in my talk settings to automatically add my name instead of being forced to type the 4 freaking tilde characters that may not exist on all localized keyboards? I talk too occasionally to remember the stupid rule and forget it most of the time. But that's OK, Wikipedia logs me out automatically after a while (timed cookies), and I never notice it, so most of my contributions are not under my name anyway, just my ever-changing IP address... End of rant.

Sorry. I don't know the solution to this and it happened to me, too. Someone may suggest it as soon as I save the page. --LZia (WMF) (talk) 01:41, 27 June 2015 (UTC)

12) This page logged me out. Looks like there is a timeout value effectively restricting the comment to just a couple of paragraphs. Grrrr again...

Chimel31 (talk) 07:31, 26 June 2015 (UTC)

Hi Chimel31,
3. How do we edit references in the content translation page? - it is not currently possible in the translation interface. It is primarily intended for creating the basic translated article, and editing it further is possible in the usual wiki syntax editor or Visual Editor. We may add the ability to edit references in the future.
4. Standard sections such as "References" are not automatically translated. I think you probably have these translated names somewhere, it would be great to use them for standardization - we don't have such a thing currently, but this sounds like a brilliant idea, and pretty easy to add. I filed this as a task at - we'll try to do it some day.
5. There are some tags at the bottom of the article, but it looks like only the ones that have a translated equivalent are present. All other tags are missing, and there are no way to add translated tags in the content translation page. - yes, we are only able to add templates that already have corresponding templates in the target language. As above, we only focus on translation at the moment. We may add other editing features in the future.
6. When clicking on a link in the left pane (English source article in my case), wikipedia tries to open the English article from the French wikipedia - yes, this is a known bug. I hope to get it fixed very soon.
7. The Contributions|Translations home page should not start as a "Start a new translation" project - it's a part of another experiment of redesigning the Contributions page. User:Pginer-WMF should have more details about it.
8. I haven't yet finished to complete my first article translation, will I automatically receive notifications if the source or target languages articles are modified? - not currently. Content Translation sticks to only one revision, when the translation started. We plan to make it smarter in the future.
Thank you very much for the feedback and the suggestions! --Amir E. Aharoni (talk) 10:36, 11 July 2015 (UTC)

Selection algorithm

Thank you very much for this interesting project. I would be happy to do some translation work.

The proposition of languages (English to French) was perfectly relevant for me. However, the selection of articles was a bit disappointing. As an amateur of fine food, I must say I am slightly skeptical about the Horseshoe sandwich. I also do not have any knowledge in chemistry that could help me to translate Lead(II) bromide.

I do not contribute very often but I sometimes correct spelling mistakes in random articles. I also correct links pointing to disambiguation pages (which I select from the list Projet:Liens_vers_les_pages_d'homonymie and do not choose according to my personal taste). So the last fifteen articles which I edited do not reflect my interests. Maybe you could leave out the articles which have only been subject to minor edits? In my case, it would also have been relevant to put more weight on the articles which I created myself; they are more connected to the topics I am interested in.

Thank you anyway. Do not hesitate to send me another five items!--Racinaire (talk) 08:18, 26 June 2015 (UTC)

Hi Racinaire, thank you for your comments and suggestions. I'm sorry to say that the Horseshoe sandwich article was an accident! We did not take the topics you have edited into account to generate your recommendations (some emails contained personalized recommendations, some emails contained recommendations with articles that were predicted to be widely read). Our personalization method takes into account the past 15 articles you have edited to which you have contributed at least 100 bytes. This should filter out most minor edits. I think weighting articles that you have created higher is a great idea. --Ewulczyn (WMF) (talk) 19:27, 27 June 2015 (UTC)
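
To make the personalization rule above concrete, here is a rough sketch of how the editing history could be reduced to an interest profile. It is an illustration under stated assumptions, not the production code: the tuple layout of `contribs`, the function name `interest_profile`, and the `created_weight` value of 2.0 are hypothetical, and the sketch assumes the 100-byte minimum applies to the total bytes a user added to an article rather than to a single edit. The extra weight for created articles reflects Racinaire's suggestion, which has not been implemented as far as this thread says.

```python
from collections import defaultdict


def interest_profile(contribs, n=15, min_bytes=100, created_weight=2.0):
    """Return {title: weight} for the n most recently edited articles
    to which the user contributed at least min_bytes in total.

    `contribs` is an assumed list of tuples:
    (timestamp, title, bytes_added, is_creation).
    """
    total = defaultdict(int)  # bytes added per article
    latest = {}               # most recent edit time per article
    created = set()           # articles the user created

    for ts, title, bytes_added, is_creation in contribs:
        total[title] += bytes_added
        latest[title] = max(ts, latest.get(title, ts))
        if is_creation:
            created.add(title)

    # Drop articles that only received minor edits.
    kept = [t for t in total if total[t] >= min_bytes]
    # Most recently edited first, then keep the top n.
    kept.sort(key=lambda t: latest[t], reverse=True)
    return {t: (created_weight if t in created else 1.0) for t in kept[:n]}
```

Typo fixes and disambiguation link repairs typically add only a few bytes each, so under this sketch they would rarely push an article over the threshold.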


I agree with several of the above messages:

  • I don't really mind receiving such messages in my e-mail inbox, but I can understand that other people take issue with that.
  • Of the five articles that were suggested to me, none were relevant to my interests in the slightest, and none was important enough to warrant a translation more than any other random article. It was quite baffling, actually.
  • It is annoying that the links lead directly to the new translation tool. I do not use it and I would rather have direct links to the original articles.

The basic idea is interesting, but the execution is lacking a little so far. I hope it gets better with time. Ælfgar (talk) 09:11, 26 June 2015 (UTC)

Hi Ælfgar
Thank you for your comment and sorry if the recommendations didn't meet your expectation. We're working hard on it and your feedback can help us improve faster. Thank you for that. Below are my comments regarding yours:
  • Thank you for bringing that up. We provided an opt-out option for future communications in the email. However, I'm committed to finding a better solution for those who do not want to be contacted by email.
  • Two comments here.
1) I do not know if you were in the group that received random recommendations or personalized recommendations. I cannot check this now because it can impact the result of the test in ways I may not fully have control over. I'm sorry if the recommendations disappointed you. Do you think there is a way we can improve the email to reduce this issue? The wording around "history of contributions" should be improved (history is a general term, but it may be interpreted to mean that the recommendations are personalized).
2) Regarding the importance of those articles: the editor should make the final call on whether they are important in the destination Wikipedia. According to our algorithm, the recommended articles are predicted to receive non-negligible page views if they are created. However, not every article that is available in another language and is predicted to have reasonable page views belongs in the destination Wikipedia.
  • Do you have recommendations for how we can do this without using the ContentTranslation tool? As I mentioned in another comment on this page, we initially considered recommending redlinks, but that requires translating the title of the article from source to destination on our end, and that does not scale for a large number of articles.
Thanks again for putting the time to write your feedback here. We appreciate it.
LZia (WMF) (talk) 16:00, 26 June 2015 (UTC)
@LZia (WMF): Do I understand correctly that some users received random recommendations with the email saying that these recommendations were personalised based on their contribution history? --NickK (talk) 14:43, 27 June 2015 (UTC)
  • I suppose the best way to contact Wikipedians for Wikipedia-related stuff is to use their talk pages on Wikipedia.
  • If the recommendations I received were random, that would explain a lot. But then, I don't see the point of suggesting random pages to random people. What are the odds they'll be interested in them?
  • Page views are a tricky indicator, easy to manipulate and hard to decipher: I remember that the article which received the most hits in 2012 was Houx crénelé, for some reason. I think the assessments made by the various Wikiprojects are a better indicator of "objective" importance.
  • In my opinion, the message should provide the link towards the original article as well as the ContentTranslation one. The person receiving the message should know well enough how Wikipedia works for that not to be a problem.
That's just my two cents. I really like the idea and I hope it turns into something better in the long run. :) Ælfgar (talk) 09:14, 28 June 2015 (UTC)

Yes, but no


Thank you for your initiative which, I am sure, brightened the day of many Wikipedians.

However, as far as I am concerned, please note that I think that:

  1. Any unsolicited automated email is nothing but spam, and spammers get thrown in jail.
  2. There is no way I will do anything to unsubscribe from a program I never actively agreed to join. You managed to put my email on your list of victims; you can manage to take it off.
  3. Digging through a user's contributions to build an algorithm (evidently a deficient one) targeting their supposed areas of interest amounts to an intolerable intrusion into their privacy, comparable to profiling based on appearance: I do not hide my contributions, but in NO CASE do they define me.
  4. A "prolific" contributor is not necessarily a writer. For my part, I do almost nothing but maintenance, which may explain why your algorithm is fit for the bin.
  5. A contributor who states on their Commons user page (and nowhere else, neither on WPen nor on WPfr) that their level of English is basic should not even be contacted.
  6. The translation tool is anything but ready.
  7. I fully agree with what Soboky says on the WPfr Bistro:

    Why assume that improving the encyclopedia necessarily happens through creating articles (that is, numbers) rather than through expanding stubs (that is, content)? Why forever consider that everything on WP:EN would have its place on the French-language edition?

  8. The quality of the targeted articles is more than uneven:
  • Thar coalfield: 3 dead links (not tied to the text) out of the 3 offered as "references". WPen and WPfr must not share the same view of referencing. On WPfr, references are expected to be complete, accessible, and tied to the text.
  • Marine Corps Base Camp Lejeune: 2 references just to source the pronunciation (true, that is strategic information about a military installation), several incomplete and/or inaccessible references, several references about controversies, but the part actually dealing with the base is underdeveloped and sourced only by the official site.
  • Auditory brainstem implant: there is a big banner saying there is a sourcing problem, but never mind, because it is "important for the other languages of the project".
  • Inglourious Basterds (soundtrack): apart from 5 of the 6 sources being "reviews", I really do not see how a film soundtrack can be considered "important for the other languages of the project", but really, I would be interested to know how you dug up things like this.
  • Information cascade: now that one, hats off, it is excellent, and it really (no irony) should be translated. I do not see what to call it in French (my level of English being, let me remind you, "basic"), but we can always source it with this page, can't we (there is irony there, but only a little)?

To finish, telling me that I should express myself "in English preferably, although we will certainly find a translator if you write to us in French" goes beyond the limits of the acceptable. Your vision of the English version enlightening the rest of the world is the most typical cliché of a culture that considers itself superior. I am not usually this virulent in my reactions, but I suggest you go and ask the contributors of WPen to translate articles from the other WPs, and wait quietly to see what happens.

With that, I wish you an excellent weekend, full of contributions important for all the languages of the project.

Kind regards. --Indeed (talk) 11:50, 26 June 2015 (UTC)

I fully share Indeed's view: the translation order I received concerns articles that I will never translate, even with a gun to my jugular (out of 5 articles, 5 deal with thoroughly secondary and marginal aspects of the politics of English-speaking countries: well, bravo! Now that's a priority! That is nooot at all a domineering vision of the civilized English-speaking world over the backward world that has the nerve to speak other languages and not have en.wp's articles!). I work with a language whose mastery is extremely rare on fr.wp (Turkish), and when you know what you are talking about, there is absolutely no need for help in identifying the important articles to translate (or simply to create from scratch, given their sometimes lamentable state in the original language). My editing work, unpaid and disinterested, must still leave me free to choose my own areas of interest and to choose for myself the articles to create/translate. Sending translation requests to a personal email address: honestly, where have you ever seen that? I DO NOT WANT TO RECEIVE THIS KIND OF CRAP IN MY INBOX ANY MORE. Kumʞum quoi ? 13:11, 26 June 2015 (UTC)
I also agree with Indeed. I sum up what she said:
1+2. Direct emailing is bad. It's like spam to some of us. Please stop. You should have written on our discussion pages instead, or even better, here. The reactions would have been less violent.
3a. Datamining us is a bad thing. We volunteered, but not for this!
3b. The datamining algorithm is far from effective. Most of the complainants don't even know why such articles were sent to them.
4. Some active users are NOT writers. Some of them do maintenance jobs, others produce graphics, etc.
5. Asking self-proclaimed basic-English-speaking users to translate articles is nonsense.
6. The translating tool is not ready.
7a. Quantity doesn't imply quality. Some users prefer having few articles of good quality rather than many articles of bad quality.
7b. Why assume that all WPen articles should systematically be translated into other languages?
8. Selected articles are often not good enough to be translated. Some have dead links, or lack sources.
Epilogue. Why don't you ask English users to translate articles originally written in other languages? The English language is not the belly button of the world. Neither is WPen. --Flappiefh (talk) 16:11, 26 June 2015 (UTC)
Furthermore the research protocol sucks: what does a "comprehensive" encyclopaedia mean? The "next million articles"? How did you find that translation is the way to reduce the "gap"? And what is this gap? And in which way is it a "dramatic" problem? And what is this problem in the context of linguistic imperialism? Not a SINGLE WORD about these fundamental questions in a 3-line intro! An undergraduate in a non-prestigious university in France would get an F for a poor intro like that, whatever follows. Then? OK, the methodology:
1) Identify missing pages: that confirms my first thought about "English is comprehensive and others are not";
2) Evaluate them: do you really think you can evaluate quality from quantitative data? Great!;
3) Identify the potential translators: the worst part. You clearly identified my fields (I write about politics), and what did I get? 5 pages to translate about politics in the English-speaking world! Is that "random"? Hey, I'm curious: who allowed you to use this kind of data?
4) Match them by sending a translation order by mail: genius. Pure genius.
That's only a data trip; it has nothing to do with research in social sciences, with ethics, with translation, with engagement, or with the choices that determine in the first place who writes what and in which language. I translate from Turkish (and some other languages) to French. I'm not a professional translator, I do it in my free time, I don't have any interest (money, prestige or academic position) in doing it, but I do it. I choose the missing stuff myself, and I translate it or write it from nothing (if you think non-fr.wp pages are a priori better and better-referenced content...). Who decided that a contributor whose Babel list shows En-2 or En-3 is capable enough to translate? Then, since when does the WMF have editorial prerogatives? Since when does the WMF not give a single shit about privacy (mail addresses and individual editing fields)? I don't know if an award for the worst research in social sciences exists, but you would earn it very easily! I'm not a systematic critic; I am well placed to know how uncertain research is, and that it has its own risks, especially if results provoke surprise or indignation.
You did not take enough time, you did not ask us, you did not explain your choices, you did not define your role, you did not give a shit about context and other academic views (data is great, but if you think it is above psychology, sociology and politics, you're far, very far from any scientific, and here really "comprehensive", comprehension of the problem). You could have received "constructive remarks" or "comments" or (why not) "greetings". I don't know if any of the scientists (how is that possible...) has seen the French Bistro, but it is very unusual to see such consensus against something! Bravo! On the line of "Excellent research -> Pile of shit", you are far from the middle and on the wrong side. Please stop this joke. Kumʞum quoi ? 16:40, 26 June 2015 (UTC)
Wow! What a shitstorm (lol). Besides the fact that it was a really bad idea to contact us via mail (claiming that it's for privacy reasons!) without asking first: if I'm more or less able to read, understand and write in English, it doesn't mean that I'm able to translate a full article. I've done it once or twice, and it's really not an easy way for me to contribute to a project in which I still believe, even if sometimes some of the other contributors disappoint me, just like the 4 of you who have decided that my contributions are at your service. Thank you to Kumkum, Flappiefh and Hadrianus, I feel less alone in front of this steamroller! Sorry for my bad English, but I did warn you... --Indeed (talk) 17:16, 26 June 2015 (UTC)


Above all, let none of you who committed this project take the trouble to answer me. After all, politeness and respect for others and for the community are only social conventions, and you are above that.

Since my comments are evidently read only by the other victims of your experiment, I am going to lay it on thicker; I have always loved making a spectacle of myself. Your addition to the project page titled French Wikipedia Test: Lessons Learned (Draft) gives me the strong impression that the "lesson" has certainly not been learned/understood. You carefully pass over in silence the point that drew the majority of the negative feedback, namely the spam you engaged in. And do not come telling me that it was not spam because of I do not quite know what (I did not understand the argument developed further down), or that you wanted to respect some privacy consideration, when in that very paragraph you explain that you dig through our contributions.

There is NO questioning whatsoever of how your project was built. Because I am almost in a good mood, I will set aside the fact that your starting point is to smooth out the content of the WPs on the basis of WPen (I quite liked the term "belly button of the world", thank you Flappiefh). Here is a short list of what would have spared you a lot of trouble:

  1. Rather than talking about a "gap" between the different versions of the project, talk about a general improvement in which each version benefits from the experience and content of the others.
  2. Contact the translation project (projet:traduction) of each version and find out what already exists (e.g., for fr: Wikipédia:Articles à créer/pages demandées les plus liées au modèle Lien). Nobody waited for you to have the idea of doing translations or of coordinating.
  3. Announce the launch of a "first trial test" from WPen to WPfr.
  4. Contact contributors on their talk pages to offer them the option of signing up for the mailing.
  5. Notify the community by posting a message on the Bistro, making sure that on each version someone is ready to answer, direct, and support the contributors who come forward.
  6. Offer the option to "sign up" for a specific topic or for the algorithm (given that digging through contributions is not super-duper effective, which is so much the better for us and too bad for you).
  7. Provide a feedback page dedicated to the articles proposed for translation (making it possible to state, for example, why an article should not be translated: compliance with the CAA, ...).
  8. Exclude stubs, unsourced articles, those that are themselves translations from another WP, articles deleted on the target WP, those subject to a dispute... In short, do not rummage through WPen's bins to supposedly improve the quality of WPfr.

You arrive with your certainties of working "for the greater good", you ignore the feedback that bothers you, you never once apologize for the errors (and I did say errors, not faults, because I still respect AGF, but that will not last) that were made, and you are going to calmly present at Wikimania 2015 whichever results suit you from this rag that you claim is research (there, AGF is dead, RIP).

Even if you could not care less, as with everything I have written before, know that your contempt has disgusted me so much that my motivation to contribute is at its lowest. No longer very amicably. --Indeed (talk) 12:50, 1 July 2015 (UTC)

Big +1 to everything that Indeed just said, especially the part where you clearly ignore any feedback that is not going your way and do not answer questions (like the one I asked when pinging you in "Use of mail-address")... I repeat again that I've never given my explicit consent to receiving bulk emails, so don't do it again. --NicoV (talk) 17:01, 1 July 2015 (UTC)

Try again


Obviously, this project is a good idea, but doesn't work yet.

  1. To most of us, sending so many emails by bot is spamming. You should ask volunteers, first, or send messages on Projects talk pages.
  2. How did you select articles? I translated some, none of them related to the list I received. --El Caro (talk) 13:37, 26 June 2015 (UTC)
Hi El Caro. Thank you for your feedback.
  1. I understand the concern regarding emails. We did not want to expose the recommendations on talk pages since this was a test to assess how well the algorithm was doing and we wanted to be on the safer side (the algorithm could be inferring recommendations that may not be immediately identifiable by just looking at users' contribution history, and exposing those publicly was not something we felt completely confident about). I explain this in more detail here.
  2. We explain how we chose the articles here.
Thanks again for your time and feedback. --LZia (WMF) (talk) 20:21, 27 June 2015 (UTC)

Some ideas for this line of research

Nice ideas, thanks for sharing them. Some thoughts on improvements:

  • Test language pairs in both directions. Only You can Stop Language Imperialism! If you're testing EN --> FR, also test FR --> EN (and either make it easy for anyone to try the experiment in their own language-pair, or try at least two lang-pairs yourself)
  • Start experiments gradually. Don't ping 1000 people on day 1: start with a hand-picked group, then increase each day while responding aggressively to feedback. This may mean that the first half of participants help tune the experiment, and only the last half of participants are part of the 'final' run, but this should hardly impact the results. The benefits from a better study should outweigh the decrease in participant size.
  • Test short messages. It's fun to try a long intro, multiple article suggestions, and lots of help text. But try also a brief suggestion on a talk page (which is how many existing requests to participate are done), at least for comparison.
  • Try out different ways of selecting articles. This method seems to suggest mainly low-importance topics, which may seem quite random to the editor. You might also link the editor to the algorithm that chose those topics :)

SJ talk  14:05, 26 June 2015 (UTC)

+1 on everything. On the last point, IMHO it's key to be explicit about what the algorithm does, rather than make value judgements. --Nemo 19:56, 27 June 2015 (UTC)
Hey SJ, you are completely correct on the points you raise. In terms of language pairs, we designed the system to be language pair agnostic (i.e. the system is fully multilingual). This test was done as a precursor to implementing the method within the ContentTranslation tool itself, where many language pairs are available. We chose (en, fr) for this test as a matter of convenience (large source wiki, smaller target wiki with CX enabled), but another pair like (es, fr) would have been a better choice in light of your observation. In terms of staging the roll-out of the emails, you offer sage advice. We did an internal pilot with French and Spanish staff users. But we should also have sent out a much smaller batch of emails as a trial on frwiki. Many of the issues we are seeing in our method can be addressed with minor changes to parameters of the system. Also there were issues with the wording of the email that we could have fixed based on a trial. We debated between the choice of talk pages and emails for a while. Since some of the emails were personalized, we thought it would be better to send them via a more private channel, like email. Testing this hypothesis via a small talk page trial would have been a good idea. Thanks again for the advice. --Ewulczyn (WMF) (talk) 20:22, 27 June 2015 (UTC)

And some more of the same

It's not a bad idea and I don't mind having been contacted by email - in fact if you had left a message on my talkpage I would have been lucky to find it for some time as I'm not very active at the moment and am often logged out. If you'd left it on my French talkpage I may never have found it. So maybe those things are a problem in themselves in terms of who is being selected ...

Further feedback:

  • I don't speak/read/write French. Yes I have edited French Wikipedia, a total of 15 times, and solely to add/change photos in articles. The last time was May 2013 (so well outside the stated one year window).
  • If it's looking for people to translate from English to French possibly the email should have been in English? Or both English and French? Or maybe it's designed to weed out English speakers that don't speak French?
  • Surely the matching process (checking for matching usernames and matching email addresses) is going to be highly error prone for people like myself with global accounts?
  • The articles suggested were mostly well beyond my 'interest vector'. As far as I could remember I'd never contributed to, or even read, any of the articles (I know that wasn't a criterion for suggestions, but perhaps worth considering?), but I wouldn't have considered that I'd ever done much work on related articles either, and certainly not recently.
  • The articles suggested were of variable quality, ranging from quite long and seemingly okay, to articles of dubious quality, articles tagged with issues, and at least one that was little more than a stub. Is there some way of doing a quality check before recommending them?
  • I would suggest that maybe one, and perhaps two, of the articles would be of much interest to French readers, and even then they would seem to be of quite limited appeal. The others seemed to be of quite low importance, mainly of interest to locals (e.g., quite specific to either US or UK audiences, and small audiences at that). Again, can that be checked first?
  • I agree with a number of others that the link in the email should go to the article first so the user can look at and evaluate it. I didn't even know what it was when it took me straight to the content translation tool. Perhaps include two links in the emailed suggestions, one to the article and one to the translator?

Overall, even if I was going to or was able to translate some articles, I wouldn't have had much joy with this process, and would have just sought out things for myself that I was interested in and that were of reasonable quality. Keep working on it, but clearly there are a number of issues in its current form. --Jjron (talk) 15:13, 26 June 2015 (UTC)

Hi Jjron. Thank you very much for writing your feedback.
  • We chose users using Research:Increasing_article_coverage#Identifying_potential_contributors for this test. Clearly, the bar was too low, and we ended up choosing someone like you by mistake. We're sorry about it. We have changed that process as explained in Research:Increasing_article_coverage#French_Wikipedia_Test:_Lessons_Learned.
  • I understand. As you guessed, because the editor should have been able to write in French, we wanted the email to be in French. We could include English, that would make the email longer. It's definitely a consideration for future messages like this.
  • Now that we have updated the way we identify users, matching should be more effective, too.
  • For this test, we had to test if our algorithm has any advantage over random recommendations by selecting a subset of users as random recommendation receivers. I'm not sure if you were in that group but that could explain part of the reason your recommendations were not close to your interests. We have learned that randomization does much more poorly through this test, as explained more here (detailed analysis will be available at the end of the test and once we assess whether the test in Spanish is needed.)
  • We have received feedback that we should not consider any article with less than a C rating. Maybe we can remove those with not that many references? Any other suggestions?
  • I see. So you mean by considering location-based articles?
  • I agree with your point. I especially like your suggestion about having two links in the message, one to the article, one to ContentTranslator. That's a solution that does not require a lot of engineering resources for the purposes of the test but will make it much easier on the recipient end.
Thanks again for your feedback and the time you put in this. --LZia (WMF) (talk) 20:46, 27 June 2015 (UTC)
Thanks for your reply LZia (WMF). No apology needed. I'll just attempt to briefly respond to your queries. I'll hit a reply to the email as well including a link to this discussion, which may help you with deciding about whether or not I was in the random group re article suggestions.
I'm not sure I agree about focussing on the article 'ratings', i.e., nothing less than a C. Admittedly I haven't been editing much in some time, but when I was I found ratings were not always that accurate, at least IMO, and seemed to be rarely updated, which may be more of the issue in the sense that they went out of date. I'm almost thinking it needs manual oversight, but I guess that's what you're trying to avoid by coming up with the algorithm to identify articles - I suppose you're making a lot of suggestions to a lot of people, but could the articles be manually screened after being suggested by the algorithm? Maybe still too human intensive.
Not sure exactly what the algorithm currently looks at, but I would think things to consider may include size of the article, some minimum referencing, possibly inclusion of image/s, a consideration of ratings for 'quality' and/or 'importance', article visits... Maybe things could be weighted? I would suggest that categories could also be considered - for example I believe things like articles on animals and plants, even locations, tend to have relevance globally, while say the killing of a taxi driver during an 80s Welsh miners' strike (one of the suggestions I received) would be of limited interest to a French audience. Maybe that ties into the 'location based' thing (I'm not entirely sure what you meant by that), but while that article would tick a lot of other boxes - it's quite well written, well referenced, rated C-class in several projects although mostly tagged as 'low importance', I'm sure of interest to his local community at least - analysing the categories of the article may help it be excluded as not of much interest to the French. Sounds good in theory, but I suppose it may be difficult to actually implement something like that.
Anyway good luck with further progress on this. --Jjron (talk) 14:37, 28 June 2015 (UTC)
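To make the "maybe things could be weighted?" idea above concrete, here is a minimal sketch of a weighted article score. It is not the project's algorithm: the `Article` fields, the weights, and the caps (20,000 bytes, 20 references, 10,000 monthly views) are all hypothetical values chosen only for illustration, and category-based filtering is left out.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical per-article signals; none of these names come from the project."""
    title: str
    size_bytes: int
    num_references: int
    has_image: bool
    monthly_views: int
    is_stub: bool = False
    has_issue_tags: bool = False


# Illustrative weights only (assumption): they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"size": 0.3, "references": 0.3, "image": 0.1, "views": 0.3}


def quality_score(article: Article) -> float:
    """Return a score in [0, 1]; stubs, tagged and unreferenced articles get 0."""
    if article.is_stub or article.has_issue_tags or article.num_references == 0:
        return 0.0
    # Each signal is capped at an arbitrary ceiling and scaled to [0, 1].
    size = min(article.size_bytes / 20_000, 1.0)
    refs = min(article.num_references / 20, 1.0)
    image = 1.0 if article.has_image else 0.0
    views = min(article.monthly_views / 10_000, 1.0)
    return (WEIGHTS["size"] * size
            + WEIGHTS["references"] * refs
            + WEIGHTS["image"] * image
            + WEIGHTS["views"] * views)
```

Articles could then be sorted by `quality_score` and anything scoring 0 dropped before recommendations go out; how to weight the signals is exactly the open question raised above.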

Specific page to get articles

First of all, I have to say I'm sad to see so much aggressiveness in this page, I feel sorry for that. I think it's a very good initiative, even though there are many ways it could be improved. Thanks for your work.

As many, I don't think giving me a list by email is the best solution, for several reasons:

  • the email probably won't arrive when I'm available, so I might just ignore it
  • the list of articles offered to me to translate may not be of interest to me, and I probably won't be very motivated to work on these
  • There is no direct link to the selected articles, so I have no way to check the article first to see if I think it's worth translating

For me, the best solution would be a visible link on the wikipedia interface to a page with a selection of pages to translate, with a link to the original article and the translation tool. This would probably help me contribute more.

Thank you for your kind words. It's a good point that the emails are unlikely to have arrived when you were logged in and working on the site: another reason why a talk page message would have been the better choice. Your suggestion about having a link to the source article and a link to the tool is well taken. We should do this when we build recommendations into the translation tool. --Ewulczyn (WMF) (talk) 00:43, 28 June 2015 (UTC)

A good idea, but...

Like all the other participants, I was surprised to receive this email proposing that I translate articles, and by the choice of those articles. My remarks:

  • My email is not there to receive SPAM, whoever sends it. If you want to make a proposal, my talk page is the right place.
  • Babel: I did not state in my Babel box that I speak English, so I am surprised to have received this email. Having a few contributions on en.wikipedia does not mean that I have mastered the language.
  • Choice of articles: of the articles I received, two were not eligible on fr.wikipedia, one is covered within another fr page, and two others had too few sources to be considered worth translating. What is present on one wiki will not systematically be present on the other, so you must take that into account.
  • Finally, systematically sending people to the translation tool instead of to the English article is counterproductive, especially since the English-French translation tool is not yet operational. --Remy34 (talk) 19:46, 26 June 2015 (UTC)
Here's a quick translation for the project's members:
  • Email perceived as SPAM, prefer using users' talk page.
  • If a user doesn't specify that he or she speaks English (see his/her Babel box), don't send him or her translation jobs.
  • Some selected articles already exist as sections of a bigger French article. Some other selected articles are stubs and/or lack sources. Some others are not relevant on WPfr. Not everything that's on WPen has to be on WPfr.
  • Adding a link to each English article would have been better. Arriving directly in the translation tool isn't satisfying, all the more because the tool is not ready yet. --Flappiefh (talk) 08:52, 27 June 2015 (UTC)


Hi there, another Wikipedian not so pleased.

  • I didn't like receiving this e-mail (I have seen your apologies above). Why not use a sitenotice to recruit some beta testers?
  • When I try to see the page you want me to translate, I click on the link/name proposed:
    • This is the translation tool for newbies, and guess what, it looks like the new tool called Visual Editor, made for people to edit Wikipedia without the wiki code. I don't know how to use it. I don't wanna use it.
    • The click can't launch the tool, because the syntax is not correct: my list (Disclosure / Share Our Strength / Iron chelate / Joe Lonsdale / Bengali literature) ends up like this in the tool: Iron_chelate (show me the list of 5 choices).
    • If maybe I want to help and translate, the links opened a URL where I can't even see which page you want me to translate, or get any context: I first have to translate the page title to go any further. The link is on the "fr" Wikipedia. How do I get to the original page? I go back into the e-mail, get the page name, find a way to search for the page, find a disambiguation page =__="
    • I don't recognize myself in these pages, so why translate them? I am a volunteer, I wanna make the Wikipedia project better, in my (so rare) free time. I wanna do it in the French language, with the good and bad there is over there, that I know. I don't believe en wiki is better, it's just bigger.

So, here I am, frustrated. I sent an e-mail to an unsubscribe e-mail account, with "unsubscribe" in the subject field (why not a URL to click that says: "Oh, I'm so sad you're leaving, feel free to come back. Any feedback would be nice: -list of things-").

I came here angry at first, then sad at all the feedback from deeply hurt people. Then I was glad that some are trying to do things about it, so I left a message anyway. In English... I'm sooo rusty :/

See ya -- X-Javier [m'écrire] 18:40, 1 July 2015 (UTC)

Hi X-Javier. Thank you for writing your feedback. I am sorry that you are frustrated and hope the responses below help.
  • The reason is selection bias and not being able to control for that in small populations. Those who would choose to receive recommendations could have characteristics that could be different from the general population and we would not have a way for controlling that.
    • I understand the different preferences the users have. The best option would be to have redlinks to the new pages and/or a link to the CX tool. The issue with redlinks (which was our first choice) was that we could not translate a lot of article titles to make such links. We could just send the list of recommended articles, and link them to the enwiki article, but that wasn't good either, since you may lose a lot of people if you don't provide a way for them to contribute from the email.
    • Thanks for pointing this out. I've made a note and we will look into this. It is confusing on the user end to figure out what we are recommending in this case.
    • This is a problem. At least one other user has brought this up as well. We should have provided the link to the enwiki article in the recommendation list, maybe next to the link that takes you to CX.
    • Yes, and we absolutely respect your freedom to choose what you want to contribute to. The recommendation should be considered as a recommendation/suggestion. Regarding the choice of English as a source language: I tried to address this in my village pump announcement. I'm happy to discuss this more with you if you have more questions.
There is at least one way we can improve this: we could go with "Here is the list of articles we recommend you consider creating/translating, and by the way, this article exists in the following languages." This way, if a user knew any of the source languages, (s)he could choose to look at those languages. The main bottleneck here would be that we would need to have the article title translations.
Thank you again for writing X-Javier. I do hope that this response helps, at least a bit. --LZia (WMF) (talk) 21:27, 1 July 2015 (UTC)

Usage of email

"recommending articles to create by email…"

Thanks for your message on frwiki, but I don't want to be contacted by e-mail. I'll have to check my mail and I find this annoying. Yuk! Could you post recommendations on the target user's talk page? That's what it's made for. Neatnik (talk) 17:05, 24 June 2015 (UTC)

+ 1. The user's talk page may be better. TCY (talk) 21:06, 24 June 2015 (UTC)
Hi. Thank you for bringing this up. In order to choose the medium for communication, we considered four options: user talk pages, Echo notifications, Content Translation tool, and email.
We decided to cross off user talk pages for two reasons: 1) privacy considerations: although users' contributions across projects are public, the algorithm's results do not have to be. We are not manually checking/filtering what the algorithm recommends, and you can imagine there are situations in which users may not want to receive specific types of recommendations on their talk pages, for example, about topics that are taboo in their community/country. 2) to accommodate the needs of the test: we wanted each recommendation to be visible only to the user it is recommended to. If another user sees someone else's recommendation, they may get curious and they may start the page earlier than the person the recommendation is sent to. This is generally a great thing to happen, but for the purposes of the test, this could limit the number of data points we could eventually use in the analysis.
Echo notifications could be a natural solution. Unfortunately, customization of Echo messages for each user is not currently possible and given that each message will be different in terms of the recommendations the users receive, customization is a feature we could not give up on.
The Content Translation tool itself is a natural place for these recommendations. We had three obstacles to using the tool for this test. 1) The Language Engineering team is very close to making recommendations possible in the tool, but they are not there yet. 2) The number of users in each language that use the tool is limited and we would not have enough data for our analysis. 3) Echo notifications were needed from within the tool to let users know there are recommendations available. This feature was not available in the tool when we were considering different options.
In general, when possible, these kinds of tests should happen through venues that work smoothly with the work-flow of a user on Wikipedia. Just to emphasize, we are using email for the purposes of the test only and due to the limitations explained above. Once we know the algorithm works, we will spend more resources on it and one of the priorities would be to make sure the recommendations blend in the work-flow of users.
I hope this information helps. If you still prefer not to receive an email, please let me know and I will do my best to exclude your email from the list. I can do this for a handful of requests, if there are more requests, we should aim for asking the editors to opt out from future research communications via the option shared in the email. If you decide to stay in the list and receive an email, we appreciate your willingness and understanding.
LZia (WMF) (talk) 05:08, 25 June 2015 (UTC)
I just received your email about the program. First of all, even if I can understand the objective of the project, I must share with you my first feeling, which was quite bad: what an intrusion into my privacy! OK, it's just the first feeling. But I think it would be worthwhile to explain the project better than you did, and to find something to reassure the editors you chose. My second point is a question: can you explain the process used to choose the editors you contacted? Were they chosen because they frequently translate articles from EN to FR, or for another reason? I'm not a very active editor and was very surprised to be contacted by the WMF. Thank you in advance for your responses, and I wish you success with your project. If you are interested in measuring the success of the algorithm, I can tell you that it was well targeted, but not *very well* targeted: the articles suggested for translation are just near my scope, not exactly in my scope (if we can say that I have a clear scope). Best, --Manastirile (talk) 23:34, 25 June 2015 (UTC)
Hi Manastirile
Thank you for taking the time to write to us. It's appreciated.
Sorry that you felt uncomfortable with the email. That was not intended, and I apologize for that. Do you have recommendations about how we can improve the email itself? That would be really helpful. Our main goal was to put the information needed in the email, but still keep it relatively short. I'm sure we could do better, so please help us with that.
Re how users were identified: We have explained the process used to identify users here. There are definitely things we want to improve in that approach: putting a bar on the minimum number of bytes added in both source and destination languages and excluding image links from the list of contributions we consider are two examples. Do you have other recommendations we should consider?
Thank you for your time. --LZia (WMF) (talk) 19:24, 26 June 2015 (UTC)[reply]
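As an illustration only, the two improvements mentioned above (a minimum number of bytes added in both languages, and ignoring image-link contributions) could be sketched roughly as follows. The `Edit` record, its field names, and the `is_eligible` helper are hypothetical and are not the team's actual code; the 200-byte threshold is the example value mentioned earlier in this discussion, not a settled parameter.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical record of one contribution (not a real MediaWiki schema)."""
    lang: str            # wiki language code, e.g. "en" or "fr"
    bytes_added: int     # net bytes added by the edit
    is_image_link: bool  # True if the edit only added an image link


def is_eligible(edits, source, target, min_bytes=200):
    """Return True if the editor has at least one sizable, non-image-link
    edit in BOTH the source and the target language."""
    def has_sizable_edit(lang):
        return any(
            e.lang == lang and not e.is_image_link and e.bytes_added >= min_bytes
            for e in edits
        )
    return has_sizable_edit(source) and has_sizable_edit(target)


# Example: a large English edit, but only a tiny edit and an image link in French.
edits = [Edit("en", 850, False), Edit("fr", 40, False), Edit("fr", 900, True)]
print(is_eligible(edits, "en", "fr"))
```

Under these assumptions, the editor in the example would not be contacted, because neither French contribution counts as a sizable text edit.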

use of mail-address

I'm surprised and confused that you used my mail address (besides talking French to me, a language that I don't understand). This kind of usage is not mentioned in the user preferences. Did you check if it is covered by our Privacy Policy? It might be a question of interpretation, but I for myself don't like to see my mail address used for Stanford University research without being asked and without any opt-out option. Alice Wiegand (talk) 16:50, 26 June 2015 (UTC)[reply]

Hi Alice Wiegand. No email addresses were given to external persons. Bob, who is a researcher on this project, is a research fellow at the Foundation and has signed an NDA; Ellery and myself are WMF employees. Jure does not have access to non-public data. All email addresses were kept on WMF servers. The opt-out option for research contacts right now is to send us an email. I would like for us to work towards a better opt-out solution and I'm committed to working with the different interested parties to make this happen once we are done with this test. --LZia (WMF) (talk) 21:13, 27 June 2015 (UTC)
You are missing the point here. Looking at the discussions that are taking place about everywhere (local wikis, email lists, Twitter, etc), no one wants a "better opt-out solution", as you mention: the system should be either opt in, or not exist at all; any other solution seems to be a great way to alienate the community. Schutz (talk) 06:58, 29 June 2015 (UTC)[reply]
Same with me.--Mautpreller (talk) 19:01, 26 June 2015 (UTC)[reply]
+1. Bawolff (talk) 09:08, 27 June 2015 (UTC)[reply]
I disabled the email function of my account on every project for a certain reason. I want to receive e-mails via Wikimedia software only in those cases which I actively wish. It is a deliberate violation of the Wikimedia privacy policy to give my e-mail address to the people of the research team. -- Andreas Werle (talk) 13:02, 27 June 2015 (UTC)[reply]
Hi Andreas Werle. I will make sure we look at the user preferences table to exclude those emails that have chosen not to be contacted by other users via email. Sorry that you received the email. We did not put this filter in place. --LZia (WMF) (talk) 21:13, 27 June 2015 (UTC)[reply]
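A minimal sketch of the kind of filter described above, assuming the recipient list is available as simple records. The field names `email` and `allows_user_email` are hypothetical placeholders for whatever the user preferences table actually exposes; this is not the team's query.

```python
def contactable(users):
    """Keep only users who have an address on file AND have not disabled
    email from other users. A missing preference is treated as "do not
    contact", which is the conservative choice."""
    return [
        u for u in users
        if u.get("email") and u.get("allows_user_email", False)
    ]


# Hypothetical sample records.
users = [
    {"name": "A", "email": "a@example.org", "allows_user_email": True},
    {"name": "B", "email": "b@example.org", "allows_user_email": False},
    {"name": "C", "email": "", "allows_user_email": True},
    {"name": "D", "email": "d@example.org"},  # preference unknown
]
print([u["name"] for u in contactable(users)])
```

Defaulting an unknown preference to exclusion means that some willing editors may be missed, but nobody who opted out is contacted by accident.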
As always, on this point the traditional privacy policy was way clearer than the recent one. --Nemo 13:37, 27 June 2015 (UTC)[reply]
What Andreas Werle tells us here is a very serious issue which should have instant consequences. See also Chricho's post some points below. However, the research team doesn't seem to care whether they violate Wikimedia privacy policy or not.--Mautpreller (talk) 17:13, 27 June 2015 (UTC)[reply]
Hi Mautpreller. The team has had Legal's approval for this test. You can read more about it in Michelle Paulson's response here. Andreas Werle has not requested his email to be deleted from the WMF databases, from what I understand. He has asked not to be contacted by other users. I've already made a note that we should exclude such users from future communications. --LZia (WMF) (talk) 21:13, 27 June 2015 (UTC)[reply]
Why should it be necessary to ask for deletion? As soon as the e-mail settings are changed, WMF has no right to store the old address any longer. There is obviously something wrong with "user table", not necessarily with you.--Mautpreller (talk) 21:35, 27 June 2015 (UTC)[reply]
LZia (WMF) (or anyone with the appropriate background) Can you explain on what grounds Legal has given their approval to send bulk messages to people who haven't given their explicit consent to this? For me, this is the basic definition of "spam", and this is not allowed. The fact that you have my email in your database has nothing to do with me giving my explicit consent to receiving such email... --NicoV (talk) 20:32, 28 June 2015 (UTC)[reply]
NicoV - The research team did speak to me (senior legal counsel who handles privacy matters) prior to beginning this project to ensure that they complied with the WMF privacy policy. It is my view that this type of use falls within the permissible potential uses for email addresses under the policy. That said, it is a new use and therefore, will and should be the subject of discussion and debate. It is feedback like this that will help us refine email practices to be both effective and reflective of community values. Mpaulson (WMF) (talk) 22:45, 1 July 2015 (UTC)[reply]
Mpaulson (WMF), why do you think every commercial site asks you to check a dedicated box saying that you're willing to receive bulk messages, and does not rely on its privacy policy or terms of sale to allow itself to send bulk messages? Because that's the only way of getting explicit consent for bulk messaging, which is otherwise spam. I repeat: I've never given you my explicit consent for bulk messages. When I filled in my email address in frwiki preferences, this is what I read: "Indiquer votre adresse de courriel est facultatif, mais permet de vous envoyer votre mot de passe si vous l’oubliez. De plus, cela vous permettra — si la section « Courriel » ci-dessous est configurée dans ce sens — de recevoir des courriels des autres utilisateurs sans que votre adresse ne leur soit divulguée et de leur en envoyer (à partir de votre adresse). Vous pourriez aussi choisir de laisser les autres vous contacter sur votre page de discussion utilisateur sans que soit nécessaire de révéler votre identité.": basically, that my email will allow me to retrieve a password if I forget it, and other users to send me email without knowing my address (and only if I check another box to explicitly allow it). Nothing at all is mentioned about bulk messages. There is a similar message on enwiki. Even after reading the privacy policy, nothing is mentioned about bulk messages of this sort. How can you consider that I've given my explicit consent for this? Or do you consider that you don't need explicit consent for bulk messages, meaning that you can spam people? --NicoV (talk) 04:18, 2 July 2015 (UTC)[reply]
I agree with Alice, see #Usage_of_user_database below. I consider this a violation of the Privacy Policy. Yellowcard (talk) 11:38, 28 June 2015 (UTC)[reply]
How many emails have been / are being sent, over what period of time?
On the question of opting out: every email should at the least have an opt-out link in the bottom for people who want to be excluded from future mail of this sort. SJ talk  06:53, 29 June 2015 (UTC)[reply]
SJ, we have sent out 12K emails.
The emails all had an opt-out option in the bottom of the email. There is definitely an issue that since some people were contacted by mistake and they could not read in French, they could not read that line at the end of the email. This is a problem that will be avoided by fixing the editor selection mechanism. --LZia (WMF) (talk) 20:19, 2 July 2015 (UTC)[reply]
Thanks for the stat, LZia. We should probably have guidelines for how to minimize sample-size. For bots, we require that they all work without complaint on 100+ pages before they can run automatically; surveys should probably also be run on a similar sample size before something larger is attempted. 'Research exhaustion' is a real risk, and the benefits from learning something new can be offset by irritation with future learning via the same channel.
I did see the opt-out sentence! It certainly complies with minimal nospam requirements; but a link that just takes a click, and takes you to a page confirming you've been opted out, is even better :) It may be worth getting the UI right once since then it can be used by all research. Warmly, SJ talk  22:15, 2 July 2015 (UTC)[reply]
I agree on the point about research exhaustion. At least three components should be considered when contacting users: desired sample size, time needed to achieve the desired sample size, and fatigue caused by the research. We considered the first two factors when choosing the editor population; we should do more work on the latter. (In case you're curious, we made minor changes in the email text based on the feedback in the first hours from emails sent to 1K users.)
Yes, I'm with you. We need to fix this for consistency and ease of use purposes. I'd like to see a proper solution emerging for this in the next months. --LZia (WMF) (talk) 16:13, 6 July 2015 (UTC)[reply]

Thank you Lzia. I'm glad to see that concerns are taken seriously. As said, even if I personally disagree with legal's interpretation this entire discussion makes me feel confident that the different perspectives are considered in the next steps of your research. Alice Wiegand (talk) 17:58, 6 July 2015 (UTC)[reply]

Usage of user database

Hi! I do not consider the project and the emails themselves a privacy intrusion. However: The mail got sent to an email address which I do not use any longer for Wikimedia projects. This implies that the research team stored a copy of the mwusers table or whatever kind of extract. This is indeed a violation of data privacy. I understand if there are some backups containing old versions of the databases, but it is definitely not necessary that a research team stores account settings. I also quote the privacy policy:

To facilitate their work, we may give some developers limited access to systems that contain your personal information, but only as reasonably necessary for them to develop and contribute to the Wikimedia Sites.

Regards --Chricho (talk) 10:12, 27 June 2015 (UTC)[reply]

Hi Chricho. Thank you for writing to us. We queried the latest version of the user table to get your email address. We did not use any copies of the database, that table is the most updated table I know of. I'm asking for help to figure out what has gone wrong here. We will update you here (this may have to wait until Monday because of the weekend).
Regarding your quote from the privacy policy: All people who have had access to emails are WMF employees (myself and Ellery) or WMF fellow (Bob) with signed NDA (Jure has not signed an NDA since he does not work with any non-public data). We are not providing email addresses to anyone outside of the Foundation. Please let me know if I understood your point incorrectly. --LZia (WMF) (talk) 21:25, 27 June 2015 (UTC)[reply]
Hi Leila, I'm confused as the quote from the Privacy Policy seems very clear: The access to personal information is clearly limited to developers [...] but only as reasonably necessary for them to develop and contribute to the Wikimedia Sites. Without any doubt this personal information access was not only for developing and contributing to Wikimedia Sites, but the sensitive information was given to WMF employees for other purposes than development. I agree with Chricho that this was a violation of the Privacy Policy. Yellowcard (talk) 11:31, 28 June 2015 (UTC)[reply]
I disagree with your interpretation, here; by definition, research should contribute to our development of Wikimedia sites, either directly (by validating software systems that have been built) or indirectly (by increasing our knowledge of how the Wikimedia ecosystem works, allowing us to better develop software systems). The interpretation that the privacy policy does not give employed, NDA-covered researchers access to any kind of sensitive data seems problematic because...what would be the point of having those researchers? Chricho's quote is from the privacy policy, specifically the "to understand and develop" section. Here's the problem.
As the rest of that section makes clear, this covers how we share information outside the organisation. Every section references volunteer, or third-party, developers, and when it does also mentions researchers. A more full quote of the privacy policy is "The open-source software that powers the Wikimedia Sites depends on the contributions of volunteer software developers, who spend time writing and testing code to help it improve and evolve with our users' needs. To facilitate their work, we may give some developers limited access to systems that contain your personal information, but only as reasonably necessary for them to develop and contribute to the Wikimedia Sites." (emphasis mine). This isn't anything to do with paid WMF employees, and is entirely to do with volunteers. Concluding it has anything to do with employees would require you to ignore literally every other sentence in that section. Ironholds (talk) 14:14, 29 June 2015 (UTC)[reply]
Hi Chricho and Yellowcard. Ironholds is correct in his interpretation of this section. The section you quoted is meant to address volunteer software developers, not WMF staff and contractors. It explains under what conditions WMF can share personal information with third parties outside of WMF. Hope that provides some clarification. Mpaulson (WMF) (talk) 20:53, 30 June 2015 (UTC)[reply]
Hi Chricho. This answer explains the reason we have used your old email address. If you would like to discuss the specifics about your email address that cannot be discussed publicly, please email me directly. --LZia (WMF) (talk) 04:11, 30 June 2015 (UTC)[reply]
Hi LZia! Thanks for the information. However, from where did you query the user data? The explanation given in the e-mail seems to be incompatible with my assumption that it has been queried from French or English Wikipedia (since I visit them regularly).
Second question: Would a batch update of all the local user databases be unreasonable? The current situation seems to imply unintended use of addresses, while the user is given the impression that she can change the address globally. --Chricho (talk) 09:53, 1 July 2015 (UTC)[reply]
Hi Chricho. I checked your two emails in enwiki.user and frwiki.user tables and they do not match. I'm not sure why this is happening. I'm asking around and someone will get back to you here. I hope we have your second question answered in that process, too. --LZia (WMF) (talk) 20:48, 1 July 2015 (UTC)[reply]
If I check them in the preferences, they match. --Chricho (talk) 21:14, 1 July 2015 (UTC)[reply]
The explanation was partially incorrect; sorry for the confusion. The local email field is updated when you log in to a wiki, not when you visit it.
Tracked in Phabricator: Task T104500
Unintended use should not be a problem once people get used to having a global user database and using that in their queries. Still, we should not keep private data unless we actually need it. Filed as phab:T104500. --Tgr (WMF) (talk) 21:26, 1 July 2015 (UTC)[reply]

Opt in

I note, above, that the reason the requests were sent by email is due to privacy concerns, as the algorithm utilized the editor's editing history (which is public anyway...). This, of course, somewhat ironically, raised a lot of people's hackles over...privacy concerns. And in general, sending mass email blasts to people who have not opted in to them is spamming and bad netiquette. Hence the reason for the pushback here.

Instead, why not leave a talk page notice, not suggesting any articles yet, but just noting that the project is ongoing, the editor might be a good fit, and would they like to opt in to receive further emails with article suggestions tailored to them? Then, once some people do willingly subscribe, email them to your heart's content (and do give them a means to unsubscribe again). But this was, while I'm sure an innocent mistake made with the best intentions, really not acceptable, and rather than questions about "how can the email be improved", there needs to be some assurance that this is recognized as an error and mass emails to users who have not indicated their willingness to receive them will not happen again. Seraphimblade (talk) 19:39, 27 June 2015 (UTC)[reply]

A message without a direct use is more likely to be spam than a useful message. Better improve the messages. --Nemo 20:08, 27 June 2015 (UTC)[reply]
A message sent to many people who haven't given their explicit consent is spam, that's for me the definition of spam. So, the message we received is pure and simple spam, whatever the intentions behind. --NicoV (talk) 21:27, 27 June 2015 (UTC)[reply]
Your proposed definition of spam is not consistent with industry standard: --Nemo 21:42, 27 June 2015 (UTC)[reply]
Well, for me, it's my definition: unsolicited (I've not given my explicit consent) and bulk (larger collection of messages). What's the difference ? --NicoV (talk) 21:44, 27 June 2015 (UTC)[reply]
Nemo Can you explain the difference between the link you posted and my definition? --NicoV (talk) 21:48, 27 June 2015 (UTC)[reply]
Read the whole page, please. In particular "Technical Definition of Spam". --Nemo 21:52, 27 June 2015 (UTC)[reply]
First, if you read the "definition", spam means only unsolicited and bulk: exactly what happened here...
Second, if you read the "technical definition", which is an attempt to explain the exact definition above in technical terms:
  • A) "the message sent is equally applicable to many other potential recipients": half the recipients had a non personalized list, so complete matching with the definition. Even for the personalized list, it's still a match (otherwise, just adding the email address in the text of the message would mean not spam)
  • B) "the recipient has not verifiably granted deliberate, explicit, and still-revocable permission for it to be sent": exactly the case, or if you think otherwise, show me the proof ("verifiably") that I granted permission
--NicoV (talk) 21:59, 27 June 2015 (UTC)[reply]
Nemo: Mass messages to people who didn't agree to receive them are spam. Period. It doesn't matter if it's you blasting your address book with one of those stupid chain letters or a research team sending these. It's spam. Get opt-in first. Seraphimblade (talk) 23:59, 27 June 2015 (UTC)[reply]
Hi Seraphimblade. Thank you for your feedback and sorry for the delay in getting back to you.
I like what you're recommending for future tests, though I'd like to point out that this may not always be possible. Asking people whether they want to participate in a specific test may introduce selection bias into research results in ways that we may not be able to explain. For example, in our test, if we had gone with your recommendation and still wanted to compare the performance of random versus personalized recommendations, we could have run into a problem: generally, those who opt in to take part in a test can be more willing to contribute to random recommendations even if they feel they are not so good, just because they have opted in to help. This would be a selection bias in our case: because of the less restricted test we did, we now know that random recommendations do much more poorly than personalized ones. The other potential issue is the number of people opting in to participate. For some research, if those numbers are low, the results become unreliable. So a researcher may need a relatively fast way to reach a large number of users in a more controlled (for bias purposes) way. Now, there are editors who do not want to be contacted by email for such purposes and we need to find a way to respect that wish while allowing researchers to run tests. --LZia (WMF) (talk) 06:11, 2 July 2015 (UTC)[reply]
It looks like you're avoiding any solution that is not email: why would there be a bias between random recommendations and personalized recommendations if all you ask people is whether they want to participate in a test? It would be you who decides which group to put people in, after they have told you they are OK to participate. --NicoV (talk) 06:36, 2 July 2015 (UTC)[reply]
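The suggestion above (collect opt-ins first, then let the researchers assign people to the two conditions) can be sketched as follows. This is an illustrative assumption, not the procedure used in the test; the function name, group labels, and seed are hypothetical. Note that it only randomizes assignment among people who opted in; it does not by itself address the separate concern that people who opt in may behave differently from those who do not.

```python
import random


def assign_groups(opted_in_users, seed=0):
    """Randomly split users who have already opted in into two arms:
    one receiving personalized recommendations, one receiving random ones."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(opted_in_users)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"personalized": shuffled[:half], "random": shuffled[half:]}


# Hypothetical list of editors who answered "yes" to a talk page invitation.
groups = assign_groups(["user%d" % i for i in range(10)], seed=42)
print(len(groups["personalized"]), len(groups["random"]))
```

Because the assignment happens after consent and is independent of anything the editor knows, neither arm is told in advance which kind of recommendation it will get.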

Frustration with the ContentTranslation tool

I have just started translating another article (whose topic is, this time, truly uncovered in the French version), but I find it hard to use the ContentTranslation tool, as you cannot (it seems) edit the wikicode (or use a WYSIWYG equivalent). I gave up in the first paragraph, because I couldn’t change the target of a link. I am suddenly wondering whether this is relevant to your research (perhaps I should rather ask the team behind ContentTranslation about my problems, but this is linked anyhow, since this limitation makes me – and so probably others – less likely to help). Regards, Eiku (talk) 23:28, 25 June 2015 (UTC)[reply]

Oh, I forgot to tell you that I find the idea nice, perhaps even brilliant (let’s not be only negative). Eiku (talk) 23:33, 25 June 2015 (UTC)[reply]

Eiku, thanks for the compliment.
You can currently edit the link by clicking the appropriate link in the source text. Very soon it will be possible to add a link to any page. --Amir E. Aharoni (talk) 08:16, 26 June 2015 (UTC)[reply]
Hi Eiku
Did you see this help page about Content translation tool ?
Trizek (WMF) (talk) 09:08, 26 June 2015 (UTC)[reply]
Hi Eiku
Thank you for taking the time to read the email, check out the recommendations, do the translation, and give us feedback. It's greatly appreciated. We are also thankful for your encouraging comments. We are working hard to improve things on our end and it is great to hear your feedback.
As you guessed, and Amir and Trizek indicated, the specifics about the tool go to the ContentTranslation team. They are working around the clock on the tool and hopefully they will get to your comment soon.
A comment regarding this research: the choice of tool is up to you. You do not have to use ContentTranslation tool if that does not work with your workflow yet. You have five recommendations, whether you want to translate them via the tool or otherwise, or create them from scratch is your call.
Thanks again for your feedback and I hope this helps.
LZia (WMF) (talk) 16:08, 26 June 2015 (UTC)[reply]
Hi @Eiku: It's been a while. I was wondering if you would be willing to participate in a user research study related to the article recommendation research. We have built a tool (design in progress) and your feedback will be valuable in understanding the strengths and weaknesses of the tool. If you would like to participate, please send me an email, and please don't feel obligated. Thank you! --LZia (WMF) (talk) 18:05, 5 October 2015 (UTC)[reply]

Mail arriving too early

Hi, thanks for the email I received about "important articles", but don't you think it's really too soon to encourage people to use Content Translation?

The current version of Content Translation still has bugs that produce articles with clearly unwanted syntax. I already reported several problems a few weeks back (T96242, T96467, ...), and most are still present. Just picking up the first content translation in the recent changes on frwiki, I get this awful result: nowiki all over the place, invisible internal links (only nowiki has displayed text).

Don't you think that you should first fix those bugs that create damaged articles before encouraging people by email to use this new tool? Wikipedia is a production wiki, not a test wiki. --NicoV (talk) 11:27, 26 June 2015 (UTC)[reply]

Hi NicoV
Thank you for your feedback. I understand your concerns. Ideally, this test should have been done completely separately from ContentTranslation, to give the tool enough time to get ready for larger-scale use. We initially thought about recommending redlinks to users instead. The issue with that is that we need to translate the article title to French in order to be able to make such redlinks, and that does not scale to the size we are testing. Are there other ways we can send recommendations that we may not have considered? I appreciate your thoughts on it.
Regarding feedback to ContentTranslation tool, it's best to do it on the tool's talk page or as you've already done, via phab tickets. I know that team is working really hard on the tool and hopefully they get to your reported bugs/issues soon.
LZia (WMF) (talk) 15:34, 26 June 2015 (UTC)[reply]
No, I don't think you understand my concern at all: in the last few years, WMF seems to believe that the goal of production wikipedias is to be testing platforms for software (sometimes totally unwanted by editors). Of course, like many other editors, I strongly disagree with this view: the goal of production wikipedias is to create an encyclopedia, not to test software. You're putting your test above this goal, thinking that your research is more important than creating an encyclopedia. Like others, I'd like to say: stop spamming us, and rather request feedback before trying to spam us. Why not give the link to the article in the source language, letting the people who translate the article decide which translation is best?
In France, adding people to mailing lists without their explicit consent is forbidden, and I've never given my consent to such spam. Please stop (and don't tell me to unsubscribe, I've never given my explicit consent to this).
For feedback, as I mentioned in my post below, there's no obvious way to send feedback to the ContentTranslation tool: they don't provide any link in the summary of the edits they are making. So, no, I'm not going to search in the many WMF wikis to try to find where such feedback should be posted. --NicoV (talk) 18:55, 26 June 2015 (UTC)[reply]
And also, could you modify the configuration of Content Translation tool so that there's a link in the tag ? For example, on frwiki, Visual Editor tags have a direct link to fr:Wikipédia:ÉditeurVisuel which helps finding where to submit problems, whereas Content Translation tags are just plain text. --NicoV (talk) 11:49, 26 June 2015 (UTC)[reply]

Mismatch with personal interests

A really random spam looking like phishing

I think I understand the original idea, but what I received was a really random spam. Here is what went wrong:

  1. People do not want to receive automated emails they did not ask for. Emails are not meant for such communications, if you want to spam people, use talk pages: active users read them, and inactive users are inactive anyway.
  2. Sets of languages are not adapted. I wonder why I received a proposal to translate from English to French if neither is my native language (and both of my native languages have ContentTranslation) and I have made fewer than a thousand edits in French Wikipedia. Why wouldn't you suggest that people translate to their home wiki from their native (or second best) language?
  3. Articles are not important for target languages. How can you say that the article en:John Franzese Jr is important for other Wikipedias if this American criminal is not even interesting for English Wikipedia and is definitely not described in any French-language sources? Why wouldn't you start by looking for articles related to the target language, e.g. articles about France or French people in English without French interwikis?
  4. Articles are irrelevant. How can I be interested in a book like en:w:I'd Tell You I Love You, But Then I'd Have to Kill You if I am an adult male and have never written even a line about young adult fiction for girls? (Actually all five articles were completely irrelevant, but this is the most striking example).
  5. Articles for translation must be of a reasonable quality. Stubs or articles with issues like en:John Franzese Jr should not be candidates for translation.

Honestly, at first glance I thought it was phishing, as I thought that WMF just could not send a completely irrelevant email like that. Please consider improving your algos and using talk pages instead of spamming next time. Thanks — NickK (talk) 00:11, 26 June 2015 (UTC)[reply]

I quite agree with all these comments... --H4stings (talk) 08:27, 26 June 2015 (UTC)[reply]

Hey NickK, thank you for your comments, I will try to address each of them in turn.
1. We considered using talk pages but were concerned about the privacy implications for those editors who received recommendations based on the topics in their edit history.
2. Our article selection algorithm tries to predict the number of pageviews the article will get in the target language. On average the model does a decent job. For articles that exist in the source and the target, it can estimate the percentile of the article in the target language based on features of the article in the source language within 2 percentile points. The model, however, does exhibit a bias in that it overestimates the importance of articles containing French and Italian words. Furthermore, the topics expressed in the article (e.g. crime, mafia, FBI) are probably highly weighted by the model. But I agree, it's a poor suggestion.
3. Yes, you received recommendations that were just drawn from a pool of articles estimated to be widely read if they existed in the target language. You did not receive recommendations based on topics you have edited in the past. Hence, the recommendations seem irrelevant to you. I apologize that the wording of the email suggested that they would be based on the topics you have edited in the past instead of the fact that you have made edits in both languages and at least one sizable edit in any of the languages.
4. Instead of removing stubs from the pool of recommendations, we just require the article to be at least 1500 bytes in length. I agree with you that it is probably better to just remove all stubs and have a length threshold.
Thank you for your comments. Getting this sort of feedback really helps to understand the shortcomings of various parts of the pipeline! --Ewulczyn (WMF) (talk) 18:28, 27 June 2015 (UTC)[reply]
Hi Ewulczyn (WMF) and thanks for answers.
I do not think that there are any privacy problems in recommendations: there are already tools suggesting articles based on contribution history, although they usually suggest articles for improvement.
It would make sense to look into WikiProject templates on article pages, especially given that French Wikipedia heavily relies on them. It would be great to filter primarily pages with high and medium importance, as those were assessed by a human as relevant and notable. Please also note that if articles are very focused on local topics (so were mine, as all of them were related to US and Canada), they are less likely to be interesting outside these countries.
If you pick random articles, please state clearly that you pick random articles. My most recent edits in French Wikipedia were about French scientists and about Paris, thus I couldn't believe that suggestions of young-adult fiction or American criminals could be based on my contribution history. My first thought was that it was a kind of phishing (someone got my email and sent me a mail looking like a Wikimedia one), as I knew that Wikimedia algos are usually a bit more accurate than that.
I assume that I could have been happier if I had received some links related to my contributions, so please work on better presentation of this project — NickK (talk) 10:30, 28 June 2015 (UTC)[reply]


I suppose article suggestions are customized to each potential translator. For information, here are the 5 I received by email. I hope no other potential translator received the same.

- Government agencies in Sweden - Shopaholic and Baby - Child labour in the diamond industry - Early 1990s recession in the United States - St. Kazimierz Church

I am probably flagged as a pedophile, that's why wikipedia thinks I'd be competent to translate the baby and child articles, thanks, wiki!  ;)

I just noticed that the links in the email go straight to the translation page, with no possibility to preview the article before accepting the translation conditions and deciding if its content is worth translating. Not so cool.

I also noticed some poor English quality articles in that list. Probably would need to do some source language edits as well.

Hi, Thank you for your feedback, it's appreciated.
Re recommendations themselves: the update under Personalization in Lessons Learned may be of interest to you although I cannot check which group you were in.
Re an intermediate step to see the article: I hear you. This is good feedback for us. It will require some more engineering work but can definitely improve the experience.
Re poor English quality articles: do you have recommendations on how we can filter out such articles automatically?
Thanks again for your feedback. --LZia (WMF) (talk) 17:42, 27 June 2015 (UTC)[reply]
Well, many en.wikipedia articles have probably been rated on their talk pages by WikiProjects; I recommend excluding anything below C-class. --HHill (talk) 18:07, 27 June 2015 (UTC)
Thanks HHill. So, include any article with no rating, and exclude any article rated below C. I made a note of that. It should go into the Lessons Learned section once we implement it in the code. Thank you! --LZia (WMF) (talk) 20:11, 27 June 2015 (UTC)
You will need additional criteria, as other language versions don't have those ratings (either not as comprehensive, or not at all [e.g. de.wikipedia]). And just for clarity: I'd exclude lists too (even featured ones), usually there isn't that much free text to translate anyway and for the rest it should be possible to rely mostly on Wikidata. --HHill (talk) 08:46, 28 June 2015 (UTC)[reply]
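The filtering rules proposed in this thread (keep unrated articles, drop anything rated below C-class, drop lists) could be sketched roughly as follows. This is only an illustration: the `Article` record, its field names, the `RATING_ORDER` list and the handling of unknown ratings are assumptions, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering of English Wikipedia assessment classes, lowest to highest.
RATING_ORDER = ["Stub", "Start", "C", "B", "GA", "A", "FA"]


@dataclass
class Article:
    title: str
    rating: Optional[str]  # None when no WikiProject has rated the article
    is_list: bool = False  # hypothetical flag; lists are excluded even if featured


def is_candidate(article: Article, min_rating: str = "C") -> bool:
    """Keep unrated articles; drop lists and anything rated below min_rating."""
    if article.is_list:
        return False
    if article.rating is None or article.rating not in RATING_ORDER:
        return True  # assumption: an unknown rating is treated like no rating
    return RATING_ORDER.index(article.rating) >= RATING_ORDER.index(min_rating)
```

For wikis without such ratings (de.wikipedia, for example), `rating` would always be `None`, so other criteria would still be needed there.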

Out of scope

Hi. Just to say that none of the five proposed articles matches my interests. For instance, two articles concern India, whereas I have never written anything about this country (at most I may have reverted some edits during RC patrol).

Greetings, --Floflo (talk) 07:38, 27 June 2015 (UTC)[reply]

Thanks for the feedback, Floflo. All users were divided into two groups, one that received recommendations from our algorithm and another that received random recommendations. This setup is called A/B testing and is necessary to know whether the system behaves better than a simpler baseline system. Your recommendations may have been bad because you were part of the "random" group (but it may also be that the system failed to be useful in your case...). Since we haven't evaluated the results yet, we unfortunately cannot tell you at this point which group you were in. --Cervisiarius (talk) 18:14, 27 June 2015 (UTC)
Cervisiarius, many thanks for your answer and clear explanations. Bis bald, --Floflo (talk) 21:37, 27 June 2015 (UTC)[reply]
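For readers unfamiliar with A/B testing, the random split described above can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the group labels and the fixed seed are hypothetical, and the experiment's real assignment procedure is not documented in this thread.

```python
import random
from typing import Dict, List, Sequence


def assign_groups(users: Sequence[str], seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle users reproducibly and split them into two halves."""
    rng = random.Random(seed)  # private generator, so the split is repeatable
    shuffled = list(users)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "personalized": shuffled[:half],  # would get the algorithm's recommendations
        "random": shuffled[half:],        # baseline group: random recommendations
    }
```

Because assignment does not depend on a user's history, a difference in translation rates between the two groups can be attributed to the recommendations themselves.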

Not too bad

Hi all,

Here is some feedback about the email I received.

I wonder how I have been enrolled as a translator. Some say it comes from using "Babel" boxes on one's home page, but I have none of them on my personal page. Maybe you have based it on my past contributions.

Why translations from English to French? I translate probably a lot more from Italian to French, and I have a small percentage from Spanish, German and Russian to French as well. My best guess is that the experiment is always assuming English as the source language.

About the selection of articles: not too bad, indeed. I was offered politics articles to translate, and yes, it's one of my preferred areas, together with cinema and a few others. I am not sure I will do them, though, because they definitely fall into the "boring" category, but they are close enough to the mark to make me at least feel guilty about not doing them :-).

Overall the initiative seems a good thing to me. Some perceived it as spam, so maybe an opt-in or opt-out option is needed (apologies if one exists already).

Best, --Agatino Catarella (talk) 09:03, 27 June 2015 (UTC)[reply]

Thanks for the feedback, Agatino. We're happy you're considering getting involved in some of the recommended translations! To respond to your questions: (1) The method employed for selecting potential contributors is described on the research page of the project; we're aware that this is one of the big weak spots of the current system, and we're working on improving it. (2) English-to-French is simply the language pair we chose for this first test (and it's good we didn't roll this out to more languages at this point, given the number of useful suggestions for improvement that have been coming in from the English-to-French test alone...). Thanks again for the encouraging feedback! --Cervisiarius (talk) 18:05, 27 June 2015 (UTC)


Hello, I agree with the opinions of Indeed and Hadrianus. Here are the "important" articles that were suggested to me:

  • EMC Winton-engined switchers (locomotives)
  • Wallball (a ball game)
  • Conor Clifford (an Irish footballer)
  • Military Courts of the United Kingdom (a military court)
  • Battelle Memorial Institute (a scientific and technological research department)

Now, I mainly contribute on cinema, perfumes, jewellery, romance novels, songs, etc. That tells you just how much I was the ideal candidate for these articles! :) --Guil2027 (talk) 12:53, 27 June 2015 (UTC)

Thank you for the reply, Guil2027. This was a test to evaluate the quality of our recommendation system, so all users were randomly split into two groups before the test, one group receiving the 5 recommendations from our algorithm and the other receiving random "recommendations". The second group is needed to know whether the algorithm does in fact find more useful articles (that is, with a higher percentage of translations started) than a baseline system that picks at random. So it is possible (but not certain) that you were in the random group, which could explain recommendations that seem odd from your point of view. --Cervisiarius (talk) 17:51, 27 June 2015 (UTC)
Unfortunately, why use assertive wording such as "The following five articles exist in the English-language version of Wikipedia and are considered important for the project's other languages." rather than more neutral wording? When I read "We identify important and popular articles using an algorithm. This selection of articles may be a personalized or a random result.", I understood that handing 5 of them to me in particular could be custom or random, but that the pool of articles itself was well built, not that there was a placebo pool and another one of junk articles as well, which volunteers might translate, wasting their goodwill in the process. If this came from anyone other than the WMF, people would cry shameless SPAM (cf. the last sentence before the list, for example). I hope there will be another email to explain, even after the fact, that people were used as guinea pigs. It is a pity to tarnish a good initiative with a rather amateurish execution. I think the fact that the message is in French makes this problem even worse in this case. Regards, Xentyr (talk) 18:36, 27 June 2015 (UTC)
Xentyr, I apologize if I did not express myself clearly. You are right that the entire pool consists of articles estimated to be important. One group received personalized recommendations and the other random recommendations (as the email said), but both groups received recommendations chosen from among the "important" articles (according to the estimation algorithm). I hope this dispels the impression that we "wasted" the participants' goodwill. --Cervisiarius (talk) 19:32, 27 June 2015 (UTC)
Important, are you sure??? Here is my selection: en:2003 Coupe de la Ligue Final, a stub on en and covered in fr:Coupe de la Ligue française de football 2002-2003 on fr; en:Quentin Pereira, an obscure footballer, a stub on en, not eligible on fr; en:Monégasque Football Federation, a stub on en, not eligible on fr; en:Manoka, an unsourced stub about a Cameroonian commune; en:Pichavaram, a mid-importance stub on en. So where are the important untranslated articles in this list? --Remy34 (talk) 12:48, 28 June 2015 (UTC)

In addition to better matching of articles to editors, it is important to have a better measure of importance of the topic. I agree that that sentence in the message seems too strongly worded. Better and more accurate to say "this article is popular in one language, see if you think it is worth creating in another". This would also be a tiny step towards addressing the problems of the translator whose hockey-player article was deleted.

More than half of the articles I have seen people mention have seemed low-importance to me. That's still often knowledge that someone wants in some language, but not necessarily everyone in every language. In particular, while people who become power-users of this tool may want to be able to "tune down" the importance or quality settings to use it with stubs or minor articles, the average editor would like to work on some of the most-needed and most-wanted articles in their subjects or categories of choice. SJ talk  06:50, 29 June 2015 (UTC)[reply]

Unsuitability of translation source

Topics already covered in an article with a slightly larger or narrower scope

Hello, I’ve just received your e-mail. However, I noticed that the subject of the first article you suggest I start on the French-language project (Produce traceability) is already covered (in Traçabilité agroalimentaire), but not linked to the English-language article because its scope is very slightly different, though there is no need for a new article. Is there already a solution to allow such closely-related (and even largely overlapping) articles to be associated with each other, as the interwiki link is already used (Traçabilité agroalimentaire is currently linked to Traceability#food processing)? Thanks. Eiku (talk) 23:15, 25 June 2015 (UTC)[reply]

Well known issue, phabricator:T54564. In this case merging [1] should be ok. --Nemo 09:54, 26 June 2015 (UTC)[reply]
Hi Eiku, This is a great observation, one we currently rely on humans to make! We don't have a method of merging closely related or overlapping articles that have not been associated with each other via Wikidata or interlanguage links. --Ewulczyn (WMF) (talk) 20:54, 26 June 2015 (UTC)[reply]
The invitation email should include a note about this problem, to prevent unnecessary work and the annoyance that arises when the case is later discussed on the redundancy page. --Rainald62 (talk) 22:02, 27 June 2015 (UTC)

Important? really?

Hello, like everyone else here I have a few comments about this new system. I don't see any problem with receiving emails like this (though I understand why some people do). But the email specifies that the proposed articles are considered "important". So I checked the few proposals, and none of them are important. In fact they are mostly stubs or have specific issues in English. I don't see the point of creating them in French (apart from having a new stub). From my point of view the algorithm needs to be improved. Triton (talk) 07:31, 26 June 2015 (UTC)

Quite true, most English Wikipedia topics are unimportant. Probably they should be ranked by number of interwikis at least. --Nemo 09:58, 26 June 2015 (UTC)[reply]
Hi Triton, Hi Nemo
Thank you for taking the time to write feedback. It's greatly appreciated.
Triton, thank you for mentioning your email preference. Emails are definitely a concern for some folks and I completely understand that. I'm committed to finding a solution for those who are concerned. For now, with the opt-out option in the email, they can be sure not to receive any other research emails, but that solution doesn't scale. We need a proper opt-out solution.
"important" is a tricky word. The algorithm considers these articles important. There are a few challenges and things we need to work on on this front:
  • The editors should make the final call on whether an article is important in the destination language. The algorithm predicts that if these pages are created, they will receive anywhere from a non-negligible to a medium/high pageview load, but that's not enough. In the hopefully not-too-distant future, I'd like to see a feedback loop from editors' choices back to the algorithm. In other words, if you think a recommendation is not important in the destination language, we should allow you to specify that, and the algorithm should learn from your feedback.
  • We have a choice of where to put a threshold on the algorithm's importance ranking. We can set the threshold very high and end up with only articles that are predicted to be very important, or set it lower and get more not-so-important articles. They are all important according to the algorithm, but their importance differs. We do not yet have a good handle on where this importance threshold should be, and we are working on it. There is a trade-off here: the larger the set of important articles, the higher the chance that users are recommended articles they would potentially like to edit. The higher the threshold, the worse the personalization part will do, since the set the algorithm can consider for each user is smaller. Also, our algorithm currently uses the predicted number of pageviews the page would receive if it existed as the measure of importance. We know that pageviews alone are not enough, and we want to find a better outcome to predict (a mixture of pageviews and other metrics?). In the absence of a better measure, we should not make the set of important articles too small (that is, we should not raise the threshold too high), since that can result in users receiving recommendations only about pop-culture articles. This is not something we want to systematically encourage, at least not without fully understanding the consequences.
I hope this addresses your comments. Thanks to the two of you for your feedback.
LZia (WMF) (talk) 16:36, 26 June 2015 (UTC)[reply]
Hi, sorry to jump in, but the French text of the email, at least, did not reflect this 'feedback loop' in any way. Like others, I assumed the articles were actually important, and I did a full translation of a company article to try out the new ContentTranslation tool, which was quite frustrating (no way to access the code or translate references, for instance). Yet I had mixed feelings, as the article was very light. So after that attempt, I went to check the other 4 candidates on EN and realized that none of them were really important or even up to EN standards, which means they would be even more obscure in French. And, icing on the cake, as a confirmation of my early doubts, a few minutes after I published my article on the company, it was flagged by one of our bots as an orphan article, meaning nobody on FR had red-linked to it. I would not be surprised if it is deleted in the coming months.
In a nutshell, this is definitely a good idea, and you were spot on in selecting me (though talk pages would be better than emails; also, I have not been very active at all these last months, tbh), yet importance is flawed: it should somehow be related to indicators that come mainly from the targeted Wikipedia. Here only one out of 5 would have some relative importance on FR, and maybe two on EN!
As a side note, since I skim over a lot of different topics, I cannot assess the accuracy of your algorithm regarding areas of interest.
Xentyr (talk) 18:09, 27 June 2015 (UTC)
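The threshold trade-off discussed in this section can be illustrated with a small sketch. The function name and the pageview numbers used with it are hypothetical, and ranking by predicted pageviews alone is exactly the simplification the thread says is insufficient.

```python
from typing import Dict, List


def important_articles(predicted_views: Dict[str, float], threshold: float) -> List[str]:
    """Return titles whose predicted pageviews meet the threshold, highest first."""
    kept = [(views, title) for title, views in predicted_views.items() if views >= threshold]
    # Sort by descending predicted views; ties are broken alphabetically by title.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in kept]
```

Raising `threshold` shrinks the returned list, which leaves the personalization step fewer candidates per user; lowering it admits articles that editors may not regard as important.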

PAN PAN {help!!!} I am confused!

I received a translation request for the page w:en:Prince_William_of_Hesse-Kassel, from English into French. [2]

After a quick look at the content I noted that the information is:

* completely wrong
* not documented (nor referenced)

The best verified content demonstrating this poor quality is here [1]. It is in German, but believe me, you can trust this page and the references it mentions!

What is the best procedure to:

* flag, report or block such a weak page
* force the author to correct it and then complete it with refs
* protect translators from working on garbage

I am motivated and willing to work on articles for translation, but not at that level of 'quality'.

--Cosy-ch (talk) 10:02, 26 June 2015 (UTC)[reply]

Hi Cosy-ch. Thank you for your feedback. It is great that you checked whether the content is worthy of translation. In the future, we want to 1) improve the algorithm to not include such pages, 2) collect feedback from the editors when they see the recommendation to help the algorithm learn more. The human verification step is absolutely critical at this step. We will make it clearer in any future communications.
Can you help us understand what a good filter for not including such pages is? Requiring a minimum number of references comes to mind. What other filters can help? --LZia (WMF) (talk) 17:49, 27 June 2015 (UTC)[reply]

Unsolicited email, unsatisfactory method

Thank you for your email, though I regret that it was unsolicited.

A few comments:

  • The French-language Wikipedia community has tools (projects, article-creation suggestions on the Bistro, etc.) that make it easy for anyone to find places where help is needed.
  • For my part, I build my own list of the articles I would like to create. Sometimes (indeed often) these are articles that exist in no language. After all, even if your algorithm were perfect, it would not pin down my interests as well as I can myself...
  • I think it is not the Foundation's role to steer editorial work.
  • You suggest translating articles from English into French. Should every Wikipedia tend toward the structure and content of the English Wikipedia? For me, the answer is no.
  • The email is intrusive, and I did not provide my email address in order to receive editorial suggestions.
  • An article created by translation offers no solid guarantee of sourcing, since the translator will certainly not go and check that the sources really say what the article says (and does not necessarily have access to those sources).

If a bot running on fr.wikipedia left this suggestion on my user page (with the option to unsubscribe), that would be more positive. But the correct procedure, respectful of the WP:fr community, would be the following: 1. Request authorization to run a bot. 2. The bot leaves a message on each user page (or on the user pages targeted by the bot) asking the user to confirm that they wish to receive these suggestions on their user page and/or by email (in the latter case to avoid privacy problems). This method would be more respectful both of the WP:fr community, which did not ask anyone for anything but should have had its say beforehand, and of individual contributors.

There is no doubt a good idea to explore and implement with this bot, while avoiding the pitfall of making the different language versions uniform. But it has to be done properly!

-- 13:03, 26 June 2015 (UTC)

Hello. Thank you for taking the time to give us your feedback; please excuse me for not replying sooner. Here are some answers to the points you raised:
  • Thank you for mentioning these other tools. Our test took place on French Wikipedia, but our recommendation system works for all languages, and not all of them are as well organized as French Wikipedia. Consequently, the contributors of each project are free to decide whether or not they wish to use our recommendations.
  • That is indeed the case for some contributors, and our goal is not to replace contributors' own wishes with our recommendations. In future we will make sure it is easier to ignore our recommendations if you do not wish to use them. I want to stress that these are recommendations, not requests: the choice to look at them or ignore them is yours.
  • This discussion is important and goes beyond the scope of our research project. I would just like to mention two other Foundation initiatives that have helped users add content: in 2014 the Growth team ran a test offering tasks to users, and in 2014/2015 the WikiGrok team ran a test aimed at adding content to Wikidata.
  • This point is particularly important. Here is what we wrote about the test in our announcement on the Bistro:
    "The goal of this test is to check whether the algorithm can help knowledge be better shared across the languages of different users. The choice of source language (English) and target language (French) for this test is based on the availability of content in that language, the size of the contributor community, and the amount of data we need to check that our research direction is promising. If it is, we will improve the algorithm based on community feedback and the list of items we have already gathered, in order to offer it in the translation tool (ContentTranslation) and/or via the API, and to support other language combinations."
Before extending our recommendation system to other wikis, we would be delighted to generate recommendations that use other languages. Feel free to tell us whether recommendations based on other languages would be useful to you (or to other contributors).
  • I understand this concern, and I am sorry that you did not appreciate receiving this email. The decision to use email was made after weighing the pros and cons of the other methods (the talk page, Echo notifications, or recommendations built into the translation tool, which are not yet available). Our email contains an unsubscribe link if you no longer wish to receive emails related to our research projects in the future; I hope that meets your expectations.
  • Yes, this is something we want to make clearer in upcoming tests: our recommendations are not intended exclusively for translation. The recommendations are based on content present in other languages, but contributors are entirely free to create an article on the subject themselves. The reason we suggested using the translation tool in our email (rather than a red link) is that a red link requires knowing the French title of the page, which we did not have.
Regarding your advice for future tests of this kind: the process you describe seems sound to me, and I particularly like the idea of letting users choose between receiving the message by email or on their talk page. I would just point out that in some cases we cannot discuss a test before it takes place, or cannot ask users to sign up to take part in it, because doing so could invalidate the results. It is therefore essential to put in place a mechanism allowing users to indicate in advance that they do not wish to take part in any test. That said, I fully agree with you that your approach should be the one used first whenever possible.
Thank you again for your detailed feedback. It helps us improve our approach and thereby support the community.
Message written in English by LZia (WMF) (talk) and translated into French by Guillaume. 22:22, 1 July 2015 (UTC)

Article by topic

As I'm a wikignome, the list of articles was quite irrelevant (fr:Mal du siècle, fr:Acousmatic sound, fr:Mécanisme de la physionomie humaine, fr:Philosophy of love, fr:Politician's syllogism). I would be interested in a list by topic, such as math or computer science. Xavier Combelle (talk) 16:17, 26 June 2015 (UTC)

Hey Xavier Combelle, it seems the method picked up on topics outside of math and CS in your recent edit history. Increasing the window of edit history might help address the issue. Also, the predicted affinity you would have for the articles we sent you was rather low. It would be best to set a threshold for predicted affinity and not send recommendations that fall below it. Thank you for your feedback, --Ewulczyn (WMF) (talk) 00:31, 28 June 2015 (UTC)
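The affinity cut-off suggested above might look something like the sketch below. The function name, the default cut-off of 0.3 and the 0-to-1 score scale are assumptions made for illustration; the thread does not say how affinity is actually computed or scaled.

```python
from typing import List, Tuple


def pick_recommendations(
    scored: List[Tuple[str, float]],
    k: int = 5,
    min_affinity: float = 0.3,  # hypothetical cut-off
) -> List[Tuple[str, float]]:
    """Return up to k (title, affinity) pairs at or above min_affinity, best first."""
    eligible = [(title, score) for title, score in scored if score >= min_affinity]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return eligible[:k]
```

An empty result would mean that no message is sent to that user at all, rather than padding the list with low-affinity articles.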


It is truly dismaying to see people at the WF spending time, energy and money to churn out such idiotic things when there are so many useful improvements to be made. I was offered five articles on absolutely minor subjects (certainly neither important nor "popular" [sic!]) belonging to fields in which I have no particular interest or competence. In any case, I am old enough to know which articles I want to create, and I prefer to create them from scratch rather than do a translation. I agree with many of the remarks above, in particular those of Indeed. Hadrianus (talk) 16:29, 26 June 2015 (UTC)


I agree with most of Ælfgar's post, except that of the 5 suggestions I received, 2 were good (en:Service de police de la Ville de Laval, because of geographical proximity, and en:Grand design spiral galaxy, because astronomy is my favorite topic), one was ambiguous (en:Spiral (disambiguation); disambiguation pages are often difficult to link from one language to another, and I think you should avoid suggesting them for translation) and 2 were totally off (en:Olivia (fictional pig) and en:Bradford Thomas Wagner; I have absolutely no interest in these topics).

Have you thought about a collaboration with TOTW?

You can send me emails again to continue this experiment. Simon Villeneuve 17:30, 26 June 2015 (UTC)

Hi Simon Villeneuve. Thank you very much for your feedback. You found a problem with the way we were filtering out disambiguation pages: we were looking only for the disambig template and not for disambiguation, which is the one used on the page recommended to you. We have fixed that in our code. Thank you.
Re the actual recommendations: I'm not sure if you were in the random or personalized group and I can't look into it to avoid impacting the results. I've made a note about your comment. We will look to see what the reason for that is once we conclude the analysis.
Re TOTW: we had not thought about it although we want to work more closely with those who are interested in this project as soon as we know the algorithm works and we can allocate some more resources to it. Thanks for the pointer.
Thank you for offering your time for future contacts. Really appreciated. --LZia (WMF) (talk) 22:50, 26 June 2015 (UTC)[reply]
I'm not sure what [3] means, but you should not be looking at templates at all. We have been using mw:Extension:Disambiguator for a while now, so you can and should use page_props (or the web API). --Nemo 23:04, 26 June 2015 (UTC)
@Ewulczyn (WMF): can you look into what Nemo explained here and let us know how we will be changing the code? This can wait until Monday. --LZia (WMF) (talk) 01:08, 27 June 2015 (UTC)[reply]
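As a rough sketch of the page_props approach suggested above: the web API's `action=query` with `prop=pageprops` can be asked for the `disambiguation` property that the Disambiguator extension sets. The helper names below are hypothetical, and the code only builds the request parameters and parses a response in the `formatversion=2` layout; it makes no network call and is not the project's actual fix.

```python
from typing import Dict, List


def disambiguation_query_params(titles: List[str]) -> Dict[str, str]:
    """Build parameters for an action=query request to a wiki's api.php."""
    return {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "disambiguation",  # only fetch the property we care about
        "titles": "|".join(titles),  # multiple titles are pipe-separated
        "format": "json",
        "formatversion": "2",        # pages come back as a list
    }


def disambiguation_titles(response: dict) -> List[str]:
    """Return titles whose page props include the 'disambiguation' key."""
    pages = response.get("query", {}).get("pages", [])
    return [p["title"] for p in pages if "disambiguation" in p.get("pageprops", {})]
```

Pages returned by `disambiguation_titles` would then be dropped from the recommendation pool, regardless of which template the page happens to use.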


I don't like mail spam like this, but there is something more to it. I think the objective of your research project is not useful but detrimental to Wikipedia. One of the best things about Wikipedia is that all language communities took their own road. I'm very glad that articles on the same subject in different Wikipedias are essentially different in content, structure and presentation. You get diverse perspectives on a subject. Your idea, however, is a step toward uniformity. I don't like that at all and would be glad if you would stop it.--Mautpreller (talk) 18:29, 26 June 2015 (UTC)

Hi Mautpreller, and thanks for your feedback. I agree that diversity is one of Wikipedia's great strengths. In the best of all possible worlds, every language would create all content in a fairly independent manner. However, there is only limited peoplepower to create that content, and sometimes important content is missing in languages for which that content would be relevant. For instance, HIV/AIDS has no article in Ewe or Hausa. We think that having an article about these topics that's translated from another language is better than not having such an article at all. English/French is just our initial test case, but we think the real benefit will be had on smaller language versions, such as Hausa or Ewe, with less content and less peoplepower to increase content. --Cervisiarius (talk) 19:21, 27 June 2015 (UTC)[reply]
I agree that in some cases a translation makes sense. But I strongly disagree that this is a question that could be answered by "research", and even less so by this kind of research. It's simply a question of communities. I'm rather sure that a good article about AIDS in Hausa should be written in another way than in English. And why should the English article be the point of departure for such an article in Hausa? Why not the French, German or Swahili one? Besides, lists of "important articles that should exist in every Wikipedia" do exist, everybody can read them. No use in nudging people.--Mautpreller (talk) 19:43, 27 June 2015 (UTC)[reply]
Mautpreller, I agree on everything (see the other sections on language pair etc. for reasons and remedies) but "No use in nudging people" is incorrect. Several things went a bit wrong here, but letting people know of certain opportunities they're likely interested in is good: the hard question is how to make sure they appreciate what they get and they are not pushed in some biased direction (e.g. anglocentrism, or non-notable topics, etc.). --Nemo 20:04, 27 June 2015 (UTC)[reply]
For Hausa, Arabic or French would be better source languages. The translation tool is language-agnostic: so it can be used by people with any language pair. I agree we don't want uniformity. We do want to improve articles with available free knowledge, including related knowledge in other languages; if an article exists in much more detailed form in a different language, I'd like to know that while editing or browsing. [I would also like somehow to see the set of all references used in any language: to encourage diversity of refs] And nudging can be useful, even fun, when done in a welcome way. SJ talk  06:59, 29 June 2015 (UTC)[reply]
Hm ... if a person whom I esteem as an author asks me to take part or help in an article, that's fine with me. In this case I can understand the motives and, more importantly, I can say this is almost certainly a reliable person with a realistic plan. If someone asks me about a problem or if I see that someone asks a question on the reference desk, I can also say, fine with me, as I can understand the question and its motives. This is not at all the case with an e-mail like that. This is neither useful nor fun to me. About "detailed articles": I am happy to write German articles (there is enough to do for many years), and I always look at what other language versions have to say on a subject. But the job of developing an article concept that really works can hardly be done by me in a foreign language, even though I think I speak English reasonably well and French moderately well. I am afraid that all these translation tools greatly underestimate the skill and labour you need to write a good article, in terms of article concept, linking, cooperation, connotations, and so on. The people who develop these tools are, in my opinion, too preoccupied with data and "information" and do not address the real problem, which is a good text that fulfills its task. This is always hard work that has to be done with care and feeling. I am afraid that a superficially translated text will do more harm than good, as it will often be of poor quality. --Mautpreller (talk) 15:40, 29 June 2015 (UTC)

Not admissible.


A contributor created a page that was suggested to him by an email. Unfortunately this page, which exists on wp.en (en:Brendan Lemieux), does not meet the notability criteria on fr ... This is a pity because:

  1. the contributor (acting in good faith) creates an article he was advised to create
  2. a contributor from projet:Hockey sur glace files a speedy deletion (SI) request on grounds of non-notability (here)
  3. an administrator passes by, deletes the page, and gets yelled at.

--TaraO (talk) 19:02, 26 June 2015 (UTC) (fr:Utilisateur:TaraO)[reply]

This issue is really interesting. It is obvious that the research project people didn't even think about a thing like cultural diversity. There are a lot of differences in how procedures like deletion are handled between the Wikipedia language sites. This fits in with my post above, under "Objectives". I am afraid this is simply a case of cultural insensitivity on the research project side (or even "language imperialism" as said by another user?!). And this in the form of unsolicited spam mail. Not a good idea.--Mautpreller (talk) 19:16, 26 June 2015 (UTC)[reply]
As a service: TaraO just told you that an editor did translate an English article about an ice hockey player into French. However, it was requested for deletion by another user, a participant of the ice hockey project in French Wikipedia. The article was deleted (because in fr.wp, ice hockey players who are not professionals are deemed not relevant). The translator was somewhat annoyed as you will surely understand.--Mautpreller (talk) 19:28, 26 June 2015 (UTC)[reply]
TaraO, Mautpreller: thanks for flagging this. I'll reach out to the translator to explain the situation. My team is well aware of the difficulties of providing recommendations that are relevant to both languages, taking into account cultural diversities and different notability criteria. Any contribution to improve the selection criteria for articles to be translated which we may implement in future tests is very welcome. --Dario (WMF) (talk) 20:07, 2 July 2015 (UTC)[reply]

A good idea? And the form....

  • I find the method inappropriate?
  • Just from reading them, I think I understand why these articles do not exist in French. And what happens if they are not created? Will there have to be a contest? Hire paid staff? Build a bot?
  • I am sure that the Catalans, the Basques, the Bretons and the Alsatians would be delighted if we produced a list of the articles they "should introduce into their wiki space".... I propose that the en wiki translate the "tigre bleu de l'Euphrate" by Laurent Gaudé, which has traumatized many French people and caused a great stir in the media. And that it be sure to include in the Obama article the conduct of the NSA's "big ears" toward the Presidents of the French Republic, and I have many other examples. Regards, Gérald Garitan (talk) 20:32, 26 June 2015 (UTC)[reply]

Here's a quick translation for the project's members:

  • The method is not welcome.
  • Reading those articles helps to understand why they do not exist in WPfr. What happens if not all WPen articles are translated? Should there be a competition? Should we hire paid employees? Or build a robot?
  • Surely Catalans, Bretons, Alsatians speaking users would be enchanted if the French speaking users told them "this article should be added to your WP"... WPen users probably wouldn't understand the present "tigre bleu de l'Euphrate" controversy in France, so why should WPfr import information from WPen that is irrelevant for French speaking users? (I don't translate the part about the NSA, which is a bad example, because the subject is treated in both WPfr and WPen) --Flappiefh (talk) 09:14, 27 June 2015 (UTC)[reply]
Thank you for your opinion, Gérald. It is important for us to hear what users think of our project. Judging whether an article is important for Wikipedia in a given language is a difficult and inherently subjective task: it is hard even for a human, and harder still for an algorithm, so neither a human nor an algorithm can be expected to be perfect at it. Our statistical method for estimating the importance of a non-existent article is based, among other factors, on the content of the article in the languages in which it exists (more details here, in English). We therefore think it would be possible to offer useful recommendations to Catalan, Basque, Breton and Alsatian users (and perhaps, one day, even Gascon ones ;)), and we think that offering recommendations is better than offering none. Furthermore, it is important to stress that this is a recommender system, developed to help users find articles they might be interested in translating. This does not mean that the proposed articles are the most important in any definitive and objective sense (because no such objective sense exists). --Cervisiarius (talk) 18:55, 27 June 2015 (UTC)[reply]

Alternative sets of articles

Just throwing out the idea: if at some point or for some reason it comes in handy to reduce the "universe" of articles on which to run the algorithm and send messages, it would be nice to try one or more of the Mix'n'match sets. Some of those sets may be more appropriate for certain language pairs than others, but they all have some degree of usefulness; the most useful are the completed ones of course, where the non-notable items have been discarded. --Nemo 21:50, 27 June 2015 (UTC)[reply]

Even better: "Not in the other language" (also by Magnus Manske) This tool looks for Wikidata items that have a page in one language but not in the other. And the user can choose a category tree! --Atlasowa (talk) 00:29, 28 June 2015 (UTC)[reply]
If I had a shiny new article for every time Magnus has preemptively solved a problem... What a fine tool, thanks for sharing. SJ talk  06:45, 29 June 2015 (UTC)[reply]
I know that tool and it was already recommended for ContentTranslation (though it's impossible to find with Phabricator's search; one related task is phabricator:T96147), Developing_new_language_editions_of_Wikipedia etc. However, that's a recommender system of its own. What I'm suggesting here is a restriction of the set of articles on which to run this recommender system, as opposed to running it over the whole 5 million articles of the English Wikipedia (most of which are necessarily cruft). --Nemo 08:22, 29 June 2015 (UTC)[reply]
Thanks Nemo and others for your comments. Nemo, to be clear: we did not consider all the missing articles of enwiki for the frwiki test. We chose the top 150K or 300K (Ewulczyn (WMF) can confirm the number). Still, I'm with you that this is a big set and sometimes we may want to consider smaller sets like Mix'n'match. It didn't cross our mind earlier, thanks for pointing it out. One issue we should be aware of is that although not always true, generally speaking the smaller the set of articles the algorithm can choose from, the smaller the affinity of editors for what they are recommended will be (personalization will become more problematic). We can control for this by pre-selecting a smaller subset of editors and/or providing interesting information for editors, for example, the recommended list comes from Mix'n'Match. --LZia (WMF) (talk) 05:40, 2 July 2015 (UTC)[reply]
Thanks. Reaching 150k titles should be feasible with the currently matched items of mix'n'match, but it's easy to improve matching if there is an expectation of usage.
I'd love to help with such a test for the Italian Wikipedia: I'd recommend using DBI, SBN, BEIC, PG and "Women in science"; ideally with pairs such as it-de, it-fr, it-es (or all Romance languages if you want to do many-to-many); DBI and SBN matches can increase by some tens of thousands in a few days if needed, BEIC is already complete. --Nemo 08:14, 2 July 2015 (UTC)[reply]
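
For readers who want to picture the restriction discussed in this thread, here is a minimal sketch, assuming the recommender already produces a ranked list of candidate items. It simply keeps the candidates that also appear in a curated set (for instance, the matched entries of a Mix'n'match catalogue). The function name, the tuple layout, the item identifiers and the scores are all hypothetical placeholders, not data from the experiment.

```python
def restrict_to_curated(ranked, curated_ids):
    """Keep ranked (item_id, score) pairs whose item is in the curated set.

    The original ranking order is preserved; nothing is re-scored.
    """
    curated = set(curated_ids)
    return [(item_id, score) for item_id, score in ranked if item_id in curated]


# Hypothetical output of a recommender: (item id, predicted importance).
ranked_candidates = [("Q100", 0.91), ("Q200", 0.75), ("Q300", 0.40), ("Q400", 0.12)]

# Hypothetical curated set, e.g. items already matched in a catalogue.
curated_set = {"Q200", "Q400", "Q999"}

shortlist = restrict_to_curated(ranked_candidates, curated_set)
print(shortlist)
```

As noted above, a smaller universe leaves fewer items to personalize over, so a filter like this trades personalization for curation.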

Ranking Missing Articles

Initially I came here to complain about spam and to ask why you contact somebody without any knowledge of French, as indicated in particular by not having a French Babel. But I see this is already taken care of.

What I now wonder is why you are programming a heuristic to find missing articles instead of using the pages where people have already identified a huge number of them, in part already with the en-link. In the French WP I believe this would be this page, or more precisely its many subpages: fr:Wikipédia:Articles_à_créer. Please note the interwikis; most languages have such a page.--Fano (talk) 20:51, 30 June 2015 (UTC)[reply]

Hi Fano. Thank you for writing feedback here, and sorry that you were one of the editors who were contacted by mistake. We apologize for that.
Initially, we considered Lists of articles every Wikipedia should have. The issue we faced with that list was that major Wikipedias were only missing around 100 of those articles, and of those 100, we could see that some were not actually missing but were incorrectly linked in Wikidata. Your suggestion is very helpful. I've made a note of that and we will consider those lists for future iterations. There are a couple of potential advantages: since they are curated by editors, notability is more likely to be taken care of. The article title is also translated to the destination language, the lack of which has been one blocker for us in recommending redlinks as opposed to CX tool links. --LZia (WMF) (talk) 05:05, 2 July 2015 (UTC)[reply]

I don't think there are people who make less than 0.05% of their edits in the French-language Wikipedia and who want to translate articles into French!

I speak a little French, but not very well. The only language in which I will contribute articles is German. In the French-language Wikipedia I have only added interwikis and images, and removed images of species that were in an article by mistake.

My total number of edits: 217 409 (100%)[4]
  • 195 245 (89.8%)
  • 20 179 (9.3%)
  • 747 (0.3%): only minor corrections
  • 313 (0.14%)
  • 114 (0.05%): I don't know the language! (only minor corrections)
  • 97 (0.04%): only questions
  • 80 (0.04%): I don't know the language! (only minor corrections)
  • 69 (0.03%): only minor corrections
  • 65 (0.03%): only minor corrections

I don't think there are people who make less than 0.05% of their edits in the French-language Wikipedia and who want to translate articles into French! --Kersti (talk) 05:36, 28 June 2015 (UTC)[reply]

@Kersti Nebelsiek: you are absolutely right, you should not have received an invitation to translate articles into French, and you were contacted by mistake. We explain here the changes we have put in place to avoid this problem. We are sorry for the inconvenience it caused. --Dario (WMF) (talk) 19:54, 2 July 2015 (UTC)[reply]

Medium frequency and Medium wave

  • Medium frequency
  • Medium wave
  • Frequency band of 300 to 3000 kHz
  • Hectometric band

Hello, in France: Medium frequency and Medium wave were merged in March 2008 because the article is the same.
--F1jmm (talk) 15:14, 28 June 2015 (UTC)[reply]

Merger of the articles moyenne fréquence and onde moyenne

Merger of the articles moyenne fréquence and onde moyenne: the article onde moyenne is 5 lines long, with no history; the article moyenne fréquence is complete and already covers medium waves unambiguously--Michco 4 March 2008 at 17:08 (CET)
blanked onde moyenne, a redirect remains to be made--Michco 4 March 2008 at 22:11 (CET) Done.--V°o°xhominis [allô?] 6 March 2008 at 16:01 (CET)

--F1jmm (talk) 15:14, 28 June 2015 (UTC)[reply]

Hi F1jmm. Thank you for your feedback. This one is tricky to identify given that there are two Wikidata items associated with them, and some languages do have two separate articles for these concepts. I'm not sure how we can identify similar cases. Do you have a recommendation? One idea we have discussed is that the dashboard that shows the recommendations should also show the top search results in the destination language for the article title (after the user translates the title). This way, it is easier for the editor to verify if the article already exists, maybe as part of another article, or not.--LZia (WMF) (talk) 05:23, 2 July 2015 (UTC)[reply]

Q1931155 and Q466814

Are Q1931155 and Q466814 the same item?
--F1jmm (talk) 20:26, 28 June 2015 (UTC)[reply]

Hi F1jmm. I'm looking at enwiki and it has two articles on the two topics. That's at least one reason there are two Wikidata items. --LZia (WMF) (talk) 05:12, 2 July 2015 (UTC)[reply]

Selection of users as translators

Selection algorithm

I just received the email, and I have to say that your phrase that «seeing my contribution history you think I'm an excellent candidate for contributing those articles» wasn't realistic. I would like to know how it “chose” me (it's ok to do so privately). Most importantly, note that:

  • Currently I don't have Babel templates for either the source or the target language (maybe you scraped them from Commons?).
  • I haven't “made at least one edit in both the source and target Wikipedias within the last year”

The suggestions weren't bad, though.

Platonides (talk) 23:36, 25 June 2015 (UTC)[reply]

Same applies to me. I do have babel but it's only British English, not French. I do have a French account but I've contributed minimally. GreenReaper (talk) 23:40, 25 June 2015 (UTC)[reply]
Hi Platonides, Hi GreenReaper. Sorry if you have been chosen by mistake. The way the editors were identified for the current test is explained here (We've edited the paragraph slightly since you read it Platonides since there was a problem in the way we explained the process.) We have heard clearly that we should improve the way we identify editors. We will post our current thoughts about how we can improve this to the research page in the next couple of hours. --LZia (WMF) (talk) 21:01, 26 June 2015 (UTC)[reply]
Same for me (including "The suggestions weren't bad"). The updated editor selection algo is still far too greedy. --Rainald62 (talk) 22:14, 27 June 2015 (UTC)[reply]
Hi Rainald62. We're currently considering the use of SUL data and raising the threshold for language proficiency based on Babel templates to 3 or more. We are also running some tests on the data collected so far to find a better threshold. Do you have other recommendations for things we can consider to improve the selection? --LZia (WMF) (talk) 20:09, 2 July 2015 (UTC)[reply]
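
To make the Babel threshold concrete, here is a rough sketch, not the team's actual selection code. It assumes each editor's Babel declarations are already available as a mapping from language code to level, using the usual Babel scale of 0 to 5 plus N for native. The function name, the data layout and the sample editors are hypothetical, and SUL data is not modelled at all.

```python
# Ordering of Babel levels, lowest to highest; "N" (native) ranks above 5.
BABEL_RANK = {"0": 0, "1": 1, "2": 2, "3": 3, "4": 4, "5": 5, "N": 6}


def is_eligible(babel, source, target, min_level="3"):
    """True if the editor declares at least min_level in both languages.

    A language with no declaration, or with an unrecognized level,
    is treated as level 0.
    """
    threshold = BABEL_RANK[min_level]
    return all(
        BABEL_RANK.get(babel.get(lang, "0"), 0) >= threshold
        for lang in (source, target)
    )


# Hypothetical editors and their Babel declarations.
editors = {
    "alice": {"en": "N", "fr": "3"},
    "bob": {"en": "N", "fr": "1"},
    "carol": {"de": "N", "en": "4"},
}

eligible = sorted(name for name, babel in editors.items() if is_eligible(babel, "en", "fr"))
print(eligible)
```

Treating a missing declaration as level 0 is the conservative choice here: an editor with no French Babel box is never contacted for an en-fr task.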

Could I sign up for this, but from enwiki to simplewiki?

I do not know French that well, and I recently got an email stating that my assistance would be helpful in translating English to French. I think it would be much better if I could get alerts translating from enwiki to simplewiki, as I can actually do that. Chess (talk) 02:41, 26 June 2015 (UTC)[reply]

Hi Chess. Thank you for taking the time to write to us. Apologies that you received a message in a language you are not comfortable with. We have fixed this problem. You can read more about the latest lessons learnt here.
Regarding enwiki to simplewiki: that's the language pair on which we did the first test of the algorithm to make sure everything works. In the future, we want to have more language pairs available. Since the algorithm is agnostic to the choice of language pair (as long as there is enough data in both languages), I hope we can have a place where we can expose enwiki to simplewiki and other language pairs not too far in the future. Thanks again for your feedback.--LZia (WMF) (talk) 21:57, 26 June 2015 (UTC)[reply]

Email is OK, French is not …


unlike several others here, I don’t mind being contacted by email – in fact, I prefer it and think it’s the right medium for a project like this.

My problem, however, is that I was contacted in French, a language I don’t speak. I have no idea why I was addressed in French – does this project only target the French Wikipedia?

I’m German, and most of my contributions are to the German Wikipedia, although recently, I have started to contribute to the English Wikipedia, too. I once edited an article in the French Wikipedia where I corrected an incorrect date of death in a biographical article – easy to do without understanding much of the language.

So there seems to be some issue with choosing the correct language.

--Uli Zappe (talk) 03:10, 26 June 2015 (UTC)[reply]

Ditto, except I'm an Australian speaking only English. I've made exactly one non-user page edit on the French Wikipedia. Mark Hurd (talk) 04:15, 26 June 2015 (UTC)[reply]
I'm a photographer, I probably edited French (and English) Wikipedia by including pictures, but I also could never ever edit an article in French. --Ailura (talk) 06:16, 26 June 2015 (UTC)[reply]
I was surprised too being contacted in French and was looking on how to set my preferred language. I don't mind getting emails, but I would expect them in Dutch (my native tongue) or English. Mbch331 (talk) 06:47, 26 June 2015 (UTC)[reply]
Thanks a lot for this feedback. Something went wrong, and we are looking into it!
Sorry about this, Trizek (WMF) (talk) 09:12, 26 June 2015 (UTC)[reply]
There are more German language users who report that they were contacted in French, and were initially wondering whether this is some kind of phishing email due to no name at the bottom and m:Wikimedia Research Team being an archived page. --AKlapper (WMF) (talk) 09:22, 26 June 2015 (UTC)[reply]
Hi AKlapper (WMF). We need to get to the bottom of no-name email. All emails should have a signature that tells the name of the team, the Foundation's address and phone number. If any email is missing that information, we need to figure out exactly why. Can you help me with identifying who has received emails with no name? Thank you! --LZia (WMF) (talk) 21:28, 26 June 2015 (UTC)[reply]
@LZia (WMF): Sure, how can I help? The thread I linked to was initiated by user Pp.paul.4 (talk · contribs) who you might want to contact? --AKlapper (WMF) (talk) 16:23, 29 June 2015 (UTC)[reply]
@AKlapper (WMF):, looking at the thread again, the thread shows an email that is signed by the Research team and the Foundation's contact. It's not a no-name email.
@Pp.paul.4: is there something I'm missing? I understand the email being in French to non-French speakers was a mistake and I apologize for that. However the emails should all have a signature, similar to the one you have copied to the thread.
Hi everyone. Thank you for your feedback and our sincere apologies that you were contacted while you were not a French speaker. The way we identified users for the current test is explained here. Through the feedback we have received we have learned that the approach used needs to be improved. We have a plan for how we improve this step that we will share in the research page in the next 1-2 hours. Please check there to learn more. If you have comments for improvement on the changes we suggest, please share those with us. We would appreciate them.
Again, sorry that the initial method did not work smoothly, and thank you for writing your feedback. --LZia (WMF) (talk) 21:25, 26 June 2015 (UTC)[reply]

Bug with imports

I haven't edited frwiki in the past year (and I don't speak French), but a page I edited on enwiki was imported there. This apparently was enough for me to be listed as having edited it, since I got the email about translating articles for it. Can this be changed? Jackmcbarn (talk) 03:38, 26 June 2015 (UTC)[reply]

Yes, there is a bug with the imported revisions, which are expected to have rev_user = 0. See phabricator:T9240#1393969; I'm not able to check whether the issue already has its own bug report due to the lack of a functioning issue tracker search. Maybe This knows. --Nemo 09:43, 26 June 2015 (UTC)[reply]
I wasn't aware of this issue. Is it WMF cluster specific? I would suggest to file a task, and if we ever find that it has already been filed, there is no harm in merging it. (That is my policy, anyway, until Phabricator gets a proper search tool.) This, that and the other (talk) 10:22, 26 June 2015 (UTC)[reply]
Hi Jackmcbarn. Thank you for writing feedback. It's appreciated. The bug you mentioned is definitely a problem, but it seems that even if it had been solved, you would still have been chosen as a candidate in the test, since you had changed image links in January 2014. This second issue is something we will be fixing in future tests. We have addressed it in lessons learnt. Thanks again for your time and feedback. --LZia (WMF) (talk) 22:36, 26 June 2015 (UTC)[reply]
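
As an illustration of how both issues in this thread could be screened out, here is a minimal sketch. It assumes each edit is already tagged with whether it is an imported revision and whether it only touched image links; how those flags would be derived from the revision tables is not shown and is not claimed here. The `Edit` class and its field names are hypothetical, and the 200-byte default echoes the threshold floated earlier on this page.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    wiki: str                      # e.g. "frwiki"
    bytes_added: int               # size of the contribution
    imported: bool = False         # revision arrived via page import
    image_link_only: bool = False  # edit only changed image links


def has_qualifying_edit(edits, wiki, min_bytes=200):
    """True if at least one edit on `wiki` is a substantial, first-hand edit."""
    return any(
        e.wiki == wiki
        and not e.imported
        and not e.image_link_only
        and e.bytes_added >= min_bytes
        for e in edits
    )


# Hypothetical history: an imported revision and an image-link tweak on
# frwiki, plus a substantial edit on enwiki.
history = [
    Edit("frwiki", 1500, imported=True),
    Edit("frwiki", 40, image_link_only=True),
    Edit("enwiki", 820),
]

fr_ok = has_qualifying_edit(history, "frwiki")
en_ok = has_qualifying_edit(history, "enwiki")
print(fr_ok, en_ok)
```

With this history the editor qualifies for enwiki but not for frwiki, which matches the outcome the thread argues for.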

Translator language pair selection is faulty


Thanks for your email. I'd be happy to help with this project, but you've incorrectly identified me as someone who translates from English to French. In fact I work in the opposite direction, from French to English. It is a common misapprehension that the translator is "perfectly bilingual" (whatever that might mean) and therefore can translate in either direction. But in fact, for serious purposes at least, professional translators generally work only into their mother tongue.

I have come across several articles that exist only in French that might be translated to English, but I must confess I have done nothing about it. I'd welcome suggestions in this FR>EN pair, and a streamlined process for the translation that automates the implementation of all the necessary wiki features. LaFolleCycliste (talk) 08:37, 26 June 2015 (UTC)[reply]

Hi LaFolleCycliste, and thank you for your feedback.
For the moment, the algorithm is only tested from EN to FR for research purposes. Feedback collected will be used to improve it. The objective is to create a relevant and useful algorithm and, of course, to add other languages bidirectionally.
Concerning the tool used to translate (called "Content translation"), you can have more information about it and how it works on this page (or this one in French).
Best, Trizek (WMF) (talk) 09:03, 26 June 2015 (UTC)[reply]
How was the pair selected? Why didn't you try Spanish->French instead, for instance? Among other benefits, that would also have Apertium suggestions, as far as I can understand. --Nemo 09:47, 26 June 2015 (UTC)[reply]
Hi Nemo. The pair was selected based on the amount of content available in the source and destination languages, the ease of eye-balling results for sampled manual QA, and the assumption that the set of editors who speak both French and English is large. We did not consider all pairs possible before making this choice, though FR <-> ES is a good pair, too (except our eye-balling process would be slower and set of possible articles would be more limited). --LZia (WMF) (talk) 22:25, 26 June 2015 (UTC)[reply]
Ok, thanks for the answer. --Nemo 20:22, 27 June 2015 (UTC)[reply]

I don't speak a word of French and only very limited English

So why am I being bothered with such an email? I had to have it translated by Google. Regards --Nightflyer (talk) 18:32, 26 June 2015 (UTC)[reply]

The team apologizes if the email was perceived as a nuisance. Indeed, one of the most important lessons from the user test is that the method for detecting knowledge of French still needs a great deal of improvement. All users who had ever edited frwiki were included in the pool of users who might be contacted, but in hindsight it is very clear that this was not sufficient, since many users edit Wikipedia in languages they do not speak (e.g. to insert images, format text, or update interwiki links). Apologies again, and thank you for the feedback! --Cervisiarius (talk) 17:36, 27 June 2015 (UTC)[reply]


I am a little bit confused about what research means here. Research could be that you want to know something better than before. Some of your statements seem to go in this direction, e.g. the question whether there is a difference between random and personalized suggestions. This could be "pure research". I think that the questions of scale and selection bias are related to this idea. However, other statements seem to point to a different kind of research: applied research. Here these questions simply do not matter. You want to find a way in which more multilingual users can and will write translations. In this case, selection bias is something good (and not bad) since you have to find out which users are willing and able to do this and you have to find out how you best address and support them (and not others who are either not willing or not able). There is a third kind of research which you obviously didn't have in mind: en:action research, research "by doing", blurring the boundaries between researchers and "research subjects". This kind of research would be very suitable to communities like the Wikipedia communities and is indeed often applied in community psychology, but it requires that the "research subjects" have a voice (and, ideally, an equal voice) in the selection of targets, methods, and conduct of research.

However, all kinds of research have their ethical rules. A general rule is that you should know about the research and its scale and targets and methods beforehand, and also about the possible consequences. Action research implies particular preconditions: it demands that the decision how to proceed is not entirely yours but (to a variable degree) also the "research subjects'" decision. This is not only an external condition, thought out by a group of ethics specialists. Rather it is a vital condition in order to achieve a relationship of trust which is indispensable for community research. I am not convinced that you have thought this through.--Mautpreller (talk) 08:44, 2 July 2015 (UTC)[reply]

Mautpreller, next up is Linkspamming Wikipedia as "experimental research evaluation". More links to Justin Bieber, yay! --Atlasowa (talk) 13:52, 2 July 2015 (UTC)[reply]
Wow! Sounds like a lot of action but hardly action research. I have to think about it. Which of my articles could do with a Justin Bieber link?--Mautpreller (talk) 18:45, 2 July 2015 (UTC)[reply]

Notability as a proxy for importance

Based on the French Wikipedia test's feedback, one of the aspects of the research that was identified to be improved is the way we measure the importance of missing articles in the destination language. For that test, the articles were sorted by their predicted pageviews but pageviews do not capture notability though they may be considered as one signal for notability.

What are the characteristics of a notable article? How can we define notability? Ideally, we should start with a simple definition and add more complexity to it as we get a better handle of the measure. --LZia (WMF) (talk) 15:52, 6 July 2015 (UTC)[reply]

  • Notability is only tested upon a deletion request. The fact an article exists doesn't imply the article itself (or the subject in general) is notable; it may just not have been inspected yet. So, you could see how similar an article is to the set of deleted articles, or how big a portion of its related articles (outgoing and incoming links?) were deleted. If you use deletion, there's also the nice advantage that you can speak of "deletion likelihood" instead of "notability"; the claim you can automate the latter would upset a lot of people especially upon the inevitable mistakes. ;) --Nemo 20:09, 6 July 2015 (UTC)[reply]
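
One crude reading of the second suggestion above (how big a portion of an article's related articles were deleted) can be sketched as follows. This is only an illustration of the idea, not a method the project has adopted: it assumes the linked titles (incoming and outgoing) and the set of deleted titles in the destination wiki have already been collected, and the function name and sample titles are hypothetical.

```python
def deleted_neighbour_share(neighbours, deleted):
    """Fraction of an article's distinct linked titles found in the deleted set.

    Returns 0.0 when the article has no linked titles at all.
    """
    linked = set(neighbours)
    if not linked:
        return 0.0
    return len(linked & set(deleted)) / len(linked)


# Hypothetical link neighbourhood of a candidate article and a hypothetical
# set of titles that were deleted in the destination wiki.
neighbours = ["A", "B", "C", "D"]
deleted_titles = {"B", "D", "Z"}

share = deleted_neighbour_share(neighbours, deleted_titles)
print(share)
```

A high share could then be reported as a raised "deletion likelihood" rather than as a verdict on notability, which is the framing suggested above.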

Presentation at WMF Metrics and activities meetings/2015-09

15-minute presentation
  • When: 3 September 2015 starting at 18:00 UTC (11:00 Pacific Daylight Time)
  • Where:
    • Physical location: 5th floor, Wikimedia Foundation office, San Francisco.
    • IRC participation: log can be downloaded here
    • Video recording: YouTube / Commons
WMF Metrics & Activities Meeting September 2015

Apparently WMF considers this experiment on Wikipedians a success. More "personalization" by algos coming to the guinea pigs to Increase article coverage to WMF wishes, no explicit consent needed! --Atlasowa (talk) 12:54, 10 September 2015 (UTC)[reply]

See also "Worked on the backend to provide suggestion to translate. Suggestions feature is soon coming to the Content translation dashboard" and phabricator:T111901. --Nemo 19:16, 10 September 2015 (UTC)[reply]
Hi Atlasowa. I do hope that we can have more of the constructive discussions in this page. Please help me achieve that hope, and please assume good faith. We need to build more trust to be able to move forward in a constructive way.
To your point about "no explicit consent needed!": There was no discussion of whether we will seek consent or not in that presentation. The presentation focused on the research results of the test. I have heard very clearly that some editors did not like to be contacted by email (while some specifically said they did like the email or they didn't mind) and as the person who leads this research I'm committed to making sure those voices are heard, at least for the purposes of the research I'm responsible for, and that I do as much as I can to address the concerns/wishes. There are a couple of specific steps that may be of interest to you:
  1. We are working on a tool that the editors can choose to pull recommendations from. This is not a finished tool, I'm sharing it with you so you can check it out and see that it's real. (We also have started to address the issues raised around language imperialism by offering more language pairs in the tool.)
  2. We did not do further tests with emails since we didn't see them as necessary. This is not something to celebrate, but it's something to mention so we're clear about how these decisions are made. We did not take the question of whether to use email for the frwiki test lightly to begin with (although we made a mistake in the selection process that made things harder for quite a few people, as discussed earlier on this page), and I never had the intention of sending out more emails unless we really had to. To this point, although our goals for the summer quarter included running an experiment in eswiki, I did not call for us to go ahead with that, simply because we got all the results we needed from the frwiki test and there was no reason to contact more editors (via email or not, with consent or not).
  3. As Nemo mentioned, we are working with the ContentTranslation team to have recommendations as part of the CX tool. This is again to make sure that editors can "pull" recommendations when they choose to, as opposed to us "pushing" recommendations to them.
If you have more recommendations for us, please keep them coming here. Thanks. --LZia (WMF) (talk) 18:52, 5 October 2015 (UTC)[reply]
I tried to find recommendations interesting to a librarian, so I entered en→it similar to "National Central Library (Florence)", but I got quite irrelevant suggestions. I suspect the recommender works well only with articles which have hundreds of users: with "Internet Archive" I got better suggestions. With "Benjamin Martin" I get only disambiguation pages. ;-) I can't search several of the most popular BEIC-created articles because either they are available only in French, or are only in Still, a very curious tool! --Nemo 21:46, 5 October 2015 (UTC)[reply]
Thanks for checking it out, Nemo.
  • For "National Central Library (Florence)", what did you expect to see?
  • @Ewulczyn (WMF):, is it intended to show disambiguation pages? I remember we discussed not to include them early on after the frwiki test but you may have used a different logic for including them. Nemo brought up "Benjamin Martin" as the seed and how that results in disambiguation pages.
We do not try to prevent users from using disambiguation pages as seed articles. The Benjamin Martin article is a disambiguation page. The system recommends articles that also look like disambiguation pages but are not marked as disambiguation pages in the page_props table. Ewulczyn (WMF)
  • @Ewulczyn (WMF): Can you comment on Nemo's observation on BEIC-created articles? I /think/ since we changed the algorithm's definition of importance, if an article only exists in one language, it has a lower chance of being recommended, unless, for example, it's being read by many people from different countries? Is that intuitively what's happening here?
If you want to use articles that only exist in frwiki or itwiki, then you need to use those languages as a source. The tool currently does not provide either of these options Ewulczyn (WMF).
--LZia (WMF) (talk) 23:26, 5 October 2015 (UTC)[reply]

What about French users who responded "yes" as translators to Wikimedia Research?

I am a user mainly working for When I received the email Aidez à améliorer l'exhaustivité de Wikipédia en français (26.06.2015), I sent feedback with my choice from the list of 5 recommended articles: Picasso and the Ballets Russes, which seemed to me a very interesting translation to do (28.06.2015). I sent my French translation of this article, under the title Picasso et les Ballets russes, to Wikimedia Research, to Robert West and Leila Zia (31.08.2015). What about the work of the users who said "yes"? Buster Keaton (talk) 18:04, 24 November 2015 (UTC)[reply]