Talk:Privacy policy/Archives/2014

From Meta, a Wikimedia project coordination wiki

NSA, FISC, NSL, FISAAA, PRISM...

The following discussion is closed: Closing the first couple of sections given their staleness and apparent completeness; will archive these sections when the last two are done. Jalexander--WMF 22:23, 18 December 2013 (UTC)

The WMF and many people with access to nonpublic information (such as, for users with accounts, their IP addresses and possibly their email addresses) are subject to the contradictory laws of the USA. The WMF and many people with access to nonpublic information may be required to make such information available to unaccountable agencies while being legally restrained from telling the affected users that the information was shared. Admitting to new information-sharing mechanisms, or even just to the requests, may result in imprisonment without trial, without access to the laws leading to imprisonment, or even to transcripts of the decisions, the evidence, or the identity of the accusers.

Until the WMF and people with access to nonpublic information remove themselves from such jurisdictions, the guarantees in the WMF's privacy policy, the access to nonpublic information policy, the data retention guidelines, the transparency report, and the requests for user information procedure, are untrue.

To service campaign contributors, your information may be given to third parties for marketing purposes.

Your data may be secretly retained by the WMF for as long as required by US agencies, and/or by those agencies themselves for as long as they want.

The WMF may be prevented from revealing their actual policies but forced to claim that they protect users' privacy per their public policies. -- Jeandré, 2013-09-04t12:47z

See also Talk:Privacy policy/Call for input (2013)#Technical and legal coercion aspects.

Hi Jeandré, while I'm someone who knows for a fact that we would strongly rebel against secret requests and unreasonable demands from the government (any government) I'm certainly sympathetic to these concerns (I think much of what the US government has done is illegal and immoral). That said I have yet to see where we could 'go' to remove everyone from jurisdictions where this (or other equally bad issues) would be a problem. Europe, for example, is generally not better, it has significant issues as well. Jalexander (talk) 20:07, 4 September 2013 (UTC)
As far as I know, the voters in New Zealand and Iceland care about doing the right thing, and don't have the same kinds of laws as the USA and UK. -- Jeandré, 2013-09-05t09:27z
European laws are infinitely more protective than American laws. Why do you think the big IT companies (Google, Micro$oft, Apple, etc.) are trying to insist, fortunately without much success (see the few recent cases, for example between Google and the European data protection authorities, the CNILs), that American law should apply at the expense of European law? 78.251.243.204 20:18, 5 September 2013 (UTC)
And in any case it is not only a question of which law is more protective; it is a question of the laws of the different countries having to be respected. Each country is sovereign and establishes its laws democratically; laws that have no legitimacy there should not be imposed on it. Only Americans vote to elect their Congress. American laws therefore apply only to them 78.251.243.204 20:21, 5 September 2013 (UTC)
We are sorry, 78.251.243.204, but we cannot escape the jurisdiction of the United States government at this time. The Foundation is sensitive to concerns about the government's broad ability to access user information. The Foundation firmly believes in users' right to participate in the projects anonymously and believes in protecting users' privacy. For these reasons, the Foundation collects far less user information than other comparable sites, and keeps the information for the shortest period that is consistent with the goals of maintaining, understanding, and improving the Wikimedia sites, and with our obligations under the law. Our "data retention guidelines," which are coming soon, explain these practices in more detail. DRenaud (WMF) (talk) 21:48, 11 December 2013 (UTC)

PRISM etc

Not sure if this is completely on topic, please point me towards the discussion if not, this is not my area of knowledge.

  1. Is the Wikimedia Foundation subject to the same FISA laws that Microsoft, Google etc have had to comply with and give over information?
  2. If so does the Wikimedia Foundation record anything they may want?
  3. If so this privacy policy will need to reflect this.

--Mrjohncummings (talk) 16:06, 4 September 2013 (UTC)

The WMF has been very clear that we have not been contacted in relation to that. General Counsel Geoff Brigham said in a blog post that "The Wikimedia Foundation has not received requests or legal orders to participate in PRISM, to comply with the Foreign Intelligence Surveillance Act (FISA), or to participate in or facilitate any secret intelligence surveillance program. We also have not “changed” our systems to make government surveillance easier, as the New York Times has claimed is the case for some service providers." Philippe (WMF) (talk) 20:58, 4 September 2013 (UTC)
Just to add to what Philippe has said: it is our understanding of the law that we cannot be forced to 'lie' (though they can force us not to comment or confirm, including while we fight for it to be released). While I can certainly understand people's concerns about "them not even being able to tell us if it's true", I really do stress that we haven't received anything and would fight like crazy if we did. Also, we're really really bad liars, we are an incredibly leaky organization. Jalexander (talk) 08:03, 5 September 2013 (UTC)
This may be a crackpot idea, but given that you cannot be forced to lie, but can be forced to keep quiet, would it be possible for somebody - perhaps in the legal department - to report on a regular basis in a regular spot that "We haven't been contacted by the US Gov't this week to provide any information on users"? Smallbones (talk) 01:05, 7 September 2013 (UTC)
"Also, we're really really bad liars, we are an incredibly leaky organization." I assume that you're joking, but if you're not, why have a privacy policy at all? (Not joking.) -- Gyrofrog (talk) 03:59, 8 September 2013 (UTC)
Given the choice between believing Microsoft/Google/Facebook/US.gov or Snowden, I'd go with Snowden every time. I think the current evidence shows that the people at Google are lying by commission because they're being forced to. While I have orders of magnitude more trust in the people at the WMF than those at Google, I think Ladar Levison's decision to shut down Lavabit and his strong recommendation against trusting organizations "with physical ties to the United States" indicates that he didn't want to lie by commission. -- Jeandré, 2013-09-05t09:27z
Appreciate the discussion. Smallbones (talk · contribs)' suggestion is that we implement what is actually the well-known Warrant canary scheme. Part of Jeandré du Toit (talk · contribs)'s excellent point is that it seems like either Google or Snowden is lying, and that if Google is lying, warrant canaries don't seem to work against the full might of the US Government. Was Lavabit publishing a warrant canary? More importantly, should the WMF be doing so on a more regular basis? (The comments from Philippe & Jalexander are great for today, but not regularly made.) --Elvey (talk) 22:21, 8 September 2013 (UTC)
Even if the 2013-06-08 released slide is wrong, and organizations are not currently forced to lie by commission, but only by omission; then a warrant canary still wouldn't help if a WMF developer is asked to contravene the privacy policy (and/or the access to nonpublic information policy, the data retention guidelines, the transparency report, the requests for user information procedure) and forced not to tell the people who provide the warrant.
Every possible person with the ability to contravene these policies and who is subject to US law, would then have to provide daily warranties. I'm not actually suggesting this, because I think the 2013-06-08 released slide is correct, and organizations like Google are being forced to lie by commission.
Until this is clarified, I don't think any privacy policy from any organization "with physical ties to the United States" can be truthful unless it clearly states that it can't currently protect anyone's privacy if the powers that be come knocking. -- Jeandré, 2013-09-23t10:09z
Is it possible for anyone to verify exactly what software the WMF's servers are running and how the software is configured? It is trivial to download Mediawiki and various extensions, but is it possible for anyone to verify that the version of Mediawiki as run by the WMF isn't modified to provide information to the NSA? --Stefan2 (talk) 12:57, 5 September 2013 (UTC)
We are very transparent about our servers, how they are configured, and what they run. For example, you can see our production code and deployment recipes on Gerrit and piles of additional information on Wikitech. So I don’t think we object to transparency like that in principle. But verification that source code matches specific binaries is an extremely difficult challenge, even under relatively small and controlled circumstances where you can control every part of the build, and where you’re simply asking about a binary at one point in time, rather than on a live, running system. To do the same thing for an entire network infrastructure (not just Mediawiki, but the web server, operating system, network switches, etc.) would be effectively impossible, both in terms of difficulty and in terms of making it secure (since it would require trusted access to the live system in order to perform monitoring). Even if it were achievable, it would also make management difficult in practice: for example, we sometimes have security patches deployed that are not yet public (for legitimate, genuine security reasons), and we also have to be able to change configurations quickly and fluidly in response to changes in traffic, performance, etc., and doing this would be difficult if configurations and binaries had to be checksummed, compared, verified, etc. - LVilla (WMF) (talk) 02:05, 6 September 2013 (UTC)
Given everything that's happened, I'm not so sure I trust anyone anymore about what is and isn't watched/kept. I now assume everything is being watched/recorded/analyzed online. You can only hide in the bushes for so long, eventually you'll want to come out and play (online), so I guess you suck it up and move on. Government never tells you about it, one guy leaks it, then they move to make it more transparent and do the about-face. Makes you wonder what else they're hiding, and it's sad that they have to hide it from us... 99.251.24.168 02:35, 6 September 2013 (UTC)
I understand why you are finding it hard to trust anyone, and I am glad that Stefan2 was trying to be creative about ways to increase trust. I just don't think this particular idea solves the problem. If it helps, we're trying to work on this issue; most notably right now by pushing the US government to allow more transparency from targets of national security letters. Suggestions on how else we can do that are welcome. - LVilla (WMF) (talk) 17:09, 6 September 2013 (UTC)
Of course it would be a bad idea to give anyone unlimited read access to the live servers. For example, it would allow anyone to extract any information from any database table, including information normally only available to checkusers and oversighters. Thanks, your reply sounds reassuring. --Stefan2 (talk) 19:13, 6 September 2013 (UTC)
Although I do not have any questions at this time concerning this, I wanted to thank you for addressing it in advance as it would have come to mind as I do live in the United States. Koi Sekirei (talk) 00:50, 8 September 2013 (UTC)
Prisms may still be used for disco parties. — The preceding unsigned comment was added by 180.216.68.185 (talk) 14:29, 11 Sep 2013 (UTC)

I'm probably about to be dismissed as a nut case, but I would favor simply having Wikipedia programmed to automatically post any government requests in an appropriately titled article. 24.168.74.11 19:33, 12 September 2013 (UTC)

It's not nutty to want more transparency on this issue, but it's impossible to do this in an actual, automated fashion, and not clear that a semi-automated process is legal. We will be pushing shortly an overall transparency report, and we plan to do that regularly in the future. Hopefully that resolves some of the concerns. -LVilla (WMF) (talk) 16:29, 30 September 2013 (UTC)

Subject to US law

I think we should expand the section on the data being kept in the USA, and therefore subject to American laws. The PATRIOT Act comes to mind, where they can and will use any data you store in the US at any point in time against you at a later date. Doesn't matter where you live. So you might not want to post that nasty anti-American rant on a talk page, it might come back to bite you in the choo-choo later... Or the DMCA. I think of a certain Russian computer scientist who could have been arrested had he come to the US to give a speech, as he posted information on anti-circumvention measures (Dmitry Sklyarov) ... Oaktree b (talk) 22:09, 4 September 2013 (UTC)

While some of this may be true (though there are lots of laws in Europe and other countries which can be problematic with what you post too and which the US allows), I'm not sure I understand your example. There is very little (if any) added risk to posting your anti-American rant on the talk page on an American server. There are certainly risks, but the PATRIOT Act does not necessarily make it more risky (especially given the legal system and our desire to fight against demands) than many other location options. Jalexander (talk) 00:29, 5 September 2013 (UTC)

This section concerns me as well as worries me. "to comply with the law, or to protect you and others" I think most of us are aware that our freedom in all areas is slowly but steadily eroding. In many countries, there is not even a pretense at giving freedom priority over other values, while in many others it is only a pretense. I wonder if there is a country left in the world that has not put that value at the bottom of a list of many other values like security and equality. Politicians and lawyers can and will find a way to abuse that which they can abuse for their own purposes. Laws were made to facilitate the sending of millions of people into concentration camps, why should they stop at keeping knowledge sacred? "to comply with the law, or to protect you and others" That is a mightily large back door.

Well I live in Canada, and even if I do my edits in Canada, should I do something distasteful to the Americans, they can hold me at the border for some stupid reason. We also have data privacy laws here in Canada (PIPEDA), but those don't apply to Canadian data stored on American servers. My point is you're essentially at their mercy, whether you like it or not. Just so people are made to understand that. You live in country XYZ, but American law applies to your edits and any data you divulge, so beware. 99.251.24.168 02:09, 6 September 2013 (UTC)
That is partly but not completely true, I think. There is a long-standing myth that the law of the country where the servers are located is the one that applies. The case law is not yet settled, but for now that is false. Since the servers are located in the US, American laws apply in part. But since the producers and consumers of content are in other countries, other laws may apply. For example, for the French-language Wikipedia, since a large share of the producers and consumers of content are in other countries such as France, Canada, Belgium, etc., it is very likely that some of the laws of those countries apply. For example, a company whose headquarters and servers are located in Luxembourg was ordered to apply French law; Twitter was sued for not applying French laws relating to freedom of expression, but the case did not go to trial because Twitter preferred to reach an agreement with the civil parties; Google is being pursued by the various European data protection authorities (CNILs) for non-compliance with European personal data protection laws, which are stricter than American laws. In these two cases, Twitter and Google claim that they only have to apply American laws, but this is strongly contested, and it is doubtful that the courts will side with them. It would be very convenient for multinational companies, but what a loss of sovereignty for the citizens and countries concerned! I don't believe it at all 78.251.253.2 11:18, 6 September 2013 (UTC)
Thanks for your comment. Please see my response to a related discussion here. YWelinder (WMF) (talk) 19:42, 7 September 2013 (UTC)

Legal response

Thanks for raising this question. I’ll tackle it in two parts:

First, generally: as we say in more detail in the policy’s section on our legal obligations, we must comply with applicable law, but we will fight government requests when that is possible and appropriate. For example, unlike some websites, we already are pretty aggressive about not complying with subpoenas that are not legally enforceable. (We’ll have precise numbers on that in a transparency report soon.) We’d love to hear specific feedback on how we can improve that section, such as additional grounds that we should consider when fighting subpoenas.

In addition, we are currently working on a document that will explain our policy and procedure for subpoenas and other court orders concerning private data. We will publish the document publicly, follow it when responding to requests, and also provide it to law enforcement so that they know about our unusually strict policy on protecting user data.

Second, with regards to surveillance programs like PRISM and FISA court orders: We are subject to US law, including FISA. However, as we have previously publicly stated, we have not received any FISA orders, and we have not participated in or facilitated any government surveillance programs. In the unlikely instance that we ever receive an order, we are making plans to oppose it.

Beyond the legal realm, we continue to evaluate and pursue appropriate public advocacy options to oppose government surveillance when it is inconsistent with our mission. For example, the Wikimedia Foundation signed a letter with the Center for Democracy and Technology requesting transparency and accountability for PRISM. If you are interested in proposing or engaging in advocacy on this issue, please consider joining the advocacy advisory group. We also continue to implement technical measures that improve user privacy and make surveillance more difficult. For example, we enabled HTTPS on Wikimedia sites by default for logged in users. For more information, see our HTTPS roadmap.

As always, we greatly appreciate your input on this complex issue. Please note that if you have questions that are specific to surveillance, and not tied to the privacy policy itself, the best place to discuss those is the PRISM talk page on Meta, not here.

Best, Stephen LaPorte (WMF) (talk) 00:03, 6 September 2013 (UTC)

The question is not how to resist as best we can the application of laws we disagree with: the laws are there, they were passed democratically, we must apply them, full stop. We must not engage in politics! Let's instead get on with writing the encyclopedia, and apply the laws when they apply, whatever country they come from 78.251.253.2 11:38, 6 September 2013 (UTC)
We must not engage in politics? That is a position I find hard to understand, for the following reason: what is the point of contributing to an encyclopedia if it too becomes an instrument of repression? On the contrary, I am convinced that history teaches us that we must resist unjust laws as best we can ... although one can speak of democratic votes in the case of the laws in question, I dispute that interpretation (on the surface, they were -- modulo disinformation, corruption/lobbying, pressure from the secret services ...); they were enacted by an electorate that is largely illiterate in matters of technology, and therefore subject to all kinds of manipulation -- the opinions of independent experts no longer count for anything. It is fear that governs pre-(techno)fascist society, not reason.
Summary: I strongly oppose unquestioning compliance with unjust laws, passed democratically or not. We can not abstain from being political in this matter because otherwise what we do becomes part of the unjust system. Ɯ (talk) 10:51, 10 September 2013 (UTC)

Location of the servers in the United States and applicable law

The following discussion is closed: closing this section as well since it looks finished up for now. Will archive after the full set (including the last, still open section below) is done.

The explanations state that the servers are located in the United States and that we must accept that American personal data protection law applies, even if it is less protective than ours, and that otherwise we must not use Wikipedia. Does that mean we should clear off right now? In any case, I don't believe that is legal. Since the French-language Wikipedia largely concerns French people (as well as Quebecers, Belgians, Africans, Swiss, etc.), I think the jurisdictions of the audiences concerned have a say, and that their laws must apply. The case law is not yet well established, but some court decisions have already gone in this direction. In any case, personally, I do not at all agree to give my consent to American law being the one that applies. Far too dangerous! American law is not protective enough! Not to mention all those liberty-destroying laws passed in the wake of the September 11 attacks, without much of a counterweight to oversee their implementation! 78.251.246.17 22:55, 4 September 2013 (UTC)

Why are you talking only about the French-language Wikipedia? There are several hundred projects in lots of languages, whose countries could also have a say. Put plainly, the Foundation cannot follow every law in the world and therefore stops at that of its own country. Elfix 07:47, 5 September 2013 (UTC)
The problem is that we have several hundred projects in lots of languages, but also several hundred countries which, whether you like it or not, are sovereign, have their own laws, and have the right to have their own laws. That is a fact, like it or not. And the question is not whether the Foundation can follow every law in the world; the point is that it MUST follow the laws of the world, because its activities do not stop at the borders of its country but extend across the whole world. Not only MUST it follow the laws of the countries to which its activities extend, but for a country like France or any European country, whose laws are far more protective of citizens' privacy than American law, this is even highly desirable. That is why this clause is bad. If the excuse the Foundation gives for having to adopt American law, even if it is less protective than that of our country, is that the servers are in the United States, then let's bring the servers back to Europe. In any case it is the most protective laws that we must respect, because if we respect the most protective laws, then we respect all laws, including American laws or those of any country 78.251.243.204 18:26, 5 September 2013 (UTC)
I made the point in English above, but it is the same: any information you submit to the English/French/German etc. Wikipedia is kept in the USA, so your local law probably does not apply. In Canada, for example, we have LPRPDE (PIPEDA in English) for the protection of electronic data and documents; any information that is not on a Canadian computer is not protected. So if, for one reason or another, Obama or the American government decides to dig through your information, too bad! All local protection stops at the border. You only have to look at the case of Edward Snowden or Julian Assange; they can very easily make your life very difficult if they decide you are the enemy of the USA... Watch out. Caveat emptor. 99.251.24.168 02:24, 6 September 2013 (UTC)
Hello 99.251.24.168 and thank you for your reply :-) I have also replied above. I think, on the contrary, that the laws of sovereign countries are very likely to apply. But in the case you describe of Canadian data kept on American servers, American laws ALSO apply, and that is quite normal: the US is a sovereign country, like Canada. In matters of this kind, which involve several countries, the applicable law is always a compromise between the different bodies of law concerned. Do not think that only the laws of the country hosting the servers apply. Cave canem! ;-) 78.251.253.2 11:47, 6 September 2013 (UTC)

Thank you for your comments and my apologies for responding in English. Jurisdiction is a complex issue that is determined based on a case-by-case analysis. Generally, we apply U.S. law, but we are sensitive to European data protection laws. For example, a version of this privacy policy was reviewed by a privacy counsel in Europe to ensure consistency with general principles of data protection.

The important issue for our users' data is our commitment to privacy rather than the general privacy law in the country where the Wikimedia Foundation is based. Our privacy policy generally limits the data collection and use to what is necessary to provide and improve the Wikimedia projects. For example, we commit to never selling user data or using it to sell them products. In other words, the commitments we make in this policy go beyond commitments made by many online sites, including those based in Europe. And we encourage users to focus on and provide feedback about those commitments because the commitments are ultimately what matters for their privacy on the Wikimedia sites. YWelinder (WMF) (talk) 19:36, 7 September 2013 (UTC)

Granted, more than knowing whether the legislation of this or that country applies, it is the details of Wikimedia's rules or policy for the protection of personal data that matter to us. However, legislation (American, European) provides common, practical points of reference that offer a reassuring basis, because it is not completely unknown to us. In that spirit, and to help us better grasp the privacy policy, would it be possible for a competent person to give us a summary of what differs between this policy and American or European legislation? How does the policy stand in relation to that legislation? 85.170.120.230 10:43, 8 September 2013 (UTC)
Hi 85! There is currently no significant body of federal online privacy law in the United States. There are, however, some specific federal and state-by-state laws that mostly have to do with the treatment and disclosure of particularly sensitive materials, such as medical and criminal records. These kinds of privacy laws, for the most part, do not apply to us, as we do not collect such types of sensitive information.
We are, of our own volition, doing as much as we are capable of to meet the expectations of community members domestically and abroad, well above and beyond what United States law requires of us.
Regarding user information that we may be required to furnish pursuant to formal legal process, you will find this issue addressed here. For more information on how United States privacy law compares with privacy laws in Europe, you may wish to consult the writings of Professor Paul M. Schwartz. DRenaud (WMF) (talk) 23:30, 18 December 2013 (UTC)
Thanks for the link: it was an interesting read, though more focused on the dynamics of politics (with some scattered mention of the last EU directive drafts). The report by professor Chris Hoofnagle linked above, while definitely more boring, has a more "usable" list of differences and issues. --Nemo 12:49, 19 December 2013 (UTC)
The following discussion is closed: Closed since it looks like most of the discussion should now go to the active Safe Harbor discussion that Luis responded to. I'll leave for a bit before archiving unless it's reopened. Jalexander--WMF 04:53, 7 January 2014 (UTC)

Location of the servers in the United States and applicable law (continued)

I request the removal of the paragraph "Where is the Foundation and what does this mean for me?" 78.251.243.204 19:05, 5 September 2013 (UTC)

My apologies for the response in English. If someone would be so kind as to translate this into French, I would be much obliged. Are there any particular reasons that you are requesting removal of that section? Is there any specific language that concerns you? If so, please specify. Mpaulson (WMF) (talk) 22:23, 5 September 2013 (UTC)
Translation into French of the message above: "Excuse me for replying in English. If someone would be kind enough to translate my message into French, I would be grateful. Are there particular reasons why you are requesting the removal of this section? Is there a specific language that concerns you? If so, please specify." Jules78120 (talk) 22:37, 5 September 2013 (UTC)
Thank you Mpaulson for your reply (and thank you to Jules78120 for the kind translation :-) ). The particular reasons that lead me to request the removal of this section are the same as those already set out above in the section on the location of the servers in the United States and applicable law, and in several other sections such as NSA, FISC, NSL, FISAAA, PRISM... I am just taking the liberty of being a little more insistent in my request, with your permission :-) 78.251.243.204 00:54, 6 September 2013 (UTC)
So, while we as an organization and I personally have some sizable objections to PRISM and many of the actions taken by the US government recently with regards to privacy, removing this section will not actually change the applicability of US law. The Foundation is located in the US, meaning that using our sites leads to the transfer of data to the US, and thus is subject to US law. Mpaulson (WMF) (talk) 01:09, 6 September 2013 (UTC)
Of course the servers are located in the EU [Etats-Unis, i.e. the United States] and US law applies (on that note, perhaps we should think about moving the servers back out of the EU!). However, I do not agree with the sentence "You also consent to the transfer of your information by us from the United States to other countries, which may have different or less stringent data protection laws than your country, in connection with providing services to you." I do not agree to my data being sent just anywhere, including to companies located in countries whose laws would allow anyone to do anything with it. If our data is transferred, it must only be with the guarantee that it will be protected at least as well as in our own country, or in any case at least as well as in the EU. Whatever company or country our data is transferred to, we must make sure that the Privacy Policy is guaranteed. Otherwise, we do not transfer. In my view the Policy does not establish this point clearly enough (for example, the paragraphs If the Organization is Transferred (Really Unlikely!) and To Our Service Providers are, in my opinion, not precise enough) 78.251.253.2 12:36, 6 September 2013 (UTC)
P.S.: "EU" in French = Etats-Unis = United States = "US" in English; I apologize, I should have written Etats-Unis in full :-) 85.170.120.230 01:51, 7 September 2013 (UTC)
Unfortunately, US privacy law is still very much developing and the EU considers the US to have less stringent data protection laws than the US. So using a Wikimedia Site means that, if you are a resident of Europe, your data is being transferred to a country with less stringent data protection laws than your country. There isn't really a way for you to use the Wikimedia Sites without consenting to that kind of transfer, unfortunately. But differences in privacy regimes aside, the Wikimedia Foundation seeks to put into place contractual and technological protections with third parties (no matter what country they may be located in) if they are to receive nonpublic user information, to help ensure that their practices meet the standards of the Wikimedia Foundation's privacy policy. Mpaulson (WMF) (talk) 18:59, 6 September 2013 (UTC)
This is not quite correct. If I visit google.com from Italy, I'm asked whether I want to accept a cookie or not, though in USA you are not. Moreover, Google managers were held criminally liable for privacy violation in a meritless case which however ruled that «the jurisdiction of the Italian Courts applies [...] regardless of where the Google servers with the uploaded content are located».[1] --Nemo 19:26, 6 September 2013 (UTC)
What does this mean: "the EU considers the US to have less stringent data protection laws than the US"? PiRSquared17 (talk) 19:27, 6 September 2013 (UTC)
«Special precautions need to be taken when personal data is transferred to countries outside the EEA that do not provide EU-standard data protection.»[2] «The Commission has so far recognized [...] the US Department of Commerce's Safe harbor Privacy Principles, and the transfer of Air Passenger Name Record to the United States' Bureau of Customs and Border Protection as providing adequate protection.»[3] «In many respects, the US is a data haven in comparison to international standards. Increasing globalization of US business, evidenced by the Safe Harbor agreement, is driving more thinking about data protection in other countries. Still, political and economic forces make a European style data protection law of general applicability highly unlikely in the near future».[4] WMF is also not in [5], FWIW. --Nemo 19:46, 6 September 2013 (UTC)
Note that we cannot be in the Safe Harbor program, because the Federal Trade Commission does not have jurisdiction over non-profit organizations. (See "Eligibility for Self-Certification" on the Safe Harbor page.) We would likely join if we could. -LVilla (WMF) (talk) 22:47, 17 September 2013 (UTC)
Interesting. I was merely answering PiRSquared17's question, but if the WMF would like to join the self-certification program if only it were possible, why not adhere to those obligations in the policy? It won't trigger the legal obligations (and advantages), but WMF is free to voluntarily hold itself to higher standards. --Nemo 14:13, 27 September 2013 (UTC)
Indeed. This is another example of a response we have seen elsewhere on this page, where WMF has argued that as a non-profit it is not required to adhere to certain privacy-related standards. It would of course be possible to adhere to those standards voluntarily, and I think there should be an explicit statement of what consideration, if any, has been given to such voluntary adherence. Spectral sequence (talk) 17:15, 27 September 2013 (UTC)
@Mpaulson: I have the impression that you misunderstood my abbreviation EU, which stood for Etats-Unis (d'Amérique), i.e. the United States. Sorry. That said, even though US laws are indeed often considered less protective of personal data than European laws, Wikimedia's Privacy Policy can perfectly well guarantee a level of protection higher than that of US law. Guaranteeing a level of protection lower than US law would not be legal, but guaranteeing a level of protection higher than US law, and even higher than European or other laws, is entirely possible and compatible with US law. It is enough to adopt rules at least as protective as the various national laws (a greatest common denominator of the various laws, so to speak). I do not see what prevents us from doing so. And of course all service providers must then commit to respecting this level of protection (as already stipulated in the paragraph To Our Service Providers) 85.170.120.230 02:22, 7 September 2013 (UTC)
For the sake of better understanding, could someone competent explain to us how this Privacy Policy differs from European law? In what ways would it be less protective than European law? An explanation like the one given above in the section What is changing? would be very interesting! 85.170.120.230 02:32, 7 September 2013 (UTC)
In particular, as Nemo mentioned, where does the WMF stand with respect to the Safe Harbor legal framework? 85.170.120.230 12:10, 8 September 2013 (UTC)
Hi Anonymous. Without going into exhaustive detail, the United States as a whole largely has no explicit privacy framework. The Safe Harbor framework is not so much a United States privacy framework as a system where organizations in the United States can agree to maintain minimum levels of protection similar to that provided in the European Union. This is a particularly helpful system for large companies that tend to have a big physical presence in Europe (and therefore are definitely subject to European laws) and have the need to send massive amounts of personal information between the United States and the European Union. As LVilla mentioned earlier, even if we had the resources available to meet the exact standards required to participate in the Safe Harbor program, we are not eligible because the FTC (which enforces the program) does not have jurisdiction over WMF because it's a non-profit. In the United States, there are federal (i.e. national) laws that may touch on privacy, such as those protecting children, but even those may not apply to every organization or every situation. There are also state laws that address specific aspects of privacy, but those vary from state to state and also tend to only address specific scenarios. California is amongst the most protective, but still does not come anywhere near the regulatory framework that the European Union has.
One way organizations in the United States have attempted to provide higher standards is through their commitments to do so in their privacy policies. This is what we are doing here with our privacy policy. This draft is meant to explain the minimum levels of protections we can guarantee at this point in the organization's evolution. We are striving to provide greater protections as we learn and grow (and it should be noted that nothing in this or any privacy policy draft we will ever have will prevent us from providing greater protections than outlined in the policy). Mpaulson (WMF) (talk) 18:14, 27 September 2013 (UTC)

Closing off, stale. Will archive in 24-48 hours, a new section is probably best if further questions. Jalexander--WMF 22:15, 6 November 2013 (UTC)

Actually I think this is perfect. Comment by Spectral sequence 17:15, 27 September 2013 (UTC) has not been addressed (yes, we know this is legal in USA; would it be legal in EU? not hard to understand the question). LVilla said above "We would likely join if we could", so let's pretend that you can: what would it entail? --Nemo 22:42, 6 November 2013 (UTC)
By the way, Restoring Trust in EU-US data flows - Frequently Asked Questions (European Commission - MEMO/13/1059 27/11/2013). --Nemo 09:13, 2 December 2013 (UTC)
Hello Nemo, thanks for this link. We are in the process of researching and preparing a response to address Spectral sequence's questions. Stephen LaPorte (WMF) (talk) 20:06, 9 December 2013 (UTC)
Nice, looking forward to it. --Nemo 12:01, 19 December 2013 (UTC)
Please see LVilla's discussion of Wikimedia compliance with the Safe Harbor framework here. Thanks! RPatel (WMF) (talk) 18:35, 2 January 2014 (UTC)

Collection of "unique device identification numbers"

The following discussion is closed: Closing as apparently done, please reopen if needed. Jalexander--WMF 00:33, 15 January 2014 (UTC)

MOVED FROM WIKIPEDIA VILLAGE PUMP

Hi, at http://meta.wikimedia.org/wiki/Privacy_policy/BannerTestA, it says:

Because of how browsers work and similar to other major websites, we receive some information automatically when you visit the Wikimedia Sites. This information includes the type of device you are using (possibly including unique device identification numbers), the type and version of your browser, your browser’s language preference, the type and version of your device’s operating system, in some cases the name of your internet service provider or mobile carrier, the website that referred you to the Wikimedia Sites and the website you exited the Wikimedia Sites from, which pages you request and visit, and the date and time of each request you make to the Wikimedia Sites.

What sort of "unique device identification numbers" is it referring to? I thought browsers didn't provide that information. 86.169.185.183 (talk) 17:40, 4 September 2013 (UTC)

Looking at similar privacy policies, it looks like this may refer to mobile devices: "AFID, Android ID, IMEI, UDID". --  Gadget850 talk 17:45, 4 September 2013 (UTC)
You mean that when you access a website through a browser on an Android device the website can collect a unique device ID? Is that really correct? (I can believe it for general apps, where, presumably the app can do "anything" within permissions, but I didn't think there was any such browser-website mechanism). 86.169.185.183 (talk) 18:58, 4 September 2013 (UTC)
I think this question is more appropriate for the Talk page discussion on the privacy policy draft. Steven Walling (WMF) • talk 20:31, 4 September 2013 (UTC)

I see that this information is "receive[d] [...] automatically". That doesn't necessarily mean this information needs to be collected and stored. Personally I am fine with this information being temporarily handled in a volatile location in order to cater to the display needs of each individual device. I do not, however, believe that this information should be stored or used for any other purpose. Participation in this data-mining should be off by default. WMF would of course be free to nag users into opting in. Because this is a _free_ encyclopedia, users should be _free_ to at least view it in the way they want, without having all their habits and device details harvested non-consensually. Contributions? Edits? Sure, take all you want. There's an implicit agreement to such data-mining when a user submits an edit. But there isn't one from just viewing a page. --129.107.225.212 16:59, 5 September 2013 (UTC)

Thanks, but that is not really relevant to my question (not sure if it was supposed to be). My question is whether it is technically possible for a website to obtain "unique device identification numbers" from a web browser. The text implies that it is; previously I believed it wasn't. I am hoping that someone will be able to answer the question. 86.167.19.217 17:27, 5 September 2013 (UTC)
You are correct in stating that browsers are sandboxed from retrieving this type of information. However, our mobile apps and our mobile app deployment infrastructure may utilize "unique device identification numbers" to identify mobile devices (such as device tokens, device-unique user agents, or potentially UDIDs). Our mobile apps may need this ID for certain functionality, such as sending push notifications or delivering test deployments. Thanks, Stephen LaPorte (WMF) (talk) 17:11, 6 September 2013 (UTC)
I think we have no intention of accessing or recording device UDID, IMEI number, or anything else like that. (It's also getting increasingly hard for apps to get access to those, as the OS vendors don't like creepy apps either.) In the cases where we do usage tracking and need identifiers, they'll be either based on something already in the system -- like your username/ID -- or a randomly-generated token. --brion (talk) 17:20, 6 September 2013 (UTC)
In that case, I think the wording needs adjusting since it currently says "Because of how browsers work [...] we receive some information automatically when you visit the Wikimedia Sites [...] possibly including unique device identification numbers". Mobile apps are not "browsers". 86.160.215.210 20:53, 9 September 2013 (UTC)
Thanks -- I made a small change to clarify that it applies to mobile applications. - Stephen LaPorte (WMF) (talk) 22:33, 6 November 2013 (UTC)
Thanks to the long-term Foundation policy of enabling widespread vandalism from IP addresses (because who cares how much time dedicated users spend reverting vandalism when they could be productively editing... far more important not to scare off someone who wants to add 'is a dick' to a biography), and the genius decision to enable vandalism from IPv6 addresses, Wikimedia is now actively enabling access to unique identifying data not just by Wikimedia admins, but by absolutely anyone in the world. Unless a Wikipedia user forced onto an IPv6 network takes extraordinary steps - steps which they are highly unlikely to be aware of unless they are reasonably technically savvy and thus have a Wikipedia account anyway - they will now be trackable to the household, if not the *device* level. Genius! John Nevard (talk) 14:37, 14 September 2013 (UTC)
w:IPv6#Privacy indicates that IPv6 privacy extensions are enabled by default on most systems. LFaraone (talk) 23:03, 1 January 2014 (UTC)
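
To make the two comments above concrete: with stateless autoconfiguration and no privacy extensions, the lower 64 bits of an IPv6 address can be derived from the network card's MAC address (the "modified EUI-64" format), which is why the same device would show the same suffix on every network. With privacy extensions the suffix is replaced by a value that changes over time. The following is only a rough sketch of that difference; the function names are made up for illustration, and `random_interface_id` is a simplified stand-in for what real privacy extensions do, not the actual algorithm used by any operating system.

```python
import secrets


def eui64_interface_id(mac: str) -> str:
    """Derive the modified EUI-64 interface identifier from a 48-bit MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address such as 00:11:22:33:44:55")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC to get 64 bits.
    full = octets[:3] + [0xFF, 0xFE] + octets[3:]
    groups = [f"{(full[i] << 8) | full[i + 1]:04x}" for i in range(0, 8, 2)]
    return ":".join(groups)


def random_interface_id() -> str:
    """Simplified stand-in for a privacy-extension identifier: 64 random bits."""
    raw = secrets.token_bytes(8)
    return ":".join(raw[i:i + 2].hex() for i in range(0, 8, 2))


# The MAC-derived suffix is stable, so it identifies the device wherever it goes.
print(eui64_interface_id("00:11:22:33:44:55"))
# A random suffix differs from call to call, so it does not follow the device around.
print(random_interface_id())
```

Whether a given editor's address is of the first or second kind depends on their operating system and network configuration, which is the point LFaraone makes above.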

Further clarification on unique identifiers for mobile applications?

Below, @Nemo bis: asked for clarification about why the policy still mentions unique device identification numbers after Brion's response. The intention for this sentence is to clarify that our applications could possibly collect unique device identification numbers, which may still be applicable for some applications, although not all of them. This sort of technical detail will depend precisely on the operating system, device, and application. I would welcome an alternative phrasing, if you think this could be clarified further in the policy. Thanks for everyone's attention to detail here. Stephen LaPorte (WMF) (talk) 20:49, 22 November 2013 (UTC)

Yes, add that said unique device identification numbers are not accessed nor recorded, per Brion above. Covering them and not explicitly excluding their usage is worse than not mentioning them at all. --Nemo 10:35, 25 November 2013 (UTC)
Hello Nemo, after confirming with Brion, I clarified in the policy that unique device identifiers may possibly be used for some beta versions of our mobile applications. For example, we may need to use a unique number to whitelist devices that are beta testing an application on some versions of iOS. We cannot say that they will never be accessed or recorded, but it would be in a limited circumstance like this. Thanks, Stephen LaPorte (WMF) (talk) 00:00, 8 January 2014 (UTC)
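
As a rough illustration of the alternative Brion describes above (a randomly generated token rather than a UDID or IMEI), an app could create an identifier like the one below on first launch. This is only a sketch: the function name is hypothetical and nothing here is taken from the actual Wikimedia mobile apps.

```python
import secrets


def new_install_token() -> str:
    """Return a fresh 128-bit random token, hex-encoded.

    The value is not derived from any hardware identifier, so it reveals
    nothing about the device and can be discarded or regenerated at any time.
    """
    return secrets.token_hex(16)


# Hypothetical usage: generate once per installation and store it locally.
print(new_install_token())
```

The design point is that such a token is only meaningful to the service that issued it, whereas a hardware identifier is the same for every app and every service that reads it.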

When May We Share Your Information? Because You Made It Public

The following discussion is closed: closing because discussion seems to be done, please reopen if not

Privacy policy/Archives/2014#Because_You_Made_It_Public: "Any information you post publicly on the Wikimedia Sites is just that – public."

Does this mean the WMF is allowed to share any of the information, by any means, in any form, for any purpose, to anyone? --Aviertje (talk) 13:03, 1 December 2013 (UTC)

It means that, for example, the WMF can distribute dumps with all your edits, etc. in them. I think this should be changed to exclude oversighted (or deleted?) info, though, even if it was originally public. PiRSquared17 (talk) 15:57, 1 December 2013 (UTC)
I doubt that going back to redact information from old dumps is really feasible, though. Anomie (talk) 14:15, 2 December 2013 (UTC)
We could, in theory, delete it from the dumps we provide. However, many other people mirror and distribute those dumps, and we can't (as a practical matter) reach out and take those down. So any promise here to exclude deleted information would be a false promise. We'd prefer to be up-front, and warn people that their public edits really are public- that's what this language attempts to do.
That said, I sort of see the original commenter's point about the language being perhaps somewhat confusing. We'd be happy to listen to any suggestions on how to improve it.-LVilla (WMF) (talk) 19:40, 9 December 2013 (UTC)
In theory, yes. But actually doing so would probably be technically prohibitive. Anomie (talk) 14:46, 10 December 2013 (UTC)
Oh, yes, absolutely. Don't worry, I highly doubt Legal (at least under my watch) will be in the business of forcing anybody to open up and edit dumps :) -LVilla (WMF) (talk)
@Aviertje: I should have said this earlier, but this is about information you post publicly, as opposed to information we record privately and then later make public. So, for example, if you put your real name in your user name, or post your mailing address on your talk page, that is public information; we can't reasonably know about it or treat it specially (though in some circumstances the community may help you delete it). Does that make sense?
If it would help, we could add something like the italic text: "Any information you post publicly on the Wikimedia Sites is just that – public. For example, if you put your mailing address on your talk page, that is public, and not protected by this policy. Please think carefully about your desired level of anonymity before you disclose personal information on your user page or elsewhere." If you have any other suggestions on how to make it more clear, please let us know. -LVilla (WMF) (talk) 23:41, 18 December 2013 (UTC)
I've put this into the policy; thanks for the suggestion, Aviertje.

Regarding site visiting logs

The following discussion is closed: Closing because this looks to be answered, please reopen if not

First question: is our every visit to Wikimedia sites logged (e.g. some IP, logged in or not, visited page https://meta.wikimedia.org/w/xxxx at some time) and stored? If yes, then how long will it be stored? The current Privacy policy says: "When a visitor requests or reads a page, or sends email to a Wikimedia server, no more information is collected than is typically collected by web sites. The Wikimedia Foundation may keep raw logs of such transactions, but these will not be published or used to track legitimate users.", in which the "may keep raw logs" is ambiguous. Also, regarding "these will not be published or used to track legitimate users", does that mean these data can be used to track illegitimate (for example, suspected vandal) users?

Second question: recently I just heard some user claiming that though Checkusers' range of access excludes user visit log, in some necessary occasions they can apply to access those data. Is that true?--朝鲜的轮子 (talk) 06:57, 4 December 2013 (UTC)

CheckUser does not have access to a user's visit log. Legoktm (talk) 20:23, 4 December 2013 (UTC)
By "does not have access", do you mean "never ever, even when there is need", or "possible when checking such log can be helpful to proving connections between users"?--朝鲜的轮子 (talk) 03:15, 5 December 2013 (UTC)
Checkusers only have access to what is stored in the checkuser table. A user's visits are not stored in that table. Hence, checkusers "never ever" have access to it via the CheckUser tool. Legoktm (talk) 03:17, 5 December 2013 (UTC)
And Checkusers will never ever use anything beyond reach of Checkuser tool?--朝鲜的轮子 (talk) 03:56, 5 December 2013 (UTC)
What User:Legoktm wrote is incomplete. Is there other information, stored on some hardware controlled by the Wikimedia Foundation, in addition to the information available to checkusers? If so, what information is available at that location, and who has access to it? --Stefan2 (talk) 21:56, 7 December 2013 (UTC)
Well, I would likely be the one they'd have to apply to - and I've never heard of such a thing. To my knowledge, there is no such application process or access to any other data. I don't want to categorically speak to what may or may not be on the servers - I'm not technical enough to know - but I can say that if it exists, it is not and has not been used that way. At least, not for the last several years that I've been around. Philippe (WMF) (talk) 00:41, 8 December 2013 (UTC)
wikitech:Logs has a summary of the sorts of raw access logs that are probably being referred to here (note this may not be a complete list). Access to this data is limited to people with access to the servers involved, and as far as I know getting access requires an NDA and is generally limited to WMF employees and contractors involved in maintaining the site and software. Also as far as I know, the sorts of illegitimate uses this data might be used to track are more along the lines of someone trying to break or break into the servers, not on-wiki vandalism. BJorsch (WMF) (talk) 14:37, 9 December 2013 (UTC)
The current privacy policy only allows sampled logs, which means it's hard to do any tracking/user profiling/fingerprinting/user behaviour analysis/however you may wish to call it. The proposed text, in short, proposes to allow unlimited tracking; see in particular #Unsampled request logs/tracking and #Reader Privacy at the Internet Archive for more information. --Nemo 14:17, 7 December 2013 (UTC)
I think the major concern on unsampled tracking is fundraising and research. What about anti-vandalism? Does WMF think that it is necessary and legitimate to use anything if it helps to identify a vandal, in principle?--朝鲜的轮子 (talk) 22:52, 11 December 2013 (UTC)
Thank you for your questions 朝鲜的轮子! For the first one, as you say, we collect different types of information from users (either automatically or intentionally) when visiting Wikimedia Sites, logged in or not. We are currently working on data retention guidelines that will apply to all non-public data we collect from Wikimedia Sites. The guidelines will describe the different types of information that we collect (with examples), and will describe for how long each type of information would be retained. The data retention guidelines would work along with the Privacy Policy, being updated over time to reflect current retention practices, and will allow us to further fulfill our commitment in the Privacy Policy of keeping your private data “for the shortest possible time”.
Regarding your second question, I believe Philippe (WMF)’s comment covers exactly what you ask. From my knowledge, we have no way for Checkusers to access any type of raw server log in their Checkuser capacity. Furthermore, we have never given log access to the community and we have no intention of doing so. Please let us know more information on where you heard this if you want us to dive deeper into this. Thanks again! --JVargas (WMF) (talk) 00:08, 19 December 2013 (UTC)
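
For readers unfamiliar with the sampled versus unsampled distinction raised by Nemo above, here is a small sketch of what sampling does to a request log. It assumes a 1-in-1000 rate and a made-up list of requests purely for illustration; it is not a description of the Foundation's actual logging setup.

```python
import random


def sample_requests(requests, rate, rng=None):
    """Keep roughly one request in `rate`, each chosen independently at random."""
    rng = rng or random.Random()
    return [request for request in requests if rng.randrange(rate) == 0]


# Hypothetical log: one reader viewing 50 pages in a row.
reader_log = [("reader-1", f"/wiki/Page_{n}") for n in range(50)]

# With 1-in-1000 sampling, most or all of this reader's page views are dropped,
# so the reading sequence cannot be reconstructed from the sampled log.
kept = sample_requests(reader_log, rate=1000, rng=random.Random(2013))
print(len(reader_log), "requests made,", len(kept), "kept in the sampled log")
```

An unsampled log corresponds to `rate=1`, where every request is kept and a full per-reader history becomes possible.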

Please add concerning user profiles

The following discussion is closed: closing as apparently done, discussion continues at #Handling_our_user_data_-_an_appeal. Please reopen if necessary. Jalexander--WMF 00:35, 15 January 2014 (UTC)

Sorry, my English is not good enough to write it directly in English, so I hope somebody will translate it.

  • Wir veröffentlichen ohne Deine ausdrückliche Zustimmung kein Nutzerprofil von Dir, also Daten, die Deine zeitlichen Editiergewohnheiten und Interessengebiete zusammenfassen. Wenn wir Daten an andere weitergeben, die das Erstellen solcher Profile ermöglichen (zum Beispiel WikiLabs), so verpflichten wir sie, ebenfalls keine in dieser Weise aggregierten Nutzerdaten ohne Deine Zustimmung zu veröffentlichen.

--Anka Friedrich (talk) 11:25, 7 December 2013 (UTC)

Rough translation: "Without your explicit consent, we do not publish user profiles about you, i.e. data summarizing your temporal editing habits and interest areas. If we release data to others who enable the generation of such profiles (e.g. WikiLabs), we require them to likewise not publish user data that have been aggregated in this way, except with your consent." Regards, Tbayer (WMF) (talk) 03:34, 10 December 2013 (UTC)
Considering that second sentence would require us to stop publicly releasing data dumps and to break history pages and the API, I would oppose such a change. Anomie (talk) 14:49, 10 December 2013 (UTC)
Tbayer, thank you for the translation. --Anka Friedrich (talk) 15:25, 15 December 2013 (UTC)
Anomie, no, but everybody who gets the dump or gets access to the API would have to agree not to aggregate data without consent. --Anka Friedrich (talk) 15:25, 15 December 2013 (UTC)
The dumps and access to the API are given to everyone in the world without restrictions. And I oppose requiring people to "sign up" so we can force them to agree to some pointless requirement before allowing them to access these things. You also overlooked history pages, Special:Contributions, and other on-wiki interfaces which would also have to be restricted or broken. Anomie (talk) 14:16, 16 December 2013 (UTC)

@Anka Friedrich: This issue is now being discussed in great detail below in #Handling_our_user_data_-_an_appeal, so I will suggest closing this section out and continuing the discussion below. Thank you again for your serious, thoughtful contribution to this discussion, despite the language barrier. —LVilla (WMF) (talk) 00:26, 8 January 2014 (UTC)

Regarding some introductory remarks

The following discussion is closed: closing and will archive in a couple days unless reopened, appears to be answered with changes made by geoff and no responses

Hi,

I would like to share some observations from reading the introductory remarks of this document. I apologize if anything has already been brought up.

  • "[1] Gathering, sharing, and understanding information is what built the Wikimedia Sites. [2] Continuing to do so in novel ways helps us learn how to make them better. [3] We believe that information-gathering and use should go hand-in-hand with transparency. [4] This Privacy Policy explains how the Wikimedia Foundation, the non-profit organization that hosts the Wikimedia Sites, like Wikipedia, collects, uses, and shares information we receive from you." — That sounds really strange to me. What built the Wikimedia Sites? Our contributions to the project (i.e. content contributed by volunteers), and that's pretty obvious to the reader of this document. Reading [1] in isolation, information is hence understood in the sense of information about a public figure or a historical event. However, between [1,2] and [2,3,4], the meaning of "information" gradually shifts. Suddenly, "information" is no longer what is contributed to the projects but, in fact, "personal information." If I weren't sure that you're writing this policy in good faith, I'd probably interpret this as a (pretty obvious) rhetorical trick.
I see your point, Pajz. What would be your recommended rewrite? One possibility:
The Wikimedia movement is founded on a simple, but powerful principle: we can do more together than any of us can do alone. We cannot work collectively without gathering, sharing and analyzing information about our users as we seek new ways to make our Wikimedia Sites more useable, effective, safer, and useful.
We believe that information-gathering and use should go hand-in-hand with transparency. This Privacy Policy explains
Geoffbrigham (talk) 22:41, 18 December 2013 (UTC)
Yep, that's fine IMO. — Pajz (talk) 16:50, 19 December 2013 (UTC)
I will ask James to make the change (after Michelle gives her thumbs up). Thanks for the suggestion. Geoffbrigham (talk) 17:58, 19 December 2013 (UTC)
Done Jalexander--WMF 19:52, 19 December 2013 (UTC)
  • "The Wikimedia Sites were primarily created to help you share your knowledge with the world, and we share your contributions because you have asked us to do so." — Really? As far as I'm aware, the Wikimedia Sites were primarily created to help you be able to access all knowledge of the world ("Imagine a world ..."). The sentence sounds like the sites were primarily a platform for users to express themselves whereas, in fact, I think it's quite clear that contributors are the means, not the end.
To be honest, I'm OK with this formulation. The Wikimedia vision is: "Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment." Consistent with this vision, the sites were created to allow users (like you) to "share [their] knowledge of the world." Also, a person's contributions are shared only when the user requests that we do so as part of this overall vision. I'm open to an alternative proposal that captures the needs of this paragraph, but for now I would personally leave it as it is. :) Geoffbrigham (talk) 23:01, 18 December 2013 (UTC)
Ah, never heard of that mission statement. What I had in mind was "Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That's what we're doing." (https://en.wikiquote.org/wiki/Jimmy_Wales#Sourced) (Which makes sense.) Hmm. I don't like the mission statement either, but in this case it's outside the scope of this policy. — Pajz (talk) 16:50, 19 December 2013 (UTC)
Interesting. The nuances in the differences are meaningful. Here is the official vision statement (I think): http://wikimediafoundation.org/wiki/Vision I accordingly will leave this as it is. Geoffbrigham (talk) 17:58, 19 December 2013 (UTC)
  • "Because everyone (not just lawyers) should be able to easily understand how and why their information is collected and used, we use common language instead of more formal terms. Here is a table of translations" — I don't quite see the connection between the first conditional clause and the presentation of a "table of translations". Actually, I was quite amused reading this passage. It sounds like "Hey, we want to make things really simple here, that's why we replaced everything difficult with a word, and here's the dictionary you need to understand these words". Which, of course, would be simplification voodoo. I think you should separate these statements. Your first point is that you use common language; your second point is that, in order to avoid redundancy (or whatever), you prepend some definitions.
How about this:
Because everyone (not just lawyers) should be able to easily understand how and why their information is collected and used, we use common language instead of more formal terms throughout this Policy. To help ensure your understanding of some particular key terms, here is a table of translations:
Geoffbrigham (talk) 23:01, 18 December 2013 (UTC)
That's better. — Pajz (talk) 16:50, 19 December 2013 (UTC)
Great. I will ask for the change (as above). Geoffbrigham (talk) 17:58, 19 December 2013 (UTC)
Done Jalexander--WMF 19:57, 19 December 2013 (UTC)

Best, — Pajz (talk) 17:05, 13 December 2013 (UTC)

(Technical:) Cannot translate navigation box title

The following discussion is closed: closed because discussion appears resolved/done please reopen if not

Hi, I can't find the message that corresponds to "More On What This Privacy Policy Doesn’t Cover" in the original. — Pajz (talk) 12:25, 26 December 2013 (UTC)

It's the first search result for the string.[6] Or, just use the translation link provided at the top of all the pages related to this draft, Special:Translate/agg-Privacy_policy_2014_draft. --Nemo 22:19, 30 December 2013 (UTC)
When I click on "German" under "Other languages:", I get to Datenschutzrichtlinie. When I click on "Translate" in the upper bar, I cannot find untranslated messages at all. Similarly, "translate this page" gets me to an empty page. This is completely confusing. — Pajz (talk) 23:42, 30 December 2013 (UTC)

(resolved, — Pajz (talk) 23:42, 30 December 2013 (UTC))

Google Analytics, GitHub ribbon, Facebook like button, etc.

The following discussion is closed: closing because it appears to be answered/dealt with, please reopen if not. Jalexander--WMF 00:36, 15 January 2014 (UTC)
See also Exclusion of on-wiki actions from privacy policy, "Wikimedia Sites", Revision of "What This Privacy Policy Doesn’t Cover"

If a user wants to include Google Analytics, the GitHub ribbon, a Facebook like button, etc., does the proposed Wikimedia privacy policy forbid that? http://status.wikimedia.org/ is an example of a domain that currently loads Google Analytics. https://tools.wmflabs.org/ricordisamoa/dui/ is an example of a domain that currently loads the GitHub ribbon. --MZMcBride (talk) 02:52, 30 December 2013 (UTC)

Does github actually recommend hotlinking? PiRSquared17 (talk) 02:59, 30 December 2013 (UTC)
Dunno. In this case, it looks like it's GitHub's account on Amazon Web Services, maybe? The relevant HTML is pasted below. --MZMcBride (talk) 05:28, 30 December 2013 (UTC)
<a href="//github.com/ricordisamoa/labs/tree/master/dui"><img style="position: fixed; top: 0; right: 0; border: 0;" src="//s3.amazonaws.com/github/ribbons/forkme_right_orange_ff7600.png" alt="Fork me on GitHub"></a>

Third-party cookies are forbidden. So as long as Facebook/Google can't sneak in one of those alternate tracking systems, your question is pretty much answered. Alexpl (talk) 10:15, 30 December 2013 (UTC)

Not really. I think you may be conflating cookies and requests. --MZMcBride (talk) 23:21, 1 January 2014 (UTC)
Google/Amazon can still track the IPs, visit times, etc. I think they might also get the URL of the page the user was visiting, but I'm not an expert in HTTP stuff. PiRSquared17 (talk) 23:41, 1 January 2014 (UTC)
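To illustrate the cookies-versus-requests distinction raised above, here is a minimal Python sketch of what the host of a hotlinked image could record from a single request, even when no cookie is set. The function name, field names, and sample values are hypothetical, and whether a browser actually sends a Referer header depends on the browser and its settings.

```python
from datetime import datetime, timezone

def third_party_log_entry(remote_addr, headers, when=None):
    """Return what a third-party host could log for one hotlinked request.

    Hypothetical helper for illustration only; it does not model any
    particular server's logging format.
    """
    when = when or datetime.now(timezone.utc)
    return {
        "ip": remote_addr,                        # visible from the connection itself
        "time": when.isoformat(),                 # time of the visit
        "user_agent": headers.get("User-Agent"),  # browser identification, if sent
        "referer": headers.get("Referer"),        # page that embedded the image, if sent
        "cookie": headers.get("Cookie"),          # None when no third-party cookie exists
    }

# Sample request: no Cookie header at all, yet the visit is still observable.
entry = third_party_log_entry(
    "192.0.2.10",  # documentation-range address, not a real reader
    {
        "User-Agent": "ExampleBrowser/1.0",
        "Referer": "https://tools.wmflabs.org/ricordisamoa/dui/",
    },
)
```

In this sketch the entry has no cookie, but the address, time, and referring page are all present, which is why forbidding third-party cookies alone does not settle the question.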

In either case, I agree that the policy should address this. Either explicitly allow or disallow this so we don't have to have this discussion on every tool. PiRSquared17 (talk) 23:37, 1 January 2014 (UTC)

The draft policy very clearly allows it. Everything is allowed, as long as it's done by a sysop or third party e.g. on a MediaWiki:Common.js file or on non-WMF servers, or by a non-WMF tool maintainer. Adding Google Analytics to a Wikimedia project is not disallowed etc. --Nemo 23:40, 1 January 2014 (UTC)
Seriously? I can see allowing it on Labs, perhaps, but we definitely should not allow sysops to track readers. PiRSquared17 (talk) 23:42, 1 January 2014 (UTC)
Sure. The language was changed a bit in the latest revisions but the substance is still the same: «situations are not covered by our Privacy Policy [...] Third-party data-collecting tools that are placed on Wikimedia Sites by volunteers or other third parties». Those may be graciously removed on request but it would not be a legal obligation. --Nemo 00:09, 2 January 2014 (UTC)
Wow, then there's almost no point to this policy. I could get information like IP and username, send it to some server, and that would be allowed? @LVilla (WMF): please consider fixing this. PiRSquared17 (talk) 00:11, 2 January 2014 (UTC)
Yeah. 111.111.111.111 looked at a music video on YouTube from 8:12 to 8:20, opened a new tab and visited a conservative political website from 8:13 to 8:15, then switched to Wikipedia and read the article about "Hemorrhoids" in the English WP for the next 15 minutes. Hey, let's generate some appropriate ads for this guy! No WP reader should ever have to expect something like this, and no one would accept the lame excuse that this massive violation of privacy happened at WP Labs. Alexpl (talk) 08:16, 2 January 2014 (UTC)
Taking these slightly out of order:
status.wikimedia.org: This is a separate website, provided for us by a third party - what we call a "Service Provider" under the "More on what this privacy policy doesn't cover". When this privacy policy goes into effect, status will either (1) be covered by this policy or (2) get a separate privacy policy describing the privacy practices of the Service Provider. In general, we will negotiate contracts with Service Providers so that they do not use third-party tracking tools like GA, and otherwise are reasonably careful about privacy, but we want to be flexible so that we can make case-by-case judgments based on the specific facts of the particular website.
Labs: Labs projects are not covered by this privacy policy (unless they are created by WMF employees). That policy says that developers can't "share any Private Information outside of your Labs Project", which is supposed to cover things like this, but people have raised questions about it and I'm trying to draft some more clear language for it.
Volunteer-added data collection:
The privacy policy is a promise between the Foundation and users of the website. So, if the privacy policy tells users "the website will not do X", then the Foundation is promising that the website will not do X. This is why things done by users on the site aren't covered by the policy - we don't want to make a promise we can't keep.
We did not intend to say that these tools are acceptable: if you read the section Nemo cited, it encourages people to report them to us for investigation, or even remove them themselves! (We also make a similar comment in the section about cookies.) We're just saying that the Foundation can't make promises that these tools won't exist.
We should probably make it more clear that these tools should comply with the same general standards as things done by WMF. This is tricky to do right, so I don't have any proposals right now, but I am thinking hard about it and I welcome suggestions in the meantime.
Hope this helps clarify. Like I said, we're taking the point about third parties seriously and we're trying to figure out how to best address it, but it might take a while because of lingering holiday travel.-LVilla (WMF) (talk) 01:59, 3 January 2014 (UTC)
Hm, so WMF does not want to be held responsible for some individual's contribution which makes the website track its readers, but is willing to remove such tools if informed about them and does encourage authors to remove them on their own as soon as identified. Makes sense. But if WMF starts to collect those data on their own, officially, like proposed in our "User site behavior collection", the entire idea becomes somewhat pointless. Now "The website will not do X" - and the "third party" can just download those data, feed them in their system - done. Alexpl (talk) 09:09, 3 January 2014 (UTC)
No, it's very different. For example, in the experiment you linked to, the only personally identifying data involved (user-agent string and IP address) are anonymized or not collected, and are not shared with third parties. In the more general case, we strive to collect only limited information, and we only share what we collect under very specific circumstances that protect your information. ("When we give access to personal information to third-party developers or researchers, we put requirements, such as reasonable technical and contractual protections, in place to help ensure that these service providers treat your information consistently with the principles of this Policy and in accordance with our instructions.") That's obviously not the case if a random person inserts Google Analytics or a tracking pixel on a page. So our collection is very different than collection by third parties, because of how we control the data after we collect it.-LVilla (WMF) (talk) 15:53, 3 January 2014 (UTC)
Do volunteers get access to the data collected in this experiment or contribute to the programming :) ? Alexpl (talk) 21:12, 3 January 2014 (UTC)
@Alexpl: I don't know the specifics of this particular experiment. In general, as the policy says, when personal information is collected by WMF through the site then volunteers who have access to the data will be asked to agree to confidentiality agreements, and volunteers who help write code will be subject to code review. Does that answer the question?—LVilla (WMF) (talk) 23:07, 3 January 2014 (UTC)
You are the second WMF person here who doesn't know the details of that experiment. Considering the extent (tracking readers), that's scary. But I made my point. Good luck. Alexpl (talk)
We're a big organization now. If we all had to know every detail of every project in every team, we wouldn't get much done. So I don't think it is scary. -LVilla (WMF) (talk) 02:32, 7 January 2014 (UTC)
And MWalker (WMF) has responded above about that experiment (diff). -LVilla (WMF) (talk) 01:08, 9 January 2014 (UTC)
@Alexpl: Iff the experiment goes ahead after the privacy policy discussion has died down, I am more than happy to ping you for code review. The raw data collected will not be easily available to the general public without them signing an NDA (unless something changes with how EventLogging works and where it puts its data) but the findings from the study will definitely be made public. Mwalker (WMF) (talk) 01:30, 9 January 2014 (UTC)
Thx, but that's not my field of expertise. I was only alarmed by the lax, informal way the guys seemed to be able to put a potentially very dangerous concept into action. But if it's carefully controlled and limited, I'm fine. Alexpl (talk) 11:26, 9 January 2014 (UTC)
@MZMcBride: @PiRSquared17: @Nemo bis: @Alexpl: Here's some proposed language to address the third-party data collection question.

Data Collection by Third Parties

In some circumstances, volunteers and other third parties may have the ability to place a data-collecting tool, such as a third-party cookie, script, gadget, tracking pixel, or share button, on a Wikimedia Site. Such tools are not permitted on Wikimedia Sites unless they first obtain permission from the affected user. Because this can be done without our knowledge, we cannot guarantee that these tools will not be installed. However, if you come across such a third-party tool, you can remove the tool yourself, tell administrators on the relevant Wikimedia Site, or report it to privacy[at]wikimedia.org so we can address the problem.

This would be in the "Important Information" section of the policy. I think the language meets the primary need, which is making much more clear that (1) this is not acceptable, and (2) anyone - WMF, admins, etc. - can solve the problem if they come across it. Thoughts? Comments? —LVilla (WMF) (talk) 20:46, 7 January 2014 (UTC)
Good. PiRSquared17 (talk) 21:27, 7 January 2014 (UTC)
Nice. So I could remove a WMF Labs tracking tool, because Labs is a third party and didn't ask me for permission? Alexpl (talk) 11:28, 8 January 2014 (UTC)
A better route in the case of a tool hosted on Labs and included on a site covered by this privacy policy might be to check with the Labs developer first. Labs has built-in filtering of IP addresses at the proxy, which makes things like tracking pixels hosted at Labs very different from tracking pixels hosted elsewhere, since they can't get the IP address via the pixel - all they'd know is that someone visited the page, unlike a normal tracking pixel. Also, the privacy policy makes it fairly difficult to use Labs for collection of personal data. (They could violate the privacy policy, but in that case, the better approach is to ask the developer to fix it and then ask us to shut the tool down if they won't fix it.) So it is highly unlikely that a Labs tool is a data-collecting tool in the sense we mean here.
Answering that makes me realize that "data-collecting tool" is somewhat vague. I think the right thing to do is to call it a "tool that collects personal data", so basically this:

In some circumstances, volunteers and other third parties may have the ability to place a tool that collects personal data, such as a third-party cookie, tracking pixel, or share button, on a Wikimedia Site.

I removed script and gadget because, while they can be tools that collect personal data, that isn't their primary purpose, so they probably aren't as helpful as examples. But open to discussion/suggestion on that point. Does that edit make sense/improve things? —LVilla (WMF) (talk) 00:50, 9 January 2014 (UTC)
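The IP filtering at the proxy mentioned above can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the actual Labs proxy configuration: the function name, the proxy address, and the list of stripped headers are all hypothetical.

```python
def forward_through_proxy(client_ip, headers, proxy_ip="10.0.0.1"):
    """Hypothetical filtering proxy: hide the visitor's address from the tool.

    The tool behind the proxy only ever sees the proxy's own address, and
    headers that commonly carry the original client address are dropped.
    """
    identifying = {"x-forwarded-for", "x-real-ip", "forwarded"}
    cleaned = {k: v for k, v in headers.items() if k.lower() not in identifying}
    # client_ip is deliberately not passed on in any form.
    return proxy_ip, cleaned

# What a pixel hosted behind such a proxy would receive for one page view.
seen_ip, seen_headers = forward_through_proxy(
    "192.0.2.10",  # documentation-range address, not a real reader
    {
        "X-Forwarded-For": "192.0.2.10",
        "Referer": "https://en.wikipedia.org/wiki/Example",
    },
)
```

Under these assumptions the tool learns that some page was visited (the Referer survives) but not who visited it, which matches the "all they'd know is that someone visited the page" description.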

Changing Do Not Track section to clarify language, reflect fact that specification is not yet done

The following discussion is closed: Closing because no comments were made to the proposed language, please reopen if proposing changes

The week before Christmas, the World Wide Web Consortium's Tracking Protection Working Group changed the definitions of "tracking" and "first party/third party" in the proposed Do Not Track standard. WMF's behaviors still appear to be in compliance with the proposed standard. However, the changes highlight that the standard is still very much a work in progress. Because of that, I'd like to move the current reference to the standard from the privacy policy itself to a FAQ that makes more clear that the standard is a work-in-progress.

The old language is:

As noted throughout this policy, we are strongly committed to not sharing the information we collect from you with third parties, except under very specific circumstances. In particular, we do not allow tracking by third-party websites you have not visited (including analytics services, advertising networks, and social platforms), nor do we share your information with any third parties for marketing purposes. We may share your information only under particular situations, which you can learn more about in the “When May We Share Your Information” section of this Privacy Policy.

Because of this commitment, this Policy is generally as protective as, or more protective than, a formal implementation of the Do Not Track specification, and so we do not respond to the Do Not Track signal.

For more information regarding DNT signals, please visit our FAQ, Do Not Track Us, and the World Wide Web Consortium's Do Not Track Specification.

The new language would be:

We are strongly committed to not sharing non-public information with third parties. In particular, we do not allow tracking by third-party websites you have not visited (including analytics services, advertising networks, and social platforms), nor do we share your information with any third parties for marketing purposes. Under this policy, we may share your information only under particular situations, which you can learn more about in the “When May We Share Your Information” section of this Privacy Policy.

Because we protect all users, we do not change our behavior in response to a web browser's "do not track" signal.

For more information regarding Do Not Track signals and how we handle them, please visit our Do Not Track FAQ.

The relevant part of the FAQ answer would change from:

Because of this commitment, our Privacy Policy is generally as or more protective than a formal implementation of the Do Not Track specification, and so we do not respond to the Do Not Track signal.

to:

Because of this commitment, we protect everyone, and do not change our behavior in response to a web browser's DNT signal. We believe that, as of this writing, this approach is as or more protective than the obligations for "first parties" set out in the World Wide Web Consortium's Do Not Track specification. However, the specification is still being revised and changed, often in important ways. We will continue to monitor the specification as it moves towards completion and update our behavior and this FAQ consistent with the principles laid out in the Privacy Policy.

Anyone have thoughts/comments/suggestions on this language? If not, I'll put it in soon. Thanks. -LVilla (WMF) (talk) 19:53, 30 December 2013 (UTC)

The above-described changes were made in the "Important Info" section of the Privacy Policy, as well as the Do Not Track FAQ. --JVargas (WMF) (talk) 22:56, 7 January 2014 (UTC)

Safe Harbor

The following discussion is closed: closed for lack of response to clean up PP discussions, please reopen if necessary. Jalexander--WMF 00:35, 15 January 2014 (UTC)

Nemo asked above what we could say about Safe Harbor compliance, putting aside the fact that we can't actually comply because the FTC cannot enforce the statute against us. So I have spent some time looking deeper into the question.

The Safe Harbor is made up of several principles. While we comply with most of them, we don’t see how we can comply with the “Onward Transfer” principle. In essence, this principle states that an organization can only transfer automatically collected information (like IP addresses) to third parties if the third party (1) is an agent and (2) the agent (a) is subject to the Safe Harbor, (b) is subject to the EU Directive, or (c) enters into a written agreement as strong as the relevant Principles.

Generally, we comply with this requirement, because in most cases we simply never transfer data to third parties. However, we do sometimes transfer certain information to third parties: our volunteers. In particular, we sometimes transfer automatically collected IP addresses. They, in turn, often transfer that information to other third parties, like the WHOIS tools that are used to gather more information about the IP addresses.

We do not at this time see how we can comply with the Onward Transfer principle (and therefore comply with Safe Harbor) while still allowing volunteers to fight abuse of the site. We do not believe our volunteers are our “agents”, and (in the ongoing identification discussion) relatively weak written agreements have been rejected. We also don’t see how we can practically prohibit those volunteers from using tools like WHOIS.

This is unfortunate, but again a situation where our unusual structure makes certain kinds of compliance difficult. As usual, we'll have to continue making best efforts in other areas to reduce the risk to users. Hope that helps clarify the situation. -LVilla (WMF) (talk) 19:49, 31 December 2013 (UTC)

Because You Made It Public: IP addresses

The following discussion is closed: close as apparently done, please reopen if necessary

It makes sense that things you post are public. What will be most surprising is that by posting without signing in, your IP address will also be public. I know it's been said elsewhere, but it feels like a glaring omission to not restate it in this section. //Shell 09:59, 10 January 2014 (UTC)

We do say this in the margin summaries farther up: "If you do not create an account, your contributions will be publicly attributed to your IP address." But I'm not opposed to saying it again, especially with the coming of IPv6. How about we state the following in the margin note to this section: "If you edit without registering or logging into your account, your IP address will be seen publicly." Geoffbrigham (talk) 01:07, 11 January 2014 (UTC)
Either in the margin summary like you suggest, or in the policy text itself is fine to me. I just feel it needs to be said here too. //Shell 07:43, 11 January 2014 (UTC)
Thanks for the suggestion, Shell. It's been added to the policy text itself. Mpaulson (WMF) (talk) 00:47, 15 January 2014 (UTC)

Generation of editor profiles

The following discussion is closed: Closing as apparently done, see Luis' comment at end for links to where conversation continuing. Please reopen if necessary. Jalexander--WMF 00:34, 15 January 2014 (UTC)

I'd like once more to point out serious concerns about the generation and publication of detailed user profiles on Wikimedia websites or servers. This issue is repeatedly dealt with, at least, on the German Wikipedia (i.e. here, and actually again in the signpost equivalent at deWP). While the toolserver's policy conforms to European standards concerning data privacy, wmlabs (which will completely replace the toolserver in 2014) does not meet these requirements. A contributors' poll at Meta clearly showed the community's preference for an opt-in solution for user data mining tools. Nevertheless WMF is giving the opportunity to run a detailed user profiling tool that does not allow an opt-in, not even an opt-out. We are aware that American data protection standards differ from European standards, and that such tools are considered to be legal in the USA. Yet they are not needed by anyone. Thus, we still hope that WMF does not impose US points of view on their global contributors, whenever weak data policies are not required by US law, nor needed by contributors to improve the projects' contents. Looking forward to a WMF statement on this issue. --Martina Nolte (talk) 20:53, 24 November 2013 (UTC)

Can you expand on what you mean by a 'US point of view'? --Krenair (talkcontribs) 01:06, 25 November 2013 (UTC)
Sure. User contribution data are publicly available in the edit histories. According to US law, it is okay to aggregate these data and generate detailed user profiles; people tend to feel okay with such a tool. In European countries an aggregation of personal data and the publication of user profiles without consent are considered illegal; people feel offended by such a tool. The views on what is okay or not okay depend on local laws. Laws reflect a culture's values and points of views. --Martina Nolte (talk) 04:27, 25 November 2013 (UTC)
+1 - I would generally like to underline this. -jkb- 10:13, 25 November 2013 (UTC)
Other discussions: Kurier (2013-09), Kurier (2013-10), labs-l (2013-09), labs-l (2013-10). I regret bringing this up on dewiki a little, as I didn't realize it would start this much drama. On the other hand, I do think that this is something we really should be discussing. But all the data will be public as long as db dumps with detailed info are published. PiRSquared17 (talk) 17:39, 26 November 2013 (UTC)
No need to regret it, no drama. This is an important discussion and it has to be made: 5th most used website, 1.7 billion edits with user information, 14 years of data collecting, our data. NNW (talk) 18:23, 26 November 2013 (UTC)
You're right. It's good that this is being discussed at least. I was a bit surprised that almost nobody commented about it on enwiki though. PiRSquared17 (talk) 20:21, 26 November 2013 (UTC)
Perhaps the experience of the 20th century might explain why Germans are quite sensitive concerning these topics. NNW (talk) 09:07, 27 November 2013 (UTC)
Right, the raw data are available by dumps. But not yet aggregated to individual user profiles. WMF could even think about slimming down these dumps; a matter of data economy (as much personal data as needed, as few personal data as possible). Editors agreed to publish their content contributions under a free licence; they do not automatically agree to publish their editing behaviour, or even their individual profiles. As I said, the "drama" is due to a quite different view on data privacy issues. --Martina Nolte (talk) 19:49, 26 November 2013 (UTC)
I'm another who feels that this is a really pertinent Privacy issue which requires careful consideration here. And not just from a purely legal perspective (after all, if the Foundation is adopting a "cuddly" approach to volunteers, legality is surely just one dimension in the picture). User profiling—with its abuses as well as uses—is one reason why I prefer to edit Wikipedia as an IP. —MistyMorn (talk) 11:20, 27 November 2013 (UTC)
  • I am disappointed that the concerns that were raised here and in this RFC are not addressed by the current draft of the new privacy policy. It is amazing that these collected records are not even mentioned in the section Information We Collect. --AFBorchert (talk) 20:30, 30 November 2013 (UTC)
I may have missed something, but the only comment I can see from a WMF member is this. The fact that user profiling—including provision of potentially sensitive personal information—may be done either with or (though rather more arduously for most) without tools made publicly available through Wikimedia doesn't mean that users cannot be informed of such possibilities in the present document. MistyMorn (talk) 20:37, 3 December 2013 (UTC)

To make clear that the above-mentioned questions are not individual concerns of single Wikipedia/Wikimedia contributors, I'd like to point to this site (in German for now; a translation is planned). --Martina Nolte (talk) 19:40, 9 December 2013 (UTC)

Hi, Martina: thanks for notifying us about that discussion. We're discussing this issue and considering how best to handle. -LVilla (WMF) (talk) 20:21, 9 December 2013 (UTC)

I've been a Toolserver user for 6 years and the EU data protection directive along with other TS oddities has been a thorn in development. An example: My User activity tool which lists (more or less) publicly available data to make it easier to prune membership lists. If data-mining were allowed we could partially generate and manage these lists automatically. Or email an inactive user that is familiar with a particular city, if questions come up. Or see who's on IRC and likely up at this time of day.

Additionally, our cultural partners have requested in-depth analytics that cannot be done on the Toolserver because of the privacy policy. WikiProjects are also interested in seeing who reads their pages, how much they read, what links they follow, what search terms or forums brought them there, and more.

Finally, do not misrepresent the German/European view as some sort of "global" view. The US and many other countries will not adopt data-protection style legislation (despite what WM-DE has said to my face). Also, it's technically impossible to stop third parties from doing analysis on their own; the data is public after all. You've had your chance and chose to continue decommissioning to (IIRC) free 5-10% of WM-DE's budget. —Dispenser (talk) 18:53, 10 December 2013 (UTC)

You know that a majority of users voted for opt-in? That's what's usually called "this is wanted by the community". And you can check that not all of the opt-in voters come from Germany/Europe. NNW (talk) 20:29, 10 December 2013 (UTC)
A majority, but not an impressive one: 54% (possibly slightly higher if some people voted for multiple options) is not what I would call overwhelming consensus. More to the point, there seems to be a pretty strong split based on which projects people come from: I looked over the list of voters on the RfC, and I recognized a great many names from the English Wikipedia under "Remove opt-in completely" and almost none under "Keep opt-in". Not very scientific, I know, but I suspect a more methodical analysis would support the same conclusion. I'm not sure there's any middle ground to be reached on this in terms of the privacy policy; I expect the eventual solution will be to have Labs' policy be that you can't offer data-mining services, or you have to make them opt-in, for projects where the community has indicated they don't want them (or hasn't indicated that they do want them). Emufarmers (talk) 13:10, 13 December 2013 (UTC)

Standing in for Erik as he’s on vacation, my position is that we shouldn’t introduce a policy limitation on what can/can’t be created on WMF servers for public data. However, we can look into adopting a mechanism by which the community can disable specific tools on the basis of community consensus. Legal tells me the Labs Terms of Use already allows the Foundation to take something down if necessary, but a formal mechanism for disabling specific tools based on community consensus has not yet been developed.

This approach would allow the community both the ability to experiment and be bold with how we use our data, as well as provide a check on a tool if the tool is deemed inappropriate. I think this strikes the right balance of experimentation and user privacy protection.

Obfuscating user contributions data or limiting our existing export will not happen because we have a commitment to not only make all of Wikipedia available, but to allow anyone to fork and take the history, content, etc. with them. Removing that ability would be a disservice to the community and we currently have no plans to revisit it. Tfinc (talk) 21:38, 19 December 2013 (UTC)

One question that immediately comes to mind is "which community?". For example, consider the ongoing complaints by some on dewiki about wanting to prevent people from creating tools to analyze contributions. Is consensus on dewiki enough to take the tool down for everyone? Or a consensus on a metawiki discussion contributed to mainly by German editors? And then what if other wikis' communities who weren't notified of this discussion (or ignored it) are upset when a useful tool goes away? Or would we just force the tool author to "break" their tool so it doesn't work on dewiki but continues to function on all other wikis? Anomie (talk) 14:23, 20 December 2013 (UTC)
As the one who'd be tasked with enforcing this, I can tell you that I would require a very clear consensus, and that if the consultation seems to be dominated by a particular subgroup I'd make a serious effort to widen the discussion before any action is taken. Honestly, engineering should be very hesitant to step in and disable a tool or impose conditions on it beyond those of the terms of use; but it's also our responsibility to do so if the tool breaks something or if the community is overwhelmingly opposed to it: Labs isn't free hosting, it's a home for development work that benefits the projects.

I am hoping that if any (sub) community makes it clear that it would rather opt out of some tool, the tool maintainers would be considerate enough to heed the request without intervention by operations, though – and I believe most will without hesitation. MPelletier (WMF) (talk) 15:44, 20 December 2013 (UTC)

The sort of restrictions required to prevent tools like the DUI from existing would place a significant burden on development. You'd need to restrict the ability to access a user's contributions programmatically, but simply disabling the API access is not enough: you'd also need to attempt to curb screen-scraping, via rate limits which will be circumvented. There are numerous legitimate uses of editor data, and the Foundation has historically released all data needed to build DUI via downloadable data dumps for over 10 years. In effect, attempting to prevent editor profiling would only significantly hinder legitimate users while not preventing malicious use. LFaraone (talk) 23:15, 1 January 2014 (UTC)

Summary; response; moving discussion elsewhere

I think that this section of the discussion boils down to two questions:

  1. Should the Foundation prohibit creation of user profiles on servers controlled by the Foundation, such as Labs?
  2. Should the Foundation prohibit creation of user profiles on servers not controlled by the Foundation, either by reducing the amount of information made available, or by using legal means to restrict how information is used?

Possible responses to question #1 are being discussed below in #Note_on_Labs_Terms_.2F_Response_to_NNW. If there are additional comments to be made on that topic, please make those comments in that section.

Question #2 has been responded to here by a number of people (including Tfinc and Lfaraone), and has also been responded to in #Handling_our_user_data_-_an_appeal by a variety of users, including Anomie, verdy_p, Coren, Nemo bis, and others. If there are additional comments to be made on question #2, please make those comments in #Handling_our_user_data_-_an_appeal, not this one, so that we can all be discussing in one place.

If there is another question or issue not covered by those two discussions below, please open a new section and refer back to this one so that it can be squarely addressed.

Because this section of the discussion is being extensively discussed elsewhere, I propose closing this section. Thank you to everyone who contributed in this part of the discussion for their serious, thoughtful responses on a complex and emotional topic. —LVilla (WMF) (talk) 00:09, 8 January 2014 (UTC)

Edits about tracking and personal information

The following discussion is closed: closing to clean up discussions per lack of response, please reopen if necessary. Jalexander--WMF 00:37, 15 January 2014 (UTC)

This edits User:Elvey was remedied. User:LVilla (WMF) Elvey, please share context? (Like you did for some other thing here). Gryllida (talk) 04:30, 7 January 2014 (UTC)

To explain why I changed those -
  • this edit removed "retained" from the description of what we do with direct communications between users. I did this because it is not accurate to say that we retain those - we may in some cases but in most cases that I'm aware of we don't.
  • this edit removed an example about tracking pixels that Elvey had edited. Elvey's edit correctly pointed out that the example was a little hard to understand, but I don't think his edit improved it. I spent a little bit of time trying to explain it better without writing a book or assuming the reader is a web developer, and failed, so I deleted it. If folks want to take another stab at it, I'm happy to discuss it here.
Sorry for not explaining this earlier, User:Elvey - I do appreciate that you were trying to improve it :) —LVilla (WMF) (talk) 00:00, 9 January 2014 (UTC)

Short summaries of each section

The following discussion is closed: closing since it looks done, please reopen if not. Will archive in a couple days if still closed. Jalexander--WMF 22:17, 22 January 2014 (UTC)

Reading an entire privacy policy requires a lot of effort, even if it's well-written. Previously we had Rory to break up the sections and give you a pause while reading; now we only have the blue section icons, which I think are the bare minimum. I propose that we add back something to make reading more pleasant. One way to do that is 500px's short summaries of each section. What do you think? Can we do something similar? //Shell 09:59, 17 December 2013 (UTC)

Hi Shell! Thank you for the suggestion. We are working on drafting some summarizing bullet points to put into the left column and will put them up as soon as we have them ready. We would definitely appreciate your input (and input from others) on the bullet points once they are ready. Mpaulson (WMF) (talk) 20:39, 18 December 2013 (UTC)
@Skalman: Some draft summaries were just put up and it would be great to get some feedback. We're still thinking about the right formatting for them as well. Jalexander--WMF 01:48, 10 January 2014 (UTC)
Overall, great! Comments added below:
Shell, I changed the wording to: You are consenting to the use of your information in the U.S. and to the transfer of that information to other countries in connection to providing our services to you and others. Does that make more sense? Thanks! RPatel (WMF) (talk) 00:41, 15 January 2014 (UTC)
Yes, it sounds clear. //Shell 09:50, 16 January 2014 (UTC)

Community comments acceptance deadline

The following discussion is closed: Closing given extension/appears set, please reopen if not. Will archive in a couple days if still open. Jalexander--WMF 22:34, 22 January 2014 (UTC)

Hi. According to the top of the page, the community comments acceptance deadline is 15 January 2014. I'm not sure this is a good idea. Discussion is ongoing and there appear to be a number of unresolved issues on this talk page and at Talk:Access to nonpublic information policy. Discussion should continue until there's consensus to move forward. --MZMcBride (talk) 07:32, 5 January 2014 (UTC)

Given the significant changes still expected, and the few, dated translations, I agree. How far should we push it out? A few weeks? Let's aim to get the expected changes ironed out in English by mid-month? --Elvey (talk) 06:28, 6 January 2014 (UTC)
I just made several edits. The last two are labeled Option A and Option B. They were prompted by this problematic edit: The new language "are kept confidential" was added, referring to 'email this user' email. That implies they are kept. (!) Should they be? Arguably not. Thus I suggest we go with option A, or if that's not accurate, to Option B. I hope we can go with Option A.--Elvey (talk) 06:28, 6 January 2014 (UTC)
@Elvey: As I noted in the edit page, please open a new section on this page for substantive edits, rather than making them directly in the doc. Thanks! —LVilla (WMF) (talk) 02:53, 7 January 2014 (UTC)
I think extending the non-public information policy discussion makes sense, and the data retention policy is still unpublished and so will obviously need to go past the 15th. But on this doc, I'd lean towards bearing down on the remaining open issues and still aiming to close on the 15th (with the obvious exception that there might be changes to it resulting from changes to the other two documents.) Michelle is on vacation, and we can't confirm that until she returns, but that is my sense of the right plan. —LVilla (WMF) (talk) 02:53, 7 January 2014 (UTC)
Hi! Just to confirm, we've extended the discussions for the privacy policy and access policy drafts until 14 February 2014 (which is the same period as the data retention guidelines). Thanks! Mpaulson (WMF) (talk) 20:53, 22 January 2014 (UTC)


Section summaries: Need an explanation

The following discussion is closed: Closing given no further comments, please reopen if still needed. Will archive in a couple days if still open. Jalexander--WMF 22:34, 22 January 2014 (UTC)

I like the summaries, but there is one which I find hard to translate. It's "You are consenting to the use of your information in the U.S. and to the transfer of that information to other countries necessary to provide our services to you and others." I'm not sure what the literal meaning is. Is it that information is only used or transferred whenever it is necessary to provide our services, or does it mean that only the information which is necessary to provide our services is used or transferred? Alice Wiegand (talk) 19:45, 12 January 2014 (UTC)

I understand it as "it is technically necessary (as chosen by the internet service providers involved) to send all data, produced by a person reading or interacting with WP, over various national borders" - and the U.S. of course. Something which happens with almost everything you do online. Alexpl (talk) 08:17, 14 January 2014 (UTC)
Thank you both for your comments! I changed the wording to: You are consenting to the use of your information in the U.S. and to the transfer of that information to other countries in connection to providing our services to you and others. Does that make more sense? RPatel (WMF) (talk) 00:39, 15 January 2014 (UTC)

Id like

The following discussion is closed: closing since it appears answered, please reopen if not. Will archive in a couple days if still closed

to append another question. Are these summaries - and I didn't read all of that above - given to another commercial company, like Google perhaps? Or someone else who is dealing with information? Who makes sure that this is not the case, and how?--Angel54 5 (talk) 22:29, 18 January 2014 (UTC) Meaning this one: "Despite our best efforts in designing and deploying new systems, we may occasionally record personal information in a way that does not comply with these guidelines. When we discover such an oversight, we will promptly comply with the guidelines by deleting, aggregating, or anonymizing the information as appropriate." How may I understand that? Allowed, forbidden, occasionally forbidden. That's too vague for my taste.--Angel54 5 (talk) 22:57, 18 January 2014 (UTC)

Some volunteer or employee may put a program on a server which collects personal data. That is forbidden and WMF will do something about it as soon as such an event is discovered or reported. Since this is a wiki, such things can never be fully ruled out. Alexpl (talk) 07:42, 20 January 2014 (UTC)
Hi Angel54. Just wanted to let you know that Alexpl's interpretation is accurate. We try to ensure that data is collected, used, and retained in compliance with our policies and guidelines, but there is always a chance that someone does something that doesn't comply and when that does happen, we will try to correct it as soon as we can. Mpaulson (WMF) (talk) 21:01, 22 January 2014 (UTC)

The ability to store unsampled log data (a.k.a. loss of privacy in exchange for money)

The following discussion is closed: closing as the discussion appears over, please reopen if necessary. Will archive shortly if not. Jalexander--WMF 19:41, 11 February 2014 (UTC)

One of the changes between the existing privacy policy and the new draft is that the draft will now allow the Foundation to retain unsampled log data — in effect, this means that every single visit by every single visitor to each and every Wikimedia project (and perhaps other sites owned/run by the Foundation) will now be recorded and retained on WMF servers. It is shocking to me that the only reasons given for such a broad, controversial and hardly advertised change are (1) fundraising and (2) the ability to measure statistics in Wikipedia Zero, a project that is limited in terms of geography, scope and type of access (mobile devices).

Given that Wikipedia Zero is just one of many projects led by the Foundation, and that it applies to a limited number of visitors who are using a very specific medium to access the projects, I fail to see the need to sacrifice the privacy of everyone who will ever visit a Wikimedia project. Moreover, I am disappointed and terrified to learn that the Foundation thinks it is reasonable to sacrifice our privacy in exchange for more money, especially since our fundraising campaigns appear to have been quite effective, or at least to have enabled the WMF to reach their revenue goals without much trouble. odder (talk) 22:22, 7 December 2013 (UTC)

"will now be recorded and retained" is probably a bit strong. s/will/may/ would probably be more accurate. Personally, I can see the ability to record full logs when needed to be useful in debugging, performance analysis, and analysis of which features should be prioritized for improvement or development or even possible removal. Boring stuff to most people. BJorsch (WMF) (talk) 14:46, 9 December 2013 (UTC)
"May" is only a legalese euphemism for "will" (in this case). If there are no plans to store and use unsampled log data, for whatever purpose, then surely there will be no problem to revert to the wording of the current privacy policy, which only allows storing sampled data. odder (talk) 15:35, 9 December 2013 (UTC)
Believe whatever you want, I'm not about to engage in arguing over conspiracy theories. BJorsch (WMF) (talk) 15:25, 10 December 2013 (UTC)
No, it isn't such a euphemism. You use language like this to ensure that the Foundation has the flexibility to run programs that may make use of this capability. LFaraone (talk) 23:18, 1 January 2014 (UTC)
That's my point precisely. odder (talk) 19:47, 7 January 2014 (UTC)
Maybe I missed it somewhere but it would be helpful to list all data types from the logs. I am especially interested in this question: Do you save every pageview incl. IP address and/or username? Do you have logs in which you can see what pages I have read (!), how long I have read them, etc.? Raymond (talk) 16:55, 9 December 2013 (UTC)
Yes, I also would appreciate to know if you have, or plan, such visitor logs. --Martina Nolte (talk) 19:46, 9 December 2013 (UTC)
+1 --Steinsplitter (talk) 19:48, 9 December 2013 (UTC)
+1, by all means! Ca$e (talk) 09:36, 10 December 2013 (UTC)
+1 ...84.133.109.103 09:38, 10 December 2013 (UTC)
+1 -jkb- 09:41, 10 December 2013 (UTC)
+1 I told you so."Dance" Alexpl (talk) 09:57, 10 December 2013 (UTC)
+1 for showing an example of the currently log data. --Zhuyifei1999 (talk) 10:08, 10 December 2013 (UTC)
+1 ---<(kmk)>- (talk) 13:46, 10 December 2013 (UTC) there is no need to trade my privacy for (even more) funds.
+1 -- smial (talk) 14:33, 11 December 2013 (UTC)
See wikitech for the format of the raw logs we receive from the front end caches. This data is sent 1:1 to a log aggregation server where it gets downsampled in real time. See the filters.*.erb files for what HTTP paths we currently log data on and with what frequency. The format is
pipe <sample rate> filter-program <filter -d specifies project, -p specifies the page> >> <output location>
Mwalker (WMF) (talk) 01:19, 9 January 2014 (UTC)
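To illustrate what "downsampled" means in the comment above, here is a minimal sketch of 1-in-N sampling over a stream of log lines. It is only an illustration: the function name `downsample` and the sample lines are hypothetical, and this is not the Foundation's actual filter program, whose behavior is not described in this discussion.

```python
from typing import Iterable, Iterator


def downsample(lines: Iterable[str], sample_rate: int) -> Iterator[str]:
    """Keep one line out of every `sample_rate` lines (1-in-N sampling).

    A sample_rate of 1 keeps every line, which corresponds to what this
    discussion calls "unsampled" logging.
    """
    if sample_rate < 1:
        raise ValueError("sample_rate must be at least 1")
    for index, line in enumerate(lines, start=1):
        # Emit only every sample_rate-th line; all others are discarded.
        if index % sample_rate == 0:
            yield line


# Hypothetical request log lines, for illustration only.
requests = [f"request-{n}" for n in range(1, 7)]
sampled = list(downsample(requests, 3))
unsampled = list(downsample(requests, 1))
```

With a sample rate of 3, only every third request survives; with a rate of 1, every request is kept, which is the case the privacy concerns in this section are about.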
I do not know of any logs that record all pageviews, or of any plans to start collecting such logs. The logs testwiki.log and test2wiki.log mentioned on wikitech:Logs do contain user information and URL (as part of a larger amount of debugging information) for requests that aren't served from the caches, but only for testwiki and test2wiki which the vast majority of people have no reason to ever visit. I also don't know of any logs or log analyses that show pages read by any user or IP or how long anyone might have spent reading any particular page. BJorsch (WMF) (talk) 15:25, 10 December 2013 (UTC)
I thought the fundraising people already do exactly that and call it "User site behavior collection". (I cant actually tell from that link if those proposals have already been implemented ?!?) Alexpl (talk) 18:04, 10 December 2013 (UTC)
I was not aware of that. Note though that's a proposal and not something that is currently being done. It seems like a useful study though, and it's far from tracking everyone all the time that some of the more paranoid here seem to be expecting. BJorsch (WMF) (talk) 00:52, 11 December 2013 (UTC)
I am pretty sure their intentions were good. But due to the nature of the Wikipedia project, it seems a bit "pre Snowden" or just unworldly to believe that WMF can limit the access to that data once the mechanism to collect it has been installed. The first dude with access, who seeks future employment at a hip company (...), can do irreversible damage and sell every WP-contributor out. Alexpl (talk) 08:58, 12 December 2013 (UTC)
Two things; first, I delayed that experiment until after we'd sorted out the new privacy policy. No code has been written, no code has been deployed. Second, the experiment was designed explicitly to not send any data to the server beyond averages a single time (nor store locally on the client anything beyond counts and times). I wasn't even going to use cookies which could be sniffed from the wire. Data in the experiment that was stored locally would have been useful for statistical correlation if someone had access to your computer (or network connection); but if that was the case they wouldn't need to bother with my data, they would get it directly. I'll point out that the following places document what information would actually be collected: From the RfC, raw source Schema:EventCapsule and Schema:AnonymousUserSiteUsage. If you have concerns specifically about these data, I encourage you to put them on the talk page of the RfC. Mwalker (WMF) (talk) 01:05, 9 January 2014 (UTC)
@User:BJorsch (WMF). If there are no logs and no plans to start collecting them, why was the draft changed so that the Foundation would be allowed to do just that?---<(kmk)>- (talk) 18:52, 10 December 2013 (UTC)
For the reasons that have been officially stated, perhaps? But really, you'd probably want to ask one of the people involved in drafting this. I just commented here to add a few other potential uses for the ability to collect non-sampled logs when needed, since people seemed to be focusing overmuch on the two examples in the draft. BJorsch (WMF) (talk) 00:52, 11 December 2013 (UTC)
does "I do not know of any logs that record all pageviews, or of any plans to start collecting such logs. " mean Wikimedia does not have such logs and plans or is it meant literally; you BJorsch do not know about it? ...Sicherlich Post 08:39, 11 December 2013 (UTC)
+1 -- smial (talk) 14:32, 11 December 2013 (UTC)
The latter, obviously. I'm certainly not aware of everything everyone associated with the Foundation does or plans, nor am I in any position to set policy. BJorsch (WMF) (talk) 15:19, 11 December 2013 (UTC)
I guess we just assumed that the "WMF" tag in your signature would grant you preferential access to all relevant information on this matter :) Alexpl (talk) 16:57, 11 December 2013 (UTC)

Okay, so now we know your private opinion and assumptions, BJorsch. Is it possible to get an official statement from the WMF? ...Sicherlich Post 17:20, 12 December 2013 (UTC)

BJorsch's opinion is that users asking these questions are more paranoid. I would sure prefer an official, and hopefully more sober, WMF statement on this logging issue. --Martina Nolte (talk) 17:36, 20 December 2013 (UTC)

Why do you need special logging for WP Zero? PiRSquared17 (talk) 20:27, 20 December 2013 (UTC)

No official reply since December 7

The future?

The Wikimedia Foundation did not manage to post an official reply to my (and other people's) concerns, even though I started this section on December 7, way before the end–of–year holiday period started. I am very disappointed that it seems to always take such a long time to get a reply from the WMF (the same thing happened for the draft access to nonpublic data policy); it has a very paralyzing effect on any discussion, introduces further concerns and worries, and effectively lengthens any consultation period to a state when no one cares anymore. odder (talk) 10:07, 21 January 2014 (UTC)

Maybe they're giving up on this update. It would not be unwise to rewrite everything from scratch. --Nemo 10:09, 21 January 2014 (UTC)
Is the silence an indication that we might need to add WMF to File:Prism slide 5.jpg and that the Foundation therefore is unable to answer? --Stefan2 (talk) 12:07, 21 January 2014 (UTC)

We are working on a response to the question of unsampled log data. It's taking some time to verify things internally, and we were previously focusing on posting the data retention guidelines (which hopefully answered some of your other questions, in addition to Matt and Brad's other responses above). Thanks for your patience. Stephen LaPorte (WMF) (talk) 21:22, 22 January 2014 (UTC)

Because we now have three separate parts of the discussion addressing unsampled logs, I suggested to Toby that he post a new section and we continue the discussion there. He has done that here, so please follow up there if it leaves unanswered questions. —LVilla (WMF) (talk) 01:09, 5 February 2014 (UTC)

So, what is the purpose of all this?

The following discussion is closed: closing the top section given staleness but leaving unsampled logs area open will archive when both sections done. Jalexander--WMF 22:25, 18 December 2013 (UTC)

I've read the draft from beginning to end, and I have no idea what you wanted me as a user to get from it. What's the purpose, what does it improve compared to the much shorter and more concise current policy which provides very clear and straightforward protections such as the four (4) magic words «Sampled raw log data» (see also #Data retention above)? Is the purpose just adding tracking pixels and cookies for everyone, handwashing (see section above) and generally reducing privacy commitments for whatever reason? --Nemo 21:31, 4 September 2013 (UTC)

Hi Nemo, Thanks for your comment. I outlined some specific reasons for why we needed an update above. YWelinder (WMF) (talk) 01:12, 6 September 2013 (UTC)
See here for Yana's summary. Geoffbrigham (talk) 02:12, 6 September 2013 (UTC)
The summary only says things I already knew, because I read the text. What's missing is the rationale for such changes, or why the changes are supposed to be an improvement. One hint: are there good things that we are not or will not be able to do due to the current policy and what changes are proposed in consequence?
Additionally, the summary doesn't even summarise that well IMHO, e.g. the language about cookies is not very clear and you didn't write anything about making request logs unsampled (which means having logs of all requests a user makes). --Nemo 06:47, 6 September 2013 (UTC)
I've forwarded your question to our tech team. Relevant members of the tech team are out for a conference and will respond to this shortly.YWelinder (WMF) (talk) 01:04, 12 September 2013 (UTC)

Unsampled request logs/tracking

The following discussion is closed: closing as this section appears over (more discussion further down ), please reopen if necessary. Will archive shortly if not. Jalexander--WMF 11:10, 12 February 2014 (UTC)
Hey Nemo!
You have raised the question why we want the ability to store unsampled data and that’s a great question!
Two important use-cases come to mind. The first use case is funnel analysis for fundraising. As you know, we are 100% dependent on the donations by people like you -- people who care about the mission of the Wikimedia movement and who believe in a world in which every single human being can freely share in the sum of all knowledge.
We want to run the fundraiser for as short a time as possible without annoying people with banners. So it’s crucial to understand the donation funnel: when people are dropping out and why. We can only answer those kinds of questions if we store unsampled webrequest traffic.
The second use case is measuring the impact of Wikipedia Zero. Wikipedia Zero’s mission is to increase the number of people who can visit Wikipedia on their mobile phone without having to pay for the data charges: this is an important program that embodies our mission. Measuring the impact means knowing how many people (unique visitors) are benefiting from this program. If we can measure this then we can also be transparent to our donors in explaining how their money is used and how much impact their donations are making.
I hope this gives you a better understanding of why we need to store unsampled webrequest data. It is important to note that we will not build long historic reader profiles: the Data Retention Guidelines (soon to be released) will have clear limits on how long we will store this type of data.
Best regards,
(in my role as Product Manager Analytics @ WMF)
Drdee Drdee (talk) 23:03, 12 September 2013 (UTC)
Thank you for your answer. Note that this is only one of the unexplained points of the policy, though probably the most controversial one (and for some reason very well hidden), so I'm making a subsection. I'll wait for answers on the rest; at some point we should add at the top a notice of the expected improvements users should like this policy for (this is the only one mentioned so far apart from longer login duration, if I remember correctly).
Frankly, your answer is worse than anything I could have expected: are you seriously going to tell our half billion users that you want them to allow you to track every visit to our websites in order to target them better for donations and for the sake of some visitors of other domains (the mobile and zero ones)? This just doesn't work. I'm however interested in knowing more.
  • Why does fundraising require unconditional tracking of all visits to Wikimedia projects? If the aim is understanding the "donation funnel" (note: the vast majority of readers of this talk doesn't understand you when you talk like this), why can't they just use something like the ClickTracking done in 2009-2010 for the usability initiative, or the EventLogging which stores or should store only aggregate data (counts) of events like clicks of specific things?
  • I know that Wikipedia Zero has struggled to find metrics for impact measurement, but from what I understood we do have some metrics and they were used to confirm that "we need some patience". If we need more statistics so desperately as to desire tracking all our visitors, I assume other less dramatic options have been considered as well? For instance, surely the mobile operators know how much traffic they're giving out for free that they would otherwise charge for; how hard can it be for them to provide this number? (Of course I know it's not easy to negotiate with them; but we need to consider the alternatives.) --Nemo 06:51, 13 September 2013 (UTC)
Hi Nemo,
I think you are switching your arguments: first you ask why we would need to store unsampled webrequest data. You specifically asked "are there good things that we are not or will not be able to do due to the current policy and what changes are proposed in consequence?". I give you two use cases both being a type of funnel analysis that require unsampled data (the two use cases are btw not an exhaustive list). Then you switch gears by setting up a Straw man argument and saying that we will use it for better targeting of visitors. That's not what I said, if you read my response then I said we want to know when and why people drop out of a funnel.
The fact that you quote our half billion users indicates that we need unsampled data: we don't know for sure how many unique visitors we have :) We have to rely on third-party estimates. You see even you know of use-cases for unsampled data :)
Regarding Wikipedia Zero: the .zero. domain will soon be deprecated so that will leave us with only the .m. domain so we cannot restrict unsampled storage to .zero. In addition, most Wikipedia Zero carriers do not charge for .m. domains as well.
Regarding the Fundraising: I am answering your question and I am sure you know what a donation funnel is; I was not addressing the general public. EventLogging does not store aggregate data but raw unsampled data.
I am not sure how I can counter your argument 'This just doesn't work'.
Drdee (talk) 19:08, 18 September 2013 (UTC)
I'm sorry that you feel that way, I didn't intend to switch arguments. What does "We want to run the fundraiser as short as possible" mean if not that you want to extract more money out of the banners? That's the argument usually used by the fundraising team, that the higher the "ROI" is the shorter the campaign will be. If you meant something else I'm sorry, but then could you please explain what you meant?
I'm also sorry for my unclear "This just doesn't work"; I meant that in this section I'm asking why the users, with whom we have a contract, should agree to revise it: what do they gain ("what is the purpose")? I still don't see an answer. For instance, knowing for sure how many unique users we have is not a gain for them; it's just the satisfaction of a curiosity the WMF or wikimedians like me can have.
As for Zero, I don't understand your reply. Are you saying that yes, other ways to get usage stats were considered but only unsampled tracking works? And that I'm wrong when I assume that operators would know how much traffic they're giving for free? --Nemo 14:55, 27 September 2013 (UTC)
Hi Nemo, I'll let other folks chime in to articulate the needs for the Fundraiser and Zero, I am with you on the fact that Wikimedia should collect as little data as possible but let me expand on the point you make about "curiosity regarding UVs". Measuring reach in terms of uniques is more than just a matter of "curiosity". We currently rely on third-party data (comScore) to estimate unique visitors but there are many reasons why we want to reliably monitor high-level traffic data based on uniques. We recently obtained data about the proportion of entries from Google properties as part of a review of how much of our readership depends on search engines. I cite this example because any significant drop in search engine-driven traffic is likely to affect Wikimedia's ability to reach individual donors, new contributors and potential new registered users. Similarly, we intervened in the past to opt out of projects such as Google QuickView based on evidence that they were impacting our ability to reach and engage visitors by creating intermediaries between the user and the content. Using UV data (particularly in combination with User Agents) also helps us determine whether decisions we make about browser support affect a substantial part of our visitor population. As Diederik pointed out, EventLogging does collect unsampled behavioral data about user interaction with our websites to help us run tests and improve site performance and user experience. The exact data collected by EventLogging is specified in these schemas and is subject to the data retention guidelines that the Legal team is in the process of sharing. DarTar (talk) 20:23, 9 December 2013 (UTC)
Because we now have three separate parts of the discussion addressing unsampled logs, I suggested to Toby that he post a new section and we continue the discussion there. He has done that here, so please follow up there if it leaves unanswered questions. —LVilla (WMF) (talk) 01:08, 5 February 2014 (UTC)

Strip Wikimedia Data Collection to the Barest Minimum - Further Considerations

The following discussion is closed: closing as the discussion appears over, please reopen if necessary. Will archive shortly if not. Jalexander--WMF 19:24, 11 February 2014 (UTC)

Thanks Privacycomment for this post. I just want to add my perspective with some ideas on how to look at data-relevant processes in general and how to use the artificial differences in national laws on an action done in the physical or digital world.

  • First and foremost Wikipedia is a labor of love of knowledge nerds worldwide. This means that it is from an outside view an "international organization" much like the Red Cross - only to battle information disasters. This could be used to get servers and employees special status and protections under international treaties (heritage, information/press etc)
  • History teaches that those protections might not be a sufficient deterrent in heated moments of national political/legal idiocy, so Wikimedia should enact technical as well as content procedures to minimize the damage.

Data Protection

  • Collect as few data as possible and purge it as fast as possible. Period. You cannot divulge what you do not have.
  • Compartmentalize the data so that a breach - let's say in the US - does not automatically give access to data of other countries' userbases.
  • Play with laws: as there are a lot of protections well established when used against homes, or private property shape your installation and software to imitate those - no "official" central mail server that can be accessed with provider legislature, but a lot of private servers that are each protected and must be subpoenaed individually etc...
  • Offer a privacy wikipedia version that can only be accessed via tor - and where nothing is stored (I know this might be too much to admin against spam pros)
  • Use Perfect forward secrecy, hashes etc to create a situation, where most of the necessary information can be blindly validated without you having any possibility to actually see the information exchanged. This also helps with legal problems due to deniability. Again - compartmentalize.
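As a rough illustration of the "blindly validated" idea in the last bullet, the sketch below stores only a keyed hash (HMAC-SHA256) of a value, so that a later submission can be checked against it without the original value being kept. This is an assumption about what such a scheme could look like, not something proposed in detail in this thread; the function names `make_token` and `matches`, the key handling, and the example address are all hypothetical.

```python
import hashlib
import hmac
import secrets


def make_token(secret_key: bytes, value: str) -> str:
    """Return a keyed hash of `value`; the value itself is not stored."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def matches(secret_key: bytes, value: str, stored_token: str) -> bool:
    """Check a submitted value against a stored token in constant time."""
    return hmac.compare_digest(make_token(secret_key, value), stored_token)


# Hypothetical usage: only the token is retained, never the address.
key = secrets.token_bytes(32)
token = make_token(key, "reader@example.org")
same_person = matches(key, "reader@example.org", token)
someone_else = matches(key, "other@example.org", token)
```

Note that whoever holds the key can still test guesses against a token, so this limits what is stored rather than making the stored data impossible to link back to a person.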

Physical and digital infrastructure concerns

  • An internal organization along those lines and with the Red Cross as an example would offer a variety of possibilities when faced with legal threats: First and foremost, much like choosing where to pay taxes, one could quickly relocate the headquarters for a specific project to another legal system so that one can prove that e.g. the US national chapter of wikimedia has no possible way of influencing, let's say, the Icelandic chapter which happens to have a national project called wikipedia.org
  • Another important step in being an international and truly independent organization is to finally use the power of interconnected networks and distribute the infrastructure with liberal computer legislation in mind much more as is now the case. Not to compare the content - just the legal possibilities - of the megaupload case with those of wikimedia, as long as US authorities have physical access to most of the servers, they do not need to do anything but be creative with domestic laws to hurt the organisation and millions of international users, too...
  • If this might be too difficult, let users choose between different mirrors that also conform to different IT legislation

Information Activism

  • Focus on a secure mediawiki with strong crypto, which can be deployed by information activists

So: paranoia off. But the problem really is that data collected now can and will be abused in the next 10, if not 50-100 years. If we limit the amount of data and purge data, those effects can be minimized. No one knows if something that is perfectly legal to write now might not bite one in the ass if legislation is changed in the future.

Cheers, --Gego (talk) 13:53, 9 September 2013 (UTC)

Hi Gego,
The idea of having a secure mediawiki with strong crypto is a technical proposal and as such is best presented as an RFC on MediaWiki, but it's outside the scope of the new Privacy Policy.
Drdee (talk) 00:40, 7 November 2013 (UTC)
Hello @Gego: I appreciate the vision that you have for a privacy conscious Wikipedia, and I hope some of your points are consistent with our intention in the privacy policy's introduction. If you are reading Wikipedia, you can currently access it via Tor, and data dumps may enable more alternative opportunities to view Wikipedia. As Dario explained below, there are some practical benefits to collecting limited amounts of data, and we describe how that data is retained in the draft data retention guidelines. Building more distributed corporate infrastructure is a complex problem (for example, being present in multiple legal jurisdictions often increases the cost of compliance, and may require removing content for defamation, copyright, or other unexpected legal reasons), and it is probably beyond the scope of this draft update to the privacy policy. Thank you for raising these points, and I am glad you are thinking about these issues on a long term basis. Best, Stephen LaPorte (WMF) (talk) 23:53, 9 January 2014 (UTC)

Information about passive readers

The following discussion is closed: closing as it seems the discussion in this thread is over, more discussion below in thread started by Toby. Will archive shortly if still closed. Jalexander--WMF 00:30, 13 February 2014 (UTC)

There's a lot of discussion about the data collected from those who edit pages, but what about those who passively read Wikipedia? I can't figure out what's collected, how long it's stored, and how it's used.

Frankly I don't see why ANY personally identifiable information should EVER be collected from a passive reader. In the good old days when I went to the library to read the paper encyclopaedia, no one stood next to me with a clipboard noting every page I read or even flipped past. So why should you do that now?

I don't object to real time statistics collection, e.g., counting the number of times a page is read, listing the countries from which each page is read from at least once, that sort of thing. But update the counters in real time and erase the HTTP GET log buffer without ever writing it to disk. If you decide to collect some other statistic, add it to the real-time code and start counting from that point forward.

Please resist the strong urge to log every single HTTP GET just because you can, just in case somebody might eventually think of something interesting to do with it someday. This is EXACTLY how the NSA thinks and it's why they store such a terrifying amount of stuff. 2602:304:B3CE:D590:0:0:0:1 14:54, 10 September 2013 (UTC)
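
The real-time counting proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of how Wikimedia's servers work: the class `RealTimeStats`, its method `record`, and the idea that a country code has already been derived from the request before this point are all hypothetical.

```python
from collections import Counter, defaultdict


class RealTimeStats:
    """Hypothetical in-memory aggregator: only counters are kept,
    the individual request (IP address, timestamp, user agent) is never stored."""

    def __init__(self):
        self.page_views = Counter()                # page title -> number of reads
        self.countries_by_page = defaultdict(set)  # page title -> countries seen at least once

    def record(self, page, country):
        # Update the aggregates and return; nothing about the request is retained.
        self.page_views[page] += 1
        self.countries_by_page[page].add(country)


stats = RealTimeStats()
# Made-up sample requests as (page, country) pairs.
for page, country in [("Main_Page", "DE"), ("Main_Page", "US"),
                      ("Tor", "DE"), ("Main_Page", "DE")]:
    stats.record(page, country)

print(stats.page_views["Main_Page"])                 # 3
print(sorted(stats.countries_by_page["Main_Page"]))  # ['DE', 'US']
```

A new statistic added later would only count from the moment it is added, which is the trade-off the comment above accepts.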

2602, I will be linking to this comment from below but you may be interested in the section started at the bottom of the page at Tracking of visited pages. Jalexander (talk) 03:37, 11 September 2013 (UTC)

There are a number of use cases for collecting unsampled data, including generating detailed understandings of how readers interact with Wikipedia content and how this might change over time, finding and identifying very low frequency (but important) events, and looking at interactions with long-tail content that may reveal new sources of editors. But it's important to understand that we are interested in the behavior of Wikipedia readers in aggregate, not as individuals. TNegrin (WMF) (talk) 01:49, 19 December 2013 (UTC)

Dear 2602,
We need to store webrequest data for a very limited time from a security point of view: in case of a DDoS we need to be able to investigate where it originates and block some IP ranges. Sometimes we need to verify whether we are reachable from a certain country. And there are other use cases, so not storing webrequest data is not an option. The Data Retention guidelines, which will be published soon, will put clear timeframes on how long we can store webrequest data.
I hope this addresses your concern.
Best, Drdee (talk) 00:51, 7 November 2013 (UTC)
The current policy only allows sampled logs. Are you saying that the sysadmins are currently unable to protect the sites from DDoS? I never noticed.
Also, https://blog.archive.org/2013/10/25/reader-privacy-at-the-internet-archive/ , linked below, shows it definitely is an option. --Nemo 10:07, 8 November 2013 (UTC)
Fuller discussion of unsampled logs below now. As I state there, the current policy does allow unsampled logs. —LVilla (WMF) (talk) 01:10, 5 February 2014 (UTC)

English Wikipedia account creation procedures violate the WMF Privacy Policy

The following discussion is closed: closing as this discussion appears done, please reopen if necessary. Will archive shortly if not. Jalexander--WMF 11:14, 12 February 2014 (UTC)

http://wikimediafoundation.org/wiki/Privacy_policy#Discussions says, "Users are not required to list an email address when registering." I would like to inquire upon the requirement of the English Wikipedia to provide an email account when registering for an account as indicated with the instruction "The first is a username, and secondly, a valid email address that we can send your password to (please don't use temporary inboxes, or email aliasing, as this may cause your request to be rejected)." (Emphasis not added, it is originally in bold.) There is no option to register without giving an email address and this appears to be a prima facie violation of the WMF Privacy Policy. Thank you. 134.241.58.251 20:57, 21 January 2014 (UTC)

Hi, there is no violation of the privacy policy. An email is required to create an account on your behalf and for us to communicate the password across to you. Also, the email is not listed at all. It is kept on record for ~90 days per the Wikimedia's Data retention policy and is only displayed to the user who is creating the account plus other users who have a valid reason to view it. So, there is no violation, just technical limitation. John F. Lewis (talk) 21:04, 21 January 2014 (UTC)
The labs interface for creating an account (an unofficial method) is separate from MediaWiki's actual account creation (the official method) in that the former is used to request someone other than yourself use the latter to create the account for you by-proxy and then turn it over to you. This is potentially done for numerous reasons, including being affected by account-creation-blocked blocks (particularly from abusive schools with a shared IP), being unable to read CAPTCHAs, getting nabbed by the automated similar-account-name extension, and so forth. From the account creator standpoint, they'd ask the same questions and require the same information as needed on the labs interface. For example, as John F. Lewis said, an email account to email the newly created account's password to is one obvious reason. However, when it comes to abusive shared IPs, an email address, especially from the blocked school or other organization, may be the only way to distinguish one user from another. As such, providing an email address in that instance would be the difference between someone actually being willing to create the account for you versus being summarily declined as just another abusive user from that institution. --slakr 21:45, 21 January 2014 (UTC)
This does present a very clear and unambiguous violation of the WMF Privacy Policy. It is unclear how anyone could 1) defend it and 2) allow it to continue. 24.61.9.111 00:03, 22 January 2014 (UTC)

Thank you for your comments! John F. Lewis and Slakr are correct that there are valid reasons for Labs to request an email address when you create an account (sending a password to you, etc.). This is not a violation of the Privacy Policy because when the Policy states that you are only required to provide a user name and password to create an account, it is referring to a standard account. Labs accounts are non-standard accounts, and so it is not a violation of the Policy to require users to provide more information. RPatel (WMF) (talk) 20:36, 22 January 2014 (UTC)

@RPatel (WMF): You are misinterpreting the discussion. The question was about "standard accounts" (or as I would call them MediaWiki/SUL accounts), but those that are created by request. Have you ever heard of w:WP:ACC? Basically you request that they create an account for you and provide an email so they can give you a temp password. The account created is a standard account, no different from any other Wikipedia/Wiktionary/Wikibooks/etc. account except for how it was created (non-standard creation method?). This discussion is not about Labs or wikitech accounts, which are very different from standard accounts. PiRSquared17 (talk) 21:46, 22 January 2014 (UTC)
Hi PiRSquared17, you’re right that I was responding to only one part of the discussion. To answer the initial question (which links to the Labs account creation page), Labs accounts are non-standard accounts and may require additional information to create. To create a standard account, a user generally does not have to provide more than a username and password, but, as you pointed out, if an individual is unable to complete the CAPTCHA or has an issue getting his or her desired username, then there is a process run by ACC that requires that individual to provide an email address to create an account. To cover this exception, what if we added the following language to the FAQ:
Typically you do not need to provide more than a username and password to create a standard account with WMF; however, if you create a standard account using a system run by a third party, then you may be required to provide additional information, such as an email address.
Also, to address 24's comment, while we certainly hope that in the future we can come up with a different way for those who have difficulties with CAPTCHA to create an account, we do not view the ACC process as violating the Privacy Policy. Please let me know if you have any thoughts, thank you!
Thank you, all, for the above replies. Regarding this standard vs non-standard account business all I am referring to is making an actual, regular, no-nonsense account for the English Wikipedia. I cannot use the above linked account creator system because it requires an email account. This requirement is a clear and unambiguous violation of the Wikimedia Foundation Privacy Policy. Full stop. Any suggestion that it is not a violation due to some sort of disconnect between the account creation system and the Wikipedia is absurd in the extreme and entirely rejected. Until such a time that the email requirement is obviated there will be and continues to be a privacy policy violation. 134.241.58.251 15:25, 23 January 2014 (UTC)
And it will be removed, when at such a time the MediaWiki interface evolves to automatically know who you are and log you in without a password. Until then, an email is required to send you the password. John F. Lewis (talk) 15:54, 23 January 2014 (UTC)
When there is a conflict between a technical process and the privacy policy the technical component must be immediately removed, with no delay whatsoever until such a time that the technical component can be brought into compliance with the privacy policy. Certainly you agree that privacy trumps technical widgets? 134.241.58.251 16:31, 28 January 2014 (UTC)
I'm sorry I misunderstood your original question and thought you were linking to a Labs account creation page. But please note that the link is still to a third-party process for making a standard account. We have adjusted the Policy and the FAQ section to reflect that third-party processes may require you to provide information such as an email address. RPatel (WMF) (talk) 20:34, 28 January 2014 (UTC)
Hello. You're confusing me a bit, RPatel. Nobody else in this discussion but you brought up Labs accounts. https://accounts.wmflabs.org/ is for ACC (to request an account on enwiki, not a Labs account, a real standard account). "To answer the initial question (which links to the Labs account creation page), Labs accounts are non-standard accounts". No, it links to the ACC interface, which is for normal standard old MediaWiki SUL enwiki accounts, not Labs/Gerrit or any other type of account from what I can tell. Maybe I'm misinterpreting your comments. Anyway, your addition would solve the problem IMO. PiRSquared17 (talk) 15:59, 23 January 2014 (UTC)
Hi PiRSquared17, you did not misinterpret; I misunderstood the initial question (because of the "labs" part of the URL and being unfamiliar with the ACC process). Sorry about that. Nevertheless, the language above was changed in the Policy and the FAQ section, so that the ACC process is not in violation. Thanks for your help in clarifying the question for me. RPatel (WMF) (talk) 20:34, 28 January 2014 (UTC)

"Hide the information in your user page"

The following discussion is closed: closing as it appears resolved/set please reopen if not. Will archive shortly if still closed. Jalexander--WMF 00:31, 13 February 2014 (UTC)

The section "Account Information & Registration" has the following sentence: Once created, user accounts cannot be removed entirely (although you may hide the information in your user page). As far as I'm concerned, the part in brackets is simply untrue. Non-admins do not have the ability to remove revisions of their user page from public view, and if a user's user page is protected or that user is blocked, they are also unable to remove any information from the current revision. darkweasel94 (talk) 20:31, 28 January 2014 (UTC)

Maybe they mean blanking the page or adding NOINDEX. PiRSquared17 (talk) 18:08, 1 February 2014 (UTC)
Not even that can necessarily be done by everybody, depending on the wiki's sysop decisions. That's my point. darkweasel94 (talk) 18:14, 1 February 2014 (UTC)
Actually, what was meant by "hide the information" was to blank the page. That's true for the vast majority of cases. If a user's user page is protected from being edited by themselves, I would hope that a request using a local {{edit protected}} or similar template would be honored. I would think that the very few circumstances where it might not are extreme edge cases, no? Philippe (WMF) (talk) 01:19, 4 February 2014 (UTC)
If it's written in a privacy policy, users expect it to be always true, not just in the vast majority of cases. Things that depend on local admins' goodwill shouldn't be in a privacy policy. darkweasel94 (talk) 07:18, 4 February 2014 (UTC)
Hi darkweasel94 and PiRSquared17. Thank you for your suggestion. I have edited the policy draft so that it reads: "Once created, user accounts cannot be removed entirely (although you can usually hide the information on your user page if you choose to)." Hope that addresses your concerns. Mpaulson (WMF) (talk) 21:44, 11 February 2014 (UTC)
I think it would be preferable to remove the part in brackets entirely, but it's good enough. Thank you. darkweasel94 (talk) 21:52, 11 February 2014 (UTC)

Summaries making even longer?

The following discussion is closed: closing as this discussion appears done, please reopen if necessary. Will archive shortly if not. Jalexander--WMF 11:14, 12 February 2014 (UTC)

Tell me I'm dreaming, please: [7]. I proposed summaries on the page side to take away all those parts of the main text which are not a policy but rather digressions, and now you add even more? And on the same page, so that translators can't even choose to translate either the summaries or the meat, but have to translate both? When will you be satisfied, when this already-weak proposal is ten times as long as the current policy and no single language has a complete translation? --Nemo 08:44, 29 January 2014 (UTC)

As we've pointed out repeatedly, if you want a short privacy policy, there are three options: the policy can allow you to collect nothing, it can confuse people about what you collect, or it can allow you to collect everything. None of those are satisfying options, so we have a long policy. I realize you disagree with our choice to go into great detail about what we collect and how we handle it, but that discussion is settled. (You're still welcome to propose changes that shorten and clarify; after you pushed me on one section, I was able to take 200 words out, which I appreciate. Deleting the whole section, as you originally proposed, is still not welcome :)
There were multiple requests above for summaries explaining each section, and it is considered a good practice to summarize these (admittedly long) sections to help people find the relevant and important parts. That's why we added them. I agree and sympathize that this is a difficult challenge for the translators, but that is only one factor among many. (I know Michelle also has plans with regards to getting professional translators for the final draft, as we did with the initial draft, but I'm not sure what those are so I'll let her speak to that.) —LVilla (WMF) (talk) 01:06, 5 February 2014 (UTC)
We understand the length of the policy presents challenges for translators. We are hoping to alleviate some of this pressure by providing professional translations, when the policy is finalized and adopted by the Board, in the same languages we originally made the draft available in. Of course, these translations can always be improved upon, so we are grateful to the community for any additional help they can provide in improving the professional translations. We wish we could provide professional translations in every language for the community to use as a starting point, but unfortunately, professional translations are extremely expensive to the point that it's cost-prohibitive. Mpaulson (WMF) (talk) 01:13, 5 February 2014 (UTC)

Display IP numbers establish a two-class privacy policy

The following discussion is closed: closing as this discussion appears done, please reopen if necessary. Will archive shortly if not. Jalexander--WMF 11:14, 12 February 2014 (UTC)

I am strictly against the public display of IP addresses of users who are not logged in. Even users who are accidentally not logged in, for technical or non-technical reasons, are victims of this unfairness. The line "However, if you contribute without signing in, your contribution will be publicly attributed to the IP address associated with your device." reveals the special disadvantage of not signing in. But there is no real reason why. Of course, there is no technical reason, because the separation of presentation, workflow and hardware control is part of every modern software development. It seems the only good reason for displaying the IP addresses is to build up pressure to get an account - like the commercial part of the internet does... --Gamma (talk) 16:10, 31 January 2014 (UTC)

How would you suggest to tell different logged-out users apart if not through their IP address? It's not particularly hard to make an account. If you can type two matching passwords, you're in. darkweasel94 (talk) 17:53, 31 January 2014 (UTC)

Hm, if you are a vandal, you currently have to think twice before you make destructive edits. At least at work or at school, your boss or teacher will be able to identify the IPs associated with their system. From that point of view, a public IP can protect the WP. Alexpl (talk) 08:12, 1 February 2014 (UTC)

Hi Gamma, thank you for your comment. I just wanted to refer you to a discussion earlier in the consultation period about hiding the IP addresses of users. That discussion also includes a response from Erik (Eloquence), our head of engineering. Hope that addresses your concern. RPatel (WMF) (talk) 00:51, 4 February 2014 (UTC)

Electronic Frontier Foundation's Best Practices for Online Service Providers

The following discussion is closed: closing as this discussion appears done, please reopen if necessary. Will archive shortly if not. Jalexander--WMF 11:15, 12 February 2014 (UTC)

Does WMF currently follow the best practices described by the EFF https://www.eff.org/wp/osp ? Does the proposed text ensure it does/will? --Nemo 12:10, 4 February 2014 (UTC)

We consulted the document when first drafting the policy, and actually went to EFF's offices to talk about it, but I haven't looked at it since then. My recollection is that our compliance is mixed: some things this draft complies with (e.g., on subpoenas/warrants; privacy policies generally); some things we don't (e.g., Safe Harbor, as already discussed; the discussion on opt-out, which I think is now obsolete because of DNT); some things we don't currently but will (e.g., data obfuscation/aggregation, some of the recommendations on HTTPS). We're also aiming to hit six stars on their Who Has Your Back report next year. —LVilla (WMF) (talk) 00:29, 5 February 2014 (UTC)
Thanks. That document is not comprehensive, nor 100 % relevant to WMF, nor up to date, but it does contain some suggestions which make me think it would be nice to be able to say that we follow its suggestions, or their principles, where relevant. Random example: "Obscuring the third, fourth and sixth octets of all MAC addresses" is probably not relevant but something similar could be done by analogy for unique device identification numbers.
Tabular reports and rankings are a nice comprehension tool, indeed: nice to hear about the 6 "Who Has Your Back" stars plans, also nice would be a good score at the Encrypt the Web Report. --Nemo 21:57, 5 February 2014 (UTC)
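
The EFF suggestion quoted above ("Obscuring the third, fourth and sixth octets of all MAC addresses") could look roughly like the sketch below. The function name `obscure_mac`, the choice to zero the octets rather than hash or randomize them, and the sample address are assumptions made for illustration only; nothing here reflects what WMF actually does with device identifiers.

```python
def obscure_mac(mac, octets=(3, 4, 6)):
    """Return the MAC address with the given octets (counted from 1) zeroed out.

    Hypothetical helper: zeroing is just one possible way of obscuring.
    """
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError("expected six colon-separated octets")
    for n in octets:
        parts[n - 1] = "00"
    return ":".join(parts)


# Made-up sample address.
print(obscure_mac("00:1a:2b:3c:4d:5e"))  # 00:1a:00:00:4d:00
```

The same idea, dropping or masking part of an identifier before it is stored, would apply by analogy to other unique device identifiers.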

Data Retention Guidelines posted

We're happy to announce that the first draft of the new data retention guidelines is now available for your review, feedback, and translation. This draft is the result of a collaboration between many teams within the Foundation, including Analytics, Operations, Platform, Product, and Legal.

As with the other privacy documents, this draft is just that: a draft. We want to hear from you about how we can make it better. As suggested in the discussion about timelines above, we plan to hold the community consultation period for this draft open until 14 February 2014.

Thanks - looking forward to the discussion. —LVilla (WMF) (talk) 21:30, 9 January 2014 (UTC)

Great to see. I've commented on Talk:Data retention guidelines. //Shell 00:30, 10 January 2014 (UTC)

Non-English speakers beg some love

Just two weeks before the proposed end date for this consultation, the translation statistics are very depressing: the top language is French with 80 % translated, only 5 languages are more than 2/3 translated. This means that 4 months have been wasted not involving the global community in the discussion and not spotting translatability issues that will bite later.
One obvious reason here is that the draft is a +200 % length increase compared to the current policy: otherwise, we'd have roughly 14 languages fully translated instead of 0. If the WMF staff is serious about making a privacy policy that people can understand, well of course that's not easy and it probably entails rewriting it from scratch under new premises, to embed the initial feedback received so far and reduce the length by about 66 %. --Nemo 10:28, 30 December 2013 (UTC)

If the document is too long, have you tried editing it down? :-) --MZMcBride (talk) 10:30, 30 December 2013 (UTC)
I've made some specific edit proposals but Luis declared they were not "serious" (though in the end he did make some minor changes). I didn't bother making more. --Nemo 10:32, 30 December 2013 (UTC)
If you want to suggest edits that make the language clearer without changing the policy, by all means - many people have done that and gotten changes in, including you, and the policy is better (and shorter) for it. (The unserious suggestion, if I recall correctly, essentially amounted to removing an entire section of the document.) -LVilla (WMF) (talk)
I don't know what's aggregated in this group but the policy document is fully translated into several languages, including French and German. — Pajz (talk) 12:06, 30 December 2013 (UTC)
No it isn't, the "More On What This Privacy Policy Doesn’t Cover" section is not translated at all in German. The only group you should use is the complete one I linked above. --Nemo 20:14, 30 December 2013 (UTC)
And how is one supposed to do that (see #(Technical:) Cannot translate navigation box title above)? — Pajz (talk) 22:11, 30 December 2013 (UTC)
Like the rest of the policy, this was originally translated into formal German (as well as four other widely-used languages) by paid translators. It is now untranslated because we rewrote the whole section, making it 1/6th shorter (at Nemo's request and in part based on his suggestions), and the translators haven't caught up. This is unfortunate, but also unavoidable when you make changes to a translated document.
We can't have the professional translators re-translate every time we make a change - besides the money, the overhead of entering the new translations in with every change would be huge. (And my understanding is that the volunteer translators often aren't happy with the quality of the professional translations anyway :) So we can either slow the editing of the policy, or accept that sometimes sections of it will not be translated while we're discussing it. I admit neither option is ideal, but I think we have made the right choice in leaning towards fast changes.
If there are things we're doing wrong that are hindering the volunteer translators, I'm happy to listen to suggestions on that front - I do think we've been fixing translation software mistakes as quickly as we can, but if not, let me know. -LVilla (WMF) (talk) 01:52, 31 December 2013 (UTC)
If it's really essential that we get this translated beyond 5 languages, why don't we just pay for translation in the top 10-20 languages? This is too important to wait on. Steven Walling (WMF) • talk 00:53, 31 December 2013 (UTC)
Hi Nemo, what's stopping you from sending out a call for volunteer translators via the usual two channels - the translation notifications system and/or the Translators-l mailing list? As far as I can see, neither has been done in this case so far, so it's likely that many interested translators do not yet know about this translation task. (Regarding the first channel, there is a little technical issue in that notifications can only be sent for translatable pages, not for aggregated groups - cf. bug 56187 - but that can be mitigated by sending at least a notification for the main page, and linking the aggregated group in the accompanying text message.)
Regards, Tbayer (WMF) (talk) 03:42, 31 December 2013 (UTC)
Done I eventually sent out a translation notification myself (with an emphasis on main text of the privacy policy, to help translators focus their energy, but also inviting translation of the whole group). Many thanks to the volunteers who have since then already translated or updated around 500 translation units; hopefully more will be done over the coming days. So if it's really the case that there are indeed still serious translation problems remaining, we should have a good chance to uncover them before the deadline a week from now.
BTW, we are planning to do the same for the draft for the new data retention policy, which is going to be published soon.
Regards, Tbayer (WMF) (talk) 23:34, 8 January 2014 (UTC)

Dearchived, I don't see any solution here. My eleemosynary skills are clearly lacking: be careful to beg a coin, you may get a kick and a punch. --Nemo 09:07, 29 January 2014 (UTC)

Handling our user data - an appeal

Preface (Wikimedia Deutschland)

For several months, there have been regular discussions on data protection and the way Wikimedia deals with it, in the German-speaking community – one of the largest non-English-speaking communities in the Wikimedia movement. Of course, this particularly concerns people actively involved in Wikipedia, but also those active on other Wikimedia projects.

The German-speaking community has always been interested in data protection. However, this particular discussion was triggered when the Deep User Inspector tool on Tool Labs nullified a long-respected agreement in the Toolserver, that aggregated personalized data would only be available after an opt-in by the user.

As the Wikimedia Foundation is currently reviewing its privacy policy and has requested feedback and discussion here by 15 January, Wikimedia Deutschland has asked the community to draft a statement. The text presented below was largely written by User:NordNordWest and signed by almost 120 people involved in German Wikimedia projects. It highlights the many concerns and worries of the German-speaking community, so we believe it can enhance the discussion on these issues. We would like to thank everyone involved.

This text was published in German simultaneously in the Wikimedia Deutschland-blog and in the Kurier, an analogue to the English "Signpost". This translation has been additionally sent as a draft to the WMF movement-blog.

(preface Denis Barthel (WMDE) (talk), 20.12.)

Starting position

The revelations by Edward Snowden and the migration of programs from the Toolserver to ToolLabs prompted discussions among the community on the subject of user data and how to deal with it. On the one hand, a diverse range of security features are available to registered users:

  • Users can register under a pseudonym.
  • The IP address of registered users is not shown. Only users with CheckUser permission can see IP addresses.
  • Users have a right to anonymity. This includes all types of personal data: names, age, background, gender, family status, occupation, level of education, religion, political views, sexual orientation, etc.
  • As a direct reaction to Snowden’s revelations, the HTTPS protocol has been used as standard since summer 2013 (see m:HTTPS), so that, among other things, it should no longer be visible from outside which pages are called up by which users and what information is sent by a user.

On the other hand, however, all of a user’s contributions are recorded with exact timestamps. Access to this data is available to everyone and allows the creation of user profiles. While the tools were running on the Toolserver, user profiles could only be created from aggregated data with the consent of the user concerned (opt-in procedure). This was because the Toolserver was operated by Wikimedia Deutschland and therefore subject to German data protection law, one of the strictest in the world. However, evaluation tools that were independent of the Foundation and any of its chapters already existed.

One example is Wikichecker, which, however, only concerns English-language Wikipedia. The migration of programs to ToolLabs, which means that they no longer have to function in accordance with German data protection law, prompted a survey of whether a voluntary opt-in system should still be mandatory for X!’s Edit Counter or whether opt-in should be abandoned altogether. The survey resulted in a majority of 259 votes for keeping opt-in, with 26 users voting for replacing it with an opt-out solution and 195 in favor of removing it completely. As a direct reaction to these results, a new tool – Deep User Inspector – was programmed to provide aggregated user data across projects without giving users a chance to object. Alongside basic numbers of contributions, the tool also provides statistics on, for example, the times on weekdays when a user was active, lists of voting behavior, or a map showing the location of subjects on which the user has edited articles. This aggregation of data allows simple inferences to be made about each individual user. A cluster of edits on articles relating to a certain region, for example, makes it possible to deduce where the user most probably lives.

Problems

Every user knows that user data is recorded every time something is edited. However, there is a significant difference between a single data set and the aggregated presentation of this data. Aggregated data means that the user’s right to anonymity can be reduced, or, in the worst case, lost altogether. Here are some examples:

  • A list of the times that a user edits often allows a deduction to be made as to the time zone where he or she lives.
  • From the coordinates of articles that a user has edited, it is generally possible to determine the user’s location even more precisely. It would be rare for people to solely edit area X, when in fact they came from area Y.
  • The most precise deductions can be made by analyzing the coordinates of a photo location, as it stands to reason that the user must have been physically present to take the photo.
  • Places of origin and photo locations can reveal information on the user’s means of transport (e.g. whether someone owns a car), as well as on his or her routes and times of travel. This makes it possible to create movement profiles on users who upload a large number of photos.
  • Time analyses of certain days of the year allow inferences to be drawn about a user’s family status. It is probable, for example, that those who tend not to edit during the school holidays are students, parents or teachers.
  • Assumptions on religious orientation can also be made if a user tends not to edit on particular religious holidays.
  • Foreign photo locations either reveal information about a user’s holiday destination, and therefore perhaps disclose something about his or her financial situation, or suggest that the user is a photographer.
  • If users work in a country or a company where editing is prohibited during working hours, they are particularly vulnerable if the recorded time reveals that they have been editing during these hours. In the worst-case scenario, somebody who wishes to harm the user and knows extra information about his or her life (which is not unusual if someone has been an editor for several years) could pass this information on to the user’s employer. Disputes within Wikipedia would thus be carried over into real life.
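The first example in the list above can be made concrete. The following is a minimal sketch of how a time zone might be guessed from nothing more than public edit timestamps. It is not taken from any existing tool: the function names, the six-hour window, and the assumption that a user's quietest hours fall between local midnight and 06:00 are hypothetical choices made for this illustration only.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_histogram(timestamps):
    """Count edits per UTC hour (index 0-23). Timestamps must be timezone-aware."""
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def quietest_window_start(histogram, window=6):
    """Return the UTC hour at which the least active `window`-hour span begins."""
    def window_total(start):
        return sum(histogram[(start + i) % 24] for i in range(window))
    return min(range(24), key=window_total)


def guess_utc_offset(timestamps, assumed_sleep_start=0, window=6):
    """Guess a UTC offset, ASSUMING the quiet span starts at local `assumed_sleep_start`."""
    start = quietest_window_start(hourly_histogram(timestamps), window)
    offset = (assumed_sleep_start - start) % 24
    # Fold the result into the range -11..+12 hours.
    return offset - 24 if offset > 12 else offset


if __name__ == "__main__":
    # Hypothetical data: one edit in every UTC hour from 05:00 to 22:00.
    edits = [datetime(2013, 12, 1, h, 0, tzinfo=timezone.utc) for h in range(5, 23)]
    print(guess_utc_offset(edits))
```

Such a guess is only a heuristic: shift workers, bots, and users who edit at night would all be misjudged. The point is how little code and data the inference needs.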

Suggestions

Wikipedia is the fifth most visited website in the world. The way it treats its users therefore serves as an important example to others. It would be illogical and ridiculous to increase user protection on the one hand but, on the other hand, to allow users’ right to anonymity to be eroded. The most important asset that Wikipedia, Commons and other projects have is their users. They create the content that has ensured these projects’ success. But users are not content, and we should make sure that we protect them. The Wikimedia Foundation should commit to making the protection of its registered users a higher priority and should take the necessary steps to achieve this. Similarly to the regulations for the Toolserver, it should first require an opt-in for all the tools on its own servers that compile detailed aggregations of user data. Users could do this via their personal settings, for example. Since Wikipedia was founded in 2001, the project has grown without any urgent need for these kinds of tools, and at present there seems to be no reason why this should change in the future. By creating free content, the community enables Wikimedia to collect the donations needed to run WikiLabs. That this should lead to users losing their right of anonymity, although the majority opposes this, is absurd. To ensure that user data are not evaluated on non-Wikimedia servers, the Foundation is asked to take the following steps:

  • Wikipedia dumps should no longer contain any detailed user information. The license only requires the name of the author and not the time or the day when they edited.
  • There should only be limited access to user data on the API.
  • It might be worth considering whether or not it is necessary or consistent with project targets to store and display the IP addresses of registered users (if they are stored), as well as precise timestamps that are accurate to the minute of all their actions. The time limit here could be how long it reasonably takes CheckUsers to make a query. After all, data that are not available cannot be misused for other purposes.

Original signatures

  1. Martina Disk. 21:28, 24. Nov. 2013 (CET)
  2. NNW 18:52, 26. Nov. 2013 (CET)
  3. ireas :disk: 19:23, 26. Nov. 2013 (CET)
  4. Henriette (Diskussion) 19:24, 26. Nov. 2013 (CET)
  5. Raymond Disk. 08:38, 27. Nov. 2013 (CET)
  6. Richard Zietz 22:18, 27. Nov. 2013 (CET)
  7. Alchemist-hp (Diskussion) 23:47, 27. Nov. 2013 (CET)
  8. Lencer (Diskussion) 11:54, 28. Nov. 2013 (CET)
  9. Smial (Diskussion) 00:09, 29. Nov. 2013 (CET)
  10. Charlez k (Diskussion) 11:55, 29. Nov. 2013 (CET)
  11. elya (Diskussion) 19:07, 29. Nov. 2013 (CET)
  12. Krib (Diskussion) 20:26, 29. Nov. 2013 (CET)
  13. Jbergner (Diskussion) 09:36, 30. Nov. 2013 (CET)
  14. TMg 12:55, 30. Nov. 2013 (CET)
  15. AFBorchertD/B 21:22, 30. Nov. 2013 (CET)
  16. Sargoth 22:06, 2. Dez. 2013 (CET)
  17. Hilarmont 09:27, 3. Dez. 2013 (CET)
  18. --Poldine - AHA 13:09, 3. Dez. 2013 (CET)
  19. XenonX3 – (RIP Lady Whistler) 13:11, 3. Dez. 2013 (CET)
  20. -- Ra'ike Disk. LKU WPMin 13:19, 3. Dez. 2013 (CET)
  21. --muns (Diskussion) 13:22, 3. Dez. 2013 (CET)
  22. --Hubertl (Diskussion) 13:24, 3. Dez. 2013 (CET)
  23. --Aschmidt (Diskussion) 13:28, 3. Dez. 2013 (CET)
  24. Anika (Diskussion) 13:32, 3. Dez. 2013 (CET)
  25. K@rl 13:34, 3. Dez. 2013 (CET)
  26. --DaB. (Diskussion) 13:55, 3. Dez. 2013 (CET) (Even though I find the part about the dumps somewhat exaggerated.)
  27. --AndreasPraefcke (Diskussion) 14:05, 3. Dez. 2013 (CET) The part about the dumps in particular is important, and this information should not be displayed on the Wikipedia websites either. Roughly like this (not thought through in detail, just a rough idea): edits from today: displayed to the second, as before; edits from this week: to the minute; edits from the last six weeks: to the hour; edits from the last 12 months: to the day; edits before that: to the month. The order must of course be preserved; edits and the pure reverts that follow them: removed from the database entirely)
    Despite legitimate interests in data protection, one should not forget that this kind of date/time truncation is a double-edged sword. Imports of revision histories on the one hand and copyright violation checks on the other would become considerably more difficult ;-) -- Ra'ike Disk. LKU WPMin 14:19, 3. Dez. 2013 (CET) (although for the latter, display to the day would suffice for comparison with the web archive)
  28. --Mabschaaf 14:08, 3. Dez. 2013 (CET)
  29. --Itti 14:28, 3. Dez. 2013 (CET)
  30. ...Sicherlich Post 14:52, 3. Dez. 2013 (CET)
  31. --Odeesi talk to me rate me 16:29, 3. Dez. 2013 (CET)
  32. --gbeckmann Diskussion 17:23, 3. Dez. 2013 (CET)
  33. --Zinnmann d 17:24, 3. Dez. 2013 (CET)
  34. --Kolossos 17:41, 3. Dez. 2013 (CET)
  35. -- Andreas Werle (Diskussion) (today, for once, "without" a timestamp...)
  36. --Gleiberg (Diskussion) 18:03, 3. Dez. 2013 (CET)
  37. --Jakob Gottfried (Diskussion) 18:30, 3. Dez. 2013 (CET)
  38. --Wiegels „…“ 18:55, 3. Dez. 2013 (CET)
  39. --Pyfisch (Diskussion) 20:29, 3. Dez. 2013 (CET)
  40. -- NacowY Disk 23:01, 3. Dez. 2013 (CET)
  41. -- RE rillke fragen? 23:17, 3. Dez. 2013 (CET) Yes. Of course, not only the API but also the "normal pages" (index.php) should have a (sensible) limit. I reject restricting end-user applications through policies, just as I reject hasty action. A lot will have to be weighed up, and exceptions may have to be created for certain user groups, or new ways of presenting data found. As far as I know, CheckUser data is automatically deleted after 3 months: see User:Catfisheye/Fragen_zur_Checkusertätigkeit_auf_Commons#cite_ref-5
  42. --Christian1985 (Disk) 23:25, 3. Dez. 2013 (CET)
  43. --Jocian 04:45, 4. Dez. 2013 (CET)
  44. -- CC 04:50, 4. Dez. 2013 (CET)
  45. --Don-kun Diskussion 07:10, 4. Dez. 2013 (CET)
  46. --Zeitlupe (Diskussion) 09:09, 4. Dez. 2013 (CET)
  47. --Geitost 09:25, 4. Dez. 2013 (CET)
  48. Everywhere West (Diskussion) 09:29, 4. Dez. 2013 (CET)
  49. -jkb- 09:29, 4. Dez. 2013 (CET)
  50. -- Wurmkraut (Diskussion) 09:47, 4. Dez. 2013 (CET)
  51. Simplicius Hi… ho… Diderot! 09:53, 4. Dez. 2013 (CET)
  52. --Hosse Talk 12:49, 4. Dez. 2013 (CET)
  53. Port(u#o)s 12:57, 4. Dez. 2013 (CET)
  54. --Howwi (Diskussion) 14:26, 4. Dez. 2013 (CET)
  55.  — Felix Reimann 17:17, 4. Dez. 2013 (CET)
  56. --Bubo 18:30, 4. Dez. 2013 (CET)
  57. --Coffins (Diskussion) 19:22, 4. Dez. 2013 (CET)
  58. --Firefly05 (Diskussion) 20:09, 4. Dez. 2013 (CET)
  59. The point is to make clear the principle and the rule-and-exception scheme. --Björn 20:13, 4. Dez. 2013 (CET)
  60. --V ¿ 21:46, 4. Dez. 2013 (CET)
  61. --Merlissimo 21:59, 4. Dez. 2013 (CET)
  62. --Stefan »Στέφανος«  22:02, 4. Dez. 2013 (CET)
  63. -<)kmk(>- (Diskussion) 22:57, 4. Dez. 2013 (CET)
  64. --lutki (Diskussion) 23:06, 4. Dez. 2013 (CET)
  65. -- Ukko 23:22, 4. Dez. 2013 (CET)
  66. --Video2005 (Diskussion) 02:17, 5. Dez. 2013 (CET)
  67. --Baumfreund-FFM (Diskussion) 07:30, 5. Dez. 2013 (CET)
  68. --dealerofsalvation 07:35, 5. Dez. 2013 (CET)
  69. --Gripweed (Diskussion) 09:32, 5. Dez. 2013 (CET)
  70. --Sinuhe20 (Diskussion) 10:05, 5. Dez. 2013 (CET)
  71. --PerfektesChaos 10:22, 5. Dez. 2013 (CET)
  72. --Tkarcher (Diskussion) 13:51, 5. Dez. 2013 (CET)
  73. --BishkekRocks (Diskussion) 14:43, 5. Dez. 2013 (CET)
  74. --PG ein miesepetriger Badener 15:34, 5. Dez. 2013 (CET)
  75. --He3nry Disk. 16:32, 5. Dez. 2013 (CET)
  76. --Sjokolade (Diskussion) 18:15, 5. Dez. 2013 (CET)
  77. --Lienhard Schulz Post 18:43, 5. Dez. 2013 (CET)
  78. --Kein Einstein (Diskussion) 19:35, 5. Dez. 2013 (CET)
  79. --Stefan (Diskussion) 22:19, 5. Dez. 2013 (CET)
  80. --Rauenstein 22:58, 5. Dez. 2013 (CET)
  81. --Anka Wau! 23:45, 5. Dez. 2013 (CET)
  82. --es grüßt ein Fröhlicher DeutscherΛV¿? Diskussionsseite 06:42, 6. Dez. 2013 (CET)
  83. --Doc.Heintz 08:55, 6. Dez. 2013 (CET)
  84. --Shisha-Tom without a time of day, 6. Dez. 2013
  85. --BesondereUmstaende (Diskussion) 14:57, 6. Dez. 2013 (CET)
  86. --Varina (Diskussion) 16:37, 6. Dez. 2013 (CET)
  87. --Studmult (Diskussion) 17:30, 6. Dez. 2013 (CET)
  88. --GT1976 (Diskussion) 20:51, 6. Dez. 2013 (CET)
  89. --Wikifreund (Diskussion) 22:04, 6. Dez. 2013 (CET)
  90. --Wnme 23:07, 6. Dez. 2013 (CET)
  91. -- ST 00:47, 7. Dez. 2013 (CET)
  92. --Flo Beck (Diskussion) 13:45, 7. Dez. 2013 (CET)
  93. IW 16:34, 7. Dez. 2013 (CET)
  94. --Blech (Diskussion) 17:48, 7. Dez. 2013 (CET)
  95. --Falkmart (Diskussion) 18:21, 8. Dez. 2013 (CET)
  96. --Partynia RM 22:53, 8. Dez. 2013 (CET)
  97. --ElRaki 01:09, 9. Dez. 2013 (CET) delete as much user data as possible/keep as little user data as absolutely necessary
  98. --user:MoSchle--MoSchle (Diskussion) 03:57, 9. Dez. 2013 (CET)
  99. --Daniel749 Disk. (STWPST) 16:32, 9. Dez. 2013 (CET)
  100. --Knopfkind 21:19, 9. Dez. 2013 (CET)
  101. --Saibot2 (Diskussion) 23:14, 9. Dez. 2013 (CET)
  102. --Atlasowa (Diskussion) 15:03, 10. Dez. 2013 (CET) However, the appeal is equally addressed to WMDE, which decided to shut down the Toolserver and thereby made the development towards the DUI possible. Merely acting as a mail carrier to the WMF is not enough. If WMDE can commission expert reports on the culture of donating in Germany in order to lobby the WMF for its own fundraising, then WMDE can surely also commission expert reports on German/European data protection.
  103. ----Fussballmann Kontakt 21:38, 10. Dez. 2013 (CET)
  104. --Steinsplitter (Disk) 23:40, 10. Dez. 2013 (CET)
  105. --Gps-for-five (Diskussion) 03:03, 11. Dez. 2013 (CET)
  106. --Kolja21 (Diskussion) 03:55, 11. Dez. 2013 (CET)
  107. --Laibwächter (Diskussion) 09:50, 11. Dez. 2013 (CET)
  108. -- Achim Raschka (Diskussion) 15:18, 11. Dez. 2013 (CET)
  109. --Alabasterstein (Diskussion) 20:32, 13. Dez. 2013 (CET)
  110. --Grueslayer Diskussion 10:51, 14. Dez. 2013 (CET)
  111. Collect data only when it is absolutely necessary for operation (or legally required). Everything else should not be collected at all. I do not see the inferences about time zones and area of residence (often stated by users themselves) as serious at all. The issue is rather that everything is logged in the wiki. I do not consider that necessary. Who really needs to know who edited exactly where 10 years ago? After one year, the retained data should be anonymized (that is, it can stay in the article history, since it is needed there, but not in the user's contributions list).--Alberto568 (Diskussion) 21:51, 14. Dez. 2013 (CET)
  112. --Horgner (Diskussion) 15:48, 16. Dez. 2013 (CET)
  113. --Oursana (Diskussion) 21:52, 16. Dez. 2013 (CET)
  114. --Meslier (Diskussion) 23:53, 16. Dez. 2013 (CET)
  115. -- Martin Bahmann (Diskussion) 09:20, 18. Dez. 2013 (CET)
  116. DerHexer (Disk.Bew.) 15:24, 19. Dez. 2013 (CET)
  117. Neotarf (Diskussion) 01:58, 20. Dez. 2013 (CET)
  118. --Lutheraner (Diskussion) 13:17, 20. Dez. 2013 (CET)
  119. --Lienhard Schulz (talk) 07:53, 21 December 2013 (UTC)
  120. --Brainswiffer (talk) 16:33, 1 January 2014 (UTC)
  121. Botulph (talk) 23:54, 31 January 2014 (UTC) As always, subject to better insight. Kind regards. +bows+
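The rough idea attached to signature 27 above (timestamp precision that decreases with the age of an edit) can be sketched as follows. This is only an illustration under stated assumptions, not an existing MediaWiki feature: the function name is hypothetical, "this week" is interpreted as "the last 7 days", and "12 months" as 365 days.

```python
from datetime import datetime, timedelta, timezone


def coarsen_timestamp(edit_time, now):
    """Return `edit_time` truncated to a precision that depends on its age."""
    age = now - edit_time
    if edit_time.date() == now.date():
        # Edits from today: shown to the second, as before.
        return edit_time.replace(microsecond=0)
    if age <= timedelta(weeks=1):
        # Edits from the last 7 days (assumed meaning of "this week"): to the minute.
        return edit_time.replace(second=0, microsecond=0)
    if age <= timedelta(weeks=6):
        # Edits from the last six weeks: to the hour.
        return edit_time.replace(minute=0, second=0, microsecond=0)
    if age <= timedelta(days=365):
        # Edits from the last 12 months: to the day.
        return edit_time.replace(hour=0, minute=0, second=0, microsecond=0)
    # Older edits: to the month.
    return edit_time.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    now = datetime(2013, 12, 3, 14, 5, 30, tzinfo=timezone.utc)
    print(coarsen_timestamp(datetime(2013, 11, 10, 9, 15, 42, tzinfo=timezone.utc), now))
```

Truncation never reorders two edits, but it can make them indistinguishable, so the requirement that "the order must be preserved" would need a separate ordering key, such as a revision number, alongside the coarsened time.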

Comments

Can WMDE get an EU lawyer to assess whether such analysis of data is lawful under the current or draft EU directive and what it would take to respect it? I see that the draft contains some provisions on "analytics"; if the WMF adhered to EU standards (see also #Localisation des serveurs aux Etats-Unis et loi applicable bis) we might automatically solve such [IMHO minor] problems too. --Nemo 16:12, 20 December 2013 (UTC)

See also #Please_add_concerning_user_profiles (permalink, s) and #Generation_of_editor_profiles (permalink, s). PiRSquared17 (talk) 20:36, 20 December 2013 (UTC)

On a more personal note than the official response below, I shall repeat here advice I have regularly given to editors on the English Wikipedia in my capacity as Arbitrator: "Editing a public wiki is an inherently public activity, akin to participating in a meeting in a public place. While we place no requirement that you identify yourself or give any details about yourself to participate – and indeed do our best to allow you to remain pseudonymous – we cannot prevent bystanders from recognizing you by other methods. If the possibility of being recognized places you in danger or is not acceptable to you, then you should not involve yourself in public activities – including editing Wikipedia." MPelletier (WMF) (talk) 21:10, 20 December 2013 (UTC)

We can prevent creating user profiles by aggregating data. It has been done at the toolserver. It can be done at WikiLabs. NNW (talk) 21:29, 20 December 2013 (UTC)
No, you cannot. Those tools existed anyways, just elsewhere. You cannot prevent aggregation of public data without making that data not public anymore; including on the website itself (remove it from the API and people will just screen scrape for it) and in the dumps. Transparency isn't an accident, it's one of the basic principles of wikis in general and of the projects in particular. MPelletier (WMF) (talk) 18:12, 21 December 2013 (UTC)
Laws can prevent it though :) (looks like it may happen rather soon in EU). If everyone here takes extremist stances and collates everything as if there were no differences between publishing data and using it, or querying a database and making someone else query it, then it will be very hard to have any dialogue. To reiterate a point above, if a Wikimedia project includes Google Analytics and sends all private data to Google, our users don't care whether it was put by the WMF or a sysop, they just want it removed. --Nemo 18:23, 21 December 2013 (UTC)
No, actually, laws do not. The directive everyone refers to does not have anything to say about what people are allowed to do with publicly available information, but about private information which edit times most definitely are not.

Contrariwise, whether someone accesses a tool (or project page) is private information and this is why the rules already do forbid disclosing it; so your Google Analytics example is a good illustration of what we do already forbid. MPelletier (WMF) (talk) 20:55, 21 December 2013 (UTC)

I'm glad you have such legal certainties; I do not and I asked lawyers to comment, in the meanwhile I only said that law can forbid something if they wish (this seems rather obvious to me). As for Google Analytics, of course it's not the same thing, but it was just an example where it's easier to agree that it doesn't matter whether it's WMF or a user to place it on our servers (though the proposed draft explicitly does not cover the case of a sysop adding Google Analytics to a project). --Nemo 22:33, 21 December 2013 (UTC)
"your Google Analytics example is a good illustration of what we do already forbid." Oh, really? Just a short while ago a Software Engineer on the Wikimedia Foundation's Analytics team wrote about Analytics for tools hosted on labs?: "I don't think there are any technical reasons people can't use Google Analytics on a Labs instance. The only thing I can think of is that it'd be nice if people used something Open Source like PiWik. But I'll ask and report back in a bit." > later > "Google Analytics or any other analytics solution is strictly forbidden by Labs rules *unless* there's a landing page with a disclaimer that if the user continues, their behavior will be tracked." So that's the "good illustration of what we do already forbid": just put up a disclaimer. --Atlasowa (talk) 00:58, 22 December 2013 (UTC)
"Those tools existed anyways, just elsewhere.": This is told so often and it is still no good point. There are so many bridges and there are so many people crashing their cars into them. Does that mean we have to do it, too? A first step could be just to stop creating user profile on WMF servers. It was the end of the Toolserver limitations that started all the discussion. Of course there will be always someone who can and will do it somewhere but that is no reason to invite people to do it here on servers that are paid with donations for our work. I want to create an encyclopedia, not to collect money for spying on me. NNW (talk) 12:15, 22 December 2013 (UTC)

Additional signatures

  1. --Geolina163 (talk) 16:06, 20 December 2013 (UTC)
  2. --Density (talk) 16:35, 20 December 2013 (UTC)
  3. --Minihaa (talk) 16:57, 20 December 2013 (UTC) requesting data minimization.
  4. --Theaitetos (talk) 17:08, 20 December 2013 (UTC)
  5. -- Sir Gawain (talk) 17:17, 20 December 2013 (UTC)
  6. --1971markus (talk) 18:26, 20 December 2013 (UTC)
  7. --Goldzahn (talk) 19:22, 20 December 2013 (UTC)
  8. --Spischot (talk) 21:38, 20 December 2013 (UTC)
  9. --Bomzibar (talk) 22:43, 20 December 2013 (UTC)
    --Charlez k (talk) 22:51, 20 December 2013 (UTC) already signed, see above (Original signatures) --Krib (talk) 23:05, 20 December 2013 (UTC)
  10. --J. Patrick Fischer (talk) 09:14, 21 December 2013 (UTC)
  11. --Túrelio (talk) 15:07, 21 December 2013 (UTC)
  12. --Poupou l'quourouce (talk) 17:46, 21 December 2013 (UTC)
  13. --Nordlicht8 (talk) 21:54, 21 December 2013 (UTC)
  14. -- FelixReimann (talk) 11:16, 22 December 2013 (UTC)
  15. --Asio otus (talk) 11:54, 22 December 2013 (UTC)
  16. --Rosenzweig (talk) 12:26, 22 December 2013 (UTC)
  17. --Mellebga (talk) 13:47, 25 December 2013 (UTC)
  18. --Pasleim (talk) 15:24, 26 December 2013 (UTC)
  19. Elvaube ?! 13:32, 29 December 2013 (UTC)
  20. --Zipferlak (talk) 13:18, 2 January 2014 (UTC)
  21. --Gerbil (talk) 15:04, 5 January 2014 (UTC)
  22. --Sebastian.Dietrich (talk) 22:41, 9 January 2014 (UTC)
  23. --Stefan Bellini (talk) 18:57, 12 January 2014 (UTC)
  24. --SteKrueBe (talk) 23:48, 12 January 2014 (UTC)
  25. --Wilhelm-Conrad (talk) 23:02, 14 January 2014 (UTC)
  26. --Cubefox (talk) 20:37, 15 January 2014 (UTC)
  27. --Yellowcard (talk) 22:47, 16 January 2014 (UTC)
  28. --Ghilt (talk) 23:55, 19 January 2014 (UTC)

Response

Please note the response by Tfinc above in the Generation of editor profiles and my follow up to it. Obfuscating user contributions data or limiting our existing export will not happen. The Wikipedia projects are wikis; edits to them are by nature public activities that have always been, and always must be, available for scrutiny. MPelletier (WMF) (talk) 21:10, 20 December 2013 (UTC)

We don't need to keep around timestamps down to a fraction of a second forever. PiRSquared17 (talk) 21:13, 20 December 2013 (UTC)
Not sure about that. I wonder if de.wiki also has agreed to a decrease of its own right to fork, a right which they constantly use as a threat. Making dumps unusable would greatly reduce the contractual power of de.wiki, dunno if they really want it. --Nemo 21:43, 20 December 2013 (UTC)

While we believe this proposal is based on legitimate concerns, we want to highlight some of the practical considerations of such a proposal. Due to the holidays, we’ve addressed this only briefly, but we hope it serves to explain our perspective.

In summary, public access to metadata around page creation and editing is critical to the health and well-being of the site and is used in numerous places and for numerous use cases:

  • Protecting against vandalism, incorrect and inappropriate content: there are several bots that patrol Wikipedia’s articles that protect the site against these events. Without public access to metadata, the effectiveness of these bots will be much reduced, and it is impossible for humans to perform these tasks at scale.
  • Community workflows: Processes that contribute to the quality and governance of the project will also be affected: blocking users, assessing adminship nominations, determining eligible participants in article deletion discussions.
  • Powertools: certain bulk processes will be broken without public access to this metadata.
  • Research: researchers around the world use this public metadata for analysis that is useful to both the site and the movement. It is essential that they continue to have access.
  • Forking: In order to have a full copy of our projects and their change histories all metadata needs to be exposed alongside content.

In summary, public and open-licensed revision metadata is vital to the technical and social functioning of Wikipedia, and any removal of this data would have serious impact on a number of processes and actions critical to the project. Tfinc (talk) 00:54, 21 December 2013 (UTC)

How was it possible for Wikipedia to grow 13 years without aggregating user data? What has changed since the start of WikiLabs that this is necessary? Why is it necessary for creating an encyclopedia to know the exact second of my edit 5 years ago? Where do the licenses say that it is necessary that the exact second of my edit has to be part of a fork? NNW (talk) 10:38, 21 December 2013 (UTC)
I understand the part on aggregation and analytics, but the point about seconds is quite silly: sure, seconds could not be necessary in some ideal version of MediaWiki where they don't matter; but they also don't matter at all for your privacy. To avoid inferences about timezone we should remove hours of the day, not seconds. --Nemo 18:12, 21 December 2013 (UTC)
If you read the appeal above you will see that I do know that talking about seconds is silly. But it is senseless to start with hours when some people don't understand the basic problem with that data. Seconds just carry the topic to extremes so it may get understood that no one needs five year old timestamps for preventing vandalism or whatever. NNW (talk) 12:02, 22 December 2013 (UTC)
Actually, I read it but I don't see that. The text does not specify what level of precision in timestamps you want to achieve. --Nemo 10:19, 29 December 2013 (UTC)
I cannot offer a complete solution to this problem. The appeal in a nutshell is As much transparency as necessary, as much privacy as possible. I am not that much into technical questions. Perhaps some of the suggestions cannot be implemented for some technical reasons I don't know. Perhaps there are some better ways to keep users’ anonymity. All I did was centralizing a growing dissatisfaction about the way our data is handled and to start a discussion about it. NNW (talk) 11:56, 29 December 2013 (UTC)
Thanks. This is a frank and reasonable way to frame it. --Nemo 12:03, 29 December 2013 (UTC)
It's true that most actions of plain vandalism can be efficiently performed if we know the exact order of events, in order to revert edits correctly.
But the precision of timestamps is needed for things where there are battles related to the order of events in the history, for example battles of licences: we need to be able to prove the anteriority of a work. Precise timestamps are then needed, but we could hide this info by replacing these exact timestamps by digital signatures generated by the server, and making an API reserved to CheckUser admins, that would be able to assert which event occurred before another one. It could also be used for anonymizing contributions made by users that asked their account to be deleted and their past contributions to be fully anonymized (while maintaining the validity of their past work and provability and permanence/irrevocability of their agreed licences).
Other open projects have experienced this issue when it was difficult to assert the licencing terms (for example on OpenStreetMap before it changed its licence from CC-BY-SA to ODbL for new contributions, and needed to check its data according to the time the user actually accepted the new Contributor Terms and actually accepted to relicence, or not, their past contributions, in order to clean up the active online database then published exclusively using the new licence: this did not mean that the old database was illegal, but that it has been frozen at a precise timestamp, and all further edits made exclusively on the new licence that users had to accept before continuing making new edits).
Precise timestamps are then needed for long terms, and this is not just to fight active abuses and spams (with bots interested in a short period of time not exceeding one month; after that time, a bot alone cannot work reliably without human review to restrict its searches, if something must be reverted, or in case of doubt, with all user rights transferred to a special aggregated/anonymized user account detached from the original user).
Note that timestamps and geolocation data stored in media files are a problem; users should have a way to clean up a media file from these data by reducing the precision (for example only the date, or just the year, and a weaker geolocation, or deletion of unnecessary metadata such as stored hardware IDs of digital cameras, version of the tool used to modify the photos, possibly online by using external services like Google Picasa), or other kinds of information which may store such data using stealth techniques such as steganography (using techniques that will be discovered only years later): Commons should have a tool to inspect these metadata, to allow the original uploader to clean up these hidden details, to be dropped permanently by dropping also the stored old versions of these media files.
Fully anonymizing photos and videos is a really difficult challenge (it is far easier to do it on graphics with reduced color spaces or with vector graphics accepting some randomized alteration of any unnecessary geometric precision), as things initially invisible may be revealed later by new processing algorithms (like those already used now by Google which can precisely identify places and people by looking at some small portions of photos or assembling multiple ones from the same "exposed public user account" and in the same timestamp period, or photos/videos participating to the same topic elsewhere)!
Note that these media analysis tools may also be used to "assert" the licencing terms and legitimate author of a work, that has been reused elsewhere without permission (and there are already examples where legitimate Wikimedia contents have been attacked later by abusers trying to take the authorship and building a fake anteriority). This has already caused severe damage in Wikimedia projects (for example when several editions of WikiQuotes had to be fully erased and restarted from zero, a few years ago, when we could no longer prove the origin or anteriority of a work). verdy_p (talk) 13:33, 22 December 2013 (UTC)
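The suggestion above of replacing public timestamps with server-generated values plus a restricted ordering API could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class and method names are hypothetical, and a keyed hash (HMAC) is swapped in for the "digital signatures" mentioned in the comment, since only the server needs to create and check the tokens.

```python
import hashlib
import hmac


class OrderingVault:
    """Hypothetical store that publishes opaque tokens instead of timestamps."""

    def __init__(self, secret_key):
        self._key = secret_key   # bytes, known only to the server
        self._times = {}         # token -> exact timestamp, never published

    def register(self, revision_id, timestamp):
        """Record an edit and return the public token that replaces its timestamp."""
        message = str(revision_id).encode("utf-8")
        token = hmac.new(self._key, message, hashlib.sha256).hexdigest()
        self._times[token] = timestamp
        return token

    def happened_before(self, token_a, token_b, caller_is_checkuser):
        """Answer an ordering query without revealing either exact timestamp."""
        if not caller_is_checkuser:
            raise PermissionError("ordering queries are restricted to CheckUsers")
        return self._times[token_a] < self._times[token_b]


if __name__ == "__main__":
    vault = OrderingVault(b"example-secret")   # placeholder key for the demo
    first = vault.register(1001, 1387700000)   # timestamps as Unix seconds
    second = vault.register(1002, 1387700600)
    print(vault.happened_before(first, second, caller_is_checkuser=True))
```

One limitation of any such scheme: if revision identifiers are themselves assigned sequentially, they already reveal the order of edits, though not the time of day, so hiding timestamps mainly addresses the time-of-day inferences discussed earlier on this page.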

Question of context

AxelBoldt, NNW, and everyone else...

I regret to admit that the context in which the members of the appeal came up with the feature request is unclear to me due to the language barrier. Please provide me with links of where the opt-out idea originated; even if they're in German, I will be grateful as I would not have to try to search for the discussion myself. Gryllida (talk) 07:20, 31 December 2013 (UTC)

As far as I know the opt-out idea was made by Cyberpower678 first when he started the RFC for X!'s Edit Counter [8]. Such tools at the toolserver always had an opt-in (also as far as I know). NNW (talk) 13:08, 31 December 2013 (UTC)
NNW, is there a place lack of opt-in feature was discussed, first time, for the DUI tool specifically? Gryllida (talk) 15:13, 31 December 2013 (UTC)
Gryllida, the DUI was the direct result of the RFC for X!'s Edit Counter. Any opt-in/opt-out/nothing-at-all discussions were held there. As Ricordisamoa refused to change anything (see link in the thread below) there was nothing left to discuss. Some reactions to his tool can be found at User talk:Ricordisamoa#Deep user inspector. NNW (talk) 15:40, 31 December 2013 (UTC)
NNW, «the DUI was the direct result of the RFC for X!'s Edit Counter» is a useful observation. ☺ Where can I see evidence for that, for reference, as it appears to be of relevance to this thread? Gryllida (talk) 15:56, 31 December 2013 (UTC)
[9]. NNW (talk) 16:05, 31 December 2013 (UTC)
NNW, you have linked me to the RFC text at the initial stage while its discussion section is empty. Community views could be of interest in this discussion though. ☺ For me to not go through the history manually, could you please locate the RFC in an archive and link me to that? Gryllida (talk) 15:56, 31 December 2013 (UTC)
Ah, the latest revision appears to contain the archive. Thanks! ☺ Gryllida (talk) 15:58, 31 December 2013 (UTC)
Even though a translated message about the RfC was spammed to all wikis (by me), most commenters seem to be from enwiki or dewiki. I'd say dewiki mainly wanted to keep opt-in, enwiki wanted to remove it or use opt-out, which is not surprising. PiRSquared17 (talk) 23:55, 1 January 2014 (UTC)

NNW, thanks for the context. It appears that the tool functions as a proxy to already available information, and the WMF lack authority to eliminate it entirely, such as if it were hosted externally. Hence it appears useless for them to add actionable clauses about it into their privacy policy.

I only see work on an extension as a last resort, for the DUI tool to fail to function at the wikis that choose to request such extension with community consensus. If the community is willing to experiment, the WMF labs resources are available for collaborative community work on it. Gryllida (talk) 09:22, 3 January 2014 (UTC)

Response

Thank you to all the users who contributed to this discussion, and who signed on to this appeal. We take these concerns seriously, and understand why you are concerned, even when we disagree with some of your analysis (as we first discussed in our blog).

As I understand the appeal, there are really four main requests. I’d like to summarize and respond to each of these requests here.

Protecting users

At the highest level, the appeal asks that the Foundation "commit to making the protection of its registered users a higher priority and should take the necessary steps to achieve this". We believe strongly that we have already made protection of all of our users a high priority. This can be seen in our longstanding policies — like the relatively small amount of data that we require to participate and the steps we take to ensure that nonpublic information is not shared with third parties — and in our new policies, like the steps we've taken to add https and filter IP addresses at Labs. We will of course always have to balance many priorities while running the sites, but privacy has been and will remain one of the most important ones.

Reducing available information

More concretely, the appeal expresses concern that the publication of certain information about edits in the dumps, on the API, and on the sites, allows users to deduce information about editors. It therefore requests that we remove that information from dumps and the API.

This information has been public since the beginning of the projects almost 13 years ago. As Tfinc and others have discussed extensively above, the availability of this information has led to the creation of a broad set of tools and practices that are central to the functioning of the projects. We understand that this can lead to the creation of profiles that are in some cases uncomfortably suggestive of certain information about the respective editor. However, we do not think this possibility can justify making a radical change to how the projects have always operated, so we do not plan to act on this request.

Aggregation on Labs

The second major concern presented was that the Wikimedia Labs policy, unlike the Toolserver policy, does not explicitly prohibit volunteer-developed software that aggregates certain types of account information without opt-in consent. Because of this, the appeal requested a ban on such software on servers (like Labs) that are hosted by the Foundation.

To address this concern, I proposed a clarification to the Labs terms of use. Several users have expressed the opinion that this is insufficient, so the discussion is still ongoing about what approach (if any) should be taken on Labs. Anyone interested in this request is urged to contribute to the discussion in that section.

Collection of IP addresses

The final request in the appeal was to not "store and display the IP addresses of registered users". We currently store those addresses, but only for 90 days, as part of our work to fight abuse. This will continue under the new Data Retention Guidelines. We do not display the IP addresses of registered users, except to those volunteers who are involved in our abuse-fighting process, and then only under the terms described in this Privacy Policy and the Access to Nonpublic Information Policy. So we think we are reasonably compliant with this request.

Conclusion

As NNW put it in a comment above, the appeal seeks “as much transparency as necessary, as much privacy as possible.” The WMF strongly agrees with this goal, which is why we have always collected very little personal data, why we do not share that data except in very specific circumstances, and why we have written a very detailed, transparent privacy policy that explains in great detail what we do with the data we have. At the same time, we also recognize that providing information about edits has been part of how we have enabled innovation, flexibility, and growth. After weighing those factors, we have reached the conclusions described above. We hope that the users who signed the appeal will accept this conclusion, and continue to participate and support our shared mission. —LVilla (WMF) (talk) 00:37, 10 January 2014 (UTC)

Should users have right to know when, by whom and under which reason they have been checkusered?

Hi, I think the current draft does not address this issue directly, but allowing users to know this information (not the checkuser data itself) should make the checkuser process more transparent. Is this the right place to propose this idea?--朝鲜的轮子 (talk) 01:54, 9 February 2014 (UTC)

This would reveal information about other users. Assume that a checkuser tells me that he has been looking at my data. Immediately after that, I go to Meta:Requests for CheckUser information and find that someone has been asking for information about a user who self-identifies as living in the same country as I do. Now I suddenly know that it is very likely that the user has been using the same IP address as I have been using, which may reveal more information to me about who the user might be. This sounds bad, so let's avoid it. --Stefan2 (talk) 20:22, 9 February 2014 (UTC)

SUL account creation dates

Tracked in Phabricator:
Bug 19161

If one looks at Special:CentralAuth/darkweasel94, one can see on which dates I first visited all wikis listed there while logged in. This information, which is information about a user simply having read something, not actually actively done anything, is publicly available about everyone with a global account.

This doesn't appear to be covered either in the current privacy policy or in this draft, although it clearly has privacy implications - it can be used to find out that somebody was online on a certain date/at a certain time even if that user didn't actually contribute anything or do a logged action (other than account "creation"). Users don't normally expect that simply looking at a page will show up in any logs, so I think the privacy policy should make it clear that this is done. darkweasel94 (talk) 08:50, 26 January 2014 (UTC)

You don't think it's comparable to the new user log? PiRSquared17 (talk) 16:41, 28 January 2014 (UTC)
As far as I can see it basically is the new user log. The problem is, when you click "create your account" after typing a username, twice the same password, and solve a captcha, you actively do something that can reasonably be expected to be publicly logged. When you click an interlanguage link, you don't reasonably expect that to show up in logs. In general I feel that this draft does not sufficiently address the information available in Special:Log, it simply assumes that "public contributions" includes that. darkweasel94 (talk) 20:29, 28 January 2014 (UTC)
Hi darkweasel94. We discussed your concern internally and have added some language to the "Information We Collect" section in hopes of addressing this issue. Please let us know if you have any further concerns we can help you with! Mpaulson (WMF) (talk) 01:11, 8 February 2014 (UTC)
Yes, I think that legally at least this should be sufficient, although from a user's point of view I think it would be better to clarify under what circumstances an account is really created, because people might take that to mean just the creation on the home wiki. Proposed wording to be inserted before the last sentence of the last paragraph of that section (feel free to copy-edit, I'm not a native speaker of English):
If you have a unified account for all Wikimedia Sites, which is true for (but not limited to) all accounts created after 2008-xx-xx, account creation on a particular Wikimedia Site may occur, and be publicly logged, whenever you first visit that Wikimedia Site while logged into that unified account.
I think this (the date needs to be filled in, I couldn't quickly find it) should definitely make it clear even to users who don't know anything about how SUL works. darkweasel94 (talk) 13:45, 8 February 2014 (UTC)
The date (2008-xx-xx) still hasn't occurred, so it is unknown which date it will be. For the moment, you get a non-SUL account if you try to create an account on a project and someone else already has that user name on some other project without having activated SUL. For example, sulutil:Account tells that the user name "Account" exists on some projects but that there is no SUL account with that name. If you try to create an account with the user name "Account" on some other project, then you will get a non-SUL account with that name.
It would be a good idea to automatically create SUL accounts in cases like this to prevent the creation of further SUL conflicts. --Stefan2 (talk) 16:52, 8 February 2014 (UTC)
That is interesting to know, thank you. In that case, "all" should probably be replaced with "most", and the date should be whenever new accounts with entirely new names became SUL accounts (as my current account already is). darkweasel94 (talk) 18:02, 8 February 2014 (UTC)
Hi darkweasel94 and Stefan2. Thank you for your suggestions. I understand why you wanted clarification in the privacy policy that some information may be contained within public logs (and happily added language indicating this). However, I'm unclear as to why the distinctions between various SUL/semi-SUL accounts are something that should be contained within the privacy policy. Could you explain to me what the connection is between this information and the privacy practices of WMF? Thanks! Mpaulson (WMF) (talk) 21:31, 11 February 2014 (UTC)
What I mean is this: try clicking this link leading to Latin Wikipedia, a project where you haven't yet logged in from your staff account (which is a SUL account). Then, go to Special:CentralAuth/Mpaulson_(WMF). You will see that in the row "la.wikipedia.org", there is now public information about exactly when you clicked that link (or some other link leading to la.wikipedia.org, but in any case it shows that you were online at the time).
This applies only to SUL accounts, however. Non-SUL accounts can login only in one wiki, so if they click the above link, they'll be logged out on that wiki. I think it's useful to tell people in some way "if your account is relatively new, it's probably a SUL account and affected by this problem". The current wording doesn't really make it clear that simply clicking an interwiki link can count as account creation; from a newbie's point of view, account creation is done by the "create account" link, not somehow else. darkweasel94 (talk) 21:51, 11 February 2014 (UTC)
Thanks for the clarification. I understand what you are saying, but I think that level of specificity isn't really what we are going for in the privacy policy. While I think it's important (as you noted) to explain that some information about a user's actions may be accessible in public logs, I don't think it's the right place to explain in detail how the information in specific logs is gathered. Mpaulson (WMF) (talk) 23:15, 12 February 2014 (UTC)


WMF Response to concerns about unsampled data

It is important to understand Wikimedia projects including Wikipedia as they are -- critical Internet and mobile applications in a rapidly changing social and technological landscape. If we are to remain relevant and vibrant we need to be able to understand the behavior and preferences of readers and editors. From a practical standpoint, it's important that we give ourselves the tools that we need to do this. We are not simply collecting data because we can -- there are several categories of use cases that are impossible to satisfy without unsampled data.

For example, we need to understand how desktop and mobile users interact with the Sites as browsing patterns change. To help tackle problems like that, we need to be able to capture information about a session that requires unsampled data, including the number of pages visited, session duration, and other valuable indications of engagement, and ultimately retention. As the Wikimedia movement grows both geographically and across different platforms, we need to understand the new and different ways in which users interact with the Sites.

There are also use cases around long tail behavior that require further research, which sampling would render impossible or, at best, very difficult. For example, there are lines of thinking around ratios of readership and editing where we would like to understand if pages with relatively low readership are actually good sources of editors. While this is only a theory, we need to be able to address this and similar lines of research.

Finally, we want to underline our commitment to privacy. Unsampled data does not have to be any less private than sampled data. Our commitment to aggregation and anonymization would still be applied with the same degree of effectiveness and respect for Wikimedia readers and editors. Toby Negrin, WMF Director of Analytics TNegrin (WMF) (talk) 00:07, 5 February 2014 (UTC)
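The point about session metrics can be illustrated with a small sketch. It assumes that sampling is applied per request (here, keeping every Nth log line) and uses a made-up log of `(session_id, timestamp, page)` tuples; the function names and the data are hypothetical and do not describe WMF's actual logging pipeline.

```python
from collections import defaultdict

def session_stats(requests):
    """Return {session_id: (page_views, duration_in_seconds)} for a request log."""
    times = defaultdict(list)
    for session_id, timestamp, _page in requests:
        times[session_id].append(timestamp)
    return {s: (len(t), max(t) - min(t)) for s, t in times.items()}

def sample_every_nth(requests, n):
    """Hypothetical 1-in-n request sampling: keep every n-th log line."""
    return requests[::n]

# Made-up request log: three sessions, six page views in total.
log = [
    ("s1", 0, "A"), ("s2", 5, "B"), ("s1", 30, "C"),
    ("s2", 40, "D"), ("s1", 90, "E"), ("s3", 100, "F"),
]

full = session_stats(log)
sampled = session_stats(sample_every_nth(log, 3))

print(full["s1"])       # (3, 90): three pages over 90 seconds
print(sampled["s1"])    # (1, 0): the session looks like a single page view
print("s3" in sampled)  # False: the session is lost entirely
```

With per-request sampling, the per-session page count and duration can no longer be recovered from the sampled log, which is the limitation described above. Sampling whole sessions instead would be a different design with its own trade-offs.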

It is also worth noting that the recording of raw unsampled data was permitted by the old privacy policy, which said that we sampled such raw logs to provide statistics and kept the raw logs private. We tended not to do it (primarily, as I understand it, for performance reasons), but it was permitted. The change between the old and new policy is that the new policy is much more clearly written, not that the rules in this area changed. (This is a good example of why the policy is so detailed: we wanted everyone to know what we are doing, and have good, serious discussions about it.)
I think it is also worth pointing out that in many cases, sampling is also a poor substitute for a solid set of data retention guidelines. I’m reasonably confident that even increased amounts of unsampled logging will still be a net win for user safety and privacy when combined with the new data retention guidelines.
On a more personal note, because clarity and transparency was our goal all along, it was particularly frustrating to see people speculate about why we “changed” the policy, and accuse us of acting in bad faith. The issue only came up in this discussion because of our extreme dedication to transparency. I hope that instead of that sort of speculation, this last part of the discussion can focus on the actual pros and cons of unsampled data, and why we think it is important to have that option, as Toby has outlined above. —LVilla (WMF) (talk) 00:12, 5 February 2014 (UTC)
Not "bad faith" but misjudgement. What did you expect us to think, when we find, for example, a requirement to log in on google, in order to get access to a Wikimedia project ? That feels so wrong. Alexpl (talk) 15:25, 12 February 2014 (UTC)
Alexpl, that one was fixed. On the other hand, the "response" above doesn't really respond on anything. It rather reinforces the appearance that WMF now wants to follow the model "let's collect stuff we don't really need now, one day it might come handy", rather than the usual privacy-friendly cautious approach exemplified (if not mandated) by the current privacy policy where it talks of sampled logs. --Nemo 15:35, 12 February 2014 (UTC)
If it comes in handy for wikipedia only, and access is restricted, it could be debated. But my fear is, that third parties, somewhere along that vast research process TNegrin announced, may get their hands on those data. So the requirement for a google-account on a WMF project seemed like a taste of things to come. Alexpl (talk) 16:30, 12 February 2014 (UTC)
"It rather reinforces the appearance that WMF now wants to follow the model 'let's collect stuff we don't really need now, one day it might come handy', rather than the usual privacy-friendly cautious approach exemplified (if not mandated) by the current privacy policy where it talks of sampled logs." I see the concern, but I think it is wrong. Sampled logs were false security that did not protect the people who had the bad luck to be involved in the sample. (And my understanding is that they were never intended as security, only for performance reasons.) A real data retention policy that applies to all logs, sampled or not, and makes sure that we actively delete sensitive information, is a stronger and more protective policy for all users. So I think we're moving in the right direction with this. —LVilla (WMF) (talk) 23:15, 12 February 2014 (UTC)

invalid id's in static anchors (also not working in translations)

  • all headings are translatable, they cannot work as reliable targets of links to point to them with the same links between languages
  • so we need to define static anchors hidden in tvars.
  • but static anchors must be valid ids and must be unique in the page!
  • Most anchors are invalid as they include invalid characters like spaces, or commas and other punctuation.

This means that all existing static anchors must be changed, making sure they differ from the default id's generated by MediaWiki for section headings. The simplest approach is to note that English headings always start with a capital letter (so MediaWiki will generate id's starting with a capital letter).

Let's then generate only id's in English starting with a lowercase letter; we can abbreviate these id's, but we must remove spaces and punctuation, possibly replacing them with a single hyphen. We can also filter out some non-meaningful words present in section headings from the static anchors we generate.

Then we must make sure that anchors used in source links throughout the article will point to the correct target static anchor. Remove all dependency on translatable section headings... verdy_p (talk) 01:12, 12 February 2014 (UTC)
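The rules proposed above (lowercase start, no spaces or punctuation, single hyphens, non-meaningful words dropped, uniqueness within the page) can be sketched as follows. This is only an illustration: the stop-word list, the `sec` fallback prefix and the function names are assumptions, not something agreed in this discussion or provided by MediaWiki.

```python
import re

# Hypothetical list of "non-meaningful" words; the proposal does not define one.
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "we", "you", "your"}

def static_anchor(heading: str) -> str:
    """Derive a static anchor id from an English section heading."""
    # Lowercasing guarantees the id differs from MediaWiki's default heading id,
    # which starts with the heading's capital letter.
    words = re.findall(r"[a-z0-9]+", heading.lower())
    # Drop stop words, but keep everything if nothing else would remain.
    kept = [w for w in words if w not in STOP_WORDS] or words
    anchor = "-".join(kept)
    if not anchor:
        return "sec"
    # Make sure the id starts with a letter (assumed fallback prefix).
    return anchor if anchor[0].isalpha() else "sec-" + anchor

def unique_anchors(headings):
    """Return one anchor per heading, adding a numeric suffix on collisions."""
    used, result = set(), []
    for heading in headings:
        base = static_anchor(heading)
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = f"{base}-{n}"
        used.add(candidate)
        result.append(candidate)
    return result

print(static_anchor("Information We Collect"))                 # information-collect
print(static_anchor("To Protect You, Ourselves, and Others"))  # protect-ourselves-others
print(unique_anchors(["Sharing", "Sharing"]))                  # ['sharing', 'sharing-2']
```

The generated ids would still have to be placed in tvars by hand, and every link pointing at an old anchor updated to match.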

You are correct on all counts, I noticed that when I was fixing the broken links that someone else pointed out to me as well. I thought that I had set them all up as unique anchors placed above the header (rather than using any kind of header) and put them in tvar's but it's possible that I made some mistakes here (since it was the first of the privacy policy pages I marked for translation) and wasn't more religious about using {{anchor}} until some of the other pages. Right now I'm swamped so I don't anticipate being able to truly fix all of this until we move to foundation wiki (after the board approves) or, at best, in about 2 weeks. I want to make sure that I set enough time aside to be confident that when I change an anchor I change all of the links to it. Because of that my goal at the moment is to 'make it work for now' and most of these links do appear to work (at least in FF and Chrome) on mediawiki. If you have some time to work on it I would very much appreciate the help but don't feel obligated. I'm going to try and work on it one of these evenings but because of my other projects (and a work trip next week) I don't want to promise anything. Jalexander--WMF 01:40, 12 February 2014 (UTC)
I fixed the invalid id's used by anchors in the page, but a few entries in the "summary" template need to be synchronized in translations (the "$id" placeholders used by tvars have the same names as the id's used in the HTML for target anchors in the Privacy policy page). These entries are now fuzzy (I fixed them in French in order to test that they were all functional). These id's no longer contain any space or punctuation except the hyphen, and are often shorter (this makes translations easier to edit too). To translators: as usual, don't translate the "$id" placeholders, but make sure they are present (don't replace them with updated URLs for a specific language).
Note that this summary template can now be viewed directly in its page: its links will point to the translated version of the Privacy Policy page (in the current user language) instead of linking to the current page as they did before, but this works only if you click the "purge" button after you have made edits to translation units in the "summary" template.
This is because that template is now fully within a noinclude section, and when viewing the template page, it locally takes a "PP" parameter pointing to the page name of the Privacy page. But when the template is transcluded in the Privacy page, this template parameter is not used, and is empty by default, so the links will point to the local page, as they did before. Hope this helps testing the links.
I also fixed the RTL layout of the page (for Arabic, Hebrew, Divehi...), but this depends on the template {{dir}}.
To Foundation admins: if you import it later to the Foundation site, which does not have the Dir template, the /Left and /Right subtemplates will have to be edited on the WMF site depending on translations. {{dir}} is used in the /Left, /Right and /RHeader subtemplates, using the helper {{pagelang}} which is supposed to return the content language code of the current page, i.e. the suffix after the "/" (even in languages currently not supported by MediaWiki). The Dir utility template should be OK with the list of RTL language translations supported anyway (the minimum list of RTL languages in the Dir template should include "ar", "bcc", "dv", "fa", "he", "ug", "ur", "yi", but you can easily complete the list by using the statistics page of the Translate tool, which gives the full list of autonyms for the language codes supported; this list is sortable by autonym and you get a column of language codes to define in the Dir template).
Also, the ugly black double-stroke borders in section summaries in small print (to the left in English) now use the colors of the main summary box at the top of the page with a single stroke, and the same background color is used. verdy_p (talk) 09:47, 12 February 2014 (UTC)
Thank you so much for all of your help Verdy, I will look through things as soon as I'm able to try and unfuzzy and synchronize. Jalexander--WMF 11:12, 12 February 2014 (UTC)
@Jalexander: if you have the necessary privilege, can you add "azb" (Southern Azeri, written in the Arabic script, outside Azerbaijan, which uses the Latin script for Azeri "az") to the list of RTL languages in {{Dir}}? In fact there may be a few other codes to add, if I just consider the list displayed on the Translation statistics, where I see other MediaWiki-supported languages using the Hebrew, Arabic, Aramaic, or Divehi scripts in their displayed autonym: some of these are language variants of another language already listed as RTL in their main variant, or not listed as their main variant is LTR (I think this may concern other African languages, like Hausa). The list should be reviewed by the Wikimedia Language Committee. verdy_p (talk) 14:26, 12 February 2014 (UTC)
Added azb, I agree we should do a review and see what other languages should be getting the treatment. Jalexander--WMF 00:58, 13 February 2014 (UTC)
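For reference, the behaviour described in this thread for {{pagelang}} and {{dir}} can be sketched outside wikitext. This is a rough sketch only: the function names are hypothetical, the code list is the minimum list quoted above plus "azb", and the actual templates are written in wikitext and may differ.

```python
# Minimum RTL list quoted in this thread, plus "azb"; known to be incomplete.
RTL_CODES = {"ar", "azb", "bcc", "dv", "fa", "he", "ug", "ur", "yi"}

def page_lang(page_title: str, default: str = "en") -> str:
    """Return the suffix after the last "/" of a page title, as {{pagelang}} is
    described above. A title without "/" falls back to an assumed default."""
    if "/" in page_title:
        return page_title.rsplit("/", 1)[1]
    return default

def direction(lang_code: str) -> str:
    """Return "rtl" for codes in the list above, "ltr" otherwise."""
    return "rtl" if lang_code.lower() in RTL_CODES else "ltr"

print(direction(page_lang("Privacy policy/ar")))  # rtl
print(direction(page_lang("Privacy policy/fr")))  # ltr
print(direction(page_lang("Privacy policy")))     # ltr
```

Note that the suffix heuristic treats any trailing subpage name as a language code, so a page whose last path segment is not a language code would simply be reported as "ltr".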

Closing of the Community Consultation for the Draft Privacy Policy

The community consultation for this Privacy Policy draft has closed as of 14 February 2014. We thank the many community members who have participated in this robust discussion since the opening of the consultation 03 September 2013. Your input has helped create a transparent proposed Privacy Policy that reflects our community's values. You can read more about the consultation and the next steps for the proposed Privacy Policy on the Wikimedia blog. Mpaulson (WMF) (talk) 00:00, 15 February 2014 (UTC)

Fine

And what happens with this kind of "Schurkenlisten" (list of desperados) https://de.wikipedia.org/w/index.php?title=Benutzer:Seewolf/Liste_der_Schurken_im_Wikipedia-Universum&diff=121652431&oldid=121648831#Angel54? Nothing: many sign, nothing happens. That's Wikipedia policy. Background: I'm a history teacher and they call me an "Antisemite" there, although nothing ever happened in this case: that's a kind of "ratfucking", don't you agree?--Angel54 5 (talk) 01:16, 15 March 2014 (UTC) And I add a statement by the person who holds me in prison there (from Twitter), to demonstrate what kind of underlying pattern there really is: https://twitter.com/search?q=hkrichel&src=typd&f=realtime

Harald Krichel ‏@hkrichel 12. Feb. @zynaesthesie Das funktioniert nicht für homophobe Antisemiten, eine überdurchschnittlich häufige Betroffenenkombination (seems to me meanwhile erased)

He answers (translated): That doesn't work for homophobic antisemites, an above-average frequent combination among those affected.--Angel54 5 (talk) 20:26, 22 March 2014 (UTC) Then take his latest one: Harald Krichel ‏@hkrichel 7 Std.

Merkel hat recht: Ein Twitterverbot ist keine Zensur. Zensur macht man mit feineren Werkzeugen.

Merkel is right: a Twitter ban is not censorship. Censorship is done with finer tools. Meaning: he knows what censorship is and uses that kind of thing according to his own sense of right and wrong. By the way, he has the German WP in his hands and contributes to it by programming filters that no one ever agreed to. No one knows how far this scheme goes, because most of that filtering is hidden.--Angel54 5 (talk) 20:53, 22 March 2014 (UTC)

Note on Labs Terms / Response to NNW

Hi, NNW: If you are asking here about the change from Toolserver to Labs about when “profiling tools” are allowed, we made the change because the edit information has always been transparently available, so the Toolserver policy was not effective in preventing “profiling” - tools like X edit counter could be (and were) built on other servers. As has been suggested above, since the policy was ineffective, we removed it.
However, this change was never intended to allow anarchy. The current Labs terms of use allows WMF to take down tools, including in response to a community process like the one that occurred for X edit counter. Would it resolve some of your concerns if the Labs terms made that more obvious? For example, we could change the last sentence of this section from:

If you violate this policy ... any projects you run on Labs, can be suspended or terminated. If necessary, the Wikimedia Foundation can also do this in its sole discretion.

to:

If you violate this policy ... any projects you run on Labs, can be suspended or terminated. The Wikimedia Foundation can also suspend or terminate a tool or account at its discretion, such as in response to a community discussion on meta.wikimedia.org.

I think this approach is better than a blanket ban. First, where there is a legitimate and widely-felt community concern that a particular tool is unacceptable, it allows that tool to be dealt with appropriately. Second, it encourages development to happen on Labs, which ultimately gives the community more leverage and control than when tools are built on third-party servers. (For example, tools built on Labs have default filtering of IP addresses to protect users - something that doesn’t automatically happen for tools built elsewhere. So we should encourage use of Labs.) Third, it encourages tool developers to be bold - which is important when encouraging experimentation and innovation. Finally, it allows us to discuss the advantages and disadvantages of specific, actual tools, and allows people to test the features before discussing them, which makes for a more constructive and efficient discussion.
Curious to hear what you (and others) think of this idea. Thanks.-LVilla (WMF) (talk) 00:02, 24 December 2013 (UTC)
Is there a need in distinguishing WMF's role in administering Labs tools? I would only stress the requirement of Labs Tools to obey this policy, here, and link to a Labs policy on smooth escalation (ask tool author; discuss in community; ask Labs admins; ask WMF). Gryllida (talk) 05:14, 24 December 2013 (UTC)
WMF is called out separately in the policy because WMF employees ultimately have control (root access, physical control) over the Labs servers, and so ultimately have more power than others. (I think Coren has been recruiting volunteer roots, which changes things a bit, but ultimately WMF still owns the machines, pays for the network services, etc.) I agree that the right order for conversation is probably tool author -> community -> admins, and that the right place for that is not in the terms of use but in an informal policy/guideline on wikitech. -LVilla (WMF) (talk) 17:15, 24 December 2013 (UTC)
Yah, I just wanted to propose that the policy references both concepts (WMF's ultimate control, and the gradual escalation process) so the users don't assume that appealing to WMF is the only way. Gryllida (talk) 08:38, 25 December 2013 (UTC)
As I mentioned elsewhere on this page, the talk about "community consensus" raises questions such as "which community?" and "what happens when different communities disagree?" Anomie (talk) 14:30, 24 December 2013 (UTC)
Right, which is why I didn't propose anything specific about that for the ToU- meta is just an example. Ultimately it'll have to be a case-by-case judgment. -LVilla (WMF) (talk) 17:15, 24 December 2013 (UTC)
I would perhaps remove the "on Meta" bit then since it bears no useful meaning. «... such as in response to a community discussion.» looks complete to me. There doesn't even have to be a discussion in my view: a single user privately contacting WMF could be enough, granted his report of abuse is accurate. «... such as in response to community feedback.» could be more meaningful. Gryllida (talk) 08:38, 25 December 2013 (UTC)
This is meant as an example ("such as"), so I think leaving the reference to meta in is OK. Also, this is in addition to the normal reasons for suspension. For the normal reasons for suspension, a report by a single person would be fine, but I think in most cases this sort of discretion will be exercised only after community discussion and consultation, so I think the reference to discussion is a better example than saying "feedback".-LVilla (WMF) (talk) 22:28, 31 December 2013 (UTC)
I am referring to this argument from above: we made the change because the edit information has always been transparently available, so the Toolserver policy was not effective. The position that any analysis that can be performed by a third party should also be allowable on WMF servers with WMF resources is not convincing. It is clearly possible for a third party to perform comprehensive and intrusive user profiling by collating edit data without the user's prior consent. We could (and should!) still prohibit it on our servers and by our terms-of-use policy. (A different example: it's clearly possible for a third party running a screen scraper to construct a conveniently browsable database of all edits that have ever been oversighted; this doesn't mean WMF should allow it and finance it.) Now, why should this kind of user profiling be prohibited by WMF? Because WMF lives on the goodwill of its editors, and editor NNW above put it best: "I want to create an encyclopedia, not to collect money for spying on me." AxelBoldt (talk) 18:15, 24 December 2013 (UTC)
You're right, but I think removed (oversighted) edits are out of the question here. Whatever else is available is available, and allowing freely available information to be collected programmatically sounds reasonable to me. Gryllida (talk) 08:38, 25 December 2013 (UTC)
It's not reasonable if the editors don't want it and if it doesn't further any identifiable objective of the foundation. In fact it is not only unreasonable but it's a misuse of donor funds. AxelBoldt (talk) 22:28, 25 December 2013 (UTC)
You should be interested in contributing to the #Tool_settings section below. Gryllida (talk) 01:56, 28 December 2013 (UTC)
Hello LVilla (WMF)! Your suggestion means that any tool that will be programmed in future has to be checked and, if someone thinks that it is necessary, has to be discussed individually. My experiences until now: "the community should not have any say in the matter" and a quite short discussion "Technically feasible, legally okay... but what tools do we want?" started at lists.wikimedia.org. If we want it that way we will have to define who is "community". Is it the sum of all users of all WMF projects? Can single projects or single users declare to keep a tool (e.g. en:WP voted for no opt-out or opt-in for X!'s Edit Counter but that would mean that my edits there will be used in that tool although I deny it completely for my account)? Which way will we come to a decision: simple majority or best arguments (and who will decide then)? Does a community vote for tool X mean that there is no chance for a tool Y to try it a second time or do we have to discuss it again and again?
We have to be aware of our different cultures of handling private data or even defining what's private and what's not. Labs "doesn't like" (nice term!) "harmful activity" and "misuse of private information". US law obviously doesn't evaluate aggregating data as misuse, I do. We discuss about necessary "transparency" but do not have a definition for it. The time logs of my edits five years ago seem to be important but you don't want to know my name, my address, my sex, my age, my way how I earn my money… which would make my edits, my intentions and my possible vandalism much more transparent than any time log. Some say "the more transparency the better" but this is a discussion of the happy few – but dominating – who live in North America and Western Europe. I think we also should think of those users who live in the Global South and want to edit problematic topics (religion, sexuality…). For those aggregated user profiles may become a real problem and they will always be a minority in any discussion. NNW (talk) 17:56, 28 December 2013 (UTC)
Everyone involved is aware that privacy values vary a great deal from community to community; but it seems very ill-advised to give the most restrictive standards a veto over the discussion, in practice and in principle. A clear parallel with the discussion over images can be drawn: while it would have been possible to restrict our standards to the subset deemed acceptable by all possible visitors, to do so would have greatly impoverished us. The same goes for usage of public data: we should foster and encourage new creative uses; not attempt (and fail) to preemptively restrict new tools to the minuscule subset nobody could raise an objection to. This does not preclude acting to discourage or disable a tool the community at large objects to – and the Foundation will be responsive to such concerns – but it does mean that this is not something that can be done with blanket bans.

To answer your more explicit questions, the answer will generally be "it depends" (unsatisfying as this sounds). Ultimately yes, the final arbiter will be the Foundation; but whether or not we intervene is dependent entirely on context as a whole; who objects, why, and what could be done to address those concerns. MPelletier (WMF) (talk) 00:48, 1 January 2014 (UTC)

So for programmers the sky's the limit, and it's up to the community to find out which tool might violate their rights and to discuss this again and again and again because every tool has to be dealt with anew. The community has to accept that in the end a RFC like for X!’s Edit Counter is just a waste of time and that programmers – of course – are not interested in any discussion or compromise because it might cut their tools. WMF is in the comfortable position that Meta is in the focus of only very few users and the privacy policy does not apply to Labs. It would be fair to admit that under these circumstances WP:ANON becomes absurd and in the near future – with more powerful tools – a lie. I understood "The Wikimedia Foundation, Inc. is a nonprofit charitable organization dedicated to encouraging the growth, development and distribution of free, multilingual, educational content" as "free and multilingual and educational content" but a user profile generated with my editing behaviour isn't educational. NNW (talk) 13:50, 4 January 2014 (UTC)
Unfortunately - it is. Just think of the possibilities for scientific research... Alexpl (talk) 08:14, 29 January 2014 (UTC)
A body donation would be great for scientific research, too. NNW (talk) 08:50, 29 January 2014 (UTC)
I think that's already covered by «Depending on which technology we use, locally stored data can be anything [...] to generally improve our services». Please be sure not to bring your organs close to the servers. ;-) --Nemo 08:57, 29 January 2014 (UTC)
One could squeeze a few doctoral titles out of in-depth research on contributors' identities in combination with their WP work. Compared to that, a body donation is somewhat trivial. So I do agree we have to identify and neutralise every attempt to collect user data as quickly and effectively as possible. Alexpl (talk) 09:47, 29 January 2014 (UTC)

Questions from Gryllida

Implementation as Extension

This appeal requests that the time of an edit be concealed. Would any of the supporters of the appeal be willing to demonstrate a working wiki with the requested change implemented as an Extension which discards edit time where needed? If sufficiently safe and secure, it could be added to a local German wiki by request of the community, and considered by other wiki communities later on. Many thanks. Gryllida (talk) 04:43, 24 December 2013 (UTC)

Tool settings

Have you considered requesting the Tool author to add an opt-out (or opt-in, as desired) option at a suitable scope? Gryllida (talk) 04:45, 24 December 2013 (UTC)

Example: editor stats:
«Note, if you don't want your name on this list, please add your name to [[User:Bawolff/edit-stat-opt-out]]».
--Gryllida (talk) 02:14, 28 December 2013 (UTC)

FYI: The tool address is here. It is not mentioned in the appeal text. (I have notified the tool author, Ricordisamoa, of this discussion and potentially desired feature.) Gryllida (talk) 02:20, 28 December 2013 (UTC)

User:Ricordisamoa deliberately ignored the idea of an opt-in or opt-out and there is no chance to discuss anything: There's no private data collection, and only WMF could prevent such tools from being hosted on their servers: the community should not have any say in the matter. For complete discussion read Talk:Requests for comment/X!'s Edit Counter#Few questions. NNW (talk) 16:29, 28 December 2013 (UTC)
@Gryllida and NordNordWest: of course I accept community suggestions (e.g. for improvements to the tool) but the WMF only is competent about legal matters concerning Wikimedia Tool Labs. If there should be any actions, they will have to be taken by the WMF itself. See also [10]. --Ricordisamoa 03:04, 29 December 2013 (UTC)
Ricordisamoa, would you not be willing to add an opt-out? I would desire it be solved without legal actions or escalation, as it appears to be something within your power and ability, and many users want it. (It seems OK to decline OPT-IN feature request.) Gryllida (talk) 09:07, 29 December 2013 (UTC)
@Gryllida: No. --Ricordisamoa 16:44, 30 December 2013 (UTC)
Ricordisamoa, I understand your view. It might make sense to document that in FAQ, if not already, at leisure. I appreciate you being responsive. Gryllida (talk) 07:17, 31 December 2013 (UTC)
As long as WMF wants to encourage programmers to do anything as long as it is legal, there is no reason for programmers to limit the capabilities of their tools. "Community" is just a word which can be ignored very easily when "community" wants to cut capabilities. Only "improvements" will be accepted and "improvements" mean "more, more, more". NNW (talk) 14:00, 4 January 2014 (UTC)

Discussion on same topic in other locations

Note that this issue has also been discussed in #Generation_of_editor_profiles and #Please_add_concerning_user_profiles. For a full history of this topic, please make sure to read those sections as well. —LVilla (WMF) (talk) 00:36, 8 January 2014 (UTC)

Opt-in

There is the possibility of a compulsory opt-in for generating user profiles at Labs. By this we would return to the Toolserver policy which worked fine for years. No information would be reduced, fighting vandalism would still be possible, programmers still could write new tools and of course there will be lots of users who are willing to opt-in (like in Toolserver times). On the other hand all other users who prefer more protection against aggregated user profiles can get it if they want to. I see no reason why this minimal solution of the appeal couldn't be realized. NNW (talk) 13:43, 13 January 2014 (UTC)

As has been stated elsewhere, this only gives a false sense of security. There are other websites that allow profiling anyway, and there's no way to stop them, so there's no clear reason to pretend that you have a choice. //Shell 20:56, 13 January 2014 (UTC)
As has been stated elsewhere something that is done somewhere doesn't mean we have to do it, too. NNW (talk) 21:32, 13 January 2014 (UTC)
Toolserver policy was only enforced upon user request. There's a lingering worry that some upset user will slap a tool author with a take-down request; this is demoralizing to authors after spending many hours developing the software. This discouraging effect is why we don't see many community tracking tools, like the Monthly DAB Challenge. I've got cool and interesting ideas, but won't waste my time. Dispenser (talk) 19:04, 21 January 2014 (UTC)
With an opt-in there would be no reason for any complaint. Everybody can decide if her/his data gets used for whatever or not and there will be still lots of users who will like and use whatever you are programming. Please think of those authors who spent many hours to create an encyclopedia and find themselves as an object of spying tools afterwards. Believe me: that's demoralizing. NNW (talk) 23:19, 21 January 2014 (UTC)
Users never spying on each other? I read enough ArbCom to know that's Fucking Bullshit. This goes beyond edit counters and affect community management. English Wikipedians do not want to watch over 2,000 articles for a month to understand what's happening at a WikiProject. Now I cite the DAB challenge as w:User:JustAGal was completely unknown to us until we expanded the data analysis. We've subsequently redesigned tools to work better for her.
Postscript: Dabfix, an automatic disambiguation page creation and cleanup script, only has a single user and may never recoup the hundreds of hours spent programming and testing it. If a tool is never used then I've wasted time that I could've spent doing something useful. Dispenser (talk) 02:56, 10 February 2014 (UTC)

Alternative Labs terms proposal: per-project opt-in

The discussion above has been pretty wide-ranging, with some voices in support of opt-in; others in support of opt-out. It is also clear that, for any global proposal, defining who should be consulted is a key challenge. With those two things in mind, Coren and I would like to propose a per-project opt-in; i.e., if a particular project (e.g., Deutsch Wikipedia) wants to require it, then extraction of data from that project will require a per-user opt-in. This gives control directly to specific communities who have been most concerned about the issue, while still preserving flexibility for everyone else. Thoughts/comments welcome. —LVilla (WMF) (talk) 01:13, 4 February 2014 (UTC)

So 797 communities will have to discuss if they want to have an opt-in. Quite a lot of talk, I think, especially for those who are active on several projects and dislike the idea of aggregated user data at all. I have got 100 or more edits in 20 projects although I don't speak all those languages. How can I vote in such a complex matter when I am not able to understand these languages? Am I allowed to vote in every project in which I have edits or do I have to meet some criteria? Why should a community control an opt-in/no opt-in when it is much easier for everybody to take control over his/her own data? It will lead to much more discontent among users when it is a decision of projects instead of single users. Not everyone at de:WP thinks data aggregation is a bad thing, not everyone at en:WP likes to see data aggregated. NNW (talk) 10:10, 4 February 2014 (UTC)
@LVilla (WMF): A question of clarification: Does your proposal mean that in case a project makes this decision and a single user does not opt-in, the user's data will be excluded from the data pool which is accessible for developers on labs and external developers? (Which would be much more than just a declaration of intention but a technical barrier to analyze that user's data at all.) And could you explain the necessity of the intermediate level of legitimation by the respective project? I'm not sure if I understand what it's good for when you at the same time acknowledge that the single user himself has to make the decision to opt in. Wouldn't that be a shift of responsibility that no longer matches the reality? Why not just skip that step? User activity does not in general only take place on one single wiki, in times where contributors use their home wiki + commons + (sometimes) meta or wikidata, it seems to ignore the interdependencies we've built over the years. Alice Wiegand (talk) 23:26, 9 February 2014 (UTC)
@Lyzzy: If I understand your question correctly, then yes: if the project (such as WM-DE) opts out, then tools whose purpose is to build individual user profiles could not access the data of a user of that project who does not opt-in. The idea of doing this on a per-project basis is primarily because the objection to these sorts of tools appears to be highly specific to one project. (Not to say that everyone else on every other project loves it, but it seems undeniable that the bulk of the objection appears to be from one project.) Secondarily, it is because this rule is primarily symbolic (as discussed elsewhere in the page), so making it a per-project thing allows projects who care to make the symbolic statement without overly complicating the situation for others. Finally, it is because people objected to making it per-tool, because it was unclear what level of community discussion would be sufficient to force an individual tool to become opt-in. By making it per-project, we make it quite clear what sort of community discussion is necessary. This does lead to some inefficiencies, particularly for people who participate on meta and other projects. But none of the proposed solutions are perfect - all of them require discussions in a variety of places and inconveniencing different sets of users. Hope that helps explain the situation and the proposal. —LVilla (WMF) (talk) 02:54, 11 February 2014 (UTC)
@LVilla (WMF): I'm not sure you understood the question correctly. Would the non-opted-in user's data be somehow hidden in the Tool Labs database replicas as a technical restriction (which seems like it could be a significant performance hit for those wikis and would damage other uses of that data), or would this just be a policy matter that tool authors would be required to code their "data aggregation" tools to decline to function for the non-opted-in user on those wikis? Anomie (talk) 14:37, 11 February 2014 (UTC)
@LVilla (WMF):, I still don't understand if your proposal includes a technical solution. In part of your statements it reads as if the data of a user who does not opt-in after a project decided to go the opt-in-line will not be accessible to any tool on labs. That's entirely different from anything we talked about earlier (labs specific self-commitment) and it's also different from "there's a tag on the record, so please tell your tool not to analyse it". And because there is some kind of ambiguity, clarity about what the proposal is about is essential. Alice Wiegand (talk) 22:48, 17 February 2014 (UTC)
@Lyzzy: Sorry about the lack of clarity. The proposal does not include any technical measures. There are two types of technical measures possible:
(1) Publish less information. As described previously, this is inconsistent with how we have always done things, and would break a variety of tools.
(2) Audit individual tools on Labs. Given that most tool developers on Labs are likely to respect the policy, this would introduce a very high cost for a very low benefit.
So, yes, this would be a self-commitment, but the operations team at Labs would be able to kick off specific tools that violate the policy if/when the violation is discovered. Hope that helps clarify. —LuisV (WMF) (talk) 23:19, 18 February 2014 (UTC)
It does, thanks! Alice Wiegand (talk) 14:21, 19 February 2014 (UTC)
Edit counters will and have existed with or without Labs adopting the Toolserver policy. What about just letting the DE Wikipedia Luddites block tool links they don't like? Dispenser (talk) 03:11, 10 February 2014 (UTC)
You might take a look at Requests for comment/X!'s Edit Counter and check where the opt-in supporters come from. It is a bit more complex than de:WP vs. the rest of the world. NNW (talk) 09:02, 10 February 2014 (UTC)
@Dispenser: I've pointed out repeatedly that I think this is a mostly symbolic policy. We're trying to strike a balance that allows some communities who particularly care to make their symbolic statement. Not ideal, I know, but none of the solutions will please everyone here.—LVilla (WMF) (talk) 02:54, 11 February 2014 (UTC)
I'm not sure I support opt-in in any case, but this compromise is obviously intended for dewiki IMO. The privacy (if you consider analysis of aggregate data to be private) of users who edit on most wikis would still be gone. PiRSquared17 (talk) 02:59, 11 February 2014 (UTC)
This supposed privacy never existed in the first place. All the necessary data is already public. All this debate is about forcing people who want to create these tools to do so on third-party servers rather than on Tool Labs. Anomie (talk) 14:40, 11 February 2014 (UTC)

This discussion fell asleep a while ago, unfortunately. Right now there is an RFC at en:WP about an edit counter opt-in, which will violate EU law if a community decides whether data of single users will be aggregated and shown. I still think that it is not the right of a community to decide this but only the concern of everyone for him-/herself. NNW (talk) 11:31, 10 April 2014 (UTC)

For anyone still interested in this, I've opened a discussion at en:Wikipedia talk:Requests for comment/User analysis tool. SlimVirgin (talk) 23:13, 9 May 2014 (UTC)

Edits about tracking and personal information

This edits User:Elvey was remedied. User:LVilla (WMF) Elvey, please share context? (Like you did for some other thing here). Gryllida (talk) 04:30, 7 January 2014 (UTC)

To explain why I changed those -
  • this edit removed "retained" from the description of what we do with direct communications between users. I did this because it is not accurate to say that we retain those - we may in some cases but in most cases that I'm aware of we don't.
    So does anyone think that justifies silence on this important topic? Not that I've seen (other than staff.)--Elvey (talk) 03:25, 11 May 2014 (UTC)
  • this edit removed an example about tracking pixels that Elvey had edited. Elvey's edit correctly pointed out that the example was a little hard to understand, but I don't think his edit improved it. I spent a little bit of time trying to explain it better without writing a book or assuming the reader is a web developer, and failed, so I deleted it. If folks want to take another stab at it, I'm happy to discuss it here.
Sorry for not explaining this earlier, User:Elvey - I do appreciate that you were trying to improve it :) —LVilla (WMF) (talk) 00:00, 9 January 2014 (UTC)
So does anyone think that justifies increasing opacity regarding this important topic? Not that I've seen (other than staff.) --Elvey (talk) 03:25, 11 May 2014 (UTC)

Layout problem

The blue-box summary for each major section in the left margin seems to be creating blank space in the main prose, as if there were a {{clear}} around it rather than being adjacent to the actual text. I'm using Firefox 29.0 on OS X. Seems to resolve itself if I make my browser window extra wide, so maybe something is hardcoded for some minimum something? Sorry, I can't upload images to meta to illustrate it. DMacks (talk) 00:16, 9 May 2014 (UTC)

Hi DMacks, thanks for pointing this out! We are looking into whether we can fix this. RPatel (WMF) (talk) 19:03, 14 May 2014 (UTC)

Typo

The phrase "such a merger" should read "such as a merger". If this is a community-developed privacy policy draft, why isn't it editable? I shouldn't have to post notices like this just to get a typographical error fixed. Semi-protection from IP vandals ought to be sufficient. If a page as contentious as en:w:Wikipedia:Manual of Style can be editable, so can this.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  08:09, 17 May 2014 (UTC)

Because, SMcCandlish, this Policy is approved by the Board, and the Board can only approve a particular version. People can't just add whatever they think "improves" the document afterwards, just as administrators can't just "improve" passed legislation. — Pajz (talk) 08:40, 17 May 2014 (UTC) (That said, I'm very sure both Legal and the Board welcome pointers to such errors, I'm just saying that this is unlike something like the Wikipedia Manual of Style.)
Somewhere in there it still says it's a draft being worked on, not an approved final policy. That's why I thought it should be editable.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  09:57, 17 May 2014 (UTC)
Thank you, SMcCandlish. We will fix the typo. RPatel (WMF) (talk) 03:05, 20 May 2014 (UTC)
Fixed! Thanks. RPatel (WMF) (talk) 20:16, 20 May 2014 (UTC)

There's still a need for more information

I understand that you didn't want to commit yourself absolutely in the policy. Nonetheless, "Once we receive personal information from you, we keep it for the shortest possible time that is consistent with the maintenance, understanding, and improvement of the Wikimedia Sites, and our obligations under applicable U.S. law." is a question waiting to be asked. Can you provide the users with a report of how long these retention times are, and especially, what obligations you feel you have under U.S. law? Wnt (talk) 10:57, 22 March 2014 (UTC)

Seconded. --Nemo 11:04, 22 March 2014 (UTC)
You can ask about the requirements of US law - but you can hardly ask Wikimedia to promise in the Privacy Policy (by giving a specific timespan) that those laws won't change. Alexpl (talk) 09:55, 22 April 2014 (UTC)
Retention timespans consistent with the maintenance, understanding, and improvement of the Wikimedia Sites can and should be provided.
And RPatel notes that they are provided, at m:Data_retention_guidelines.
Retention timespans consistent with perceived obligations under applicable U.S. law can and should be provided.
These, on the other hand are NOT provided at m:Data_retention_guidelines.
--Elvey (talk) 02:26, 6 May 2014 (UTC)
--Elvey (talk) 22:42, 22 May 2014 (UTC)
They sure can. But I see little benefit to the users, since such timespans do not apply to warrantless domestic wiretapping and data retention without any judicial oversight by state agencies. Alexpl (talk) 17:03, 8 May 2014 (UTC)
You're being myopic. Those with dragnet surveillance abilities aren't the only ones who can trample privacy rights. Privacy rights are regularly trampled without dragnet surveillance. --Elvey (talk) 20:55, 10 May 2014 (UTC)
The archives should prove how myopic I am about third parties. But the fact remains that data could show up after the retention timespan consistent with the law, and I don't want WM to be held accountable for that because it had promised to have that data deleted by a specific date. Something like: "We will delete it after X years - but it won't disappear if the data-mining industry or a state agency have gotten their hands on it before that date" does not sound helpful. Alexpl (talk) 06:05, 12 May 2014 (UTC)
I could have been clearer. I just meant to argue that the information isn't of little benefit. Something like, "We will delete it after 90 days. We have not been ordered to keep it longer by any government agency. But a court or agency could order us to do so, and could order us to keep the order secret (See enwp:National security letter). We take reasonable security precautions to protect personal information." does, IMO, sound helpful. --Elvey (talk) 22:42, 22 May 2014 (UTC)
Hi Wnt. Alexpl (talk) is accurate that we cannot predict whether our obligations under U.S. law will change in the future and require us to keep certain information for a longer or shorter period of time. One of the reasons that we chose not to include time frames in the privacy policy is that we want the flexibility to adjust our retention times as the law or our technological needs change, without seeking board approval for every adjustment. We do, however, provide our users with a better idea of what our promise to keep information for the shortest time possible means through our document retention guidelines. We also recently released requests for user information procedures and guidelines to provide our users with more information about our obligations under U.S. law and how we respond to requests for user information. Finally, we’re happy to answer, to the best of our ability, any specific questions you have if either of those documents don’t address them. RPatel (WMF) (talk) 20:29, 20 May 2014 (UTC)

Edit request (minor) - sectionlink to What This Privacy Policy Doesn't Cover

A minor edit request; the table in the section Definitions contains the words listed in the "What This Privacy Policy Doesn't Cover" section below. A sectionlink would seem more natural and user-friendly; listed in the What This Privacy Policy Doesn't Cover section below.

I appreciate that edits to this document can be costly. If this is more than a trivial change, please feel free to ignore. - TB (talk) 10:24, 25 May 2014 (UTC)

ERq 8592464 - phrase

Replace "should" with "must": "We believe that information-gathering and use must go hand-in-hand with transparency."

to show strong commitment to the principle. Ivan Pozdeev (talk) 14:12, 31 May 2014 (UTC)

Important Consideration: I hope it's not too late, but fear it is.

This may have been discussed already; I realized the time and didn't have time to read the content of this page. If it has, then please forgive and disregard this section.

It pertains to the fact that every user, whether or not a "functionary", exposes himself to the possibility of civil or criminal prosecution for defamatory remarks made about another user. I've been doing research on WMF's concerns about attrition and plateaued new user registrations. One thing led to another, resulting in a domino effect landing me as the subject in a discussion group titled with my own username. Though the guidelines when opening a new topic state not to discuss anything defamatory or libelous, the fact of the matter is that everyone who comments about me in the room is in there to say not nice things... much of it defamatory and libelous. The admins are the worst.

Now, no one there seems to get it. I was protecting that room because I was protecting WMF, WP and other users from implicating themselves. Does anyone here know what I am talking about? It's everywhere on WP... plenty of notice about it.

You should begin with the actual Section of the article which is, itself, a violation of criminal law insofar as it begs for critique: PRESIDENTISTVB. Before I could do what is necessary to clarify the issue, I was blocked. All I could then do was edit my own talk page, so I created a section in answer to it: 60 Hours a Slave. I sent an email to Oversight to explain it a little better. You can read the letter I wrote to the Oversight Committee and then view these other two docs: [ONE] [TWO] (PW is username of admin who blocked me.)

The bottom line, as the three external references on my talk page reveal is that every user risks his personal, private information being revealed via court order accompanying a lawsuit, and I firmly believe all users should be made aware of it, in a more prominent way than we have been. I've linked some graphics in the content on my talk page. Make sure you read the three linked articles/items.

Again, if I'm visiting an area already fully discussed, then all I can say is, THANK YOU.

Best regards,

PresidentistVB (talk) 03:37, 3 June 2014 (UTC) PresidentistVB

Good luck but I don't expect the WMF to make any changes here. It's become pretty clear to me that the WMF doesn't have any interest in protecting editors' rights or the rights of the readers. They only seem interested in further insulating the admins, thus expanding the us-and-them mentality of adminship on Wikipedia. Unfortunately there is a separate discussion about this on the English Wikipedia that has much more active discussion than here and I cannot edit there because I was banned to shut me up for criticising abusive admins. Reguyla (talk) 11:21, 3 June 2014 (UTC)

Slight/Major (depending on POV) changes to definition of PII, this policy, and data retention policy as a result of question about headers from Verdy_p

verdy p asked a question about HTTP headers on the data retention policy, and so we did some final reviews of our language on that issue. As part of that review, we realized Friday that there was a sentence in the main privacy policy that was poorly drafted (not future-proof) and inaccurate. It prohibited collecting four specific browser properties. This is bad drafting, because it isn't future-proof: what if new browser properties are added by browser manufacturers? What if we realize other existing properties are problematic? It also was inaccurate because some of this sort of data may be collected (but usually not retained or logged) for useful, non-harmful purposes. For example, it could be used to help determine how to make fonts available efficiently as part of our language work.

Reviewing this also made us realize that we'd made a similar drafting mistake in the definition of PII- it was not flexible enough to require us to protect new forms of identifying information we might see in the future.

We think the best way to handle this is in three parts, and have made changes to match:

  1. Broaden the definition of PII by adding "at least", so that if we discover that there are new types of identifying information, we can cover them as necessary. This would cover these four headers, for example, but could also cover other things in the future. (change)
  2. Added headers specifically as an example in the data retention policy, so that it is clear this sort of data has to be protected in the same way as all other PII. (change)
  3. Delete the specific sentence. (change)

We think, on the whole, that these changes make us more able to handle new types of data in the future, while protecting them in the same way we protect other data instead of in a special case. Please let us know if you have any concerns. -LVilla (WMF) (talk) 18:47, 13 February 2014 (UTC)

Slight changes? Not in my view! This is a MAJOR change. Revoking a commitment, the DAY BEFORE debate is scheduled to close, that browser sniffing is incompatible with this Policy is no slight change. I'm trying not to blow my lid, but I'm really pissed off! The deadline needed extension because of the change and needs extension, retroactively, now. Although it appeared at first that this MAJOR change was slipped in under the wire, I understand that it was prompted by verdy_p's questions starting 1/15. Still, assuming all that LVilla says is valid regarding future-proof-ness, that in no way justifies total removal of the commitment from the policy. The policy is now, once again, a blatant lie. I had fixed it. The time for considering such radical changes was back in December when this was discussed AT LENGTH. @LVilla (WMF): what about that discussion? I'm disappointed that no one else involved in the December discussion said a thing about this troubling change!
  1. Change 1 is awful; see the December discussion. I said then, "Let's not set a bad example and be deceitful about what we collect…" With the changes LVilla has made, if adopted, Wikimedia WILL BE setTING a bad example and beING deceitful about what IT collectS. If that happens, I'll be ASHAMED to be associated with it!
  2. Change 2 is awful for the same reasons.
  3. Change 3 … slight? Yeah, and nothing Snowden blew the whistle on was illegal.

I think the community is owed an apology and I think the changes need to be revisited. We need to stop lying to our users. Lying to our fellow users is inexcusable. If anyone wants to talk to me about this offline, let me know. --Elvey (talk) 06:44, 19 March 2014 (UTC)

@LVilla (WMF):, involved in the December discussion:@Geoffbrigham:, @Drdee:, @Stephen LaPorte (WMF): No response to my comment above? If this isn't going to be addressed, I guess I can ping the board directly to let them know, before they vote. --Elvey (talk) 02:24, 25 March 2014 (UTC)

We didn't respond because I don't think your criticisms are accurate, and your tone suggests you do not want to have a constructive conversation. In particular, the change you've characterized as "deceitful" allows us to add more things, but not take them away, from the list. I think most people would agree that, as we mentioned above, this is a pro-user and pro-future-proofing step - it allows us to protect users more in the future, but not less. If you'd like to take that to the board, feel free, but I'll feel very comfortable explaining to them why you're wrong. Sorry that we disagree. —Luis Villa (WMF) (talk) 18:09, 28 March 2014 (UTC)
You revoked a commitment to users that browser sniffing is incompatible with this Policy. That is no slight change, no matter how you spin it. And it seems inexplicable to me why you think that revoking a pro-user commitment to collecting less data is a "pro-user" step. But intelligent people disagree sometimes. --Elvey (talk) 02:31, 6 May 2014 (UTC)
The bottom line is that I pushed for and gained consensus for language that made it clear that the privacy policy would not allow browser sniffing. It was added and stayed in the draft for weeks. Then on the last day, it was removed. Now we have a privacy policy that allows browser sniffing, and yet claims to be informative. That's an untenable situation. That's the bottom line. If this is in any way inaccurate, I welcome corrections. Specific corrections only. Vague assertions based on no specific facts, as in your last comment, are not appropriate. --Elvey (talk) 06:35, 6 May 2014 (UTC) (update 20:49, 10 May 2014 (UTC): @Geoffbrigham:, @Drdee:, @Stephen LaPorte (WMF):, @LuisV (WMF): Well? )
But wait, the checkuser tool contains the IP address, Operating system and browser in order to identify potential sockpuppet accounts. Are you saying that they cannot do that anymore? Reguyla (talk) 18:15, 16 May 2014 (UTC)
The concern is that these individual things could be combined into a maybe-unique tracking tool, like a cookie. As we pointed out in the original comment above, we think the best way to deal with this concern is through the definition of PII and the data retention policy. This way we treat it in the same, careful way that we treat other personal information, instead of creating a separate, badly-defined category that can't be expanded or adapted as technology changes. We think overall that is both much safer for users and more likely to work in 3-5 years. —Luis Villa (WMF) (talk) 00:36, 21 May 2014 (UTC)
Thanks, LuisV and Elvey. Elvey: your concerns were noted and welcome. Luis's position is persuasive and considerate; and reflected in the policy adopted. SJ talk  19:10, 21 May 2014 (UTC)
With all due respect Luis, if the technology to do this right is 3-5 years out, then we shouldn't be leaving the privacy policy vulnerable to abuse for the next 3-5 years. I agree the policy needs updating and I agree that tools like the Checkuser tool need to be updated. But exempting a large chunk of the population with the most access to PII just doesn't make sense. Just in the last week there has been a flurry of incidents on the English Wikipedia where admins and even some members of the Arbitration Committee, who have access to the Oversight and checkuser tools BTW, have displayed stunning lapses of good judgment. They told multiple users to "fuck off", literally, not figuratively, they issued legal threats to an editor, and someone even contacted an editor's employer and included their Wikipedia user name and their real-life identity. Now with this privacy policy they would be exempted from the privacy policy completely. You may not agree with me and you may not change a word in the privacy policy, but I wanted to be on record as stating clearly and with no misunderstandings that these things are not ok. Reguyla (talk) 14:10, 22 May 2014 (UTC)
Can you post a link to a thread where this flurry of incidents was discussed?--Elvey (talk) 18:24, 3 June 2014 (UTC)

@LVilla (WMF):, would you please change the subject of this thread, as suggested by SJ, here? I went ahead and changed it.

Note, Folks: Wider discussion opened with an RFC at .en's Village Pump: https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(policy)#New_privacy_policy.2C_which_does_not_mention_browser_sniffing. --Elvey (talk) 01:21, 28 May 2014 (UTC)

For the record I'm less concerned by the classic sniffing of browser capabilities: these headers are made with the purpose of allowing technical adaptation and compatibility and not really about "spying" users.
My question was about a much larger set of headers, including those that could be inserted by random browser plugins that we don't use and don't need in order to make our content available to the largest public.
But there are some concerns about logging and keeping data like preferred language: this preference is only temporary and does not need archiving. All that matters is to know which language to render for the UI, and the content of the UI is not personal data and is unrelated to what users are doing on Wikimedia sites. If this data (including possible unique identifiers generated by plugins, which are even stronger than IP addresses) is used for collecting demographic data, the policies used in Wikimetrics should be applied and we shouldn't need to archive it for individual users: this is personal data, used only by CheckUser admins, and it should be subject to the CheckUser policy, not used for anything else.
It is a concern because many users don't have any idea about what is transmitted in these protocol headers. Sometimes these unique IDs are inserted in protocol headers by malware or adware tracking users without their permission, in an attempt to bypass standard cookie filters and to track these users wherever they go on the Internet. We should not depend on them, and we should make sure that no third party will be able to derive user data from our archived logs to correlate it with tracks left on other sites.
Note that some plugins append these IDs to the "User-Agent" string (normally used only to sniff browser capabilities), so the full unfiltered User-Agent string should also be considered personal data (and the substrings we sniff in User-Agent should be very generic, never user-specific or specific to a very small community, including obscure browser names).
If developers need to get a log of values used in User-Agent strings extracted from server logs, in order to study trends in new browser types we should support, they should request only this data and get an archive taken from a limited time period with minimal filtering of users (for example, filtering by version, such as IE prior to IE5, is OK; per country is OK; per large ISP is OK; but per IP or per small IP block is bad). Such extraction of data for research & software development/improvement should be provided only to some developers, who agree not to use specific substrings found in these logs that could identify very small groups of users. We can sniff substrings in User-Agent strings only if they are likely to match more than about 100 users over a short period outside peak hours (and this should not include the early detection of alpha versions of browsers tested by a few users; but such detection could be done experimentally on small wiki test sites or private wiki sites, whose content does not really matter and where some standard anti-abuse policies may be tested with contents that would not be accepted on a standard open project). verdy_p (talk) 09:14, 5 June 2014 (UTC)
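The "more than about 100 users" rule suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not anything the WMF is known to run: the function names, the `user_key` field (imagined here as some per-user key such as a salted hash), and the tokenising regular expression are all hypothetical, and the threshold is simply the figure from the comment above.

```python
import re
from collections import defaultdict

# Assumed threshold, taken from the "about 100 users" figure in the comment above.
MIN_USERS = 100


def ua_tokens(user_agent):
    """Split a User-Agent string into product-like tokens, e.g. 'Firefox/29.0'."""
    return set(re.findall(r"[A-Za-z][\w.-]*(?:/[\w.]+)?", user_agent))


def safe_tokens(records, min_users=MIN_USERS):
    """Return the tokens seen from at least `min_users` distinct users.

    `records` is an iterable of (user_key, user_agent) pairs; `user_key` is a
    hypothetical per-user identifier used only to count distinct users.
    """
    users_per_token = defaultdict(set)
    for user_key, user_agent in records:
        for token in ua_tokens(user_agent):
            users_per_token[token].add(user_key)
    return {t for t, users in users_per_token.items() if len(users) >= min_users}
```

A token appended by a plugin for a single user (say a made-up `TrackerID-xyz`) would be seen from only one user and so would be dropped, while generic tokens shared by many users would survive. The time window and the "outside peak hours" condition are left out of this sketch.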

Is the application of the privacy policy retroactive within WP, when some Wikipedians have previously failed to respect it internally with regard to other Wikipedians?

Past problem (2012) concerning an exposure: a Wikipedian disclosed my real name, linking it to my pseudonym, on an article talk page. At present, this exposure of my surname linked to my pseudonym continues to be repeated in the pages returned when someone searches the Internet for information about me by typing my real name. The Wikipedian who exposed me was never troubled over it and is still among the contributors of the Wikipedia community. --Bruinek (talk) 12:03, 3 June 2014 (UTC)

Continuation in 2014 of the same problem of disguised harassment of my private person, supposedly in the name of the "principles of WP": on Wikipedia.fr, the same Wikipedian did it again, addressing me by my real first name instead of my username in the "history" of the article on the writer Jean Sulivan on 5 June 2014 at 21:25. See also the remark I made on this subject on the talk page of the Jean Sulivan article: Violation of Wikipedia's privacy policy by.... So what should one think when a Wikipedian himself begins by violating WP's privacy policy in order to publicly expose another Wikipedian whom he criticizes, whatever the reason given, by using that person's real name together with their username? Does this Wikipedian, who exposed private information about another Wikipedian, even have the right to remain part of the Wikipedia community? All the more so given that the defamatory information in question still appears on the pages of the Google search engine (for example) in the "results" returned for my real name as an author, in connection with my research work, book and published articles! And given that I alerted an administrator to the problem in 2012 and again in June 2014. Does this Wikipedian not put Wikipedia in flagrant legal contradiction, where I am concerned, with my author's rights on the Internet? I alone, as the author, have the right to disclose my name, including under a pseudonym (as on Wikipedia), and the right of withdrawal! See Droit d'auteur et internet 2.2 Droit moral et internet: "The author's moral right comprises the right of paternity (or right to the name), the right to respect for the work, the right of disclosure and the right of reconsideration or withdrawal. These rights are inalienable, perpetual, unseizable and imprescriptible".--Bruinek (talk) 11:57, 15 June 2014 (UTC)
As for the 2012 incident, it is hard to go back over it (the best course at the time was to let it be forgotten in the archives, and you can still ask an admin to remove a piece of information from the public history of the pages concerned); it is a bit late. But for the June 2014 incident the policy was applicable. Talk to a Wikipedia admin about it. If you can show that harassment is taking place, the person responsible should be sanctioned. Private information about someone cannot be published without their permission, even if the person publishing it has the information. That said, if it is limited to your first name, you are hard to identify. If he mentions your surname and it is not extremely common (like Martin, the most common surname in France...), it is difficult to locate you.
The trouble is that, if he has identified you, he may well continue to publish everything he finds about you and bring in all your other activities on the web (especially if you have an account on a well-known social network, or if you have posted a photo of yourself there, and also on other, much more sensitive networks such as dating sites) that you would not want linked to Wikipedia. It is not acceptable on Wikipedia to use information gleaned from other sites (and all the more so from private social networks: that violates their own copyright, which grants access to their content only on the site for private use, or for commercial solicitation through certain filters and payment of limited access fees). If he obtained information by cross-referencing with a social network, he committed a copyright violation against the site concerned. verdy_p (talk) 12:08, 15 June 2014 (UTC)
Note, though: look at [[11]] and you will see that there is a redirect that mentions an explicit name. It is public, and if you do not want this redirect, ask an admin to delete the redirect page. When you requested the renaming of your account, the redirect should not have been created, or the admin who did it should have deleted it immediately and hidden the page from the public history and the public deletion log. This renaming took place on 30 November 2007 at 21:34 and it was visible in 2012, so there was no clear violation of identity. You should have noticed it sooner! verdy_p (talk) 12:13, 15 June 2014 (UTC)

Tracking pixel

Where's the discussion which determined that this technique with "less than the best reputation" is needed on the voyage? The phrase "tracking pixel" doesn't even exist in the cookie FAQ. More dirty laundry hanging in the front yard, s'il vous plaît, if you're serious about public comment. MaxEnt (talk) 07:29, 8 May 2014 (UTC)

In the archive maybe. I'm not qualified to answer the FAQ problem. Alexpl (talk) 08:24, 9 May 2014 (UTC)
https://meta.wikimedia.org/wiki/Talk:Privacy_policy/Archives/2014 Obviously they're very very serious about creating the appearance of consultation with and acceptance of help from the user community. However, the history of edits shows otherwise: I saw no users arguing for the opaqueness around critical issues like profiling that I tried to address through comments and edits. And yet the edits I proposed and contributed were removed. On the plus side, although the policy is certainly not clear about what is collected, at least it no longer claims to be clear about what is collected. Earlier versions both were not clear and yet claimed to be clear. --Elvey (talk) 03:25, 11 May 2014 (UTC)
MaxEnt (talk), you can find tracking pixels in our glossary of key terms. If you would like to read some of the discussion we had during the consultation regarding this topic, please see answers from tech here and discussion regarding third party data collection here. RPatel (WMF) (talk) 18:59, 14 May 2014 (UTC)
RPatel (WMF), please stop keep not conforming to gender stereotypes of this awesome New Yorker cartoon! </joke> :-) (The #Anchors you added are helpful.) --Elvey (talk) 08:35, 4 July 2014 (UTC)

Exemptions from the Privacy Policy

I'm going to make this brief, because I don't think anyone really cares anyway, but I have a bit of a problem with the wording of this new privacy policy. In particular the part which says that Admins and functionaries (checkusers and the like) are exempt. Now I realize that there has been a developed culture where the admins here are treated like royalty and I agree there needs to be some language that allows them to do their tasks. But to say they are exempt from policy referring to Privacy information is a big problem for me. Functionaries I can go with because their identity and age are vetted. But administrators are selected by the community and their identities are never verified. There are enough problems with admin abuse on Wikipedia. We really should not be writing language that specifically excludes them from the privacy policy. Reguyla (talk) 02:17, 15 May 2014 (UTC)

Are you referring to the "To Protect You, Ourselves & Others" section? The box on the left summarizes the cases when "users with certain administrative rights" can disclose information:
  • enforce or investigate potential violations of Foundation or community-based policies;
  • protect our organization, infrastructure, employees, contractors, or the public; or
  • prevent imminent or serious bodily harm or death to a person.
The third definitely makes sense. The second one is somewhat vague (protect the public/employees from what?), but seems reasonable. However, the first one could potentially be problematic. Violating WMF policy is very different from violating a "community-based" policy. Which part of the new privacy policy are you concerned with? I don't see anything where admins "are exempt", but I admit I only searched the document for the word "admin[istrator]". PiRSquared17 (talk) 22:07, 15 May 2014 (UTC)
Have you tried uncollapsing? The most important parts of the text are the two collapsed ones. Or, Talk:Privacy_policy/Archives/2014#Google Analytics, GitHub ribbon, Facebook like button, etc. and the three threads linked from it (plus some others). --Nemo 16:34, 16 May 2014 (UTC)
Oh yeah I read every word, which leads to a separate issue of it being very long and sufficiently complex and legalistic to ensure very few will take the time to read it. In regard to the matter of admins and privacy, there are multiple problems with not clearly defining their role in the privacy policy. For example:
  1. There are about 1400 admins on the English wiki alone with varying levels of activity and interpretations of policy. Of those, only about 500 edit more than once every thirty days, and of those fewer than 100 edit every day.
  2. They are not vetted through the WMF and are anonymous, making privacy security dubious.
  3. Even the Functionaries like checkuser are questionable: even though their identifications are verified through the WMF, the verification process is pretty limited and the documentation isn't retained.
So I would recommend rewording the part about Admins like Checkuser, to refer to functionaries instead of admins and I would lose the loose wording of who is exempt. We don't have that many roles, we should just list them. Reguyla (talk) 18:12, 16 May 2014 (UTC)
@Nemo: Why are those boxes collapsed? They contain important information.
@Reguyla: Ah, I think I see what you are referring to now. "Administrative volunteers, such as CheckUsers or Stewards" is not clear whether it includes normal admins (sysops) or only CU/OS/Stewards (who are at least identified to the Wikimedia Foundation and have specific policies, as well as the access to nonpublic information policy). It would make sense to list out the specific groups or rights this covers. I don't see why admins should be exempt from policies regarding privacy. This wording seems to allow admins, essentially normal users with a few extra buttons, to disregard the privacy of other users, if I am interpreting it correctly.
@LVilla (WMF): are normal admins (sysops) exempt from this policy, or does that wording only apply to CU/OS/Stewards, who have more specific policies? PiRSquared17 (talk) 21:53, 16 May 2014 (UTC)
Hi Reguyla & PiRSquared17. Thank you for your comments and questions. We wanted to clarify why administrative volunteers are excluded from the privacy policy. The privacy policy is meant to be an agreement between the Foundation and its users on how the Foundation will handle user data. The Foundation can’t control the actions of community members such as administrative volunteers, so we don’t include them under the privacy policy. However, administrative volunteers, including CheckUsers and Stewards are subject to the access to nonpublic information policy (access policy). Under the access policy, these volunteers must sign a confidentiality agreement which requires them to treat any personal information that they handle according to the same standards outlined in the privacy policy. So, even though administrative volunteers are not included in the privacy policy, the access policy and the confidentiality agreement require them to follow the same rules set forth in the privacy policy. I hope that clears up any confusion. RPatel (WMF) (talk) 20:48, 20 May 2014 (UTC)
The Access to nonpublic information policy does not apply to "normal" sysops who are not identified to the Wikimedia Foundation, but who may have access to some private data (deleted edits). PiRSquared17 (talk) 23:07, 20 May 2014 (UTC)
@RPatel, Thank you for the response, but here is my problem with that. Checkusers, Oversighters and Stewards may sign an agreement and have their information vetted. Regular admins do not. They are still anonymous, and since the "normal" admins have access to material which has been deleted, oftentimes including personal details like email addresses, phone numbers, etc. in edits made, or derogatory material on BLPs, significant privacy issues can still arise. Also, the argument you make that "the access policy and the confidentiality agreement require them to follow the same rules set forth in the privacy policy" is also applicable to regular editors, who frequently do not follow them. We have seen over the years a number of admins get in trouble, desysopped, banned, etc. for violations. Worse, we have also seen a number of admins, including some in the last week or two on Wikipedia, get away with pretty severe violations. So although I do not expect the WMF to make any changes, I still have serious concerns and hesitations about admins being exempted from the Privacy policy. Frankly, the admins are already held to a much lower bar than regular editors and frequently allowed to get away with things that would cause a regular editor to be blocked or banned entirely from the site, so this is just another example of enabling a group of editors to be exempt from the policies that govern the site. Reguyla (talk) 20:22, 21 May 2014 (UTC)
@RPatel (WMF):, @LVilla (WMF): Reguyla- We haven't heard back since 16/20 May so I did diff because regular administrators clearly do have access to nonpublic information covered and defined by the Privacy Policy and because of the statement above by RPatel (WMF) that
"The Foundation can’t control the actions of community members such as administrative volunteers, However, administrative volunteers... are subject to the access to nonpublic information policy. Under the access policy, [all] these volunteers must sign a confidentiality agreement which requires them to treat any personal information that they handle according to the same standards outlined in the privacy policy."
I was reverted by Odder ~40 mins ago, without so much as an edit summary or other follow-up.
PiRSquared17 On what basis can you say that? I've provided two arguments for why that's not the case. We can't just put in place policies that are a more contradictory mess than the status quo. --Elvey (talk) 19:30, 27 May 2014 (UTC)
@PiRSquared17, I don't buy the argument that we can't control them so we just exempt them from the policy. That makes absolutely no sense. Reguyla (talk) 20:10, 27 May 2014 (UTC)
@Elvey: My basis for that claim: The new version of the access to nonpublic information policy does not include admins in the list of users it covers. Also, admins do not necessarily meet the minimum requirements listed there. In fact, it says "Community members with the ability to access content or user information which has been removed from administrator view". If they wanted to include admins, then they wouldn't have added "which has been removed from administrator view". Being bold is fine in most cases, but (IMHO) you can't just add something to a WMF policy draft that was recommended to the Board without even discussing it on the talk page. FYI this seems to be the current version of that policy. PiRSquared17 (talk) 20:21, 27 May 2014 (UTC)
@Reguyla: I'm not sure what you're referring to (whom can't we control?). PiRSquared17 (talk) 20:21, 27 May 2014 (UTC)
I'm quoting your statement above where you say "The Foundation can’t control the actions of community members such as administrative volunteers". If that is the case, then that would also imply you can't control the editors either, which makes the whole privacy policy pointless. You absolutely can control the admin corps; you have simply chosen not to, and that is the problem. On En anyway the admins have ingrained a culture where they are above reproach and are exempt from policy already. It's next to impossible to remove the tools from even the most abusive admins, and now they are exempted from the privacy policy too. I'm sorry but I have to wave the BS flag on that. I don't really even agree that the functionaries should be "exempt"; they should be identified as having special roles that "require" them to have access. Admins are not vetted through the WMF and they should not be exempt from the privacy policy. Reguyla (talk) 20:29, 27 May 2014 (UTC)
@Reguyla: I never said that; RPatel did. For what it's worth I agree with you. PiRSquared17 (talk) 20:45, 27 May 2014 (UTC)
Did you see this, Reguyla? PiRSquared17 (talk) 15:15, 28 May 2014 (UTC)
Yes sorry, it looked like you said it. Reguyla (talk) 17:12, 28 May 2014 (UTC)
Good points, @Reguyla:. What language changes should we make to avoid using "exempt" ? --Elvey (talk) 20:53, 27 May 2014 (UTC)
I don't know, to be honest; I would have to think about it. I'm pretty disillusioned with Wikipedia and the WMF at the moment, so frankly I don't think they would listen to me anyway and anything I said would be a waste of my time. I just wanted to make sure it was known that making admins exempt from the privacy policy was absolutely not appropriate and was going to enable more abuse. Realistically nothing would ever happen anyway. The WMF stands behind the admins and I don't think they have ever interfered, and the same goes for the admins themselves. Even if one is wrong they rarely admit it publicly and find reasons to defend even the most offensive violations of policy. So even if we said they were going to be cooked over open flames if they violated the privacy policy, nothing would happen, because the WMF doesn't have any intention or desire of involving themselves in the projects. It's beneath them. Reguyla (talk) 15:03, 28 May 2014 (UTC)
PiRSquared17: Either way, something must change. I agree when you say it's not OK that "This wording seems to allow admins, essentially normal users with a few extra buttons, to disregard the privacy of other users, if I am interpreting it correctly." We both see it as a problem. If I mustn't be bold, what then? It's OK for Odder to revert without so much as an edit summary or other follow-up? I say no. What do you say? We did discuss the need for a change, if not the actual change that I made, on this talk page, and the WMF took no action, for over a week, and I referred to this talk page in my edit summary. Please suggest or make a change that's better than the one I made. --Elvey (talk) 20:53, 27 May 2014 (UTC)
I think your edit summary here is a good example. PiRSquared17 (talk) 21:02, 27 May 2014 (UTC)
PiRSquared17: Of? Something must change. I agree when you say it's not OK that "This wording seems to allow admins, essentially normal users with a few extra buttons, to disregard the privacy of other users, if I am interpreting it correctly." We both see it as a problem. It's OK for Odder to revert without so much as an edit summary or other follow-up? I say no. What do you say? We did discuss the need for a change, if not the actual change that I made, on this talk page, and the WMF took no action, for over a week, and I referred to this talk page in my edit summary. Please suggest or make a change that's better than the one I made. --Elvey (talk) 20:53, 27 May 2014 (UTC)
The community consultation is over, according to the notice on the privacy policy and the access to nonpublic information policy, so I'm not sure. Has anyone from the WMF (perhaps RPatel) replied since? PiRSquared17 (talk) 22:07, 3 June 2014 (UTC)

Hi all. Sorry for the delay in response and for any confusion caused by my earlier response that referred to “administrative volunteers” — different types of volunteers should not have been lumped together with that phrase.

Correct me if I'm wrong, but you seem to be concerned that regular administrators (sysops) are not subject to the Access to Nonpublic Information Policy, but have access to material that has been removed from general public view (which may contain sensitive information, like email addresses, that was posted publicly).

Information that is posted publicly online, even if it is later removed from general public view, falls outside the scope of the Privacy Policy. The Privacy Policy covers "personal information", which is defined as "[i]nformation you provide us or information we collect from you that could be used to personally identify you" "if it is otherwise nonpublic.” Because sysops do not handle "personal information" within the scope of the Privacy Policy, we did not apply the Access Policy to sysops. Rules regarding sensitive information that has been removed from general view but is still viewable by sysops are addressed in other policies, such as the oversight policy. Under the oversight policy, if a user is uncomfortable with sysops being able to view sensitive information in a particular situation, the user can ask for that information to be hidden. Oversighters who would handle these types of requests are subject to the Access Policy.

It is also worth noting that the Access Policy is meant to set minimum requirements for community members that do handle “personal information” as defined by the Privacy Policy. It does not limit a particular project’s community from imposing additional requirements or obligations upon community members, such as sysops who handle sensitive information. Each community must decide what is right for them and create policies accordingly. RPatel (WMF) (talk) 00:04, 4 June 2014 (UTC)

@RPatel (WMF): - That isn't entirely true, and let me give you a couple of examples why. Personal information that would normally not be available or visible online is frequently passed around the backchannels through mailing lists and IRC while discussing issues or just in idle chitchat. That information is not generally allowed on Wikimedia projects and would generally be oversighted or at least revdelled. But it cannot be in the emails and IRC channels, and these things are frequently logged and retained. I think we have all seen cases where these were used or leaked in inappropriate manners. The UTRS system is another good example. Lots of personal info is available there and any admin can have access. In fact there is a warning message stating as much when the UTRS system is used. Many non-admins have access to it as well, making the problem even worse, but that's a separate issue. Exempting admins from the Privacy policy, as it's currently worded, is asking for trouble. IMO, if it ever went to court, any decent lawyer would have a good argument for any number of ways in which the privacy policy violated users' rights/reasonable expectation of privacy. I'm fairly surprised it hasn't already happened.
This privacy policy doesn't just cover Wikipedia or a couple of projects. It is an umbrella policy designed to cover them all. Now if the WMF wants to restrict admins to those who are willing to provide personal info to the WMF to verify their identity, or do that for those who wish to operate in the backchannels of IRC or UTRS, then maybe I could agree it's fine. Another good step forward would be for the WMF to perform some oversight of the functionaries and admins of the Wikipedia site, which is sorely lacking. But I don't think that is going to happen.
I for one already have serious concerns about the collegiality and civility problems of the English Wikipedia and the severe lack of leadership and oversight of the admins and functionaries of the project. If the site continues down its current path without some oversight or intervention by the WMF HQ team, no one is going to want to edit except some bullies and POV pushers (it's almost to that point now). Exempting them is the last thing we should be doing to curb the rampant abuses that are already occurring. Reguyla (talk) 17:51, 4 June 2014 (UTC)

Definitions, simplification, reopening discussion

RPatel (WMF) [edit:revised] Can you add a definition of nonpublic information based on the one from Confidentiality_agreement_for_nonpublic_information to the definition section, or remove the need for one? SMcCandlish, we could fork/edit Privacy_policy/Proposed_Revisions --Elvey (talk) 10:25, 24 May 2014 (UTC)
{{editrequest}}
So, I don't think we should still have a notice that "Our Privacy Policy is changing on 6 June 2014". But since we do, to which version can we be switching? The one in place a month ago? The one with the fix RPatel just made? I don't think we can do the latter. So I think we should fix the outstanding policy issues and then repost a notice that "Our Privacy Policy is changing on x xxx 2014". --Elvey (talk) 18:25, 27 May 2014 (UTC)
Hi Elvey, thanks for the question and suggestion. The privacy policy that will go into effect is the one that was approved by the Board, changed since the Board's approval only to correct typos, like the one pointed out above. To respond to your suggestion to add a definition of nonpublic information to the privacy policy, I wanted to point you to the definition of "personal information" in the definition section, which covers information that "is otherwise nonpublic and can be used to identify" users. The definition from the confidentiality agreement was not included in the privacy policy because that definition is geared towards information that volunteers would have and that is governed by the access to nonpublic information policy. For example, the confidentiality agreement definition specifies information users "receive either from tools provided to you as an authorized Wikimedia community member or from other such members." --unsigned comment by RPatel (WMF).
RPatel (WMF), are you aware that the Privacy Policy itself uses the term nonpublic information multiple times? Some of those uses of the term are far from any reference to the confidentiality agreement. I find it hard to imagine an argument for why it is better to leave the definition, and its very existence, hidden away. What's the benefit? Elvey (talk) 27 May
Hi Elvey. First, sorry about the previous unsigned comment! I think my previous comment was unclear. I read your suggestion as to take the exact definition from the confidentiality agreement and add it to the privacy policy, and I was trying to explain that the confidentiality agreement definition would not make sense in the privacy policy context (because it talks about authorized community members getting information through tools). But if you are just suggesting that a definition of nonpublic information be included, not necessarily the same definition from the confidentiality agreement, I want to respond to that as well. The privacy policy defines personal information and delineates how the Foundation handles it. Nonpublic information is a broader term that does not necessarily include personal information. For example, anonymized data that contains no personal information is "nonpublic" until we release it, whereas non-anonymized data containing personal information that has not been released (and would not be except as permitted under the privacy policy) would be both "nonpublic" and "personal information". The privacy policy does use the term "nonpublic information" and in most cases it's in reference to certain users with admin rights-- "who are supposed to agree to follow our Access to Nonpublic Information Policy" and nonpublic information is discussed in that policy. I don't think we're trying to hide its definition or existence but instead trying to be more specific by defining personal information. RPatel (WMF) (talk) 20:52, 28 May 2014 (UTC)
RPatel (WMF), thank you for that explanation and for your patience. Indeed, nonpublic information, private information, private user information, personal information: that's a lot of terms; perhaps a Venn diagram is called for. After having read the "Privacy-related pages", a user should know what is collected, know that WM employs it, and know that access is restricted to approved projects and user groups only. How should we resolve the problem of "Nonpublic information" not being defined where it is used? I have two ideas, A and B:
A) If we eliminate the term 'nonpublic information' from the Privacy Policy like this, is it a better policy? The Privacy Policy stops committing to protect the anonymized data you mention; is changing the status of the data in that section of the Venn diagram a significant negative? I don't see it. We simplify the document, eliminating an undefined term.
B) Include a definition of nonpublic information. I propose this one, which I derived from the extant one: "Nonpublic information. Nonpublic information is private information, including private user information, disclosure of which is covered by the Confidentiality agreement for nonpublic information. Nonpublic information includes personal information. It does not include information about a user that that user otherwise makes public on the Wikimedia projects."
Thoughts on these or other solutions, or the other changes I'm discussing with LVilla? --Elvey (talk) 20:21, 3 June 2014 (UTC)
Hi Elvey. Sorry for the delay in responding. We added a definition of nonpublic information here. Thank you for the suggestion! RPatel (WMF) (talk) 18:24, 2 July 2014 (UTC)
Wahoo! Thank you for taking it. --Elvey (talk) 07:39, 4 July 2014 (UTC)

vandal

I've been here for more than 10 years, but I do not seem to be able to revert vandalism here. See https://meta.wikimedia.org/w/index.php?title=Privacy_policy/de&action=history and the edits of the IP just now. I cannot revert them. It is a shitty system when you have to study how to do it. Reverting vandalism should be simple. -jkb- 22:50, 22 August 2014 (UTC) - - - P.S. My feeling is that more and more users are excluded from editing here. -jkb- 22:52, 22 August 2014 (UTC)

I've reverted those edits. For pages translated using the Translate extension, you have to revert the edits to the translation units separately. Go to Special:Contributions/198.228.200.168 and revert the edits to the pages in the Translations: namespace. --Glaisher (talk) 08:40, 23 August 2014 (UTC)