Talk:Data retention guidelines/Archives/2014

From Meta, a Wikimedia project coordination wiki

Gender?

The following discussion is closed: closing, looks set, please reopen if not. Will archive in a couple days if still closed Jalexander--WMF 23:23, 22 January 2014 (UTC)

The examples list "email and gender in account settings" as examples of non-public data; however the account settings 'gender' property is publicly disclosed by necessity due to its purpose in producing grammatically correct strings.

Is this meant only to treat the combination of the two as private? Otherwise, we're leaking gender-by-username... --brion (talk) 21:24, 9 January 2014 (UTC)

Someone removed it, so ... :D --brion (talk) 21:41, 9 January 2014 (UTC)
Michelle did, but her login is apparently still on vacation :) It was removed in response to this remark. —LVilla (WMF) (talk) 21:49, 9 January 2014 (UTC)

Section 4 (Definition of personal information)

The following discussion is closed: closing since it seems set and no further response, please reopen if not. Will archive in a couple days if not. Jalexander--WMF 23:24, 22 January 2014 (UTC)

"Information you provide us or information we collect from you that could be used to personally identify you." reads a bit strange. Maybe a full sentence would be better here? Something like "Personal information means information you provide ..." maybe? --თოგო (D) 22:14, 9 January 2014 (UTC)

Hi თოგო. Thank you for your suggestion! We have adjusted the language accordingly. Mpaulson (WMF) (talk) 00:41, 10 January 2014 (UTC)

Possibilities in case of breaches

The following discussion is closed: closing since it looks set and answered, please reopen if needed. Will archive in a couple days if still closed. Jalexander--WMF 23:26, 22 January 2014 (UTC)

Maybe the policy should contain information about a place where users can go if they feel that the policy was breached. --თოგო (D) 22:17, 9 January 2014 (UTC)

Thank you თოგო for your comment - this makes sense. What if we added this sentence to the last section of the document (“Ongoing handling…”):

If you think that these guidelines have been breached, or if you have questions or comments about compliance with the guidelines, please contact us at privacy@wikimedia.org.

Would that address your concern? Any suggestions on how to improve it? :--JVargas (WMF) (talk) 00:14, 10 January 2014 (UTC)
Hi თოგო. I've gone ahead and implemented Jorge's suggested language by adding a new section to the guidelines. Thank you for this helpful suggestion. Mpaulson (WMF) (talk)

Logs of terms entered into the site's search box

The following discussion is closed: closing as discussion seems to be over, please reopen if not. Will archive in a couple days if still closed. Jalexander--WMF 19:06, 23 January 2014 (UTC)

Why exactly is this data needed at all? Geni (talk) 00:44, 10 January 2014 (UTC)

Hi Geni, it's needed at least for debugging purposes. There are some searches that trigger bugs or performance problems, and we need to be able to go back and correlate searches with the bugs that get triggered. Sometimes malicious users may actually try to cause performance problems on the site via search, so we need to have the information correlated with IP addresses so that we can take action if necessary. We don't yet do much in the way of analytics on search traffic (that I'm aware of), but I could see that being of use in the future. -- RobLa-WMF (talk) 02:01, 10 January 2014 (UTC)

Non-public vs public data

The following discussion is closed: closing as discussion seems to be over, please reopen if not. Will archive in a couple days if still closed. Jalexander--WMF 19:06, 23 January 2014 (UTC)

It might be worth expanding on what is actually meant by non-public data and what data is public by default. For example, this page talks about IP addresses for visitors, which might be confused with IP addresses for editors which are either publicly visible (where an anonymous edit is made) or kept private but presumably kept indefinitely (for user accounts). I think most of this is in the privacy policy, but it might be worth summarising here. Thanks. Mike Peel (talk) 08:26, 10 January 2014 (UTC)

Hi Mike. Thanks for the suggestion! You are correct in that it is addressed more fully in the Privacy Policy draft, but we will draft some language to make that clear as well as give some basic examples. Mpaulson (WMF) (talk) 14:10, 10 January 2014 (UTC)
Thanks Michelle. :-) Mike Peel (talk) 14:44, 10 January 2014 (UTC)
Just to let you know, we added some language. Let me know if that works! Mpaulson (WMF) (talk) 18:56, 10 January 2014 (UTC)
I think it's much better now. The info is in the privacy policy, but you can't expect everybody to read it. //Shell 19:07, 10 January 2014 (UTC)

Donor data

The following discussion is closed: closing as discussion seems to be over, please reopen if not. Will archive in a couple days if still closed. Jalexander--WMF 19:06, 23 January 2014 (UTC)

Presumably this page isn't intended to cover donor data? It might be worth linking to wmf:Donor_policy. Thanks. Mike Peel (talk) 08:28, 10 January 2014 (UTC)

Hi Mike! While this document does not currently address donor data, we are hoping to eventually include retention practices in relation to donor data. These guidelines are meant to be a starting point for us and will get more detailed over time. In the meantime, I will see about getting navigational tools to other privacy-related documents (including the donor policy) added. Thanks for the suggestion! Mpaulson (WMF) (talk) 13:34, 10 January 2014 (UTC)

Sampled data

The following discussion is closed: closing as discussion seems to be over, please reopen if not. Will archive in a couple days if still closed. Jalexander--WMF 19:05, 23 January 2014 (UTC)

Does the 90-day rule for IP addresses apply to sampled data? As far as I know, the main motivation behind deleting IP addresses is to prevent tracing site usage back to a specific person. If site usage data is (sufficiently) sampled, this is much harder to do regularly. Ottomata (talk) 16:01, 10 January 2014 (UTC)

It should apply. Your ISP might keep its IP address<->customer correlation data indefinitely, making the data personally identifiable regardless of sampling frequency. //Shell 19:09, 10 January 2014 (UTC)
Yes, the intent is that it should apply. This does probably mean there will have to be changes to certain existing setups, but as noted in the Audit section, that's a process we expect to occur gradually. —LVilla (WMF) (talk) 22:43, 10 January 2014 (UTC)

Examples in #How long do we retain non-public data?

The following discussion is closed: closing since it appears discussion over (Luis responding in above thread for further comment on emails), reopen if not. Will archive in a couple days if still closed

Are the examples in the table supposed to be exhaustive? If the WMF retains other types of non-public data, I believe this guideline should explain all of them, but at the moment it does not read that way. -- (talk) 08:00, 11 January 2014 (UTC)

The examples are not intended to be exhaustive (they're examples, after all ;) It would be both impractical and not very useful to readers if we listed every type of data we collect. That said, the "data types" should be exhaustive - everything we collect/retain should all fit into one of those categories. Hope that helps. —LVilla (WMF) (talk) 21:40, 14 January 2014 (UTC)


Who are "we"?

The following discussion is closed: closing since it appears set, please reopen if not. Will archive in a couple days if still closed. Jalexander--WMF 20:02, 4 February 2014 (UTC)

Does this mean WMF? Or does this mean Wikimedia sites in general? --Rschen7754 23:46, 9 January 2014 (UTC)

Thanks for the question, Rschen7754! Whenever you see "we" / "us" / "our" in the text, we are indeed referring to The Wikimedia Foundation, Inc., the non-profit organization that operates the Wikimedia Sites. This explanation is part of the “Definitions” section of the new Privacy Policy draft. Would it help if we added something like this to the document?

Terms that are not defined in this document have the same meaning given to them in the Privacy Policy.

--JVargas (WMF) (talk) 00:26, 10 January 2014 (UTC)
Yes, that would be helpful. --Rschen7754 00:28, 10 January 2014 (UTC)
We changed the "Definition of Personal Information" section to "Definitions" in the document, and we added the above sentence at the end of it. Thanks again!--JVargas (WMF) (talk) 00:52, 10 January 2014 (UTC)
Does this guideline only govern what the WMF will do, and not what those with access to private information will do? Though I believe guidelines about what users with access should do would likely be only nominal.--朝鲜的轮子 (talk) 03:03, 15 January 2014 (UTC)
@朝鲜的轮子: Generally we don't share personal information unless it is anonymized/aggregated. (There are exceptions in the privacy policy - the most important one is checkusers, but they will be covered by the Access to nonpublic information policy.) So we could make it apply to others, but (1) as you say, it would be hard to enforce and (2) in general they shouldn't have access to the data anyway. So I would prefer not to change it. Does that make sense? —LVilla (WMF) (talk) 20:25, 22 January 2014 (UTC)
Note that for CUs, we're talking with them about how to minimize retention, even though this policy won't formally apply to them. —LVilla (WMF) (talk) 20:27, 22 January 2014 (UTC)

Is there any possibility of a reasonable IAR exception to this guideline?

The following discussion is closed: closing since it appears set, please reopen if not. Will archive in a couple days if still closed. Jalexander--WMF 20:01, 4 February 2014 (UTC)

I just imagined a case: a checkuser runs a CU, and the results include data from a range of time, some of which will expire very soon (for example, 89-day-old data). If data that is about to go stale is important (say, one user's last logged action was 89 days ago, that data relates to a sockpuppet issue, and later CU attempts would likely return "stale"), is it reasonable to keep the data beyond 90 days in order to resolve the issue? --朝鲜的轮子 (talk) 03:04, 15 January 2014 (UTC)

@朝鲜的轮子: Let me answer this in two parts, to explain:
For employees, IAR is not an option :) If an exception was made, it would be added here.
As I said above, for CUs, this generally doesn't apply. Whatever rule we do work out with the CUs will hopefully not be a rule that interferes with their work; we hope that if they are tempted to IAR, they'll discuss with us (or the broader community) first. —LVilla (WMF) (talk) 20:51, 22 January 2014 (UTC)