Research talk:Predict Users Search


We don't have that data

What it says on the tin. Or, more accurately: our search logs intentionally do not contain anything that can tie a search request to the previous request or to other browsing activity. They hold no PII (outside of the search string itself) and no unique IDs. Okeyes (WMF) (talk) 00:40, 4 November 2014 (UTC)

Responding to the rest of the request:
"We would need data related to pages users visited (in English), when they visited those pages and if possible if it was through direct link within Wikipedia."
This, we have, although I don't think it's something you're likely to find researcher time to extract. The R&D team at Wikimedia is ~5 people, all of whom have a lot of responsibilities already.
"Also, data about the users (gender, age, country...) might relevant be we are not sure at this point."
I... don't know how we'd get the gender and age of someone from their IP address and user agent, which are the only PII we gather from requests.
So, we don't have most of this data. And the bit we do have is, I'm afraid, a pain to extract; it's unlikely to be worth it, at our end, for an in-class project :(. Okeyes (WMF) (talk) 00:42, 4 November 2014 (UTC)