Research talk:Task recommendations/Work log/2014-07-17

From Meta, a Wikimedia project coordination wiki

Thursday, July 17th

Today, I want to get three things done:

  1. Gather a sample of the most common "returnTo" pages in the main namespace that the newly registered user goes on to edit
  2. For each returnTo page, gather 15 similar articles that pass a set of filters (described below)

Filters

article_length > 0
Sanity check that it's not blank
page_namespace == 0
Main namespace (article)
input_title != output_title
Don't return the same page we're searching with
Filter out en:Category:Living people
No biographies of living people -- too difficult for newbies to edit without being reverted
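
As a rough sketch, the four filters above could be combined into a single predicate. The `Candidate` record, its field names, and the `similar_articles` helper are assumptions made for illustration; they are not taken from any existing recommender code.

```python
from dataclasses import dataclass

# Category name from the filter list (en:Category:Living people).
LIVING_PEOPLE = "Living people"


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate article."""
    title: str
    namespace: int
    length: int
    categories: frozenset = frozenset()


def passes_filters(input_title, candidate):
    """Return True if the candidate survives all four filters."""
    return (
        candidate.length > 0                            # not blank
        and candidate.namespace == 0                    # main namespace only
        and candidate.title != input_title              # not the page we searched with
        and LIVING_PEOPLE not in candidate.categories   # no BLPs
    )


def similar_articles(input_title, candidates, n=15):
    """Keep the first n candidates (in their given order) that pass the filters."""
    kept = [c for c in candidates if passes_filters(input_title, c)]
    return kept[:n]
```

This assumes the candidates arrive already ranked by similarity, so truncating to the first 15 survivors keeps the best matches.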

Sample of returnTos

I originally thought that we could just sample the top N returnTo pages on user registrations, but now I realize that I'll need to sample from the whole set. If we only look at the most common article titles, we could miss the very large proportion of articles that aren't returned to frequently. Stupid long tail distributions. No worries. It shouldn't be too hard to sample.
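
To make the long-tail point concrete, here is a small sketch that compares a top-N selection with a random sample over the whole set. The function names and the toy data in the usage note are hypothetical; the real returnTo values would come from the registration logs.

```python
import random
from collections import Counter


def top_n(return_tos, n):
    """The n most frequently occurring returnTo titles."""
    return [title for title, _ in Counter(return_tos).most_common(n)]


def sample_return_tos(return_tos, k, seed=None):
    """Random sample of k returnTo events drawn from the whole set.

    Sampling events (not distinct titles) keeps frequent titles more
    likely to appear while still letting the long tail show up.
    """
    rng = random.Random(seed)
    pool = list(return_tos)
    return rng.sample(pool, min(k, len(pool)))


def coverage(return_tos, titles):
    """Share of returnTo events whose title is in the given set."""
    titles = set(titles)
    return sum(1 for t in return_tos if t in titles) / len(return_tos)
```

For example, with ten events where "A" occurs three times, "B" twice, and five other titles once each, `coverage(events, top_n(events, 2))` is 0.5: the top two titles account for only half of the events, and the rest sit in the tail.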

OK so, I'm going to need to filter returnTos. I don't want any returnTos that are outside the main namespace. I could probably also filter out BLPs (biographies of living people), but I'm not sure that would really matter. I should probably restrict returnTos to those that were edited by the newly registered user within 24 hours or so of registration. Query time. --Halfak (WMF) (talk) 18:53, 17 July 2014 (UTC)
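
A minimal sketch of the returnTo filtering described above, written in Python rather than as the actual database query. The `Registration` and `Edit` records and their fields are assumptions for illustration, and the 24-hour window is the rough figure from the note, not a settled parameter.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Registration(NamedTuple):
    """Hypothetical record: one account creation with its returnTo page."""
    user: str
    registered: datetime
    return_to: str
    return_to_namespace: int


class Edit(NamedTuple):
    """Hypothetical record: one saved edit."""
    user: str
    title: str
    namespace: int
    timestamp: datetime


def edited_return_tos(registrations, edits, window=timedelta(hours=24)):
    """Keep main-namespace returnTos that the new user edited within the window."""
    # Index main-namespace edit timestamps by (user, page title).
    stamps_by_user_page = {}
    for e in edits:
        if e.namespace == 0:
            stamps_by_user_page.setdefault((e.user, e.title), []).append(e.timestamp)

    kept = []
    for r in registrations:
        if r.return_to_namespace != 0:
            continue  # drop returnTos outside the main namespace
        stamps = stamps_by_user_page.get((r.user, r.return_to), [])
        deadline = r.registered + window
        if any(r.registered <= ts <= deadline for ts in stamps):
            kept.append(r)
    return kept
```

A BLP filter is left out here on purpose, since the note is unsure whether it would matter at this stage.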