Talk:New Readers/Offline/Content curation

From Meta, a Wikimedia project coordination wiki

Picking articles

The suggestions currently posted by Anne Gomez look like a great start. For the English Wikipedia, I would suggest that a good starting point is the article lists at https://tools.wmflabs.org/enwp10/cgi-bin/list2.fcgi. It would also be good to review how we've done selections in the past. In particular, I recommend using the article score rather than a simple page-view count. I'm fond of pointing out that at one point the article w:List of gay porn stars was one of the top ten articles in terms of page views, and although the porn industry is important, that article shouldn't rate that highly! It's also a problem if someone has just been in the news on the day when you make the selection: you can get a distorted picture of importance.

We reckon that the best measure of importance that we have is a combination of WikiProject importance assessments and the "External Interest Points", which are a weighted aggregation of page views, links into the page, and the number of language versions. The formula we use is described on that article selection page I mentioned, and the overall score also includes the article quality. You can see a typical set of results - done for Egyptian articles, where I've chosen to list the top 1000 Egypt articles by score. Note that Ancient Egypt only scores above Egypt because the latter is C-class whereas the former is a Featured Article. Usually these lists are a pretty good reflection of the "must have" articles in a certain subject area. There are usually a few strange entries in these lists where an article is perhaps inappropriately tagged, but even then the article will be of high general importance. I'll amend the page slightly to include mention of article score.
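As a rough illustration of the kind of combined score described above, here is a minimal sketch. The weights, the point tables, the log scaling, and all names (`Article`, `external_interest`, `score`, `top_articles`) are assumptions made up for this example; the formula actually used is the one on the article selection page mentioned above.

```python
import math
from dataclasses import dataclass

# Hypothetical weights and point tables, chosen only for illustration.
VIEW_WEIGHT = 50.0
LINK_WEIGHT = 100.0
LANG_WEIGHT = 250.0
IMPORTANCE_POINTS = {"Top": 400, "High": 300, "Mid": 200, "Low": 100}
QUALITY_POINTS = {"FA": 500, "GA": 400, "B": 300, "C": 225, "Start": 150, "Stub": 50}


@dataclass
class Article:
    title: str
    page_views: int
    incoming_links: int
    language_versions: int
    importance: str  # WikiProject importance assessment
    quality: str     # assessed quality class


def external_interest(a: Article) -> float:
    """Weighted aggregation of page views, incoming links and language versions.

    Log scaling (an assumption) damps short-lived page-view spikes.
    Counts below 1 are clamped to 1 so the logarithm is defined.
    """
    return (
        VIEW_WEIGHT * math.log10(max(a.page_views, 1))
        + LINK_WEIGHT * math.log10(max(a.incoming_links, 1))
        + LANG_WEIGHT * math.log10(max(a.language_versions, 1))
    )


def score(a: Article) -> float:
    """Overall score: assessed importance + external interest + quality."""
    return (
        IMPORTANCE_POINTS.get(a.importance, 0)
        + external_interest(a)
        + QUALITY_POINTS.get(a.quality, 0)
    )


def top_articles(articles: list[Article], n: int) -> list[Article]:
    """Return the n highest-scoring articles, best first."""
    return sorted(articles, key=score, reverse=True)[:n]


if __name__ == "__main__":
    # Invented figures, not real statistics.
    egypt = Article("Egypt", 1_000_000, 100_000, 250, "Top", "C")
    ancient = Article("Ancient Egypt", 500_000, 20_000, 150, "Top", "FA")
    for art in top_articles([egypt, ancient], 2):
        print(art.title, round(score(art)))
```

With these invented figures the C-class article has the higher external interest, but the Featured Article still ranks first once quality points are added, which mirrors the Ancient Egypt versus Egypt ordering noted above.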

Categories do have useful information, but I want to give one warning on using category hierarchies. Often as you go up through the category hierarchy things get much more general, and they can overlap into unrelated subject areas. One of my favorite examples was how Belgium used to appear under the high-level category "France". This is because there is (or at least was) a category "Countries bordering France" which naturally included Belgium. For the same reason we used to find night clubs showing up under chemistry (because they serve ethanol, a chemical). Some of these problems have been fixed, but I wouldn't rely on categories to get what you want unless you stay pretty close in the hierarchy tree, or unless you semi-manually tag all the categories containing more than 3 selected articles, which we actually did for one release (e.g., any category containing the phrase "New York" is likely to relate to the US city). Of course WikiProject tags can have problems too, but if they're available (e.g., on the English and to some extent the French Wikipedia) they're usually better than categories for all but the top-level articles in a subject area. Walkerma (talk) 04:41, 19 October 2017 (UTC)
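To make the "stay pretty close in the hierarchy tree" advice concrete, here is a minimal sketch of a depth-limited walk over a category graph. The function name, the dictionary layout, and the toy data are all invented for illustration; they are not taken from any existing selection tool.

```python
from collections import deque


def articles_within_depth(root, subcategories, members, max_depth=1):
    """Collect articles in `root` and in subcategories at most `max_depth` levels below it.

    subcategories: dict mapping a category to its direct subcategories
    members:       dict mapping a category to the articles filed directly in it
    """
    seen = {root}
    queue = deque([(root, 0)])
    found = set()
    while queue:
        category, depth = queue.popleft()
        found.update(members.get(category, ()))
        if depth == max_depth:
            continue  # do not descend any further from here
        for child in subcategories.get(category, ()):
            if child not in seen:  # guard against loops in the category graph
                seen.add(child)
                queue.append((child, depth + 1))
    return found


# Toy data (hypothetical), loosely based on the Belgium example above.
SUBCATEGORIES = {
    "France": ["Geography of France"],
    "Geography of France": ["Countries bordering France"],
}
MEMBERS = {
    "France": ["Paris"],
    "Geography of France": ["Loire"],
    "Countries bordering France": ["Belgium"],
}

if __name__ == "__main__":
    print(sorted(articles_within_depth("France", SUBCATEGORIES, MEMBERS, max_depth=1)))
    print(sorted(articles_within_depth("France", SUBCATEGORIES, MEMBERS, max_depth=2)))
```

In this toy graph a depth limit of 1 keeps the selection to Paris and Loire, while allowing one more level pulls in Belgium. A real limit would need tuning per subject area.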

K to 12 education

I have begun working with a Rotary group in Canada regarding creating a K to 12 education ZIM. If people wish to work on that to some extent it would be excellent.

We currently have WP 0.8 as a ZIM and WP 1.0 is being worked on. Doc James (talk · contribs · email) 12:15, 19 October 2017 (UTC)

False positives

Hi there, I've played around with the first iteration of Zimmerbot and I think it has some great potential! However, I'd like to better understand how the article selection is made. I looked at pasta, in Italian, and had the tool return 100 results: the list was quite varied, with a few false positives. Can we have a look at the selection algorithm somewhere?

Also, on a side note, and as a suggestion for a later iteration, it would be nice if the tool could generate/export an index file listing all articles. Ah, but I see it was already in the suggested outputs!

Thanks, Stephane (talk) 13:49, 12 December 2017 (UTC)