Talk:Article counts revisited

From Meta, a Wikimedia project coordination wiki

Quick read

Thanks for compiling this history. As far as I can see, it's all correct, though I can't answer the questions in red off the top of my head. I'm only unsure about Wikistats.

  1. For sure there are varied opinions on the value of wikistats' dumps-based approach; however, my point when I told people to use wikistats was that Special:Statistics is not meant as an evaluation tool or anything official, it's just a tool for trivia. Wikistats, however broken, is our "official" point of reference.
  2. I've never been 100 % sure about how categories are or should be dealt with. At some point, I thought we had agreed that the 2011 system was supposed to count category links as well; I'm not sure what ParserOutput::mLinks contains. I think counting categories is a correct implementation, maybe we should do the same in MediaWiki.
  3. I don't know about stub dumps in detail, I'm afraid. :( I think the issue tracker has some discussion about them from which you may gather more information, though. The wikistats source code is also worth reading, IMHO. --Nemo 16:54, 26 April 2015 (UTC)
I checked a few files in the Wikistats source code but never really found what I was looking for. Thanks for looking over what I've written… - dcljr (talk) 08:53, 27 April 2015 (UTC)

Addressing the specific points above:

  1. Developers, "Analytics" folks, and "Foundation" folks may rely on the Wikistats information, but on the individual wikis themselves, many people have gotten used to using the site statistics as if they're meaningful. And I think from this point onward they (the article counts, anyway) will be quite a bit more meaningful than they have been in the past.
  2. I think "manually added" category links "weave" pages into a wiki as much as internal wikilinks do, and so should probably count when determining article count. However, many templates add categories to indicate problems with articles (candidates for speedy deletion, for example), and so template-provided categories probably shouldn't count. I don't think we want the software to have to make that distinction, though (for presumed performance reasons), so adding categories to the definition of an article may not be realistic.
  3. For now, I'm just going with what Wikistats reports about stub dumps on its "Tables" pages (as discussed in the Wikistats section). I've asked Erik Zachte to look over the text the same way I asked you, so he may be able to clarify or correct some things I've said about Wikistats.

- dcljr (talk) 21:15, 27 April 2015 (UTC)

I'd like to clarify that many or all official WMF communications (such as job ads or press releases) actually use the value of wmf:Template:ALL-WP-COUNT, which is updated from List_of_Wikipedias#Grand_Total or its source http://wikistats.wmflabs.org/display.php?t=wp (confusingly also called wikistats, although it is completely separate from stats.wikimedia.org). The latter is run by User:Mutante, who explained at phab:T89283#1035167 that this is based on the "content pages" numbers from Special:Statistics on all Wikipedias (collected via the API) and "imho that is reliable, unless Special:Statistics itself has a bug i don't see why it would be different from parsing dumps (which seems more error prone)".
Regards, Tbayer (WMF) (talk) 04:09, 28 April 2015 (UTC)
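The "content pages" number shown on Special:Statistics can also be read through the MediaWiki API (`action=query&meta=siteinfo&siprop=statistics`, field `articles`). The following is only a minimal sketch of that kind of collection, not Mutante's actual script; the helper names `statistics_url`, `article_count`, and `fetch_article_count` and the sample payload are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def statistics_url(host):
    """Build the siteinfo/statistics API URL for a wiki host such as 'id.wikipedia.org'."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return "https://" + host + "/w/api.php?" + urlencode(params)


def article_count(payload):
    """Extract the 'articles' (content pages) value from a decoded API response."""
    return int(payload["query"]["statistics"]["articles"])


def fetch_article_count(host):
    """Fetch the live count over the network (not exercised in the sample below)."""
    with urlopen(statistics_url(host)) as response:
        return article_count(json.load(response))


# Hypothetical response shaped like the API output; the numbers are made up.
sample = json.loads('{"query": {"statistics": {"pages": 1000, "articles": 250}}}')
print(article_count(sample))
```

Summing `fetch_article_count` over every Wikipedia host would give a grand total of the kind described above. Wikimedia asks API clients to send a descriptive User-Agent header, which this sketch leaves out.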
I guess maybe Mutante is in for a shock, then… [grin] Actually, since the errors on individual wikis are sometimes this way, sometimes that way, the total article count of a project (like Wikipedia) is less prone to systematic over- or undercounting (i.e., the errors tend to "average out" when all wikis of a project are taken into account). To be specific, when the article count changes seen on March 29th are summed over all Wikipedias, the change in total articles was only about 0.1% (a rise of one-tenth of one percent). Interestingly, though, a much greater percent change, in the other direction, was seen in total articles in all 677 recounted wikis (a drop of 0.9%). In any case, we shouldn't see dramatic shifts like this in the future, assuming they keep periodically recounting the articles. - dcljr (talk) 07:00, 28 April 2015 (UTC)
Well, we know that job ads and press releases are not particularly striking examples of correctness and precision. :) --Nemo 11:36, 28 April 2015 (UTC)
I say lies and damned lies Bennylin 11:02, 30 April 2015 (UTC)

NUMBEROFARTICLES, where to go from here?

Regarding wikistats, is it correct that http://stats.wikimedia.org/EN/TablesArticlesTotalAlt.htm is the old table, while http://stats.wikimedia.org/EN/TablesArticlesTotal.htm is the new table from the recent redefinition of what constitutes an article? I ask because I saw a doubling in the number of total articles in id.wp. Currently we still stand at around 350k (which is the number in the Alt table), while the real count is now 600k?? We uploaded the graph from Nov 2013 [Graph 1] at around 325k, but now the page [Graph 2] tells me that at that time the number was around 525k. To confuse us more, {{NUMBEROFARTICLES}} still shows the 350k number, which is defined (in the Alt table) as "Articles that contain at least one internal link and 200 (ja,ko,zh:50) characters readable text, disregarding wiki- and html codes, hidden links, etc.; also headers do not count". id:Special:Statistics also still shows 350k without any explanation that the real count was actually 600k.

So the question is, where to go from here? Should {{NUMBEROFARTICLES}} show the number of articles "that contain at least one internal link" (http://stats.wikimedia.org/EN/TablesArticlesTotal.htm) or the alt count? Bennylin 11:14, 30 April 2015 (UTC)

Back then I already told everyone on IRC on steward chat and Wikidata chat. No answer given.--AldNonymousBicara? 14:23, 30 April 2015 (UTC)
Bennylin, I don't understand your question. NUMBEROFARTICLES is a MediaWiki feature, see mw:Manual:Article count for its expected behaviour. --Nemo 18:34, 30 April 2015 (UTC)
@Bennylin: The "alternate" article counts at Wikistats don't match any definition of an article that has ever been used in the MediaWiki software itself (which is presumably why they're called "alternate"). But as described in the "Wikistats" section, the so-called "official" counts of Wikistats also differ from the way MediaWiki counts articles. (By the way, both Wikistats tables you link to are based on the latest complete month of database dumps available, so neither is newer than the other.) Now, because of the different definitions used in the Wikistats tables, the fact that one says "352 k" while the other says "602 k" presumably means that 250k (the difference of the two numbers) of the Indonesian Wikipedia main-namespace pages with at least one wikilink (or Category link, which Wikistats also counts, but MediaWiki doesn't) contain fewer than 200 characters of text. I can't explain the difference between Wikistats' "602 k" and MediaWiki's "358,512" content-page count, other than to suggest that about 243k of the relevant pages have only a category link and not a wikilink (the difference between Wikistats' "official" definition and MediaWiki's "link" based definition), but that doesn't seem to match what repeated use of "Random article" shows on the wiki itself. Finally, the fact that Wikistats' "alternate" count (352 k) and MediaWiki's "content page" count (358,512) are approximately equal is just a coincidence, as far as I can tell. As for what Wikistats reported 1.5 years ago, that will not match what it says today was happening 1.5 years ago, because today's report is based on the current state of the wiki, whereas what it reported 1.5 years ago was happening at that time was based on the state of the wiki 1.5 years ago.
This is all very confusing for people who haven't gotten used to the idea (like me, a few weeks ago), and is completely different from the way most people think "past" article counts should behave (i.e., that they should not change over time: the current counts can change, but not the past ones). A comment from Erik Zachte, which I have yet to incorporate into my discussion of Wikistats, might help people to better understand Wikistats counts. Note that Wikistats' definitions have not changed in many years, as far as I know, and neither have MediaWiki's (different) definitions (not since 2011). The on-wiki article counts just got recalculated to reflect the definition(s) actually in use since 2011. - dcljr (talk) 07:36, 1 May 2015 (UTC)
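The differences mentioned in this comment can be checked with a few lines of arithmetic. This is just a sketch using the three figures quoted above; the variable names are made up, and the Wikistats figures are rounded to the nearest thousand, so the differences are approximate.

```python
# Figures quoted in the comment above for the Indonesian Wikipedia.
wikistats_official = 602_000        # "602 k": at least one internal (or category) link
wikistats_alt = 352_000             # "352 k": additionally needs 200 characters of text
mediawiki_content_pages = 358_512   # on-wiki content-page count

# Pages with a link but fewer than 200 characters of readable text.
short_pages = wikistats_official - wikistats_alt

# Pages that might have only a category link and no wikilink (a guess, per the comment).
category_only_guess = wikistats_official - mediawiki_content_pages

# Gap between the two counts that happen to be close to each other.
alt_vs_mediawiki = mediawiki_content_pages - wikistats_alt

print(short_pages)          # 250000
print(category_only_guess)  # 243488
print(alt_vs_mediawiki)     # 6512
```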
Can you explain more about the different graphs again? How can the "current state of the wiki" add about 200k articles to the count from 1.5 years ago? I can understand if it subtracts some because of deletion, but adding that many? And then for TablesArticlesTotal.htm, I searched the Wayback Machine and found these glaring changes between the Jan and March snapshots of the table (they affect 4 of the 5 Wikipedias you mentioned):
* https://web.archive.org/web/20150109014525/http://stats.wikimedia.org/EN/TablesArticlesTotal.htm (9 Jan) Nov 14: ar 339 k, sv 1.9 M, id 351 k, jv 47 k, sw 27 k
* https://web.archive.org/web/20150318100548/http://stats.wikimedia.org/EN/TablesArticlesTotal.htm (18 March) Nov 14: ar 602 k, sv 3.1 M, id 595 k, jv 59 k, sw 28 k
What gives? Bennylin 16:42, 5 May 2015 (UTC)
Interestingly, of the 5 Wikipedias that have up-to-date Wikistats article counts based on full dumps (see "Wikistats" section), 4 of them have {{NUMBEROFARTICLES}} much closer to Wikistats' latest (Feb 2015) "alt" count than Wikistats' latest "official" count. Only 1, Swahili, is closer to Wikistats' "official" count than the "alt" count. Odd. But like I said, all 3 counting criteria are different, so I still say it's just a coincidence. - dcljr (talk) 17:46, 1 May 2015 (UTC)
Another id.wp user had suggested that the offset of 250k-ish articles may come from redirects in the main namespace. I couldn't confirm this. @Nemo bis: it's simple, our statistics page and announcement both use {{NUMBEROFARTICLES}}, whereas the graphs come from wikistats (which now differs). If they don't match, then one of them is wrong (or at least using different count method), whereas previously they were both the same (or almost the same). The question, therefore, assumes that http://stats.wikimedia.org/EN/TablesArticlesTotal.htm is correct, and thus that {{NUMBEROFARTICLES}} is wrong. Bennylin 10:06, 5 May 2015 (UTC)
The key phrase there is: "or at least using different count method". Wikistats has always used a different counting method than MediaWiki. They have never (as far as I know) been trying to count the same thing. I can't explain anything more about Wikistats beyond what I've said here (except that it doesn't count redirects as articles), because I am not sufficiently familiar with it. You should ask User:Erik Zachte any questions you still have about Wikistats. (And finally: I'm not sure why you're assuming Wikistats is correct and MediaWiki is wrong, especially since the Indonesian Wikipedia's {{NUMBEROFARTICLES}} barely changed when the wiki was recounted on March 29th, while the Wikistats counts have changed radically in the last few months, as you point out.) - dcljr (talk) 03:54, 6 May 2015 (UTC)

Do these actions change the article count?

  1. Undelete a page (in main namespace; non-redirect). Expected behaviour: add count.
  2. Undelete some revisions (page already exists). Expected behaviour: no changes.
  3. Move over redirect. Expected behaviour: add count.
  4. Move and delete target page (two become one). Expected behaviour: subtract count.
  5. Move redirect over a page. Expected behaviour: subtract count.

Bennylin 16:41, 5 May 2015 (UTC)

Good question(s). Can you not test these yourself, since you're an admin on at least 2 wikis? (I forgot about undeleting. Thanks for reminding me about that.) - dcljr (talk) 04:13, 6 May 2015 (UTC)
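For anyone running such a test, the five expectations listed above could be written down as a checklist and compared against readings of the article count taken before and after each action. This is a hypothetical sketch: it reads "add count" as +1 and "subtract count" as -1, the action labels and the name `matches_expectation` are made up, and the expectations are the poster's, not verified MediaWiki behaviour.

```python
# Expected change in the article count for each action, as listed above
# (+1 = "add count", 0 = "no changes", -1 = "subtract count").
# These are the poster's expectations, not verified MediaWiki behaviour.
EXPECTED_DELTA = {
    "undelete page": +1,
    "undelete some revisions": 0,
    "move over redirect": +1,
    "move and delete target page": -1,
    "move redirect over a page": -1,
}


def matches_expectation(action, count_before, count_after):
    """Return True if the observed change equals the expected change."""
    return count_after - count_before == EXPECTED_DELTA[action]


# Hypothetical readings of the article count before and after a test action.
print(matches_expectation("undelete page", 100, 101))
print(matches_expectation("move over redirect", 100, 100))
```

On a busy wiki, other edits can change the count between the two readings, so a quiet test wiki would give cleaner results.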

Disambiguation

Why are disambiguation pages counted as articles? They are not articles; they are just a kind of table of contents. ShinePhantom (talk) 06:35, 22 May 2015 (UTC)

Wrong counting

ur.wiktionary has more than 25,000 article pages but [[Special:Statistics|{{NUMBEROFARTICLES}}]] returns only 5,138 articles, and does not include articles created by bot.
How can this problem be fixed?
Thanks!
شہاب (talk) 12:21, 31 May 2018 (UTC)

@شہاب: Using Special:Random several times, I can see that a large proportion of the main-namespace pages in urwiktionary have no [[wikilinks]]. As described at mw:Manual:Article count and mw:Manual:$wgArticleCountMethod (urwiktionary uses the "link" method of article counting), in order for a page to count as an "article" (or "content page") it must contain at least one "regular" wikilink to another page on the same wiki. Thus, an edit like this will turn a non-article into an article. - dcljr (talk) 19:39, 31 May 2018 (UTC)
Thanks a lot!
شہاب (talk) 19:53, 31 May 2018 (UTC)
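The "link" criterion described in the reply above can be roughly approximated on raw wikitext, which may help when estimating how many bot-created entries lack a link. This is only a sketch under loud assumptions: MediaWiki decides from its parsed link data, not from a regular expression, and this version knows only English namespace prefixes and ignores templates, localized namespace names, and interwiki prefixes. The names `has_regular_wikilink` and `is_counted` are made up.

```python
import re

# Matches the target of [[target]] or [[target|label]].
LINK_RE = re.compile(r"\[\[([^\[\]|]+)")

# English namespace prefixes whose links are not "regular" wikilinks.
# Assumption: a real wiki also has localized names and interwiki prefixes.
NON_LINK_PREFIXES = ("category:", "file:", "image:")


def has_regular_wikilink(wikitext):
    """Return True if the text contains at least one ordinary [[wikilink]]."""
    for target in LINK_RE.findall(wikitext):
        target = target.strip().lower()
        if target and not target.startswith(NON_LINK_PREFIXES):
            return True
    return False


def is_counted(wikitext):
    """Approximate the "link" method: non-redirect text with a regular wikilink."""
    if wikitext.lstrip().lower().startswith("#redirect"):
        return False
    return has_regular_wikilink(wikitext)


print(is_counted("A word with a definition but no links."))
print(is_counted("A word; see also [[synonym]]."))
print(is_counted("Some text. [[Category:Nouns]]"))
```

Only the second sample passes: the first has no link at all and the third has only a category link.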