Talk:List of largest wikis/Archives/2006


What's up with the number 8,000?

Why is 8,000 our cutoff point? Shouldn't we make it some number like 10,000 that is more commonly used? (I suggested 10,000 because it is an even power of 10: all four digit figures and below would be on the previous list, while all five digit figures and above would be on this table.) 69.153.249.101 14:36, 3 January 2006 (UTC)

Well, 8,000 is the lowest limit at which I could make sure of not having omitted any non-Wikimedia wikis. It was even at 7,000 at some point, but then I ran into the Persian Wikipedia, whose figures I cannot copy and paste.

In all fairness, the limit on this page was originally set at 7,500, another seemingly random 'round number'. I agree the limit should eventually be brought to 10,000, but let's wait until 100 sites have reached the list. I think a nice round 'top 100' list would be great. --Jamie

How should we make this transition to a 10,000 limit? I had been planning on a graduated increase as follows: When at least 75 Wikis have hit 7500, the limit should increase to 7500 (this has already happened). When at least 80 Wikis have hit 8000, the limit should increase to 8000 (this just happened). When at least 85 Wikis have hit 8500, the limit should increase to 8500. This would continue under the same pattern until 100 Wikis have hit 10,000. However, another user has objected, claiming that the list's lower limit should continue to decrease until all non-Wikimedia Wikis are "discovered". Comments, anyone? 69.153.249.101 19:01, 4 February 2006 (UTC)

I think, the list should be allowed to grow, i.e. the lower limit not moved upwards. --Purodha Blissenbach 12:05, 27 September 2006 (UTC)

Colors

Someone should explain the significance of the colors used in the table (white = Wikipedias, red = Wiktionaries, green = everything else?). - dcljr 23:40, 20 January 2006 (UTC)

There is a legend below the table. But agreed the colors are kinda ugly ;) Mutante 22:53, 26 April 2007 (UTC)

Lower Limit

Please clarify what you mean by "lower limit must be above what size it must be possible to discover all existing Wikis". Does this mean that all "non-Wikipedia, non-Wiktionary wikis must be visible", or does it have some meaning that is not related to this table? I ask because, if you look at the discussion page, some of us had agreed to eventually raise this limit to 10,000 as more wikis grew stronger. I had planned to make the increase gradual, on the following criteria: if at least 75 wikis were above 7500, the limit would be 7500; if at least 80 wikis were above 8000, the limit would be 8000; and so on by this pattern. For what reason should this not be done? Please post any response on the discussion page for the List of Largest Wikis.

As soon as I am reasonably sure of having discovered all wikis in a certain size segment (here between 7,500 and 7,000), I extend this list by that size segment. This task becomes harder as you move down in size. It also becomes harder as new wikis grow into an existing size segment. Therefore, if all but two or three wikis have grown out of the lowest size segment, I may even set the lower limit higher, as I assume that as many wikis have grown into this segment unnoticed as have grown out of it.

So you are only trying to keep all the "green" wikis on the list? While it does sound like a good idea, I do not believe this is the place to do that since this list is titled "List of largest wikis" (keyword largest). I would suggest that another list be created that would contain all non-Wikipedia, non-Wiktionary, non-etc. wikis and their totals. Those that make the cut for this list would still be included, but all of those "green" wikis would go on the new list. That would be a way to allow all such wikis to be represented without compromising what I believe to be the intent of this list. 139.78.10.1 20:35, 2 February 2006 (UTC)

No, this is not the case. I have found many "green" wikis that are way too small to be included in this list. I also cannot be sure that I have found all of them within a certain size range. Furthermore, an alphabetical list of "green" wikis already exists as the interwiki map reference. I do not intend to make another comprehensive list of "green" wikis; I would just like to keep the list as it is. Somebody made the suggestion of extending it to the "top 100" and, well, we are slowly getting there.

Wow! We already have 95 wikis above 7000 due to some new discoveries! 5 to go until the list is truly a "Top 100". 139.78.10.1 20:13, 9 February 2006 (UTC)

I say we leave the lower limit where it is until 100 automatic wikis are reached (the ones updated by the bot script). It is the nicest, roundest number possible, and it will also make it easier to update the wikis not yet on the list (see above), such as the Muppet Wiki, et al.
In fact, I am not sure why we raised it recently... I say lower it back down... Include a few more in the auto-script. What is the harm in listing them, if no additional manual labour is required?
--JamieHari 00:29, 14 March 2006 (UTC)
It was raised simply because all of them have grown above the next higher round figure. As soon as we can be utterly sure of having detected _all_ wikis within a certain size segment without having to dread that one or more might have gone unnoticed, we will extend the lower limit. So far I know of three wikis within the 7000-7500 range:
  1. Muppet Wiki
  2. No Smoke
  3. FKK Wiki

Stochastically, there must be many, many more. Can you help out? RobiH 06:52, 14 March 2006 (UTC)

Another nearly "natural increase" of the lower bound has occurred, moving us up from 7,500 to 8,000. The Simple English Wikipedia has briefly fallen off the list, which currently stops at 8,000, but should reach this number shortly (it is at 7,970). As soon as it hits 8,000, it will return to the list. 139.78.10.1 23:50, 29 March 2006 (UTC)

Wikisource color

I have changed the Wikisource entries to an orange-yellow color to set them off from the rest of the non-wikipedia, non-wiktionary wikis. So far, only three Wikisources are on the table. 139.78.120.46 18:19, 3 February 2006 (UTC)

Good Idea. Colorize all currently green Wikimedias orange. Non-Wikimedias shall remain green.

Your idea of setting off other Wikimedia projects is a good one, so I have implemented the following color scheme: Wikibooks = purple, Wikitravel = light blue, Wikimedia Commons = an even lighter shade of blue. No other type of Wikimedia project that I know of has made the list as of yet. 69.153.249.101 01:11, 4 February 2006 (UTC)


Actually, this is too many colours. I propose the following scheme:
White = Wikipedia
Red = Wiktionary
Orange = Other Wikimedia
Green = Other Mediawiki
Blue = Other Wikis

What is the difference between Wikimedia and Mediawiki? 69.153.249.101 18:55, 4 February 2006 (UTC)

Wikimedia is the organization and MediaWiki is the server software. Many wikis use MediaWiki without being part of Wikimedia.


Bot / getting statistics by script

On the article a bot for this page is being requested. When thinking of how to make a (cronjobbed) script to get the statistics, I ran into the problem that cutting out the actual numbers from all those Special:Statistics pages is not trivial, because they all use different languages and I can't find a good string to grep for or cut off at that would always stay the same. Isn't there any nicer way to get the pure numbers from MediaWikis, regardless of the language they are using? Mutante 21:17, 5 February 2006 (UTC)

Aaah, already got the answer:

<Nikerabbit> http://en.wikipedia.org/wiki/Special:Statistics?action=raw how about this?
cool, that makes it a lot easier, gonna try making a script.. Mutante 22:05, 5 February 2006 (UTC)
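For reference, a minimal PHP sketch of how such a script could fetch and split that raw output. It assumes the ?action=raw page returns a single line of semicolon-separated key=value pairs (total, good, edits, users, admins, ...); the function name is made up for the example.

  <?php
  // Fetch Special:Statistics?action=raw and split it into an associative array.
  function fetchRawStats( $statsUrl ) {
      $raw = trim( file_get_contents( $statsUrl ) );
      $stats = array();
      foreach ( explode( ';', $raw ) as $pair ) {
          if ( strpos( $pair, '=' ) !== false ) {
              list( $key, $value ) = explode( '=', $pair, 2 );
              $stats[ trim( $key ) ] = trim( $value );
          }
      }
      return $stats;
  }

  // Usage: $stats = fetchRawStats( 'http://en.wikipedia.org/wiki/Special:Statistics?action=raw' );
  //        echo $stats['good'];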

Discussion continues here.

<-- wiki syntax autocreated by script, but still need to add colors and the missing wikis that don't have the Special:Statistics?action=raw feature. Mutante 21:07, 8 February 2006 (UTC)

Nation State sensor broke down again.

The retrieval script for the NationState Wiki is malfunctioning again, registering all zeroes for stats. Could somebody fix it? 139.78.10.1 19:59, 9 March 2006 (UTC)

There's another malfunction, this time for cnic.org. Please fix it. 139.78.120.88 00:58, 25 March 2006 (UTC)

Done. RobiH 09:36, 26 March 2006 (UTC)

Missing Greek Wiktionary (9,000 articles)

132.204.227.46 21:04, 20 March 2006 (UTC)

Done RobiH 20:59, 21 March 2006 (UTC)

'Files' column in the largest-wikis table...

Hey,

I just thought it would be a good idea to see how many 'files' (aka images) are in each wiki, so I have just had the 'raw' statistics output changed to include an image count as well. It should be implemented in the next few weeks. Keep an eye out for it and when you see it, could we add it to the script?

Of course, since it is only implemented in CVS at present, only version > 1.6.0 wikis will have it. I would assume all Wikimedia projects should have it before the end of April. All other wikis on the list could show either 0, or 'n/a' in the meantime...

Cheers,

--JamieHari 02:40, 28 March 2006 (UTC)

Hey RobiH,
With the release of version 1.6.1, all wikis that upgrade now have the "images=x;" data included in their raw stats page...
(My database for example)
I was just wondering what you thought about adding this as a new column to the table script?
--JamieHari 06:52, 6 April 2006 (UTC)
Notify me as soon as all Wikimedias, all Wikias and all EditThis Wikis are upgraded. Then it'll be implemented. RobiH 15:39, 9 April 2006 (UTC)
RobiH,
It looks like it has been implemented pretty much across the board. You should be good to go any time now... :)
JamieHari 22 April 2006

It has been added to all the scripts now. Mutante 07:06, 8 May 2006 (UTC)
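For anyone extending their own copy of such a script, a minimal sketch of the fallback mentioned above. It assumes $stats is the parsed key=value array from ?action=raw; the helper name is invented for the example.

  <?php
  // Pick a field from the parsed raw statistics, falling back to 'n/a' for
  // wikis (pre-1.6.x) whose raw output does not include it yet.
  function statOrNa( $stats, $key ) {
      return isset( $stats[$key] ) ? $stats[$key] : 'n/a';
  }

  // Images column value for one wiki:
  // $imagesColumn = statOrNa( $stats, 'images' );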

Wikiznanie

WikiZnanie.ru is registering all zeros. Could someone please fix this? 139.78.10.1 01:19, 15 April 2006 (UTC)

They have changed the URL to http://www.wikiznanie.ru/ru-wz/index.php/Special:Statistics?action=raw

But this new URL produces double headers that cannot be processed by the script:

    Warning: Cannot modify header information - headers already sent by
    (output started at /usr/local/lib/!wz-sites/own/wz155-2/languages/LanguageRu.php:1)
    in /usr/local/lib/!wz-sites/own/wz155-2/includes/SpecialStatistics.php on line 58

We have to wait until Wikiznanie has fixed this flaw. RobiH 14:52, 16 April 2006 (UTC)

Conservative = Good?

In the text of the article, the term 'conservative' is used several times, but the table has the column titled 'Good'. Are these the same thing? Should the text be updated to 'good' instead of 'conservative' to match, or should some explanation be added that they are the same?

Thanks,

- Tdoyle 16:39, 24 April 2006 (UTC)

Yes,

Those terms are exactly comparable. 'Good' is a term that is easier to understand for the lay user, whereas folks who are more familiar with wikis would understand a 'conservative' article count. I don't see the need to clarify further. (This discussion between you and me will remain on the talk page for all to read.)

--Jamie 24 April 2006

Additional 'Conservative' Questions

The article states:

The conservative number for the Wikipedia excludes redirects, discussion pages, image description pages, user profile pages, templates, help pages, portals, articles without links to other articles, and pages about Wikipedia.
  1. Although it's an older way of counting 'good' pages, there is an option, $wgUseCommaCount, which can be set to count pages that include a comma rather than pages that include links.
  2. It is my understanding that pages in user-defined namespaces are not included in the count of good articles. Is this true, and if so, why? Some wikis have chosen to segregate their content into different namespaces in order to allow focused search capabilities. I have been told I can modify code to include these other namespaces. If I do this, is this going to be seen as 'cheating' to increase the ranking of a wiki on this page?

Thanks!

- Tdoyle 16:54, 24 April 2006 (UTC)


Tdoyle,
  1. I think the algorithm for calculating 'good' articles is a little more complex now than just links...
    So I wouldn't worry about that too much.
  2. I am fairly certain that MediaWiki can be set to count 'good' articles in custom namespaces as well. A few tweaks in the LocalSettings.php file ought to handle it (see the sketch after this reply). No such thing as 'cheating' per se, but you just don't want to include 'project' pages as part of your count. They are about your project, not about your topic. Only pages about your topic should be counted, IMHO.
Hope that helps!
Jamie 24 April 2006
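A hedged sketch of the LocalSettings.php tweaks being discussed. $wgUseCommaCount is the option named above; $wgContentNamespaces is one way newer MediaWiki releases let extra namespaces count as content, and the namespace numbers 100 and 102 are hypothetical examples.

  <?php
  // Count a page as an article if it contains a comma, instead of requiring
  // at least one internal link (the older counting method).
  $wgUseCommaCount = true;

  // Treat custom namespaces as content so their pages join the "good" count.
  // 100 and 102 are placeholder IDs for site-specific namespaces.
  $wgContentNamespaces = array( NS_MAIN, 100, 102 );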

Largest wikis seem to be faking their conservative count

Usually the conservative count is many times smaller than the total count, but the largest wikis in the list, like qweki, have almost the same value for both, which doesn't seem right. --Nad 20:31, 1 March 2007 (UTC)

Wikitravel stats

Just two minor quibbles, but it's Wikitravel with a small T, and the stats for "WikiTravel" represent only the pages on the English version of the site. Total article count is over 14,000 as I write this, but unfortunately there's no easy way to get this aggregate figure. Jpatokal 10:00, 30 April 2006 (UTC)

Technically, the language versions of Wikitravel are separate wikis and are listed as such. 12:58, 30 April 2006 (UTC)

Quantity / Quality

I just wanted to make a small remark: the rank assigned to a wiki site is based on the number of items, which gives undue value to stubs. This is true for the sites full of stubs as well as for the largest sites (for instance, the content of the English Wiktionary is bigger than that of the French Wiktionary).

Do you think that it could be a good idea to sort the sites by database size? Laurent Bouvier 10:02, 14 May 2006 (UTC)

This is why the concept of the conservative article count ("Good") was invented. This list is sorted in decreasing order of valid articles, ruling out stubs. RobiH 06:25, 16 May 2006 (UTC)

software

hello, is the source of the bot open?

Sure, see User:Mutante/Wikistats RobiH 08:04, 30 May 2006 (UTC)

Why have a lower limit now?

More specifically, for MediaWiki wikis, within reason, we could update the list to include all of the 'missing wikis' (see the top of this page). Since it is a fully automated cronjob, no additional work will be required after the initial setup. It would certainly save me some work in updating them every few days.

I might suggest, though, that while a 6-hour update period is nice and everything, a 12- or 24-hour update might be more practical (6 hours is overkill).

No one, within reason, needs that data more than once per day. :)

Cheers,

--192.26.212.72 18:23, 8 June 2006 (UTC)

The size limit can only be lowered if we can make sure that all wikis within the new size range have been found. I advocate against extending the statistics into an incompletely explored size range. RobiH 15:46, 11 June 2006 (UTC)
I completely agree. I am unavailable to do this for the next few days, but as soon as my availability returns next week, I will do a massive hunt. I hope to discover all wikis (MediaWiki especially) down to the 4,000-article limit. That should add about 20 - 30 more to our growing list. I will post here again when I have completed the search. --Jamie 19:11, 15 June 2006 (UTC)

Jamie, some grounds for your hunt: RobiH 17:48, 17 June 2006 (UTC)

Sources of MediaWiki links:

  1. http://www.mediawiki.org/wiki/Sites_using_MediaWiki
  2. http://www.wikiindex.org/Category:MediaWiki
  3. http://meta.wikimedia.org/w/index.php?title=Sites_using_MediaWiki&oldid=328217
  4. http://www.gratis-wiki.com/

Sources of general wiki links (MediaWikis amongst them):

  1. http://list-of-wikis.brainsip.com/
  2. http://www.wikiindex.org/Category:Wiki_Engine

crossed wires.

Currently, sorting by edits descending actually displays total pages descending. Total pages descending shows edits descending. Error? Splarka 05:13, 10 June 2006 (UTC)

Thanks for the bug report. Fixed. Mutante 20:20, 12 June 2006 (UTC)

WikiHow

WikiHow, currently listed in the "List of largest other wikis" section, is actually MW 1.6.7, so it should be able to be added to the main listing, correct? Who can do that?

Done. RobiH 21:53, 30 July 2006 (UTC)

Special:ListOfLargestMediaWikiWikis

Seeing as the s23 website seems permanently down (up again) (and even if it wasn't), why don't we just create a Special page:

  • Special:ListOfLargestMediaWikiWikis

which would gather the information real-time whenever anybody wanted it? It could eventually be made really clever and save the values most recently fetched and only update them when out-of-date (e.g. every 24 hours).

This page could be kept on the wikimedia server and therefore be more under Wikimedia control.

There is no reason why fancier functionality (like the sortable lists on s23) shouldn't be linked to, as at present.

Mediashrek 09:32, 5 August 2006 (UTC)
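As a rough illustration of the 'only update when out-of-date' idea above, a PHP sketch with a simple file cache and a 24-hour lifetime. The cache-file handling and the reuse of the fetchRawStats() helper sketched earlier on this page are assumptions for the example.

  <?php
  // Reuse a saved copy of the statistics unless it is older than 24 hours.
  function getCachedStats( $statsUrl, $cacheFile, $maxAge = 86400 ) {
      if ( file_exists( $cacheFile ) && ( time() - filemtime( $cacheFile ) ) < $maxAge ) {
          return unserialize( file_get_contents( $cacheFile ) );
      }
      $stats = fetchRawStats( $statsUrl ); // hypothetical fetch/parse helper sketched above
      file_put_contents( $cacheFile, serialize( $stats ) );
      return $stats;
  }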

1. s23.org is up again 2. Feel free to create it. Be bold. RobiH 12:03, 5 August 2006 (UTC)

Regarding (2), I was afraid you would say that! But I might just do it one of these days... :-) Mediashrek 13:57, 5 August 2006 (UTC)

We are back up (S23) Mutante 17:10, 6 August 2006 (UTC)

Coalesced version of "biggest wikis" table

It would be quite nice to ALSO have a coalesced version of the "biggest wikis" table. What I mean by this is a table where, for example, all of the "Wikipedia" language versions are grouped as one, all of the "Wiktionary" versions are grouped as one, etc. I certainly don't want to replace the existing table, but a separate table showing the websites grouped as real websites would be very interesting, to help see the wood for the trees. Mediashrek 13:33, 12 August 2006 (UTC)

Go to http://s23.org/wikistats/ and click on the HTML versions of the table. At the bottom of each table you will find a box with grand totals. There you are. RobiH 21:05, 12 August 2006 (UTC)

Yup, that's some of the raw data that I want, but it's scattered over several pages, and so is hardly the table I am describing. I will try and give you an example of what the table would look like:

Order  Website     Good       Bad  Editors  etc.  etc.
1      Wikipedia   4 877 486
2      Wiktionary  1 079 426
3      etc. ...
4      etc. ...
5      etc. ...
6      etc. ...
7      etc. ...

Mediashrek 21:40, 12 August 2006 (UTC)

'Coalesced' version / "grand grand" totals

No.  Name           Good      Total      Edits       Admins  Users      Images
1    wikipedias     4946720   13854127   158436743   2881    3480872    1068805
2    mediawikis     1236607   1923245    4523257     737     149264     85769
3    wiktionaries   1100264   2630779    6364401     340     42508      1722
4    wikia          269627    1810533    4074248     2449    26411469   127639
5    wikisources    125165    266381     1039776     180     16943      24762
6    uncyclomedia   61836     283453     2088945     185     212819     65257
7    wikiquotes     47912     195817     854075      159     24882      725
8    wikibooks      45978     267910     1152935     200     57235      18560
9    wikinews       24917     123646     764900      145     16351      2179
10   wikitravel     18051     52155      490276      52      10776      8884
11   editthis       12996     2662747    175696      4324    7636       0
12   anarchopedias  2954      40770      46214       54      1252       55


What I would like to see is the sites grouped by domain name - so, for example, all the *.chainki.org sites, all the *.wikipedia.org sites, etc. The table above is close for some (e.g. Wikipedia), but, for example, all the MediaWiki sites are just lumped together. The question is really "what are really the biggest wikis?". MediaWiki is not a wiki but a type of software (in which case Wikipedia and the others should be in those figures). Please can we have a real biggest-wikis table? 90.4.13.53 13:04, 27 August 2006 (UTC)
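One way such a 'by site' grouping could be scripted, purely as a sketch: collapse each wiki's host name to its last two labels (wikipedia.org, chainki.org, ...) and sum the per-language rows. The $rows layout and the function name are assumptions, and the two-label rule is only a rough stand-in for a real registrable-domain check.

  <?php
  // Group per-wiki rows by domain and sum their article counts.
  function groupByDomain( $rows ) {
      $groups = array();
      foreach ( $rows as $row ) {
          $host   = parse_url( $row['url'], PHP_URL_HOST );
          $labels = explode( '.', $host );
          $domain = implode( '.', array_slice( $labels, -2 ) ); // e.g. "wikipedia.org"
          if ( !isset( $groups[$domain] ) ) {
              $groups[$domain] = array( 'good' => 0, 'total' => 0 );
          }
          $groups[$domain]['good']  += (int)$row['good'];
          $groups[$domain]['total'] += (int)$row['total'];
      }
      uasort( $groups, function ( $a, $b ) { return $b['good'] - $a['good']; } );
      return $groups;
  }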

Grand Total

Articles   Total       Edits        Admins  Users       Images
7 893 027  24 111 563  180 011 466  11 706  30 432 007  1 404 357

wiki syntax pasted from [1]

What of Memory Alpha?

[2]? It gives a conservative count of 20,000, which should at least place it in the top 100. 24.76.102.248 07:28, 30 August 2006 (UTC)

It's there, at no. 66. --194.73.130.132 13:27, 30 August 2006 (UTC)
I clearly suck at ctrl-f. Hehe. 24.76.102.248 04:09, 31 August 2006 (UTC)

baidu baike

Why is "Baidu Baike" not mentioned there? PenJou 16:23, 30 August 2006 (UTC)

Baidu Baike: can I put it in the list? PenJou 16:25, 30 August 2006 (UTC)
Yes, feel free to add it to the "missing" list above. – Minh Nguyễn (talk, contribs) 01:39, 31 August 2006 (UTC)
Come to think of it, is Baidu Baike really a wiki? According to "Baidu Baike" at the English Wikipedia, the company behind it has distanced itself from the term "wiki". – Minh Nguyễn (talk, contribs) 01:42, 31 August 2006 (UTC)

CookbookWiki

It appears that the CookbookWiki (listed in the hand edit section) is now MW 1.7.1 based and can be moved to the auto-generated section.

Done 213.23.132.42 16:07, 7 September 2006 (UTC)

How did the Vietnamese Wiktionary get that big?

Reading the list, I was somewhat shocked by the fact that the Vietnamese Wiktionary currently has 208,365 articles. That's the second biggest among all the Wiktionary projects, and not far below the English one (296,399)!

However, the Vietnamese Wikipedia has only 11,295 articles, so it doesn't seem like all those people are crazy about wikis. Then is it because of some particular property of the Vietnamese language? --Acepectif 03:43, 27 October 2006 (UTC)

They used a bot to create articles, so the question is whether the source for this was free. MaxSem 05:54, 1 November 2006 (UTC)
Wiktionary:vi:User:PiedBot in fact. --Dangherous 10:52, 10 November 2006 (UTC)
A somewhat similar situation applies to the Chinese Wiktionary. Bare facts are not copyrightable, but copying very large amounts of content and layout from a copyrighted dictionary is an obvious copyvio. The Chinese Wikipedia and Wiktionary have about 112,000 articles each, and the Chinese Wiktionary has grown very fast.--Jusjih 16:36, 17 February 2007 (UTC)

Stadtwiki Karlsruhe

Hi, we upgraded the city wiki of Karlsruhe/Germany from 1.3.x to 1.8.2 .. please add http://ka.stadtwiki.net/ http://ka.stadtwiki.net/Spezial:Statistics?action=raw to the automatic table. Thanx --Kawana 17:15, 7 November 2006 (UTC)

Done RobiH 14:08, 11 November 2006 (UTC)

"Bot" Wikis

Should computer-generated wikis, such as RichDex, that exist primarily as wiki spam (i.e., a large number of computer-generated pages, each consisting of a single hyperlink and dozens of Google ads) be allowed to be listed in the same category as "real", user-generated wikis? Although technically it is a large wiki, it really isn't in the same category as Wikipedia, etc.

This is why Wikipedias are marked white and other Wikis are marked green. RobiH 21:53, 10 December 2006 (UTC)

I do not think they should be included. I would suggest that a wiki must have a community of users to be included. --Pmsyyz 17:19, 13 January 2007 (UTC)
If we want to be exact about "list of largest wikis", then we should not look at their content at all and list them for the sheer fact of their existence. I agree, though, that there is a difference between a real community and a generated site, and that this should be marked somehow; it is, via the color, as RobiH mentioned above. We could make that marker stronger and more noticeable somehow, though. And theoretically you could also have a "List of computer generated wikis" and a "List of real community wikis". But: how do you want to measure how much of a wiki has been created by real users and how much by scripts? And where do you want to draw the line? If you start with removing one, the discussion about others will follow. Seems a Pandora's box to open to me somehow.. Mutante 23:07, 17 January 2007 (UTC)
I think it comes down to the definition of what a "large" wiki is. Until now we have always simply considered a wiki large if it had a large "good" (article) number. But since we agree that a sheer number of articles is somehow not a "real" wiki as long as it doesn't also have a certain number of users, maybe we should change our definition of "large wiki". What if we take more than just "good" into account when we are sorting the list, e.g. the number of users and admins... hmm. Can you think of a better way to sort, that comes closer to our definition of a "real wiki" but can still be put into a script? Do we even need to calculate our own "wiki score"? Mutante 23:17, 17 January 2007 (UTC)
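Purely as an illustration of what such a home-grown "wiki score" might look like (the formula, the weights and the function name below are invented for the example, not an agreed metric):

  <?php
  // Combine article count with community size so that bot-generated wikis with
  // a single user rank below similarly sized wikis that have a real community.
  // The exponents 0.6 / 0.3 / 0.1 are arbitrary illustration values.
  function wikiScore( $good, $users, $admins ) {
      return pow( max( 1, $good ), 0.6 )
           * pow( max( 1, $users ), 0.3 )
           * pow( max( 1, $admins ), 0.1 );
  }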

Richdex is now marked dark violet to distinguish it from community-driven wikis. RobiH 22:50, 18 January 2007 (UTC)

I don't think Richdex is very interesting, and I hate to see them take up so many lines in the table -- but they are GFDL and I think they let anyone register and contribute, so I think they should be included. 69.87.193.162 00:12, 27 January 2007 (UTC)

It's interesting to note that even Wikipedia has used bots to beef up its article count. For instance, in October 2002, Derek Ramsey used a bot to build Wikipedia articles on all US towns based on US census data.[3] The articles looked like this. Thousands of articles were created. That being said, all the Richdex sites have only one user and are clearly not community-driven, so it is appropriate to give them their own color code! Jonathan Stokes 06:40, 2 March 2007 (UTC)

ValueWiki

ValueWiki now has 54,707 articles (conservative count). What is the process for adding it to this list? Thank you. Jonathan 06:29, 9 December 2006 (UTC)

Done. RobiH 21:47, 10 December 2006 (UTC)

Thank you RobiH. This list is a great resource. Thanks to you and everyone for creating and maintaining it. Jonathan 22:31, 10 December 2006 (UTC)