List of Wikipedias by sample of articles

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by MarsRover (talk | contribs) at 05:40, 3 July 2008 (updated for July 2nd (article cnt)). It may differ significantly from the current version.

This page contains a list of the largest Wikipedias under the auspices of the Wikimedia Foundation for various languages. Test Wikipedias are listed at the Wikimedia Incubator Wiki project.

This list of Wikipedias uses the List of articles every Wikipedia should have (total: 1047 on the 2nd of July, 2008) as a sample. The list actually used is at the end of List of Wikipedias by sample of articles/Source code and can differ slightly. For every Wikipedia, the articles in this sample list are retrieved (based on interwiki links from the English Wikipedia) and the number of characters in each is counted (excluding "comments" and the "interwiki" text at the bottom of the article). The size of each article is then adjusted for each language by multiplying it by the language weight.

The articles are divided into four classes: "absent" (i.e. non-existing; size = 0), "stubs" (size below 10,000 characters), "articles" (size between 10,000 and 30,000) and "long articles" (size above 30,000). The average weighted size of the non-absent articles in the sample is also calculated.

Finally, a score is computed with the following formula: rawscore = stubs + articles*4 + long.articles*9. To keep the scale consistent, the raw score is normalized by dividing it by the maximum score and multiplying by 100. The maximum score is the score a Wikipedia would get if every article in the sample were a long article: maxscore = (absent + stubs + articles + long.articles)*9. The final score is therefore score = rawscore / maxscore * 100. The language editions are then listed in order of decreasing score.
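The classification and scoring steps described above can be sketched in Python as follows. This is a minimal illustration, not the program from the Source code subpage: the function names (`classify`, `score`, `summarize`) are hypothetical, and the handling of sizes of exactly 10,000 or 30,000 characters (both counted as "articles" here) is an assumption, since the text does not say which class the boundaries fall into.

```python
STUB_LIMIT = 10_000   # weighted sizes below this are stubs
LONG_LIMIT = 30_000   # weighted sizes above this are long articles


def classify(weighted_size):
    """Return the class of one article from its weighted size in characters."""
    if weighted_size == 0:
        return "absent"
    if weighted_size < STUB_LIMIT:
        return "stubs"
    if weighted_size <= LONG_LIMIT:  # boundary handling is an assumption
        return "articles"
    return "long"


def score(absent, stubs, articles, long_articles):
    """Normalized score: rawscore / maxscore * 100."""
    rawscore = stubs + articles * 4 + long_articles * 9
    maxscore = (absent + stubs + articles + long_articles) * 9
    return rawscore / maxscore * 100


def summarize(raw_sizes, weight):
    """Classify raw character counts for one Wikipedia and score them.

    raw_sizes holds one entry per sample article (0 for an absent article);
    weight is the language weight applied to every size.
    Returns (counts per class, average weighted size of non-absent, score).
    """
    counts = {"absent": 0, "stubs": 0, "articles": 0, "long": 0}
    weighted = [size * weight for size in raw_sizes]
    for size in weighted:
        counts[classify(size)] += 1
    present = [size for size in weighted if size > 0]
    average = sum(present) / len(present) if present else 0.0
    total = score(counts["absent"], counts["stubs"],
                  counts["articles"], counts["long"])
    return counts, average, total


# Counts taken from the English and German rows of the table
print(round(score(0, 65, 317, 665), 2))   # 77.66
print(round(score(9, 178, 426, 434), 2))  # 61.42
```

Feeding the per-class counts of a table row into `score` reproduces that row's Score column, which is a quick way to check a row for transcription errors.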

A copy of the program used to obtain this list is in List of Wikipedias by sample of articles/Source code.

Absent articles for major Wikipedias are in List of Wikipedias by sample of articles/Absent Articles.

Last Update: 2 July 2008

Rank | Wiki | Language | Weight | Average article size (wt. chars) | Absent (0k) | Stubs (< 10k) | Articles (10-30k) | Long art. (> 30k) | Score | Growth
1 en English 1.0 45 587 0 65 317 665 77.66 +0.70
2 de Deutsch 1.0 34 137 9 178 426 434 61.42 +0.43
3 fr Français 1.0 29 707 7 263 422 355 54.61 +0.84
4 es Español 1.1 27 029 6 290 459 292 50.45 +0.91
5 it Italiano 1.1 23 748 10 330 435 272 47.95 +0.93
6 ru Русский 1.4 23 072 2 385 411 246 45.16 +1.49
7 zh 中文 3.7 24 243 0 410 388 249 44.60 +0.50
8 ja 日本語 1.9 16 465 11 476 416 144 36.46 +0.57
9 pt Português 1.1 15 992 11 506 390 140 35.30 +0.32
10 pl Polski 1.1 14 598 18 575 341 113 31.37 +0.29
11 hu Magyar 1.1 14 830 94 524 308 121 30.19 +0.19
12 fi Suomi 1.1 13 434 26 606 306 109 29.83 +0.69
13 cs Čeština 1.3 13 042 55 573 324 95 28.91 +0.69
14 he עברית 1.2 11 869 35 580 357 74 28.40 +0.41
15 sv Svenska 1.1 11 975 4 664 281 98 28.33 +0.32
16 nl Nederlands 0.9 11 427 19 614 348 66 27.59 +0.37
17 vi Tiếng Việt 1.1 15 318 208 479 241 119 26.68 +0.21
18 no Norsk (Bokmål) 1.2 11 113 23 678 262 84 26.34 +0.30
19 sr Српски / Srpski 1.4 11 932 120 598 240 89 25.03 +0.49
20 uk Українська 1.3 10 429 35 675 272 65 24.92 +0.23
21 tr Türkçe 1.3 11 496 58 650 270 66 24.73 +0.24
22 ca Català 1.1 10 226 0 719 269 59 24.68 +0.08
23 hr Hrvatski 1.3 9 597 104 666 220 57 21.85 +0.61
24 sk Slovenčina 1.3 10 448 158 616 213 60 21.31 +0.26
25 ro Română 1.1 9 802 152 640 197 58 20.69 +0.39
26 da Dansk 1.2 8 383 55 782 160 50 19.87 +0.29
27 ko 한국어 2.5 7 842 88 748 166 45 19.28 +0.37
28 el Ελληνικά 1.1 10 130 261 540 189 57 19.20 +0.30
29 bg Български 1.1 8 173 125 693 192 37 19.04 +0.22
30 id Bahasa Indonesia 1.0* 6 487 65 799 154 29 17.79 +0.43
31 ar العربية 1.0 5 353 1 879 147 19 17.40 +0.52
32 gl Galego 1.0* 8 155 200 644 179 24 16.73 +0.24
33 sl Slovenščina 1.2 6 975 101 769 159 18 16.63 +0.15
34 eo Esperanto 1.1 5 758 2 922 99 24 16.28 +0.24
35 fa فارسی 1.2 6 434 189 699 130 29 15.71 +0.55
36 th ไทย 1.0 6 338 160 733 130 24 15.59 +0.23
37 ms Bahasa Melayu 1.0* 8 160 353 531 132 31 14.20 +0.20
38 lt Lietuvių 1.0* 5 307 122 805 112 8 14.06 +0.24
39 simple Simple English 1.0* 3 678 0 974 67 6 13.75 +0.13
40 is Íslenska 1.0* 2 847 23 970 49 5 12.85 +0.10
41 nn Nynorsk 1.2** 5 382 252 695 85 15 12.42 +0.11
42 sh Srpskohrvatski / Српскохрватски 1.0* 6 711 363 555 111 18 12.32 +0.17
43 et Eesti 1.0* 4 409 189 780 64 12 12.16 +0.19
44 bs Bosanski 1.0* 4 790 222 727 94 3 12.00 +0.55
45 la Latina 1.1 3 917 215 765 55 12 11.60 +0.17
46 lv Latviešu 1.0* 5 524 341 601 94 11 11.42 +0.57
47 eu Euskara 1.0* 4 293 255 716 68 8 11.25 +0.09
48 af Afrikaans 1.0* 8 743 541 386 90 30 10.78 +0.51
49 mk Македонски 1.0* 4 955 399 575 60 12 9.80 +0.12
50 cy Cymraeg 1.0* 2 358 226 803 15 3 9.44 +1.34
51 ka ქართული 1.0* 3 944 409 592 38 8 8.66 +0.21
52 br Brezhoneg 1.0* 4 535 471 522 45 9 8.31 +0.15
53 ps پښتو 1.0* 31 336 916 29 35 67 8.19 +0.07
54 bn বাংলা 1.0* 3 927 494 502 40 10 7.99 +0.07
55 ta தமிழ் 0.9 3 524 441 572 30 3 7.64 +0.24
56 zh-yue 粵語 3.7** 6 644 613 367 52 15 7.53 +0.40
57 ml മലയാളം 1.0* 5 117 560 430 52 5 7.25 +0.57
58 hi हिन्दी 1.0* 2 845 492 524 25 5 7.11 +0.53
59 lb Lëtzebuergesch 1.0* 4 363 556 447 36 8 7.04 +0.23
60 sq Shqip 1.0* 4 399 565 433 43 6 6.99 +0.22
61 qu Runa Simi 1.0* 2 635 489 526 32 0 6.94 +0.11
62 yi ייִדיש 1.0* 3 181 538 478 22 7 6.69 +0.01
63 ga Gaeilge 1.0* 4 041 578 433 27 9 6.60 +0.08
64 scn Sicilianu 1.0* 2 523 500 526 20 1 6.53 +0.07
65 oc Occitan 1.0* 3 591 557 453 33 3 6.50 +0.12
66 sw Kiswahili 1.0* 3 067 479 556 11 1 6.46 +0.14
67 ast Asturianu 1.0* 3 705 551 464 29 2 6.35 +0.01
68 nds Plattdüütsch 1.0* 4 708 674 332 27 14 6.01 +0.17
69 tl Tagalog 1.0* 3 381 607 407 29 4 5.93 +0.23
70 be-x-old Беларуская (тарашкевіца) 1.0* 4 266 619 386 42 0 5.88 +0.28
71 io Ido 1.0* 1 699 513 530 1 1 5.77 -0.01
72 ur اردو 1.0* 4 087 663 344 33 6 5.63 +0.14
73 zh-min-nan Bân-lâm-gú 1.2 1 984 545 496 6 0 5.52 +0.04
74 su Basa Sunda 1.0* 9 670 836 141 57 12 5.07 +0.09
75 az Azərbaycan 1.0* 2 457 627 406 11 2 4.97 +0.21
76 te తెలుగు 1.0* 6 927 788 205 45 9 4.95 +0.99
77 an Aragonés 1.0* 2 677 640 395 11 1 4.75 +0.13
78 ku Kurdî / كوردی 1.0* 2 270 639 396 11 0 4.67 +0.03
79 jv Basa Jawa 1.0* 3 399 671 361 14 1 4.52 +0.62
80 ia Interlingua 1.0* 2 760 690 343 13 1 4.29 +0.02
81 als Alemannisch 1.0* 7 653 825 174 41 7 4.26 +0.22
82 mn Монгол 1.0* 4 327 740 284 18 5 4.26 +0.51
83 mr मराठी 1.0* 2 768 721 313 8 5 4.14 +0.10
84 be Беларуская 1.0* 3 632 735 292 18 2 4.05 +0.40
85 fy Frysk 1.0* 3 956 741 287 18 1 3.91 +0.04
86 gd Gàidhlig 1.0* 2 244 707 332 8 0 3.86 +0.04
87 tg Тоҷикӣ 1.0* 2 076 721 318 6 2 3.82 +0.05
88 vec Vèneto 1.0* 3 382 775 254 16 2 3.57 +0.09
89 li Limburgs 1.0* 3 769 779 251 16 1 3.44 +0.15
90 uz O‘zbek 1.0* 2 661 779 256 10 2 3.33 +0.07
91 kn ಕನ್ನಡ 1.0* 5 095 848 176 15 8 3.27 +0.07
92 vo Volapük 1.0* 1 680 759 285 1 2 3.26 +0.04
93 bat-smg Žemaitėška 1.0* 1 526 760 285 2 0 3.11 +0.07
94 cv Чăваш 1.0* 2 657 791 249 7 0 2.94 +0.04
95 nah Nāhuatl 1.0* 2 318 804 236 4 2 2.87 +0.18
96 hy Հայերեն 1.2 2 582 814 223 10 0 2.79 +0.12
97 ht Krèyol ayisyen 1.0* 1 399 805 239 2 1 2.72 +0.05
98 fur Furlan 1.0* 2 903 828 212 7 0 2.55 +0.03
99 fo Føroyskt 1.0* 2 882 841 197 8 1 2.53 +0.25
100 mt Malti 1.0* 5 703 903 122 17 5 2.49 +0.12
101 nrm Nouormand/Normaund 1.0* 2 026 816 231 0 0 2.45 +0.01
102 kk Қазақша 1.0* 4 558 858 176 11 1 2.43 +0.18
103 pms Piemontèis 1.0* 3 352 861 177 6 3 2.42 +0.07
104 sco Scots 1.0* 1 956 828 218 1 0 2.36 +0.02
105 pam Kapampangan 1.0* 5 980 903 124 18 1 2.18 +0.10
106 zh-classical 古文 / 文言文 3.7** 5 380 894 139 13 1 2.12 +0.06
107 bar Boarisch 1.0* 4 897 900 132 14 1 2.09 +0.03
108 nov Novial 1.0* 1 613 868 175 4 0 2.03 +0.01
109 ceb Sinugboanong Binisaya 0.8 2 521 885 155 6 1 2.00 +0.17
110 wa Walon 1.0* 2 387 871 172 4 0 2.00 +0.07
111 dv ދިވެހިބަސް 1.0* 3 433 914 120 11 1 1.84 -0.04
112 jbo Lojban 1.0* 1 241 874 173 0 0 1.84 +0.10
113 lij Líguru 1.0* 1 509 877 169 1 0 1.84 +0.09
114 sa संस्कृतम् 1.0* 1 200 881 164 2 0 1.83 +0.05
115 wuu 吴语 3.7** 10 809 961 70 10 6 1.74 +0.24
116 nds-nl Nedersaksisch 1.0* 2 942 888 158 1 0 1.72 +0.03
117 ln Lingala 1.0* 1 747 898 145 4 0 1.71 +0.01
118 gv Gaelg 1.0* 2 170 893 152 2 0 1.70 +0.27
119 diq Zazaki 1.0* 2 698 894 152 1 0 1.66 +0.04
120 am አማርኛ 1.0* 1 252 898 149 0 0 1.58 +0.07
121 ne नेपाली 1.0* 3 854 931 109 5 2 1.56 +0.00
122 kw Kernewek/Karnuack 1.0* 2 057 903 144 0 0 1.53 +0.00
123 frp Arpitan 1.0* 1 475 905 142 0 0 1.51 +0.02
124 new नेपाल भाषा 1.0* 4 385 946 89 11 1 1.51 +0.05
125 ksh Ripoarisch 1.0* 2 455 922 122 2 1 1.48 +0.11
126 rm Rumantsch 1.0* 3 579 946 92 7 2 1.46 +0.16
127 os Иронау 1.0* 1 959 913 133 1 0 1.45 +0.00
128 gan 贛語 3.7** 3 599 924 120 3 0 1.40 +1.30
129 ang Englisc 1.0* 1 825 922 123 2 0 1.39 +0.01
130 si සිංහල 1.0* 17 895 1 010 24 3 10 1.34 +0.52
131 hsb Hornjoserbsce 1.0* 2 395 929 116 2 0 1.32 +0.01
132 ilo Ilokano 1.0* 2 814 935 111 1 0 1.22 +0.01
133 lad Dzhudezmo 1.0* 3 357 947 95 5 0 1.22 +0.06
134 tpi Tok Pisin 1.0* 4 639 969 69 7 2 1.22 +0.01
135 yo Yorùbá 1.0* 2 084 933 114 0 0 1.21 +0.09
136 se Sámegiella 1.0* 1 233 935 112 0 0 1.19 +0.01
137 bpy ইমার ঠার/বিষ্ণুপ্রিয়া মণিপুরী 1.0* 4 790 954 89 1 2 1.18 +0.02
138 arc ܐܪܡܝܐ 1.0* 1 184 937 110 0 0 1.17 +0.03
139 wo Wolof 1.0* 2 590 954 89 3 1 1.17 +0.02
140 gu ગુજરાતી 1.0* 3 017 959 81 7 0 1.16 +0.03
141 ay Aymar 1.0* 820 939 108 0 0 1.15 +0.01
142 lmo Lumbaart 1.0* 3 611 966 73 8 0 1.11 +0.10
143 vls West-Vlams 1.0* 3 756 959 83 3 1 1.10 +0.07
144 cbk-zam Chavacano de Zamboanga 1.0* 18 936 1 014 20 7 6 1.08 +0.00
145 gn Avañe'ẽ 1.0* 1 058 950 96 1 0 1.06 +0.46
146 ie Interlingue 1.0* 2 262 968 76 2 1 0.99 +0.01
147 co Corsu 1.0* 1 905 958 88 1 0 0.98 +0.04
148 crh Qırımtatarca 1.0* 1 604 959 88 0 0 0.93 +0.02
149 fiu-vro Võro 1.0* 1 227 959 88 0 0 0.93 +0.00
150 so Soomaaliga 1.0* 3 884 981 62 2 2 0.93 +0.00
151 nap Nnapulitano 1.0* 2 244 968 78 1 0 0.87 +0.03
152 sc Sardu 1.0* 2 098 975 69 3 0 0.86 +0.04
153 war Winaray 1.0* 1 567 968 79 0 0 0.84 +0.00
154 ky Кыргызча 1.0* 1 135 969 78 0 0 0.83 +0.04
155 kab Taqbaylit 1.0* 2 375 980 64 3 0 0.81 +0.02
156 csb Kaszëbsczi 1.0* 1 666 973 74 0 0 0.79 +0.01
157 ba Башҡорт 1.0* 1 948 976 69 1 0 0.78 +0.09
158 lo ລາວ 1.0* 1 588 974 73 0 0 0.77 +0.00
159 pdc Deitsch 1.0* 1 432 974 73 0 0 0.77 +0.00
160 tt Tatarça / Татарча 1.0* 2 490 983 61 3 0 0.77 -0.01
161 eml Emiliàn e rumagnòl 1.0* 2 594 986 60 0 1 0.73 +0.01
162 pag Pangasinan 1.0* 2 217 987 57 3 0 0.73 +0.02
163 iu ᐃᓄᒃᑎᑐᑦ 1.0* 826 980 67 0 0 0.71 +0.01
164 kaa Qaraqalpaq tili 1.0* 2 816 985 61 1 0 0.69 +0.20
165 bcl Bikol 1.0* 1 723 983 64 0 0 0.68 +0.00
166 sd سنڌي، سندھی ، सिन्ध 1.0* 17 337 1 031 7 5 4 0.67 +0.00
167 mg Malagasy 1.0* 1 303 990 57 0 0 0.60 +0.03
168 tk تركمن / Туркмен 1.0* 1 658 990 57 0 0 0.60 +0.00
169 km ភាសាខ្មែរ 1.0* 2 673 997 48 2 0 0.59 +0.00
170 cu Словѣньскъ 1.0* 1 418 992 55 0 0 0.58 +0.04
171 pa ਪੰਜਾਬੀ 1.0* 5 436 1 015 25 7 0 0.56 +0.03
172 zea Zeêuws 1.0* 3 656 1 002 44 1 0 0.51 +0.02
173 kg KiKongo 1.0* 1 325 1 000 47 0 0 0.50 +0.01
174 mi Māori 1.0* 2 670 1 006 39 2 0 0.50 +0.00
175 rmy romani - रोमानी 1.0* 1 718 1 000 47 0 0 0.50 +0.00
176 na dorerin Naoero 1.0* 1 901 1 007 38 2 0 0.49 +0.08
177 tet Tetun 1.0* 2 797 1 001 46 0 0 0.49 +0.00
178 map-bms Basa Banyumasan 1.0* 1 525 1 002 45 0 0 0.48 +0.01
179 haw Hawai`i 1.0* 1 171 1 006 41 0 0 0.44 +0.12
180 ig Igbo 1.0* 3 151 1 018 28 0 1 0.39 +0.01
181 udm Удмурт кыл 1.0* 3 091 1 010 37 0 0 0.39 +0.00
182 roa-rup Armãneashce 1.0* 1 887 1 011 36 0 0 0.38 +0.00
183 stq Seeltersk 1.0* 2 607 1 015 31 1 0 0.37 +0.10
184 ks कश्मीरी / كشميري 1.0* 3 194 1 018 26 2 0 0.36 +0.00
185 mzn مَزِروني 1.0* 1 342 1 019 28 0 0 0.30 +0.01
186 ty Reo Mā`ohi 1.0* 1 224 1 020 27 0 0 0.29 +0.00
187 ce Нохчийн 1.0* 2 052 1 021 26 0 0 0.28 +0.02
188 pap Papiamentu 1.0* 1 960 1 028 18 1 0 0.23 +0.01
189 to faka Tonga 1.0* 1 704 1 026 21 0 0 0.22 +0.00
190 roa-tara Tarandíne 1.0* 5 452 1 038 7 2 0 0.16 +0.00
191 or ଓଡ଼ିଆ 1.0* 884 1 038 9 0 0 0.10 +0.00
192 pi पाऴि 1.0* 3 557 1 042 4 1 0 0.08 +0.00
193 bh भोजपुरी 1.0* 1 774 1 042 5 0 0 0.05 +0.00
194 glk گیلکی 1.0* 868 1 042 5 0 0 0.05 +0.00
  • weights marked "*": no weight is available for this language, so the default weight of 1.0 is used
  • weights marked "**": the weight of a known related language is used (e.g. 'zh')
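The fallback rules in these two notes can be sketched as a small lookup. This is an illustrative sketch only: `resolve_weight`, `MEASURED` and `RELATED` are hypothetical names, the measured weights are copied from a few rows of the table, and the mapping of 'zh-yue' and 'wuu' to 'zh' is an assumption inferred from their shared 3.7 weight.

```python
DEFAULT_WEIGHT = 1.0

# A few measured weights copied from the table above
MEASURED = {"en": 1.0, "de": 1.0, "ru": 1.4, "zh": 3.7, "ja": 1.9}

# Assumed related-language mapping (inferred from the "**" rows)
RELATED = {"zh-yue": "zh", "wuu": "zh"}


def resolve_weight(code, measured=MEASURED, related=RELATED):
    """Return (weight, marker) for a language code.

    marker is "" for a measured weight, "**" when the weight is borrowed
    from a related language, and "*" when the default of 1.0 is used.
    """
    if code in measured:
        return measured[code], ""
    parent = related.get(code)
    if parent in measured:
        return measured[parent], "**"
    return DEFAULT_WEIGHT, "*"


print(resolve_weight("ja"))      # (1.9, '')
print(resolve_weight("zh-yue"))  # (3.7, '**')
print(resolve_weight("is"))      # (1.0, '*')
```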