Research:Revision scoring as a service/Word lists/sw

From Meta, a Wikimedia project coordination wiki

ISO code: sw
Language: Kiswahili (Wikipedia)
Generated list: 250
Badwords: -
Informal words: -
Stopwords: -
Dictionary: -
Stemmer: -
Contact person: See: Word lists requested
Wiki labels interface: no
Wiki labels forms: no
Wiki labels campaign: no
Needs: -

Generated list [1]

Words in the generated list commonly appear in reverted revisions but not in others. This list is generated using a TF-IDF approach.

  1. '
  2. sp
  3. spider's
  4. africa
  5. exterminatus
  6. mpira
  7. shrewsbury
  8. able
  9. africa
  10. after
  11. aisilandi
  12. aliens
  13. all
  14. alt
  15. amg
  16. and
  17. anything
  18. are
  19. arraabish
  20. arrives
  21. ate
  22. awards
  23. bacon
  24. ball
  25. based
  26. because
  27. become
  28. becomes
  29. becoming
  30. been
  31. being
  32. black
  33. breaks
  34. broken
  35. budget
  36. but
  37. can
  38. career
  39. charlotte
  40. charlotte's
  41. chigs
  42. claimed
  43. coise
  44. comes
  45. congo
  46. country
  47. creative
  48. currently
  49. cykranosh
  50. daughter
  51. denmaki
  52. did
  53. didn't
  54. dirty
  55. disambiguation
  56. disney
  57. dog
  58. doing
  59. don
  60. drac
  61. droed
  62. dvd
  63. eacu
  64. earth
  65. eats
  66. editing
  67. eternity
  68. even
  69. everyone
  70. face
  71. feelings
  72. film
  73. financial
  74. find
  75. fitbaa
  76. fodbold
  77. following
  78. fotbal
  79. fotbale
  80. foussball
  81. friends
  82. frog
  83. frosch
  84. fterran
  85. fuotbal
  86. fussball
  87. futbolas
  88. futbolli
  89. futbols
  90. g's
  91. gala
  92. genitalia
  93. get
  94. ghost
  95. give
  96. good
  97. grenouille
  98. grodan
  99. hadithi
  100. has
  101. have
  102. he's
  103. health
  104. her
  105. him
  106. his
  107. holds
  108. hollywood
  109. home
  110. homework
  111. huruf
  112. icao
  113. imdb
  114. including
  115. iso
  116. ist
  117. its
  118. jalgpall
  119. jalkapallo
  120. jason
  121. just
  122. kikker
  123. kills
  124. kipala
  125. least
  126. lenga
  127. like
  128. long
  129. look
  130. making
  131. mchezo
  132. miguu
  133. million
  134. mockbuster
  135. mom's
  136. money
  137. moral
  138. more
  139. most
  140. mostly
  141. mother
  142. movie
  143. must
  144. named
  145. nci
  146. needed
  147. neither
  148. niczka
  149. nogomet
  150. nominated
  151. nyingine
  152. nyota
  153. off
  154. olan
  155. once
  156. other
  157. own
  158. owner
  159. pampered
  160. pediludium
  161. performed
  162. pha
  163. phe
  164. pictures
  165. pie
  166. pig
  167. pig's
  168. poster
  169. pot
  170. princesse
  171. principessa
  172. prinses
  173. prinsessa
  174. prinsessan
  175. prinsessen
  176. r's
  177. ranocchio
  178. really
  179. received
  180. reputation
  181. rips
  182. rotten
  183. sammakko
  184. sana
  185. sapo
  186. sauti
  187. says
  188. scheme
  189. she
  190. shprooch
  191. shrewsbury
  192. since
  193. snake
  194. sokker
  195. star
  196. stupid
  197. such
  198. sugar
  199. supported
  200. tale
  201. tale
  202. talk
  203. tanzanian
  204. tells
  205. that
  206. the
  207. then
  208. there
  209. they
  210. this
  211. those
  212. three
  213. time
  214. times
  215. tomatoes
  216. tried
  217. truly
  218. universal
  219. usmc
  220. various
  221. voetbal
  222. voodoo
  223. vootbal
  224. waliotia
  225. walt
  226. wanacheza
  227. wanapingana
  228. wanted
  229. wants
  230. ward
  231. was
  232. what
  233. when
  234. where
  235. which
  236. who
  237. whole
  238. wie
  239. wikipedia
  240. will
  241. with
  242. woman
  243. won
  244. wong
  245. worked
  246. you
  247. your
  248. yoy
  249. yuggoth
  250. zao
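
The page does not spell out how the scores are computed, so the following is only a minimal sketch of one TF-IDF-style ranking that favors words frequent in reverted revisions and rare elsewhere. The function names, the tokenizer, and the smoothing are assumptions made for illustration; this is not necessarily the method that produced the list above.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: runs of letters, optionally followed by an
# apostrophe suffix (so "spider's" stays one token).
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def reverted_word_scores(reverted, others):
    """Score each word seen in reverted revisions.

    Term frequency is counted over the reverted revisions; inverse
    document frequency is computed over the other revisions, with
    add-one smoothing (an assumption of this sketch).
    """
    tf = Counter()
    for text in reverted:
        tf.update(tokenize(text))

    df = Counter()
    for text in others:
        df.update(set(tokenize(text)))

    n = len(others)
    return {
        word: count * math.log((1 + n) / (1 + df[word]))
        for word, count in tf.items()
    }


def top_words(reverted, others, k=250):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = reverted_word_scores(reverted, others)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

A ranking like this surfaces any token that is rare outside reverted edits, which may explain why the list above mixes English function words and foreign-language terms with plausible vandalism vocabulary; it is a starting point for manual review rather than a finished word list.
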
Generated common words

Common words appear in all revisions, reverted or otherwise. In English, this would include words like 'the' or 'is', which carry little meaning on their own. This list is generated using a TF-IDF approach.

  1. afrika
  2. aina
  3. ambao
  4. ambayo
  5. and
  6. aprili
  7. archive
  8. archivedate
  9. archives
  10. archiveurl
  11. asili
  12. baada
  13. baadaye
  14. bila
  15. caption
  16. category
  17. cha
  18. chini
  19. codes
  20. com
  21. combined
  22. commons
  23. commonscat
  24. council
  25. date
  26. default
  27. defaultsort
  28. desemba
  29. dini
  30. district
  31. dunia
  32. eneo
  33. file
  34. files
  35. final
  36. flag
  37. for
  38. from
  39. general
  40. glottolog
  41. hadi
  42. hai
  43. hali
  44. hapa
  45. hasa
  46. hata
  47. hii
  48. hili
  49. historia
  50. hivyo
  51. hiyo
  52. htm
  53. html
  54. http
  55. huko
  56. huo
  57. huu
  58. iaus
  59. idadi
  60. iko
  61. ili
  62. ilikuwa
  63. image
  64. imtmetis
  65. ina
  66. inahusu
  67. index
  68. infobox
  69. jamii
  70. januari
  71. jiji
  72. jimbo
  73. jina
  74. jio
  75. john
  76. jpg
  77. julai
  78. juni
  79. juu
  80. kabla
  81. kalenda
  82. kama
  83. karibu
  84. kaskazini
  85. kati
  86. katika
  87. kazi
  88. kiingereza
  89. kila
  90. kristo
  91. kubwa
  92. kuhusu
  93. kuna
  94. kusini
  95. kutoka
  96. kutokana
  97. kuu
  98. kuwa
  99. kwa
  100. kwamba
  101. kwanza
  102. kwenye
  103. lake
  104. lakini
  105. languages
  106. languoid
  107. latd
  108. latm
  109. latns
  110. lats
  111. left
  112. leo
  113. link
  114. llmap
  115. longd
  116. longew
  117. longm
  118. longs
  119. lugha
  120. maana
  121. machi
  122. madola
  123. maelezo
  124. maeneo
  125. magharibi
  126. mahali
  127. majimbo
  128. makala
  129. map
  130. mara
  131. march
  132. marejeo
  133. marekani
  134. mashariki
  135. matumizi
  136. mbalimbali
  137. mbegu
  138. mbili
  139. meac
  140. mei
  141. mengine
  142. miaka
  143. mikoa
  144. mji
  145. mjini
  146. mkuu
  147. mnamo
  148. moja
  149. mpaka
  150. mtakatifu
  151. mtu
  152. muda
  153. mujibu
  154. multitree
  155. mwa
  156. mwaka
  157. name
  158. nav
  159. nchi
  160. nchini
  161. ndani
  162. new
  163. news
  164. nje
  165. novemba
  166. nyingi
  167. nyingine
  168. oktoba
  169. olac
  170. old
  171. ongezea
  172. org
  173. orodha
  174. pamoja
  175. papa
  176. pdf
  177. php
  178. pia
  179. picha
  180. png
  181. population
  182. printing
  183. pushpin
  184. ramani
  185. rasmi
  186. redirect
  187. ref
  188. region
  189. report
  190. resource
  191. right
  192. roma
  193. sababu
  194. sana
  195. sasa
  196. satelite
  197. sehemu
  198. septemba
  199. settlement
  200. sib
  201. siku
  202. sites
  203. statistics
  204. subdivision
  205. svg
  206. tangu
  207. tanzania
  208. tarehe
  209. tazama
  210. tena
  211. the
  212. thumb
  213. title
  214. tofauti
  215. tovuti
  216. tuzo
  217. type
  218. uingereza
  219. ujumla
  220. ulaya
  221. upande
  222. uploads
  223. vile
  224. viungo
  225. vya
  226. wakati
  227. wakazi
  228. wake
  229. wakristo
  230. walio
  231. waliofariki
  232. waliozaliwa
  233. watu
  234. wavuti
  235. web
  236. website
  237. weebly
  238. wengi
  239. wengine
  240. wenye
  241. wilaya
  242. wote
  243. www
  244. yaani
  245. yake
  246. yao
  247. year
  248. yenye
  249. zaidi
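
As a rough illustration of how common words might be picked out, the sketch below ranks words by inverse document frequency across all revisions and keeps the lowest-scoring ones. The function name, the tokenizer, and the cutoff `k` are assumptions of this example, not a description of the actual generation script.

```python
import math
import re
from collections import Counter


def common_words(revisions, k=250):
    """Return the k words with the lowest IDF across all revisions.

    A word that occurs in every revision has IDF log(n / n) = 0, so
    the most widespread words sort first. Ties are broken
    alphabetically.
    """
    df = Counter()
    for text in revisions:
        # Count each word once per revision (document frequency).
        df.update(set(re.findall(r"[^\W\d_]+", text.lower())))

    n = len(revisions)
    idf = {word: math.log(n / count) for word, count in df.items()}
    return sorted(idf, key=lambda w: (idf[w], w))[:k]
```

Tokens such as archiveurl, defaultsort, and infobox in the list above suggest that raw wikitext was tokenized, so a stopword list derived this way would likely need markup tokens filtered out by hand.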

Bad words

Bad words are words that are unwelcome on any page. These include curse words, spam, and other content that would be reverted regardless of where it is inserted.

Needs bad words... Use |list-badwords=
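
Once a bad word list exists, one plausible way to apply it is to compile it into a single whole-word regular expression. The sketch below assumes that approach; the helper names are hypothetical, and the two example words are placeholders borrowed from the generated list, not a proposed Kiswahili bad word list.

```python
import re


def compile_word_list(words):
    """Build one case-insensitive, whole-word pattern from a word list."""
    if not words:
        # An empty list must match nothing; "(?!)" never matches.
        return re.compile(r"(?!)")
    # Longest alternatives first so longer words win over their prefixes.
    alternatives = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def find_matches(pattern, text):
    """Return every listed word found in the text, lowercased."""
    return [m.group(0).lower() for m in pattern.finditer(text)]


# Placeholder entries only, for demonstration.
PLACEHOLDER_BADWORDS = ["stupid", "dirty"]
```

The word boundaries keep the pattern from firing inside longer words, so "stupidity" would not count as a match for "stupid".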

Informal words

Informal words are words that are unwelcome in the article namespace but acceptable on talk pages. These include words such as 'hello' or 'hahaha', which would be fine in discussions but not in articles.

Needs informal words... Use |list-informal=
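
The difference between the two lists can be expressed as a namespace rule: bad words are flagged everywhere, informal words only in articles. The following is a minimal sketch of that rule, assuming lowercase word lists and a simple letter-based tokenizer; the function name is hypothetical. Namespace 0 is the main (article) namespace in MediaWiki.

```python
import re

ARTICLE_NAMESPACE = 0  # MediaWiki main (article) namespace


def unwelcome_words(text, namespace, badwords, informal):
    """Return the tokens in text that are unwelcome in this namespace.

    badwords are flagged in every namespace; informal words are
    flagged only in the article namespace. Both lists are assumed
    to be lowercase.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    flagged = set(badwords)
    if namespace == ARTICLE_NAMESPACE:
        flagged |= set(informal)
    return [t for t in tokens if t in flagged]
```

With 'hello' and 'hahaha' as informal words, the same sentence would be flagged for them in an article but not on a talk page (namespace 1), while any bad word would be flagged in both.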