Research:Revision scoring as a service/Word lists/vi

ISO code | Language | Generated list | Badwords | Informal words | Stopwords | Dictionary | Stemmer | Contact person | Wiki labels | Interface | Forms | Campaign | Needs
vi | Tiếng Việt (Wikipedia) | - | 21 | 28 | 78 custom stop words | enchant.Dict | - | See: Word lists | no | no | no | no | -
Generated list [1]

Words in the generated list commonly appear in reverted revisions but not in others. This list is generated using a TF-IDF approach.

  1. aacap
  2. aaron
  3. accessdate
  4. aecc
  5. aeridinae
  6. alceste
  7. all
  8. amilyinequality
  9. anact
  10. andreas
  11. angiospermae
  12. annual
  13. apahelp
  14. apahelpcenter
  15. apoapsis
  16. appellees
  17. artli
  18. auguste
  19. auhority
  20. aurotipula
  21. australia
  22. austromolophilus
  23. author
  24. autocon
  25. aversive
  26. avietnamnet
  27. baccman
  28. baillargeon
  29. bailly
  30. baodai
  31. beypa
  32. bgdđt
  33. bisby
  34. bookcase
  35. bourgoin
  36. bscc
  37. bân
  38. calolimnophila
  39. cartodere
  40. catalogue
  41. catalogueo
  42. catalunya
  43. cautela
  44. ceratocheilus
  45. checklist
  46. chudoanhnghiep
  47. chương
  48. col
  49. colaudinhbolinh
  50. congthuonghanoi
  51. cretry
  52. dasymolophilus
  53. ddddee
  54. demolevel
  55. denvuadinhnamdinh
  56. diagnostic
  57. digikam
  58. diotrepha
  59. dlist
  60. donghoalugv
  61. dongtien
  62. donie
  63. dship
  64. dsm
  65. email
  66. emdr
  67. eriocera
  68. euparatropesa
  69. eurekalert
  70. eurhamphidia
  71. eurhipidia
  72. garay
  73. gays
  74. glbtq
  75. global
  76. globi
  77. gmail
  78. goniodineura
  79. goodyerinae
  80. guestbook
  81. gvmdy
  82. habromastix
  83. hatisnature
  84. hddqt
  85. hdqt
  86. healthychildren
  87. hoankiem
  88. htmlstart
  89. icriomastax
  90. imageii
  91. imis
  92. immu
  93. incom
  94. indolimnophila
  95. indomalaya
  96. itis
  97. key
  98. khô
  99. kig
  100. kirk
  101. kteatime
  102. labmeeting
  103. lach
  104. lamingos
  105. laosa
  106. lbgtq
  107. ledaihanh
  108. lederer
  109. lihpao
  110. lindl
  111. lix
  112. lleida
  113. longurio
  114. lurdia
  115. lyriomolophilus
  116. macromastix
  117. magnoliids
  118. match
  119. maurice
  120. mcconaghy
  121. metalibnotes
  122. meyrick
  123. modernising
  124. mongoma
  125. mêlinh
  126. môi
  127. nbabie
  128. nearctic
  129. necydalinae
  130. neolipophleps
  131. nghi
  132. nghiep
  133. nghiên
  134. nghên
  135. nguyenanhhuy
  136. ngư
  137. nhan
  138. nhis
  139. nicolosi
  140. nicolson
  141. niel
  142. norlander
  143. northvale
  144. nouvelle
  145. numbero
  146. okres
  147. olkesson
  148. oncidiinae
  149. org
  150. orrell
  151. outsource
  152. ouvrard
  153. oval
  154. oxyrhi
  155. pagename
  156. paglina
  157. palearctic
  158. papuaphila
  159. paragymnastes
  160. parahexatoma
  161. paralipophleps
  162. paramongoma
  163. paratropesa
  164. parilisia
  165. parormosia
  166. periapsis
  167. ph
  168. phanboichau
  169. phedina
  170. phia
  171. phucthanh
  172. phymatopsis
  173. plantae
  174. promolophilus
  175. psaronius
  176. psiloconopa
  177. psychiatric
  178. pyraloidea
  179. qbot
  180. qis
  181. ramanagaram
  182. ranco
  183. rau
  184. reading
  185. red
  186. redirect
  187. register
  188. registering
  189. reparative
  190. reports
  191. rhampholimnobia
  192. rolimonia
  193. roskov
  194. ruitenbeek
  195. rusbult
  196. rutner
  197. sarantakos
  198. satinover
  199. schltr
  200. search
  201. sensiti
  202. sgdck
  203. sillo
  204. skaitlis
  205. spondylidinae
  206. sretry
  207. statusdescription
  208. subordo
  209. sugbo
  210. suleyman
  211. sullins
  212. sultan
  213. sâu
  214. taish
  215. tanypremna
  216. tanypremnella
  217. taxonreport
  218. thiên
  219. thêm
  220. tinh
  221. todo
  222. transgender
  223. tricholimonia
  224. tuongdaidinhtienhoangde
  225. vanhocnghethuatphutho
  226. vhdt
  227. vietdulieu
  228. vmg
  229. vmgmedia
  230. vnd
  231. vua
  232. vuadinhtienhoang
  233. văn
  234. vươn
  235. xem
  236. xuthanh
  237. year
  238. yllxnmuhbl
  239. ynsky
  240. ôliu
  241. đai
  242. đe
  243. đuôi
  244. đêm
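
The page does not spell out the exact TF-IDF computation behind the generated list. As a rough sketch only, one variant scores each word by its count in reverted revisions, weighted by how rare the word is among non-reverted revisions. The function name `reverted_word_scores`, the whitespace tokenizer, and the add-one smoothing are assumptions made for illustration, not the project's actual code.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; a real tokenizer would be stricter.
    return text.lower().split()


def reverted_word_scores(reverted, kept):
    """Score words that are frequent in reverted text but rare in kept text."""
    # Term frequency: total occurrences across all reverted revisions.
    tf = Counter()
    for text in reverted:
        tf.update(tokenize(text))

    # Document frequency: number of kept revisions containing each word.
    df_kept = Counter()
    for text in kept:
        df_kept.update(set(tokenize(text)))

    n_kept = len(kept)
    # Smoothed inverse document frequency; it is zero when a word
    # appears in every kept revision.
    return {
        word: count * math.log((1 + n_kept) / (1 + df_kept[word]))
        for word, count in tf.items()
    }


if __name__ == "__main__":
    scores = reverted_word_scores(
        ["spam spam link", "spam the"],
        ["the article", "the text"],
    )
    for word in sorted(scores, key=scores.get, reverse=True):
        print(word, round(scores[word], 3))
```

Under this scoring, a word present in every non-reverted revision gets a score of zero, which is why ordinary vocabulary drops out of the list while usernames, URLs fragments, and rare taxon names can rank highly.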
Generated common words

Common words appear in all revisions, reverted or otherwise. In English, this would include words like 'the' or 'is', which are meaningless on their own. This list is generated using a TF-IDF approach.

  1. accessdate
  2. all
  3. amilia
  4. amilie
  5. and
  6. anh
  7. animalia
  8. annual
  9. arachnida
  10. arachnomorpha
  11. araneae
  12. arthropoda
  13. at
  14. aultsort
  15. auteur
  16. author
  17. authority
  18. ban
  19. bao
  20. beelding
  21. beeldingtekst
  22. beginnet
  23. beschreven
  24. binomial
  25. biologie
  26. bot
  27. bronnen
  28. bronvermelding
  29. by
  30. cao
  31. caption
  32. catalogue
  33. categorie
  34. category
  35. charles
  36. checklist
  37. cheers
  38. chelicerata
  39. chi
  40. cho
  41. chu
  42. chân
  43. châu
  44. cite
  45. classis
  46. col
  47. com
  48. commons
  49. commonscat
  50. con
  51. côn
  52. công
  53. danh
  54. date
  55. datum
  56. di
  57. dier
  58. dieren
  59. do
  60. door
  61. dân
  62. dây
  63. eb
  64. een
  65. eergave
  66. eerst
  67. erd
  68. erences
  69. etenschappeli
  70. gallery
  71. genus
  72. geo
  73. geslacht
  74. gi
  75. gia
  76. gian
  77. haak
  78. hai
  79. het
  80. ho
  81. hoa
  82. html
  83. http
  84. huy
  85. ikipedia
  86. ikispecies
  87. image
  88. in
  89. inline
  90. insect
  91. insecta
  92. italic
  93. italictitle
  94. itis
  95. kevers
  96. keversoort
  97. key
  98. kh
  99. khai
  100. khi
  101. khoa
  102. khu
  103. không
  104. kim
  105. komt
  106. list
  107. liên
  108. match
  109. may
  110. miêu
  111. naam
  112. nam
  113. name
  114. nbsp
  115. ng
  116. ngh
  117. nghi
  118. nghiêng
  119. ngo
  120. ngu
  121. ngư
  122. nh
  123. nhan
  124. nhi
  125. nhiên
  126. nhân
  127. như
  128. nhưng
  129. năm
  130. nơi
  131. obox
  132. on
  133. orde
  134. ordo
  135. org
  136. orphan
  137. paul
  138. pg
  139. ph
  140. phong
  141. phylum
  142. phân
  143. png
  144. publisher
  145. px
  146. qua
  147. quan
  148. quy
  149. ra
  150. re
  151. reading
  152. red
  153. regnum
  154. sau
  155. search
  156. sinh
  157. small
  158. soort
  159. species
  160. spinnen
  161. stub
  162. sub
  163. subphylum
  164. sysem
  165. system
  166. taxon
  167. tham
  168. the
  169. theo
  170. thi
  171. thiên
  172. thu
  173. thumb
  174. thân
  175. thêm
  176. thông
  177. thư
  178. tin
  179. tinh
  180. titel
  181. title
  182. tiên
  183. tiêu
  184. trang
  185. tri
  186. trong
  187. trung
  188. truy
  189. trên
  190. trư
  191. tuanut
  192. tuy
  193. tây
  194. tên
  195. uit
  196. unranked
  197. url
  198. van
  199. viên
  200. voor
  201. văn
  202. xem
  203. xu
  204. year
  205. đang
  206. đi
  207. đây
  208. đông
  209. đơn
  210. đư
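
As a hedged illustration of how common words might be picked out, the sketch below ranks words by inverse document frequency over all revisions and returns those with the lowest values, that is, the words found in the largest share of revisions. The helper name `common_words` and the `top_n` parameter are hypothetical; the page does not describe the actual procedure beyond "a TF-IDF approach".

```python
import math
from collections import Counter


def common_words(revisions, top_n=3):
    """Return the top_n words with the lowest IDF across all revisions."""
    # Document frequency: in how many revisions does each word occur?
    df = Counter()
    for text in revisions:
        df.update(set(text.lower().split()))

    n = len(revisions)
    idf = {word: math.log(n / count) for word, count in df.items()}

    # Lowest IDF first; ties are broken alphabetically for a stable order.
    return sorted(idf, key=lambda word: (idf[word], word))[:top_n]


if __name__ == "__main__":
    sample = ["the cat is here", "the dog is there", "the bird sings"]
    print(common_words(sample, top_n=2))
```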
Bad words

Bad words are words that are commonly associated with vandalism. They are generally used to insult or to be vulgar. This includes curse words, racial slurs, and assertions of or prejudices against sexual preferences.

  1. dick
  2. slut
  3. đ[íị]t
  4. (dz?|gi)âm
  5. [ck]u
  6. ass
  7. bitch
  8. cunt
  9. cứt
  10. fag
  11. fuck.*
  12. gay
  13. ghey
  14. l[ôồ]n
  15. shit
  16. đ[ụù].
  17. đái
  18. đéo
  19. đĩ
  20. ỉa
  21. [ck]ặ[tc]
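
Several entries above are regular expressions rather than plain words. One plausible way to apply them, shown here as a sketch under stated assumptions, is to join the patterns into a single case-insensitive alternation bounded by word boundaries. The name `find_bad_words` is hypothetical, only a few patterns from the list are copied in, and the NFC normalization step is an added assumption so that Vietnamese letters with combining marks compare consistently.

```python
import re
import unicodedata

# A handful of patterns copied from the list above (not the full list).
BAD_WORD_PATTERNS = [r"shit", r"đéo", r"đ[íị]t", r"[ck]ặ[tc]"]


def _nfc(text):
    # Compose base letters and combining marks into single code points.
    return unicodedata.normalize("NFC", text)


# \b keeps a pattern from matching inside a longer word.
BAD_WORD_RE = re.compile(
    r"\b(?:" + "|".join(_nfc(p) for p in BAD_WORD_PATTERNS) + r")\b",
    re.IGNORECASE,
)


def find_bad_words(text):
    """Return every bad-word match found in text, in order of appearance."""
    return [m.group(0) for m in BAD_WORD_RE.finditer(_nfc(text))]


if __name__ == "__main__":
    print(find_bad_words("Đây là shit"))
```

Normalizing before matching matters because a character class such as `[íị]` only matches precomposed letters; the same letter written as a base character plus combining marks would otherwise be missed.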
Informal words

Informal words are words that are unwelcome in the article namespace but acceptable on talk pages. This includes words such as 'hello' or 'hahaha', which would be fine in discussions but not in articles.

  1. (he){2,}
  2. (hi)+
  3. bro
  4. bợn
  5. ch[ớứ]
  6. chẳng
  7. fải
  8. khỉ
  9. moron
  10. mày
  11. nghịch
  12. ngu
  13. nguỵ
  14. ngụy
  15. ok
  16. quái
  17. retard
  18. stupid
  19. thôi
  20. thằng
  21. tui
  22. vời
  23. wái?
  24. đừng
  25. ơi
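
The repetition patterns in this list, such as `(he){2,}` and `(hi)+`, are easier to read with a small check. The sketch below tests whether a single token matches one of a few patterns copied from the list in full. The helper `is_informal` and the choice to match whole tokens are assumptions for illustration, not a description of how the scoring service applies the list.

```python
import re
import unicodedata

# A few patterns copied from the list above (not the full list).
INFORMAL_PATTERNS = [r"(he){2,}", r"(hi)+", r"wái?", r"ch[ớứ]"]

# Normalize patterns to NFC so accented letters are single code points.
_COMPILED = [
    re.compile(unicodedata.normalize("NFC", p), re.IGNORECASE)
    for p in INFORMAL_PATTERNS
]


def is_informal(token):
    """Return True if the whole token matches any informal-word pattern."""
    token = unicodedata.normalize("NFC", token)
    return any(pattern.fullmatch(token) for pattern in _COMPILED)


if __name__ == "__main__":
    for token in ["hehehe", "he", "hihi", "history"]:
        print(token, is_informal(token))
```

Here `(he){2,}` requires at least two repetitions, so a lone "he" does not match, while `(hi)+` accepts one or more. The trailing `?` in `wái?` makes the final letter optional.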