Research:Revision scoring as a service/Word lists/tr
ISO code | Language | Generated list | Badwords | Informal words | Stopwords | Dictionary | Stemmer | Contact person | Wiki labels | Interface | Forms | Campaign | Needs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
tr | Türkçe (Wikipedia) | 261 | 173 | - | nltk.stopwords | - | (Zemberek?[1]) | See: Word lists | translated | no | no | almost complete [2] | bad words need to be updated |
Generated list [3] |
---|
Words in the generated list commonly appear in reverted revisions but not in others. This list is generated using a TF-IDF approach.
|
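The page says only that this list comes from "a TF-IDF approach", so the following is a minimal sketch of one plausible reading, not the actual pipeline. It weights each word's frequency in reverted revisions by how rare the word is in non-reverted ones. The function names, the whitespace tokenizer, and the smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace split. A real pipeline would need a
    # proper tokenizer and Turkish-aware case folding (dotted/dotless i).
    return text.lower().split()


def generated_list(reverted, others, top_n=10):
    """Rank words that are frequent in reverted revisions and rare elsewhere."""
    # Term frequency, counted over reverted revisions only.
    tf = Counter(w for text in reverted for w in tokenize(text))
    # Document frequency, counted over non-reverted revisions only.
    df_others = Counter(w for text in others for w in set(tokenize(text)))
    n_others = len(others)
    # Smoothed inverse document frequency: words present in every
    # non-reverted revision get a weight of zero.
    scores = {
        w: count * math.log((1 + n_others) / (1 + df_others[w]))
        for w, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

With this weighting, a word that occurs in every non-reverted revision scores zero however often vandals use it, which matches the "but not in others" condition described above.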
Generated common words |
---|
Common words appear in all revisions, reverted or otherwise. In English this would include words like 'the' or 'is', which are meaningless on their own. This list is generated using a TF-IDF approach.
|
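As a rough illustration of how common words might be picked out, the sketch below keeps words whose inverse document frequency across all revisions falls under a cutoff. The function name `common_words` and the `max_idf` threshold are hypothetical; the page does not state the cutoff or the exact weighting that was used.

```python
import math
from collections import Counter


def common_words(revisions, max_idf=0.5):
    """Return words that occur in most revisions (low IDF)."""
    # Each revision counts once per word, however often the word repeats.
    docs = [set(text.lower().split()) for text in revisions]
    df = Counter(w for doc in docs for w in doc)
    n_docs = len(docs)
    # IDF is 0 for a word present in every revision and grows as it gets rarer.
    idf = {w: math.log(n_docs / count) for w, count in df.items()}
    return sorted(w for w, value in idf.items() if value <= max_idf)
```

Lowering `max_idf` toward zero narrows the result to words found in nearly every revision.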
Bad words |
---|
Bad words are words commonly associated with vandalism. They are generally used to insult or to be vulgar. This includes curse words, racial slurs, and assertions about, or prejudices against, sexual preferences. User talk:とある白い猫/tr-bad words
bok
oğlu it
|
Informal words |
---|
Informal words are words that are unwelcome in the article namespace but acceptable on talk pages. This includes words such as 'hello' or 'hahaha', which would be fine in discussions but not in articles. A common pattern is writing ch in place of ç and sh in place of ş. The words below are mostly fine on their own; what makes them informal is the substitution of ch/sh for ç/ş, which is plainly odd.
|
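The ch/sh pattern described above can be checked mechanically. The sketch below is a hedged illustration: it restores ç and ş and flags a word as informal only if the restored spelling is in a known vocabulary while the original is not. The `vocabulary` set and the sample words are hypothetical and are not taken from this page's list.

```python
def restore_turkish_letters(word):
    # Undo the informal digraphs: ch -> ç, sh -> ş.
    return word.replace("ch", "ç").replace("sh", "ş")


def is_informal_spelling(word, vocabulary):
    """True if the word looks like a ch/sh respelling of a known word."""
    # Assumption: plain lower() is enough for these samples; Turkish-aware
    # case folding would be needed for capital I / İ.
    word = word.lower()
    restored = restore_turkish_letters(word)
    return (
        restored != word              # the word contained ch or sh
        and word not in vocabulary    # the original spelling is not a real word
        and restored in vocabulary    # the restored spelling is
    )


# Hypothetical vocabulary; a real check would use a full dictionary.
vocabulary = {"çok", "şey", "teşekkür"}
print(is_informal_spelling("chok", vocabulary))
print(is_informal_spelling("çok", vocabulary))
```

Requiring the restored form to be in the vocabulary avoids flagging loanwords or names that legitimately contain ch or sh.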