Talk:Small wiki audit

From Meta, a Wikimedia project coordination wiki

Ideas

  • We can probably run queries to find which editions are predominantly written by a single person. That is not necessarily bad: many are, and in many cases those editors are valid speakers of the language, but it can raise some red flags if they are not -N. --Base (talk) 10:46, 27 August 2020 (UTC)
    @Base: See Small wiki audit/Queries/Prolific article creators. PiRSquared17 (talk) 21:10, 29 August 2020 (UTC)
  • Sorry, but I think it's ridiculous to think that we can find "auditors" in hundreds of languages. The whole point of the independence of Wikimedia project subdomains is that it's simply impossible to know and manage hundreds of languages/cultures centrally. Nemo 07:10, 30 August 2020 (UTC)
    That is one of the reasons I think the report above is very important for setting some priorities. For example, at first glance lbewiki, rwwiki and abwiki, among others, draw some attention. Also, while I hate to get personal and while I do know some polyglot Wikimedians, Kmoksy's contributions to a bunch of wikis while having -0 indicated on deleted userpages draw attention too. --Base (talk) 20:47, 30 August 2020 (UTC)
  • Broadly, the health of the Wikipedias relies on "many eyes", both of readers and editors, and on readers gradually becoming editors, often at first by correcting misspellings. As we've seen with the prime example, quality erodes when a wiki loses either community. While I am wary of proposing metrics tailored from the sco.wiki problem, what may be useful is a stored view over time which for each wiki covers (a) number of articles, (b1) number of page views, (b2) number of pages with under 10 views, (c) number of active editors, (d) proportion of edits this year that are by a top-3 editor, (e) number of admins who are actively editing. Comparing the trends of a wiki's stats can highlight situations where a wiki is being used less and less by its external community, where someone is mass-creating pages of interest to nobody, etc. All metrics are gameable, but something like this could help identify priorities for some form of audit check (though how and by whom is then the more awkward problem). AllyD (talk) 11:15, 1 September 2020 (UTC)
  • We can also see which Wikis have top editors whose babel tags show low fluency in the language. That should raise alarms immediately, and there appear to be several where that is the case. SecretName101 (talk) 08:18, 15 September 2020 (UTC)
    That is basically what the report linked above in response to my comment does. --Base (talk) 11:05, 16 September 2020 (UTC)
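As an illustration of metric (d) from AllyD's list above (the proportion of edits made by a wiki's top-3 editors), here is a minimal Python sketch. It assumes the edit data has already been exported as a plain list of editor names per wiki; the function names, the sample wiki names and the 0.8 threshold are all hypothetical and not taken from any existing query or report.

```python
from collections import Counter


def top_n_share(edit_authors, n=3):
    """Return the fraction of edits made by the n most active editors."""
    if not edit_authors:
        return 0.0
    counts = Counter(edit_authors)
    top_edits = sum(count for _, count in counts.most_common(n))
    return top_edits / len(edit_authors)


def flag_concentrated(wikis, threshold=0.8, n=3):
    """Return names of wikis whose top-n share is at or above the threshold.

    `wikis` maps a wiki name to a list with one editor name per edit.
    The result is sorted with the most concentrated wiki first.
    """
    shares = {name: top_n_share(authors, n) for name, authors in wikis.items()}
    flagged = [name for name, share in shares.items() if share >= threshold]
    return sorted(flagged, key=lambda name: -shares[name])


if __name__ == "__main__":
    # Hypothetical sample data: one entry per edit, value is the editor.
    sample = {
        "examplewiki_a": ["Alice"] * 90 + ["Bob"] * 5 + ["Carol"] * 5,
        "examplewiki_b": ["Ed%d" % i for i in range(50)],
    }
    for name in flag_concentrated(sample, threshold=0.8, n=1):
        print(name, round(top_n_share(sample[name], n=1), 2))
```

A high share is only a prompt for a closer look, since, as noted above, many small wikis are legitimately written mostly by one fluent speaker.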

Necessity of linguistics background

This looks like a great idea. If I can be of any assistance, please let me know. I'm a trained linguist. --Janwo (talk) 07:47, 27 August 2020 (UTC)
(Edit: moved my comment here from the top of the page where I feel it doesn't belong. Janwo (talk) 01:24, 2 September 2020 (UTC))

This looks like a great idea to make sure the whole Scots mess doesn't happen again. I don't think it's necessary, however, to specify linguistic skill for helping. From what I understand, this would be periodic checks on small language wikis to make sure that they actually are written in the correct language, among other things. The best people to have for that would be native speakers and not linguists (linguists often hear that their job is to know a bunch of languages rather than what they actually do, which is try to scientifically understand language in general). Zoozaz1 (talk) 20:11, 27 August 2020 (UTC)

It would be monstrously difficult to maintain an audit team with members who are native speakers of every language that they're supposed to check out. A possibly more realistic approach would be to recruit people with high-priority language backgrounds where available, but with other languages find outside experts (e.g. professors or literary figures who respond in a friendly manner to emails) to double-check a corpus of text randomly pulled from the relevant wiki. So, for example, if the audit team were to be reviewing the Pitkern wiki but had no members with working knowledge of Pitkern, they'd email any local government officials / teachers / writers that they can find, and see if one responds to the query "hey does this article sound normal to you?" RexSueciae (talk) 02:31, 28 August 2020 (UTC)
Linguists, especially those with training in corpus linguistics, (machine) translation or forensic linguistics, can help determine whether a given text is e.g. an automated translation, a word-by-word translation, or simply a fake accent. You don't have to be a native speaker to see patterns and, more tellingly, flaws or inconsistencies. That's how the fake Scots hoax got detected too. It doesn't need a native speaker (just someone who is slightly better at reading grammars than notorious AG) to figure out that e.g. pih:Adoelf_Hitlar contains spelling and grammatical errors (hii, hiis instead of hi, his; use of the fake past tense verb wos or the similarly fake participle difitid). --Janwo (talk) 05:50, 28 August 2020 (UTC)
I have to disagree with you here. Knowing that something is an incorrect past tense verb requires knowing the language, not what a past tense verb is. Imagine something not written in the Latin script; a linguist, unless they have been trained to know the language, would be completely lost. The Pitcairn and Scots examples are outliers because an English speaker can somewhat understand them, but the vast majority of languages would need people who understand them to correct them. Of course, at the higher level linguists might be useful, but in actually doing the audits you don't want linguists specifically but anyone who speaks the language fluently. Obviously you would not have 300-odd people covering every language as a part of the project, but you would contact the relevant native speakers of any language you are auditing when the time comes for the audit of that particular language. Zoozaz1 (talk) 14:27, 28 August 2020 (UTC)
I think non-speakers can do some primary evaluation. For example, a non-speaker can evaluate that "Гло́кая ку́здра ште́ко будлану́ла бо́кра и курдя́чит бокрёнка" does match Russian inflections. They won't be able to tell that it is actually a nonsense sentence with just 1 existing word, but failing to pass even such a test would be a red flag that can raise the priority of evaluation of that language version by speakers. --Base (talk) 14:52, 28 August 2020 (UTC)
Linguists certainly can be helpful, and having a linguistics background wouldn't hurt. For a comprehensive audit, though, and more than just a basic test of whether it is in the right script or not, the necessary qualification would be being able to speak the language. How about we move being able to speak the language audited to a new section entitled necessary qualification to audit and change the existing section from ideal to suggested? This would only be for people comprehensively auditing their language rather than people who generally want to help the project. Zoozaz1 (talk) 15:04, 28 August 2020 (UTC)
Scots and Pitcairnese are not so much outliers as easy targets for speakers of English. ;-) I do not have to know the language to be capable of comparing the text in a given article with the information I gather from a reliable grammar. Of course a native speaker can do that faster, but for many small languages it might not be as easy as you'd like to believe to find someone who actually speaks it. Anyway, I believe that for an effective "wiki hygiene", you'd need both native speakers and linguists. | The fake Russian above looks unconvincing: Russian does not use the acute accent, at least not outside of introductory teaching materials. --Janwo (talk) 15:05, 28 August 2020 (UTC)
(Yeah, it doesn't; I was too lazy to remove the stress marks, but in fact the lead section of articles in ruwiki always mentions (or should mention) the stress for the article subject's name.) --Base (talk) 15:12, 28 August 2020 (UTC)
I think it's certainly possible that linguists can take up a grammar and study the language enough to catch egregious examples like Scots. I just think that it's easier and more comprehensive for native/fluent speakers to do it. To give one example, Scots itself doesn't have a standardized written form and has many dialects. One could easily read a grammar on one of the dialects only to see the wiki in another and mark it as incorrect, whereas a native speaker would likely be able to distinguish them. I also think that generally it just takes more effort and will be less accurate to learn the basic structure of a language than to rely on a native speaker for it. (And I will also say that if we truly can't find any native speaker to check a few articles then that wiki probably doesn't have enough contributors/readers to justify its existence.) In many of the smaller languages where it is hard to find contributors, I will add, it is unlikely that there will be any (easily) available resources for linguists to study to learn the language. Zoozaz1 (talk) 15:24, 28 August 2020 (UTC)
We could have linguists review projects first and then decide whether it's worth asking native speakers to review. There are over 200 Wikipedias in the GS wikiset (one operationalization of "small"), to say nothing of the number of sister projects, so I don't know if it's feasible to contact native speakers for all of them. And even if we did contact native speakers for review, we would still need people to communicate with various language communities and to write up reports on linguistic accuracy, which is something that linguists would be well-suited (but perhaps a bit overqualified) for. PiRSquared17 (talk) 21:42, 28 August 2020 (UTC)
I think that's a very good idea, but I think the bottleneck would be the number of linguists willing to learn the basics of a language. If we have enough linguists willing to devote enough time, then that would be the best solution. I think it's fairly likely that down the line we might have a shortage of linguists, but I guess we can cross that bridge when we come to it. I could draft up a process proposal later to formalize that process. Zoozaz1 (talk) 22:19, 28 August 2020 (UTC)
Many of us are working on "exotic" languages or had to learn one in a grammar course or a fieldwork methods course. Maybe not enough to have a conversation in the given language and its relatives, but enough to find blatant errors in lexicon and grammar. --Janwo (talk) 03:49, 29 August 2020 (UTC) P.S.: We also have access to "informants": experts and speakers of these languages that we consult for our research. --Janwo (talk) 02:22, 1 September 2020 (UTC)

Relation to langcom

The Language committee, which approves the creation of new language versions of Wikimedia projects, is basically already a gathering of linguistically competent people or people who have access to such people. Shouldn't they be involved in this? --Base (talk) 12:02, 28 August 2020 (UTC)

Yeah, now that you mention it, it would make sense to have LangCom oversee this process. This would naturally fit into the scope of LangCom. PiRSquared17 (talk) 21:27, 28 August 2020 (UTC)
Salvete! I have brought it up on our mailing list. --MF-W 17:19, 31 August 2020 (UTC)

Timeframe

I think 5-15 years is much too long, especially the "-15" part; that is basically what people cannot understand about the Scots case being undetected for 8 years. It might be a better idea to tie audits to the number of edits, like, just making up a number here, every 500,000 edits or so. --Janwo (talk) 03:36, 29 August 2020 (UTC)

I agree "5-15 years" is too much. Tying it to the wiki's total edit count is potentially a good idea. It would have the benefit of triggering alarms if someone created a whole bunch of articles (e.g. using some kind of automation). 500,000 edits might be too much; I'd want to look at how many edits are typical on small wikis before deciding on a number. PiRSquared17 (talk) 08:44, 29 August 2020 (UTC)
Yeah, I purposefully had a long range because I wasn't sure of the specifics. I was thinking it would be closer to a one-off thing to catch inaccuracies and we would focus on checking all small wikis before rechecking them, but if we have enough people then it could certainly be tied to articles or be shorter. Zoozaz1 (talk) 13:16, 29 August 2020 (UTC)
Or maybe tying the audit together with AAR since it happens annually? --Minorax (talk) 13:45, 29 August 2020 (UTC)
I would not couple it to AAR. AAR is done by stewards annually (or theoretically it can be done every 6 months). While some stewards may happen to help with this, either with linguistic expertise, if they possess it, or with some general assistance, it has nothing to do with AAR or stewards' work per se. --Base (talk) 20:51, 30 August 2020 (UTC)
My point was that the audits can be done together with AAR since it has a fixed timeframe, by whoever the auditors are. Another suggestion would be to split the wikis into 2 groups: Audits on active wikis with < number of sysops will take place annually, while audits on wikis with little to no traffic can be done bi-annually. --Minorax (talk) 16:06, 31 August 2020 (UTC)
I've changed it to a vague "some time" as per this discussion so it can be worked out (some time) later. Zoozaz1 (talk) 02:52, 31 August 2020 (UTC)
I am thinking about starting a second audit program about wiki governance. There would be some overlap but this second one would focus on problems with governance and require minimal linguistic skills (and would focus more on those with the steward/GS skillset). The steps in this second one could really be done every year across all 900+ wikis, while this current one I assume would take an in-depth look at smaller wikis and specifically content. (For current stewards, it is similar to my old page User:Rschen7754/Cleanup on stewardwiki). --Rschen7754 05:45, 16 September 2020 (UTC)
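To make the edit-count idea from earlier in this section concrete, here is a minimal Python sketch of "audit once a wiki has accumulated N edits since its last audit". The interval is left as a parameter, since no number was agreed in this discussion (the 500,000 figure was explicitly made up); the function names and sample data are hypothetical.

```python
def audit_due(total_edits, edits_at_last_audit, interval):
    """Return True once at least `interval` edits have accumulated since the last audit."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return total_edits - edits_at_last_audit >= interval


def wikis_due(edit_counts, interval):
    """List wikis that are due for an audit, most new edits first.

    `edit_counts` maps a wiki name to (total_edits, edits_at_last_audit).
    """
    since_last = {
        name: total - last for name, (total, last) in edit_counts.items()
    }
    due = [name for name, delta in since_last.items() if delta >= interval]
    return sorted(due, key=lambda name: -since_last[name])


if __name__ == "__main__":
    # Hypothetical counts; the interval here is a placeholder, not a proposal.
    counts = {
        "examplewiki_a": (180_000, 100_000),
        "examplewiki_b": (101_000, 100_000),
    }
    print(wikis_due(counts, interval=50_000))
```

Sorting by edits since the last audit means a sudden burst of automated page creation pushes a wiki to the front of the queue, which is the alarm effect mentioned above.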

How to solve problems?

"The final report, if the auditors encounters problems, should have a recommendation like an rfc on the matter." - As is probably well-known, RFCs to resolve content disputes on other wikis never lead to much. How can this be different? For the current Scots case, drawing attention to the matter seems to have worked for now to attract people who are able to fix the quality. I fear that for most languages that won't be the case. The case will be especially hopeless if it turns out something is a politically loaded problem. --MF-W 17:13, 31 August 2020 (UTC)

I think one good way to solve issues would be an active auditor. The report page could be a centralized discussion before the final report is submitted, and the auditor could actively reach out to supporters of the language for discussion. I've added another step on the main page for that. Zoozaz1 (talk) 18:20, 31 August 2020 (UTC)
I'm afraid there won't be a one-size-fits-all solution. For some "small" wikis, one might be able to mobilize competent speakers to correct things; for others, it might turn out quite difficult to find people who would volunteer to invest effort in it. (I'm thinking of Pitcairnese here, for example.) It also depends on the extent of "damage". If it is a few inaccuracies and unidiomatic expressions, that might be something one or two competent speakers can clean up without being overburdened. If it basically needs a complete rewrite (like apparently in Scots), one would need a team. Either way, in my experience it won't be sustainable if we recruit "outside" people who aren't wikipedians already. --Janwo (talk) 02:20, 1 September 2020 (UTC)
We need to fix RFC and get it to actually do something. Requests for comment/Policy was a start but we need to bring more visibility to the process and get the process to actually produce actionable outcomes. --Rschen7754 05:58, 16 September 2020 (UTC)
  • Yes, the auditor's recommendations need to be pulled into a decisive RfC which results either in their enactment or a reasoned consensus for another course of action. (See also my note below about what to do if there are too few contributors.) AllyD (talk) 09:04, 16 September 2020 (UTC)
  • I adjusted the last two steps to suggest publicising each audit report to the communities at the related wikis in the language in question, which I hope is not just adding more process hoops for an auditor. AllyD (talk) 08:30, 16 September 2020 (UTC)
  • At the point when an audit is published, if the auditor has raised serious concerns about the reliability of a significant proportion of the wiki content (whether as a result of non-fluent human or bot editing), should a further step be the immediate temporary addition of a top notice, following the precedent of the "Following recent revelations, Scots Wikipedia is currently reviewing its articles for large scale language inaccuracies." warning? AllyD (talk) 08:43, 16 September 2020 (UTC)
    I agree, I think that should be automatic after the report comes out and the auditor has identified problems. Zoozaz1 (talk) 13:42, 16 September 2020 (UTC)
  • For these Small Wikis, it strikes me that the process should anticipate another possible consequence from the post-audit RfC. The audit is a health-check identifying particular conditions which endanger the quality of the given wiki. The RfC seeks consensus on actions, for which input from fluent speakers is crucial. What if the RfC draws minimal interest, what if it becomes clear that a critical mass of fluent editors willing to move things forward is lacking? That would suggest a further RfC: either to freeze the wiki resource - if the remedial action has placed it in a reasonable state - or to delete it altogether. AllyD (talk) 13:06, 16 September 2020 (UTC)

The problem with mg.wikt

This page seems to be looking for problems, so here's a known problem with a small wiki. The Malagasy Wiktionary has over six million entries, but gets very few human edits. Almost all of the pages are created by one frequently active bot run by the only active admin (and editor), Jagwar. These bot entries are misleadingly bad, and probably at least half of them are incorrect; the bot tries to copy definitions from other Wiktionaries and automatically translate them, but usually only picks one word out of the definition, so (to give a recent example) mg:wikt:cirugía plástica (Spanish for "plastic surgery") is just translated as "surgery" in Malagasy. There are also many thousands of definitionless Malagasy entries, whose definitions were removed due to massive copyright violation that Jagwar committed many years ago. It has proven impossible to convince Jagwar that his bot is doing much more harm than good by making mg.wikt completely unusable, but as he is the only active editor, there is no community that can rein him in. I expect this kind of problem is much more common than the sco.wiki disaster, but it isn't obvious how to deal with it. What do you all think? Please ping me in any responses. Metaknowledge (talk) 06:27, 6 September 2020 (UTC)

Metaknowledge, that certainly seems problematic and something for the small wiki audit to look at. I'll also note that this discussion seems suited to this page where we are discussing other wikis affected by a (somewhat) similar problem as Scots. Zoozaz1 (talk) 14:57, 7 September 2020 (UTC)
Zoozaz1, it's not in need of an audit so much as a solution. I don't know how to move forward on this, but if you all don't either, then the small wiki audit will never accomplish anything. In short: what should we do? Metaknowledge (talk) 02:04, 8 September 2020 (UTC)
Metaknowledge, I don't disagree, but before we can find a solution we need to know what to solve, and the only way we can do that is through some sort of audit. From your comments, I don't doubt that there is a problem; I only doubt that we know the full extent of it. What I think we should do is have a comprehensive audit (possibly done by you, if interested), gather native speakers of the language to grasp how bad the problem is, and then have solutions proposed (as is pretty much detailed in the main page). The problem with relying on user-reported errors and immediately jumping to a solution is that we can't just take over an independent wiki without knowing 100% that there is a problem. Zoozaz1 (talk) 15:16, 9 September 2020 (UTC)
Zoozaz1, I'm willing to get a more formal audit together, but that's not the problem, and it should be obvious from the blatant nature of the mistakes that we don't need native speakers in order to assess it. If you don't believe me for some reason, maybe you'll trust PiRSquared17, who says they already knew about it. My point is that I don't want to go through the trouble of making a thorough report just to find out that there's no mechanism here for mass-deleting affected entries and preventing Jagwar from running his bot in this manner. This gets to the heart of my question about the small wiki audit: what exactly are you able and willing to do to fix things? Metaknowledge (talk) 16:40, 9 September 2020 (UTC)
I was aware that the bot existed and had created the vast majority of pages on mg.wikt, but I can't corroborate any claims about the quality of these entries. I'm sure some of them are erroneous, but I can't say how frequent or serious the errors are, because I haven't bothered to check. PiRSquared17 (talk) 16:47, 9 September 2020 (UTC)
Metaknowledge, I presume we would be able to do anything that would be technically possible. If we need to mass rollback a user's edits, then we can do that, or we could start an rfc so a greater number of editors would be able to decide. Ultimately though, it's up to each situation as there can't really be a one-size-fits-all solution (which is why having a comprehensive audit helps in determining the solution). Zoozaz1 (talk) 17:00, 9 September 2020 (UTC)
Zoozaz1, I know every situation is unique, which is why I'm talking about this particular situation with mg.wikt. My understanding is that RFCs usually don't get anywhere, so that probably isn't a solution. But I admit that I'm pretty clueless about the wikibureaucracy involved here. Let's suppose that I compile a nice long report, in which I attempt to quantify the problem with plentiful examples, and everyone here is duly convinced that this bot's creations are highly unreliable. Who would be empowered to do a mass rollback? If Jagwar were to keep running his bot, who would be empowered to block it? This is what I mean by mechanism. Metaknowledge (talk) 19:47, 9 September 2020 (UTC)
Metaknowledge, After you have done your report, you would come up with some recommendations (like a mass rollback) and then you would enlist an administrator or have one appointed to carry it out. The technical aspect shouldn't be that problematic. Zoozaz1 (talk) 20:30, 9 September 2020 (UTC)
Zoozaz1, who would be appointed? Again, the technical side isn't the problem; it's the bureaucratic side. Metaknowledge (talk) 21:15, 9 September 2020 (UTC)
Metaknowledge, Willing native speakers, administrators already there, the auditor, langcom members, global stewards, competent users in the wiki; I don't see that as much of a problem. If you look at Scots Wiki, people have already appointed native speakers to be administrators and help clean things up. Zoozaz1 (talk) 21:29, 9 September 2020 (UTC)
Zoozaz1, I feel like you're not listening. There are no other administrators already there, or other competent users. There is no editing community at all. There is no magical way to engage native speakers to become new editors, and they would not become administrators overnight. If by the "auditor" you mean me, I am only an admin on en.wikt and therefore have no power to do that. We are unlikely to be able to reinvigorate mg.wikt like sco.wiki; what I propose is just to delete the crappy content. Again: who would be appointed to do that? Metaknowledge (talk) 21:44, 9 September 2020 (UTC)
Metaknowledge, I think you misunderstand how administrators can be appointed. Native speakers very well can become administrators overnight; they are appointed by the community. In fact, administrators already have been appointed overnight in Scots Wiki. We will look at the problem and if necessary the community (not necessarily only the local one) will appoint administrators to fix it. Those administrators could quite literally be anyone. I gave many suggestions above, some of which you seem to take issue with but others that would likely be fine. Zoozaz1 (talk) 21:52, 9 September 2020 (UTC)
Zoozaz1, I didn't know that (and it doesn't necessarily seem like a good idea to give random newbies the bit!). I'm frustrated because you seem to expect that this will be like sco.wiki, where native speakers will appear to help rebuild. I could become an admin at mg.wikt, if that's what it takes to get rid of bad entries, but I don't actually speak Malagasy, so I wouldn't exactly be an ideal choice. But that just begs the question of who exactly would appoint me (or someone else) to do it. Is there a steward who would be willing to do so? Metaknowledge (talk) 22:02, 9 September 2020 (UTC)
Metaknowledge, Yes, a steward would appoint someone (possibly you) as administrator. You can see the requests here. There should be a local discussion (if anyone is active on the wiki), but I'm sure that could be worked out. Here's the procedure for adminship on that wiki. Zoozaz1 (talk) 22:22, 9 September 2020 (UTC)
Zoozaz1, Why are you sure that could be worked out? Do you think that Jagwar, the only active editor, will take kindly to this attempt to delete his shoddy work? Also, I fixed your link. Metaknowledge (talk) 22:30, 9 September 2020 (UTC)
Metaknowledge, To answer your question, I think that's where the audit itself plays a big role. If you've done this formal thing, shown a clear and systemic problem, and have native speakers agreeing with you (if you can find any), then it's going to be pretty clear to the wider community that someone needs to be appointed admin regardless of the opinion of the editor whose work you've just proved to be unhelpful, to say the least. And thanks for fixing the link. Zoozaz1 (talk) 22:37, 9 September 2020 (UTC)
It is probably worth pointing out that mass creation of articles with bots has been a contentious issue in the past, particularly with regard to Lsjbot, but I would argue this is a different story. While Lsjbot creates entries that are mostly correct, even if of little value, Jagwar's bot actively creates ones that are blatantly wrong. Granted, there is a note on the pages that it has been "created based on translations on another page" (or something along those lines), but it doesn't mention at all that the entries were created by a bot, nor does it carry a disclaimer that the entries may be wrong. Indeed, the bot sometimes even seems to treat links in longer definitions as the definition, regardless of the purpose of the linking of that term (not that the bot could tell; therein lies the rub, after all). — surjection ⟨??⟩ 19:56, 9 September 2020 (UTC)
I can certainly say that (almost) every time I create a Wiktionary entry, Jagwar's bot has made a very sparse copy of it on mg.wikt, probably run through some machine translator, within seconds. AryamanA (talk) 20:21, 9 September 2020 (UTC)

Expand the Process#1.3 or not?

Although it is uncommon, there are also languages that Google Translate does not yet support but other machine translation tools do: Yandex Translate, which offers (lang_X)<->Mari (not sure whether that's Meadow Mari or Hill Mari); FWIW, Rosetta Stone, which as I recall provided an Abkhaz<->Russian MT service; and the open-source Apertium, which, per the link provided, supports Crimean Tatar<->Turkish. So there should really be a small addition to say "... Google Translate and other common machine translation tools, but...".

Anyway, based on this, I would propose adding 1.4: languages in which one variant has become independent of its original language wiki, so that we can also monitor Nynorsk Norwegian and/or Western Armenian. --Liuxinyu970226 (talk) 14:48, 9 September 2020 (UTC)

That makes sense, I would say be bold and make the changes as it's only a draft. Zoozaz1 (talk) 21:30, 9 September 2020 (UTC)

Malagasy Wiktionary audit published

@PiRSquared17, Zoozaz1, AllyD, Can I Log In, GZWDer, Base, Nemo bis, SecretName101, Janwo, RexSueciae, MF-Warburg, Minorax, and Liuxinyu970226:
I have completed the first audit associated with this project. I catalogued a wide variety of major problems in the correctness and usability of Malagasy Wiktionary, encompassing millions of entries, and have made my recommendations following that. I am now looking for community input on this, so we can move forward and clean up the mess. Metaknowledge (talk) 02:09, 16 September 2020 (UTC)

  • This is really excellent work. PiRSquared17 (talk) 02:13, 16 September 2020 (UTC)
  • I second the above. Well done. Linked to it from the RFC "other wikis" page, hopefully people will see this and chime in. For my part, I'm inclined to follow the "delete it" proposal for this one. RexSueciae (talk) 02:47, 16 September 2020 (UTC)
  • I agree with everyone else; this is exactly the way a small wiki audit should be done. Pinging @Jagwar: to find out their take on the situation. If half of the entries across the dictionary are really that bad, then I think we can't possibly keep the content. Zoozaz1 (talk) 03:13, 16 September 2020 (UTC)
  • Thank you for your quick and thorough action! I believe we need to have a look at automated translation bots in general. They cannot create sensible content without human supervision because their output will almost always have errors or be plain useless. Not only in Malagasy. --Janwo (talk) 03:17, 16 September 2020 (UTC)
  • Metaknowledge, that is an excellent audit report. It looks like an intense piece of work; my first queries are about the effort and how easily the approach can be broadened to audit other resources: roughly how many hours of effort went into this report, and to what extent did it depend on opportune available resources (documents, contact with at least one fluent speaker who can assist)? AllyD (talk) 06:26, 16 September 2020 (UTC)
    I didn't keep track of the time overall, but the 100 lemma survey alone took me around three hours. Anyway, this kind of approach doesn't need to be broadened to other wikis; the structure is reflective of how a Wiktionary operates, whereas this project seems to have been conceived around Wikipedias. More importantly, we were already aware of the problem and merely needed to characterise it and figure out its extent; we weren't discovering new, unknown problems. Metaknowledge (talk) 17:02, 16 September 2020 (UTC)
  • Full support for this initiative and for the conclusion. The French Wiktionary community reached the same conclusion about this specific project but remained unable to act, and it's a pleasure to see this audit nicely made! Noé (talk) 22:36, 16 September 2020 (UTC)
  • @PiRSquared17, Zoozaz1, AllyD, Can I Log In, GZWDer, Base, Nemo bis, SecretName101, Janwo, MF-Warburg, Minorax, and Liuxinyu970226: Please vote at Talk:Small wiki audit/Malagasy Wiktionary#Poll to make your opinion clear. Metaknowledge (talk) 17:37, 17 September 2020 (UTC)
  • This is terrific work. Yes, these bot entries should be deleted. And to immediately combat their claim that "No concluding results was given, and things were as they were before." we should come to a quick consensus here as to whether they should immediately cease creating new bot entries. I move that they should immediately cease. Any support or opposition? SecretName101 (talk) 05:36, 21 September 2020 (UTC)
Seconded. --Janwo (talk) 09:27, 22 September 2020 (UTC)

(discussion has been moved to align with RFC)  — billinghurst sDrewth 00:34, 26 September 2020 (UTC)

Requests for comment/Concerned about Urdu Wikipedia articles' truthiness and neutrality

Interesting conversations. I haven't done any verification of the claims made but thought I would mention. --Rschen7754 18:04, 17 September 2020 (UTC)

Wolof Wikipedia

Per their babel box on their Wolof Wikipedia user page, prolific user @Jëfandikukat:Ji-Elle: professes to have absolutely no fluency in the language, yet has created many articles and is the second-most prolific editor on this small wiki.
The same goes for the fifth-most prolific editor, @Guérin Nicolas:, per their Wolof Wikipedia user page; they also appear to be an admin. However, Guérin Nicolas is not creating articles and, from what I can gather, appears mostly to be making edits regarding images, categories, and other stuff. SecretName101 (talk) 18:02, 21 September 2020 (UTC)

Re the first user: Do you have any information as to whether their edits are good or bad, linguistically?
Re the second user: As long as those edits and categorizations are benevolent, their language skills are not problematic. I also sometimes correct or update data (like census data etc.) in wikis whose language I don't speak. It's article creation, especially if it's large scale, that should worry us. --Janwo (talk) 09:35, 22 September 2020 (UTC)

If you are calling this an audit process, then have a proper framework

Looking at the outcomes of the first audit undertaken, it has not been a neutral auditing success. An audit should have:

  • a scope
  • a framework
  • a summary of findings

An audit should:

  • be impartial
  • present its report to its auditees
  • seek feedback

Only then would you really come to recommendations.

If we want to have communities accept an outside opinion, it has to come across as a process that is going to be fair, demonstrate that it has quality outcomes at its heart, and not be aimed at personalities. It also has to look to allow communities to change with the desire for improved output, not under the threat of consequences.  — billinghurst sDrewth 00:32, 25 September 2020 (UTC)

I'll note that in the draft I originally wrote, an audit was understood to be a descriptive report, not a prescriptive proposal for taking specific action. Even in the current draft, while bullet point 9 says the report could propose starting an RfC, it doesn't say that the report itself would be an RfC with specific proposed actions. PiRSquared17 (talk) 01:18, 25 September 2020 (UTC)
The "small" in the title refers to the number of active editors on a wiki. This suggests that this small number could be the cause of some problems. So for any remedy it would be of paramount importance to take an approach that will increase the number of active editors. Just applying a kind of surgical approach from the outside will probably result in unintended adverse results. For this reason I concur with billinghurst. --MarcoSwart (talk) 09:45, 25 September 2020 (UTC)
The number of _active_ editors is in most cases a good indicator of how active the wiki's community is as a whole, especially with respect to quality control and antivandalism. A small but highly active community will make sure that enough of its members have the necessary sysop rights. But "small" also means small communities. If a language's speaker community is small (like langs with < 1000 speakers), there won't be so many speakers of that language who are active contributors (and sysops). Of course the audit can point out problems to an existing wiki community, but the wikis that have been mentioned here often lack exactly this: a sufficiently large base of native speakers making sure the content is adequate. So actually such an audit might rather serve as an incentive to "repair" or improve contents. Just look what the attention to scowiki did. Among other things, it mobilized native speakers to help "their" wiki. --✍ Janwo Disk./de:wp 13:57, 25 September 2020 (UTC)
Having written a lot of the process of the audit, I would like to weigh in on some of these concerns. First of all, it is still a draft; if there are any specific points you want to improve feel free to improve them.
Scope: The scope of each audit is analyzing the language quality of each small wiki and its compliance with global policies. Specific issues that would fit under that scope can only be determined through the process of the audit.
Framework: I guess the framework would be similar to the scope, to look for issues in small wikis that need to be rectified.
Impartial: Of course an audit should be impartial, as the audits are, but since you are trying to find problems, it will inherently seem partial towards the problems of a wiki rather than its upsides, because you are only writing about the problems, which is the goal of the audit. And of course you go into an audit looking for problems based not on specific grievances but on a genuine desire to fix things if there are problems.
Present the report: The report is available on meta and will be posted on each wiki concerned.
Seek feedback: See the extensive feedback on the talk page of the Malagasy audit.
I don't really see your concerns here. There is no threat of consequences, just a desire to fix any problems, of course with community input. And obviously we should not target any specific editors, but if a wiki's problems are due to one editor it would be irresponsible and a disservice to the wiki not to mention it. If you have specific recommendations or concerns, it would be a great idea to incorporate those into the page but I don't see any. Zoozaz1 (talk) 19:40, 25 September 2020 (UTC)

An idea for the next wiki to audit

I brought this up a couple of years ago. lrc.wikipedia.org (w:Northern Luri) is neither readable nor understandable to its native speakers due to its original script and invented words (mostly borrowed from Kurdish, making it unusable if you don't know Kurdish). I have asked several native speakers before and I can do that again. Amir (talk) 12:03, 26 September 2020 (UTC)

If you would like to follow the process outlined on the audit page and conduct an audit (which I would fully understand if you didn't have the time) then the community will be able to assess the problems, if any, of the wiki and possibly find solutions. Native speakers would also be very useful in assessing quality. Zoozaz1 (talk) 15:58, 26 September 2020 (UTC)
For those interested, there is currently a discussion here to close the wiki. Zoozaz1 (talk) 00:50, 29 September 2020 (UTC)
This wiki has closed so
This section is resolved and can be archived. If you disagree, replace this template with your comment. Liuxinyu970226 (talk) 01:23, 25 January 2021 (UTC)

Cebwiki and Warwiki

As shown in the following link, the Cebuano and Waray Wiki projects are very heavily inflated. This matter even hit the news before the Scots Wiki did: 1 2. I propose that all the Lsjbot machine-translated articles get deleted, as they are two sentences long at most and correct grammar and vocabulary can't be guaranteed. --Glennznl (talk) 09:44, 31 October 2020 (UTC)

@Glennznl: Should this proposal also contain Swedish Wikipedia (as the home town of Lsj) and Wikidata (because some WD users said something like "Lsj did good jobs in some regions")? --Liuxinyu970226 (talk) 05:18, 26 November 2020 (UTC)
You should create an RFC with a clear analysis of the issue. Currently I will oppose any action.--GZWDer (talk) 06:32, 26 November 2020 (UTC)

Systematic approach

Hi all; Liuxinyu970226 pointed me to this page after I made a similar suggestion on the LangCom's talk page. I would welcome a systematic approach to this: checking all small language versions (under a certain number of active contributors? Threshold?), possibly by a group of linguists, and then repeating this, say, every 5 or 10 years. After Scots, the case of Northern Luri and the resulting closure of that project has shown IMHO that there might be more grave issues hidden in small language versions. Gestumblindi (talk) 20:09, 25 January 2021 (UTC)

This is a good idea, and one that should be done with long-term support from the appropriate team at the Foundation. It would also require a clear and concise SOP and a transparent metric for which projects will be subject to audit from time to time (in order to reduce the possibility of being too arbitrary towards a certain language group/project). I am very much in favour of developing a systematic, regular, and global approach for SWA. dwf² 14:27, 26 January 2021 (UTC)

Individual users creating new articles purposefully only in small wikis

I keep running across this phenomenon of individual users, without any claim to actually knowing the languages, editing small wikis and only small wikis. None of them have been unproblematic and most of their edits are garbage, usually the same stub translated into 10+ languages. I think this case needs to be added to the 1st step in the process since none of the current ones cover it imo. -Yupik (talk) 22:27, 14 March 2021 (UTC)

@Yupik: It's only a draft, so you can be bold and make the change yourself if you feel it's a good idea. Zoozaz1 (talk) 02:59, 27 March 2021 (UTC)

Small wiki audit/audits/Kyrgyz Wikibooks

Hi, it's been a long time since this was active but I thought this would be a good test case. There are 61 pages on the above wiki; however, I believe 43 of them are not legitimate. I was going to propose closure of the wiki, but as there are a few (fewer than 10) unquestionably legitimate pages I thought I would bring it here. Rschen7754 23:03, 30 July 2022 (UTC)

Small wiki audit/audits/Macedonian Wikibooks

Another one - lots of articles in the wrong language and lots of copyvios - and I suspect that many of the ones that are actually in Macedonian might be copyvios. Rschen7754 19:49, 31 July 2022 (UTC)