Talk:Small wiki audit/Queries/Prolific article creators

From Meta, a Wikimedia project coordination wiki

Babel

It would be a good idea to enrich this data with users' Babel information (ideally, the wiki in question, the wiki where they have the most edits, Commons, Meta and Wikidata should all be checked for that information). --Base (talk) 23:22, 29 August 2020 (UTC)

Good idea, I'll try to get that done soon. PiRSquared17 (talk) 23:24, 29 August 2020 (UTC)
@Base: Done. PiRSquared17 (talk) 05:32, 30 August 2020 (UTC)
Thanks for this table and the work behind it! I noticed you marked in boldface the cases where the wiki's language and the Babel language match. May I suggest you also highlight Babel zeroes, and users who do not have that language in their Babel at all? Those are the ones we should have a closer look at, after all. --Janwo (talk) 01:45, 31 August 2020 (UTC)
@Janwo: Good idea, that should be pretty easy to do. PiRSquared17 (talk) 02:28, 31 August 2020 (UTC)

Redirects

I did not check the queries, but judging from IanraBot@crhwiki you are not excluding redirects from the count. --Base (talk) 02:35, 30 August 2020 (UTC)

@Base: I am definitely excluding redirects. It's just that IanraBot also has non-redirect page creations (they're not the most recent in its contrib history). See this query: it has created 2041 non-redirect articles and 2024 redirects. If you use the (approximate) values in the table you can compute roughly the number of pages it has created: (.74-.47)*7578 = ~2046. PiRSquared17 (talk) 03:00, 30 August 2020 (UTC)
Oh, you are right, I somehow scrolled past them when looking. Thanks! --Base (talk) 01:26, 31 August 2020 (UTC)
@Base: Note that I Oppose excluding redirects (at least for the zhwiki* projects). With the help of Wikidata's redirect linking, Wikidata-linked redirect pages can either be useful for future page creation, or be a black hole for one language rather than for a single wiki. --Liuxinyu970226 (talk) 14:28, 9 September 2020 (UTC)
I did not completely understand your comment, but I am not sure that what you write is about the same thing this report is trying to investigate. --Base (talk) 15:37, 9 September 2020 (UTC)
@Base: You probably don't completely understand because the wiki you are most familiar with, the Ukrainian Wikipedia, rarely has Wikidata-linked redirects, which are in fact also covered by your original rationale for this thread. --Liuxinyu970226 (talk) 10:04, 23 October 2020 (UTC)
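
The back-of-the-envelope estimate given earlier in this thread can be rechecked with a few lines of Python. This is only a sketch of the arithmetic as quoted: the thread does not say which table columns .74 and .47 come from, so the parameter names below (`share_high`, `share_low`) are hypothetical labels, not the table's own.

```python
def estimate_from_table(share_high: float, share_low: float, total: int) -> float:
    """Reproduce the estimate quoted in the thread: (share_high - share_low) * total.

    The parameter names are hypothetical; the thread only gives the numbers.
    """
    return (share_high - share_low) * total


# Figures quoted in the thread.
estimate = estimate_from_table(0.74, 0.47, 7578)
non_redirect_from_query = 2041  # count reported from the linked query

# Relative gap between the table-based estimate and the query result.
relative_gap = abs(estimate - non_redirect_from_query) / non_redirect_from_query

print(round(estimate))          # 2046
print(f"{relative_gap:.2%}")    # well under 1%
```

The small gap between the two figures is consistent with the table values being rounded to two decimal places.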

Suggestion

@PiRSquared17: I understand if this isn't feasible as I can't imagine that this would be easy, but would it be possible to create a bot for smaller wikis to find all of the edits of the wiki and then, for all the users with babel templates, find out what percent of edits were created by users of each babel level? Of course it would leave out anonymous editors and users without babel templates, but I assume it would be useful comparatively across wikis. Zoozaz1 (talk) 02:03, 31 August 2020 (UTC)

Yeah, that would certainly be possible, but it would probably take a long time. I might look into this next week if I get a chance. One idea to reduce the amount of time required would be to take a random sample of edits instead of looking at all edits. Also, lots of edits are maintenance, so maybe it would be better to keep using article creations? I'm not sure. Could look at both. PiRSquared17 (talk) 02:10, 31 August 2020 (UTC)
That's a good idea to take a random sample of edits; maybe we could also weed out the maintenance by calculating it by bytes added (and ignoring an edit if that number is negative), along with article creations? Zoozaz1 (talk) 02:41, 31 August 2020 (UTC)
I don't know if that makes things easier, but I just tried one approach manually with tywiki (Tahitian): I checked the recent changes and explored the last active users first. Lo and behold, there's someone who has ty-0 on their profile and who created over 30 "articles" (stubs) about cities etc. which basically only differ in one word, the place's name. So maybe starting from "recent changes" and "new pages" (and probably the list of orphaned pages) would give us a first idea of whether it's necessary to check all users. --Janwo (talk) 13:57, 31 August 2020 (UTC)
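
The sampling idea discussed in this thread could be sketched roughly as follows. This is not the bot or query actually used for the report: the `Edit` record, the `babel_levels` mapping and the function name are all hypothetical, and a real implementation would have to fetch edits and Babel data from the wiki first. The sketch takes a random sample of edits, skips users with no Babel entry, ignores edits that do not add bytes, and reports the share of added bytes per Babel level.

```python
import random
from collections import defaultdict
from typing import NamedTuple, Optional


class Edit(NamedTuple):
    """Hypothetical minimal edit record: who edited and the size change."""
    user: str
    bytes_delta: int


def bytes_share_by_babel_level(
    edits: list[Edit],
    babel_levels: dict[str, str],
    sample_size: Optional[int] = None,
    seed: int = 0,
) -> dict[str, float]:
    """Share of added bytes per Babel level, over a random sample of edits."""
    rng = random.Random(seed)
    if sample_size is not None and sample_size < len(edits):
        sampled = rng.sample(edits, sample_size)
    else:
        sampled = list(edits)  # sample covers everything

    added = defaultdict(int)
    for edit in sampled:
        level = babel_levels.get(edit.user)
        # Skip users without Babel info and edits that do not add bytes.
        if level is None or edit.bytes_delta <= 0:
            continue
        added[level] += edit.bytes_delta

    total = sum(added.values())
    if total == 0:
        return {}
    return {level: count / total for level, count in added.items()}


# Made-up data, for illustration only.
example_edits = [
    Edit("A", 300), Edit("B", 100), Edit("A", -50),
    Edit("C", 500), Edit("B", 100),
]
example_babel = {"A": "N", "B": "0"}  # user C has no Babel entry
shares = bytes_share_by_babel_level(example_edits, example_babel)
```

As noted in the thread, this leaves out anonymous editors and users without Babel templates, so the shares are only comparable across wikis, not absolute measures.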

alswiki

Hi PiRSquared17

Since my homewiki (alswiki) is also listed here, I would like to point out that Alemannic (in Wikipedia als) has the code gsw in the ISO standard and therefore the language is mostly listed as gsw in the Babel templates. The reason that Wikipedia uses the wrong code here is that in the founding year of Alemannic Wikipedia there was no code for Alemannic.

Best regards --Holder (talk) 07:48, 18 September 2020 (UTC)

@Holder: Hi, this table includes basically all public Wikimedia projects. If a project is listed here it doesn't necessarily mean there's any problem with it. I was already aware of special language codes, but thanks for the reminder about als/gsw. PiRSquared17 (talk) 16:30, 18 September 2020 (UTC)
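
One way to handle the als/gsw mismatch when matching a wiki's language against Babel entries is a small alias table. The sketch below is an assumption about how this could be done, not the code behind the report; `BABEL_CODE_ALIASES` and both function names are hypothetical, and only the als/gsw pair comes from this thread.

```python
from typing import Optional

# Wiki language code -> code normally used in Babel templates.
# Only the als/gsw pair is taken from the discussion above.
BABEL_CODE_ALIASES = {"als": "gsw"}


def canonical_code(code: str) -> str:
    """Normalise a language code so that aliased codes compare equal."""
    code = code.lower()
    return BABEL_CODE_ALIASES.get(code, code)


def babel_level_for_wiki(wiki_code: str, babel: dict[str, str]) -> Optional[str]:
    """Return the user's Babel level for the wiki's language, or None if absent."""
    target = canonical_code(wiki_code)
    for code, level in babel.items():
        if canonical_code(code) == target:
            return level
    return None


# A user on alswiki who declares gsw-N is matched despite the differing code.
level = babel_level_for_wiki("als", {"de": "4", "gsw": "N"})
```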

brwiki

Hi,

As a former admin on brwiki, I'd like to offer my point of view: I can vouch that the five users Bianchi-Bihan, Kadbzh, Gouerouz, Neal and Llydawr write in good to very good Breton (one of them, and I could disclose which one in private if needed, is even a university teacher who has written several methods for teaching Breton). There may be other problems on brwiki, but "linguistic accuracy" is not really one of them.

For the record, @PiRSquared17:

A galon, Cheers, VIGNERON * discut. 15:53, 18 September 2020 (UTC)

@VIGNERON: Hi, this table includes basically all public Wikimedia projects. If a project is listed here it doesn't necessarily mean there's any problem with it. Thanks for the heads up about brwiki. PiRSquared17 (talk) 16:30, 18 September 2020 (UTC)
Of course, I understand that this list is meant to spot potential problems, and since I know this project well, I wanted to share my take on br.wp. Cheers, VIGNERON * discut. 16:37, 18 September 2020 (UTC)

Remark on Wikisources

Hi,

As an active member and admin on multiple Wikisources, I want to raise a point. With mw:Extension:Proofread, the main activity on Wikisource is just checking that the text on the left side of the screen matches the image of the file on the right side of the screen. With this extension, it does not really matter whether or not you know the language (that said, it still helps a lot to know the language).

So for Wikisource, a good quality indicator would be the percentage of texts using this extension (and also the quality indicator of the pages themselves, but this is more self-declarative, so in theory it is easier to bias). Luckily there is a tool that already does this: https://phetools.toolforge.org/statistics.php With this tool, you can see that Wikisources like br.ws or bn.ws have a very high level of use of this extension (99.96% and 99.92%, which makes them #1 and #3), and anyone can check that, for instance, this page is of good quality: s:br:Pajenn:Inisan_-_Emgann_Kergidu.djvu/15. Meanwhile, others like kn.ws, cs.ws or zh.ws have a very low level (1.03%, 3.12% and 9.16%).

I hope my remark could help refine this table.

Cheers, VIGNERON * discut. 16:06, 18 September 2020 (UTC)

That's a very good point. I think we may want to figure out project-specific metrics of quality so that we can find possible problems. The proportion of pages that are proofread/validated could be a good proxy for Wikisources, although I'm not sure there's any inherent problem with having unproofread pages in the Page namespace. Couldn't it just mean that someone thought it was worth uploading a book, but they haven't gotten around to editing it yet? Unproofread pages in the main namespace would be more problematic, I guess. Although I have to admit that my experience with Wikisource is rather limited, so I may be missing something. PiRSquared17 (talk) 16:30, 18 September 2020 (UTC)
@PiRSquared17: yes, you're right. The level of use of the extension is just an indicator and a "clue": a project can "use the extension a lot and have bad content" and, vice versa, "not use the extension and have good content"; both just seem less probable to me than "using the extension and having good content". That's why this metric could be crossed with your table: if both are "bad", this is a bad sign and scrutiny may be prioritized on these projects.
Also, there is a correlation between the age of a project and its percentage of use of the extension: old projects were created before the extension existed or was well known, and most of them have not converted their old texts to this "new" system (that's why en.ws is only at 57%, while fr.ws did the conversion and is now at 97%). That should be taken into account, and the focus should maybe be on recent projects with low use of the extension.
As for the page status (Not proofread/Proofread/Validated), I'm not sure it's a good indicator, at least not for small projects. To reach the "Validated" status you need at least 2 different users, and for a Wikisource, "small" can mean 2 to 5 people. And even among "big" Wikisources, there are very different strategies and dynamics: de.ws focuses on having almost all pages in green while others don't care that much. This subpage of the tool, https://phetools.toolforge.org/stats.html, can help to get a quick glance at the diversity of situations: you can see that fr.ws has around 20% of green pages and de.ws has around 80%, but in absolute value the 20% of fr.ws is twice as many pages as the 80% of de.ws! Both situations have pros and cons, and neither is "bad"; indeed, it shows that numbers can be tricky. Again, crossing multiple metrics is a good idea, and since Wikisources already have quality metrics, I think it's a good idea to look at them.
To sum it up (sorry for the length), my point of view is: the page status in itself is only a bit relevant, and the level of use of the extension is more relevant. « Unproofread pages in the main namespace would be more problematic »: true, but pages don't usually stay "Unproofread" for a long time and "Unproofread" does not always mean bad (especially in small communities), whereas « pages in the main namespace with no scan » seems way more problematic as there is little to no traceability (a bit like an article on a Wikipedia with no source: it may be right or wrong, but we can't know for sure).
Cheers, VIGNERON * discut. 17:02, 18 September 2020 (UTC)
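
Two points from this thread can be illustrated with a short sketch. First, the fr.ws/de.ws comparison implies a size ratio: if 20% of fr.ws is twice 80% of de.ws in absolute terms, fr.ws must have roughly 8 times as many pages counted. Second, "crossing" the extension-use percentage with the table could mean flagging only projects that look weak on both. The percentages below are the ones quoted in the thread; the threshold, the function names and above all the `flagged_in_table` set are hypothetical placeholders for illustration, not real audit results.

```python
def implied_size_ratio(share_a: float, share_b: float, absolute_ratio: float) -> float:
    """If share_a * size_a == absolute_ratio * share_b * size_b, return size_a / size_b."""
    return absolute_ratio * share_b / share_a


# fr.ws: ~20% green, de.ws: ~80% green, fr.ws green pages ~2x de.ws green pages.
fr_to_de = implied_size_ratio(0.20, 0.80, 2.0)  # about 8

# Extension-use percentages as quoted in the thread.
extension_use = {
    "br.ws": 99.96, "bn.ws": 99.92, "fr.ws": 97.0, "en.ws": 57.0,
    "zh.ws": 9.16, "cs.ws": 3.12, "kn.ws": 1.03,
}


def needs_scrutiny(
    extension_use: dict[str, float],
    flagged_in_table: set[str],
    max_use_percent: float = 10.0,  # hypothetical cut-off
) -> list[str]:
    """Projects that are both flagged in the table and low on extension use."""
    return sorted(
        wiki for wiki, use in extension_use.items()
        if use < max_use_percent and wiki in flagged_in_table
    )


# Placeholder input only: this set does NOT reflect the actual table.
hypothetical_flags = {"kn.ws", "fr.ws"}
priority = needs_scrutiny(extension_use, hypothetical_flags)
```

A project's age could be added as a third input, since older Wikisources tend to have lower extension use for historical reasons, as noted above.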