Talk:Machine learning models/Proposed/Language-agnostic Wikipedia article quality

From Meta, a Wikimedia project coordination wiki

test talk page post

Testing to see if this will notify the #model-card-notifications channel on the WMF Slack. HTriedman (WMF) (talk) 18:11, 21 December 2022 (UTC)

Any AI is potentially problematic

We can start with the quality assessors. I don’t think having editors train the model is a valid way to train for quality. Many people who criticize Wikipedia don’t edit, and I’ve heard valid reasons for labelling Wikipedia articles (specifically on English Wikipedia) as poorly written, even though those same articles would be labelled good or high quality under the proposed metrics. If this is ever going to go forward, it’s important to get people from outside, especially people who don’t buy into the Wikipedia model, to participate in training.

The other question is whether this is a good idea in the first place. Based on our interactions with big tech, AI for classifying any content seems to be very unreliable (and therefore dangerous), and our interactions with classifiers here feel no different. Al12si (talk) 20:08, 21 December 2022 (UTC)

Similar system in Portuguese Wikipedia

I find this system very similar to what we have on the Portuguese Wikipedia. There, we have had an automatic quality evaluation system since 2011 that uses very similar parameters to evaluate quality. At first, the algorithm was applied by bots that recorded the quality in a template on each article's talk page; since 2014, the quality has been evaluated in real time by pt:Module:Avaliação, which uses the same algorithm to fill in the same template. Some years ago I also created a database on Toolforge with the evaluation of all articles: one tool queries the database to show the number of articles at each quality level, another shows the quality of an article or revision and explains the classification, and a third compares the automatic quality with the ORES quality.

The algorithm uses these parameters: page size, number of citations, number of internal links, number of sections, number of paragraphs, number of images, paragraph length (overly long paragraphs indicate a wikification problem), and templates that indicate problems. It does not use machine learning; instead, it applies a set of conditions. For example, to have quality 6 an article must carry the template indicating that it is featured; to have quality 2 it must have at least 8000 bytes if it has no citations (or 2000 bytes if it has at least one citation), at least 10 internal links and at least 5 paragraphs, and it must not carry templates equivalent to the English {context} and {cleanup}.

That system could also be used on other wikis: we can find each wiki's problem-indicating templates through interwiki links, and the other parameters are language-agnostic. Danilo.mac talk 17:54, 12 June 2023 (UTC)
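The rule-based approach described above can be sketched roughly as follows. This is only an illustration of the two example rules given (quality 6 and quality 2), not the actual logic of pt:Module:Avaliação. The names `ArticleFeatures`, `known_level` and `PROBLEM_TEMPLATES` are hypothetical. It also assumes that the featured template alone is sufficient for quality 6, and it collapses every other case into level 1, because the conditions for levels 1 and 3 to 5 are not described here.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for a wiki's local equivalents of English {context}
# and {cleanup}; on another wiki these would be found via interwiki links.
PROBLEM_TEMPLATES = frozenset({"context", "cleanup"})


@dataclass
class ArticleFeatures:
    size_bytes: int
    citations: int
    internal_links: int
    paragraphs: int
    templates: frozenset = field(default_factory=frozenset)
    featured: bool = False  # assumption: True if the "featured" template is present


def known_level(a: ArticleFeatures) -> int:
    """Return 6, 2 or 1 using only the two example rules from the post.

    Levels 3-5 are not described, so anything that is not featured and
    does not pass the quality-2 checks is reported as 1 (assumption).
    """
    if a.featured:
        return 6
    # Articles with at least one citation need less text to reach quality 2.
    min_size = 2000 if a.citations >= 1 else 8000
    has_problem_template = bool({t.lower() for t in a.templates} & PROBLEM_TEMPLATES)
    if (a.size_bytes >= min_size
            and a.internal_links >= 10
            and a.paragraphs >= 5
            and not has_problem_template):
        return 2
    return 1


if __name__ == "__main__":
    cited = ArticleFeatures(size_bytes=3000, citations=1, internal_links=12, paragraphs=6)
    uncited = ArticleFeatures(size_bytes=3000, citations=0, internal_links=12, paragraphs=6)
    print(known_level(cited), known_level(uncited))  # 2 1
```

Because every condition is a plain threshold or template lookup, the same features can be computed on any wiki without training data. Only the list of problem templates needs to be localized.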