Wikipedia Article Optimization (WAO)

From Meta, a Wikimedia project coordination wiki

Wikipedia Article Optimization (WAO) is a concept related to search engine optimization (SEO): specific feature dimensions of an article's content are improved in order to raise the quality score that humans and machines assign to that article.

At Wikimania 2007, research was presented on using techniques from natural language processing to determine the quality of Wikipedia articles (link). The general idea behind this research is that a computer does what computers do best: counting features of an article such as sentence length, paragraph length, number of images, internal and external links, templates, and potentially thousands of other features (see the article for more). Human beings then rate these articles on their overall impression (gestalt) of quality, and a machine learning algorithm such as a maximum entropy classifier or a support vector machine assigns weights to these features in order to determine which are the most relevant in predicting the human quality ratings.
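The two steps described above, counting features and learning weights from human ratings, can be sketched roughly as follows. This is only an illustration, not the method from the Wikimania research: the function names, the regular expressions (which assume ordinary wikitext markup), and the tiny training set are all made up for the example. The learner is plain binary logistic regression, which is the two-class case of a maximum entropy classifier.

```python
import math
import re


def extract_features(wikitext):
    """Count a few surface features of an article given as wikitext."""
    paragraphs = [p for p in wikitext.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", wikitext) if s.strip()]
    words = wikitext.split()
    return {
        "avg_sentence_words": len(words) / max(len(sentences), 1),
        "paragraphs": len(paragraphs),
        "images": len(re.findall(r"\[\[(?:File|Image):", wikitext)),
        "internal_links": len(re.findall(r"\[\[(?!File:|Image:)", wikitext)),
        "external_links": len(re.findall(r"https?://", wikitext)),
        "templates": wikitext.count("{{"),
    }


def predict(weights, bias, x):
    """Return the modelled probability that feature vector x is 'high quality'."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, lr=0.1):
    """Fit binary logistic regression with stochastic gradient ascent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = y - predict(weights, bias, x)  # human label minus model guess
            bias += lr * err
            weights = [w + lr * err * v for w, v in zip(weights, x)]
    return weights, bias


# Made-up toy data: each row is [images, internal_links], pre-scaled to 0..1.
# Labels stand in for human ratings (1 = rated high quality, 0 = rated low).
ROWS = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
LABELS = [0, 0, 1, 1]

weights, bias = train_logistic(ROWS, LABELS)
```

The learned `weights` indicate how strongly each feature pushes the predicted rating up or down. A real system would use far more features and articles, and would scale the raw counts before training.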

After the model has been trained, new articles can be shown to it, and it will return an overall quality score. In addition, the features of the article at that time can be computed and compared to the feature combination that, according to the model, results in the highest quality rating. This standard is likely to be accurate because the model can read the entire Wikipedia, whereas no single Wikipedian can. That gives the model the opportunity to learn something about quality that a human being might miss.
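As a rough sketch of this scoring and comparison step, the snippet below scores an article and lists how far each feature is from a target profile. Everything in it is assumed for illustration: the weights, the bias, and the target profile (imagined here as the typical feature values of top-rated articles) are invented numbers, not output from any real model.

```python
import math

# Hypothetical weights for illustration only; a trained model would supply these.
WEIGHTS = {"images": 0.4, "internal_links": 0.05, "avg_sentence_words": -0.1}
BIAS = -1.0

# Hypothetical target profile, e.g. typical feature values of top-rated articles.
TARGET = {"images": 4, "internal_links": 60, "avg_sentence_words": 20}


def quality_score(features):
    """Return a score between 0 and 1 from a weighted sum of the features."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def feature_gaps(features):
    """Return (feature, target minus current) pairs, largest absolute gap first."""
    gaps = {name: TARGET[name] - features[name] for name in TARGET}
    return sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)


article = {"images": 1, "internal_links": 20, "avg_sentence_words": 30}
print(round(quality_score(article), 3))
for name, gap in feature_gaps(article):
    direction = "raise" if gap > 0 else "lower"
    print(f"{direction} {name} by {abs(gap)}")
```

A positive gap means the feature value would need to go up to match the target, and a negative gap means it would need to come down.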

WAO comes into play when human editors look at a specific feature value and adjust the content of the article to raise or lower that value, bringing it in line with the highest possible quality rating. The entire point is to try to "game" the quality system. Because of the large number of features used, because the model has read the entire encyclopedia, and because edits are closely watched by volunteers, it would be nearly impossible to save an edit that brought a feature dimension of the article in line with a higher quality score without also increasing the actual quality of the article. Thus, all attempts to game the system improve the quality of the article.