Research talk:Automated classification of article importance

From Meta, a Wikimedia project coordination wiki

Work log


Importance of Wikidata Items

It seems like the proposed notion of importance of articles to a language version of Wikipedia is restricted to articles that already exist in that language. But for smaller Wikipedias, there may be many important articles that have not yet been created. It would be really valuable to have a notion of importance for any entity in Wikidata that has an article in at least one language. In particular, it would be great to have a global, language independent list of Wikidata items ranked by importance as well as a separate importance ranking for each language. These rankings could be used to not only prioritize work on existing articles, but also prioritize work on creating new articles and filling knowledge gaps. Ewulczyn (WMF) (talk) 00:53, 2 February 2017 (UTC)

Hi Ewulczyn (WMF), thank you for the insightful comments! You are right that the current proposal aims to determine importance of already existing articles. I agree that being able to determine importance for articles that do not exist, or for any Wikidata entity for that matter, would be valuable. Then some of the other notions of importance could potentially be a filter on top of that calculation (e.g. "any Wikidata entity with >= 1 article in >= 1 language"). However, I do not think we know much about how importance works across different scopes. Some of it is found in the discussions around List of articles every Wikipedia should have, where contributors argue about whether the list reflects a global perspective. There are also some research papers that look into cultural differences in importance (e.g. Research:Newsletter/2014/June#"Interactions of cultures and top people of Wikipedia from ranking of 24 language editions" reviews a preprint of a PLOS ONE paper that does this).
In summary, I am wondering if this suggests three potential research topics:
  1. Importance by scope
    • Are there meaningful differences in importance depending on the scope? In other words, if we ask a set of WikiProject members, will they agree with importance ratings that were gathered using an algorithm based on Wikidata? What are their reasons for agreeing/disagreeing?
  2. Importance in the context of Wikidata
    • How can we determine the importance of entities in Wikidata?
    • How does Wikidata importance relate to Wikipedia article importance?
  3. Recommendations for article creation
    • This should specifically target smaller Wikipedias. In other words, we are perhaps interested in determining some sort of base set of articles. This might turn out to be List of articles every Wikipedia should have, or it might turn out to be something different.
    • A key element would be to study what happens when local and global scope collide. In our WikiSym 2012 paper we did a rudimentary investigation of articles in a single language and found that they had a limited scope. Based on the PLOS ONE paper mentioned above, different language editions have slightly different focuses. It would therefore most likely be useful if we had a way of adding a local influence to a global scope, or something along those lines, in order to improve the quality of the recommendations.
I'm not certain to what extent these should be incorporated into the proposal. Pinging User:Halfak (WMF) so he knows about this thread. Thanks again for the comments! Cheers, Nettrom (talk) 19:44, 3 February 2017 (UTC)

Rethinking importance

Thanks for the proposal. In Research:Automated_classification_of_article_importance#Proposed_Project you say: "Training a machine learner based on Wikipedian article importance assessments that can predict article importance both globally (e.g. for a Wikipedia edition as a whole) and locally (e.g. within a specific WikiProject).". What do you mean by "Wikipedian article importance assessments"? Are you referring to predicting standard class types used in some Wikipedia languages, for example, English Wikipedia (see here for a table of classes)?

On a separate note and to document what we have already discussed offline: You have two main paths for tackling the problem of defining what importance should mean. One is the macro-economic path (and I won't be able to comment on that), the other is the more applied path that fields such as CS use. For the latter, you will need to focus on a goal to be able to define importance. For example, if your goal is to find the top n articles a reader should read to learn about topic x, you may need to define importance differently than if your goal is to focus on the task of sorting existing articles that need to be expanded.

And last but not least, feel free to use the features used in section 2.2 of https://arxiv.org/abs/1604.03235 when/if they become relevant to your work. Good luck! :) --LZia (WMF) (talk) 18:36, 6 February 2017 (UTC)

Hi LZia (WMF), thanks for the comments! The "article importance assessments" that I'm referring to are the Low/Mid/High/Top-importance ratings found on article talk pages, yes. I see from the link you provided there are also "Bottom", "NA", and "??/Unknown" ratings. "Bottom" appears to be used sparingly compared to the common ones, and I don't see "NA" and "??" ratings being very useful as they are basically a "non-label".
I agree with your example that importance might be defined differently depending on what scope of articles we're looking at. It reminded me of what I wrote in my response above: we currently do not know much about how importance differs with different scopes. To some extent I think it could also blur the line between importance and relevance. The latter is a topic I have so far only scratched the surface of, but I've read a couple of interesting papers in JASIST (Hjørland, B. "The foundation of the concept of relevance", 2010; and Cole, C. "A theory of information need for information retrieval that connects information to knowledge", 2011). At the moment I am not sure I am able to articulate my thoughts about it well, so I'll keep working on it.
And thanks for mentioning the paper, I'll be sure to revisit it and definitely grab features from it where it would be useful. Cheers, Nettrom (talk) 01:26, 8 February 2017 (UTC)
@Nettrom: I just ran into this while familiarizing myself with the board candidates. ;) Sending it to you in case you haven't seen it. FYI. --LZia (WMF) (talk) 01:50, 13 May 2017 (UTC)
@LZia (WMF): I hadn't seen it before and it looks incredibly useful to this project, thanks so much for letting me know about it! Cheers, Nettrom (talk) 17:01, 15 May 2017 (UTC)

Looking within WikiProjects

I generated some inlink counts for a few articles in WikiProject biology. See https://quarry.wmflabs.org/query/16204

Article             Articles in WikiProject   Links from within WikiProject
Biology             2520                      418
Abdominal_cavity    2520                      4
Abyssal_plain       2520                      7

--Halfak (WMF) (talk) 22:28, 6 February 2017 (UTC)


Oooh! Here's a query that gets inlink counts for all pages linked from entire WikiProject. https://quarry.wmflabs.org/query/16210 --Halfak (WMF) (talk) 23:36, 6 February 2017 (UTC)
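
As a rough illustration of what the counts above measure, here is a minimal Python sketch that counts, for each page in a WikiProject, how many links it receives from other pages in the same project. It is not the Quarry query itself: the function name `inlinks_within_project`, the (source, target) pair format, and the toy data are all assumptions made for illustration, as is the choice to ignore self-links.

```python
from collections import Counter


def inlinks_within_project(links, project_pages):
    """Count links each project page receives from other project pages.

    `links` is an iterable of (source, target) title pairs; this input
    format is an assumption, not the schema used by the Quarry queries.
    """
    members = set(project_pages)
    # Start every member at zero so pages without inlinks still appear.
    counts = Counter({page: 0 for page in members})
    for source, target in links:
        # Only count links where both ends are in the project;
        # self-links are skipped (an assumption of this sketch).
        if source in members and target in members and source != target:
            counts[target] += 1
    return dict(counts)


# Toy data, invented for illustration only.
project_pages = ["Biology", "Abdominal_cavity", "Abyssal_plain"]
links = [
    ("Abdominal_cavity", "Biology"),
    ("Abyssal_plain", "Biology"),
    ("Biology", "Abyssal_plain"),
    ("Physics", "Biology"),  # source is outside the project, so it is ignored
]

print(inlinks_within_project(links, project_pages))
```

Dividing each count by the number of articles in the project would give a within-project share that could be compared across projects of different sizes.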

definition of importance

Reading about this project it occurs to me that a very important distinction is being overlooked. Originally article importance was conceived as an indigenous metric --- how important is a given article to the self-defined community of editors. Many of the proposed approaches subvert this by using external inputs (from other cultures, world-views, languages) to define importance. If we assume that wikis are going to continue to be self-governing self-organising communities, imposition of a foreign definition of importance may not be well received, particularly if it subsumes an existing indigenous metric (even if that metric is poorly used). Stuartyeates (talk) 20:25, 8 February 2017 (UTC)

Hi Stuartyeates, thank you for reading and commenting on the proposal! I am not sure I completely understand your concern. Historically, Wikipedia contributors have used several external inputs to determine importance, e.g. the usage of the content of the 1911 Encyclopædia Britannica is an argument for anything in that encyclopedia being important, or the usage of external sources (e.g. Google) to determine notability. Sometimes inputs have also come from within the Wikimedia sphere; I know for instance that some contributors looking to translate content into an edition would seek out articles covered by a large number of languages, meaning that "number of language editions with this article" would be a measurement of importance. Lastly, there's the usage of Rambot and Lsjbot to create a large number of articles based on external databases, thereby arguing that something found in those databases is important.
That being said, I am a fan of human-in-the-loop tools, and know from my work with SuggestBot and the discussion in the Signpost regarding our research paper on the misalignment between popularity and quality that contributors' interest in external input on what is important varies greatly. That is of course something I'll keep in mind as we move along and build tools around this. Cheers, Nettrom (talk) 23:21, 9 February 2017 (UTC)
en.wiki editors reaching a consensus to use 1911 Britannica is fine, because en.wiki editors made a decision about en.wiki and collectively acted on it. A framework from wmf or based on a large group of foreign wikis is completely different and is likely to be viewed with scepticism by some wikis, particularly those with cultures struggling with the results of colonialism. Stuartyeates (talk) 09:18, 10 February 2017 (UTC)

Past discussion about Importance modeling

See en:User_talk:Jimbo_Wales/Archive_208#Specific_proposals. Ping EllenCT! We're working on this :D --Halfak (WMF) (talk) 18:41, 16 February 2017 (UTC)

Excellent! Thank you very much. EllenCT (talk) 20:48, 17 February 2017 (UTC)

Amazing medical classifications

What you did at Wikipedia_talk:WikiProject_Medicine/Assessment#WikiProject_Medicine_and_importance_ratings.

Based on your post there, I do not know how to get more involved. For example, you have some items which you ranked, and I think if invited and if you had a way to get feedback people would confirm or disagree with your system's choices. What kind of community response do you want from this? Blue Rasberry (talk) 22:24, 23 March 2017 (UTC)

I believe that link should be en:Wikipedia_talk:WikiProject_Medicine/Assessment#WikiProject_Medicine_and_importance_ratings Stuartyeates (talk) 09:36, 26 March 2017 (UTC)
Hi Blue Rasberry, thanks so much for your very useful feedback in the thread you refer to, and for getting in touch! I'm happy to hear you like our work so far!
We're currently in the initial stages of this work, where I'm mainly focusing on figuring out where the large gains can be made. The fact that WP:MED rates certain kinds of articles as Low-importance is an example of that; I expect our performance to be much better once I figure out how to add that kind of information to the model. At that stage a round of feedback on the predictions from a group of WP:MED members should be very useful as it can provide us with knowledge of where the model's predictions might be way off (those are important test cases), or where the existing rating of an article needs to be adjusted (which can lead to an improved training set). Right now it's difficult for me to estimate how quickly I'll have an improved model ready, but I can get in touch with you when I do. Thanks again for your great comments and interest, much appreciated! Cheers, Nettrom (talk) 17:03, 27 March 2017 (UTC)

Evaluation metric

Very interesting research. I do however have one thing I'm wondering about. The performance seems to be measured solely in accuracy. As there is an order in the labels, some mistakes are however more wrong than others. Say we have an article X which has the label Top. If the algorithm classifies this as High (distance 1), that is much better than when it gets classified as Low (distance 3). Maybe providing the full confusion matrix, or looking at whether w:RMSE can be used to evaluate (the unknown class might be an issue), would be an idea. Basvb (talk) 05:31, 19 April 2017 (UTC)

Hi Basvb, thanks for getting in touch about this! You are correct that focusing solely on accuracy can lead to problems with misclassifications as you describe, and this would be a particular problem if the classes are weighted differently (e.g. if we decide that getting Top-importance articles correct is more important than getting Low-importance articles correct). In the work that I have done so far, I focused on accuracy firstly because we were looking to understand if we could make these kinds of predictions, and secondly because we were trying to make large gains. Once we found a model that was performing reasonably well, adding more predictors to it tended to result in small changes, and that's why I also started focusing on parts of the confusion matrix, to catch those types of errors that you describe. At the same time, it seems to me that as our models got more accurate, they also tended not to make large errors.
Nonetheless, it's something I'll keep in mind as we work with this, and I'll do my best to make sure we report useful measures as we continue this work. Thanks again! Cheers, Nettrom (talk) 20:23, 20 April 2017 (UTC)
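
To make the point in this thread concrete, here is a minimal Python sketch comparing accuracy with an ordinal RMSE and a confusion matrix. It is not the evaluation code used in the project: the function names and toy labels are invented for illustration, the Low/Mid/High/Top levels are assumed to be equally spaced (0 to 3), and unknown or unrated articles are assumed to be excluded beforehand.

```python
import math

# Assumption: the four importance levels are equally spaced on an ordinal scale.
LEVELS = ["Low", "Mid", "High", "Top"]
RANK = {label: i for i, label in enumerate(LEVELS)}


def confusion_matrix(actual, predicted):
    """Rows are actual labels, columns are predicted labels, in LEVELS order."""
    matrix = [[0] * len(LEVELS) for _ in LEVELS]
    for a, p in zip(actual, predicted):
        matrix[RANK[a]][RANK[p]] += 1
    return matrix


def accuracy(actual, predicted):
    """Share of predictions that match the actual label exactly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def ordinal_rmse(actual, predicted):
    """Root mean squared error over the rank distance between labels."""
    squared = [(RANK[a] - RANK[p]) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy labels, invented for illustration only.
actual = ["Top", "Top", "Low", "Mid"]
near_miss = ["High", "Top", "Low", "Mid"]  # one Top article predicted as High
far_miss = ["Low", "Top", "Low", "Mid"]    # one Top article predicted as Low

for name, predicted in [("near miss", near_miss), ("far miss", far_miss)]:
    print(name, accuracy(actual, predicted), ordinal_rmse(actual, predicted))
```

Both prediction sets have the same accuracy, but the ordinal RMSE is larger when the Top article is predicted as Low, which is the distinction accuracy alone cannot show.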
Hi - I'm finally getting round to responding to your earlier post, having confused it initially with another project! I'm glad you're working on this. You're doing a more systematic study of what we did on the English Wikipedia when we put together our metric for w:WP:1.0; you can see my talk at the 2009 Wikimania explaining this (I can send you the Powerpoint if you wish). The main difference seems to be that we used three machine-based measures; we used the page views and the no. of links in, but we also used the no. of language versions of an article as well. This last input helps to give a more global view of importance; for example, you might find in the English language version that an American baseball player is ranked much higher than a French soccer player, because many readers are American. When you include all the different language versions, you may find that the baseball player is only in 3 or 4 other languages, but perhaps the soccer player is in 30 or 40, and we took this to indicate that the soccer player has a higher global reach. I found it especially interesting to use this to judge pop singers or actors, or sports teams, where I was often unaware of how global some were while others were really just known in one or two countries. All three variables can have their flaws (for example, a top-ten article on en:WP based on no. of page-views used to be "List of Gay Porn Stars"!) but put together you get a fair picture. The three machine-based measures were then combined into one number, which we referred to as an "External Interest Score":
External interest points = 50 * log10(hitcount) + 100 * log10(internal links) + 250 * log10(interwiki links)
This was then combined with a WikiProject score in order to get overall importance. WikiProjects vary tremendously in scope (I'm fond of using the examples of w:WikiProject Music vs w:WikiProject:The KLF; a KLF-related article might be classed as "Top" in the latter but as "Mid" or even "Low" in the former - quite legitimately). We therefore adjusted the importance score for WikiProject scope, to give a much more balanced points score for these WikiProject-based importance ratings. We liked to include the human-based ratings, because they act as a reality check on things that might get data-based errors for one reason or another. These methods were used in putting together w:Wikipedia:Version_0.7 and w:Wikipedia:Version_0.8.
Our goal at the 1.0 project is to assess importance so we can select relevant articles for offline release. We're actually working on new selections right now (though our page doesn't really show that well), so your work is of great interest to us.
I'd love to chat with you on Skype or something similar. Have you compared your importance ratings with the WP1.0 importance scores? I'd love to see how these compare, and to see if your scoring produces a similar ordered list to ours. We are using these importance scores to put together the new collections, which will have monthly releases. If your system allows us to make importance more accurate, then we'd like to learn and adjust accordingly. Please let me know.
Also, are you attending Wikimania in Montreal? I'd love to talk with you face-to-face if you're going to be there. Even better - if you'd like to attend our post-Wikimania hackathon in Potsdam, New York (at my college), we have the Kiwix people, James Heilman and many other familiar faces attending. We could probably have a separate track to look at measurements of article importance & quality if you were to come. Again, let me know, and keep up the good work! Walkerma (talk) 04:41, 30 May 2017 (UTC)
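
For readers who want to experiment with the "External Interest Score" formula quoted in the comment above, here is a minimal Python sketch of it. Only the weights (50, 100, 250) and the log10 terms come from that comment; the function name, the example numbers, and the handling of zero counts are assumptions, and the WikiProject-scope adjustment described above is not included because its details are not given here.

```python
import math


def external_interest_points(hitcount, internal_links, interwiki_links):
    """External interest points as quoted in the thread above.

    All three inputs must be positive: log10 is undefined at zero, and the
    thread does not say how zero counts were handled, so math.log10 is
    simply allowed to raise ValueError in that case.
    """
    return (
        50 * math.log10(hitcount)
        + 100 * math.log10(internal_links)
        + 250 * math.log10(interwiki_links)
    )


# Hypothetical numbers, invented to mirror the baseball/soccer example:
# the baseball player has more page views but far fewer language versions.
baseball_player = external_interest_points(
    hitcount=100_000, internal_links=100, interwiki_links=4
)
soccer_player = external_interest_points(
    hitcount=10_000, internal_links=100, interwiki_links=40
)

print(round(baseball_player, 1), round(soccer_player, 1))
```

With these made-up inputs the soccer player scores higher, because the interwiki term carries the largest weight in the formula.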