Role of the WikiProjects?
What is the role of the WikiProjects in this research? If I follow what you're doing, WikiProject articles are being used because they provide a convenient way to get a corpus of articles that we know are all about the same broad subject. The corpus gets analyzed, and the system learns to predict whether a given article is about that subject. Is that about right, or did I miss the point?
If that's right, I assume we wouldn't always be limited to the list of subjects for which WikiProjects exist or have existed? E.g., if we identify important subjects for which a WikiProject has never existed, there are, I trust, other ways to derive a corpus for analysis?
- You're correct that the current approach can only classify according to existing WikiProjects, and also correct that we could build a new corpus for other purposes in the future. To build a new corpus, however, we would need to define the categories and then hand-code a large number (c. 100k) of drafts according to those categories, which requires a lot of volunteer work. There might be a clever way to reduce that effort, of course. The topics we choose should align with the purpose we intend to use the new model for. Also note that we're able to build these topic-detecting models for a single topic or a small number of topics, for example if we're looking for gaps. Adamw (talk) 23:23, 1 May 2018 (UTC)
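To make the single-topic case concrete, here is a minimal sketch of a topic detector trained on labeled text. It is not the model used in this research: it assumes a plain naive Bayes bag-of-words classifier, and the six example sentences, the astronomy topic, and the function names `tokenize`, `train`, and `predict` are all invented for illustration. In practice the labels would come from WikiProject membership or from hand-coding.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real system would do much more.
    return text.lower().split()


def train(labeled_docs):
    """Count words per label. labeled_docs is a list of (text, in_topic) pairs.

    Assumes both labels (True and False) occur at least once.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, in_topic in labeled_docs:
        doc_counts[in_topic] += 1
        word_counts[in_topic].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return True if the text looks more in-topic than out-of-topic."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        # Log prior for the label.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed log likelihood.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]


# Hypothetical hand-labeled corpus: True means "in the astronomy topic".
corpus = [
    ("the telescope observed a distant galaxy and its stars", True),
    ("astronomers measured the orbit of the planet", True),
    ("the comet passed close to the sun", True),
    ("the striker scored a goal in the final match", False),
    ("the band released a new album and toured", False),
    ("the recipe calls for flour butter and sugar", False),
]

model = train(corpus)
print(predict(model, "a new planet orbiting distant stars"))
print(predict(model, "the band scored a goal"))
```

The same structure extends to a handful of topics by training one such detector per topic. The costly part remains the labeling, as noted above, not the training.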