Today Wikipedia contains more than 40 million articles, is actively edited in about 160 languages, and its article pages are viewed 6000 times per second. Wikipedia is the platform for both access and sharing encyclopedic knowledge. Despite its massive success, the encyclopedia is incomplete and its current coverage is skewed. To address these gaps of knowledge and coverage imbalances (and perhaps for other reasons), Wikipedia needs to not only maintain its current editors but also help new editors join the project. Onboarding new editors, however, requires breaking down tasks for them (either by machines or humans who help such editors learn to become prolific Wikipedia editors).
Up until recently, the task of breaking down Wikipedia contributions and creating a template for contributions has been done for the most part manually. Some editathon organizers report that they do template extractions to help new editors learn, for example, what the structure of a biography on Wikipedia looks like. Sometimes, extracting templates manually means going across languages, especially in cases where the language the editors being onboarded in is a small Wikipedia language (in terms of the number of articles already available in that language). Manual extraction of templates is time consuming, tedious, and a process that can be heavily automatized, to help reduce the workload for editathon organizers and help more newcomers to be onboarded.
This research aims at designing systems that can help Wikipedia newcomers identify missing content from already existing Wikipedia articles and gain insights on basic facts or statistics about such missing components, using both the inter-language and intra-language information already available in Wikipedia about the structure of the articles.