Grants talk:Project/Frimelle and Hadyelsahar/Scribe: Supporting Under-resourced Wikipedia Editors in Creating New Articles

From Meta, a Wikimedia project coordination wiki

Name

Internet Archive table-top Scribe

I suggest choosing another name. In Wikimedia and Wikisource circles, "Scribe" is usually the Internet Archive scanner. --Nemo 13:34, 26 November 2018 (UTC)

That is an excellent point. We are open to changing the tool's name. Do you have suggestions? --Frimelle (talk) 15:43, 26 November 2018 (UTC)
How about "Itemizer" since it starts with an item and attempts to expand it using structured rules, the way you include anything on an itemized list. Jane023 (talk) 16:44, 26 November 2018 (UTC)
I currently can't propose a name because I didn't fully grasp what's the point of the proposed software. More on this below. Nemo 08:43, 28 November 2018 (UTC)
Thanks all for your suggestions. We think that Itemizer does not capture entirely the scope of the project (as we add contextual information). We suggest keeping the name in the proposal for the time being as we have sent notifications to community members with this name. With the start of the project, we will create a poll about the name, collecting larger scale suggestions and deciding on those.--Hadyelsahar (talk) 11:51, 29 November 2018 (UTC)

Not just for small wikis

I endorse this project, but I have some reservations about the presentation of the project. I agree this is important work that should be done and it would be useful for any Wikipedia, not just small ones. For this reason I object to the lead image c:File:Wikipedia articles and editors distribution over languages.png being used to illustrate this, since it doesn't really say anything relevant. Yes we have lots of small languages with small numbers of articles, but don't forget the Encyclopedia Britannica in 2010 was also just 65,000 articles. Size matters, but usability is much more important, and the usability of the current input interface for article creation is what is mostly involved here, not size. If anything a much more relevant image would be a comparison of article growth for those smaller Wikipedias which adopted the Wikidata infoboxes. As a Wikimedian experienced in article & item creation I am curious whether the old opinion about infoboxes being scary for first-time users is true or not. Comparing English Wikipedia to any small Wikipedia is just comparing apples with oranges. Jane023 (talk) 16:40, 26 November 2018 (UTC)

I agree. If developed as a gadget to expand existing articles (see #Difference from article placeholder), this is actually more likely to be useful on larger projects, where more potential users exist. When I edit in English or Italian I often scan the corresponding articles and references in the French/German/Spanish Wikipedias; an automated nudge and summary in that sense can be useful for many. Nemo 11:02, 28 November 2018 (UTC)
We decided to focus on underserved communities, as they face different challenges than the large ones (e.g. the lack of editors). We believe focusing our attention and resources on those communities can be beneficial for all Wikipedians, and can lead at a later stage to deployment on different language Wikipedias. At this stage, however, we want to tailor it to the needs of those editors, who are often overlooked because they are fewer in number. Knowledge gaps are a well-identified problem in Wikipedia and a high-priority problem also addressed by the Wikimedia Foundation annual plan [1]. Accordingly, the top image displays this maldistribution between different language Wikipedias. We agree with you that it is quality, not quantity, that matters; our tool encourages the creation of high-quality articles by providing users with references. --Frimelle (talk) 12:39, 29 November 2018 (UTC)

Comments on community needs

Hi! Thank you all for this submission. I'd like to explain our local way of working on Arabic Wikipedia about creating bot articles. That will probably help you to identify community needs.

- Bot articles need community approval (on the technical village pump); therefore, offering that tool to regular editors will not be allowed for us.

- Bot-generated articles are reviewed in 3 steps: structure (we search for the best structure to extract maximum data from Wikidata without making those articles harder to read, and we identify the minimum data needed from Wikidata to select potential articles), test (we create some articles by bot in a sandbox and try to enhance the structure with further comments) and finally review of the bot-generated articles, created mainly by @جار الله:.

Structured articles already created on our Wiki:

  • We are creating now articles about actors (around 30000 should be created)

Take a look at those examples: 1, 2, 3. Lists of films and series are generated from Wikidata if there is a translation in Arabic (see below how), Infobox with references comes from Wikidata and external links are brought from the famous External links module. The first sentence has references also taken from English or French Wikipedia.

  • Same as above but for years by country: we identified the paragraphs that we wanted to include (events, films, books, politics, sports, births, deaths, ..) and the paragraph length (e.g. how many births per month do we need to include, and how do we select the famous ones? In the end we based our selection on the number of interwikis from Wikidata)

Examples: 1975 in USA, 1975 in Tunisia, 1975 in Germany, in Japan, ..

  • Years BC have been created by bots using Wikidata: 187 BC, 787 BC, ..
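
The selection rule mentioned above for year-by-country articles (keep only the best-known births per month, ranked by number of interwikis) could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function name, the tuple layout and the sample data are all hypothetical, and the interwiki count is assumed to be already fetched from Wikidata.

```python
from collections import defaultdict


def pick_notable_births(people, per_month=3):
    """Keep the `per_month` people with the most interwikis for each month.

    people: iterable of (name, birth_month, interwiki_count) tuples.
    This layout is an assumption made for the sketch.
    """
    by_month = defaultdict(list)
    for name, month, interwikis in people:
        by_month[month].append((interwikis, name))

    selected = {}
    for month in sorted(by_month):
        # Highest interwiki count first; ties are broken alphabetically.
        ranked = sorted(by_month[month], key=lambda e: (-e[0], e[1]))
        selected[month] = [name for _, name in ranked[:per_month]]
    return selected


# Hypothetical sample data, not taken from any real article.
sample = [("A", 1, 10), ("B", 1, 50), ("C", 1, 5), ("D", 2, 7)]
print(pick_notable_births(sample, per_month=2))
```

The threshold per month stays a parameter because, as described above, the right paragraph length was something the community had to settle during review.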

Our main problem now is the absence of a transliteration tool. Wikis based on non-Latin scripts need an open transliteration tool. Soccer player articles have been created because we found an external website to translate names; same thing for artists. We imported those names into Wikidata labels before creating articles. But what about obscure species of algae? Or villages in Samoa? Arabic content online is not available and a transliteration is needed (from Latin for species, and from local or official languages for human settlements). I found some tools online (Google, Microsoft, Drupal) but I didn't test them (I'm not a programmer). --Helmoony (talk) 17:05, 26 November 2018 (UTC)

Wow fascinating summary - thanks for posting! I totally agree about this transliteration tool - has it been proposed in the Wishlist survey? Good point about the "year articles". Someone mentioned recently that we need to create a set of properties to better describe these types of articles that are more Wikipedia roundups than coming from external sources. Same thing about technical articles like categories and templates, but lists are a different thing and often have their own sets of rules (what makes the list? what gets dropped?). Bot generation is of course very controversial, but I don't think that is what is going on here. I think this is more of a two-step process where the bot-generated content is then offered for human review and easier copy-editing. Jane023 (talk) 09:37, 27 November 2018 (UTC)
Thanks a lot for your summary of the very interesting work on Arabic Wikipedia! We will definitely get in touch with you during our planned interviews in the early phases of the project to get in-depth details about the process of article creation within each targeted community and what techniques could be useful to adopt while creating this tool. While we will certainly look into the bot you mention, as it does something very related to our work in terms of gathering information from Wikidata, our approach differs in two important points: (1) our tool does not create articles, as bots do, but supports editors in the creation of high-quality articles (imagine the difference between bots and the content translation tool), i.e. the tool will not display any information on Wikipedia without an editor having worked on it; (2) we also gather information from online resources in the target language, which will help editors to create more citations, a point often criticised as missing from underserved Wikipedias’ editors. So overall, it is exactly the idea of copy-editing generated content, as Jane023 clarified in the comment above, which will lead to high quality in the content and presentation of each article created through the tool.
Regarding the topic of a tool for transliteration: We believe this is an excellent idea; we have previously discussed it and identified it as a missing piece, too. It is not in the scope of this project, given that this problem is an open research question by itself and very language-dependent. But it is a topic we will be interested in working on in the future in a different context. --Frimelle (talk) 13:57, 29 November 2018 (UTC)

Feedback from Harej

Thank you for submitting this proposal. It is a very impressive proposal and I am excited at the opportunities that are here.

  • I am concerned about the plan to implement Scribe as a gadget. The gadgets infrastructure isn't really designed for gadgets that run on multiple wikis. I think it's possible but basically you have to make sure that there is one wiki that hosts the code (with the other wikis loading that code) so that you don't end up with multiple, slightly different copies of the codebase – no one wants that outcome. The gadget will also need to support internationalization, and I am not sure there is a standard approach for doing so within gadgets. (I think it's been done before but I don't know the details or if the implementation details make sense.) Of course, if you already figured out how to have a translatable gadget run on multiple wikis without causing problems, great!
  • I really like the idea of going beyond just basic facts from Wikidata and providing deeper contextual information that can be used in writing articles. However I am wondering how exactly it will work in practice. One of the problems you highlight is that there's a shortage of information online in languages such as Arabic and Hindi. And if the Arabic-speaking Internet and Hindi-speaking Internet are anything like the English-speaking Internet, you need to be very careful in what you accept as a source. So I am wondering if you have certain repositories you will be focusing on for this feature.
  • What will the volunteer developers be responsible for? My concern is that volunteers and paid staff operate with different motives and incentives, and I wouldn't want to see the project fall behind because of unavailability of volunteers or volunteers reneging/falling behind on commitments.

Cheers, Harej (WMF) (talk) 00:24, 27 November 2018 (UTC)

Thanks a lot for your feedback and for raising such interesting points, we will reply below to each of them in the same order:
  • We decided on a gadget because, e.g., ProveIt, which has a similar base idea to our project (in a very simplified way), also uses the gadget infrastructure across multiple language Wikipedias. We are aware of the issues that come with replicating code between multiple Wikipedias and will look into a good solution for this for our tool. However, we are not settled on the gadget either, if another way of delivering our tool is a better fit. The gadget will, for now, let us interact with the editors and quickly serve them a tool as the result of the underlying work. Should this prove unsustainable, we will change it to, e.g., an extension in the course of the project.
    • The problem of internationalization is important to address. We will make sure to use existing tools and infrastructure, but we do not have an out-of-the-box solution for this yet.
  • The question of trustworthiness of sources is another important point. We thought of building a repository of sources we consider acceptable, by collecting sources used before on this (or another language's) Wikipedia or by calculating trustworthiness through redundancy of the information. But as we will not display the information right away but take a human-in-the-loop approach, we have someone judging the references. While some sources will be blacklisted anyway, the remaining references might be displayed to the editors and filtered by them. Which approach gives the best results is something we will have to explore as part of our studies.
  • When we find someone interested in working with us, we will decide with them based on their interest and previous contribution what they will work on. But of course, we will make sure that the project does not depend on them solely, as they might become unavailable or have other commitments. We want to emphasize including volunteers and having an outreach for them to ensure that the code will follow the standards and therefore is maintainable after the project is finished.
--Hadyelsahar (talk) 15:02, 3 December 2018 (UTC)
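
To make the source-filtering idea in the reply above more concrete, here is a minimal sketch, assuming a blacklist, a set of domains already cited on the wiki, and a simple redundancy count (how many distinct domains state the same claim). It is not the project's implementation: all names, the `min_support` threshold and the example domains are hypothetical, and anything not auto-accepted is simply left for the editor to judge, in line with the human-in-the-loop approach.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical example entry; no real blacklist is given in the discussion.
BLACKLIST = {"spam.example"}


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def rank_references(candidates, previously_cited, min_support=2):
    """Split candidate references into (accepted, needs_review).

    candidates: list of (url, claim) pairs found for a topic.
    previously_cited: domains already used as sources on the wiki.
    min_support: assumed threshold of distinct domains stating a claim
        before an unknown source is treated as corroborated.
    """
    # Blacklisted sources are dropped before anything else.
    usable = [(u, c) for u, c in candidates if domain_of(u) not in BLACKLIST]

    # Redundancy: count distinct domains that support each claim.
    support = Counter()
    for _dom, claim in {(domain_of(u), c) for u, c in usable}:
        support[claim] += 1

    accepted, needs_review = [], []
    for url, claim in usable:
        if domain_of(url) in previously_cited or support[claim] >= min_support:
            accepted.append(url)
        else:
            needs_review.append(url)  # shown to the editor for judgement
    return accepted, needs_review
```

Which signal should carry more weight (prior use on the wiki versus redundancy) is exactly the kind of question the reply leaves open for the planned studies.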

Difference from article placeholder

Speaking of why make it a gadget, it would be helpful if the proposal explained why this cannot be done as part of the article placeholder extension.

The whole point of the article placeholders, when the idea started, was filling red links with Wikidata. Automatic generation of prose from Wikidata information, à la Reasonator, was the biggest challenge for the article placeholder extension (and was for now sidestepped by opting for a tabular presentation). It was supposed to help users convert the article placeholder into an actual article with less effort than starting from a blank page.

On a first read, this project proposes an add-on to the initial article placeholder idea, in short (I simplify) that we can also automatically suggest a TOC and a references section in addition to the lead section content and infobox data. I would expect such a feature to be easiest to "sell" as part of the seed text created by ArticlePlaceholder when creating a new article (otherwise I'm still basically starting from zero text written). The only way I can see this being useful as a gadget is if you use it to suggest expansion of existing articles, "hey this article doesn't deal with topic X which seems important, do you want to check these sources X Y Z?". But then again that would be most useful as a call to action to unregistered users or users who would not otherwise be contributing to the article, so I can rather imagine it being activated via an extension on all articles marked as stub in a certain or something like that. Nemo 09:06, 28 November 2018 (UTC)

I have been very involved in the development of the ArticlePlaceholder project, and am therefore happy to hear about it, too. In this project, however, we particularly want to address editors, to support them in their editing experience. We do not create articles and do outreach in the form of increasing article size by automatic creation. The idea of suggesting new content based on an existing article is similar to the work of [Leila's paper]. The authors don't include references but the idea is very similar. But we want to focus on the content gap, i.e. missing articles as a whole, which an editor wants to create, supported by a tool that helps them with the structure and references. --Frimelle (talk) 12:39, 3 December 2018 (UTC)

More on reference digestion

A large part of this idea seems to be about machine learning on the references, of the kind initially tried by wikidata:Wikidata:Primary sources tool and more recently by Quicksilver. In this field, important projects are http://gettheresearch.org/ (planned) and https://www.semanticscholar.org/ (already useful). Nemo 09:06, 28 November 2018 (UTC)

Yes, indeed part of our project is collecting references and presenting their key points to editors. Many of the pointers you mentioned are closely related in an aspect or two. Unlike the Primary sources tool, we deal with information that is more contextual than information which can only be displayed on Wikidata. The type of references offered to editors will cover a wide range of documents, not only scientific publications, hence the difference from Semantic Scholar and Get the Research. Finally, and most importantly, we focus on helping editors in under-resourced language communities in Wikipedia [see https://meta.wikimedia.org/wiki/Grants_talk:Project/Scribe:_Supporting_Under-resourced_Wikipedia_Editors_in_Creating_New_Articles#Not_just_for_small_wikis]. This makes our use case different from Quicksilver, although it is very possible that much of the underlying research in both projects will be shared. --Hadyelsahar (talk) 15:21, 3 December 2018 (UTC)

Handling cultural and linguistic differences

It would be interesting to know how you propose to handle suggestions that are culture-specific. In your example, the Arabic Wikipedia, it's easy to imagine some content and references being very politically sensitive. The same for Ukrainian or any language and topic where there is a vast body of literature on the "same things" which are however perceived as very different in different languages or places. Nemo 09:06, 28 November 2018 (UTC)

In an encyclopedia such as Wikipedia, the content is written from a neutral point of view, which means collecting information from different (trustworthy) primary and secondary sources; see also verifiability. We only support the editors in collecting those sources; we do not push the topics in either direction. We believe that it is up to every Wikipedia community to decide how to work with those difficult topics and we do not want to interrupt this decision process. Therefore, our references are only suggestions; the editor can decide what information they want to include or what topic needs more research. We do not create a comprehensive article for the editors. Further, I do not support the assumption that sources from different languages per se have biases in topics. If a source is trustworthy, it will cover the topic thoroughly, independent of the original language. --Frimelle (talk) 12:39, 3 December 2018 (UTC)

Translation tool vs. approach

The community has built a set of tools to facilitate article creation, such as the Content Translation Tool. This tool enables editors to easily translate an existing article from one language to another.

What are the other tools? I was expecting a wider analysis rather than just one tool/approach.

The articles that can be translated are selected by their importance to the source Wikipedia community. Topics with significance to the target community do not necessarily have an equivalent in the source Wikipedia. In those cases, there is no start point using the content translation tool. It has been shown, that the English Wikipedia is not the superset of all Wikipedias, and the overlap of content between the different languages is relatively small, indicating cultural differences in the content. Editors should be encouraged to avoid a topical bias and cover articles important to their communities.

Editors can translate from any language (they know), not just from English. If there isn't a source article in any language, then obviously it cannot be translated. It's not a limitation of the Content Translation tool, but a limitation of the whole translation approach.

Especially for underserved languages, the machine translation is limited. There are few documents available online aligned with English and even less with other languages, that the translation engine can be trained on. This leads to often criticised quality.

Not all translation engines need a corpus. Some small languages have good hand-built engines, although that is an exception and not the norm.

Monolingual speakers are disadvantaged. They cannot verify the translation in the context of the source language or the references used.

This again is a limitation by the approach, not by the tool. The section should be renamed to highlight that you are contrasting translation against structured/guided article creation. --Nikerabbit (talk) 11:18, 28 November 2018 (UTC)

Considering other tools and comparing to them: We focus our comparison on the content translation tool, as this is the closest one that focuses on (1) editors and (2) underserved language editors. We are aware of a range of tools/techniques that support editors, e.g.: Bots, Importing infobox from Wikidata, Article placeholders, Gap finder, External tools (Google Translate, Quicksilver), and the Translate extension. Thus the only possible comparison will be a use-case comparison, since we don’t replace any of the existing tools but rather fill the gaps not covered by them. --Hadyelsahar (talk) 14:44, 3 December 2018 (UTC)
Considering the limitations of the content translation tool and the translation approach: By mentioning the content translation tool in our proposal, we mean the translation approach in general. Since the content translation tool is the tool most used for content translation in Wikipedia by the underserved communities, we thought to highlight it as an example of the translation approach. We believe that the human-aided summarization approach in our proposal can fill in the gaps not filled by the translation approach. --Hadyelsahar (talk) 14:44, 3 December 2018 (UTC)

Eligibility confirmed, round 2 2018

This Project Grants proposal is under review!

We've confirmed your proposal is eligible for round 2 2018 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through January 2, 2019.

The Project Grant committee's formal review for round 2 2018 will occur January 3-January 28, 2019. Grantees will be announced March 1, 2019. See the schedule for more details.

Questions? Contact us.

--I JethroBT (WMF) (talk) 03:16, 8 December 2018 (UTC)

Questions and concerns

Thank you for this interesting proposal. However I have a number of questions/concerns:

  1. As written, the project aims at the creation of a tool with rather fabulous capabilities: it should automatically suggest the topic, structure and references as well as the key points for all sections. Are the goals of this project realistic? Basically your tool, if created, will have AI-like capabilities.
  2. What are the computational requirements for this tool? Can it plausibly be run within an internet browser itself running on a relatively slow computer (which is common in under-resourced countries)?
  3. It is unclear where the references will come from. You mention Wikidata but also some unspecified external online sources. So, are you going to somehow look for sources on the internet? And how will this be accomplished?

Ruslik (talk) 18:31, 11 December 2018 (UTC)

  1. Indeed, the capabilities are quite impactful. We have experience working in those fields as well as in integrating a tool into Wikipedia, therefore we are sure we can tackle the problems in the time stated. We have worked on similar projects in the past, combining research ideas with real-life problems, and the support of a developer will make sure it will be implemented in a timely manner. Each of the problems represents a well-researched area. We aim to extend those areas with a focus on low-resource languages, but not to reinvent the field. For clarification: While suggesting topics to editors is the natural extension of our work, it is not part of this proposal. Once the tool is implemented, however, it can easily be integrated with previous work on the topic of suggesting articles to create and edit, e.g. the gap finder.
  2. Our computations will be split as follows: 1) Client-side computations (i.e. in the browser) manage lightweight front-end functionality such as generation of wikitext, drag and drop, etc. These should be very lightweight and can be processed by any computer. 2) Service-based computations: the developed gadget will make API calls to a web service hosted on an external server. This is similar to what happens in other gadgets such as ProveIt [2]. The server side will be responsible for tasks such as querying and filtering references, calculating textual similarity between Wikidata entities for section suggestion, and performing extractive summarization. To reduce the online computation load, we intend to do a large amount of caching for potential topics for each target language (existing Wikidata IDs without articles) before the service goes public.
  3. We are collecting references through a search engine. This will be either an open online API or a local index over the Common Crawl web corpus [3]. Based on their rankings, we can discover related documents. We are aware that there are quality criteria for sources in Wikipedia, which we want to follow to ensure the best possible quality. To start, we will use a whitelist of sources that have been used in this Wikipedia before, which the editor can select from. We will extend this work by studying better ways of ensuring the trustworthiness of sources, a topic widely covered in research. --Frimelle (talk) 18:51, 20 December 2018 (UTC)
The project says nothing about server-side computations. You should certainly add information about this to the tool description. Ruslik (talk) 18:12, 16 January 2019 (UTC)
Hello Ruslik! We adapted the description of the tool, please see the new Technical Details section. --Frimelle (talk) 15:28, 23 January 2019 (UTC)
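
The server-side part described in this thread (similarity between Wikidata entities for section suggestion, plus caching ahead of launch) might look roughly like the sketch below. It is an assumption-laden illustration, not the project's design: Jaccard overlap of property values stands in for the unspecified "textual similarity", and the entity IDs, feature sets, section headings and function names are all made up.

```python
from functools import lru_cache

# Hypothetical toy data: IDs mapped to sets of property values, and the
# section headings of similar entities that already have an article.
ENTITY_FEATURES = {
    "Q1": frozenset({"human", "actor", "Egypt"}),
    "Q2": frozenset({"human", "actor", "Tunisia"}),
    "Q3": frozenset({"city", "Japan"}),
}
EXISTING_SECTIONS = {
    "Q2": ["Early life", "Career", "Filmography"],
    "Q3": ["History", "Geography"],
}


def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@lru_cache(maxsize=None)
def suggest_sections(entity_id):
    """Suggest the headings of the most similar entity with an article."""
    target = ENTITY_FEATURES[entity_id]
    best = max(
        (other for other in EXISTING_SECTIONS if other != entity_id),
        key=lambda other: jaccard(target, ENTITY_FEATURES[other]),
    )
    return tuple(EXISTING_SECTIONS[best])


def warm_cache(entity_ids):
    """Precompute suggestions for items without articles before launch."""
    for entity_id in entity_ids:
        suggest_sections(entity_id)


warm_cache(["Q1"])
print(suggest_sections("Q1"))  # served from the cache on this call
```

With this split, the gadget in the browser only has to request the precomputed result and render it, which keeps the client-side load small on slow machines.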

FYI feature request Associate red links on Wikibase client wikis with items in the Wikibase repo

I guess this project will benefit from this feature request T212211 - Salgo60 (talk) 15:08, 21 December 2018 (UTC)

Questions about your budget

I have a couple questions about your proposal's budget Frimelle and Hadyelsahar.

  • Are the budgeted amounts sufficient for the work and time? The amounts seem a bit low, though I am also unclear where this will happen so they may not be.
  • I do not see any costs for office space or technology. How will these be accounted for?
  • I do not see any budget lines related to the editathons. Will there be costs associated with them?

Thank you in advance for your thoughts on these. --- FULBERT (talk) 20:42, 8 January 2019 (UTC)

Hello FULBERT, thanks for your interest!
  1. The research and development will be additional work on top of our current work. The developer’s pay is relatively low, but we gave a realistic estimate for a non-European developer. We believe it makes sense to collaborate with developers in one of the countries we want to support.
  2. Lucie lives and works in the UK and Germany, Hady lives and works in France. We plan to collaborate online, as we have done in previous projects that led to publications, such as the extension of the ArticlePlaceholder [4][5] and TREx for (multilingual) relation extraction. Therefore, we are not in need of office space. We plan to continue our collaboration remotely. That includes the developer.
  3. We have organized meetups before, usually in collaboration with Wikimedia, such as the Wikidata meetup in London last year. We want to organize future events co-located with existing Wikimedia events, such as WikiArabic (talk and discussion already accepted) and Wikimania. We believe that this is the best way to collaborate with the Wikipedia community, whose members are already attending those events, and that it gives us insight into a wide range of international Wikipedians. Further, we have calculated a buffer for possible costs of events. --Frimelle (talk) 22:01, 15 January 2019 (UTC)
Thank you for your replies here concerning your budget Frimelle. --- FULBERT (talk) 17:41, 26 January 2019 (UTC)

Aggregated feedback from the committee for Scribe: Supporting Under-resourced Wikipedia Editors in Creating New Articles

Scoring rubric and scores
(A) Impact potential: 7.4
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Community engagement: 7.2
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
(C) Ability to execute: 6.8
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?
(D) Measures of success: 7.6
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Additional comments from the Committee:
  • This is clearly in line with the strategy, and, if successful, will have a big impact online as well as being sustainable.
  • The project fits with Wikimedia's strategic priorities and has a significant potential for online impact. However its sustainability and scalability is less clear. The server will need expansive maintenance in the future and the gadget will need periodic updates to keep up with the Mediawiki software.
  • The area of impact is important, for this project will not only create a tool to help fill the content gap in underrepresented areas, but also test using the tool with related content to be created in the process itself.
  • The project fits with the strategic direction of Wikimedia, i.e. knowledge equity. Though this project may not have direct potential impact, the end product will be of exceptional help to drive the online content creation. I very much liked the idea. Though I mostly contribute to English Wikipedia, I’ve closely seen problems faced by Wikimedians from relatively small language communities. Major hindrance occurs in the form of volunteer time and access to sources. I understand that the project addresses both these issues. The gadget will help editors to minimize the time they will spend on developing an article, and thereby resulting in the production of more articles within a given time. Such gadgets will also be useful for new editor recruitment activities. It is sustainable and can be scaled in future.
  • The approach is very innovative. However, the risks are very high. Basically what they are proposing is a form of AI, which I am not sure can be successfully implemented within the limited temporal and financial frame of the project. The success can be measured.
  • This is an innovative project in that it will provide new opportunities for involvement and increased usage that will hopefully demonstrate how an identified gap can be filled.
  • The idea itself is innovative. The project participants have done proper research on all the aspects required to complete the project successfully and were successful in identifying all the previous efforts that have been made along these lines. There is always risk involved in software development, but seeing the project plan makes me confident that such risks will be efficiently minimised.
  • I'm somewhat concerned that they may be underestimating the development needs and expense.
  • I am not sure that the project goals can be achieved within the 12-month time frame. The budget may also be insufficient. The participants have skills/experience but it is not clear if they are sufficient.
  • The proposal appears doable within the grant period.
  • The project participants are well positioned to deliver the results. The budget is very much reasonable for the resource they are building. While reading the proposal for the first time, I felt the budget is low for the proposed activities, but seeing the proposer’s response on talk page clarified my doubts about that. The project participants also have experience working on similar projects in the past.
  • They've done a good job on planning engagement.
  • The community engagement is sufficient.
  • There seemed a lot of support amongst those in the community, and as this involves both software creation and its testing, that is an important element of this project.
  • A significant level of community engagement is demonstrated in the pre-planning of their activities. I very much like their approach to engaging with the community, which is very cost-effective. Instead of making an analysis using only the available data, they contacted the Arabic Wikipedia community before making an assessment, and this move is much appreciated. The proposer has also been patient in answering the community’s comments in detail.
  • I do not believe that such fabulous AI-like capabilities can be realized within the requested 12-month time frame and within the requested budget. I may be wrong. So, I am willing to support a smaller research project during which the participants can demonstrate the feasibility of their approach by demonstrating the core functionality without actually creating the gadget and involving real editors and communities.
  • Yes, this seems a worthwhile project that has implications far beyond the scope itself.
  • I would fund this project. Though I am a bit unsure about the efficiency of the end product, it is an idea worth investing in. If something doesn’t work out well, it can always be improved in the future. Since Lydia is serving as an adviser, I am confident that Wikimedia Deutschland would be happy to support this project wherever required. Eventually, this will be an incredible tool for Wikimedians who are under-resourced.

This proposal has been recommended for due diligence review.

The Project Grants Committee has conducted a preliminary assessment of your proposal and recommended it for due diligence review. This means that a majority of the committee reviewers favorably assessed this proposal and have requested further investigation by Wikimedia Foundation staff.


Next steps:

  1. Aggregated comments from the committee are posted above. Note that these comments may vary, or even contradict each other, since they reflect the conclusions of multiple individual committee members who independently reviewed this proposal. We recommend that you review all the feedback and post any responses, clarifications or questions on this talk page.
  2. Following due diligence review, a final funding decision will be announced on March 1st, 2019.

Questions? Contact us.

I JethroBT (WMF) (talk) 16:42, 6 February 2019 (UTC)

First of all, we would like to thank the committee for the time and effort spent giving us feedback on the proposal.
We would like to clarify and emphasize that the promised features in Scribe will not rely on black-box AI approaches, which would be challenging to bring to production-level efficiency with such a small team, let alone to sustain.


To guarantee the execution of our deliverables in time, we are planning to work with algorithms and technologies that are well established and proven easy to implement and to scale. Those algorithms might not be the state of the art according to the latest research papers; however, we believe that they will bring sufficient benefit to the users of our tool, since they have proven industrial success in a variety of tasks such as information retrieval and document summarization. A comparable approach is followed in Strephit, for example, which uses partly similar approaches and could show how they scale to the needs of the Wiki environment.
Investigating a variety of newer and more research-oriented approaches could be an extension of this work, but for the implementation of the tool we will focus on mature technologies.
Considering the sustainability of the Gadget, our approach will follow the standards of existing gadgets in the MediaWiki environment; we therefore believe the code can be updated easily by anyone. Additionally, as stated in the proposal, we aim to involve volunteer developers in our work, so we encourage the usage and development of the tool based on their needs. This will also help us adhere to those standards and guidelines so that the tool integrates seamlessly into the community.
Hadyelsahar and Frimelle (talk) 11:47, 15 February 2019 (UTC)

Round 2 2018 decision[edit]


Congratulations! Your proposal has been selected for a Project Grant.

The committee has recommended this proposal and WMF has approved funding for the full amount of your request, 41,500 EUR / 47,097 USD.

Comments regarding this decision:
The committee is pleased to support the development of this content-editing tool to help editors develop new articles in underserved language communities in our movement, especially in cases where translation is not possible. The project fits well within the strategic priorities of the Wikimedia Foundation related to knowledge equity, and the committee appreciated the applicants’ focus on these communities.


Next steps:

  1. You will be contacted to sign a grant agreement and set up a monthly check-in schedule.
  2. Review the information for grantees.
  3. Use the new buttons on your original proposal to create your project pages.
  4. Start work on your project!

Questions? Contact us.


Alex Wang (WMF) (talk) 17:11, 1 March 2019 (UTC)

Credits on the initial idea?[edit]

@Frimelle: @Hadyelsahar: Congratulations on the implementation of the tool. Please have a look here: Grants:Project/Manos Kefalas/Dynamic article structure assistance, which is a derivative of Grants:Project/Rapid/Article template libraries - user sandbox space organization using article libraries like this. The functional pre-designed article skeletons tool has been presented and spread since WMCON 2018. In 2019, predesigned article skeleton libraries helped disabled secondary education students become article creators in Greek Wikipedia. A "dynamic library" with "a section suggestion and selection mechanism that can be applied to both new and existing articles" will have an amazing impact. ManosHacker talk 22:48, 24 October 2019 (UTC)