What is the problem you're trying to solve?
Wikipedia lacks a central mechanism, tool, or policy for matching contributors with articles that need editing. According to official Wikipedia pageview statistics, total pageviews in March 2016 reached 15,274 million. A wide range of applications run on the Wikimedia API and provide valuable information about wiki data, but few of them focus on the user as contributor.
What is your solution?
Rather than simply developing an application that highlights Wikipedia pages with cleanup issues, our goal is to attract users who may be interested in editing specific pages and to match them intelligently with the pages that best fit their skills and/or interests.
Briefly, for our purpose, we need to implement the following steps:
- Create a new button next to the Wikipedia log-in button that asks the user to create a skills profile. Clicking the new button will take the user to the page of our application, where he/she will be prompted to create a profile. The profile will be created either by filling in and submitting a form or, alternatively, from the user’s LinkedIn profile.
- Create a Wikipedia collector using the Wikimedia API and dataframes (data structures for statistical analysis in the statistical language R), based on Template:Cleanup or Category:All articles needing cleanup. The list of Wikipedia:Cleanup issues includes: page layout, wikification, spelling, grammar and typographical errors, tone, and sourcing issues.
- Conduct a statistical analysis and recommendation process, using the data from the previous step as input, in order to match users’ interests with Wikipedia pages needing cleanup. Depending on Wikipedia policies, we will give users the option to receive e-mails about new pages that need editing, based on their own skills/interests.
- Develop a user-friendly interface in order to present the recommendation results to users.
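The collection step above could be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the proposal names R dataframes, whereas this sketch uses Python and plain lists of dictionaries for brevity. It relies on the MediaWiki Action API's `list=categorymembers` query against Category:All articles needing cleanup. The helper names `build_cleanup_query` and `rows_from_response`, the batch size, and the sample response are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode

# English Wikipedia endpoint of the MediaWiki Action API.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_cleanup_query(limit=50, continue_token=None):
    """Build the URL for one batch of pages in the cleanup category (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:All articles needing cleanup",
        "cmnamespace": 0,  # main (article) namespace only
        "cmlimit": limit,
        "format": "json",
    }
    if continue_token is not None:
        # Token returned by the previous batch, used to request the next one.
        params["cmcontinue"] = continue_token
    return API_ENDPOINT + "?" + urlencode(params)


def rows_from_response(response):
    """Flatten a decoded JSON response into table-like rows (hypothetical helper)."""
    members = response.get("query", {}).get("categorymembers", [])
    return [{"pageid": m["pageid"], "title": m["title"]} for m in members]


# Made-up response in the shape the API returns, so the sketch runs offline.
sample_response = {
    "continue": {"cmcontinue": "page|ABC|123", "continue": "-||"},
    "query": {
        "categorymembers": [
            {"pageid": 101, "ns": 0, "title": "Example article A"},
            {"pageid": 102, "ns": 0, "title": "Example article B"},
        ]
    },
}

url = build_cleanup_query(limit=2)
rows = rows_from_response(sample_response)
print(url)
print(rows)
```

In a real collector, the URL would be fetched repeatedly, passing the `cmcontinue` value from each response into the next request until no `continue` block is returned; the resulting rows could then be loaded into a dataframe for the statistical analysis.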
The ultimate goal of the project is to attract more contributors and support and facilitate them through a simple, quick and intelligent way to find Wikipedia articles that need editing. More specifically:
- By focusing on users’ skills, we expect to provide editing recommendations that better fit each user’s cultural, educational and intellectual background.
- By attracting more contributors to targeted articles, editing issues will be addressed quickly and effectively, significantly reducing their number. Article quality is also expected to improve, since articles will become richer, more up to date and more accurate.
- Through descriptive statistics we will maintain an overview of the editing volume, in order to monitor whether users edit more articles through our application and to identify directly the categories that do not attract users’ interest for editing.
- We will create intelligent recommendations based on users’ skills and Wikipedia data, using mathematical concepts such as similarity (or closeness) functions. The selection variables will be based on users’ skills, Wikipedia category preferences and cleanup options (page layout, wikification, spelling, grammar and typographical errors, tone, and sourcing).
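As one possible reading of the similarity-based matching described above, the sketch below ranks articles by cosine similarity between a user profile and each article's features. Cosine similarity is used here only as a common example of a similarity function; the proposal does not name a specific one. The feature names, the weights, the example articles, and the helpers `cosine_similarity` and `rank_articles` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty profile or article matches nothing
    return dot / (norm_a * norm_b)


def rank_articles(user_profile, articles):
    """Return (title, score) pairs sorted from best to worst match."""
    scored = [
        (title, cosine_similarity(user_profile, features))
        for title, features in articles.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical selection variables: category preferences and cleanup options.
user_profile = {"category:statistics": 1.0, "cleanup:grammar": 1.0}

articles = {
    "Article A": {"category:statistics": 1.0, "cleanup:grammar": 1.0},
    "Article B": {"category:statistics": 1.0, "cleanup:sourcing": 1.0},
    "Article C": {"category:history": 1.0, "cleanup:tone": 1.0},
}

ranking = rank_articles(user_profile, articles)
for title, score in ranking:
    print(title, round(score, 2))
```

Representing profiles and articles as sparse dictionaries keeps the sketch independent of any fixed list of skills or categories; other closeness measures (for example Jaccard similarity on sets of tags) could be substituted in `rank_articles` without changing its interface.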
Our project will be an online recommendation tool aiming to simplify the search for Wikipedia pages that need editing.
Our team members will spend their time on the following activities:
- Development of a submission form to identify user’s skills and interests based on Wikipedia topics.
- Investigation regarding the implementation of the LinkedIn log-in option
- Development of an online webpage for the previous option
- Conducting tests during a test period with volunteers, who will provide feedback on the use of the webpage developed in activity 3.
- Investigation of the queries that should be addressed to Wikimedia API.
- Collection of Wikipedia pages that need editing.
- Development of dataframes for statistical analysis in R for Wikipedia data and user’s skills for normalization purposes.
- Definition and implementation of descriptive statistics (tables, charts and metrics)
- Statistical analysis and intelligent recommendations using the results of the analysis
- Design and development of user interface
- Conduct tests during 2 testing periods
In order to test our solution we will perform tests during 2 testing periods:
- The first testing period will involve random testing with temporary accounts, in order to examine our system’s performance and functionality. We will create a number of test accounts, which will be deleted after the end of testing, and use them to examine all the skills/options that we set for recommendations. This testing cycle will take place at the start of the 4th month of our project and will be completed after two weeks.
- The second testing period will involve user profiling testing. More specifically, we will invite students from Aristotle University of Thessaloniki and the “Greece Creative Commons” open source community to participate in our dissemination action and test our application. This testing cycle will begin at the start of the 5th month and be completed after 3 weeks.
After the end of the project we intend to write a scientific paper on the idea, the methodologies and the results of our research and implementation. The paper will be submitted to a conference, preferably one related to open source.
- User:Mpapo (Maria Papoutsoglou); User 2: to be recruited.
- Estimated Total for 2 developers 13,228.48 USD
- Statistical analysis scientific coordinator
- User:Komagaeshi (Lefteris Angelis)
- Estimated Total 7,174.17 USD
- Software Engineering scientific coordinator
- User:IoannisStamelos (Ioannis Stamelos)
- Estimated Total 7,174.17 USD
During several different stages of our project we will involve the community in different ways.
- During the second month we will make the web page with the skills interface available to a group of volunteers, so that they can test it and give us feedback for improvements.
- During the fourth month we will carry out 2 dissemination actions in order to inform the Greece Creative Commons open source community and students of Aristotle University of Thessaloniki about our application and ask them to test it.
- Another dissemination action will be to create social media accounts (i.e. a Facebook group and a Twitter account) for publicizing our project and actions.
We consider our project a research result with scientific and practical value for further research and educational purposes, so we are strongly committed to supporting it after the funding period. More specifically:
- Since the project is a recommendation system that will work automatically after the end of the project, the website will continue to generate updated recommendations of Wikipedia pages for users. Our academic status guarantees a continuously active body of users (students, teachers, researchers) who will work with it for educational or research purposes.
- The code of the application will be available under an open source license. We expect the project to attract the interest of developers and contributors who potentially can extend it or improve it for specific or general purposes.
- By enabling the LinkedIn option for users, we will be able to create an enriched knowledge base of multidimensional skills, which can be used in the future for further research, as an extension of the proposed project.
Measures of success
During the development of our project we plan to conduct a testing phase with participants from Aristotle University of Thessaloniki and the Greece Creative Commons community, as described earlier. Our measure of success is to attract at least 100 volunteers/testers.
After the end of our project the measures of success that will be used to see if the project is successful are:
- We aim to attract 500 pageviews per month on our website for the first 3 months of the project.
- We will give users the chance to rate our system on a scale from 1 to 5. The rating questions about the user’s experience with the app will be brief and simple. We aim to collect a rating of at least 4 from 200 users during the first three months after the end of the project.
- We will create a subpage on our site with statistics on how many users used our application and edited articles through its recommendations in the 3 months after the end of the project. As a measure of success we aim to have at least 150 users in a 3-month period.
- We will give users the opportunity to receive e-mails from our application with news about new Wikipedia articles needing editing. We aim to have at least 150 requests for e-mail notifications in the first three months after the end of the project.
- We will measure the novelty of the idea and the approach, as well as the success of the project, by submitting one scientific paper to a well-reputed international Computer Science conference in order to receive scientific feedback and approval through publication.
Software Engineering scientific coordinator: Dr Ioannis Stamelos (User:IoannisStamelos). He is a Professor of Software Engineering at the School of Informatics, Faculty of Sciences, AUTH. He has managed or participated in 30 research and development projects related to information systems. He actively researches and supports open source software and open technologies in general, and he is a member of the Board of Directors of the Greek Free/Open Source Software Organization (GFOSS).
Statistical analysis scientific coordinator: Dr Lefteris (Eleftherios) Angelis (User:Komagaeshi). He is currently an Associate Professor at the School of Informatics, AUTH. His subject area is Statistics and Information Systems. His research focuses on the development and application of statistical methods and models for analyzing data from the fields of Information Systems and Software Engineering, especially regarding the role of the human factor. He also works on applications of statistical methods to knowledge discovery from Web data, distributed databases and biomedical documents. He has participated in various research projects, including two recent ones related to competence profiling and management.
Developer 1: Maria Papoutsoglou (User:Mpapo). She is an associate researcher and developer in the STAINS group. In the past, she has worked on the Europass project infrastructure at the European agency CEDEFOP, and on the SEN2SOC experiment under SmartSantander FP7. She has also worked as a technical and data analyst. She holds an MSc in Information Systems from AUTH and a BSc in Information Technology.
Developer 2: A colleague from the School of Informatics, AUTH, who will be recruited when the project is undertaken.
Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.
- Through social media accounts, as we described earlier.
- Wiki-Research mailing list.
Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).
- Good idea. From the little experience I have with wikipedia project, the existing recommends are random. Chelhel (talk) 11:20, 16 April 2016 (UTC)
- It sounds interesting, The idea is very well written and it seems feasible since the team will follow all the necessary steps. Congratulations for your work till now, please keep it up! Nonparalog (talk) 10:31, 28 April 2016 (UTC)
- Very nice and useful idea. It will definitely improve Wikipedia content. Well-structured team and work plan. I hope the project will be implemented. Nanaeng
- I agree with the problem structure within the wikipedia platform as well as with the solution suggested. I also find the solution appealing and really interesting if developed for this platform. It should also help renew and edit information given with much more efficiency and has so many advantages if used. 220.127.116.11 16:18, 1 May 2016 (UTC)