Grants talk:IdeaLab/Searching for out-of-date information in wikipedias

From Meta, a Wikimedia project coordination wiki

Identification of the dates text added as an additional input

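This work is useless. From the text alone there is no way to tell which information is out of date and which is not. For example, if an article says that the current captain of football team XX is YYY, you cannot determine from the text whether that information is out of date unless you check reliable sources. The same holds even for information that looks certain to go stale, such as "as of 2006". For a well-run listed company, "its size as of 2006" is out-of-date information. For a company that went bankrupt in January 2007, "its size as of 2006" is certainly not out of date. But what about an ordinary match played in 2004, with "as of 2006, XX people had watched it"? It may or may not be out of date: common sense says that hardly anyone cares about an ordinary match a few years after it is over, but how many years is "a few"? That has to be verified by experiment. Naive pattern matching, regular expression matching, and even natural language processing cannot distinguish these subtle differences. Also, if you only want simple pattern matching, you do not need to crawl the dump; you can just search directly, for example [1]. Antigng (talk) 05:38, 29 March 2016 (UTC)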
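To make the objection concrete, here is a minimal Python sketch of the kind of naive "as of YEAR" matching the comment describes. The pattern, the function name `find_dated_claims`, and the `min_age` threshold are all hypothetical choices for illustration, not part of the proposal.

```python
import re

# Hypothetical pattern: matches phrases such as "as of 2006" or "As of March 2006".
AS_OF = re.compile(r"\bas of (?:[A-Z][a-z]+ )?(\d{4})\b", re.IGNORECASE)


def find_dated_claims(text, current_year, min_age=5):
    """Return (year, sentence) pairs whose "as of" year is at least min_age years old.

    The min_age threshold is an arbitrary assumption; the thread itself notes
    that the right cutoff would have to be found by experiment.
    """
    hits = []
    # Rough sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in AS_OF.finditer(sentence):
            year = int(match.group(1))
            if current_year - year >= min_age:
                hits.append((year, sentence.strip()))
    return hits


going_concern = "The company had 500 employees as of 2006."
defunct = "The company, which closed in January 2007, had 500 employees as of 2006."

# Both sentences are flagged in the same way, although only the first is stale.
print(find_dated_claims(going_concern, current_year=2016))
print(find_dated_claims(defunct, current_year=2016))
```

The matcher flags both sentences identically, which is the limitation raised above: the text alone does not say whether the company still exists.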

One supporting tool for these functions is an API called WikiWho[2]. It was developed to determine, quickly and accurately, when each word in an article was revised. This tool partly addresses the problem you raised. For example, if we find a reference to a news item from 2009, but the person who added the reference edited the related words in 2015, we can draw the preliminary conclusion that the reference was still valid at least as of 2015. Although this conclusion will not be reliable in every case, we can build further research on it. Li Linxuan (talk) 03:18, 30 March 2016 (UTC)
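Your goal is to find information that is out of date, not information that is still current. Leave aside for now that "a user edited text near a 2009 reference in 2015, so the reference is not out of date" is a very unreliable inference. (Look for yourself: some users put the references for a whole passage at the end of the section, or even at the end of the article. If a user then edits text near that end, you cannot infer that those references are not out of date.) Even if you rule out some references that are unlikely to be out of date, that does not mean everything left over is out of date and needs checking. --Antigng (talk) 07:36, 30 March 2016 (UTC)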
WikiWho is used directly to filter the information in a first round. We have considered the possibility that contributors sometimes add out-of-date references, but how often this happens can only be determined by building and testing the bot. This software proposal provides a test bed with both simple and the most advanced known techniques, which anyone can expand, use, and refine with new methods. We can also identify whether the same user edited the words and the references attached to them. Filtering out a proportion of the in-date information helps us focus on the problematic part, so the work is worth doing. Li Linxuan (talk) 16:34, 30 March 2016 (UTC)
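The first-round filter discussed in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Claim` record, its field names, the `max_age` threshold, and the rule that a recent edit counts only when the same user edited both the words and the reference are all hypothetical. The per-word revision data is assumed to come from a tool such as WikiWho, but no WikiWho call is made here.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """Hypothetical record for one referenced statement in an article."""
    text: str
    reference_year: int    # year of the cited source
    text_edit_year: int    # year the related words were last edited
    text_editor: str       # user who last edited the words
    reference_editor: str  # user who added the reference


def needs_review(claim, current_year, max_age=3):
    """Return True if the claim should stay in the review queue.

    A claim is filtered out only when its words were edited recently
    (within max_age years, an assumed threshold) by the same user who
    added the reference. Everything else is kept for review.
    """
    recently_edited = current_year - claim.text_edit_year <= max_age
    same_editor = claim.text_editor == claim.reference_editor
    return not (recently_edited and same_editor)


claims = [
    Claim("Size of company A", 2009, 2015, "UserA", "UserA"),
    Claim("Size of company B", 2009, 2009, "UserB", "UserB"),
    Claim("Size of company C", 2009, 2015, "UserC", "UserD"),
]
queue = [c.text for c in claims if needs_review(c, current_year=2016)]
print(queue)
```

As the reply above points out, a claim that survives this filter is not thereby shown to be out of date. The filter only shrinks the set that still has to be checked against reliable sources.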

Community notifications

Hi Li Linxuan and Jsalsman, thanks for the hard work you've put into this proposal! It's interesting to see your ideas on how to automatically surface instances where article content is likely to be out of date. I wanted to suggest some other areas you could consider notifying to get additional feedback on your proposal:

  • The village pumps on en.wiki and zh.wiki
  • Talk pages for Did you Know, Good Article review (en, zh), and Featured article review (en, zh) project pages
  • WikiProject Guild of Copyeditors. If there are projects on zh.wikipedia related to copyediting, I would recommend letting them know about this project, as this tool could make copyediting easier.

Please feel free to contact Mjohnson (WMF) (the program officer for Individual Engagement Grants) or myself if you have questions about the grant process, or if you want to talk about your grant proposal. I JethroBT (WMF) (talk) 00:01, 31 March 2016 (UTC)

Thanks, JethroBT. Li Linxuan, please do that, and add the URLs for those discussion sections to Grants:IdeaLab/Searching for out-of-date information in wikipedias#Community notification. Also would you please look through Category:IdeaLab members with community organizing experience for people who might be able to "volunteer" and/or find other volunteers to help you look for numerous examples of out-of-date facts to increase your training set for the machine learning approaches, and look through Category:IdeaLab members with research experience to try to find someone who can give you some third-party objective help with measurements of your effectiveness testing? Jsalsman (talk) 02:56, 31 March 2016 (UTC)
@Jsalsman and Li Linxuan: One other place to find researchers that you could try is the wiki-research mailing list. I JethroBT (WMF) (talk) 17:19, 31 March 2016 (UTC)
Please note that James Salsman is currently indefinitely blocked on meta for using sockpuppets to run unapproved surveys, and is also banned on enwiki. 69.46.0.196 22:09, 3 April 2016 (UTC)
I have requested an appeal or failing that, a reprieve on Meta, and have been in email communication with the enwiki arbcom concerning my ban there. I would be glad to discuss the details with any administrator by email to jim(_AT_)talknicer.com. Jsalsman (talk) 03:56, 4 April 2016 (UTC)

April 12 Proposal Deadline: Is your project ready for funding?

The deadline for Individual Engagement Grant (IEG) submissions this round is April 12th, 2016. If you’ve developed your idea into a project that would benefit from funding, consider applying!

To apply, you must (1) create a draft request using the “Expand into an Individual Engagement Grant” button on your idea page, (2) complete the proposal entirely, filling in all empty fields, and (3) change the status from "draft" to "proposed." As soon as you’re ready, you should begin to invite any communities affected by your project to provide feedback on your proposal talk page.

If you have any questions about IEG or would like support in developing your proposal, we're hosting a few proposal help sessions this month in Google Hangouts:

I'm also happy to set up an individual session. With thanks, I JethroBT (WMF) 00:38, 2 April 2016 (UTC)

User blocked

Please note, the user listed as "advisor" on this project, Jsalsman, has been banned on English Wikipedia and here on Meta for many years. I have blocked this most recent sockpuppet account. -Pete F (talk) 19:05, 6 April 2016 (UTC)

@Peteforsyth and Li Linxuan: Thanks Pete. Li Linxuan, it is up to you whether you feel Jsalsman is an appropriate advisor for your project given their background in creating multiple accounts over an extended period and other conduct issues, which are documented in these places:
If James will be advising this project, I think it will be important to clarify their exact role. There is reasonable cause for concern when an editor who is blocked on meta and banned on en.wiki has any official involvement with a grant. I JethroBT (WMF) (talk) 17:55, 7 April 2016 (UTC)


Eligibility confirmed

This Individual Engagement Grant proposal is under review!

We've confirmed your proposal is eligible for review and scoring. Please feel free to ask questions and make changes to this proposal as discussions continue during this community comments period (through 2 May 2016).

The committee's formal review begins on 3 May 2016, and grants will be announced 17 June 2016. See the round 1 2016 schedule for more details.

Questions? Contact us at iegrants(_AT_)wikimedia · org .

--Marti (WMF) (talk) 04:42, 28 April 2016 (UTC)

Aggregated feedback from the committee for Searching for out-of-date information in wikipedias

Scoring rubric and scores
(A) Impact potential (score: 5.3)
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Community engagement (score: 5.1)
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
(C) Ability to execute (score: 4.6)
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?
(D) Measures of success (score: 4.0)
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Additional comments from the Committee:
  • Yes, finding and dealing with hoaxes and inaccurate information is clearly within our priorities, and it is part of everyday life for all of the projects in the Wikiverse. That said, this proposal does not offer anything useful that would add to our current list of review tools in such a way that it could improve our current suite of tools for this.
  • Automated quality checking tools are an interesting area of research with significant theoretical potential for impact on content quality. However, the proposal does not present a credible plan for developing such a tool.
  • Interesting, but afterwards someone should fix outdated data.
  • Good to have those abilities. Quite hard to do NLP across many different languages. I would hope this can be a cornerstone of other wikis.
  • The proposal does not include well-defined quantitative measures of success.
  • This proposal deals with fairly advanced technologies. I am a bit worried about the schedule given that complexity. Measurements may be hard to see for the community, but would make sense to developers.
  • The proposal does not present a credible plan or a compelling case that the team has the understanding and experience necessary to execute it.
  • Strong backgrounds for AI. Mentors. Schedule is pretty tight at a glance.
  • Seems to have some support, but is based mostly on the Chinese Wikipedia, which suffers most from the "Great Firewall" as a deterrent to attracting contributors. Historically, the biggest defense against vandalism, outdated information, and hoaxes is "more eyes" and this proposal may not address that.
  • It's specified towards Chinese Wikipedia, but it's not really related to Chinese Wikipedia. The project itself doesn't offer a way to communicate that information to the community. If you are targeting the Chinese Wikipedia, the applicant should try Village Pump for input. The project could be hard to identify other languages. But it could have potential to be built upon later on.
  • Lack of community engagement or reasonable budget. The project needs more details and explanation.
  • I suppose if the proposal could show how this could improve on current efforts such as the periodic roundup of "articles of people over 100 years of age", or "places in non-existing nation-states" etc, then I might be inclined to support it, but currently I don't see the benefit to the movement.
  • The problem is interesting, but in my opinion using a bot is an old solution. A better approach would be centralized data, as Wikidata does.
  • Suffice it to say that I think the experience with the tool is lacking; the community engagement is underdeveloped, and the schedule is overly optimistic.
  • Budget is unclear to me, as there is no breakdown. While the project is certainly interesting it feels incomplete. There are many technical terms from Artificial Intelligence that I am familiar with but I do not see a plan for execution / software development. It is unclear in what way existing tools are insufficient and it is also unclear what the finished product would do that existing tools cannot. In this case the advisor is indefinitely banned which is not in any way the fault of the proposer. However, he is still mentioned as a member of the team. I am unsure how this would work out.

-- MJue (WMF) (talk) 17:42, 3 June 2016 (UTC) on behalf of the IEG Committee

Wikidata

The proposed method is wrong. Facts and figures which quickly become outdated must be stored on Wikidata and transcluded from there. Nemo 10:09, 5 June 2016 (UTC)

Round 1 2016 decision

This project has not been selected for an Individual Engagement Grant at this time.

We love that you took the chance to creatively improve the Wikimedia movement. The committee has reviewed this proposal and not recommended it for funding, but we hope you'll continue to engage in the program. Please drop by the IdeaLab to share and refine future ideas!


Next steps:

  1. Review the feedback provided on your proposal and ask for any clarifications you need using this talk page.
  2. Visit the IdeaLab to continue developing this idea and share any new ideas you may have.
  3. To reapply with this project in the future, please make updates based on the feedback provided in this round before resubmitting it for review in a new round.
  4. Check the schedule for the next open call to submit proposals - we look forward to helping you apply for a grant in a future round.
Questions? Contact us.