WikiCred/2022 CFP/Veri.FYI - A Scalable Fake News Detection Service

Note: This application was originally submitted via this Google Form due to a technical issue with the grant portal; this page was generated on behalf of the applicant by a member of the WikiCred team.
Veri.FYI - A Scalable Fake News Detection Service
A WikiCred 2022 Grant Proposal
Project type: Event
Author: Chris Rusnak, Valerie Miller, Karen Minott, Tianqi Luo (localmotion127)
Contact: chris@pressdb.info
Requested amount: $10,000 USD
Award amount: Unknown
What is your idea?

We are PressDB, an early-stage, non-profit startup that is building Veri.FYI, an open-source service that helps journalists and the general public detect fake news websites more quickly, more accurately, and at scale. Veri.FYI is a machine learning-based platform that will take an article URL as input and output a reliability measure along with a brief explanation of why that measure was given. Veri.FYI will take two forms: a web platform with an intuitive user interface that anyone can use on individual web pages, and an API that can be integrated into Wikimedia projects so that Wikimedia users can check multiple websites at scale. To help develop this service, we will conduct user research interviews and workshops with a variety of stakeholders inside and outside the Wikimedia community (editors, readers, journalists, policy experts, etc.).
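
To make the planned interface concrete, here is a minimal sketch of how such an API might be called. The endpoint, parameters, and response fields (/v1/check, reliability, explanation) are illustrative assumptions, not the prototype's actual interface.

```python
# Hypothetical sketch of a Veri.FYI API call; the endpoint and response
# fields below are assumptions for illustration, not the real API.
import requests

def check_article(url: str) -> dict:
    """Submit an article URL and return a reliability measure plus explanation."""
    response = requests.post(
        "https://api.veri.fyi/v1/check",  # hypothetical endpoint
        json={"url": url},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # e.g. {"reliability": 0.12, "explanation": "..."}

if __name__ == "__main__":
    result = check_article("https://example.com/some-article")
    print(result["reliability"], result["explanation"])
```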


Why is it important?

Journalism is in crisis. Fake news is free, while legitimate news is often behind a paywall. Deepfakes and natural language generation lower the cost of entry for creating fake news. Journalists have to spend more time debunking false statements, which becomes overwhelming as the volume of fake news grows exponentially; they already spend anywhere from 15 minutes to more than two days per claim on fact-checking [1]. They also face the broader societal impacts of fake news: harassment; bodily harm; a further loss of quality journalism; a continued erosion of public trust and literacy; a breakdown of democratic institutions; and global threats to human health.

[1] https://web.archive.org/web/20210306163603/https://rc.library.uta.edu/uta-ir/bitstream/handle/10106/26136/HASSAN-DISSERTATION-2016.pdf?sequence=1&isAllowed=y


Link(s) to your resume or anything else (CV, GitHub, etc.) that may be relevant

https://bitbucket.org/pressdb/


Is your project already in progress?

Yes. We have launched a prototype as a web platform and are testing it with a handful of users. Our prototype is available here: https://www.pressdb.info/work/veri-fyi. The prototype is currently limited in scope to written (text-only) articles in English that focus on U.S. news, are intentionally misleading, and are not breaking news.


How is this project relevant to credibility and Wikipedia?

Fake news websites appear faster than journalists and researchers can reasonably keep track of them. Veri.FYI could help Wikipedia editors automatically add sites to lists such as the Perennial Sources list (https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Perennial_sources#Generally_unreliable) or the list of fake news websites (https://en.wikipedia.org/wiki/List_of_fake_news_websites), and prune potentially unreliable citations at scale, as sketched below. Veri.FYI can also help readers understand why a given source may or may not be reliable.
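
As an illustration of the citation-pruning use case, the sketch below batch-checks a list of citation URLs and flags those scoring below a threshold. The Veri.FYI endpoint, the response fields, and the 0.5 threshold are all assumptions carried over from the earlier sketch, not confirmed details of the service.

```python
# Hypothetical batch check of citation URLs against the Veri.FYI API sketched
# earlier; endpoint, fields, and threshold are illustrative assumptions.
import requests

def check_article(url: str) -> dict:
    # Same illustrative helper as in the earlier sketch (hypothetical endpoint).
    response = requests.post("https://api.veri.fyi/v1/check", json={"url": url}, timeout=30)
    response.raise_for_status()
    return response.json()

def flag_unreliable(citation_urls, threshold=0.5):
    """Return (url, reliability, explanation) for citations scoring below the threshold."""
    flagged = []
    for url in citation_urls:
        result = check_article(url)
        if result["reliability"] < threshold:
            flagged.append((url, result["reliability"], result["explanation"]))
    return flagged

# Example: URLs drawn from an article's reference list (how they are collected is not shown).
citations = [
    "https://example-news.com/story-1",
    "https://another-site.org/report",
]
for url, score, why in flag_unreliable(citations):
    print(f"{url} scored {score:.2f}: {why}")
```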


What is the ultimate impact of this project?

We hope that journalists and researchers can use our service to fact-check more claims from fake news websites accurately, in less time and under less pressure, saving news organizations time and money. Ideally, this will help reduce the prevalence and influence of online misinformation.


Can your project scale?

Yes. To train effective models for fake news website detection across different scenarios, we need a large, diverse dataset spanning multiple dimensions (time, language, type of fake news, degree of veracity, modality, etc.).
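
To show what such multi-dimensional data could look like, here is a hypothetical schema for a single training record; the field names and example values are assumptions for illustration only, not the actual dataset.

```python
# Hypothetical training-record schema covering the dimensions mentioned above;
# field names and values are illustrative assumptions, not the actual dataset.
from dataclasses import dataclass

@dataclass
class TrainingRecord:
    url: str              # article URL
    published: str        # time dimension, e.g. "2022-05-01"
    language: str         # e.g. "en"
    fake_news_type: str   # e.g. "fabrication", "satire", "clickbait"
    veracity: float       # degree of veracity, from 0.0 (false) to 1.0 (true)
    modality: str         # e.g. "text", "image", "video"
    label: bool           # ground truth: does the source publish fake news?

example = TrainingRecord(
    url="https://example.com/article",
    published="2022-05-01",
    language="en",
    fake_news_type="fabrication",
    veracity=0.1,
    modality="text",
    label=True,
)
```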


Why are you the people to do it?

Chris is a data scientist who has worked on several data journalism projects and has experience with natural language processing and machine learning, the technologies needed to develop this service. Karen is a subject matter expert in writing and IT project management. Valerie has experience founding, managing, and advising multiple tech startups and has done extensive work creating Software-as-a-Service platforms. Tianqi has a love and passion for writing and strongly believes in the importance of truth in journalism. Together, we are passionate about tech, journalism, and their combination, while respecting privacy, security, fairness, and sustainability.


What is the impact of your idea on diversity and inclusiveness of the Wikimedia movement?

Veri.FYI can have an impact on the diversity and inclusiveness of Wikimedia at two levels. Within large projects such as the English Wikipedia, building a reliable fake news website detection service on a diverse dataset could help communities who are disproportionately affected by mis/disinformation or by certain types of it. In addition, even though Veri.FYI is intended for English-language works at this time, it could help smaller Wikipedia projects in other languages in the future, if and when sufficient resources are available to extend Veri.FYI's compatibility to those languages. In both cases, our project can help meet the needs of Wikimedia communities that do not have enough resources to mitigate mis/disinformation.


What are the challenges associated with this project and how will you overcome them?

There are challenges and risks associated with a fake news website detection service, including, but not limited to, the following:

• Limitations of data available for developing/refining such a model.

• False positives: incorrectly classifying a legitimate article as fake, which would effectively censor true information.

• False negatives: incorrectly classifying a fake article as legitimate, which would allow false stories to proliferate.

• The instability of the fake news landscape, which is constantly changing tactics, actors, etc.


To mitigate these risks, we plan to:

• Open source our data and methodology to increase transparency and accountability.

• Include a variety of stakeholders (in journalism, policy, data/computer science, ethics, etc.) as advisors and user research participants for our service.

• Hold regular audits and updates of these models.


How much money are you requesting? Please specify your currency. A range is okay.

$10,000 USD


How will you spend your funds?

- Cloud computing: Model development for Veri.FYI ($5,000); API development for Veri.FYI ($3,000)

- UX research participant compensation ($2,000)


How long will your project take?

1 year


Have you worked on projects for previous grants before?

No