Research:Implementing a prototype for Automatic Fact Checking in Wikipedia

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator:
Task T284158
12:23, 28 January 2021 (UTC)
Duration:  2021-June – 2022-January
Fact Checking, API, NLI, NLP
This page documents a completed research project.

The aim of this project is to implement software, in the form of an open API, that automatically performs fact validation. In Natural Language Processing (NLP) this task is called Natural Language Inference (NLI): a claim is compared with a reference to determine whether it is correct, incorrect, or unrelated. The solution should be not only accurate but also as fast as possible in order to meet production requirements. The output of this project is a first prototype of an Automatic Fact Checking API running on a single Cloud VPS instance.


We present a new fact-checking system, the WikiCheck API, that automatically performs fact validation based on the Wikipedia knowledge base. It is comparable to SOTA solutions[1] in terms of accuracy and can run on low-memory CPU instances. It uses the Flair[2] ner-fast model and a Python wrapper for the Wikimedia API[3] to search for related articles. It then uses a sentence-based NLI model for relation classification. Finally, it uses aggregation logic to provide the final label along with the related evidence.

System Architecture

WikiCheck system architecture

The application mirrors the way a human performs fact checking. The input is a claim, that is, the piece of text to be checked. The output is the predicted label (one of SUPPORTS, REFUTES, NOT ENOUGH INFO) and the evidence. The application is decomposed into three parts: the candidate selection model (Model level one), the NLI classification model (Model level two), and the aggregation level.

Model level one uses the Wikimedia API and the Flair ner-fast model. The Wikimedia API performs a full-text search over the whole Wikipedia index. We cannot influence the Wikipedia search engine itself, which is a significant limitation of our approach, but we can improve the query, which noticeably improved results on our task. The NER model finds the named entities in the claim; these are used as keywords for additional queries, which improves system performance.
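
The query-improvement step can be sketched as follows. This is a minimal illustration, not the project's code: `build_queries` is a hypothetical helper, and the entity list is written by hand where the described system would use the output of the Flair ner-fast tagger. Each resulting query would then be sent to the Wikipedia full-text search.

```python
from typing import List


def build_queries(claim: str, entities: List[str]) -> List[str]:
    """Return the search queries for a claim: the full claim first,
    then one query per distinct named entity (hypothetical helper)."""
    queries = [claim.strip()]
    seen = {queries[0].lower()}
    for entity in entities:
        query = entity.strip()
        # Skip empty strings and case-insensitive duplicates.
        if query and query.lower() not in seen:
            seen.add(query.lower())
            queries.append(query)
    return queries


# Entities are hard-coded here; in the described system they would
# come from the NER model applied to the claim.
claim = "Barack Obama was born in Hawaii"
queries = build_queries(claim, ["Barack Obama", "Hawaii", "hawaii"])
print(queries)
```

Keeping the full claim as the first query preserves the behaviour of a plain full-text search, while the entity-only queries give the search engine shorter, more targeted keywords.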

Model level two is the Natural Language Inference (NLI) model, which performs three-class classification. It aims to identify the exact sentences in the candidate articles that serve as evidence for or against a given claim. It has three possible outputs: SUPPORTS, REFUTES, and NOT ENOUGH INFO. It is a sentence-based model that allows batch processing of hypotheses, which improves system performance.

NLI model

The NLI model is a Siamese network that uses a BERT-like model as a trainable sentence encoder. It applies mean pooling to build sentence embeddings from token embeddings. Given the sentence embeddings of the claim and the hypothesis, the system uses their values and their absolute difference as input features for the classification layer. The output is the probability that a claim-hypothesis pair belongs to each of the three classes. Within the overall system, the output of level two is a list of all sentences that are candidates for evidence, each with its three class probabilities and the cosine similarity between the claim and the hypothesis.
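
The feature construction described above can be illustrated with a small pure-Python sketch. It assumes the two embeddings and their absolute difference are concatenated into one vector (as in Sentence-BERT style classifiers); the function names and the toy numbers are illustrative only, and padding tokens, which a real encoder would mask out, are ignored here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average token embeddings into a single sentence embedding."""
    n_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n_tokens for i in range(dim)]


def pair_features(u: Vector, v: Vector) -> Vector:
    """Concatenate u, v and |u - v| as input for the classification layer."""
    return list(u) + list(v) + [abs(a - b) for a, b in zip(u, v)]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy 2-dimensional token embeddings (made-up numbers, not model output).
claim_tokens = [[1.0, 0.0], [3.0, 2.0]]
hypothesis_tokens = [[0.0, 1.0], [2.0, 1.0], [1.0, 1.0]]

u = mean_pool(claim_tokens)          # [2.0, 1.0]
v = mean_pool(hypothesis_tokens)     # [1.0, 1.0]
features = pair_features(u, v)       # [2.0, 1.0, 1.0, 1.0, 1.0, 0.0]
similarity = cosine_similarity(u, v)
print(features, similarity)
```

For embeddings of dimension d, the classification layer in this sketch receives a vector of length 3d and maps it to three class scores.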

The aggregation level (optional) is a classification model that determines the final label for the given claim. It takes the output of Model level two and returns a single label for the claim, together with the top five hypotheses that support or refute it.
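
To make the input and output of this stage concrete, here is a deliberately simple rule-based stand-in. It is not the trained aggregation classifier used in WikiCheck: the `Candidate` fields, the `aggregate` function, and the decision rule are all assumptions chosen for illustration, and the probabilities are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """One level-two result for a claim-hypothesis pair (hypothetical layout)."""
    sentence: str
    supports: float
    refutes: float
    nei: float  # probability of NOT ENOUGH INFO


def aggregate(candidates: List[Candidate], top_k: int = 5) -> Tuple[str, List[Candidate]]:
    """Return one label for the claim and up to top_k evidence sentences."""
    # Keep only sentences where SUPPORTS or REFUTES beats NOT ENOUGH INFO.
    informative = [c for c in candidates if max(c.supports, c.refutes) > c.nei]
    if not informative:
        return "NOT ENOUGH INFO", []
    # Rank by the strongest non-neutral probability and keep the top_k.
    ranked = sorted(informative, key=lambda c: max(c.supports, c.refutes), reverse=True)
    evidence = ranked[:top_k]
    # Simple vote: compare total SUPPORTS mass with total REFUTES mass.
    supports = sum(c.supports for c in evidence)
    refutes = sum(c.refutes for c in evidence)
    return ("SUPPORTS" if supports >= refutes else "REFUTES"), evidence


# Made-up level-two outputs for the claim "The Earth is flat".
candidates = [
    Candidate("Earth is the third planet from the Sun.", 0.10, 0.10, 0.80),
    Candidate("Earth is approximately spherical in shape.", 0.05, 0.90, 0.05),
    Candidate("The flat Earth model is an archaic conception.", 0.20, 0.60, 0.20),
]
label, evidence = aggregate(candidates)
print(label, [c.sentence for c in evidence])
```

A learned classifier can weigh the class probabilities and the cosine similarity jointly, which a fixed rule like this one cannot do.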

API Description

The WikiCheck API has three main endpoints.

  1. nli_model - the pure NLI model: it takes a claim and a hypothesis and returns the probabilities of the SUPPORTS, REFUTES, and NOT ENOUGH INFO classes for their relation.
  2. fact_checking_model - takes a claim, searches Wikipedia for hypotheses, and returns the relation features for each claim-hypothesis pair, which can be used for custom aggregation logic.
  3. fact_checking_aggregated - the complete end-to-end solution: it takes only a claim and returns the label (SUPPORTS, REFUTES, or NOT ENOUGH INFO) along with a list of five pieces of evidence found.
WikiCheck API endpoints

Endpoint name | Input fields | Request example | Output model
nli_model | claim, hypothesis | https://nli.wmcloud.org/nli_model/?hypothesis=Today%20is%20Monday.&claim=Yesterday%20was%20Sunday |
fact_checking_model | claim | https://nli.wmcloud.org/fact_checking_model/?claim=The%20Earth%20is%20flat |
fact_checking_aggregated | claim | https://nli.wmcloud.org/fact_checking_aggregated/?claim=The%20Earth%20is%20flat |
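
The request examples can be generated programmatically. The sketch below only builds the URLs with the Python standard library; `build_url` is a hypothetical helper, and nothing is assumed here about the response format or about the service still being reachable at this address.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://nli.wmcloud.org"


def build_url(endpoint: str, **params: str) -> str:
    """Build a request URL for a WikiCheck endpoint (hypothetical helper).

    quote_via=quote encodes spaces as %20, matching the examples in the table.
    """
    query = urlencode(params, quote_via=quote)
    return f"{BASE_URL}/{endpoint}/?{query}"


nli_url = build_url(
    "nli_model",
    hypothesis="Today is Monday.",
    claim="Yesterday was Sunday",
)
aggregated_url = build_url("fact_checking_aggregated", claim="The Earth is flat")

print(nli_url)
print(aggregated_url)
```

The resulting strings can be passed to any HTTP client to issue a GET request.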

Implementation and Installation details

The structure of the project:

The code for the API and models can be found in the official WikiCheck repo.

The project consists of:

  • The modules directory contains the implementation of the modules used for inference, along with the script for training NLI models.
  • The configs directory contains configuration files for training and inference.
  • The notebooks directory contains the .ipynb notebooks with the experiments done during the research.

We use DVC with a Google Drive remote for efficient model version control. If you want access to our fine-tuned models, you can download them from Google Drive.

Also, you can train your model by running the


All the data used for model training is available via Google Drive. Examples of custom training strategies, data preprocessing and exploration, and training of the optional aggregation stage can be found in the notebooks section on GitHub.

API setup and run:

  1. Clone the official WikiCheck repo and cd into it
    git clone https://github.com/trokhymovych/WikiCheck.git
    cd WikiCheck
  2. Create and activate virtualenv:
    virtualenv -p python venv
    source venv/bin/activate
  3. Install requirements from requirements.txt:
    pip install -r requirements.txt
  4. Load pretrained models. There are two options:
    1. Loading models with DVC (preferred):
      dvc pull
    2. Loading models from Google Drive
  5. Run the API:
    python start.py --config configs/inference/sentence_bert_config.json


  1. FEVER Shared task 2018 leaderboard
  2. Flair GitHub repo
  3. Pymediawiki GitHub repo