User:Htriedman.research/Grokipedia Comparison

From Meta, a Wikimedia project coordination wiki
Created: 12 November 2025
Collaborators: Alexios Mantzarlis
Duration: 2025-10 – ??
Grokipedia, citations, article similarity

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


This page is an on-wiki artifact accompanying the paper "What did Elon change? A comprehensive analysis of Grokipedia" (link to come) by Harold Triedman and Alexios Mantzarlis. The paper is a first attempt to systematically characterize the differences between Grokipedia and Wikipedia as a whole.

Background

On 27 October 2025, Elon Musk launched Grokipedia as an AI-powered alternative to Wikipedia. The tech billionaire had previously attacked the crowdsourced encyclopedia as an "extension of legacy media propaganda"; he vowed that Grokipedia's goal would be "the truth, the whole truth and nothing but the truth."

At launch, Grokipedia was composed of 885,279 entries. Media reports and spot checks of individual entries found that Musk's encyclopedia appeared to be more opinionated than English Wikipedia, adding editorial slants that appeared to align with Musk's own political views. We build upon Taha Yasseri's recent review of the 1,800 most-edited articles on English Wikipedia that also have entries on Grokipedia[1], which found that the latter appears to be highly derivative of the former but "privileges fluency and narrative over attribution."

Methodology

Between 28 and 30 October 2025, we scraped nearly the entire Grokipedia corpus (883,858 of 885,279 articles, or 99.8%). By comparing the text and citations of this corpus with those of the corresponding Wikipedia articles, we seek to provide more data to determine whether Musk's product is in fact a "synthetic derivative," an ideological project, or something else altogether.
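
This page does not say which similarity metric was used. The data release mentions embeddings of Grokipedia chunks, so cosine similarity between vector representations is one plausible choice. The sketch below is an illustration under that assumption, not the authors' method: it swaps in simple word-count vectors for real embeddings, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count word tokens (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical article texts, for illustration only.
wikipedia_text = "The cat sat on the mat."
grokipedia_text = "The cat sat on the mat."
unrelated_text = "Quarterly revenue rose sharply."

# Identical texts score (numerically close to) 1; texts with no shared words score 0.
identical = cosine_similarity(bag_of_words(wikipedia_text), bag_of_words(grokipedia_text))
disjoint = cosine_similarity(bag_of_words(wikipedia_text), bag_of_words(unrelated_text))
```

With real embeddings, the same cosine formula would apply to dense vectors. A per-article score would then need some rule for aggregating chunk-level scores, which this page does not describe.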

Findings

Content and licensing

The initial release of Grokipedia is a tale of two licenses:

  • 56% of Grokipedia's content is closely derived from English Wikipedia and carries over the latter's Creative Commons (CC) Attribution-ShareAlike license. Grokipedia's CC-licensed articles have, on average, a 90% similarity to their corresponding Wikipedia article.
  • 44% of Grokipedia's content is licensed under xAI's November 2025 Terms of Service (in the same way as the outputs of the Grok chatbot). These non-CC-licensed articles are notably less similar to their corresponding English Wikipedia articles (77% similarity).

Source citations

At a high level, the two encyclopedias frequently draw on the same sources, with 57 domains shared between their respective top 100 most-cited sources. However, source composition differs: Grokipedia leans more heavily on academic sources, government sources, and user-generated content, compared with Wikipedia's relative over-reliance on news websites.
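
The shared-domain count can be illustrated with a small sketch. It assumes citations are available as plain URL lists and that a "domain" is simply the URL host with any leading "www." removed. The authors may normalize domains differently, and the helper names and sample URLs here are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def citation_domain(url: str) -> str:
    """Return the lowercase host of a citation URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def top_domains(citation_urls: list[str], n: int) -> set[str]:
    """Return the n most-cited domains in a list of citation URLs."""
    counts = Counter(citation_domain(u) for u in citation_urls)
    counts.pop("", None)  # drop URLs with no parseable host
    return {domain for domain, _ in counts.most_common(n)}


# Hypothetical citation lists, for illustration only.
wikipedia_citations = [
    "https://www.nytimes.com/a", "https://www.nytimes.com/b", "https://www.nytimes.com/c",
    "https://doi.org/10.1000/x", "https://doi.org/10.1000/y",
    "https://www.bbc.co.uk/news",
]
grokipedia_citations = [
    "https://doi.org/10.1000/p", "https://doi.org/10.1000/q", "https://doi.org/10.1000/r",
    "https://nytimes.com/d", "https://nytimes.com/e",
    "https://example.org/f",
]

# Domains that appear in both top-2 lists (the analysis on this page uses the top 100).
shared = top_domains(wikipedia_citations, n=2) & top_domains(grokipedia_citations, n=2)
```

Calling `top_domains` with `n=100` on each corpus and taking the size of the intersection would give a figure comparable to the 57 reported above, subject to the domain-normalization assumption.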

Article length and verbosity: Grokipedia articles are much longer and more verbose than their Wikipedia counterparts, and cite twice as many sources. This comes at the expense of source quality. Using source categorizations for domains determined by English Wikipedia editors and domain reliability scores from external research, we find that Grokipedia relies on a lower share of reliable sources and a higher share of unreliable sources than English Wikipedia.

Reliability differences: The dichotomy between CC- and non-CC-licensed content is also visible in source citations. Non-CC-licensed articles on Grokipedia are 3.2 times more likely than the same articles on Wikipedia to contain a citation that the English Wikipedia community has deemed "generally unreliable" and 13 times more likely to contain a "blacklisted" source. Although these citations are a trivial share of all sources, it is noteworthy that Grokipedia includes 42 citations to the Nazi website Stormfront and 34 to the conspiracy website InfoWars, compared with none to either on English Wikipedia.
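
One plausible reading of "times more likely" is a ratio of shares: the fraction of articles in one corpus that contain at least one flagged citation, divided by the same fraction in the other corpus. The sketch below illustrates that reading only. The function names, the flagged domain, and the sample data are hypothetical, and the authors' exact computation may differ.

```python
from urllib.parse import urlparse


def share_citing(articles: dict[str, list[str]], flagged: set[str]) -> float:
    """Fraction of articles with at least one citation whose host is in `flagged`."""
    if not articles:
        return 0.0
    hits = sum(
        any((urlparse(url).hostname or "") in flagged for url in urls)
        for urls in articles.values()
    )
    return hits / len(articles)


def relative_likelihood(
    group_a: dict[str, list[str]],
    group_b: dict[str, list[str]],
    flagged: set[str],
) -> float:
    """Ratio of the two shares; raises ZeroDivisionError if group_b has no hits."""
    return share_citing(group_a, flagged) / share_citing(group_b, flagged)


# Hypothetical data: article title -> citation URLs, for illustration only.
flagged_domains = {"unreliable.example"}
grokipedia_articles = {
    "A": ["https://unreliable.example/1", "https://doi.org/10.1000/a"],
    "B": ["https://unreliable.example/2"],
    "C": ["https://doi.org/10.1000/c"],
    "D": [],
}
wikipedia_articles = {
    "A": ["https://unreliable.example/1"],
    "B": ["https://doi.org/10.1000/b"],
    "C": ["https://doi.org/10.1000/c"],
    "D": [],
}

# Share is 2/4 on the Grokipedia side and 1/4 on the Wikipedia side.
ratio = relative_likelihood(grokipedia_articles, wikipedia_articles, flagged_domains)
```

The ratio is undefined when the comparison corpus has no flagged citations at all, as with the Stormfront and InfoWars counts above, which is why those are reported as raw counts rather than as a multiple.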

Article subsets

We analyzed three article subsets:

  • Elected officials – Derived from Wikidata queries for US Congresspeople and UK MPs. These pages showed less similarity between their Wikipedia and Grokipedia versions than other pages did.
  • Controversial topics – Derived from the English Wikipedia en:Category:Wikipedia controversial topics. These pages also showed less similarity between versions than non-controversial topics did.
  • Random subset – A random subset of 30k Grokipedia articles, for which we queried the WMF article topic and article quality models. This subset illustrated that Grokipedia focused on rewriting the highest-quality articles on Wikipedia, with a bias towards biographies, politics, society, and history.

Data release

In conjunction with this paper, we release a full, structured scrape of the content of Grokipedia, as well as embeddings of Grokipedia chunks. These datasets are available for download on HuggingFace,[2] and our analysis code is available on GitHub.[3]

References
