
Research talk:Understanding Wikidata's Value/Work log/2017-09-02

From Meta, a Wikimedia project coordination wiki

Saturday, September 2, 2017

The following are our hypotheses about how predicted entity quality in Wikidata and page views (in client pages) relate to edit types (bot, semi-automated, logged-in user, and anon). We use ORES to predict quality on a scale from E to A, where A is the best quality. We use the Perfect Alignment Hypothesis to derive 5 classes of page view counts.

Hypothesis 1: When the entity quality class is fairly low (D or C) and the entity's views are in a lower class than the quality, bot and semi-automated edits make up a higher proportion of edits than for aligned entities of the same quality. Logged-in user and anon edits thus make up a lower proportion.

Rationale for Hypothesis 1: Bots create misalignment by raising poor-quality, low-viewed entities to somewhat better quality.
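
The comparison in Hypothesis 1 could be computed along the following lines. This is only a sketch: it assumes each edit is available as a (quality class, view class, edit type) tuple, with view classes encoded 1..5 and edit types spelled "bot", "semi-automated", "logged-in", and "anon". The record layout and the function name are hypothetical, not the format of any actual dataset.

```python
from collections import Counter

# Assumed encoding: ORES quality classes E (worst) to A (best) as ranks 1..5.
QUALITY_RANK = {"E": 1, "D": 2, "C": 3, "B": 4, "A": 5}
AUTOMATED = {"bot", "semi-automated"}


def automated_share(edits, quality_class, views_below_quality):
    """Share of bot + semi-automated edits for entities of one quality class.

    edits: iterable of (quality_class, view_class, edit_type) tuples.
    views_below_quality: True selects entities whose view class is lower
    than their quality rank; False selects aligned entities.
    Returns None when no edits fall in the selected group.
    """
    counts = Counter()
    for quality, view_class, edit_type in edits:
        if quality != quality_class:
            continue
        rank = QUALITY_RANK[quality]
        in_group = view_class < rank if views_below_quality else view_class == rank
        if in_group:
            counts[edit_type] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(counts[t] for t in AUTOMATED) / total
```

Hypothesis 1 would predict that, for quality class D or C, the value returned with `views_below_quality=True` is larger than the value returned with `views_below_quality=False`.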


Hypothesis 2: Aligned entities in a high quality class (B or A) have a higher percentage of human (logged-in user and anon) edits than aligned entities in lower quality classes. Bot and semi-automated edits thus make up a lower proportion.

Hypothesis 3: Misaligned entities in a high quality class (B or A) have a higher percentage of human (logged-in user and anon) edits than misaligned entities in lower quality classes. Bot and semi-automated edits thus make up a lower proportion.


Rationale for Hypotheses 2 and 3: It takes human (logged-in user and anon) edits to bring an entity to really high quality.
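
Hypotheses 2 and 3 compare the human edit share between high and lower quality classes, once among aligned entities and once among misaligned ones. A minimal sketch of that grouping follows. It assumes the same hypothetical (quality class, view class, edit type) tuples with view classes 1..5, and it treats "lower quality class" as E, D, or C, which is an interpretation rather than something stated above.

```python
from collections import defaultdict

# Assumed encoding: ORES quality classes E (worst) to A (best) as ranks 1..5.
QUALITY_RANK = {"E": 1, "D": 2, "C": 3, "B": 4, "A": 5}
HUMAN = {"logged-in", "anon"}
HIGH_QUALITY = {"A", "B"}


def human_share_by_group(edits):
    """Human edit share per (quality tier, alignment status) group.

    edits: iterable of (quality_class, view_class, edit_type) tuples.
    Returns a dict keyed by ("high" | "lower", "aligned" | "misaligned").
    Groups with no edits are omitted.
    """
    human = defaultdict(int)
    total = defaultdict(int)
    for quality, view_class, edit_type in edits:
        tier = "high" if quality in HIGH_QUALITY else "lower"
        status = "aligned" if view_class == QUALITY_RANK[quality] else "misaligned"
        key = (tier, status)
        total[key] += 1
        if edit_type in HUMAN:
            human[key] += 1
    return {key: human[key] / total[key] for key in total}
```

Hypothesis 2 would predict that the ("high", "aligned") share exceeds the ("lower", "aligned") share, and Hypothesis 3 the same for the two "misaligned" groups.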