Research:External Reuse of Wikimedia Content/Wikidata Transclusion/Examples/Article Sample

From Meta, a Wikimedia project coordination wiki

This random sample of 100 English Wikipedia articles was qualitatively coded by me to catalog the most common types of Wikidata transclusion on English Wikipedia, to document how these instances are recorded in the wbc_entity_usage table, and to guide the implementation of automated analyses of transclusion. The sample was generated by choosing a random float in [0, 1] and selecting the next 100 articles (namespace=0, not a redirect) with a page_random greater than that float.
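
The sampling step described above can be sketched as follows. This is a minimal illustration over an in-memory list, not the query that was actually run; the function name `sample_articles` and the dict layout are assumptions, while `page_namespace`, `page_is_redirect`, and `page_random` mirror the field names of the MediaWiki `page` table.

```python
import random


def sample_articles(pages, n=100, seed=None):
    """Pick a random threshold and return the next n eligible articles.

    `pages` is an iterable of dicts with the keys page_title,
    page_namespace, page_is_redirect and page_random (hypothetical layout).
    """
    rng = random.Random(seed)
    threshold = rng.random()  # random float in [0, 1)
    eligible = [
        p for p in pages
        if p["page_namespace"] == 0          # articles only
        and not p["page_is_redirect"]        # skip redirects
        and p["page_random"] > threshold     # strictly after the threshold
    ]
    # "Next" means the smallest page_random values above the threshold.
    eligible.sort(key=lambda p: p["page_random"])
    return threshold, eligible[:n]
```

Note that this sketch does not wrap around: if the threshold falls very close to 1, fewer than `n` articles may be returned.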

For comparison, as of July 2020, from analysis of the wbc_entity_usage table, we know that 99.9% of English Wikipedia articles have an associated Wikidata item and 61.95% of English Wikipedia articles have some evidence of Wikidata transclusion.

Results

Out of the 100 articles that I evaluated:

  • All 100 had an associated Wikidata item (an `S` aspect in the wbc_entity_usage table for that article's Wikidata item) -- this aligns with the 99.9% statistic from above.
  • 59 had aspects other than sitelinks in the wbc_entity_usage table, suggesting possible Wikidata transclusion in the article -- this aligns within a reasonable margin of error with the 62% statistic from above.
  • Out of these 59 with potential transclusion:
    • 2 (3%) were at best high-importance:
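      • Specifically, they added facts to an infobox (t1; t2), though in practice t1 just inserted citations and did not actually set values.
    • 3 (5%) were at best medium-importance: per the Full Data table, these used the Sports links, Commons Category, and USGS gazetteer templates.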
    • The remaining 54 (92%) were at best low-importance:
      • 21 (36%) of these were metadata templates: Authority Control (11), Taxonbar (9), and Infobox_anatomy (1), which does add values to an infobox but is restricted to just a few metadata links like Authority Control. Another 3 articles had authority control templates but also higher importance transclusion as detailed above and thus aren't counted here.
      • 33 (56%) of these were just tracking categories -- mainly coordinates or birth date and age.
  • Separate from these transclusion analyses (wbc_entity_usage does not track this), 40 of the 100 articles relied on the Wikidata description for supporting Search on the mobile apps (though not all of these actually had a Wikidata description to insert).
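
As a rough check on the "reasonable margin of error" claim above, the 59-of-100 result can be given an approximate 95% confidence interval. The sketch below uses the normal-approximation (Wald) interval, which is my choice for illustration and not something the original analysis specifies; `proportion_ci` is a hypothetical helper name.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# 59 of the 100 sampled articles had aspects other than sitelinks.
low, high = proportion_ci(59, 100)
print(f"{low:.3f} to {high:.3f}")  # 0.494 to 0.686
```

Under these assumptions the interval runs from roughly 49% to 69%, which contains the 61.95% figure computed from the full wbc_entity_usage table.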

Summary

In other words, this analysis suggests that instead of 62% of articles having Wikidata transcluded, the situation is much closer to:

  • Only 26% of articles have Wikidata transcluded in a way that affects the article seen by the reader. This number drops to 5% for transclusion that changes the main content of the article (excluding metadata templates) and 2% for transclusion that actually changes the facts read by the reader (i.e. changes infobox values).
  • An additional 33% of articles do not transclude content but do generate tracking categories based on Wikidata.
  • The remaining 41% of articles have no form whatsoever of transclusion.
  • Overlapping with the above numbers, 40% of articles rely on Wikidata for generating the descriptions used in search on the mobile apps.
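
The percentages in this summary follow directly from the counts in the Results section. Because the sample size is exactly 100, counts and percentages coincide. The short sketch below just makes the arithmetic explicit; the variable names are illustrative.

```python
sample_size = 100

# Counts from the Results section (articles with non-sitelink aspects).
counts = {
    "high": 2,            # infobox facts
    "medium": 3,          # other content templates
    "metadata": 21,       # Authority Control, Taxonbar, Infobox_anatomy
    "tracking_only": 33,  # tracking categories only
}

with_usage = sum(counts.values())                                        # 59
reader_visible = counts["high"] + counts["medium"] + counts["metadata"]  # 26
main_content = counts["high"] + counts["medium"]                         # 5
no_transclusion = sample_size - with_usage                               # 41

for label, value in [
    ("reader-visible transclusion", reader_visible),
    ("main-content transclusion", main_content),
    ("tracking categories only", counts["tracking_only"]),
    ("no transclusion at all", no_transclusion),
]:
    print(f"{label}: {100 * value / sample_size:.0f}%")
```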

Full Data

Page Title | Item ID | Sitelink Only | Tracking Only | Wikidata Description Used | Metadata Template | Content Template | Transclusion Category | Observations | C Aspect | L Aspect | O Aspect | S Aspect | T Aspect
Don't_Go_Breaking_My... Q30605335 1 1 None ['Q30605335:S']
Donja_Raštelica Q3036389 1 Low Shortdesc generated via infobox settlement ['Q3036389:C.P625'] ['Q3036389:S'] ['Q3036389:T']
Salió_el_Sol Q6117359 1 1 None ['Q6117359:S']
Archimantis_sobrina Q4786876 1 Taxonbar Low ['P830:C.P1630', 'P3151:C.P1630', 'P846:C.P1630', 'P5055:C.P1630', 'P6055:C.P1630', 'Q4786876:C', 'P685:C.P1630'] ['Q4786876:S'] ['Q4786876:T']
Fiafia Q5446285 1 1 None ['Q5446285:S']
Daniel_Webster Q106231 1 Authority Control Low AC + Tracking Template:Birth_date ['Q106231:C'] ['P570:L.en', 'P569:L.en'] ['Q106231:O', 'Q5:O'] ['Q106231:S'] ['Q106231:T']
1997_European_Athlet... Q18344942 1 1 None No page description on Wikidata ['Q18344942:S']
Urige_Buta Q3377623 Authority Control Sports links Medium Shortdesc exists. Tracking Template:Birth_date. External links generated via https://en.wikipedia.org/wiki/Template:Sports_links ['P8286:C.P1630', 'P1146:C.P1630', 'Q3377623:C', 'P1447:C.P1630'] ['Q3377623:L.en'] ['Q3377623:O', 'Q5:O'] ['Q3377623:S'] ['Q3377623:T']
Wolfgang_Martin_Stro... Q27045097 Authority Control Low Shortdesc exists. ['Q27045097:C'] ['Q27045097:O'] ['Q27045097:S'] ['Q27045097:T']
Olu_Irame Q93804386 1 Low Shortdesc exists. ['Q93804386:C.P18'] ['Q93804386:O'] ['Q93804386:S']
Travis_McGee Q3538234 1 1 Low External links template but used in references and tracking only ['Q3538234:C.P345'] ['Q3538234:O'] ['Q3538234:S']
Online_Bible Q2024371 1 1 None No page description on Wikidata ['Q2024371:S']
Caledonian_Brewery Q5019422 1 Infobox company High ['Q5019422:C.P112', 'Q5019422:C.P1128', 'Q5019422:C.P154', 'Q5019422:C.P155', 'Q5019422:C.P156', 'Q5019422:C.P2139', 'Q5019422:C.P2295', 'Q5019422:C.P2403', 'Q5019422:C.P3362', 'Q5019422:C.P576', 'Q5019422:C.P856', 'Q5019422:C.P946'] ['Q5019422:O'] ['Q5019422:S'] ['Q5019422:T']
Networking_cables Q387683 1 Authority Control Commons Category Medium Authority Control is only tracking; Commons Category via Wikidata too ['Q387683:C'] ['Q387683:O'] ['Q387683:S'] ['Q387683:T']
Arachno_Creek Q30066195 Authority Control Low Shortdesc exists. AC doesn't populate and coords is tracking ['Q30066195:C'] ['Q30066195:O'] ['Q30066195:S'] ['Q30066195:T']
Malinvestment Q1735646 1 1 None No page description on Wikidata ['Q1735646:S']
Steirastoma_poeyi Q14716861 Taxonbar Low Shortdesc exists ['P838:C.P1630', 'P846:C.P1630', 'P5055:C.P1630', 'Q14716861:C'] ['Q14716861:S'] ['Q14716861:T']
1927_William_&_Mary_... Q22026309 1 None Shortdesc exists ['Q22026309:S']
1919_Uruguayan_parli... Q7901496 1 1 None No page description on Wikidata ['Q7901496:S']
Abbey_of_the_Holy_Gh... Q4664087 1 1 Low No page description on Wikidata. Wrote over Wikidata Infobox template params ['Q4664087:C.P18'] ['Q4664087:O'] ['Q4664087:S']
Lucky_Lynx Q16844096 1 1 None No page description. ['Q16844096:S']
South_West_Pacific_(... Q7568867 1 1 Low Overwrote IMDb titles params ['Q7568867:C.P345'] ['Q7568867:O'] ['Q7568867:S']
Lawrence_Townsend Q15998594 1 Authority Control Low ['Q15998594:C'] ['Q15998594:O', 'Q5:O'] ['Q15998594:S'] ['Q15998594:T']
Anahuarque Q16251142 1 1 Low A bunch of unused categories from Wikidata for Infobox Mountain ['Q16251142:C.P18', 'Q16251142:C.P2044', 'Q16251142:C.P2659', 'Q16251142:C.P2660', 'Q16251142:C.P3137', 'Q16251142:C.P361', 'Q16251142:C.P625', 'Q11573:C.P5061'] ['Q11573:L.en'] ['Q16251142:O', 'Q11573:O'] ['Q16251142:S'] ['Q16251142:T']
Platycephalus_fuscus Q2128274 1 Taxonbar Low ['P3151:C.P1630', 'P846:C.P1630', 'P850:C.P1630', 'P938:C.P1630', 'P815:C.P1630', 'P5055:C.P1630', 'P685:C.P1630', 'Q2128274:C'] ['Q2128274:S'] ['Q2128274:T']
Cambarus_williami Q4848088 1 Taxonbar Low ['P830:C.P1630', 'Q4848088:C', 'P3151:C.P1630', 'P846:C.P1630', 'P850:C.P1630', 'P627:C.P1630', 'P815:C.P1630', 'P5055:C.P1630', 'P685:C.P1630', 'P6018:C.P1630'] ['Q4848088:S'] ['Q4848088:T']
Senoussi_(cigarette) Q2271221 1 1 None ['Q2271221:S']
Nuhiji Q7068790 1 None ['Q7068790:S']
Songs_of_Life_(The_G... Q7561550 1 None ['Q7561550:S']
Darren_McKenzie-Pott... Q87116242 1 Low Shortdesc exists. Sports ref overwritten ['Q87116242:C.P1447', 'Q87116242:C.P31', 'Q87116242:C.P569'] ['Q87116242:O', 'Q5:O'] ['Q87116242:S'] ['Q87116242:T']
List_of_aircraft_(Sb... Q56274214 1 None ['Q56274214:S']
Res_Gestae_Divi_Augu... Q734095 1 Low Shortdesc exists. Commons category tracking ['Q734095:S'] ['Q734095:T']
Nightrider_(chess) Q2413259 1 None ['Q2413259:S']
Afonso_V_of_Portugal Q299119 Authority Control Low ['Q299119:C'] ['Q299119:O'] ['Q299119:S'] ['Q299119:T']
Panikos_Hatziloizou Q27187370 1 Low Tracking: Birth date, NFT Player, Infobox Image ['Q27187370:C.P18', 'Q27187370:C.P2574', 'Q27187370:C.P31', 'Q27187370:C.P569'] ['Q27187370:O', 'Q5:O'] ['Q27187370:S'] ['Q27187370:T']
Bongo_Comics Q892419 1 1 None ['Q892419:S']
Gustavia_longepetiol... Q5471396 1 Taxonbar Low ['P3151:C.P1630', 'P846:C.P1630', 'Q5471396:C', 'P961:C.P1630', 'P627:C.P1630', 'P960:C.P1630', 'P5037:C.P1630'] ['Q5471396:S'] ['Q5471396:T']
1935_SANFL_season Q19576580 1 1 None ['Q19576580:S']
Cat_Orgy Q834915 1 Low IMDb episode filled. Short desc ['Q834915:C.P345'] ['Q834915:O'] ['Q834915:S']
Ulex Q393278 Taxonbar Low ['P1745:C.P1630', 'P846:C.P1630', 'P5945:C.P1630', 'P850:C.P1630', 'P2036:C.P1630', 'P5055:C.P1630', 'P815:C.P1630', 'P2752:C.P1630', 'P4728:C.P1630', 'P3101:C.P1630', 'P685:C.P1630', 'P3151:C.P1630', 'P6933:C.P1630', 'Q393278:C', 'P3031:C.P1630', 'P3240:C.P1630', 'P5037:C.P1630', 'P961:C.P1630', 'P7715:C.P1630', 'P1772:C.P1630', 'P5984:C.P1630', 'P830:C.P1630', 'P960:C.P1630', 'P838:C.P1630'] ['Q393278:O'] ['Q393278:S', 'Q8857075:S'] ['Q393278:T']
Roseate Q7368049 1 None ['Q7368049:S']
Tourism_Corporation_... Q7829088 1 1 None ['Q7829088:S']
Fever_121614 Q24877484 1 None ['Q24877484:S']
Inside_Detroit Q6037562 1 1 Low Four movie title trackers ['Q6037562:C.P1562', 'Q6037562:C.P2631', 'Q6037562:C.P345', 'Q6037562:C.P3593'] ['Q6037562:O'] ['Q6037562:S']
KKSE Q56697212 1 None ['Q56697212:S']
Philip_II,_Count_of_... Q68635 1 Authority Control Low ['Q68635:C'] ['Q68635:O', 'Q5:O'] ['Q68635:S'] ['Q68635:T']
Lyndon,_Ohio Q22133943 1 1 Low ['Q22133943:C'] ['Q22133943:O'] ['Q22133943:S'] ['Q22133943:T']
Principality_of_Lipp... Q14551680 1 1 Low ['Q14551680:C'] ['Q14551680:O'] ['Q14551680:S'] ['Q14551680:T']
They_Are_Billions Q47206077 1 1 Low Infobox Video game all overwritten ['Q47206077:C.P162', 'Q47206077:C.P179', 'Q47206077:C.P2670', 'Q47206077:C.P287', 'Q47206077:C.P3080', 'Q47206077:C.P50', 'Q47206077:C.P856', 'Q47206077:C.P880', 'Q47206077:C.P943'] ['Q47206077:O'] ['Q47206077:S'] ['Q47206077:T']
List_of_fellows_of_t... Q16000751 1 1 None ['Q16000751:S']
Epiphysis Q1801229 1 Template:Infobox_anatomy Low Template:Infobox_anatomy inserts metadata ['Q1801229:C'] ['P1323:L.en'] ['Q1801229:O'] ['Q1801229:S'] ['Q1801229:T']
João_Valente_Bank Q1710210 1 1 Low Coord and Infobox Mountain ['Q1710210:C.P18', 'Q1710210:C.P2044', 'Q1710210:C.P2659', 'Q1710210:C.P2660', 'Q1710210:C.P3137', 'Q1710210:C.P625', 'Q11573:C.P5061'] ['Q11573:L.en'] ['Q1710210:O', 'Q11573:O'] ['Q1710210:S'] ['Q1710210:T']
Clitophon Q1874192 1 None ['Q1874192:S']
Johnstown_Center,_Wi... Q6268773 1 Low Shortdesc exists ['Q6268773:C'] ['Q6268773:O'] ['Q6268773:S'] ['Q6268773:T']
List_of_mergers_in_S... Q2855252 1 1 None ['Q2855252:S']
Highbury_Union_F.C. Q21181216 1 1 None ['Q21181216:S']
Babaha Q5798173 1 Low Shortdesc per Infobox settlement ['Q5798173:C.P625'] ['Q5798173:S'] ['Q5798173:T']
Metropolis_(Anatolia... Q1497156 1 Authority Control Low ['Q1497156:C'] ['Q1497156:O'] ['Q1497156:S'] ['Q1497156:T']
Mat_Latos Q1075015 1 1 Low Birthdate and age. ['Q1075015:C.P31', 'Q1075015:C.P569'] ['Q1075015:O', 'Q5:O'] ['Q1075015:S'] ['Q1075015:T']
Woodstock_Railway Q603069 1 1 None ['Q603069:S']
Megachile_osea Q2753663 Taxonbar Low ['P830:C.P1630', 'P815:C.P1630', 'Q2753663:C', 'P846:C.P1630'] ['Q2753663:S'] ['Q2753663:T']
Public_holidays_in_J... Q1195354 1 None ['Q1195354:S']
Danish_Contemporary_... Q96375935 1 1 Low ['Q96375935:C'] ['Q96375935:O'] ['Q96375935:S'] ['Q96375935:T']
Onion_River_(Minneso... Q7093922 1 1 Low ['Q7093922:C'] ['Q7093922:O'] ['Q7093922:S'] ['Q7093922:T']
Zémidjan Q3576466 1 1 None ['Q3576466:S']
Mena_(album) Q6816326 1 None ['Q6816326:S']
Volvarina_habanera Q7941186 1 Taxonbar Low ['P846:C.P1630', 'P5055:C.P1630', 'Q7941186:C', 'P850:C.P1630', 'P6018:C.P1630'] ['Q7941186:S'] ['Q7941186:T']
Flying_High_(1931_fi... Q5463440 1 Low Movie ID templates ['Q5463440:C.P1562', 'Q5463440:C.P2631', 'Q5463440:C.P345'] ['Q5463440:O'] ['Q5463440:S']
Myponie_Point Q21883386 1 1 Low ['Q21883386:C.P625'] ['Q21883386:S'] ['Q21883386:T']
Elizabeth_Mayer Q5363177 1 Authority Control Low ['Q5363177:C'] ['Q5363177:O'] ['Q5363177:S'] ['Q5363177:T']
Notre_Dame_Catholic_... Q7063380 1 Low Coords ['Q7063380:C.P625'] ['Q7063380:S'] ['Q7063380:T']
List_of_Canadian_Hot... Q30674888 1 1 None ['Q30674888:S']
Arthur_Koegel Q4799384 1 1 None ['Q4799384:S']
Lists_of_Bulgarian_f... Q6565201 1 None ['Q6565201:S']
6th_Parliament_of_Ki... Q42878706 1 1 None ['Q42878706:S']
DK_King_of_Swing Q211858 1 1 Low Infobox video game ['Q211858:C.P2670', 'Q211858:C.P287', 'Q211858:C.P3080', 'Q211858:C.P408', 'Q211858:C.P50', 'Q211858:C.P880', 'Q211858:C.P943'] ['Q211858:O'] ['Q211858:S'] ['Q211858:T']
Benjamin_Dwyer Q22959040 1 Authority Control Low ['Q22959040:C'] ['Q22959040:O'] ['Q22959040:S'] ['Q22959040:T']
Patton_Glacier Q7148635 1 USGS gazetteer Medium ['Q7148635:C.P625', 'Q7148635:C.P804'] ['Q7148635:O'] ['Q7148635:S'] ['Q7148635:T']
McAfee_Peak Q14705212 1 1 Low Infobox mountain, coords ['Q3710:C.P5061', 'Q14705212:C.P18', 'Q14705212:C.P2044', 'Q14705212:C.P2659', 'Q14705212:C.P2660', 'Q14705212:C.P3137', 'Q14705212:C.P361', 'Q14705212:C.P625'] ['Q3710:L.en'] ['Q3710:O', 'Q14705212:O'] ['Q14705212:S'] ['Q14705212:T']
Sommatino Q477777 Authority Control Infobox Italian Comune High ['Q477777:C'] ['Q214195:L.en'] ['Q477777:O'] ['Q477777:S'] ['Q22321052:T', 'Q5637226:T', 'Q477777:T']
Pay_Takht-e_Varzard Q5835615 1 Low Infobox settlement short desc ['Q5835615:C.P625'] ['Q5835615:S'] ['Q5835615:T']
89_Albert_Embankment Q19460066 1 Low Coords tracking ['Q19460066:C.P625'] ['Q19460066:S'] ['Q19460066:T']
Acalles_carinatus Q54983455 Taxonbar Low ['P3151:C.P1630', 'P846:C.P1630', 'Q54983455:C', 'P815:C.P1630', 'P5055:C.P1630', 'P2464:C.P1630'] ['Q54983455:S'] ['Q54983455:T']
Saudia_Aerospace_Eng... Q17070128 1 1 Low ['Q17070128:C'] ['Q17070128:O'] ['Q17070128:S'] ['Q17070128:T']
FromeFM Q5505726 1 1 None ['Q5505726:S']
Clonakilty_Cowboys Q5134943 1 None ['Q5134943:S']
Gorilla_(James_Taylo... Q943151 1 None ['Q943151:S']
Union_Township,_Lawr... Q9045420 1 Low ['Q9045420:C'] ['Q9045420:O'] ['Q9045420:S'] ['Q9045420:T']
Radio_KAOS Q7280870 1 None ['Q7280870:S']
Ralph_Emery Q7287455 1 Low Birthdate + IMDb ['Q7287455:C'] ['Q7287455:O', 'Q5:O'] ['Q7287455:S'] ['Q7287455:T']
Marine_Technology_So... Q6764232 Authority Control Low ['Q6764232:C'] ['Q6764232:O'] ['Q6764232:S'] ['Q6764232:T']
ABMA Q2819016 1 None ['Q2819016:S']
Sten_Heckscher Q4126210 1 Authority Control Low ['Q4126210:C'] ['Q4126210:O'] ['Q4126210:S'] ['Q4126210:T']
Keiokaku_Velodrome Q6383792 1 1 Low Coords ['Q6383792:C.P625'] ['Q6383792:S'] ['Q6383792:T']
Mena_(given_name) Q1757257 1 None ['Q1757257:S']
Speed_skating_at_the... Q1005823 1 None ['Q1005823:S']
List_of_Gintama._Shi... Q48845176 1 None ['Q48845176:S']
The_Singing_Hotel Q17354098 1 Low IMDb title ['Q17354098:C.P345'] ['Q17354098:O'] ['Q17354098:S']
Kedrick Q96477644 1 None ['Q96477644:S']
Arbinda_Department Q657177 1 Low Coords + Infobox Settlement for shortdesc ['Q657177:C.P625'] ['Q657177:S'] ['Q657177:T']
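
The aspect columns above store values of the form `entity:aspect` with an optional modifier after a dot, for example `Q3036389:C.P625` or `P569:L.en`. A minimal sketch for splitting these strings and checking whether an article has anything beyond the sitelink (`S`) aspect is shown below. The names `Usage`, `parse_usage`, and `has_non_sitelink_usage` are hypothetical, and the format is inferred from the rows in this table only.

```python
from typing import NamedTuple, Optional


class Usage(NamedTuple):
    entity: str              # e.g. "Q3036389" or "P569"
    aspect: str              # single-letter code, e.g. "C", "L", "O", "S", "T"
    modifier: Optional[str]  # e.g. "P625" or "en"; None when absent


def parse_usage(value: str) -> Usage:
    """Split a string such as 'Q3036389:C.P625' into its parts."""
    entity, _, aspect = value.partition(":")
    code, _, modifier = aspect.partition(".")
    return Usage(entity, code, modifier or None)


def has_non_sitelink_usage(values) -> bool:
    """True if any recorded usage is something other than the S aspect."""
    return any(parse_usage(v).aspect != "S" for v in values)


# Two rows from the table above.
print(has_non_sitelink_usage(["Q5446285:S"]))  # False (Fiafia)
print(has_non_sitelink_usage(
    ["Q3036389:C.P625", "Q3036389:S", "Q3036389:T"]))  # True (Donja_Raštelica)
```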

Automated Analysis

I attempted to automate the qualitative coding above to get a more complete picture for English Wikipedia. This was not straightforward for infoboxes, so I skipped them in this analysis, but I was able to analyze usage of coordinate, authority control, taxonbar, birth date, and all external link templates. See code here: https://github.com/geohci/wikidata-transclusion/blob/master/check_tracking.py

The results are as follows:

  • 6,125,693 articles were evaluated. 1,815,995 (29.6%) of them were determined to likely have Wikidata transclusion, and an additional 1,669,478 (27.3%) were determined to likely show up in the wbc_entity_usage table only because they generate Wikidata tracking categories. The remaining results below count the number of times a template was found (a page can have many instances of a template), not the number of articles that used the template. This distinction doesn't matter for something like Authority Control but is important for something like Coordinates or the external link templates.
  • The coordinate template was used 1,841,505 times, 1,838,775 (99.9%) as tracking and 2,730 (0.1%) as transclusion.
  • The external link templates were used 3,132,833 times, of which 1,489,314 (47.5%) were transclusion and 1,643,519 (52.5%) were tracking.
  • The Taxonbar template was used 405,785 times and the Authority Control template was used 1,348,117 times. Both transcluded Wikidata almost 100% of the time (parameters were overwritten locally in just a few thousand instances, and presumably even then there was some potential for transclusion).
  • Various forms of the birth date templates showed up on 1,027,719 articles -- these templates only ever track the birth date property.
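
To illustrate the transclusion-versus-tracking distinction used in these counts, here is a deliberately simplified heuristic: a template call that supplies its own values locally is treated as tracking only, while a call that supplies none is treated as pulling its values from Wikidata. This is an assumption-laden sketch and is not taken from check_tracking.py; `classify_call` and its `ignore` parameter are hypothetical, and nested templates or links containing `|` are not handled.

```python
def classify_call(template_text, ignore=frozenset()):
    """Label one template call as 'transclusion' or 'tracking'.

    `ignore` lists named parameters that only affect layout and so do not
    count as locally supplied values (hypothetical convention).
    """
    inner = template_text.strip()[2:-2]  # drop the surrounding {{ }}
    parts = [p.strip() for p in inner.split("|")[1:]]  # skip template name
    local_values = []
    for part in parts:
        name, sep, value = part.partition("=")
        if sep and name.strip() in ignore:
            continue  # layout-only parameter
        # Positional arguments have no "=", so the text itself is the value.
        if (value if sep else name).strip():
            local_values.append(part)
    return "tracking" if local_values else "transclusion"


print(classify_call("{{Authority control}}"))                       # transclusion
print(classify_call("{{coord|57|18|N|4|27|E}}"))                    # tracking
print(classify_call("{{coord|display=title}}", ignore={"display"}))  # transclusion
```

As the Authority Control result above suggests, a locally overridden call may still transclude some values, so a binary label like this understates transclusion in those cases.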