News on Wiki/Wikidata

From Meta, a Wikimedia project coordination wiki

Wikidata is a vital tool for the News on Wiki campaign. It permits us to add basic information about news outlets that may not meet Wikipedia's inclusion criteria (notability), and it permits us to create maps to visualize how many Wikipedia articles about news outlets exist in a certain place.

The following WikiProjects on Wikidata are related and worth exploring:

  • The Wikidata page of the English Wikipedia's WikiProject Newspapers has useful info as well.

Introduction to linking a Wikipedia article with its Wikidata entry, or starting a new Wikidata entry.

Wikidata is a hybrid between a wiki and a database.

For the purposes of News on Wiki, the most significant pages on Wikidata are those about newspapers and other news outlets. Wikidata has a lower threshold for inclusion than Wikipedia, and Wikidata has far more pages about news outlets than English Wikipedia does.

A Wikidata entry can contain a great deal of information, or very little. To get a feel for how information is organized, you might look at the entry for The New York Times (a highly detailed entry) and then at the much simpler entry for The Mountain Bugle of Nevada.

Wikidata is expected, over time, to play a greater and greater role in how information is organized on the Internet. Other web services can query it as a database, and pull out structured information.

Adding databases to Wikidata

Wikidata can offer great value simply by linking existing websites, or other online databases. For instance, suppose the following websites exist:

  • A site that has the title and circulation estimate for every newspaper in Nebraska
  • A site that lists the title and founder for every Black-owned newspaper in the U.S.
  • A site that lists the name and publisher of every daily newspaper published in the U.S. in the 19th century.

Let's imagine a newspaper called the Nebraska Moonbeam. It was Black-owned and published daily in Nebraska in the 1880s. Wikidata could (at minimum) add a link (known as an "identifier") to each of those websites, so that the Nebraska Moonbeam's item links to the relevant page on all three websites. Better yet, through a process known as "scraping," we could import the relevant information from each of those websites, so that the Wikidata item includes specific claims (in addition to the identifiers). The Nebraska Moonbeam's Wikidata item would thus have the following information:

  • Instance of = newspaper
  • Publication frequency = daily
  • Owned by = Black owner(s) (if such a property exists)
  • Start date = 1880
  • End date = 1889
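As a rough illustration of the merge described above, the following Python sketch combines records from the three imagined websites into one item that carries both identifiers and claims. The site names, URLs, record fields, and the `build_item` helper are all assumptions made up for this example; the `Item` class is a simplified stand-in and not Wikidata's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A simplified stand-in for a Wikidata item (not the real data model)."""
    label: str
    identifiers: dict = field(default_factory=dict)  # site name -> page URL
    claims: dict = field(default_factory=dict)       # property label -> value


def build_item(label, site_records):
    """Merge per-site records for one newspaper into a single item.

    site_records maps a (hypothetical) site name to a dict holding a
    "url" key plus any facts that site publishes about the newspaper.
    """
    item = Item(label=label, claims={"Instance of": "newspaper"})
    for site, record in site_records.items():
        facts = dict(record)  # copy so the caller's data is not modified
        # The link back to the source site becomes an identifier.
        item.identifiers[site] = facts.pop("url")
        # Every remaining fact becomes a claim. The first site to state
        # a property wins, so later sites never overwrite it.
        for prop, value in facts.items():
            item.claims.setdefault(prop, value)
    return item


# Made-up records for the imaginary Nebraska Moonbeam.
records = {
    "nebraska-papers": {
        "url": "https://example.org/nebraska/moonbeam",
    },
    "black-owned-papers": {
        "url": "https://example.org/black-owned/moonbeam",
        "Owned by": "Black owner(s)",
    },
    "19th-century-dailies": {
        "url": "https://example.org/dailies/moonbeam",
        "Publication frequency": "daily",
        "Start date": 1880,
        "End date": 1889,
    },
}

item = build_item("Nebraska Moonbeam", records)
for prop, value in item.claims.items():
    print(f"{prop} = {value}")
```

Keeping the identifiers separate from the claims mirrors the distinction drawn above: an identifier only points at the source page, while a claim records a fact imported from it.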

To illustrate this concept, look for the USNPL identifier on the Wikidata entry for almost any U.S. newspaper. If you find it, that's because 99of9 scraped this database during Phase 1 of News on Wiki. (For instance: visit the Portland Tribune's Wikidata entry and scroll down to near the bottom; then click the "2595" link.)

To display Wikidata identifiers on a Wikipedia article, add the {{Authority control}} template after the list of external links. After some time, the identifiers stored in Wikidata will appear on the newspaper's Wikipedia article.

Specific databases to scrape

What other databases can we add? Here are some national ones:

  • United States Newspaper Listing (USNPL) (done)
  • Chronicling America[1], a project of the U.S. Library of Congress, which uses the LCCN identifier in its URL scheme (as do some other online databases) and also uses ISSN and OCLC to uniquely identify newspapers.
  • Mondo Times
  • SmallTownPapers.com (appears to be a commercial archiving venture; it must be behind archiving projects like this one)
  • Google's newspaper archive (not sure how useful it is as a data source, though it has tons of content)
  • Newspapers.com is pay-to-play, but seems to have a strong URL scheme for its pages, and they have a ton of archives. (They're also a Wikipedia Library partner, so there might be valuable lines of communication available.)
  • Podunk.com - many newspapers listed, requires more research to see how much useful info it has.
  • Echo Media, same - needs more research.

Oregon

  • Oregon Historical Newspapers archive (Univ. of Oregon) (uses LCCN as unique ID)
  • Oregon Newspaper Publishers Association - this one could be problematic; curious what data folks think. It has tons of useful info, but it only has separate pages for General Members (not for Associate or Collegiate members, or non-members). So what happens over time if a newspaper drops its membership? Presumably, the record dies. Not sure how to handle this. (done)

Infobox newspaper

One important example of how Wikidata will shift the way that information is organized is evident within the Wikimedia world: Wikidata is increasingly used in managing the kind of infobox templates that are a high priority for this WikiProject.

  • There are many infobox templates that already rely on information as published in Wikidata. {{Infobox newspaper}} is not currently one of them, but sooner or later it probably will be.
  • On Wikimedia Commons, many categories use infobox templates that are automatically generated from Wikidata. (example)

There is an Infobox Tutorial on Wikidata.

There were 8,413 articles using {{Infobox newspaper}} as of February 16, 2020. See Link for the current count and Special:WhatLinksHere/Template:Infobox_newspaper for the current articles using this template. This infobox should include, at minimum: name=, type= (Daily, Weekly, or Monthly newspaper), foundation=, language=, ceased publication= (for defunct newspapers), headquarters= (address of newspaper), publishing_city=, publishing_country=, ISSN= (when known), oclc= (when known), and website= (when known).
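The minimum parameter list above can be turned into a simple completeness check. The Python sketch below is only an illustration: it assumes the infobox has already been parsed into a plain dictionary, and the `missing_parameters` helper and the example values are hypothetical. The "when known" parameters (ISSN=, oclc=, website=) are deliberately left out of the check because they cannot be required.

```python
# Parameters the list above says every {{Infobox newspaper}} should carry.
ALWAYS_REQUIRED = [
    "name", "type", "foundation", "language",
    "headquarters", "publishing_city", "publishing_country",
]


def missing_parameters(infobox, defunct=False):
    """Return the required parameter names that are absent or blank.

    infobox is a plain dict of parameter name -> value; parsing real
    wikitext is out of scope for this sketch.
    """
    required = list(ALWAYS_REQUIRED)
    if defunct:
        # Only defunct newspapers need an end date.
        required.append("ceased publication")
    missing = []
    for param in required:
        value = infobox.get(param)
        if value is None or not str(value).strip():
            missing.append(param)
    return missing


# Made-up, partially filled infobox for an imaginary defunct newspaper.
example = {
    "name": "Nebraska Moonbeam",
    "type": "Daily newspaper",
    "language": "English",
    "headquarters": "   ",  # blank values count as missing
}
print(missing_parameters(example, defunct=True))
```

A check like this could help prioritize which articles need infobox cleanup first.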

Query retrieval and maps

Figure: Sample image taken from the query listed here. Legend: Wikidata item, no WP article; WP article, no infobox; WP article with infobox. The map is generated by this Wikidata query. Visit the link to zoom in on cities with more than one paper, etc. Map generated August 8, 2018.

When facts are stored in databases, you can ask questions about the whole set of facts at once. One way to do this on Wikidata is with the Wikidata Query Service.

Here are some examples of queries relevant to this project:

  1. Map of all newspapers on Wikidata that have a recorded place of publication, where that place has recorded coordinates. The map is colour coded according to whether there is an en-wiki article, and if so, the link is available by clicking on the point.
  2. USA newspapers without a place of publication. Please provide P291 if you can find it.

You can customize the queries above, or make your own. A tutorial and examples are available to kick you off.
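For readers who prefer to run queries from a script, here is a minimal Python sketch that builds a SPARQL query similar in spirit to the first example and sends it to the Wikidata Query Service. It is not the query linked above: the query text, the `build_query` and `run_query` helpers, and the User-Agent string are assumptions for illustration. It relies on P31 (instance of), Q11032 (newspaper), P291 (place of publication), and P625 (coordinate location).

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(limit=50):
    """Build a SPARQL query for newspapers whose place of publication
    has recorded coordinates."""
    return f"""
SELECT ?paper ?paperLabel ?coords WHERE {{
  ?paper wdt:P31 wd:Q11032 ;      # instance of: newspaper
         wdt:P291 ?place .        # place of publication
  ?place wdt:P625 ?coords .       # coordinate location
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


def run_query(query):
    """Send the query to the Wikidata Query Service and return its rows.

    Requires network access; each row is a dict keyed by variable name.
    """
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url,
        # Hypothetical User-Agent; replace with one describing your tool.
        headers={"User-Agent": "NewsOnWikiExample/0.1 (illustrative sketch)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return data["results"]["bindings"]


print(build_query(limit=5))
```

Calling `run_query(build_query(limit=5))` would return up to five rows, from which `row["paperLabel"]["value"]` and `row["coords"]["value"]` give the newspaper's label and coordinates. The `LIMIT` keeps the result small while experimenting; raise it once the query does what you want.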

You can also generate a map of all newspapers in a given Category, if the newspapers all have coordinates in the articles. See the following example for Newspapers published in Minnesota: {{GeoGroupTemplate|article=Category:Newspapers published in Minnesota}} This will generate the box at the right. Clicking on the OpenStreetMap link in the box will bring up the map. Substitute the name of any Category you want to use.


Personalized automatically updating lists

If there is a specific subset of newspapers that you are interested in, and you can specify this with a query, you can get a personalized automatically updating list.

Here is an example by wikidata:User:Sic19 that lists a wide range of information stored in Wikidata about all Welsh newspapers. --99of9 (talk) 07:56, 10 August 2018 (UTC)

Things to do

  • Every newspaper (whether or not it's notable enough for a Wikipedia article) should have a Wikidata entry.
  • There is now a Mix'n'match set 1655 for Australian Newspapers you'd be welcome to help with. --99of9 (talk) 01:43, 7 August 2018 (UTC)

Related WikiProject on Wikidata

There is a closely related WikiProject on Wikidata; please consider reviewing their pages and/or joining that project.

References

  1. "Chronicling America". Chronicling America at US Library of Congress. Retrieved March 14, 2020.