Talk:Cite Unseen

From Meta, a Wikimedia project coordination wiki

Scales icon

The scales icon is usually associated with the scales of justice, so I would assume it means either "legal journal" or "balanced/fair source". Ed g2s (talk) 14:05, 19 May 2019 (UTC)

@Ed g2s: Ah, great point! We'll have to decide on another one. Let us know if you have any ideas. ~SuperHamster Talk Contribs 14:12, 19 May 2019 (UTC)

Multiple icons

What if a source is both questionable and a book, or is a questionable government source? Do the icons appear together as a pair? --Mr impossible (talk) 09:49, 10 June 2019 (UTC)

@Mr impossible: Yep! The script will add as many icons as identified. The screenshot on the page has an example of this (a government press release). ~SuperHamster Talk Contribs 14:54, 13 June 2019 (UTC)
@SuperHamster: So it does. My bad. Maybe the icons representing a source type (e.g. government), a medium (e.g. a blog), and trustworthiness (see above) could be made more distinct from one another, or colour could be used. That might aid clarity. I think it's slightly awkward to suggest that 'bad' sources, like the Heritage Foundation, are bad for everything. The Heritage Foundation website is presumably a good source for some basic facts about the Heritage Foundation (when it was founded, where it is based) but a poor lens through which to understand the world. That fuzziness is very hard to represent. But it's amazing to see this tagging at scale. --Mr impossible (talk) 15:40, 14 June 2019 (UTC)
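The approach described above, where each matching category contributes its own icon, can be sketched as a lookup of the citation's host against per-category domain lists. This is a minimal Python sketch, not Cite Unseen's actual code; the category names, the `categorize` helper, and all domains are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical per-category domain lists (reserved .example domains, not real data).
CATEGORIES: dict[str, list[str]] = {
    "government": ["ministry.example"],
    "press": ["press.ministry.example", "wire.example"],
    "blog": ["blog.example"],
}


def host_matches(host: str, domain: str) -> bool:
    """True if host is the listed domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def categorize(url: str, categories: dict[str, list[str]] = CATEGORIES) -> list[str]:
    """Return every category whose domain list matches the URL's host.

    Nothing stops a URL from matching several lists, so the caller would
    render one icon per returned category.
    """
    host = (urlparse(url).hostname or "").lower()
    return [
        name
        for name, domains in categories.items()
        if any(host_matches(host, domain) for domain in domains)
    ]


# A hypothetical government press release matches two lists, so it gets two icons.
print(categorize("https://press.ministry.example/release/1"))  # ['government', 'press']
```

Matching on the parsed hostname rather than a raw substring avoids false positives such as `notministry.example` matching `ministry.example`.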

Cite Unseen integrations

This is a discussion of Cite Unseen's data sources, continued from en:User talk:Sky Harbor#A barnstar for you!. — Newslinger talk 07:32, 18 October 2019 (UTC)

Media Bias/Fact Check (MBFC)

After reading through your 2018 presentation slides, I'm curious about where the data is coming from. Is all of the data from Media Bias/Fact Check (RSP entry)?

If that's the case, we'll have to somehow reconcile that with the fact that the English Wikipedia community considers MBFC generally unreliable, since it would be contradictory for Cite Unseen to rely on one data source (MBFC) that another of its data sources (RSP) classifies as untrustworthy. Perhaps giving users a choice between:

  1. Sources created by the community (including RSP and others, see below)
  2. Media Bias/Fact Check
  3. Other third-party sources (see below)

would resolve this, as Cite Unseen would show only one consistent group of data sources at a time. — Newslinger talk 19:36, 17 October 2019 (UTC)
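The proposal amounts to a setting that selects exactly one group of data sources, so ratings from groups that disagree are never mixed. A minimal sketch, assuming hypothetical group keys (`community`, `mbfc`, `third_party`) and a simple domain-to-rating mapping per group; none of these names come from Cite Unseen, and the fallback to the community group is an assumption.

```python
from enum import Enum
from typing import Optional


class SourceGroup(Enum):
    """The three mutually exclusive options from the proposal (hypothetical keys)."""
    COMMUNITY = "community"      # RSP and other community-maintained lists
    MBFC = "mbfc"                # Media Bias/Fact Check
    THIRD_PARTY = "third_party"  # other third-party sources


# Hypothetical ratings, keyed by group and then by domain.
RATINGS: dict[SourceGroup, dict[str, str]] = {
    SourceGroup.COMMUNITY: {"a.example": "generally unreliable"},
    SourceGroup.MBFC: {"a.example": "questionable"},
    SourceGroup.THIRD_PARTY: {},
}


def active_group(setting: Optional[str]) -> SourceGroup:
    """Map a stored user setting to a group; unknown or missing values fall back
    to the community lists (an assumption, not a decided default)."""
    try:
        return SourceGroup(setting)
    except ValueError:
        return SourceGroup.COMMUNITY


def rating_for(domain: str, group: SourceGroup) -> Optional[str]:
    """Look the domain up in the active group only, never in the others."""
    return RATINGS[group].get(domain)


print(rating_for("a.example", active_group("mbfc")))  # questionable
print(rating_for("a.example", active_group(None)))    # generally unreliable
```

Because a lookup never falls back to another group, an unrated domain simply shows no rating rather than one from a source the user did not choose.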

@Newslinger: Great considerations! Here are some details on the current implementation. Only the biased, conspiracy, and fakeNews arrays came from MBFC data. For MBFC-derived evaluations, we currently provide a disclaimer of sorts in the image alt text noting that the data comes from MBFC (e.g. "This source has been identified by Media Bias Fact Check as being moderately to strongly biased towards certain political causes through story selection and/or political affiliation."), so the reader can take it with whatever size grain of salt they like. Of course, alt text isn't very visible or often read, which is an issue, and people who aren't already familiar with MBFC probably wouldn't bother to look into it further. The rest of the categories (press, social, tabloids, etc.) are lists that Sky and I researched and put together ourselves (and of course they aren't comprehensive and need expansion). We compiled those lists ourselves because (a) there were no existing comprehensive sources for them and (b) those categories are much more objective and easier to compile (at least compared to determining whether something is "biased"). Granted, there are still some gray areas within each (e.g. some news sites could arguably be classified as tabloids, different government sources are subject to different levels of government intervention, etc.).
For the biased list, some backstory: we originally used MBFC for the aforementioned categories as a way to quickly gather biased and contentious sources (we were at a hackathon, after all). In the original implementation we built at CredCon, we indicated whether MBFC considered a source to lean left or right at all (which more than 90% of sources do), and displayed the left/right arrow indicators you can see on slide 11 of the slidedeck. One takeaway from other CredCon participants was that MBFC does, as you mentioned, have reliability issues, and that placing something on a left-right scale is subjective and context-dependent (geography, personal political views, etc.). For the current implementation, we decided to mark only the more extreme sources as biased: the cutoff is currently drawn at sources on the far side of the "Left" or "Right" marks on MBFC's visual scale for each source. We also dropped the left/right arrow icons (since they are context-dependent and aren't universally recognized) in favor of the generic scale icon. Even so, I (and I think Sky as well) agree that we cannot rely solely on MBFC for this in the long term, and that we should be careful about how we use their data, if at all. Speaking for myself (perhaps Sky has other thoughts), I think RSP and other Wikipedia-determined source evaluations make sense as the higher authority here, since this tool is, well, built for Wikipedia. I like the idea of toggleable options, and perhaps RSP data could be the default, with MBFC data as an opt-in option so users are consciously aware of what they're signing up for when selecting it. MBC could be nice as well (I hadn't looked at it since they added the interactive version; very cool!), and it could replace MBFC as a second enabled-by-default source for the reasons you stated below. Unfortunately, it evaluates a fairly limited number of sources, but that's better than nothing!
~SuperHamster Talk Contribs 07:58, 18 October 2019 (UTC)
Quick addendum: assuming RSP and the other wiki-compiled source evaluations are fairly comprehensive, I think it's a good idea to drop the MBFC-sourced biased array altogether, but perhaps keep the conspiracy and fakeNews lists, as they represent the most extreme and advocacy-oriented sources. ~SuperHamster Talk Contribs 08:25, 18 October 2019 (UTC)
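The alt-text disclaimer for MBFC-derived categories can be sketched by storing the external rater alongside each category instead of hard-coding it into every string. A minimal sketch, assuming the addendum's proposal (no MBFC-derived biased entry, conspiracy and fakeNews kept); the `Category` structure, descriptions, and alt-text wording are illustrative, not the script's actual strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Category:
    key: str                 # e.g. "conspiracy", matching the array names
    description: str         # what the icon means, phrased to follow "as"
    provider: Optional[str]  # external rater, or None for in-house lists


# Hypothetical configuration after the addendum: no MBFC-derived "biased" entry.
CATEGORIES = [
    Category("conspiracy", "promoting conspiracy theories", "Media Bias/Fact Check"),
    Category("fakeNews", "publishing fake news", "Media Bias/Fact Check"),
    Category("tabloids", "a tabloid", None),
]


def alt_text(category: Category) -> str:
    """Build alt text that names the external rater when there is one."""
    if category.provider:
        return (f"This source has been identified by {category.provider} "
                f"as {category.description}.")
    return f"This source has been identified as {category.description}."


for category in CATEGORIES:
    print(alt_text(category))
```

Keeping the provider as data would also make it straightforward to surface the attribution somewhere more visible than alt text, which the comment above notes is rarely read.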

Perennial sources list (RSP)

JSON is exactly what I had in mind. Are there specific types of information you would be interested in, or would you prefer a full JSON dump of en:WP:RSP? Also, would you like the data organized in a particular way?

Each object, representing one entry from the perennial sources list, could look like this:

  • id (string): Anchor name of the entry on en:Wikipedia:Reliable sources/Perennial sources
  • name (string): Name of source
  • article (string): English Wikipedia article name corresponding to source
  • aspect (optional string): Which aspect of the source the entry applies to (e.g. contributors)
  • aliases (optional array of objects)
    • name (string): Name of alias (i.e. nickname, abbreviation, former name, alternative name, name of parent company, or name of subsidiary)
    • article (optional string): English Wikipedia article name corresponding to alias
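  • status (enumerated string): Reliability classification of the source on the perennial sources list (e.g. generally reliable, no consensus, generally unreliable, or deprecated)
  • blacklisted (optional boolean): If true, the source is on the English Wikipedia spam blacklist or the Wikimedia global spam blacklist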
  • discussions (array of objects)
    • page (string): Name of page on which the discussion is located
    • section (optional string): Section name of discussion
    • label (string): Identifier for the discussion
      • For uninterrupted requests for comment on the reliable sources noticeboard, the year of the RfC
      • For other discussions on the reliable sources noticeboard, the numerical order of the discussion
      • For discussions outside of the reliable sources noticeboard, the order of the discussion, represented by a letter
    • rfc (optional boolean): If true, the discussion is an uninterrupted request for comment on the reliable sources noticeboard
    • active (optional boolean): If true, the discussion is currently active
  • lastDiscussed (number): Year of the most recent indexed discussion on the source
  • inProgress (optional boolean): If true, at least one of the discussions in the entry is currently active
  • stale (optional boolean): Entries are normally marked as stale when they have not been discussed on the reliable sources noticeboard for four calendar years. However, sources classified as generally unreliable are not marked as stale if they are identified as self-published or as publishers of user-generated content. Whether an entry is stale can usually be calculated from the value of lastDiscussed, but in some cases it needs to be overridden by this value.
    • true: Entry is stale, regardless of the value of lastDiscussed
    • false: Entry is not stale, regardless of the value of lastDiscussed
  • summary (string): Summary of previous discussions
  • seeAlso (optional string): id of a related entry
  • domains (array of strings): List of domains covered by the entry
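A consumer of this JSON would mostly need the domain list and the staleness rule. A minimal Python sketch, assuming the field names proposed above; `RspEntry` and `is_stale` are hypothetical names, the status string is a placeholder, and `is_stale` reads "four calendar years" as `current_year - lastDiscussed >= 4`, which is one interpretation rather than the list's documented formula. An explicit `stale` value always wins, as described above.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Assumption: "four calendar years" read as current_year - lastDiscussed >= 4.
STALE_AFTER_YEARS = 4


@dataclass
class RspEntry:
    entry_id: str                          # "id" in the JSON
    name: str
    status: str
    last_discussed: int                    # "lastDiscussed"
    domains: list[str] = field(default_factory=list)
    stale_override: Optional[bool] = None  # optional "stale" field

    @classmethod
    def from_json(cls, obj: dict) -> "RspEntry":
        """Read only the fields this sketch needs; aliases, discussions,
        summary and the rest are ignored here."""
        return cls(
            entry_id=obj["id"],
            name=obj["name"],
            status=obj["status"],
            last_discussed=obj["lastDiscussed"],
            domains=list(obj["domains"]),
            stale_override=obj.get("stale"),
        )

    def is_stale(self, current_year: int) -> bool:
        """An explicit "stale" value wins; otherwise derive it from lastDiscussed."""
        if self.stale_override is not None:
            return self.stale_override
        return current_year - self.last_discussed >= STALE_AFTER_YEARS


# Hypothetical entry; the status string is a placeholder, not a confirmed enum value.
sample = json.loads("""
{"id": "ExampleTimes", "name": "Example Times", "article": "Example Times",
 "status": "generally reliable", "discussions": [], "lastDiscussed": 2014,
 "summary": "Hypothetical summary.", "domains": ["example-times.example"]}
""")
entry = RspEntry.from_json(sample)
print(entry.is_stale(2019))  # True, since 2019 - 2014 = 5

# A self-published, generally unreliable source can carry "stale": false.
pinned = RspEntry.from_json({**sample, "status": "generally unreliable", "stale": False})
print(pinned.is_stale(2019))  # False
```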

This is too much information to show or use all at once, but the details (e.g. discussions and summary) could be revealed through tooltips.
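Tooltip text could be built from a handful of the fields, leaving the rest for a linked page. A sketch, assuming an entry dict shaped like the proposed JSON; the `tooltip` helper, the fields chosen, the 200-character limit, and the wording are all illustrative choices.

```python
def tooltip(entry: dict, max_summary: int = 200) -> str:
    """Compose short tooltip text from an entry shaped like the proposed JSON."""
    parts = [
        f"{entry['name']} ({entry['status']}); "
        f"last discussed in {entry['lastDiscussed']}; "
        f"{len(entry.get('discussions', []))} indexed discussion(s)."
    ]
    if entry.get("inProgress"):
        parts.append("A discussion is currently in progress.")
    summary = entry.get("summary", "")
    if len(summary) > max_summary:
        # Truncate long summaries so the tooltip stays readable.
        summary = summary[: max_summary - 1].rstrip() + "…"
    if summary:
        parts.append(summary)
    return " ".join(parts)


example = {
    "name": "Example Times", "status": "generally reliable",
    "lastDiscussed": 2019, "discussions": [{}, {}], "summary": "Short.",
}
print(tooltip(example))
# Example Times (generally reliable); last discussed in 2019; 2 indexed discussion(s). Short.
```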

Ideally, this data would be extracted by a bot and automatically updated on a regular basis once the parsing method is stable. I want to ensure that Cite Unseen has everything it needs from the extracted information, and I'm trying to make this as comprehensive as possible since the extracted data might also be useful for other tools in the future. — Newslinger talk 19:36, 17 October 2019 (UTC)

Other data sources

After the perennial sources list, you might be interested in adding other data sources to Cite Unseen. Other lists of sources maintained by the community include:

Some of these lists aren't in a normalized format, and we would need to work with their maintainers to make the lists machine-readable.

There's also Ad Fontes Media's Media Bias Chart (MBC), which is occasionally referenced by editors on the reliable sources noticeboard. I'm not sure whether it's reliable, but it's better than Media Bias/Fact Check because it has named staff, discloses its methodology, and reveals some of the data its ratings are derived from. Note that the perennial sources list conflicts with the MBC on some sources, so it wouldn't be a good idea to use both data sources at once. — Newslinger talk 19:36, 17 October 2019 (UTC)

WikiConference North America

Hi SuperHamster and Sky Harbor, how was WikiConference North America? Was Cite Unseen well-received?

There's currently a surge of high-profile disputes on the reliable sources noticeboard, and I'll be ready to continue work on the integrations after the discussions calm down. — Newslinger talk 02:04, 13 November 2019 (UTC)

Slides from WCNA 2019
@Newslinger: Hi! It went well! We had a good number of people attend our presentation, and towards the end we had a decent number of questions and ideas for the tool.
The slidedeck we used is at the right; the session was also recorded and can be viewed here (the recording for 32-141, titled "Day 1 - Session 1 - 32-141 WikiConference North America", at the 31:35 mark). One of my favorite suggestions was highlighting the superscript citation number (e.g. [2]) when a source has a potential issue. I'll have to go back to the recording and see what else was said (way too tired to remember everything right now :P) ~SuperHamster Talk Contribs 07:27, 13 November 2019 (UTC)

Previous similar work

Hi!

Thank you for this project, it’s pretty cool :)

In case the team is not already aware of it, I wanted to point out the Décodex (Q28976880), a 2017 experiment from the French national newspaper Le Monde (Q12461) to label websites as reliable, imprecise, fake news or parody, available as a web search engine or as browser extensions.

I’m not sure whether it’s still maintained, and their database is likely closed-source. But maybe there is some inspiration to be taken from it (on things to do / not to do).

Hope this helps, Jean-Fred (talk) 13:37, 31 March 2020 (UTC)