Research:Towards Modeling Citation Quality
We would like to understand and map the quality of citations and references in Wikipedia. Reference 'quality' is a broad notion that includes reliability, accessibility, neutrality, and other aspects. In this project, we want to map a set of citation quality dimensions as a step towards a complete understanding of citation quality.
Citation Quality Dimensions
Towards modeling a full, rich notion of citation quality, we are exploring the topic distribution and accessibility of citations across different languages.
First, we define a topic for each publication by:
- Collecting all articles where a publication is cited
- For articles in Wikipedia editions other than enwiki, finding the corresponding enwiki article. This is done by looking up the Wikidata item corresponding to each article, then retrieving the enwiki page linked from that Wikidata item.
- Assigning a topic to each article, using the WikiProject directory
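The topic-assignment steps above can be sketched in Python as follows. This is a minimal, offline sketch that assumes the Wikidata sitelinks and the WikiProject topic labels have already been fetched into plain dictionaries. The function names (`to_enwiki`, `publication_topics`), the input shapes, the choice to skip articles with no enwiki counterpart, and the choice to keep a per-publication topic count (rather than a single label) are all illustrative assumptions, not the project's actual code.

```python
from collections import Counter, defaultdict

# All inputs below are hypothetical in-memory stand-ins, for illustration only:
#   citations:          (doi, wiki, article_title) triples, one per citing article
#   sitelinks:          (wiki, title) -> Wikidata item ID (QID)
#   enwiki_title_of:    QID -> title of the enwiki page linked from that item
#   wikiproject_topics: enwiki title -> list of topics from the WikiProject directory


def to_enwiki(wiki, title, sitelinks, enwiki_title_of):
    """Return the enwiki title for an article, or None if there is no counterpart."""
    if wiki == "enwiki":
        return title
    qid = sitelinks.get((wiki, title))
    if qid is None:
        return None
    return enwiki_title_of.get(qid)


def publication_topics(citations, sitelinks, enwiki_title_of, wikiproject_topics):
    """Count, for each publication, the topics of the articles that cite it."""
    topics = defaultdict(Counter)
    for doi, wiki, title in citations:
        en_title = to_enwiki(wiki, title, sitelinks, enwiki_title_of)
        if en_title is None:
            continue  # assumption: articles without an enwiki counterpart are skipped
        topics[doi].update(wikiproject_topics.get(en_title, []))
    return dict(topics)


# Toy example with made-up DOIs, QIDs, and topic labels.
citations = [
    ("10.1000/a", "enwiki", "Malaria"),
    ("10.1000/a", "itwiki", "Malaria"),
    ("10.1000/a", "dewiki", "Stechmücken"),
    ("10.1000/b", "frwiki", "Paris"),
]
sitelinks = {
    ("itwiki", "Malaria"): "Q-malaria",
    ("dewiki", "Stechmücken"): "Q-mosquito",
    ("frwiki", "Paris"): "Q-paris",
}
enwiki_title_of = {"Q-malaria": "Malaria", "Q-mosquito": "Mosquito", "Q-paris": "Paris"}
wikiproject_topics = {
    "Malaria": ["Medicine"],
    "Mosquito": ["Biology"],
    "Paris": ["Geography"],
}

result = publication_topics(citations, sitelinks, enwiki_title_of, wikiproject_topics)
print(result)
```

In practice, the sitelinks could be fetched from the Wikidata API (for example, `wbgetentities` with the `sites`, `titles`, and `props=sitelinks` parameters) or from a Wikidata dump. Collapsing each count to a single topic per publication, e.g. with `Counter.most_common(1)`, is one possible choice that the description above does not specify.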
We label each publication identified by a DOI as Open Access or Closed Access as follows:
- We download the Unpaywall dataset, which contains, for each DOI, a reference to the publication's open access version, if one exists.
- We match the Unpaywall dataset against the DOIs in our data and assign an accessibility label to each entry.
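The matching step can be sketched as follows, assuming the Unpaywall data is available as JSON Lines records that carry `doi` and `is_oa` fields. The helper names (`load_unpaywall`, `label_access`), the lowercase normalization of DOIs, and the extra `"unknown"` label for DOIs missing from Unpaywall are assumptions for illustration; the description above only distinguishes Open from Closed Access.

```python
import io
import json


def load_unpaywall(lines):
    """Build a DOI -> is_oa lookup from Unpaywall JSON Lines records.

    Assumes each non-empty line is a JSON object with at least the
    "doi" and "is_oa" fields.
    """
    is_oa = {}
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        # DOIs are case-insensitive, so normalize keys to lowercase.
        is_oa[record["doi"].lower()] = bool(record.get("is_oa"))
    return is_oa


def label_access(dois, is_oa):
    """Label each DOI as "open", "closed", or "unknown" (not found in Unpaywall)."""
    labels = {}
    for doi in dois:
        key = doi.strip().lower()
        if key not in is_oa:
            labels[doi] = "unknown"  # assumption: how unmatched DOIs are handled
        elif is_oa[key]:
            labels[doi] = "open"
        else:
            labels[doi] = "closed"
    return labels


# Toy snapshot with made-up DOIs; a real snapshot would be read from a file.
snapshot = io.StringIO(
    '{"doi": "10.1000/a", "is_oa": true}\n'
    '{"doi": "10.1000/b", "is_oa": false}\n'
)
labels = label_access(["10.1000/A", "10.1000/b", "10.1000/c"], load_unpaywall(snapshot))
print(labels)
```

For a gzip-compressed snapshot, `gzip.open(path, "rt", encoding="utf-8")` yields text lines that can be passed to `load_unpaywall` directly, so the full file never needs to be held in memory as a single string.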