What is a short name for the activity/project you are proposing?
Develop tools to add DOI information to Chinese articles
- your name and/or Wikimedia username
- your contact e-mail
- your nearest city and country
South Bend, Indiana, USA
Details of team members (optional)
If this application is for a team project, each additional team member (up to 5 total) should have their name/username, project role, location, and contact details here:
Tell us about your proposed project. What will you achieve with this time?
Describe why you think this project is important, and how the project is different from your normal volunteer contributions.
Although China has become the world's largest producer of scholarly articles, bibliographic data about Chinese articles are still very limited in Wikidata due to cultural and language barriers. Over the past one to two years, I have created about 1 million items about Chinese journal articles based on information from CNKI (Q12857515), which includes the most comprehensive database of Chinese articles. However, a key limitation of CNKI is that it lacks DOI information for articles (except in the few cases where the resolved webpages are hosted by CNKI itself). Because the DOI is the most important identifier for scholarly articles, adding DOIs to the items for Chinese articles would be very beneficial, and that is the aim of this project.
The DOIs are typically displayed on the websites of individual journals. There are no APIs, so web scrapers need to be developed to fetch them. In many cases, websites of different journals from the same publisher are built from the same template, so there is no need to develop a script for every single journal. Developing scrapers for several top publishers is already enough to cover many journals. If time permits, I would like to write scripts for as many publishers as possible.
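As a rough illustration of the scraping step, here is a minimal Python sketch that pulls a DOI out of an article page that has already been downloaded. It is only a sketch under stated assumptions, not the scripts this project would deliver. The meta tag names in `META_NAMES`, the loose DOI pattern, and the function name `extract_doi` are all hypothetical, and each publisher template would likely need its own adjustments.

```python
import re
from html.parser import HTMLParser

# Loose DOI pattern (an assumption: real DOIs may contain other characters).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

# Hypothetical meta tag names that a journal template might use for the DOI.
META_NAMES = {"citation_doi", "dc.identifier", "doi"}


class DoiMetaParser(HTMLParser):
    """Collect the content of <meta> tags whose name suggests a DOI."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in META_NAMES:
            self.candidates.append(attr.get("content") or "")


def extract_doi(html):
    """Return the first DOI found in the page, or None if there is none."""
    parser = DoiMetaParser()
    parser.feed(html)
    # Prefer DOI-like meta tags, then fall back to scanning the whole page.
    for text in parser.candidates + [html]:
        match = DOI_PATTERN.search(text)
        if match:
            # Drop trailing punctuation picked up from running text.
            return match.group(0).rstrip(".,;")
    return None


if __name__ == "__main__":
    page = '<head><meta name="citation_doi" content="10.1234/abcd.2020.001"></head>'
    print(extract_doi(page))
```

Because journals from the same publisher often share a template, a function like this could in principle be reused across all of that publisher's journals, with only the page URLs changing.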
Describe how you (and, if applicable, your team) are able to achieve this project. What skills, expertise, and motivation do you have which will enable you to succeed?
I am a long-time Wikidata editor who has been contributing since the project's launch. As far as I know, I am the only editor who has been systematically creating items about Chinese articles. Because of this experience, I am very familiar with Wikidata in general, and with Wikidata's bibliographic data in particular.
As mentioned above, scripts need to be developed to fetch DOI information, which requires some programming skills. As a PhD student in a STEM field, I am confident that I have the skills required to develop such tools and complete this project.
Proposed activity dates
When will you undertake this project? (This may be two, three or four days. Not necessarily consecutive). The latest allowable date is 1 May 2021.
Four days in Dec 2020 and/or Jan 2021.
Optional: Community members are encouraged to endorse your proposal and leave a rationale here.