Heritage GLAM/Rajindra Victoria Diamond Jubilee Municipal Public Library

From Meta, a Wikimedia project coordination wiki

This is a GLAM collaboration between the Wikimedia community in Punjab and the R.V.D.J. Municipal Public Library, Patiala.

Images: Rajindra Victoria Silver Jubilee Municipal Public Library; Municipal Library Patiala; Municipal Corporation Library Patiala; The Municipal Library Patiala

Languages for Books and Manuscripts to be digitized

  • Punjabi
  • Hindi
  • Shahmukhi
  • Urdu

Number of Books Present in Library

  • Punjabi - 20303
  • English - 18663
  • Hindi - 21342
  • Urdu - 5960
  • Books in other Languages: Sanskrit, Persian, Arabic

List of Public Domain and Copyright-free Books Surveyed

List of Books


Timeline

The collaboration meetings and formalities started in May 2018. Since then, the books in the institution have been surveyed several times and we have held several staff meetups. The pilot started in October 2018 and is still ongoing, digitizing rare literature and manuscripts for the Wikisource project. The initial pilot covered 50 books, which were scanned and uploaded by staff members of the institute under the direction of Wikilover90. Subsequent edit-a-thons and proofreading events were held to proofread, validate and integrate the books into Wikipedia.

For this, various meetups, training workshops and Wikisource events were organized.

Wikimedian in Residence

In order to archive open-source content for free use, we are digitizing and documenting Public Domain content in collaboration with different institutions across Punjab. To do this, we are setting up a Wikimedian-in-residence who has the knowledge, links and qualifications to lead the project and act as liaison with the GLAM partners and government institutions.

Objective

This collaboration aims to increase the availability of Public Domain Punjabi works on Punjabi Wikisource by digitizing the books available at the institution, with a focus on preserving Punjabi heritage and history. Along with that, we would like to start Hindi Wikisource, which is not yet live. We will be digitizing Urdu, Shahmukhi, and English books along with Punjabi and Hindi works.

Punjabi Wikisource

Punjabi Wikisource will be the main focus for library partners collaborating with Punjabi Wikimedians. They will provide us with books, manuscripts, and archives in Punjabi, Hindi, Shahmukhi, and Urdu. With the help of community resources (volunteers and tools), the files will be scanned, post-processed and uploaded, then indexed, proofread, validated and transcluded on Wikisource, and later integrated into Wikipedia.

What we aim to achieve

  • Important Public Domain books in Punjabi, Hindi, Sanskrit and English uploaded under a Creative Commons license.
  • Wikimedia training workshops to facilitate the digitization program and improve coordination among the Indic Wikisource community, thus strengthening the relationship between the government GLAM institutes and the Wikimedia community.
  • Teaching GLAM staff about the policies and practices of Wikimedia projects and the free copyright licenses they use.
  • Informing GLAM representatives about the possibilities and scope of the Wikimedia movement and the digitization of content under a CC license.
  • Promoting participation in Wikisource.
  • Adding content to Wikimedia Commons and Wikisource.

Challenges

  • The bookshelves are very dusty and carry strong allergens, so anyone who visits and comes into contact with the dirt needs medication. The government municipal libraries are not in good shape, and this institution has not been cleaned for 13 years.
  • There is no proper catalogue and no online catalogue, and the books are not on their stated shelves. This means a lot of manual searching through dusty shelves, but we are finding interesting and important books, which makes it worth the labor and the allergic reactions that come with it.
  • It is challenging to find bibliographical and biographical information about older Punjabi authors in online directories and archives. This work continues and involves offline archives and books containing author information.
  • Initially, there was trouble with OCR due to a lack of Linux devices and of software for Mac. That issue was recently solved for the Indic community thanks to Jay Prakash, who developed the Indic OCR Tool that is now integrated into all Indic-language Wikisources.
  • Post-processing of the books is a challenge we are still trying to solve. Initially we had difficulty finding post-processing software for Mac. Ongoing work is slower because of the accumulated post-processing work.
  • The Fujitsu SV600 scanner that the Punjabi community currently owns can scan only small books. Larger and thicker books cannot be scanned completely without the lower half of the pages being cut off. Using a Sony camera instead did not produce the right results either.
  • Poor internet bandwidth caused problems when uploading books.
  • While uploading the books, there was an issue with the underlying OCR layer that was picked up by the software. We had to rectify it by saving the files again in different formats.
  • Proofreading was a challenge. Punjabi Wikisource had fewer than 1170 pages proofread between its start at the beginning of 2017 and October 2018. For a small project like ours, still in its beginner's phase over the past two years, getting the digitized content integrated into Wiki projects was a big challenge. With persistent campaigning via social media and outreach, and by bringing in new volunteers for Punjabi Wikisource, we were able to complete this project.

Research Work

Documenting art
  • Manual search through hundreds of books to create raw data: name of book, name of author, publishing date (if stated on the book), publishing company
  • Correspondence via email with the Commissioner's office to finalize the details of the agreement
  • Search in Wikidata items through different queries to find:
  1. Authors of Punjab
  2. Authors who were born in Punjab
  3. Authors who wrote in Punjabi
  4. Authors of India
  5. Authors of India, Pakistan and British raj
  6. People of India who spoke Punjabi
  7. People of India and Pakistan not listed with the profession author
  8. People who wrote during British raj
  • Search through each Wikidata item and the attached Wikipedia article to verify the information, then cross it off the list or edit/add information to the Wikidata item
  • Search in the list of Punjabi authors and poets on Wikipedia
  • Search in the list of authors on Wikisource
  • Search in various online archives for the same information
  • Checked books such as Lekhak Sandarbh Kosh for biographical data such as authors' dates of birth
  • Consultation with various research scholars and professors for author information and book directories, and to get access to them
  • Consultation with copyright experts for Indian authors
  • Search for different online directories, archives and books for author information
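
The Wikidata searches listed above can be expressed as SPARQL queries. The following is a minimal sketch, not the queries the project actually ran. The function name `build_author_query` is hypothetical, and the identifiers used (P31/Q5 for humans, P106/Q36180 for the occupation writer, P1412/Q58635 for Punjabi as a language spoken or written, P569 and P570 for dates of birth and death) are assumptions that should be checked on Wikidata before use.

```python
def build_author_query(language_qid="Q58635", limit=100):
    """Return SPARQL text for writers linked to one language.

    All identifiers are assumptions to verify on Wikidata:
    Q58635 = Punjabi, Q36180 = writer,
    P1412 = languages spoken, written or signed.
    """
    lines = [
        "SELECT ?author ?authorLabel ?born ?died WHERE {",
        "  ?author wdt:P31 wd:Q5 ;",            # instance of: human
        "          wdt:P106 wd:Q36180 ;",       # occupation: writer
        "          wdt:P1412 wd:" + language_qid + " .",
        "  OPTIONAL { ?author wdt:P569 ?born . }",   # date of birth, if recorded
        "  OPTIONAL { ?author wdt:P570 ?died . }",   # date of death, if recorded
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "pa,en" . }',
        "}",
        "LIMIT " + str(int(limit)),
    ]
    return "\n".join(lines)


query = build_author_query()
print(query)
```

The printed text can be pasted into the Wikidata Query Service. Matching only the exact occupation "writer" will miss items recorded as poets or novelists, so a query like this is a starting point for the manual verification described above rather than a replacement for it.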

Links

List of Queries and Online Archives

Wikisource stats before the pilot project started in October 2018

          Page namespace (pages of books)                                            Main namespace (articles)
Language  All pages  Not proofread  Problematic  Without text  Proofread  Validated  All pages  With scans  Without scans  Disambig.  Percent
te            47726          13502           39          1098      33087      24314      13118        3986           9132          0    30.39
bn           708658         684743          566          6789      16560       7474       7629        7599             15         15    99.80
ta           403283         387310           24            75      15874       7804       5768        1521           4247          0    26.37
gu            13048           1372            9           280      11387       8870       5777        1550           4227          0    26.83
ml            20849          12326          130           307       8086        671       6397         717           5680          0    11.21
sa            43557          39858          147           142       3410       2080      19482         216          19266          0     1.11
kn            48767          44805           26           481       3455       1190      21035         118          20917          0     0.56
or             6932           3815            3            50       3064        530        667          96            566          5    14.50
pa             5064           3747            6            42       1269        381        138          25            113          0    18.12
as             1470            805            0             5        660        159       1365          31           1334          0     2.27
mr            17609          16922           36            11        640         24       1496           1           1495          0     0.07
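
The last column of the table above is not defined on the page. It appears to be the share of main-namespace pages that are backed by scans, leaving out disambiguation pages. The sketch below checks that reading against the printed figures. The formula is an inference from the numbers rather than a documented definition, and the names `MAIN_NAMESPACE` and `percent_with_scans` are hypothetical.

```python
# (with scans, without scans, percent as printed), copied from the table above
MAIN_NAMESPACE = {
    "te": (3986, 9132, 30.39),
    "bn": (7599, 15, 99.80),
    "ta": (1521, 4247, 26.37),
    "gu": (1550, 4227, 26.83),
    "ml": (717, 5680, 11.21),
    "sa": (216, 19266, 1.11),
    "kn": (118, 20917, 0.56),
    "or": (96, 566, 14.50),
    "pa": (25, 113, 18.12),
    "as": (31, 1334, 2.27),
    "mr": (1, 1495, 0.07),
}


def percent_with_scans(with_scans, without_scans):
    """Share of pages with scans, as a percentage (assumed formula)."""
    total = with_scans + without_scans
    return 100.0 * with_scans / total if total else 0.0


# Collect every language whose printed value disagrees with the assumed formula.
mismatches = [
    lang
    for lang, (with_scans, without_scans, printed) in MAIN_NAMESPACE.items()
    if abs(percent_with_scans(with_scans, without_scans) - printed) > 0.006
]
print("rows that do not match the assumed formula:", mismatches)
```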

Current Indic Wikisource stats in January 2019

Statistics as of Saturday, 26 January 2019, 12:01 PM

          Page namespace                                                             Main namespace
Language  All pages  Without text  Not proofread  Problematic  Proofread  Validated  All pages  With scans  Without scans      %
as             2869            29           1747            1       1092        604       1402          56           1346   3.99
bn           709814          6827         685134          572      17281       7537       7551        7521             16  99.79
gu            13887           306           1140            9      12432       9999       6001        1777           4224  29.61
kn            50281           559          45867           45       4068       1810      21056         131          20925   0.62
ml            46863          2050          12501          131      32181        688       6416         729           5687  11.36
mr            18454            11          17569           36        838         28       1528           1           1527   0.07
or             7131            57           3338            3       3733       2112        656          99            552  15.21
pa            11060           188           4491           47       6334        551        149          31            118  20.81
sa            52386           149          47305           24       4907       2892      20434         437          19997   2.14
ta           403344            86         383253           53      19952       9657       6034        1793           4241  29.71
te            49724          1165          15123           68      33368      24713      13113        3971           9142  30.28
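
To compare the two snapshots, the proofread-page counts from the October 2018 and January 2019 tables can be set side by side. This is a small sketch of that arithmetic using only the "Proofread" columns copied from the two tables. The names `PROOFREAD_OCT_2018`, `PROOFREAD_JAN_2019` and `proofread_growth` are hypothetical.

```python
# "Proofread" column of the page namespace, per language, from the two tables.
PROOFREAD_OCT_2018 = {
    "te": 33087, "bn": 16560, "ta": 15874, "gu": 11387, "ml": 8086, "sa": 3410,
    "kn": 3455, "or": 3064, "pa": 1269, "as": 660, "mr": 640,
}
PROOFREAD_JAN_2019 = {
    "as": 1092, "bn": 17281, "gu": 12432, "kn": 4068, "ml": 32181, "mr": 838,
    "or": 3733, "pa": 6334, "sa": 4907, "ta": 19952, "te": 33368,
}


def proofread_growth(before, after):
    """Return (language, pages gained, growth ratio), largest ratio first."""
    rows = []
    for lang, old in before.items():
        new = after[lang]
        rows.append((lang, new - old, new / old))
    return sorted(rows, key=lambda row: row[2], reverse=True)


growth = proofread_growth(PROOFREAD_OCT_2018, PROOFREAD_JAN_2019)
for lang, gained, ratio in growth:
    print(f"{lang}: +{gained} proofread pages ({ratio:.2f}x)")
```

On these figures, Punjabi (pa) shows the largest relative increase in proofread pages, from 1269 to 6334.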


Stats of Proofreading done between December 14, 2018 and Feb 1, 2019, making Punjabi Wikisource the fastest growing and most active community globally in terms of content and editor growth