Grants talk:Programs/Wikimedia Community Fund/Rapid Fund/Dr. Bir/Bihar Bengalee Association GLAM (ID: 21974356)

From Meta, a Wikimedia project coordination wiki

Initial suggestions by the South Asia Regional Funds Committee

Upon reviewing the proposal, the South Asia Regional Funds Committee has the following observations:

As reference materials from the pre-independence and immediate post-independence periods are hard to find, the project is important for preserving history. It is commendable that the GLAM organisation is ready to provide computers and lodging for the WiR at no cost.

The committee would like to note that the cost of around Rs. 8 per page is at the upper end compared to professional scanning.

Suggestions to be considered:

Is there any plan to create the scanned PDF files with OCR data? The scanner mentioned performs OCR using ABBYY software. Does it support the Bengali language? What other options are being considered?

Is there any comparison between the CZUR ET18 Pro and ET24 Pro?

The applicant has indicated around 45,000 pages to be scanned. Is this the total number of pages available at the institute, or is more material available that could be scanned in subsequent projects? This information is important for understanding the sustainability of the project.

What happens to the scanner at the end of the project? Who will take care of it? Was any effort made to obtain a scanner granted in previous projects of the WB user group?

Is there any plan to encourage participation of marginalised communities and to bridge the gender gap through this project?

How is quality control (QC) of the scanning planned? Scans often have cut-off or missing pages. Is there any consideration of having scanning and QC handled by different individuals to prevent errors due to bias?

Is there a way to increase community engagement in this programme?

What is the plan for uploading the documents to Commons? Will it be done as the work progresses or at the end of the project?

What are the approximate dimensions of the newspaper? Is it mostly text, or does it contain images too?

Will the scans be organised by date, or by some other method?

For the item “Other hardware like External Hard Drives, Data cables etc.” the amount Rs. 20,000/- seems to be on the higher side. Could you provide more details about these expenses?

The South Asia Regional Funds Committee suggests including more activities and engagement in the proposal, such as community QC, online talks giving progress updates, and a proofread-a-thon to make use of the scanned images.


On behalf of the South Asia Regional Funds Committee - J Balaji and THasan (WMF) (talk) 08:54, 10 October 2022 (UTC)

Hi South Asia Regional Funds Committee members, thank you for your response and detailed observations. Please find my responses to your queries below.

The committee would like to note that the cost of around Rs. 8 per page is at the upper end compared to professional scanning.

The proposal does not ask for INR 8 per page for scanning. The Wikimedian-in-Residence (WiR) will receive INR 150,000 in total for scanning and post-processing the entire collection of the Behar Herald. Assuming around 45,000 pages, the scanning cost works out to roughly INR 3 per page (150,000 ÷ 45,000 ≈ 3.3), which is a reasonable amount. The rest of the money is for logistics such as travel, accommodation, food and internet, which are not directly linked to the scanning process. In my opinion, it would also not be fair to include the cost of the scanner in the per-page amount, as the scanner is a one-time investment that will be used in future GLAM opportunities. Please also note that we have not considered paying the WiR per scanned page, as the number of pages can vary.

Is there any plan to create the scanned PDF files with OCR data? The scanner mentioned performs OCR using ABBYY software. Does it support the Bengali language? What other options are being considered?

The Behar Herald is an English-language newspaper, and English is well supported by all available OCR software. We plan to use ABBYY OCR with the scanner during post-processing to add the text layer. The files will also be uploaded to the Internet Archive, where the same OCR software is used.

Is there any comparison between the CZUR ET18 Pro and ET24 Pro?

A comparison between the CZUR ET18 Pro and ET24 Pro can be found at this link. The ET24 Pro would give better resolution, but at a higher cost.

The applicant has indicated around 45,000 pages to be scanned. Is this the total number of pages available at the institute, or is more material available that could be scanned in subsequent projects? This information is important for understanding the sustainability of the project.

The number of pages mentioned in the proposal is approximate, and the actual figure will differ. It was roughly estimated while cataloging the volumes during the preparatory meeting with the GLAM institute.

What happens to the scanner at the end of the project? Who will take care of it? Was any effort made to obtain a scanner granted in previous projects of the WB user group?

The scanner will be handed over to the West Bengal Wikimedians User Group at the end of the project, and they will take care of it. It will be used in future GLAM projects that the local community is currently exploring. The user group has made no past effort to obtain a scanner through any grant.

Is there any plan to encourage participation of marginalised communities and to bridge the gender gap through this project?

The scope of the project is limited to digitization, carried out by a single WiR, so there is no opportunity or plan to encourage participation from a diverse population.

How is quality control (QC) of the scanning planned? Scans often have cut-off or missing pages. Is there any consideration of having scanning and QC handled by different individuals to prevent errors due to bias?

There will certainly be some errors at the start, but they will be resolved over time. There will also be technical limitations in using the scanner, which we expect to overcome gradually. Quality control will be checked by different individuals.

Is there a way to increase community engagement in this programme?

The local community and affiliate will be engaged during the programme. The fund request includes provisions for travel, food and lodging for volunteers who will visit the scanning site and oversee the project.

What is the plan for uploading the documents to Commons? Will it be done as the work progresses or at the end of the project?

The files will be uploaded as the project progresses.

What are the approximate dimensions of the newspaper? Is it mostly text, or does it contain images too?

The newspapers vary in size. Most volumes are tabloid size, and a few are full crown size. The Behar Herald contains both text and images.

Will the scans be organised by date, or by some other method?

The scans will be uploaded by date or by volume.

For the item “Other hardware like External Hard Drives, Data cables etc.” the amount Rs. 20,000/- seems to be on the higher side. Could you provide more details about these expenses?

There are a few full-crown-sized volumes that will not fit in the proposed scanner; they need to be scanned with a different kind of scanner and may require professional help. As I am not sure how much it will cost to scan such volumes, I have requested Rs. 20,000/- in total to cover this.

The South Asia Regional Funds Committee suggests including more activities and engagement in the proposal, such as community QC, online talks giving progress updates, and a proofread-a-thon to make use of the scanned images.

We plan to present the project online at various opportunities. As Wikisource does not handle newspapers well, a proofread-a-thon is not planned for now. We will wait until Wikisource becomes technically more robust and user-friendly for newspapers.
Thanks and regards - Dr. Bir (talk) 16:22, 10 October 2022 (UTC)

In-principle recommendation of the project

Hi Dr. Bir,

Thank you for the responses. After going through them, the South Asia Regional Funds Committee would like to recommend funding for this project. Before making the final financial recommendation, the committee would like you to consider a few suggestions, both in the initial planning stages and during the implementation of the project.

The committee would like more information about the copyright status of the pages to be scanned. This information is important because admins on Wikimedia Commons might mass-delete the digitised material if it does not withstand scrutiny. It must be ensured that the pages are out of copyright in both India and the US.

We recommend that the publisher/partner organisation itself release the material under an open licence. This can be achieved through an MoU stating the intent to release the copyright, concluded before the project kicks off or as its first activity.

As scanning is a one-time exercise and the scanner will be an asset for the Wikimedians of West Bengal User Group, we recommend considering a better scanner even if it increases expenditure. The current project material consists of both text and images, so better resolution would be an added advantage.

With best wishes on behalf of the South Asia Regional Funds Committee THasan (WMF) (talk) 09:16, 12 October 2022 (UTC)


Hello South Asia Regional Funds Committee members, thank you all for your recommendation to fund the proposal.
During our preparatory meeting, we already discussed with the GLAM institute the process of releasing the newspaper volumes under a free licence through VRTS, and they have kindly agreed to the terms. As agreed, the release will be done before scanning begins.
Thank you for your kind recommendation to increase the funding for a better scanner. The scanner we proposed has been used in several Indian GLAM partnerships and has received good reviews from them. Few problems have been reported, and maintenance and repair are available in India. We have confirmed with a few vendors that the latest models, i.e. the CZUR ET24 Pro or ET25 Pro, are still not available in India and would need to be imported, which would make maintenance and repair difficult and costly. We also have no on-the-ground reviews of their performance from any of our contacts. We appreciate your offer, but I think it would be more practical to stick with the ET18 Pro for now. – Dr. Bir (talk) 16:26, 14 October 2022 (UTC)

Funding recommendation

Hi Dr. Bir,

Thank you for responding to the suggestions made by the South Asia Regional Funds Committee. We are happy to fully fund the project and wish you success with it. It would be ideal if you could provide regular updates on the progress of your project here on the talk page for the benefit of committee members.

On behalf of the South Asia Regional Funds Committee J. Balaji and THasan (WMF) (talk) 05:42, 18 October 2022 (UTC)[reply]

Request for extension of the Rapid Fund project (ID:21974356)

Hello,

I, Tanmay Bir (Wikimedia username Dr. Bir), would like to inform you that the completion date of my Rapid Fund project (ID: 21974356) is 31 October 2023, but due to some unanticipated circumstances the work is getting delayed. We have faced two significant problems: 1) the required scanner was unavailable in the Indian market at the time and was only received at the end of January 2023; 2) the scanner's capture area does not cover two pages at a time, which we did not anticipate and which is slowing down the whole work.

Despite this, we are working hard to complete the work as early as possible. We therefore request that you extend the project tenure by another six months. We will not request any additional funds for this project.

Thanking you in anticipation

Tanmay Bir Dr. Bir (talk) 07:21, 11 October 2023 (UTC)

Hello @Dr. Bir, your request has been approved. To confirm, the new grant end date is 30/04/2024 with a reporting deadline on 31/05/2024. Good luck with your project. DSaroyan (WMF) (talk) 09:10, 11 October 2023 (UTC)