Grants:Programs/Wikimedia Community Fund/Rapid Fund/Extract Moroccan legal texts semi-automatically to Wikisource (ID: 22945995)

From Meta, a Wikimedia project coordination wiki
status: Not funded
Extract Moroccan legal texts semi-automatically to Wikisource
proposed start date: 2025-04-11
proposed end date: 2025-08-31
requested budget (local currency): 10280 MAD
requested budget (USD): 1049.95 USD
grant type: Individual
funding region: MENA
decision fiscal year: 2024-25
applicant: ForzaGreen
organization (if applicable): N/A

This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the web service of Wikimedia Foundation Funds where the user has submitted their application. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.

Applicant Details

Main Wikimedia username. (required)

ForzaGreen

Organization

N/A

If you are a group or organization leader, board member, president, executive director, or staff member at any Wikimedia group, affiliate, or Wikimedia Foundation, you are required to self-identify and present all roles. (required)

N/A

Describe all relevant roles with the name of the group or organization and description of the role. (required)


Main Proposal

1. Please state the title of your proposal. This will also be the Meta-Wiki page title.

Extract Moroccan legal texts semi-automatically to Wikisource

2. and 3. Proposed start and end dates for the proposal.

2025-04-11 - 2025-08-31

4. Where will this proposal be implemented? (required)

Morocco

5. Are your activities part of a Wikimedia movement campaign, project, or event? If so, please select the relevant project or campaign. (required)

Not applicable

6. What is the change you are trying to bring? What are the main challenges or problems you are trying to solve? Describe this change or challenges, as well as main approaches to achieve it. (required)

The Arabic Wikisource and Arabic content on the internet, in general, lack legal texts despite their importance.

Available resources on the internet are mostly in PDF format, which is unstructured and difficult to process automatically.

The primary goal of this research project is to enrich Arabic legal content with Moroccan laws, while attempting to automate or semi-automate the extraction process.

7. What are the planned activities? (required) Please provide a list of main activities. You can also add a link to the public page for your project where details about your project can be found. Alternatively, you can upload a timeline document. When the activities include partnerships, include details about your partners and planned partnerships.

Methodology: use artificial intelligence tools and Python scripts to extract and structure legal texts, then upload them to Wikisource. The goal is to reduce the time needed to prepare and review legal texts manually.

Here is an article I created on Wikisource: https://ar.wikisource.org/wiki/%D9%85%D8%AF%D9%88%D9%86%D8%A9_%D8%A7%D9%84%D8%AD%D9%82%D9%88%D9%82_%D8%A7%D9%84%D8%B9%D9%8A%D9%86%D9%8A%D8%A9_(%D8%A7%D9%84%D9%85%D8%BA%D8%B1%D8%A8)

The stages of this research project are:

  • Further benchmark and test text extraction methods and tools, using Optical Character Recognition (OCR) techniques.
  • Combine OCR with Document Layout Analysis to produce structured text. Large Language Models (LLMs) may be needed to help parse the text and structure the layout.
  • Implement text extraction and processing using Python scripts.
  • Review and create articles on Arabic Wikisource, along with their categories and their Wikidata items.
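
The structuring stage above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it assumes the OCR output is plain text in which each article starts on its own line with the word "المادة" followed by a number, and the names `ARTICLE_HEADING`, `Article`, `split_articles` and `to_wikitext` are hypothetical. Real legal texts use other heading forms as well, which this pattern would not catch.

```python
import re
from dataclasses import dataclass

# Assumed heading form: a line holding only "المادة" (article) and a number.
# \d also matches Arabic-Indic digits, and int() accepts them.
ARTICLE_HEADING = re.compile(r"^[ \t]*المادة[ \t]+(\d+)[ \t]*$", re.MULTILINE)


@dataclass
class Article:
    number: int
    body: str


def split_articles(text: str) -> list[Article]:
    """Split plain OCR text into articles at each heading line."""
    matches = list(ARTICLE_HEADING.finditer(text))
    articles = []
    for i, match in enumerate(matches):
        # The body runs from the end of this heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        articles.append(Article(int(match.group(1)), body))
    return articles


def to_wikitext(articles: list[Article]) -> str:
    """Render each article as a level-2 wikitext section."""
    return "\n\n".join(f"== المادة {a.number} ==\n{a.body}" for a in articles)
```

The resulting wikitext would still need manual review before being saved to Wikisource, which matches the semi-automatic intent of the proposal.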


8. Describe your team. Please provide their roles, Wikimedia Usernames and other details. (required) Include more details of the team, including their roles, usernames, Wikimedia group, and whether they are salaried, volunteers, consultants/contractors, etc. Team members involved in the grant application need to be aware of their involvement in the project.

- Wikimedia Username: ForzaGreen, software developer and data scientist. Has experience with Wikiversity and Wikidata. Will be in charge of all the work.

If reviewing the extracted text takes too much time, I may hire task-based workers for it, though this is not yet decided.

9. Who are the target participants and from which community? How will you engage participants before and during the activities? How will you follow up with participants after the activities? (required)

The target is the Arabic-speaking community. This type of content is almost non-existent.

10. Does your project involve work with children or youth? (required)

No

10.1. Please provide a link to your Youth Safety Policy. (required) If the proposal indicates direct contact with children or youth, you are required to outline compliance with international and local laws for working with children and youth, and provide a youth safety policy aligned with these laws. Read more here.

N/A

11. How did you discuss the idea of your project with your community members and/or any relevant groups? Please describe steps taken and provide links to any on-wiki community discussion(s) about the proposal. (required) You need to inform the community and/or group, discuss the project with them, and involve them in planning this proposal. You also need to align the activities with other projects happening in the planned area of implementation to ensure collaboration within the community.

The following tasks have been done to prepare a prototype:

  • Benchmarked and tested existing OCR tools for Arabic, both open source and proprietary. The best tool seems to be Microsoft Azure Document Intelligence, because it gives very accurate results in Arabic and has useful features such as layout detection and custom model training.
  • Created and tested Python scripts for different tasks: querying OCR tools, processing and cleaning up documents, and converting documents to structured formats.
  • Searched government websites to map existing law documents. All documents are in PDF format.
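
As an illustration of the clean-up task mentioned above, here is a minimal sketch using only the Python standard library. It is not the prototype's code: the function name `clean_ocr_text` is hypothetical, and the specific rules (folding presentation forms, removing tatweel, converting Arabic-Indic digits to ASCII, dropping lines that look like "- 12 -" page numbers) are assumptions about what OCR output of these PDFs might need.

```python
import re
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character, often used for justification
ARABIC_INDIC_TO_ASCII = str.maketrans("٠١٢٣٤٥٦٧٨٩", "0123456789")
# Assumed page-number form: a line such as "- 12 -".
PAGE_NUMBER_LINE = re.compile(r"-\s*\d+\s*-")


def clean_ocr_text(text: str) -> str:
    """Normalize Arabic OCR output and drop obvious page-number lines."""
    # NFKC folds Arabic presentation forms (ligatures) back to base letters.
    text = unicodedata.normalize("NFKC", text)
    text = text.replace(TATWEEL, "")
    text = text.translate(ARABIC_INDIC_TO_ASCII)

    lines = []
    for line in text.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        if PAGE_NUMBER_LINE.fullmatch(line):
            continue
        lines.append(line)

    # Collapse runs of blank lines into a single blank line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

Whether digits should be converted at all is an editorial choice for Wikisource, so that step may need to be dropped or reversed.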

This is the prototype output: https://ar.wikisource.org/wiki/%D9%85%D8%AF%D9%88%D9%86%D8%A9_%D8%A7%D9%84%D8%AD%D9%82%D9%88%D9%82_%D8%A7%D9%84%D8%B9%D9%8A%D9%86%D9%8A%D8%A9_(%D8%A7%D9%84%D9%85%D8%BA%D8%B1%D8%A8)

12. Does your proposal aim to work to bridge any of the content knowledge gaps (Knowledge Inequity)? Select one option that most apply to your work. (required)

Language

13. Does your proposal include any of these areas or thematic focus? Select one option that most applies to your work. (required)

Public Policy

14. Will your work focus on involving participants from any underrepresented communities? Select one option that most apply to your work. (required)

Linguistic / Language

15. In what ways do you think your proposal most contributes to the Movement Strategy 2030 recommendations. Select one that most applies. (required)

Innovate in Free Knowledge

Learning and metrics

17. What do you hope to learn from your work in this project or proposal? (required)

  • Learn AI tools for Arabic
  • Organise articles in Wikisource and Wikidata
18. What are your Wikimedia project targets in numbers (metrics)? (required)
Number of participants, editors, and organizers
Other Metrics Target Optional description
Number of participants 1
Number of editors 1
Number of organizers 1
Number of content contributions to Wikimedia projects
Wikimedia project Number of content created or improved
Wikipedia
Wikimedia Commons
Wikidata 20
Wiktionary
Wikisource 20
Wikimedia Incubator
Translatewiki
MediaWiki
Wikiquote
Wikivoyage
Wikibooks
Wikiversity
Wikinews
Wikispecies
Wikifunctions or Abstract Wikipedia
Optional description for content contributions.

N/A

19. Do you have any other project targets in numbers (metrics)? (optional)

No

Main Open Metrics Data
Main Open Metrics Description Target
N/A N/A N/A
20. What tools would you use to measure each metrics? Please refer to the guide for a list of tools. You can also write that you are not sure and need support. (required)

Manual tracking.

Financial proposal

21. Please upload your budget for this proposal or indicate the link to it. (required)

https://docs.google.com/spreadsheets/d/1GXeIvDHENWeaaAQPjMz6ZAzHz_E058lBQpv6FBTSJFQ/edit?usp=sharing


22. and 22.1. What is the amount you are requesting for this proposal? Please provide the amount in your local currency. (required)

10280 MAD

22.2. Convert the amount requested into USD using the Oanda converter. This is done only to help you assess the USD equivalent of the requested amount. Your request should be between 500 - 5,000 USD.

1049.95 USD

We/I have read the Application Privacy Statement, WMF Friendly Space Policy and Universal Code of Conduct.

Yes

Endorsements and Feedback


Please add endorsements and feedback to the grant discussion page only. Endorsements added here will be removed automatically.

Community members are invited to share meaningful feedback on the proposal and include reasons why they endorse the proposal. Consider the following:

  • Stating why the proposal is important for the communities involved and why they think the strategies chosen will achieve the results that are expected.
  • Highlighting any aspects they think are particularly well developed: for instance, the strategies and activities proposed, the levels of community engagement, outreach to underrepresented groups, addressing knowledge gaps, partnerships, the overall budget and learning and evaluation section of the proposal, etc.
  • Highlighting if the proposal focuses on any interesting research, learning or innovation, etc. Also if it builds on learning from past proposals developed by the individual or organization, or other Wikimedia communities.
  • Analyzing if the proposal is going to contribute in any way to important developments around specific Wikimedia projects or Movement Strategy.
  • Analyzing if the proposal is coherent in terms of the objectives, strategies, budget, and expected results (metrics).
