Research:Converting Wikidata into a lexical resource and knowledge database in Arabic dialects


This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


Nowadays, all Arab states are characterized by a kind of diglossia[1] · [2]. Although Modern Standard Arabic is the main language of official life, work, administration and education, Arab people do not use it spontaneously in daily communication[1]; they use Arabic dialects instead[1]. Although these dialects are considered varieties of Arabic[1], they differ from each other and from Modern Standard Arabic at the morphological, phonological, orthographic and semantic levels[3] · [4].

Examples[5]:

  • قيدت آش باش نشري في الكاغذ و مشيت للعطار ياخي ما لقيتش اللي نلوج عليه is commonly understood by speakers of Tunisian Arabic. However, it is unintelligible to an Arab person from the Middle East, although it contains no loanwords from European languages. Its Modern Standard Arabic equivalent is "لقد سجلت ما سأشتريه في الورقة و ذهبت إلى السوق و لكنني لم أجد ما أبحث عنه." and its English translation is "I wrote on the sheet everything I had to buy and went to the market. However, I did not find what I was looking for."
  • Many false friends exist between the varieties of Arabic: بندق means "pine nut" in Tunisian and "hazelnut" in Modern Standard Arabic, while بطيخ means "melon" in Tunisian and "watermelon" in Modern Standard Arabic.

Such significant linguistic differences between Arabic dialects and Modern Standard Arabic can cause readers to misunderstand an important (even if small) part of what they read in Arabic Wikipedia.

To solve this problem, we propose to translate the sum of all human knowledge into Arabic dialects by adding labels, descriptions and aliases in these dialects to all Wikidata entities. We also propose to translate Wikidata's interface into Arabic dialects so that users can reach the information they need in Wikidata without having to be proficient in Modern Standard Arabic or in a foreign language.

Methods

Adding labels and aliases: To add labels in a given Arabic dialect to Wikidata entities, I will first use the Wikidata Query Service to retrieve the Modern Standard Arabic labels of the entities concerned.
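The retrieval step could be sketched as follows. This is only an illustration under stated assumptions, not the query actually used in the project: the helper name `build_label_query`, the restriction to instances of one class through `wdt:P31`, and the default `LIMIT` are all hypothetical choices made for the example.

```python
import re


def build_label_query(class_qid: str, lang: str = "ar", limit: int = 1000) -> str:
    """Return a SPARQL query listing instances of `class_qid` with their labels in `lang`.

    Hypothetical helper: the project page does not specify the exact query.
    """
    # Reject anything that is not a plain Wikidata item ID such as "Q6256".
    if not re.fullmatch(r"Q[1-9]\d*", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    # Accept simple language codes such as "ar" or "aeb-arab".
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z]+)*", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    return (
        "SELECT ?item ?label WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} ;\n"
        "        rdfs:label ?label .\n"
        f'  FILTER(LANG(?label) = "{lang}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )
```

Calling `build_label_query("Q6256")` would produce a query for the Arabic labels of items that are instances of "country"; the resulting text can be pasted into the Wikidata Query Service and its results exported for the processing step.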

Next, I will process the retrieved data using Microsoft Office Excel 2007. I will keep the labels that are identical in the Arabic dialect and change only those that differ, with reference to the following resources:

Finally, I will add the processed data to Wikidata using QuickStatements.
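The processing and upload steps for labels can be approximated in code. The sketch below is an assumption-laden alternative to the spreadsheet workflow, not the procedure the project followed: the function name `dialect_label_commands`, the `overrides` mapping, and the language code `aeb-arab` (assumed here for Tunisian in Arabic script) are illustrative. It emits tab-separated label commands in the QuickStatements V1 style.

```python
from typing import Dict, Iterable, List, Tuple


def dialect_label_commands(
    rows: Iterable[Tuple[str, str]],
    overrides: Dict[str, str],
    lang: str = "aeb-arab",  # assumed code for Tunisian, Arabic script
) -> List[str]:
    """Turn (item ID, MSA label) rows into QuickStatements V1 label commands.

    `overrides` maps a Modern Standard Arabic label to the dialect label
    that should replace it; labels without an entry are kept unchanged.
    """
    commands = []
    for qid, msa_label in rows:
        # Keep the MSA label unless the dialect uses a different word.
        label = overrides.get(msa_label, msa_label).strip()
        if not label or "\t" in label or "\n" in label:
            continue  # skip values that would break the tab-separated format
        # "L<lang>" is the label command; the text goes in double quotes.
        commands.append(f'{qid}\tL{lang}\t"{label}"')
    return commands
```

A row whose Modern Standard Arabic label is بطيخ, for example, would need an entry in `overrides`, since that word means "melon" rather than "watermelon" in Tunisian. The returned lines can be joined with newlines and pasted into QuickStatements.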

Adding descriptions: To add a given description in a given Arabic dialect to Wikidata entities, I will use the Wikidata Query Service to find the entities that match the pattern corresponding to that description.

Next, I will process the retrieved data using Microsoft Office Excel 2007, simply adding the required description to each entity.

Finally, I will add the processed data to Wikidata using QuickStatements.
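The description workflow can be sketched in the same spirit. Everything named here is an assumption for illustration: the helpers `build_pattern_query` and `description_commands`, the idea of expressing a "pattern" as property/value pairs, the extra filter that skips entities already described in the target language, and the language code `aeb-arab`.

```python
import re
from typing import Dict, Iterable, List


def build_pattern_query(pattern: Dict[str, str], lang: str = "aeb-arab") -> str:
    """Return a SPARQL query for items matching every property/value pair in
    `pattern` that have no description in `lang` yet (hypothetical helper)."""
    lines = ["SELECT ?item WHERE {"]
    for prop, value in pattern.items():
        # Only plain property and item IDs are accepted, e.g. "P31" and "Q5".
        if not re.fullmatch(r"P[1-9]\d*", prop) or not re.fullmatch(r"Q[1-9]\d*", value):
            raise ValueError(f"unexpected pattern entry: {prop!r} -> {value!r}")
        lines.append(f"  ?item wdt:{prop} wd:{value} .")
    lines += [
        "  FILTER NOT EXISTS {",
        "    ?item schema:description ?d .",
        f'    FILTER(LANG(?d) = "{lang}")',
        "  }",
        "}",
    ]
    return "\n".join(lines)


def description_commands(
    qids: Iterable[str], description: str, lang: str = "aeb-arab"
) -> List[str]:
    """Emit one QuickStatements V1 description command ("D<lang>") per item."""
    return [f'{qid}\tD{lang}\t"{description}"' for qid in qids]
```

For a category such as "Iraqi people", the pattern might be `{"P31": "Q5", "P27": "Q796"}` (instance of human, country of citizenship Iraq); the item IDs returned by the query are then passed to `description_commands` together with the description text in the dialect.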

Translating the MediaWiki system messages: The MediaWiki system messages used in Wikidata's interface are translated into Arabic dialects using translatewiki.net.

Results

During this research project,

  • We added labels, descriptions and aliases in Arabic dialects to many Wikidata entities and properties.
  • We translated MediaWiki system messages so that the Wikidata interface can be displayed in Arabic dialects.

Added labels

Category | Modern Standard Arabic | Tunisian, Arabic Script
Iraqi people | List | List
Syrian people | List | List
Languages | List | List
Countries | List | List
Emirati people | List | List
Plants | List | List
Fruits | List | List
Colours | List | List

MediaWiki translation

Language | Status
Algerian Arabic | Statistics
Egyptian Arabic | Statistics
Moroccan Arabic | Statistics
Tunisian, Arabic Script | Statistics
Tunisian, Latin Script | Statistics

References

  1. Maamouri, Mohamed (1998). "Language Education and Human Development: Arabic Diglossia and Its Impact on the Quality of Education in the Arab Region".
  2. Zughoul, Muhammad Raji (1980). "Diglossia in Arabic: Investigating Solutions". Anthropological Linguistics 22 (5): 201–217. 
  3. "Encyclopedia of Arabic Language and Linguistics - Brill Reference". referenceworks.brillonline.com. Retrieved 2018-03-10. 
  4. Aguadé, Jordi (2006-03-05). "Writing dialect in Morocco". EDNA, Estudios de dialectología norteafricana y andalusí (in Spanish) 10: 253–274. ISSN 1187-7968.
  5. Turki, Houcemeddine; Vrandečić, Denny; Hamdi, Helmi; Adel, Imed (2017-10-30). Using WikiData as a multi-lingual multi-dialectal dictionary for Arabic dialects.