
Wikimedia Indonesia/Hibah Riset Wikidata 2024/Building Education Infrastructure Knowledge Base in West Papua from Wikidata/Laporan

From Meta, a Wikimedia project coordination wiki

Title


Building Education Infrastructure Knowledge Base in West Papua from Wikidata

Research Team

  • Leader: Suardi Sahid - Universitas Papua - s.sahid@unipa.ac.id
  • M. Zaki - Universitas Papua - zakimkw@gmail.com
  • M. Pungkas Nurrohman - Universitas Papua - m.pungkas@unipa.ac.id

Introduction


A key challenge for school data centers is finding accurate and reliable data. Comparing datasets is necessary to see what information currently exists. School information systems are usually independent of each other, so the ability to obtain information by connecting multiple data sources is needed. In this work, we use Wikidata as the main dataset and compare it with Dapodik, the main source of school data in Indonesia. The study focuses on school data in West Papua. The results of the analysis are used to determine which properties and values from Dapodik can be added to Wikidata. To collect data, the SPARQL-based Wikidata Query Service was used to identify gaps and relevant entities in Wikidata and the Kemdikbud Portal. The data show that, out of 37 properties, only 11 can be added as new values in Wikidata. The purpose of each platform shapes which entities it displays. Other factors, namely data access and the number of contributors, also contribute to incomplete school data for West Papua in Wikidata.

Dapodik is the data collection system for Indonesian education; it collects comprehensive educational data across various levels and can be integrated into broader Open Data platforms [1]. Wikidata, a free and open knowledge base run by Wikimedia, provides a sustainable solution for storing and representing Open Data. Wikidata's guiding principles are very similar to those of open data, which allows public datasets such as those from Dapodik to be integrated into a global and easily accessible knowledge base. Wikidata has become the largest structured crowdsourced knowledge base on the web [2]. With its ability to integrate data from multiple sources, it serves as a powerful tool for knowledge management and data integration. Its structured format and rich linking capabilities make it an ideal resource for machine learning and artificial intelligence applications. Despite the massive amount of data, most research and industrial use cases require only a subset of items, statements, and metadata.

Several works have examined the use of Wikidata. Shafee et al. [3] present ten quick tips for editing Wikidata and explain that information in Wikidata is organised into statements; their work introduces reification mechanisms based on authoritative namespaces and (partially ambiguous) natural language definitions. Beghaeiraveri [2] states that Wikidata is highly interlinked and connected to many other datasets, but that it is also very rich, complex, and not available in RDF. Finally, Baker et al. [4] describe how Wikidata has been used for bibliographic information and provide some scientometric statistics on this information.

The purposes of this research are (1) to assess the current state of education infrastructure data about West Papua in Wikidata; and (2) to demonstrate how open data like Wikidata can support equitable development, especially in areas that lack resources.

Methods


This research applied a qualitative descriptive approach using secondary data sources. Secondary sources describe, summarize, or analyse information originally presented in another source, meaning that the author, in most cases, did not participate in the event. This type of source is written for a broad audience and includes definitions of discipline-specific terms. Secondary sources are used to obtain an overview of a topic and to help identify relevant primary data sources.

Overview of Wikidata Process


Wikidata operates as a collaborative, crowdsourced knowledge database where anyone can contribute or edit the data, even without an account. It is a document-oriented database organised around items, each of which represents a distinct topic, concept, or object. Each item has a unique persistent identifier, a positive integer prefixed with the uppercase letter Q, known as the "QID" (Wikidata Q identifier). In this sense, the items in Wikidata can be likened to documents. Each Wikidata item contains structured data in the form of statements, which define relationships with other documents, items, or values.
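The item-and-statement model described above can be illustrated with a small sketch. The `Statement` and `Item` classes and the `is_qid` helper are hypothetical names introduced here for illustration only; they are not part of any Wikidata library, and the school QID is a placeholder.

```python
import re
from dataclasses import dataclass, field
from typing import List

# A QID is the uppercase letter Q followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_qid(value: str) -> bool:
    """Return True if value has the shape of a Wikidata item identifier."""
    return QID_PATTERN.fullmatch(value) is not None


@dataclass
class Statement:
    """One statement on an item: a property ID and its value."""
    property_id: str  # e.g. "P131"
    value: str        # another item's QID, or a literal such as a string


@dataclass
class Item:
    """Hypothetical in-memory stand-in for a Wikidata item."""
    qid: str
    statements: List[Statement] = field(default_factory=list)

    def values(self, property_id: str) -> List[str]:
        """Collect every value recorded for one property."""
        return [s.value for s in self.statements if s.property_id == property_id]


school = Item(
    qid="Q999999999",  # placeholder QID, not a real school item
    statements=[
        Statement("P131", "Q5096"),  # administrative area: West Papua, as used in this report
        Statement("P17", "Q252"),    # country: Indonesia
    ],
)

print(is_qid(school.qid), school.values("P131"))
```

Keeping statements as a flat list mirrors the fact that one item may carry several values for the same property.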

Overview of Dapodik Process


Dapodik (Data Pokok Pendidikan) is the official school data management system developed by the Indonesian Ministry of Primary and Secondary Education. It consolidates national school-level data across Early Childhood Education (PAUD), Basic Education, and Secondary Education (Dasmen), with the data divided by level: Elementary School, Middle School, High School, and Vocational School. The data are updated annually, typically at the end of November (for example, data for the 2022/2023 school year are updated at the end of November 2022).

Data Sources

  • Wikidata: Data was retrieved using the Wikidata Query Service and SPARQL (SPARQL Protocol and RDF Query Language). The main query used the property “located in the administrative territorial entity (P131)” with the value “West Papua (Q5096)”, and the results were returned in tabular form. The output was exported through web services in several formats, including JavaScript Object Notation (JSON) and Resource Description Framework (RDF).
  • Dapodik: To access school-level data in West Papua via Dapodik, users must hold a valid Dapodik account and log in through the online portal or the Dapodik application. School data can be accessed via prefill or by logging into the Dapodik application, using the name of the school as the search keyword. The data can be viewed directly in the platform or downloaded in .XLS, .CSV, or .PDF format.
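The Wikidata retrieval step can be sketched as follows. This is a minimal illustration, not the exact query used in the study: it builds a SPARQL query for items whose P131 value is Q5096 and the corresponding request URL for the public Wikidata Query Service endpoint. The function names, the optional class filter, and the use of Q3914 (school) as that filter are assumptions added here.

```python
from typing import Optional
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(area_qid: str, class_qid: Optional[str] = None, limit: int = 100) -> str:
    """Build a SPARQL query for items located in one administrative area."""
    lines = [
        "SELECT ?item ?itemLabel WHERE {",
        f"  ?item wdt:P131 wd:{area_qid} .",
    ]
    if class_qid is not None:
        # Assumption: restrict results to instances of a class or its subclasses.
        lines.append(f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .")
    lines.append('  SERVICE wikibase:label { bd:serviceParam wikibase:language "id,en". }')
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


def build_request_url(query: str) -> str:
    """Encode the query as a GET request that asks for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_query("Q5096", class_qid="Q3914")  # Q3914 (school) is an assumed filter
url = build_request_url(query)
print(query)
```

The pattern `wdt:P131 wd:Q5096` only matches items placed directly in the province; items recorded under a regency or district would need a property path such as `wdt:P131*`, which is left out here because the report does not state which form was used.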

Research Timeline

| Activity | Months marked "carried out" (schedule: Nov to May) |
|---|---|
| Preparation, meetings, and team formation | 1 |
| Instrument development and data collection planning | 1 |
| Field data collection | 2 |
| Data processing | 2 |
| Finalization and data compilation | 2 |
| Publication | 2 |

This research was conducted from November 2024 to May 2025, approximately six months in total. It began with the formation of the research team in November, followed by the compilation of the instrument in December. In January we reviewed the literature and began data processing. February, March, and April were devoted to data processing and to editing school data. Because the research still requires time for revision and data processing, the six-month duration was not sufficient to bring it to the publication stage. Therefore, in this final stage we present the final data that the team has compiled.

Results and Discussion


Dapodik is the main source used as comparison data to determine how many schools in West Papua are recorded in Wikidata relative to the number recorded in Dapodik. Through this identification, we can see which school data have not yet been recorded in Wikidata. To obtain the data from Dapodik, we used the portal to look up schools by name or by province.

In particular, this paper makes the following contributions.

  • We analysed existing school data in Dapodik and mapped them to Wikidata to identify the relevant entities and properties, especially for schools in West Papua.
  • We surveyed and verified existing school records by visiting the schools and/or the local Education Quality Assurance bureau (Balai Penjaminan Mutu Pendidikan).
  • We updated Wikidata records related to schools in West Papua.
  • We assessed the completeness of Wikidata records by comparing data in Dapodik with Wikidata, using an adapted version of Wikidata's existing Entity Schemas for elementary, middle, and high schools.
  • We report the updated Wikidata records for West Papua in this paper.
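The comparison step in the list above can be sketched as a set difference over a shared school identifier. This is an illustrative sketch only: the `school_id` key, the function name, and the sample rows are hypothetical, and it assumes both exports carry the same identifier for each school.

```python
from typing import Dict, Iterable, List, Optional


def compare_by_identifier(
    dapodik_rows: Iterable[Dict[str, Optional[str]]],
    wikidata_rows: Iterable[Dict[str, Optional[str]]],
    key: str = "school_id",
) -> Dict[str, List[str]]:
    """Split identifiers into: missing from Wikidata, only in Wikidata, in both."""
    # Rows without an identifier cannot be matched, so they are skipped.
    dapodik_ids = {row[key] for row in dapodik_rows if row.get(key)}
    wikidata_ids = {row[key] for row in wikidata_rows if row.get(key)}
    return {
        "missing_from_wikidata": sorted(dapodik_ids - wikidata_ids),
        "only_in_wikidata": sorted(wikidata_ids - dapodik_ids),
        "in_both": sorted(dapodik_ids & wikidata_ids),
    }


# Made-up sample rows; identifiers and names are placeholders.
dapodik_rows = [
    {"school_id": "A001", "name": "School A"},
    {"school_id": "A002", "name": "School B"},
    {"school_id": "A003", "name": "School C"},
]
wikidata_rows = [
    {"school_id": "A001", "item": "item-1"},
    {"school_id": "A003", "item": "item-2"},
    {"school_id": "A009", "item": "item-3"},
    {"school_id": None, "item": "item-4"},  # item without the identifier
]

result = compare_by_identifier(dapodik_rows, wikidata_rows)
print(result["missing_from_wikidata"])
```

Matching on an identifier rather than on the school name avoids false mismatches caused by spelling variants, although it cannot detect Wikidata items that exist but lack the identifier.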

As a result of this project, we have updated 759 records, synchronising them with the primary sources, the Sekolah Kita and Dapodik databases. This included validating four basic Wikidata properties for elementary schools in the records, as follows:

The Entity Schema for schools in Sweden is shown below. It was used as an example for creating a new Entity Schema for school data in Indonesia. Based on the findings of this research, we plan to create the new Entity Schema with additional entries for Curriculum and Special Need Served.

| Schema element | Entity Schema School in Sweden | Indonesian Entity School Plan |
|---|---|---|
| Namespace | <http://www.wikidata.org/entity/> | <http://www.wikidata.org/entity/> |
| Start shape | start = <@schoolunit> | start = <@schoolunit> |
| instance of school unit | wdt:P31 [ wd:Q88965416 ] ; | wdt:P31 [ wd:Q97042318,Q97500812,Q97595888 ] ; |
| identifier | wdt:P7894 xsd:string ; | wdt:P4128 xsd:string ; |
| administrative area | wdt:P131 .{1} ; | wdt:P131 .{1} ; |
| country | wdt:P17 .{1} ; | wdt:P17 .{1} ; |
| Sekolah Kita | - | wdt:P5955 .{1} ; |
| dapodik | - | wdt:P5884 .{1} ; |
| Curriculum | - | .{1} ; |
| Special Need Served | - | .{1} ; |
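To make the planned Indonesian schema more concrete, the following sketch checks a record against its constraints in plain Python. It is not a ShEx validator: it only mirrors the "exactly one value" rules and the allowed P31 classes listed in the schema above. The function name, the record layout (a dictionary from property ID to a list of values), and the sample values are assumptions, and the Curriculum and Special Need Served entries are omitted because the plan does not yet name properties for them.

```python
from typing import Dict, List

# Allowed school classes, copied from the planned Indonesian schema.
ALLOWED_CLASSES = {"Q97042318", "Q97500812", "Q97595888"}

# Properties the planned schema expects exactly once per school item.
EXACTLY_ONE = ["P31", "P4128", "P131", "P17", "P5955", "P5884"]


def check_record(claims: Dict[str, List[str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for pid in EXACTLY_ONE:
        count = len(claims.get(pid, []))
        if count != 1:
            problems.append(f"{pid}: expected exactly 1 value, found {count}")
    for value in claims.get("P31", []):
        if value not in ALLOWED_CLASSES:
            problems.append(f"P31: {value} is not an allowed school class")
    return problems


# Sample records with placeholder identifier values.
complete = {
    "P31": ["Q97042318"],
    "P4128": ["placeholder-id"],
    "P131": ["Q5096"],
    "P17": ["Q252"],
    "P5955": ["placeholder-id"],
    "P5884": ["placeholder-id"],
}
incomplete = {
    "P31": ["Q3914"],
    "P131": ["Q5096"],
    "P17": ["Q252"],
}

print(check_record(complete))
for problem in check_record(incomplete):
    print(problem)
```

Extra properties on a record are ignored, which matches the way the schema only constrains the properties it lists.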

1. Overview of Wikidata and Dapodik School Data


According to the Dapodik system, in 2025 there are 1,362 recorded schools in West Papua across the education levels listed in Table 1. The database has thirty-seven categories, divided into three main sections: Profile, Recapitulation, and Address (Figure 1.a). In contrast, Wikidata provides 12 public data entries with QIDs, as shown in Figure 1.b.

Fig. 1 Comparison between Wikidata Items vs Dapodik Items


School Data


Table 1. School data between Dapodik and Wikidata in 2025

| Platform | TK/KB | SD | SMP | SMA | Total |
|---|---|---|---|---|---|
| Dapodik | 541 | 583 | 167 | 71 | 1362 |
| Wikidata | - | 522 | 167 | 71 | 759 |
| Difference (Dapodik-Wikidata) | 541 | 61 | - | - | 603 |

Table 1 presents a comparison of the number of schools by education level on the two data platforms, Dapodik and Wikidata, in 2025. In general, Dapodik recorded a higher number of schools than Wikidata, with a total of 1,362 schools, while Wikidata recorded 759 schools. The overall difference is 603 schools.
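The figures in Table 1 can be recomputed with a short sketch. The variable names are introduced here for illustration, and treating the empty Wikidata TK/KB cell as zero is an assumption. The per-level Wikidata figures as printed sum to 760, one more than the reported total of 759, so the sketch keeps the reported totals separate from the per-level cells.

```python
# Per-level counts as printed in Table 1.
dapodik = {"TK/KB": 541, "SD": 583, "SMP": 167, "SMA": 71}
wikidata = {"SD": 522, "SMP": 167, "SMA": 71}  # TK/KB cell is empty in Table 1

# Totals as reported in the text.
dapodik_total = 1362
wikidata_total = 759

# Overall gap and the share of Dapodik schools that have a Wikidata record.
total_gap = dapodik_total - wikidata_total
coverage_percent = round(100 * wikidata_total / dapodik_total, 1)

# Per-level gap, assuming an empty Wikidata cell means no recorded schools.
level_gap = {level: count - wikidata.get(level, 0) for level, count in dapodik.items()}

print(total_gap, coverage_percent, level_gap)
```

On these figures, the overall gap is dominated by the TK/KB level, with elementary schools (SD) accounting for the rest.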

Overall, the data show a persistent gap between the two systems: Dapodik shows a consistent increase in school records, while Wikidata reflects a decrease, especially in providing comprehensive and up-to-date data. This may affect the accuracy of geospatial analyses or of any project that depends on Wikidata for data. These variations highlight the difficulty of ensuring the accuracy and completeness of educational data across different platforms. Solving these problems is important for improving the planning of educational programs and the allocation of resources. Doing so requires better synchronization and validation processes, so that the information is correct and up to date.

2. Factors affecting the completeness of school data in Wikidata.


According to the data analysis, the completeness of school data is influenced by four primary factors: (1) immediacy: not every editor takes ownership of the accuracy of the information in Wikidata, because the database is a non-profit, volunteer-edited website; (2) public information: not every Dapodik statement can be added to Wikidata as a new entity or property; (3) access: obtaining valid school data is itself a major problem; and (4) contributors: the number of contributors updating school data in Wikidata is limited, especially for schools in West Papua.

3. An overview of how Wikidata builds education infrastructure in West Papua.


Wikidata serves as a central repository for structured data, allowing all Wikimedia projects to access and share consistent information. As an open access platform, it accepts data from several sources, with references to support validation. Figure 2 shows Wikidata's role in presenting school data in West Papua.

Fig. 2 Wikidata's Role in Presenting School Data in West Papua

In presenting school data for West Papua Province, Wikidata plays a role not only in completing the information on Wikipedia but also in presenting school data itself, especially school coordinates. To obtain school data, the Wikidata Query Service can be used to retrieve the school records. The data presented on Wikidata are multilingual and can be customized by language preference. The collaborative model of Wikidata enables a global editing community to contribute, with changes tracked and referenced for validation [5]. This democratization of data creation fosters greater inclusivity and addresses representation gaps in mainstream datasets.

Furthermore, Wikidata's integration with external tools such as Scholia offers enhanced visualization capabilities, transforming structured data into interactive profiles and graphs. Originally developed to support scholarly communication, Scholia’s success illustrates how tools built on Wikidata’s infrastructure can be repurposed to display educational statistics, institutional profiles, or school-specific data[5].

In contrast to proprietary or government systems like Dapodik, which centralizes school data but limits public access and reusability, Wikidata provides an open, reusable knowledge graph. Dapodik has proven instrumental for government planning, offering structured categories such as teacher data, infrastructure, and enrollment [6] [7]. However, Dapodik is constrained by access policies and system limitations that hinder data interoperability. Several studies show that while Dapodik’s system quality and data reliability are high, its impact on local data transparency is limited by operator competence and technical infrastructure[8].

Building education infrastructure in Wikidata involves more than uploading data; it also involves aligning local education realities with global linked data standards. The challenge lies in mapping local properties (e.g., Dapodik IDs, curriculum models, or special needs support) to Wikidata’s existing property schema, or in creating new properties through community proposals. As the research on operator performance in Jember Regency highlights, infrastructure quality and digital competence significantly affect data accuracy and update frequency [8]. This observation is consistent with issues in West Papua, where limited internet access, low digital literacy among school staff, and insufficient contributor engagement result in underrepresentation of education data.

Conclusion and Recommendations


This paper found that not all Dapodik data are publicly available. The Dapodik administrator can edit through the portal; however, schools are unable to access certain school information, and access is opened only once a year, at the end of November. In contrast, anyone can contribute information to Wikidata because it is open access. In addition, data completeness is affected by the mix of public and private information in Dapodik: only 11 of the 37 properties in Dapodik can be added to Wikidata as properties. In West Papua, there are 821 public and private schools ranging from elementary to high school, and 61 of them are not listed in Wikidata. The role of the community in supporting the completeness of school data in rural areas, especially in West Papua, is important.


This research is intended to encourage the public to contribute valid and complete data to Wikidata, an open resource that anyone can access. We hope the data from this research can be used to update school data in West Papua, especially kindergarten data. The study still needs to be developed further through case studies on adding Entity Schemas for school data, in order to improve the completeness of school information on Wikidata.

References

  1. Dapodik (2025). "Data Induk Satuan Pendidikan". Portal Data.
  2. Beghaeiraveri, S (2021). "Experiences of Using WDumper to Create Topical Subsets from Wikidata".
  3. Shafee, T (2023). "Ten quick tips for editing Wikidata". PLOS Computational Biology.
  4. Baker, J (2024). "I have always found the whole area a minefield: Wikidata, historical lives, and knowledge infrastructure". PLOS Computational Biology.
  5. Lemus-Rojas, M (2018). "Creating structured linked data to generate scholarly profiles: A pilot project using Wikidata and Scholia". Journal of Librarianship and Scholarly Communication.
  6. Rusnati, I (2022). "Pemanfaatan sistem data pokok pendidikan (Dapodik) dalam pengelolaan sekolah dasar". Jurnal Administrasi Pendidikan.
  7. Yunis, R (2017). "Analisis kesuksesan penerapan sistem informasi data pokok pendidikan (Dapodik) pada SD Kabupaten Batu Bara". IJCCS.
  8. Qomariah, N (2024). "The influence of work facilities and competence on the performance of Dapodik operators at public middle schools in Jember Regency". International Journal of Management Science and Information Technology.