Research:Mathematical literacy and editing topics on Mathematics on Wikipedias

14:12, 6 May 2017 (UTC)
João Alexandre Peschanski
Duration:  2017-April — 2017-October

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.

This research project aims to relate countries' levels of mathematical literacy to the quality and quantity of contributions from those countries to entries on Mathematics on Wikipedias. This will provide a non-anecdotal understanding of how structural constraints, such as cross-national differences in education and scientific culture levels, affect users' editing strategies and, eventually, the quality of different Wikipedias as reliable resources on science.

The general intuition is that the lower the level of numeracy in a country, the lower the quality and quantity of edits on entries on Mathematics that originate from that country; we want to test this intuition. To our knowledge, the sole academic piece that has associated scientific culture and Wikipedia quality has major methodological flaws and does not provide reliable evidence of this association.
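As a minimal sketch of how this intuition could be tested, the snippet below computes a Spearman rank correlation between a numeracy score and an edit count per country. The choice of Spearman's coefficient is our assumption for illustration, not a method stated by the project, and the country labels and numbers are invented placeholders, not PISA results or Wikipedia data.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented placeholder values for four hypothetical countries A-D.
numeracy_score = [380, 420, 490, 530]
math_edit_count = [120, 90, 400, 610]
print("Spearman rho:", spearman(numeracy_score, math_edit_count))
```

A rank-based coefficient is used here because it does not assume a linear relationship between scores and edit counts; any other measure of association could be substituted.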

A motivation for this study is that Brazil, from where this research is being proposed, ranks among the countries with the lowest levels of numeracy in several cross-national studies, particularly OECD's PISA. As community members we have been deeply involved in improving entries on Mathematics on Wikipedia in Portuguese (results of our effort are published on two programs at the Outreach Dashboard[1][2]), as we believe a higher-quality Wikipedia in terms of scientific content might also have an impact on a country's scientific-culture level. Our research team has funding support from the São Paulo Research Foundation (FAPESP) and the University of São Paulo. The research project that FAPESP has accepted to fund is available on the Commons.[3]


Data collection

Mathematical culture levels

We will focus on the 70 countries assessed by OECD's 2015 PISA and use its data as the proxy variable for mathematical culture levels in these countries.[4] There are known methodological caveats with these data, which will be mentioned in our paper, but to our knowledge this is the most reliable cross-national dataset on scientific culture, including numeracy.

The use of the PISA dataset will be facilitated by an OECD member. NeuroMat will host a conference on numeracy in mid-May, with a keynote speaker from the OECD who has agreed to provide training on how to operationalize the PISA dataset.[5]

Editing topics on Mathematics

The strategy for getting data on the quality and quantity of edits on Mathematics that originate from each country assessed by PISA has been planned as a sequence of steps. The first step is to list all entries on Mathematics on all Wikipedias. We have tested three different strategies for generating this dataset and have settled on the one we think is the most reliable. This step is mostly done, and as soon as our dataset is completely ready we will make it available to the community. Preliminary results were sent to Dr. Aaron Halfaker (WMF research team), who has provided early suggestions on how to generate this dataset.
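The three strategies we tested are not detailed on this page. Purely as an illustration of what one listing strategy might look like, the sketch below walks a category tree breadth-first with a depth limit and collects article titles. The function name `collect_articles`, the `get_members` callable, and the toy category graph are all hypothetical; in practice `get_members` could wrap the MediaWiki API's `list=categorymembers` query for a given wiki.

```python
from collections import deque


def collect_articles(root, get_members, max_depth=3):
    """Collect article titles reachable from a root category.

    get_members(category) must return a pair:
    (list of article titles, list of subcategory names).
    Subcategories are followed only down to max_depth levels below the root.
    """
    seen = {root}              # categories already queued, guards against cycles
    articles = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        pages, subcategories = get_members(category)
        articles.update(pages)
        if depth < max_depth:
            for sub in subcategories:
                if sub not in seen:
                    seen.add(sub)
                    queue.append((sub, depth + 1))
    return articles


# Hypothetical toy category graph standing in for one Wikipedia.
toy_wiki = {
    "Mathematics": (["Number"], ["Algebra", "Geometry"]),
    "Algebra": (["Group (mathematics)"], ["Mathematics"]),  # cycle back to root
    "Geometry": (["Triangle", "Number"], []),
}

print(sorted(collect_articles("Mathematics", lambda c: toy_wiki.get(c, ([], [])))))
```

The depth limit matters because category trees on Wikipedias tend to drift off topic as the traversal goes deeper; the right cut-off would have to be chosen per wiki.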

The second step is finding out where edits are coming from, how many there are, and of what size. We are still discussing what the data should look like, but we understand that aggregated-level data on edits from countries assessed by PISA should be sufficient for what we expect to achieve. We will need the total number of edits and bytes added, originating from the countries we are interested in, on the entries on Mathematics that we have listed. This data output should discriminate between "good" edits and edits that were reverted, together with their reversions (editing activity that is associated with "bad" edits).
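One possible shape for this aggregated output is sketched below, assuming each edit record already carries a country of origin and a label saying whether it is a good edit, a reverted edit, or a reversion. The `Edit` record, its field names, and the labels are hypothetical; the real input would depend on what the WMF research team can provide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edit:
    country: str      # country code of the edit's origin (assumed available)
    bytes_added: int  # size change of the edit; may be negative
    kind: str         # "good", "reverted", or "reversion" (hypothetical labels)


def aggregate_by_country(edits):
    """Return per-country totals, keeping good edits apart from reverted ones."""
    totals = defaultdict(lambda: {
        "good_edits": 0,
        "good_bytes_added": 0,
        "reverted_edits": 0,
        "reversions": 0,
    })
    for edit in edits:
        row = totals[edit.country]
        if edit.kind == "good":
            row["good_edits"] += 1
            # Count only positive size changes as "bytes added".
            row["good_bytes_added"] += max(edit.bytes_added, 0)
        elif edit.kind == "reverted":
            row["reverted_edits"] += 1
        elif edit.kind == "reversion":
            row["reversions"] += 1
        else:
            raise ValueError(f"unknown edit kind: {edit.kind!r}")
    return dict(totals)


# Invented sample records for two hypothetical countries.
sample = [
    Edit("AA", 250, "good"),
    Edit("AA", -40, "good"),
    Edit("AA", 900, "reverted"),
    Edit("BB", -900, "reversion"),
]
for country, row in sorted(aggregate_by_country(sample).items()):
    print(country, row)
```

Because the output holds only country-level counts, no individual editor can be identified from it, which matches the aggregated-level request described above.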

The best way to present this data is not fully decided yet and should be specified as soon as possible.

From a chat with Dr. Halfaker, we understand the WMF only keeps records of edits for short periods of time. This will be noted in our methodological section, as we will only have data from a short period. We hope to be able to get data prior to the holiday period.


  • 4/1: start of research project (done)
  • 4/15-5/15: data collection (first step of entries on Mathematics) (done on 5/25)
  • 5/16: meeting on operationalization of PISA dataset (done)
  • 5/20: request of data to the WMF research team (done on 5/30, as an email to Ahalfaker / EpochFail)
  • 7/15: submission of conference paper to have inputs from academic peers on research (planned)
  • 9/20: first draft of research article (planned)
  • 10/30: final paper (planned)

Policy, Ethics and Human Subjects Research

There is no ethical concern for this research. Wikimedia data will be of two types: content production (nothing special here) and aggregated-level data on where edits on topics on Mathematics originate (number and size of valid edits, discriminated from reverted edits and their reversions). As data should not be sent at the individual level, no security concern should be expected, and the project is thus in full compliance with the security expectations of the WMF team.




(As listed on research project submitted to FAPESP)

