Grants:Project/Hackfish/Global food and nutrition database

status: not selected
Global food and nutrition database
summary: Using numerous nutrition data sources from around the world, we will develop a Wikibase instance for global nutrition data that can be incorporated into Wikidata easily. The project aims to engage participants from diverse Wikimedia communities to make a comprehensive global database.
target: Wikidata
type of grant: tools and software
amount: 36,464 USD
type of applicant: individual
advisor: Sab742, StephaneGigandet
contact: 5colorsaday@gmail.com
this project needs: volunteer, affiliate
created on: 08:11, 19 February 2020 (UTC)


Project idea

Photo of vegetables by Jasper Greek Golangco

What is the problem you're trying to solve?

What problem are you trying to solve by doing this project? This problem should be small enough that you expect it to be completely or mostly resolved by the end of this project. Remember to review the tutorial for tips on how to answer this question.

Food, nutrition, and health are among the topics with the highest engagement in the Wikimedia ecosystem and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikimedia/Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important because the needs within Wikimedia projects can change over time.

We believe that Wikimedia projects and communities are well suited to create this complex knowledge base. Many FCDs - which currently come in various formats (e.g. Portable Document Format (Q42332), comma-separated values (Q935809)) - include varying degrees of detail. The nutrient content of unprocessed food items (e.g. apples) can also vary for the same item across areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past, and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are wide regional variations in the foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. These issues are closely related to many of the issues raised in the Sustainable Development Goals. We need a more open and collaborative system.

What is your solution?

For the problem you identified in the previous section, briefly describe how you would like to address this problem. We recognize that there are many ways to solve a problem. We’d like to understand why you chose this particular solution, and why you think it is worth pursuing. Remember to review the tutorial for tips on how to answer this question.


1. What is the solution to this problem?

We propose a Wikibase instance, WikiFCD, to create a global nutrient FCD. This wiki-based system will engage participants from diverse wiki communities to make this database universally accessible, up-to-date, and comprehensive.

Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. The focus on equity and the global nature of the project require diverse participants, which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data/knowledge disparities in global nutrition. We believe that Wikidata is an excellent way to build connections between a range of free-culture nutrition projects, such as Open Food Facts, that might do the same.

We will test several automated and manual methods to populate the Wikibase with nutrient data from 5 food composition databases from around the world (see the Project plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.

Based on in-depth discussions with both various WM members and nutrition researchers over the past year, we decided that building a comprehensive Wikibase instance for all existing FCDs from around the world would be most useful to diverse communities with diverse needs. While some communities may be interested only in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations).

2. Why is this a good idea?

  • First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.
    Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in the nutrient content of the same product depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.
  • Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way, the data will be readily available for incorporation into Wikidata if desired.

WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community, as well as any other Wikibase community, will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved.

  • Finally, we will complete this project with diverse communities from around the world as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.

Project goals

What are your goals for this project? Your goals should describe the top two or three benefits that will come out of your project. These should be benefits to the Wikimedia projects or Wikimedia communities. They should not be benefits to you individually. Remember to review the tutorial for tips on how to answer this question.

Goal #1: We will build and test the Wikibase in which food composition data from diverse settings can be entered, maintained, and retrieved.

Goal #2: We will translate and link the data into other languages.

Goal #3: We will involve participants from diverse communities to make sure that all available data are accommodated and made available in this database.

Project impact

How will you know if you have met your goals?

For each of your goals, we’d like you to answer the following questions:

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)

For each of your answers, think about how you will capture this information. Will you capture it with a survey? With a story? Will you measure it with a number? Remember, if you plan to measure a number, you will need to set a numeric target in your proposal (e.g. 45 people, 10 articles, 100 scanned documents). Remember to review the tutorial for tips on how to answer this question.

1. Proof-of-concept

a. We will use the 5 databases mentioned below to test whether our schema can accommodate the various information included in databases from different places.
b. Once the project is over, other databases can be entered, following the examples we develop in this project.

2. Methodology

a. We will develop a tutorial and documentation for edit-a-thon participants to follow.
b. Once the project is over, these tutorials and documentation can be used by future participants to enter and maintain the database.

3. Alignment with WMF strategy

a. One of the elements of Wikimedia’s strategy focuses on “Knowledge equity”, which includes “communities that have been left out by structures of power and privilege”.
b. Supporting multiple language communities serves this purpose, as food composition databases are more common in English and languages spoken in the EU.

Do you have any goals around participation or content?

Are any of your goals related to increasing participation within the Wikimedia movement, or increasing/improving the content on Wikimedia projects? If so, we ask that you look through these three metrics, and include any that are relevant to your project. Please set a numeric target against the metrics, if applicable. Remember to review the tutorial for tips on how to answer this question.

  • Total participants: 50
  • Number of newly registered users: 10
  • Number of content pages created or improved, across all Wikimedia projects: This is difficult to estimate but we expect that nutrient data on at least 10,000 food items will be included.

Project plan

COVID-19 planning

We had initially planned offline edit-a-thon events to engage communities in editing and using WikiFCD. We will be removing these offline events to follow WMF's new guidelines. We believe that engaging communities is a key component of any wiki-based project, and so we have allocated some of the budget to resources for sharing information on this new Wikibase and engaging diverse communities (e.g. tutorials, content drive, journal articles).

Activities

Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

The product of this project will be released as free cultural works/open data.

System development
  1. Description - We will use the Docker image of Wikibase created by WMDE [1]. We will use QuickStatements as well as custom bots developed using the WikidataIntegrator Python library to populate the Wikibase.
  2. Outputs - A Wikibase instance named WikiFCD hosted on a university server.
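As a rough illustration of the QuickStatements route, the sketch below turns one food record into QuickStatements V1 commands (tab-separated lines). The property IDs `P1` and `P2`, the item ID `Q5`, and the function name are hypothetical placeholders, since the actual identifiers would be assigned by the WikiFCD Wikibase; the sketch also assumes labels contain no double quotes.

```python
# Hypothetical local property IDs; a real Wikibase assigns its own.
PROP_SOURCE_DATASET = "P1"      # assumed: item for the source database
PROP_PROTEIN_PER_100G = "P2"    # assumed: quantity property for protein

def quickstatements_for_food(label, source_qid, protein_g, lang="en"):
    """Build QuickStatements V1 commands that create one food item."""
    lines = [
        "CREATE",                                        # start a new item
        f'LAST\tL{lang}\t"{label}"',                     # label in the given language
        f"LAST\t{PROP_SOURCE_DATASET}\t{source_qid}",    # item-valued statement
        f"LAST\t{PROP_PROTEIN_PER_100G}\t{protein_g}",   # plain quantity value
    ]
    return "\n".join(lines)

print(quickstatements_for_food("Apple, raw", "Q5", 0.3))
```

The generated text could be pasted into a QuickStatements batch; a bot built on a library such as WikidataIntegrator would be an alternative for larger or recurring imports.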
Data Modeling & bulk data import
  1. Description - We will use ShEx to express the schemas for our data models. We will align the properties in our Wikibase with relevant Wikidata properties. We will first create a Wikibase based on our analyses of 2 large food composition databases as the starting examples:
     1) FAO/INFOODS Analytical Food Composition Database Version 2.0 (AnFooD2.0)
     2) USDA Foundation Foods database December 2019
  2. Outputs - ShEx schemas and SPARQL query code
Community Engagement

UPDATE: We will not be hosting any offline events to meet WMF's COVID-19 response guidelines.

We will work with the community engagement intern to plan 3 content drive/contest sessions to engage digital communities. The content contests will focus on translating the existing data in our Wikibase instance, and each contest session will run for 1 month. Groups will receive a recognition banner and physical gifts for either the highest number of translated items in one language or the highest number of languages engaged. The other session will run for one week towards the end of this pilot project and will focus on using our tutorials and documentation to add data from databases not listed in our proposal. We will provide a list of databases that participants can use in this session. We will reach out to Wikidata communities.

We will also work with the Open Food Facts (OFF) community to test ways for other communities to easily use data from our Wikibase instance.

We will host 2 webinars to teach participants about peer production, Wikimedia projects, and how to contribute to wiki-based projects, with an example of how to edit and use WikiFCD.

Outputs - Linked FCD in Wikibase
Outputs - new editors and users in WM and WikiFCD projects
Outputs - report on how WikiFCD data could be used in another Wikibase instance (based on the collaboration and feedback from the OFF community)
Documentation
  1. Description - We will document the process of finding datasets, identifying metadata (e.g. copyright, year of publication), entering data, translating data, and using data for analyses.
  2. Outputs - We will generate multiple ShEx schemas that will help us communicate our data model to stakeholders. We will write a tutorial for users of the system. We will write federated SPARQL queries that others may reuse that demonstrate how to combine WikiFCD data with data from Wikidata.
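A federated query of the kind described above might look like the following sketch, which builds the SPARQL text in Python. It assumes the query runs on the WikiFCD query service and reaches Wikidata through a `SERVICE` clause; the prefix IRI and the property IDs passed in are hypothetical, and the mapping property is assumed to store the matching Wikidata entity IRI.

```python
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

def build_federated_query(local_prop_base, mapping_prop, nutrient_prop, limit=10):
    """Return SPARQL text joining local food items to Wikidata labels."""
    return f"""\
PREFIX fcdt: <{local_prop_base}>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?food ?amount ?wdItem ?wdLabel WHERE {{
  ?food fcdt:{nutrient_prop} ?amount ;
        fcdt:{mapping_prop} ?wdItem .
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    ?wdItem rdfs:label ?wdLabel .
    FILTER(LANG(?wdLabel) = "en")
  }}
}}
LIMIT {limit}
"""

# Hypothetical prefix and property IDs, for illustration only.
query = build_federated_query(
    "https://wikifcd.example.org/prop/direct/", "P3", "P2", limit=5
)
print(query)
```

Keeping the query as a parameterized template is one way to let others reuse it with different nutrient properties once the real identifiers are known.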
Communication
  1. Description - Promotion of project outputs, feedback gathering, presentation at nutrition workshops, tutoring of interested volunteers and newcomers
  2. Outputs - Blog posts, feedback reports, ShEx schemas, peer-reviewed journal articles
Project management
  1. Description - We will report our progress twice in 12 months.
  2. Outputs - Mid-term report (6 months), Final report (12 months).
Work package | Months active (of 12)
WP1 - System development | 11
WP2 - Data Modeling | 8
WP3 - Community engagement | 5
WP4 - Documentation | 8
WP5 - Communication | 10
WP6 - Project management | 12

Budget

How will you use the funds you are requesting? List bullet points for each expense. (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!

Our budget includes the costs of two main developers who will work on the project's data models and software engineering for the duration of the project. The engineering/scientist positions will be filled by the grantees and are split as follows: one person working on data models and model alignment with Wikidata, and one person maintaining the Wikibase and leading bot development. The community outreach intern (to be recruited) will work with wiki communities, academic researchers, and students to organize online events.

For dissemination purposes, we plan to attend Wikimania to talk with editors in person about their needs and wishes for contributing food composition data. Wikimania is the right venue for this, as it will have a large pool of editors from different Wikipedia language versions.


Item | Budget
Data scientist (10 hours per week for 8 months (34 weeks)) | $30 x 10 x 34 = $10,200
Software engineer (10 hours per week for 8 months) | $30 x 10 x 34 = $10,200
Community outreach/communication intern (8 hours per week for 8 months) | $25 x 8 x 34 = $6,800
Webinar costs (teaching consultant fee; any fees associated with online sessions) | $1,000 per session x 2 sessions = $2,000
Server hosting (12 months at Johns Hopkins School of Public Health) | $22 x 12 = $264
Documentation and open access publication | $7,000
Total | $36,464
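The line items can be cross-checked against the stated total with a short calculation. This is only a sanity check of the arithmetic in the budget, using the rates, hours, and week counts as listed; the variable names are illustrative.

```python
# Each entry reproduces the arithmetic given for one budget line (USD).
budget_items = {
    "Data scientist": 30 * 10 * 34,                           # $30/h x 10 h/week x 34 weeks
    "Software engineer": 30 * 10 * 34,                        # $30/h x 10 h/week x 34 weeks
    "Community outreach/communication intern": 25 * 8 * 34,   # $25/h x 8 h/week x 34 weeks
    "Webinar costs": 1000 * 2,                                # $1,000 x 2 sessions
    "Server hosting": 22 * 12,                                # $22/month x 12 months
    "Documentation and open access publication": 7000,
}

total = sum(budget_items.values())
print(f"Total: ${total:,}")  # Total: $36,464
```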

Community engagement

Community input and participation helps make projects successful. How will you let others in your community know about your project? Why are you targeting a specific audience? How will you engage the community you’re aiming to serve during your project?

  • Wikipedian communities in Seattle and in India
  • Academic nutrition communities in Boston and in Baltimore who are new to Wikimedia and peer production projects but are interested in helping (based on the seminar and discussion we have had over the past year).
  • We will host two webinars (in lieu of the workshop we'd planned for Wikimania 2020 due to COVID-19)
  • We will share our data models via ShEx schemas
  • We will share SPARQL queries that others can use to combine WikiFCD data with Wikidata
  • We will notify the Open Food Facts team and community of any editing events and seminars we host.

Get involved

Participants

Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.

Project manager (volunteer) - Mika Matsuzaki (nutritional epidemiologist)
Data modeler - Kat Thornton (information scientist)
Software engineer - Kenneth Seals-Nutt (software engineer)
Advisor (volunteer) - Sabri Bromage (food composition advisor/nutritional epidemiologist)
Advisor - Stéphane Gigandet (Open Food Facts founder)

Community notification

Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Need notification tips?

Discussions during drafting and editing of the proposal
  • Discussed project ideas and collaboration with Open Food Facts.
  • Email correspondence and phone calls with senior Wikidata editors.
  • Feedback on draft proposal from members of Wikimedia Cascadia and WikiPathways.
  • Pinged WikiProject Food.
  • Contacted a member of Wikimedians of Kerala for an edit-a-thon idea.
  • Linked to WikiProject Nutrition.

Endorsements

Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).