Grants talk:Project/Hackfish/Global food and nutrition database

From Meta, a Wikimedia project coordination wiki
Latest comment: 2 years ago by Vladimir Alexiev in topic Other project

Why not in Wikidata?

Could you please expand on why you chose not to enhance Wikidata instead of putting the data elsewhere, inaccessible for rendering in other Wikimedia projects? Ainali (talk) 19:45, 19 February 2020 (UTC)

I believe there is a discussion on WD itself! There are lots of detailed properties here, which is often not something WD is excited about adding, but it would be good to see that pushback explicitly stated somewhere [where interested Wikidatans who are also running into these issues can find it, and update it over time] :) –SJ talk  20:40, 19 February 2020 (UTC)

Thank you for the feedback! We have been thinking about this for a while, as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in food composition databases (FCDs), it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details.

Example datasets may look like this FAO data with detailed information on phytate, or like more standard data that includes aggregate phytate data. And this is just one nutrient; there are many more nutrient types as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.

Our approach includes the creation of ShEx schemas (see Wikidata:WikiProject Schemas); we will publish these schemas in Wikidata's E namespace for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata. Hackfish (talk) 21:52, 19 February 2020 (UTC)

Other project

This proposal completely fails to explain how it differs from concurrent projects and how it could do better than them. In addition to Wikidata, a nice project is Open Food Facts, which last time I checked was entirely free software and open data. We don't necessarily need to create a competitor to them if they're doing their job fine. Maybe it would be better to contribute to their project and then import into Wikidata only the data that we have an immediate need for? All these questions need to be explored well in advance of a proposal like this. Nemo 13:31, 23 February 2020 (UTC)

Thanks for the comments, Nemo! We have indeed looked at, learned from, and extensively discussed Open Food Facts, which we all think is a great project working in a very different, but adjacent, space! We have explored the questions you raised in some depth and apologize that our original proposal didn't make this clear.
Open Food Facts is very complementary because it contains processed food data from packages. While these data are also very useful, and while there might be some overlap, the datasets we're proposing to incorporate contain a lot of data on ingredients (i.e., unprocessed food data like Fuji apple) and offer substantially more detailed nutrient data than what food packages describe. For instance, one of the links provided above in a response to another comment points to a dataset of the kind we would include, containing "phytic acid, determined by indirect precipitation", "phytic acid, determined by direct precipitation", "phytic acid, calculated from phytate phosphorus, by anion exchange method" and so on. This is far out of scope for Open Food Facts in its current form.
Given that our projects are both using Wikibase, we see a lot of opportunities to connect the two databases. Given that we have the goal of merging some (or all?) of our data into Wikidata at some point in the future, we think Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like OFF that might do the same.
Of course, it is not clear at this point whether Wikidata will ever want to incorporate data with as much detail as an FCD database like ours would have. The phytic acid dataset is only one example demonstrating the diversity and complexity of FCD. That's why we're proposing building this alongside existing systems and building connections between them. Toward your point, there is, to our knowledge, no other free culture/free software focused project that can accommodate FCD data of the type we are describing.
We did forget to mention that what we're proposing will definitely not be proprietary. Everything we produce will be available as free cultural works/open data. We will be including public datasets that we have permission to release and redistribute. All the software we use and build will be free software. Hackfish (talk) 00:57, 24 February 2020 (UTC)
Thank you for the answer. I don't completely understand: does your project aim to be a superset of Open Food Facts? Also, why do you say data about Fuji apples would not be welcome on Open Food Facts? I never tried contributing there, but my impression was that they accept anything with an EAN (although I don't remember if an EAN is required), and you can definitely get apples packaged and sold with an EAN (sadly).
If your project is not a superset but it focuses on "classical" nutritional data about ingredients, then I'd like to know how it compares to the various official databases like EFSA dietary reference, food composition and food consumption databases, USDA food composition and food data.
I see the mention of USDA under "Data Modeling & bulk data import", but it's not enough to say you wish to import that data. You need to have a plan for how you will keep it up to date, what you really need to import and for what purpose, how the data will be augmented on your project, what value you will eventually provide by hosting and presenting it your way that will make your presentation of the data compelling compared to the various competitors, etc. Nemo 07:37, 25 February 2020 (UTC)
Happy to clarify these points!
First, we want to emphasize that the first goal of this project is the re-organization and standardization of the existing databases, and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including the ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production.
I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:
* I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.
* I sometimes eat foods more commonly consumed in Japan like 海ぶどう.
* I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.
* I use another algae item as a substitute in my record, even though the nutrient data are available in the Japanese database.
This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.
I believe OFF and our Wikibase instance take distinct approaches. According to their website:
"Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels."
OFF builds up its database through individuals contributing nutrient data from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like the USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. And of course, OFF can use any data from WikiFCD that they deem useful for their own database. We would love to find ways to formally connect our datasets.
Great point about the plan beyond data importation. As with any Wikimedia project, peer production has the potential to actually keep information more up-to-date than any working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data.
In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers.
We hope this clarifies your questions. Thanks for engaging in this topic! Hackfish (talk) 20:58, 2 March 2020 (UTC)

@Hackfish and YULdigitalpreservation: Can you explain what multilingual processing and entity linking you will do? Bringing Asian language databases into the fold would be very useful, but speaking from experience I know how hard such tasks are.

Works on a Key Knowledge Gap

In looking at areas that have broad under-coverage in the movement as part of my work on Campaigns, I keep landing on food and agriculture topics as potential places of activation. I wanted to highlight the potential for a project like this to activate a community working in key knowledge gaps against the SDGs, where there are activists and a broad body of funding and research monies available, and potentially to highlight just how poor Wikipedia's coverage is on these topics. I also, in general, am supportive of piloting new knowledge domains on other wikibases, and then figuring out how to integrate with other Wikimedia projects. I am curious about the potential overlap here with other Open Food Data type projects, but I think the real promise is a focus on SDG critical information and working with communities in non-European language contexts. This seems reasonable, and a good project, from that perspective. Astinson (WMF) (talk) 15:57, 2 March 2020 (UTC)

It would be compelling to see more focus on digital community engagement, i.e. some type of content drive or contest. Astinson (WMF) (talk) 15:58, 2 March 2020 (UTC)
Thank you so much for the comments and suggestions! I love the idea of digital community engagement through content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point.
We hope that our Wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I also completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it.
Thanks again for the feedback! Hackfish (talk) 21:29, 2 March 2020 (UTC)

Eligibility confirmed, Round 1 2020

This Project Grants proposal is under review!

We've confirmed your proposal is eligible for Round 1 2020 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through March 16, 2020.

The Project Grant committee's formal review for Round 1 2020 will occur March 17 - April 8, 2020. We ask that you refrain from making changes to your proposal during the committee review period, so we can be sure that all committee members are seeing the same version of the proposal.

Grantees will be announced Friday, May 15, 2020.

Any changes to the review calendar will be posted on the Round 1 2020 schedule.

Questions? Contact us at projectgrants (_AT_) wikimedia  · org.

I JethroBT (WMF) (talk) 19:26, 2 March 2020 (UTC)

Aggregated feedback from the committee for Global food and nutrition database

Scoring rubric (with scores)
(A) Impact potential: 6.3
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Community engagement: 6.0
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
(C) Ability to execute: 6.0
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?
(D) Measures of success: 2.0
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Additional comments from the Committee:
  • A new instance of Wikibase? Without a community?
  • I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and am wondering why the proposers have chosen not to align with those.
  • It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to "Open Food Facts", another free software and open data project, although the concepts of the two projects are different.
  • Interesting concept but some concerns about the methodology.
  • This seems iterative, but minimally so.
  • The goals are measurable, but I am not sure how innovative or impactful they will be.
  • This proposal has realistic measures of success and clear targets for evaluating impact and capturing learning. In addition, it is well-positioned to create long-term impact.
  • The project goals can be accomplished in the timeframe and budget.
  • The scope can be achieved within 12 months or less and the budget is realistic and efficient. But it isn't clear from the budget what the Community outreach/communication intern would be doing for 8 hours per week for 8 months.
  • They really need to become more involved since they were not able to get any endorsements.
  • The proposal has very little community engagement with current Wikipedia communities.
  • There is no significant community engagement and support.
  • A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.
  • I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.
  • It does fit with Wikimedia's strategic priorities and the budget is reasonable, but there is not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to "Open Food Facts", another free software and open data project, although the concepts of the two projects are different (from the proposal and the answers to questions on the proposal talk page).

Opportunity to respond to committee comments in the next week

The Project Grants Committee has conducted a preliminary assessment of your proposal. Based on their initial review, a majority of committee reviewers have not recommended your proposal for funding. You can read more about their reasons for this decision in their comments above. Before the committee finalizes this decision, they would like to provide you with an opportunity to respond to their comments.

Next steps:

  1. Aggregated comments from the committee are posted above. Note that these comments may vary, or even contradict each other, since they reflect the conclusions of multiple individual committee members who independently reviewed this proposal. We recommend that you review all the feedback carefully and post any responses, clarifications or questions on this talk page by 5pm UTC on Tuesday, May 11, 2021. If you make any revisions to your proposal based on committee feedback, we recommend that you also summarize the changes on your talkpage.
  2. The committee will review any additional feedback you post on your talkpage before making a final funding decision. A decision will be announced Thursday, May 27, 2021.


Questions? Contact us at projectgrants (_AT_) wikimedia  · org.


--Marti (WMF) (talk) 01:43, 10 May 2020 (UTC)

Response to the committee

Thank you so much for reviewing our proposal! We wanted to respond to the three major issues raised by the committee.

1. Relationships with existing initiatives

The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, Open Food Facts (OFF) in particular. This issue was raised on the talk page for our proposal during the discussion phase, but reviewers felt that our response there was not convincing.

We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a "downstream" source of granular data for projects like OFF as well as for Wikibase instances like Wikidata.

We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF), who has in turn spoken about our project with the OFF team. Gigandet is excited about our project and, with the support of the OFF team, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about the proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from USDA and CIQUAL and has run into some of the issues our team discusses in this proposal, like shifting formats. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing with OFF or duplicating its effort but also provides a useful resource that OFF can draw from.

Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata, so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD.

2. Community engagement

A number of reviewers raised concerns about our ability to build and engage a community. Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.

If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage new groups of experts in the WMF ecosystem rather than just calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships, which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the GeneWiki program of work.

Hackfish is an established academic expert in global health and nutrition. She is currently working closely with San Francisco State University and Harvard School of Public Health and will be starting as an Assistant Professor at the Johns Hopkins University in September 2020. Hackfish is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.

3. Budget Question

There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars, and to develop and deliver a curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD, as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities and who can support us one day a week for two semesters, so as not to burden their workload or interfere with their academic work. Given the interest seen at the seminars, we are confident that we will be able to identify such an individual.

Hackfish (talk) 16:46, 15 May 2020 (UTC)

Round 1 2020 decision

This project has not been selected for a Project Grant at this time.

We love that you took the chance to creatively improve the Wikimedia movement. The committee has reviewed this proposal and not recommended it for funding. This was a very competitive round with many good ideas, not all of which could be funded in spite of many merits. We appreciate your participation, and we hope you'll continue to stay engaged in the Wikimedia context.

Comments regarding this decision:
We will not be funding your project this round. While the committee appreciates the value in the work you are doing, given limited funds to award they are not convinced this project will create enough benefit to Wikimedia projects to warrant priority for funding within the scope of the Project Grants program.

Next steps: Applicants whose proposals are declined are welcome to consider resubmitting their application in the future. You are welcome to request a consultation with staff to review any concerns with your proposal that contributed to a decline decision, and to help you determine whether resubmission makes sense for your proposal.

Over the last year, the Wikimedia Foundation has been undergoing a community consultation process to launch a new grants strategy. Our proposed programs are posted on Meta here: Grants Strategy Relaunch 2020-2021. If you have suggestions about how we can improve our programs in the future, you can find information about how to give feedback here: Get involved. We are also currently seeking candidates to serve on regional grants committees and we'd appreciate it if you could help us spread the word to strong candidates--you can find out more here. We will launch our new programs in July 2021. If you are interested in submitting future proposals for funding, stay tuned to learn more about our future programs.

-- On behalf of the Project Grants Committee, Morgan Jue (WMF) (talk) 19:15, 29 May 2020 (UTC)