Community Wishlist Survey 2023/Larger suggestions/Train a language model to perform SPARQL queries
This proposal is a larger suggestion that is out of scope for the Community Tech team. Participants are welcome to vote on it, but please note that regardless of popularity, there is no guarantee this proposal will be implemented. Supporting the idea helps communicate its urgency to the broader movement.
Train a language model to perform SPARQL queries
- Problem: Even with the new query builder, queries are unintuitive and hard to write for anyone who is not familiar with SPARQL.
- Proposed solution: Use all the queries that have been solved by editors to train a model that can translate text into SPARQL queries.
- Who would benefit: People who cannot write SPARQL queries but want to query Wikidata using natural language.
- More comments:
- Phabricator tickets:
- Proposer: MathTexLearner (talk) 23:00, 29 January 2023 (UTC)
Discussion
- @MathTexLearner: Thanks for this proposal. Unfortunately, it is out of scope for our team, but we will move it to larger suggestions. KSiebert (WMF) (talk) 12:57, 30 January 2023 (UTC)
- For what it's worth, ChatGPT does a pretty good job at writing common SPARQL queries for Wikidata, at least queries that do not rely on obscure properties or run into performance problems. MarioGom (talk) 21:36, 31 January 2023 (UTC)
- I tried it and it doesn't work well. The query I tried is: "Can you write me a sparql query for Wikidata that provides a list of all majors in Bavaria displaying their name and the town they are major of?" It did produce SPARQL code, but the code was wrong and returned no results. MathTexLearner (talk) 22:53, 31 January 2023 (UTC)
- I tried it as well and found that ChatGPT could not generate a working query. Ottawajin (talk) 12:15, 14 February 2023 (UTC)
- There have been a few (pre-LLM) attempts like Platypus. It would certainly be very powerful; no idea how feasible it is. I suspect it is hard to do this without having a huge training set of SPARQL queries with natural-language descriptions. --Tgr (talk) 00:28, 1 February 2023 (UTC)
- There is a large number of queries at d:Wikidata:Request_a_query; however, some work would be needed to write proper descriptions for all of those queries. MathTexLearner (talk) 14:56, 1 February 2023 (UTC)
- Some queries are long and hard for new users to learn. Thingofme (talk) 02:02, 23 February 2023 (UTC)
- MathTexLearner: you may be interested in this tool: https://observablehq.com/@pac02/hello-sparql-queries-dataset. It makes it easy to find examples of SPARQL queries. PAC2 (talk) 06:28, 3 February 2023 (UTC)
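The training-set problem raised in the discussion (solved queries from d:Wikidata:Request_a_query still need proper natural-language descriptions) could be approached by storing each solved request as a description/query pair. The Python sketch below is illustrative only. The `SolvedRequest` record, the `to_training_jsonl` helper and the `input`/`output` keys are hypothetical, since the page defines no schema, and JSON Lines is an assumed output format. Pairs with an empty description are skipped, on the assumption that they would need a manual write-up first. The sample query uses the well-known "house cat" pattern `wdt:P31 wd:Q146`.

```python
import json
from dataclasses import dataclass


@dataclass
class SolvedRequest:
    """One solved request (hypothetical schema, e.g. scraped from d:Wikidata:Request_a_query)."""
    description: str  # natural-language request written by the asker
    sparql: str       # the query an editor posted in reply


def to_training_jsonl(requests: list[SolvedRequest]) -> str:
    """Serialize (description, query) pairs as JSON Lines, skipping incomplete pairs."""
    lines = []
    for req in requests:
        # Collapse wiki line breaks and repeated spaces in the description.
        text = " ".join(req.description.split())
        query = req.sparql.strip()
        if not text or not query:
            # Assumption: pairs without a usable description need manual work before use.
            continue
        lines.append(json.dumps({"input": text, "output": query}, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        SolvedRequest(
            description="List all  cats\n on Wikidata",
            sparql="SELECT ?item WHERE { ?item wdt:P31 wd:Q146 . }",
        ),
        # No description yet, so this pair is skipped.
        SolvedRequest(description="", sparql="SELECT ?x WHERE { ?x ?p ?o } LIMIT 1"),
    ]
    print(to_training_jsonl(sample))
```

Keeping one JSON object per line lets a dataset grow by appending new pairs as more requests are solved. The actual field names would depend on whichever training tool is eventually used.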
Voting
- Support Strainu (talk) 20:19, 10 February 2023 (UTC)
- Support //Lollipoplollipoplollipop::talk 20:41, 10 February 2023 (UTC)
- Support NMaia (talk) 06:00, 11 February 2023 (UTC)
- Support Plaga med (talk) 11:32, 11 February 2023 (UTC)
- Support Conny (talk) 18:09, 11 February 2023 (UTC)
- Support --NGC 54 (talk|contribs) 01:36, 12 February 2023 (UTC)
- Support Ameisenigel (talk) 09:20, 12 February 2023 (UTC)
- Support Waldyrious (talk) 05:01, 13 February 2023 (UTC)
- Support Althair (talk) 01:53, 20 February 2023 (UTC)
- Support Ropaga (talk) 11:26, 20 February 2023 (UTC)
- Support Ταπυρ (گپ) 12:11, 21 February 2023 (UTC)
- Support cyrfaw (talk) 19:00, 21 February 2023 (UTC)
- Support Thingofme (talk) 02:01, 23 February 2023 (UTC)