

From Meta, a Wikimedia project coordination wiki
This is a proposal for a new Wikimedia sister project.
Status of the proposal
Reason: no support. Pecopteris (talk) 05:44, 20 August 2023 (UTC)
Details of the proposal
Project description
What is the project purpose?

This project proposes to create a single, unified discussion platform where citizens, associations, experts, companies, etc. can freely exchange views on societal topics by writing contributions. Two technologies in particular would be used: semantic web technology and textual analysis tools, i.e. NLP (Natural Language Processing) methods.

Semantic web tools would make it possible to link contributions in a knowledge graph, highlighting all the dimensions of a problem (economic, social, environmental, etc.) as well as the structure of the arguments.

NLP tools would make it possible to group contributions by meaning, avoiding redundancy and making the debates easier to read.

More details on these two technologies are given in the Proposal section.

What will be its scope?

The main use cases would be:

- Provide a graphical interface for navigating knowledge graphs.
- Provide an input window where users can write a contribution or propose a new topic for debate.
- Group contributions according to their semantic proximity.
- Offer services (SPARQL queries https://www.w3.org/TR/rdf-sparql-query/ ) to extract the desired information.
- Feed a database and keep it updated as contributions are added and classified.

How would it benefit from being part of Wikimedia?

Being able to support an argument with at least one article or publication is essential to ensure the quality of the discussions. Mobilizing and reactivating knowledge is central to this project, so being part of Wikimedia would be an opportunity. Users could, among other things, cite Wikipedia, Wikisource, Wikiversity, Wiktionary and Wikinews pages to support an argument; conversely, identifying active debates that lack precise references could point to pages that need to be created or gaps that need to be filled. The collaborative platform envisaged would use the semantic web, so it would have close links with the Wikidata project, which has created URIs for Wikipedia concepts. These resources (i.e. these URIs) could be used within this project to identify resources and aggregate knowledge about them. In return, the project would produce data and metadata that could feed Wikidata: the number of debates, their classification, their structure (in terms of participation), the articles cited, the debates requiring additional resources, characteristics of contributors where possible, etc. If this project materializes, it could benefit from the moderation rules Wikimedia has already set up. Wikidebat could also benefit from the image of Wikipedia, which is known to all, and could easily be identified as a place for consultation and debate in a collaborative and open mode.

This project would mobilize textual analysis tools; in return, the investment and research made for Wikidebat could also benefit other projects.
Potential number of languages
This would be a matter for further discussion. It would make sense for knowledge and debates to be available in several languages, but not for all debates; some will only make sense locally.
Proposed tagline
Connect and organize to understand
Technical requirements
New features to require
This project would require databases: either relational databases or, if semantic web technologies are used, triplestores.

Much of the same processing can be done with either relational databases or triplestores. In this presentation, I assume the ideal case where this project would be built with semantic web technologies.

Choosing the semantic web would allow other users (researchers, journalists or others) to add their own layers of information on top of existing triples, increasing the value and uses of the data. It would also ensure that shared information (such as reference pages, definitions, etc.) is the same for all contributions referring to it. Finally, the data model could evolve more flexibly than a relational database schema would allow.

The raw contributions to the discussions would have to be processed with NLP (Natural Language Processing) tools, so an API would be needed to call the Python programs used.

A graphical user interface would be needed to explore the knowledge graphs.
Development wiki
Not yet, but I hope so!
Interested participants
Only me at the moment. If the proposal is favourably received, the goal would be to build collaborations and bring into this project everyone who wants to participate, in order to build a demo.


Debate tools based on information technology already exist, but they can have limitations: for example, their silo organization makes it difficult to identify the interactions between debates, or restrictions are placed on the subjects that can be debated. These projects are clearly very positive, but the proposal presented here seeks to offer a tool where issues and topics are not selected a priori and where all debates develop in a single place. The objective is to identify the different dimensions (social, economic, environmental, etc.) attached to an issue and their ramifications with other issues, in order to give the most complete overview possible of the issues and what is at stake.

This proposal is based on semantic web technologies. The RDF (Resource Description Framework https://www.w3.org/RDF/ ) data model rests on the essential idea of representing data as subject - predicate - object triples, where subjects, predicates and objects are resources identified by URIs (objects may also be literals). An example of a triple: a book (subject, identified by its URI) - has for author (predicate, identified by its URI) - the author (object, identified by its URI).
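To make the triple model concrete, here is a minimal pure-Python sketch that stores triples as tuples in a set and answers simple patterns with wildcards. It is only a stand-in for a real triplestore; the `EX` namespace, the `hasAuthor` and `name` predicates and the `match` helper are hypothetical. It also shows how triples that share a resource link up into a graph that can be followed step by step.

```python
from typing import Optional

# A triple is (subject, predicate, object); URIs and literals are plain
# strings in this sketch, with literals wrapped in double quotes.
Triple = tuple[str, str, str]

EX = "http://example.org/wikidebat/"  # hypothetical namespace


def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


graph: set[Triple] = {
    # book - has for author - author (a resource with its own URI)
    (EX + "book/42", EX + "hasAuthor", EX + "person/7"),
    # the author's name is attached as a literal to the author resource
    (EX + "person/7", EX + "name", '"Jane Doe"'),
}

# Follow the graph step by step: book -> author URI -> author's name.
author = match(graph, s=EX + "book/42", p=EX + "hasAuthor")[0][2]
name = match(graph, s=author, p=EX + "name")[0][2]
```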

By aggregating triples step by step, we can build graphs of arbitrary size. Several languages build on the RDF data model to enrich what triples can represent; one of them, OWL (Web Ontology Language https://www.w3.org/OWL/ ), makes it possible to define one's own classes of objects and relationships.

The classes envisaged, which would be defined in OWL and whose instances would be used in the discussions, are:
- C1 Question, to open a new debate.
- C2 Definition, so that every occurrence of a term used in the contributions points to the same shared definition.
- C3 Thesis (a short sentence, possibly with a character limit), presenting the main idea of a contribution.
- C4 Argument (a short sentence, possibly with a character limit), supporting the thesis.
- C5 Justification, containing links to sources (articles) and text added by contributors to support arguments.
- Ci More technical classes that are not displayed directly but matter for the processing flow (for example, the class of thematic memberships).
…
The set (Ck) of necessary classes would have to be completed if the proposal is favourably received.
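A minimal sketch of how the provisional class set (Ck) could be declared as OWL classes and used to check that a resource is typed with a declared class. The `rdf:type` and `owl:Class` URIs are the standard W3C ones; the namespace, the class local names (including `ThematicMembership` standing in for the technical classes Ci) and the helper functions are hypothetical, and no OWL reasoning is performed.

```python
# Standard W3C vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

WD = "http://example.org/wikidebat/ontology#"  # hypothetical namespace

# Provisional set (Ck); ThematicMembership stands in for the technical classes Ci.
CLASS_NAMES = ["Question", "Definition", "Thesis", "Argument",
               "Justification", "ThematicMembership"]

# Declare each class as an owl:Class, expressed as ordinary triples.
ontology = {(WD + name, RDF_TYPE, OWL_CLASS) for name in CLASS_NAMES}


def declared_classes(triples: set[tuple[str, str, str]]) -> set[str]:
    """Return every URI declared as an owl:Class."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}


def is_valid_instance(data, ontology_triples, resource: str) -> bool:
    """True if `resource` has at least one rdf:type and all are declared classes."""
    types = {o for s, p, o in data if s == resource and p == RDF_TYPE}
    return bool(types) and types <= declared_classes(ontology_triples)


thesis = "http://example.org/wikidebat/thesis/1"
data = {(thesis, RDF_TYPE, WD + "Thesis")}
```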

In practice, contributions would develop around Questions, and a Contribution in its most complete form would consist of instances of the Thesis, Argument and Justification classes. To encourage everyone to participate, incomplete contributions (only a thesis, for example) could also be accepted; they would then be the expression of an opinion.
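The complete and incomplete forms of a Contribution could be modelled roughly as follows. This is a sketch only: the field names and the 280-character limit for theses and arguments are hypothetical choices, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import ClassVar, Optional


@dataclass
class Contribution:
    question: str                     # URI of the Question being debated
    thesis: str                       # short sentence: the main idea
    argument: Optional[str] = None    # short sentence supporting the thesis
    justifications: list[str] = field(default_factory=list)  # source links

    MAX_CHARS: ClassVar[int] = 280    # hypothetical character limit

    def __post_init__(self) -> None:
        # Enforce the short-sentence constraint on thesis and argument.
        for sentence in (self.thesis, self.argument):
            if sentence is not None and len(sentence) > self.MAX_CHARS:
                raise ValueError("sentence exceeds the character limit")

    @property
    def is_complete(self) -> bool:
        """Complete form: Thesis + Argument + at least one Justification."""
        return self.argument is not None and len(self.justifications) > 0

    @property
    def is_opinion(self) -> bool:
        """An incomplete contribution is treated as the expression of an opinion."""
        return not self.is_complete
```

For instance, `Contribution("q1", "Biodiversity loss is the major current risk.")` would be accepted but flagged as an opinion until an argument and a source are added.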

To relate contributions to each other and reveal polarities among the set of ideas, the following relationships could also be defined in OWL:

- P1 Contradict, meaning that a contribution is opposed to an existing contribution.
- P2 RequestPrecision, meaning that an argument is not precise enough.
- P3 Confirms, meaning that a thesis based on a different set of sources and justifications confirms a similar thesis.
- P4 Complete, meaning that new arguments complement an existing thesis.
- P5 HasSource, relating an argument to its sources.
- P6, used to relate a definition used in a contribution to its official URI, shared by all contributions.
...
The set (Pn) of necessary predicates would have to be completed if the proposal is favourably received.
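One way to make these predicates usable is to attach a domain (allowed subject classes) and a range (allowed object classes) to each. The sketch below checks links against such constraints; the predicate names and every domain/range choice are hypothetical. Note that in OWL, `rdfs:domain` and `rdfs:range` axioms drive inference rather than validation, so a check like this is closer to a SHACL-style constraint layered on top.

```python
# Hypothetical names for P1-P6 with assumed (domain, range) class sets.
PREDICATES: dict[str, tuple[set[str], set[str]]] = {
    "contradicts": ({"Thesis", "Argument"}, {"Thesis", "Argument"}),        # P1
    "requestsPrecision": ({"Thesis", "Argument"}, {"Argument"}),            # P2
    "confirms": ({"Thesis"}, {"Thesis"}),                                   # P3
    "complements": ({"Argument"}, {"Thesis"}),                              # P4
    "hasSource": ({"Argument"}, {"Justification"}),                         # P5
    "hasSharedDefinition": ({"Definition"}, {"ExternalResource"}),          # P6
}


def check_link(subject_type: str, predicate: str, object_type: str) -> bool:
    """Return True if the link respects the predicate's domain and range."""
    if predicate not in PREDICATES:
        raise KeyError(f"unknown predicate: {predicate}")
    domain, range_ = PREDICATES[predicate]
    return subject_type in domain and object_type in range_
```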

NLP (natural language processing) tools would be applied to theses and arguments, which for simplicity would be short sentences, in order to group them by meaning. The example below illustrates the main idea:

Consider the two theses below:

-S1 "Biodiversity loss is an even more dramatic problem than climate change".

-S2 "The decrease in biodiversity is the major current risk."

From the perspective of this project, the goal would be to group the two sentences above, which are semantically very close, to avoid having too many duplicates, which would reduce overall readability.

This text analysis step would require further methodological work to determine the appropriate processing. A first idea would be to reduce these sentences to a normal form, identify their essential grammatical groups, and use word embeddings or dictionaries to compute a distance between the two sentences, grouping them if the distance is small enough. Tests would be needed to determine the distance thresholds below which contributions can be grouped together.
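The first idea above can be prototyped in a few lines. This sketch swaps in a deliberately simple technique: a normal form made of lowercased words minus stopwords, a tiny hypothetical synonym dictionary standing in for word embeddings, and the Jaccard distance between the resulting word sets. The stopword list, the synonym table and the 0.45 threshold are assumptions chosen to fit S1 and S2, not tested values.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "in", "of", "than", "more", "even",
             "and", "to"}
# Hypothetical toy dictionary mapping words to a canonical form.
SYNONYMS = {"loss": "decline", "decrease": "decline", "dramatic": "major",
            "risk": "problem", "issue": "problem"}


def normal_form(sentence: str) -> frozenset[str]:
    """Lowercase, drop stopwords and map synonyms to a canonical word."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return frozenset(SYNONYMS.get(w, w) for w in words if w not in STOPWORDS)


def distance(a: str, b: str) -> float:
    """Jaccard distance between the normal forms of two sentences."""
    x, y = normal_form(a), normal_form(b)
    if not x and not y:
        return 0.0
    return 1 - len(x & y) / len(x | y)


def group(sentences: list[str], threshold: float) -> list[list[str]]:
    """Greedy grouping: join the first group whose first member is close enough."""
    groups: list[list[str]] = []
    for s in sentences:
        for g in groups:
            if distance(s, g[0]) <= threshold:
                g.append(s)
                break
        else:
            groups.append([s])
    return groups


S1 = "Biodiversity loss is an even more dramatic problem than climate change"
S2 = "The decrease in biodiversity is the major current risk."
groups = group([S1, S2], threshold=0.45)
```

With this toy dictionary, S1 and S2 end up at a distance of 3/7 (about 0.43) and fall into one group; a real threshold would have to come from the tests mentioned above, and lexical overlap alone would be far more fragile than embeddings.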

Similarly, NLP processing could be used to determine, at a more global level, the general theme of theses and arguments. Here, for S1 and S2, it would be "biodiversity".
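As a crude illustration of theme detection, the sketch below maps keywords to themes with a hypothetical dictionary and keeps the most frequent theme, breaking ties by first mention. A real system might rather use topic models or links to Wikidata items; the dictionary and function name are assumptions.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical keyword-to-theme dictionary.
THEME_KEYWORDS = {
    "biodiversity": "biodiversity", "species": "biodiversity",
    "climate": "climate", "emissions": "climate",
}


def general_theme(sentence: str) -> Optional[str]:
    """Most frequent theme among known keywords, or None if there is none."""
    words = re.findall(r"[a-z]+", sentence.lower())
    counts = Counter(THEME_KEYWORDS[w] for w in words if w in THEME_KEYWORDS)
    if not counts:
        return None
    # Counter.most_common orders equal counts by first occurrence,
    # so a tie goes to the theme mentioned first in the sentence.
    return counts.most_common(1)[0][0]
```

For S1, "biodiversity" and "climate" each appear once, so the tie-break on first mention is what selects "biodiversity"; this shows how brittle keyword counting is.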

To continue the example, a sentence S3, "Biodiversity is not a major issue.", would belong to the same general theme "biodiversity" but should not be grouped with S1 and S2 because it expresses the opposite idea.
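One simple safeguard is to allow grouping only when both the theme and the polarity match. The sketch below uses a crude negation-parity heuristic as a stand-in for a proper stance or negation model; the word list and helper names are hypothetical, and double negations or sarcasm would defeat it.

```python
import re

NEGATIONS = {"not", "no", "never", "nothing", "hardly"}  # hypothetical list


def polarity(sentence: str) -> int:
    """+1 or -1 depending on the parity of negation words (crude heuristic)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    negations = sum(w in NEGATIONS or w.endswith("n't") for w in words)
    return -1 if negations % 2 else 1


def may_group(a: str, b: str, theme_a: str, theme_b: str) -> bool:
    """Group only sentences sharing both a theme and a polarity."""
    return theme_a == theme_b and polarity(a) == polarity(b)
```

With S1 and S3 both under "biodiversity", `may_group` returns False because S3 contains "not", while S1 and S2 remain groupable.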

Unfortunately, there is no demonstration tool for the moment; one of the objectives of the demo would be to determine, in a realistic setting, which relationships and classes need to be defined, i.e. to define the two sets (Ck) and (Pn).

The aim of this tool would be to create a link between the spheres of citizenship, academia, production, associations, etc. and, why not, indirectly the political sphere. Much is at stake in these debates. That scientific results may have been known for many years without being taken into account in the political agenda can be seen as a dysfunction. Fake news and cognitive biases, on the other hand, move us away from the methodical approaches (admittedly imperfect and always to be improved) needed to establish solid facts.

The aim of posting this proposal here is to gather as much criticism as possible in order to judge whether the project is appropriate. If the proposal is favourably received, the goal would be to build collaborations and bring into this project everyone who wants to participate, in order to build a demo. To conclude this part, this tool would both mobilize the knowledge already accumulated by Wikimedia and identify new issues that people need to learn about, while taking part in building knowledge.

What will be its scope?

The scope of this project is to create a platform for continuous debate where Contributions are organized in knowledge graphs and grouped by meaning to improve readability. The use cases are the following:

NAVIGATE IN THE KNOWLEDGE GRAPH The user could navigate the knowledge graph and view the debates directly, either through a graphical interface or through a text view of the tree structure.

With a graphical user interface, one could zoom in on parts of the knowledge graph: at the most general level one would see the relationships between the general themes, then click on a theme to see the debates within it, and so on down to individual Contributions.

SEARCH IN THE KNOWLEDGE GRAPH The user can search for debates by keyword. Underlying SPARQL (https://www.w3.org/TR/rdf-sparql-query/ ) queries would be used to display the matching sub-graphs, either graphically or textually.
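Before a triplestore is available, the search use case can be mimicked in plain Python: find debates whose `rdfs:label` contains a keyword (roughly what a SPARQL 1.1 `FILTER(CONTAINS(LCASE(?label), "..."))` would do), then collect the sub-graph reachable from them. The namespace and data layout are hypothetical, and edges are assumed to point from a debate to its contributions.

```python
EX = "http://example.org/wikidebat/"                 # hypothetical namespace
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
Triple = tuple[str, str, str]


def search_debates(graph: set[Triple], keyword: str) -> set[str]:
    """Debates whose label contains `keyword`, ignoring case."""
    kw = keyword.lower()
    return {s for s, p, o in graph if p == LABEL and kw in o.lower()}


def subgraph(graph: set[Triple], roots: set[str]) -> set[Triple]:
    """All triples reachable from `roots` by following outgoing links."""
    seen, stack, result = set(roots), list(roots), set()
    while stack:
        node = stack.pop()
        for t in graph:
            if t[0] == node:
                result.add(t)
                if t[2] not in seen:
                    seen.add(t[2])
                    stack.append(t[2])
    return result
```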

PROPOSE A NEW DEBATE The user can propose a new topic for discussion. The NLP processing would check that a similar topic does not already exist. If similar topics already exist, the user would be asked to confirm that the topic is really different.
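The duplicate check for new topics could start as simple word overlap, returning the most similar existing topics so the user can confirm the new one is really different. This is a sketch with a hypothetical 0.5 Jaccard threshold; as noted in the discussion further down, closely worded debates can still be genuinely distinct, hence the confirmation step rather than automatic rejection.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased word set of a topic title."""
    return set(re.findall(r"[a-z]+", text.lower()))


def similar_topics(new_topic: str, existing: list[str],
                   threshold: float = 0.5) -> list[tuple[str, float]]:
    """Existing topics whose Jaccard similarity with `new_topic` reaches the threshold."""
    new = words(new_topic)
    hits = []
    for topic in existing:
        old = words(topic)
        union = new | old
        score = len(new & old) / len(union) if union else 0.0
        if score >= threshold:
            hits.append((topic, score))
    # Most similar topics first, so the likeliest duplicate is shown on top.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

If the returned list is non-empty, the interface would ask the user to confirm before creating the debate.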

ADD A CONTRIBUTION The user can select a group of Contributions or a particular Contribution, open a dialog window to write a Contribution, specify its relation to the original idea or group of ideas (contradict, ask for clarification, confirm...) and add links (to Wikipedia pages, newspaper articles or scientific articles) to build the Arguments.
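Submitting the dialog window could translate into a handful of new triples: the thesis, the chosen relation to the target, and one triple per source link. The sketch below assumes hypothetical relation names and an externally supplied identifier; it rejects relations outside the allowed set.

```python
from collections.abc import Sequence

EX = "http://example.org/wikidebat/"   # hypothetical namespace
# Relations a user can choose in the dialog window (hypothetical names).
RELATIONS = {"contradicts", "requestsPrecision", "confirms", "complements"}
Triple = tuple[str, str, str]


def add_contribution(graph: set[Triple], contribution_id: str, target: str,
                     relation: str, thesis: str,
                     sources: Sequence[str] = ()) -> set[Triple]:
    """Add a contribution linked to `target` and return the new triples."""
    if relation not in RELATIONS:
        raise ValueError(f"unsupported relation: {relation}")
    node = EX + "contribution/" + contribution_id
    new = {(node, EX + "thesis", thesis), (node, EX + relation, target)}
    new |= {(node, EX + "hasSource", url) for url in sources}
    graph.update(new)  # mutate the shared graph in place
    return new
```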

GROUP CONTRIBUTIONS TOGETHER The NLP tools would be used to group Contributions by meaning. This implies:
- Implementing NLP tools to compute the semantic proximity of new Contributions to existing ones.
- Using SPARQL queries to assign (modify or update) membership classes according to the results of the classification (for example, using the cases described in the proposal section, S1 would belong to "biodiversity" at the most general theme level and to "biodiversity seen as a serious problem" at a finer level). Here the classes are represented by labels, but in practice there would probably be several nested classes identified by non-significant codes associated with labels.
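The membership update could be sketched as follows: each contribution points to its finest group through a hypothetical `memberOf` predicate, groups are opaque codes with `rdfs:label` values, and coarser levels are reached through `rdfs:subClassOf`. Reclassification simply replaces the old membership triple, which fits the idea that classifications are derived data. The codes, predicate and helpers are assumptions.

```python
EX = "http://example.org/wikidebat/"                     # hypothetical namespace
MEMBER_OF = EX + "memberOf"                              # hypothetical predicate
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SUB_CLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
Triple = tuple[str, str, str]


def assign_group(graph: set[Triple], contribution: str, group: str) -> None:
    """Replace the contribution's current group membership with `group`."""
    stale = {t for t in graph if t[0] == contribution and t[1] == MEMBER_OF}
    graph.difference_update(stale)
    graph.add((contribution, MEMBER_OF, group))


def group_path(graph: set[Triple], group: str) -> list[str]:
    """Labels from the finest group up to the broadest theme.

    Follows rdfs:subClassOf links and assumes the hierarchy has no cycles.
    """
    path: list[str] = []
    node = group
    while node is not None:
        labels = [o for s, p, o in graph if s == node and p == LABEL]
        path.append(labels[0] if labels else node)
        parents = [o for s, p, o in graph if s == node and p == SUB_CLASS_OF]
        node = parents[0] if parents else None
    return path
```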

MANAGING ABUSIVE CONTENT Use NLP tools to filter out abusive content (insults and, as far as possible, hateful content, defamation, etc.). Automatically handling insults is certainly the simplest case. Other types of moderation could be explored with NLP processing, but human review may be necessary where NLP is not sufficient.
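The split between automatic and human moderation could look like this sketch: insults from a blocklist are rejected automatically, while subtler cases flagged by an external classifier are routed to human moderators. The blocklist, the 0.5 threshold and the `toxicity_score` input (from some unspecified classifier) are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny blocklist; a real one would be curated per language.
INSULTS = {"idiot", "moron", "imbecile"}


def moderate(text: str, toxicity_score: Optional[float] = None,
             review_threshold: float = 0.5) -> str:
    """Return "reject", "review" or "accept" for a contribution.

    Insults are handled automatically; subtler cases flagged by an external
    classifier score (hypothetical) go to human moderators.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if any(w in INSULTS for w in words):
        return "reject"
    if toxicity_score is not None and toxicity_score >= review_threshold:
        return "review"
    return "accept"
```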

SPECIFY SERVICES A knowledge graph can be very large. Services will have to be designed to facilitate access to content and enhance its value: for example, SPARQL queries could be built to identify the most active debates, the debates with the most ramifications or the new debates. SPARQL queries could be predefined with a variable as a parameter to be modified or allow advanced users to directly write their SPARQL query.
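A predefined query with a parameter could be a SPARQL 1.1 template filled in by a small function that validates its inputs to prevent query injection. In this sketch the `wdb:` namespace and the `wdb:answers` / `wdb:theme` predicates are hypothetical; the query counts contributions per question within a theme to surface the most active debates.

```python
import re

# Hypothetical vocabulary; the query uses SPARQL 1.1 aggregates.
MOST_ACTIVE_TEMPLATE = """\
PREFIX wdb: <http://example.org/wikidebat/ontology#>
SELECT ?question (COUNT(?contribution) AS ?n)
WHERE {{
  ?contribution wdb:answers ?question .
  ?question wdb:theme wdb:{theme} .
}}
GROUP BY ?question
ORDER BY DESC(?n)
LIMIT {limit}
"""


def most_active_query(theme: str, limit: int = 10) -> str:
    """Fill the predefined query; parameters are validated to avoid injection."""
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", theme):
        raise ValueError("theme must be a simple local name")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return MOST_ACTIVE_TEMPLATE.format(theme=theme, limit=limit)
```

Advanced users could bypass the template and write their own query, while the predefined version keeps the common cases safe and simple.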

DATABASE MAINTENANCE The databases would need to be updated to reflect newly added contributions and new classifications made on triples (to determine membership categories). Ideally, these updates should be made in real time so that users can check that their contribution has been taken into account.

Proposed by

Wiikkkiiii (talk)

Alternative names

Related projects/proposals

Domain names

Mailing list links


People interested


Comment @Wiikkkiiii: I can't really give a direct vote here. Can you provide a demo website if it is not much trouble? Arep Ticous 14:04, 4 May 2020 (UTC)

Comment @Arepticous: Hi Arepticous, thanks for your message! I hope to provide a demo soon. I'll give an update in a month or two. (The preceding unsigned comment was added by Wiikkkiiii (talk) 18:26, 18 May 2020 (UTC).)

Comment @Wiikkkiiii and Arepticous: Is this stale? There are no comments except the 2 above, and there is still no demo. AnotherEditor144 t - c 13:05, 1 March 2021 (UTC)

Comment @AnotherEditor144 and Arepticous: a basic demo in R could be possible. (The preceding unsigned comment was added by Wiikkkiiii (talk) 23:49, 1 March 2021 (UTC).)

Comment @AnotherEditor144 and Arepticous: I think it would be a good idea to set some kind of deadline for this demo, so that the status of the proposal can be decided afterwards. Would this summer, in July or August, be acceptable for your process? Wiikkkiiii (talk) 12:31, 4 March 2021 (UTC)

Comment @AnotherEditor144, Arepticous, and Wiikkiiii: I find this idea very interesting although quite challenging. I would be in for few devs in Python or R to test the technical feasibility. But there is a lot to debate about it. For example, it is quite OK to find similar threads on a website like Stack Overflow, but when it comes to debates, it is much harder. Take the debates “should all major political decisions be made via public referendum?” and “should citizens be able to propose any law if it gathers a minimal number of supporters?”. They are quite similar, but the underlying arguments would vary a lot. It may not be the best example, but sometimes the way the question is asked dramatically changes the debate itself. So any open debate platform with a sufficient number of contributors would end up with a lot of similar but actually quite different debates. Could an algorithm manage that level of abstraction and complexity? If you want to talk live, here is my Discord server: https://discord.gg/T32qP2Bxyx

Yvandeniswiki 11:18, 7 March 2021 (UTC)

Comment @AnotherEditor144, Arepticous, and Yvandeniswiki: Great! Thanks for your comments and proposal. I agree that there are methodological issues that need to be addressed first, and for me that's still in the design phase, but I think it's possible to find processing methods that address those issues and improve quality.

All of the following points are open; this is not a final answer, and I would be happy to discuss them. We can set a date for that.

The first idea is that the classifications we could make are not raw data but derived data, which can be updated whenever NLP programs are improved.

Semantic proximity would be computed for all classes (in the RDF sense), so if we find strong similarities between the structures of the arguments (strong similarity between subtrees), this would be a good indication that the discussions should be merged.

We need to specify the OWL (or RDFS) ontology to define how the data can be organized. It would be possible to define a semantic relation "close debate". It is in fact possible that one debate gives rise to a related debate as it evolves.

One can imagine several levels of classification, from fine to broad. Even the broadest classification could be useful: in a class called "policy makers-people" one could gather the types of arguments.

Wiikkkiiii (talk) 21:26, 19 March 2021 (UTC)

@Wiikkkiiii: "I would be in for few devs in Python or R to test the technical feasibility." I just so happen to be good at Python, but aren't there any others who can work on it? AnotherEditor144 t - c 23:12, 19 March 2021 (UTC)[reply]

Comment @AnotherEditor144, Yvandeniswiki, and Arepticous: Excellent! Personally, I can develop a bit in Python; I have tested some libraries for NLP processing. But it's still too early to code: we need to specify the ontology, the type of NLP processing, etc. a bit more. Maybe we could look for people who could help while we think about designing a test version? Ideally we would already have a workspace for these exchanges. I can create a GitHub repo for this; let me know if you are OK with that.

Wiikkkiiii (talk) 01:09, 20 March 2021 (UTC)

@Wiikkkiiii: Yeah, I'm ok. I can set up the repo if you want. AnotherEditor144 t - c 09:23, 20 March 2021 (UTC)[reply]
@Wiikkkiiii: The repository is at https://github.com/estella144/wikidebat. AnotherEditor144 t - c 12:09, 20 March 2021 (UTC)[reply]

I completely agree; this will be a good project. Support.

Comment @Yvandeniswiki and AnotherEditor144: Thanks so much for your support and for the repo! I'll post a document in a week or two with the details of the ontology and the methodological questions I've thought of so far. This would be for discussion and additions. Ivan is also interested. Based on the agreed principles, we could decide on a common pace for a roadmap of the test version! :) Wiikkkiiii (talk) 18:49, 20 March 2021 (UTC)

@Wiikkkiiii: How do I know it's you who wants to contribute? We have never disclosed our GitHub usernames to each other. (Well, I have, but I don't expect you to unless necessary.)

Comment @AnotherEditor144: OK, I just made a pull request for an information model. I'm not very familiar with Wikimedia best practices; if it's more convenient to switch to another one, no worries. Wiikkkiiii (talk) 14:19, 21 March 2021 (UTC)

Ok. @Wiikkkiiii: I invited you (so that you can write to the repository). AnotherEditor144 t - c 14:27, 21 March 2021 (UTC)[reply]
@Wiikkkiiii: Won't you respond to it? AnotherEditor144 t - c 18:41, 21 March 2021 (UTC)[reply]
@Arepticous: won't you come? AnotherEditor144 t - c 14:28, 21 March 2021 (UTC)[reply]

OK, thanks @AnotherEditor144! (The preceding unsigned comment was added by Wiikkkiiii (talk) 20:55, 21 March 2021 (UTC).)

@Wiikkkiiii: I noticed that @mentioning you on GitHub doesn't work; you don't respond. AnotherEditor144 t - c 20:56, 21 March 2021 (UTC)

Comment @Wiikkkiiii: maybe consider making a test in Wikispore? Just a suggestion. -Gifnk dlm 2020 From Middle English Wikipedia 📜📖💻 (talk) 08:53, 28 January 2022 (UTC)

here -Gifnk dlm 2020 From Middle English Wikipedia 📜📖💻 (talk) 08:59, 28 January 2022 (UTC)

See also