
Talk:Wikinews Pulse

From Meta, a Wikimedia project coordination wiki
Latest comment: 1 month ago by Gryllida in topic related discussion - news schedule

Alternative proposal: Wikibase-based Wikinews Pulse

In other words: a Wikimedia news aggregator project. This would be a Wikibase instance storing all articles ever published in all newspapers and news websites (and potentially, by extension, blogs); these could be imported automatically by bots (example).

  • By only allowing news articles published elsewhere, we obviate most BLP and OR issues.
  • By using bots to import articles, we reduce the need for volunteer engagement.
  • A news aggregator is not by itself a reliable source, but this project can provide an easy way to "invoke" an article as a citation in Wikipedia.
  • We will also have a machine-readable and queryable dataset of news articles.

--GZWDer (talk) 19:52, 17 July 2025 (UTC)Reply

Note this will exclude many, but not all, exclusive reports - we can still allow news about Wikimedia, but it should be published elsewhere (e.g. at the Diff blog or the Signpost), and it will then be included in the news aggregator. Note that the news aggregator does not collect full texts (which are usually copyrighted), only the title, metadata and a URL to the actual article (for free-content or copyright-expired news sources, potentially also a link to Wikisource where people can read the full text). GZWDer (talk) 19:58, 17 July 2025 (UTC)Reply
I guess d:Wikidata:WikiProject Periodicals#Article item properties would also be relevant, as would any other information or ideas about wiki-adjacent w:news aggregators. Pharos (talk) 07:21, 6 August 2025 (UTC)Reply
Is your conception for this that it be based on Wikibase/Wikibase.cloud primarily, with some connections to Wikidata? Pharos (talk) 07:46, 6 August 2025 (UTC)Reply
I was thinking along somewhat similar lines, but what you describe would be a complementary project to Wikinews Pulse. Building a largely automated dataset of externally-published news articles would be a very helpful preparatory aspect of this, but it would not replace the need to actually aggregate multilingual headlines with human-driven community curation. Pharos (talk) 19:16, 21 July 2025 (UTC)Reply
Hi GZWDer, Csisc and I have been talking about your version of the idea in some detail; see below. Definitely interested in making this happen and involving some of the research labs that are working on algorithms and tools for efficiently tracking, clustering, summarizing, and evaluating news. This could also be a place to link in the diverse ways different wiki communities have organized on-wiki source-reliability evals for news sources. –SJ talk  12:35, 29 September 2025 (UTC)Reply

Rough prototype for visualization purposes on Wikidata

Wanted to share this, to help give a visualization of one potential way "current events" content might be modelled as a Wikidata/Wikibase representation:

Pharos (talk) 19:07, 21 July 2025 (UTC)Reply

Proposed domain name

When you say "multilingual service, to be hosted at https://wikinews.org", do you mean it would be hosted at www.wikinews.org, similar to www.wikisource.org which is the multilingual Wikisource existing alongside the language subdomains? Nemo 06:58, 22 July 2025 (UTC)Reply

Indeed, that is the proposal. This way, there is a multilingual service that focuses on machine-readable short content (what I've for simplicity called "headlines"), while longer prose content can still be available at the different language subdomains. Pharos (talk) 14:48, 22 July 2025 (UTC)Reply
Thanks, makes sense. Nemo 18:03, 22 July 2025 (UTC)Reply
Makes sense ~ Sheminghui.WU (talk) 23:40, 23 July 2025 (UTC)Reply

Why headlines

Why focus on headlines? Headlines are patently the least reliable and nutritious part of any news publication. They are generally sensationalist, misleading, full of mistakes. The entire point of Wikinews is that it allows us to not parrot misleading headlines: even if you create an article that's just a list of references, you can pick an actually informative title for it.

As for "generated" headlines, if that means some kind of automatic prose generation, I daresay that's clearly not a good idea. Headlines are the worst possible source to automatically generate more headlines, and summarising the entire text of the articles is not going to be easier. Nemo 07:02, 22 July 2025 (UTC)Reply

The proposal is not at all to dump headlines from news publications into an AI, and then to generate an average LLM headline. I agree this would be a very, very bad idea! It is instead to write short, approximately one-sentence, summaries of current events in the style of w:Portal:Current events, but to do it in a machine-readable form similar to Wikidata. The "generation" to convert to prose in different languages would not be LLM-based, but supported by the type of deterministic (non-hallucinating) linguistic tools being developed for Wikifunctions/Abstract Wikipedia. Pharos (talk) 14:43, 22 July 2025 (UTC)Reply
Got it. It makes sense as a showcase for Wikifunctions. So the job would be to model the newsworthy facts data model, as well as the corresponding functions? I wonder if that's going to be feasible under time pressure. Presumably it will be easier for repeatable and predictable events, such as celebrity deaths and natural catastrophes? Nemo 18:03, 22 July 2025 (UTC)Reply
Have a look at the Wikidata prototype in the section above. I think with a minor research project, it would actually not be so hard to figure out perhaps 100-1,000 mini-models that would cover the format of the majority of possible "headlines". Pharos (talk) 18:04, 27 July 2025 (UTC)Reply

Topics beyond a single event on Wikidata

Question: Currently, WN’s translation process is indeed quite inconvenient — this is something I’ve experienced firsthand (and the copyright agreements aren’t even unified). However, I still don’t quite understand how Wikidata could better correspond to WN articles. True, the current linking on WN is really inadequate. For example, international and general coverage of a news event could easily be placed under a single Wikidata item. Some articles are manually linked, while others aren’t. But news reports are too variable; it’s very difficult to have a clear, consistent subject of reporting, and the boundaries are too fuzzy.

A colleague (Comrade Moqin, Zauber Violino) suggested: "In that case, maybe we should create a 'small topic,' where various related news reports could be placed. Then the 'small topic' could be linked to other projects." I think the same way. I believe that for news reports, it would be better to have something similar to a Category on the local site. For instance, if readers are interested in the Trump assassination attempt, they could find all related articles. But in practical terms, this is already present in the channel templates on the local site (like the Politics channel).

Additionally, linking WN to Wikidata could mainly help editors with translation purposes.

  • ↑This is some of my chat history outside the site, related to this topic. Could you briefly and clearly explain some of the misunderstanding here and how to solve the problem? Thanks. ~ Sheminghui.WU (talk) 02:51, 28 July 2025 (UTC)Reply
If you look at the example Wikidata item mentioned above Portal:Current events/2025 July 2 (Q135322430) you will see that each value for significant event (P793) includes a number of qualifiers. I believe if you want to group a series of events together as a small topic (like several incidents related to the assassination attempt), which would be analogous to a local category as currently used on Wikinews, a useful qualifier would probably be something like part of (P361). Pharos (talk) 18:44, 30 July 2025 (UTC)Reply

Third option

Can we vote for a third option, a status quo where Wikinews will remain in its current form? BilboBeggins (talk) 21:31, 31 July 2025 (UTC)Reply

The two options are to support the proposal or to oppose it. Actually, both options are consistent with the various Wikinews language editions remaining in their current form.
The proposal is only to add a multilingual data portal onto a different site, https://www.wikinews.org/ which is currently only used as a place to link to the various languages. Pharos (talk) 22:23, 31 July 2025 (UTC)Reply

Wikidata front end

This would seem to be a good interim step, along the lines of w:Wikidata:Wikidata front ends#Topic-specific front ends, and specifically based on Universal Almanac (for events) and Scholia (for news sources). Pharos (talk) 07:38, 6 August 2025 (UTC)Reply

Parallel project from WMF

I learned at Wikimania about this parallel project from WMF, which has distinct outputs, but which I think could share in a news ecosystem that plays well with Wikidata and Wikipedia's Current Events: mw:Wikimedia Enterprise/Breaking news. @Bawolff: You may find this interesting in terms of something more concrete to build on and integrate with. Pharos (talk) 14:02, 12 August 2025 (UTC)Reply

  • That looks like a way to identify clusters of articles being actively edited on a given day, for high-profile events. It would be good to see counts for more than 24h, and to see recent view counts (perhaps from the previous 24h, but we should also get a rough estimate of same-day views).
  • I would love to see a broader time range option (up to a week), searching by keywords and not just QID, clustering by related articles (not just single-article stats), and clustering across languages. All of the cross-linkage data needed to do this should be in Wikidata already. –SJ talk  16:56, 26 September 2025 (UTC)Reply

so you're proposing a drudge report?

really? ltbdl (talk) 01:52, 14 August 2025 (UTC)Reply

No, not really: the proposal is basically for an improved version of w:Portal:Current events, which it would then probably replace. The existing implementation has been around for many years and is fairly popular, but has significant room for evolution. Pharos (talk) 14:24, 14 August 2025 (UTC)Reply
But: the Drudge Report format is useful to a wide range of people, which is why it is still one of the most popular news pages on the web despite its extreme simplicity. We also had a low-key community newsletter for a few years that had a similar format. –SJ talk  16:56, 26 September 2025 (UTC)Reply

Portal:Current events organic traffic growth in recent years

On English Wikipedia, Portal:Current events has actually shown organic traffic growth relative to the Main Page over recent years. This is highly unusual when most English Wikipedia portal-type pages have been in long-term decline, and I do believe it reflects a deeper hunger among our readers for more NPOV news content. Pharos (talk) 16:29, 18 August 2025 (UTC)Reply

Automated processing of news feeds

@Csisc: this is a good place to describe what we were talking about at Wikimania:

  • What processing should we run on news feeds to generate knowledge units in the news?
  • What evals should we run on the feeds to estimate news quality (from the publisher, author, context)?
  • What libraries of benchmarks, datasets, models, and more should we gather on Huggingface to help organize this?

SJ talk  22:33, 19 August 2025 (UTC)Reply

@Sj: I am sorry for the late reply. I was deeply engaged in completing several other projects that required my immediate attention. Concerning what we talked about regarding Wikinews Pulse, I am honored to share the following points:
  • Wikinews Pulse, by its very nature, will require frequent and dynamic updates, since the news landscape evolves not just daily, but minute by minute. Relying solely on human contributors to capture and organize this constant stream of information would be unrealistic and unsustainable, given the sheer volume and speed at which new stories emerge. To address this challenge, we envisioned an automated approach that leverages existing online news distribution mechanisms, particularly RSS feeds, which aggregate the latest headlines from diverse media outlets. The core idea is to process these RSS feeds programmatically and extract individual news items as they are published. However, instead of treating each headline as an isolated entry, the system would employ algorithms to measure semantic similarity or contextual relatedness between titles and short descriptions. Using these relationships, the system can automatically group related items into clusters, with each cluster representing a distinct news story or evolving topic. From a user perspective, this clustering has clear advantages. For example, if multiple news outlets report on the same international summit or breaking incident, their respective headlines would be clustered together under one coherent group. This not only reduces redundancy but also allows users to view a consolidated perspective of global reporting on a given issue. Furthermore, clustering provides a natural foundation for higher-level tasks such as ranking stories by prominence, detecting emerging trends, generating summaries, and linking coverage to related Wikimedia resources like Wikipedia articles or Wikidata entities. Please find an example of how such processing is done at Step-by-Step Pipeline for Wikinews Pulse.
  • To decide which external resources can be used for the clustering and processing of news items, we need to establish a transparent and community-driven framework. As part of this effort, we are considering the development of a Wikibase-based directory of AI models and datasets that can support tasks related to the operation of Wikinews Pulse. Such a directory would not only catalog available resources but also document their provenance, training methodologies, performance benchmarks, and licensing terms. This database will ensure that the choice of models and datasets is traceable, reproducible, and aligned with Wikimedia’s principles of openness and accountability. It will also make explicit which resources are most suitable for specific tasks (for example, clustering multilingual headlines, detecting duplicate news reports, linking entities to Wikidata items, or generating event summaries). By structuring this information in Wikibase, the directory itself becomes a living knowledge graph, enabling contributors to query, update, and curate the AI ecosystem collaboratively. Ultimately, this approach will strengthen transparency, facilitate responsible AI adoption, and provide a clear governance model for the tools that power Wikinews Pulse. Please find a full explanation of this proposed approach at AI usage in Wikimedia Projects. --Csisc (talk) 20:02, 22 September 2025 (UTC)Reply
Hi Csisc, this is great. Yes, we need among other things:
  1. a directory of tools;
  2. a directory of news feeds;
  3. a feed of clustered news-topics;
  4. a working WikiCite to capture the flow of added/removed citations :)

SJ talk  16:56, 26 September 2025 (UTC)Reply

Step-by-Step Pipeline for Wikinews Pulse

Unlike a traditional news aggregator, Wikinews Pulse is envisioned as a structured knowledge graph built on Wikibase, the same underlying platform as Wikidata. This means that instead of only grouping news items by similarity, the system also models the entities and relationships described in the news, thereby creating a machine-readable and queryable representation of ongoing events.

Ingestion of News Sources (RSS Feeds)

  • Extract headlines, timestamps, descriptions, and metadata.
  • News item → stored as an entity in the KG, with properties linking it to its source, publication date/time, and language.
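The ingestion step above could be sketched as follows. This is a minimal, hedged illustration using only the Python standard library: a production pipeline would fetch live feeds (e.g. with a library such as feedparser), whereas here an inline RSS 2.0 sample keeps the example self-contained; the field names in the output record are an assumption, not a fixed schema.

```python
import xml.etree.ElementTree as ET

# Inline RSS 2.0 sample standing in for a fetched feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Wire</title>
  <item>
    <title>UN holds emergency meeting on climate crisis</title>
    <link>https://news.example/un-climate</link>
    <pubDate>Sun, 21 Sep 2025 12:00:00 GMT</pubDate>
    <description>Leaders gather in Paris.</description>
  </item>
</channel></rss>"""

def ingest(rss_text):
    """Extract one record per <item>: the fields a news-item entity would carry."""
    root = ET.fromstring(rss_text)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": item.findtext("title"),
            "url": item.findtext("link"),
            "published": item.findtext("pubDate"),
            "summary": item.findtext("description"),
            "source": root.findtext("channel/title"),
        })
    return records

records = ingest(SAMPLE_RSS)
print(records[0]["title"])   # UN holds emergency meeting on climate crisis
print(records[0]["source"])  # Example Wire
```

Each record would then become a Wikibase entity, with the source, timestamp, and language attached as statements.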

Source reliability evaluations

  • Assess each source for reliability using standard benchmarks
    • From which benchmarks? Publisher / Author / Structure / Content...

Preprocessing and Entity Extraction

  • Apply Named Entity Recognition (NER) and entity linking to detect people, organizations, places, and other key concepts.
  • Link them to Wikidata entities wherever possible (e.g., “United Nations” → Q1065).
  • If no Wikidata item exists, a temporary placeholder item is created in Wikinews Pulse for later community curation.
    • What tools? NER: SpaCy has en_core_web, de/es/fr/zh_core_news... and a multilingual model. QID mapping: is there an OpenRefine reconciliation endpoint for this? (To do: that API needs localization; it is English-only.) DBpedia uses something like this for Flexifusion. See also WikibaseIntegrator. -sj
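To make the shape of this step concrete, here is a toy stand-in for NER plus entity linking: a hand-written gazetteer instead of a trained model and a reconciliation service. The QIDs are the ones quoted elsewhere on this page; everything else is illustrative.

```python
# Gazetteer standing in for NER + Wikidata reconciliation. A real pipeline
# would detect entity spans with a model (e.g. spaCy) and reconcile them
# against Wikidata; unmatched names would get a placeholder item in
# Wikinews Pulse for later community curation.
GAZETTEER = {
    "United Nations": "Q1065",
    "António Guterres": "Q311440",
    "Paris": "Q90",
    "climate change": "Q125928",
}

def link_entities(text):
    """Return (surface form, QID) pairs found in the text."""
    found = []
    lowered = text.lower()
    for name, qid in GAZETTEER.items():
        if name.lower() in lowered:
            found.append((name, qid))
    return found

headline = "United Nations holds emergency meeting on climate crisis in Paris"
print(link_entities(headline))  # [('United Nations', 'Q1065'), ('Paris', 'Q90')]
```

Note that "climate crisis" does not match "climate change" here, which is exactly the kind of gap a real linker (and community curation) would have to close.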

Semantic Embedding and Clustering

  • Use embeddings to group related news items into clusters representing distinct stories or events.
  • Each cluster becomes an Event entity in the KG. [Some of these clusters end up organically getting their own WP or WD entries]
    • What tools? Some sort of multilingual news clustering (e.g. using WD embeddings, GenSim?). This is related to "trending topics" clusters, e.g. via the Bing News API. Can be improved with a separate cluster-naming tool.

Example: “UN holds emergency meeting on climate crisis” → Event: UN Climate Meeting 2025.
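A minimal sketch of the clustering step, under a stated simplification: word-overlap (Jaccard) similarity stands in for the multilingual embeddings mentioned above, and the 0.3 threshold is an illustrative assumption.

```python
# Greedy single-pass clustering of headlines by Jaccard word overlap.
# A real implementation would compare sentence embeddings across languages;
# only the clustering shape is the point here.
def tokens(text):
    return set(text.lower().split())

def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def cluster(headlines, threshold=0.3):
    """Each headline joins the first cluster whose seed headline is
    similar enough; otherwise it starts a new cluster (a new Event)."""
    clusters = []
    for h in headlines:
        for c in clusters:
            if jaccard(h, c[0]) >= threshold:
                c.append(h)
                break
        else:
            clusters.append([h])
    return clusters

headlines = [
    "UN holds emergency meeting on climate crisis",
    "United Nations convenes emergency climate crisis meeting",
    "Stock markets rally after rate cut",
]
for c in cluster(headlines):
    print(c)
```

The first two headlines share four of ten distinct words (Jaccard 0.4) and merge into one cluster; the third starts its own. Each resulting cluster would become an Event entity in the knowledge graph.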

Event Representation in Wikinews Pulse

Each event node in Wikinews Pulse would include structured properties such as:

  • Participants: Individuals, organizations, or groups involved.
  • Location: Linked to geographic entities.
  • Time: Start and end date/time (to the granularity available).
  • Sources: Linked back to news articles in the cluster.
  • Context: Links to related Wikidata/Wikipedia entities.

Example:

Event: UN Climate Meeting 2025

  • has participant → United Nations (Q1065)
  • has participant → António Guterres (Q311440)
  • location → Paris (Q90)
  • date → 2025-09-21T12:00Z
  • related to → Climate change (Q125928)
  • reported in → Reuters article (Entity: N12345), BBC article (Entity: N12346)
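The event node above can be modelled as a plain record for prototyping. This is a sketch only: the class and field names are illustrative, and in an actual Wikibase these would be statements with property IDs rather than Python attributes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Event:
    """Prototype of an event node with the structured properties listed above."""
    label: str
    participants: List[str] = field(default_factory=list)  # QIDs
    location: Optional[str] = None                         # QID
    date: Optional[str] = None                             # ISO 8601, to known granularity
    sources: List[str] = field(default_factory=list)       # news-item entity IDs
    related: List[str] = field(default_factory=list)       # context QIDs

meeting = Event(
    label="UN Climate Meeting 2025",
    participants=["Q1065", "Q311440"],
    location="Q90",
    date="2025-09-21T12:00Z",
    sources=["N12345", "N12346"],
    related=["Q125928"],
)
print(meeting.label, meeting.location)  # UN Climate Meeting 2025 Q90
```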

Ranking and Story Evolution

  • As new reports arrive, additional sources are linked to existing event nodes.
  • Temporal properties allow the KG to represent story evolution over time (e.g., a summit announcement → the summit itself → post-summit analysis).
    • Is this its own tool?

Human Review and Community Curation

  • Volunteers and editors can merge duplicate event entities, refine labels, and ensure correct linking to Wikidata.
  • Community oversight ensures consistency with Wikimedia’s knowledge standards.

Final Representation and User Access

End users don’t just see “clusters of news” but structured event knowledge graphs.

Queries become possible, mainly using SPARQL:

  • “What events did António Guterres attend this month?”
  • “Which global leaders were reported together with Xi Jinping last week?”
  • “What events related to Paris were reported in September 2025?”
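To show what such a query might look like in practice, here is a sketch of a client building the first question as SPARQL text. The prefix and property names (pulse:participant, pulse:date) are placeholders for whatever the Wikinews Pulse Wikibase would actually define; only the query shape is the point.

```python
# Template for: "What events did <person> attend in <period>?"
# Property names are hypothetical; doubled braces escape str.format.
QUERY_TEMPLATE = """
SELECT ?event ?eventLabel ?date WHERE {{
  ?event pulse:participant wd:{person} .
  ?event pulse:date ?date .
  FILTER(?date >= "{start}"^^xsd:dateTime && ?date < "{end}"^^xsd:dateTime)
}}
"""

def events_attended(person_qid, start, end):
    """Render the query string for a person QID and a time window."""
    return QUERY_TEMPLATE.format(person=person_qid, start=start, end=end)

# "What events did António Guterres attend this month?" (Q311440)
q = events_attended("Q311440", "2025-09-01T00:00:00Z", "2025-10-01T00:00:00Z")
print(q)
```

The rendered string would then be posted to the Wikibase's SPARQL endpoint.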

--Csisc (talk) 19:56, 22 September 2025 (UTC)Reply

@Csisc: Hi, I am curious about how far the work has progressed. Is there any demo available yet? The Board is expected to make its final decision about Wikinews in mid-December. If Wikinews is discontinued, could this potentially become a new project for us to work on? Thank you. -- Asked42 (talk) 14:37, 26 November 2025 (UTC)Reply
@Pharos. -- Asked42 (talk) 14:38, 26 November 2025 (UTC)Reply
@Asked42: I can speak from a technical point of view. I met people from the Wikibase Development Team during WikiCite 2025 and explained to them what we need to run a Wikibase for Wikinews Pulse. They said that they will enable editors of Wikibase databases to use Wikidata entities as properties and objects. We are waiting for these upgrades to go live before we proceed. On the other points, I think Pharos is better positioned than I am to answer. --Csisc (talk) 16:32, 26 November 2025 (UTC)Reply
Great. Thank you for your work. If there are any major developments related to this, please let us know. Also, if there is any way community members can help please inform us. --Asked42 (talk) 13:45, 27 November 2025 (UTC)Reply

AI usage in Wikimedia Projects

@Sj: Explaining here a framework that can be useful to manage how AI is used in Wikinews Pulse.

Regarding our conversation during Wikimania, we mainly focused on the question of how artificial intelligence (AI) components could be effectively embedded into Wikimedia Projects such as Wikipedia, Wikidata, and even newer initiatives like Wikinews Pulse. The idea was not only about technical integration but also about ensuring that the use of AI aligns with the values of transparency, accountability, and community-driven governance that have always guided Wikimedia.

As a starting point, we reflected on the AI Strategy proposed by the Wikimedia Foundation a few months ago, which placed a strong emphasis on the principle of traceability. This principle means that any AI-generated contribution within Wikimedia platforms should be clearly identifiable and reviewable so that community members can understand its origin, evaluate its reliability, and correct it if necessary. To uphold this, it was suggested that AI tools or bots should not be tasked with broad or complex responsibilities, but rather with small, specific, and well-defined functions. By limiting the scope of their activity, we can establish clear links between the AI system, the output it produces, and any potential errors or biases that may arise. This approach would not only make it easier to detect and correct mistakes but would also strengthen trust among contributors and readers, ensuring that AI complements rather than undermines human collaboration within Wikimedia projects.

In addition, it is essential to establish a comprehensive directory of all the datasets and pretrained models that may be utilized by AI bots within Wikimedia Projects. This directory would serve as a central and transparent repository, ensuring that contributors and developers have clear visibility into the resources being used to train and deploy AI components. Each entry in this directory would not only reference the model itself but also explicitly link it to the tokenizer and the training methodology applied during its development. Such information is crucial for reproducibility, enabling others to understand how the model was built and to replicate or refine it if needed.

Moreover, the directory should document the specific datasets or benchmarks that were employed as training and testing resources. This includes noting the accuracy rates, evaluation metrics, and potential biases discovered during testing, thereby providing the community with an objective measure of the model’s reliability and limitations. In addition to these technical details, every model listed should be explicitly associated with the Wikimedia-related task it is designed to perform—whether that involves suggesting citations in Wikipedia articles, detecting vandalism, enriching structured data in Wikidata, or supporting content discovery in emerging projects like Wikinews Pulse. By doing so, the directory not only fosters transparency and accountability but also ensures that AI tools remain purpose-driven, community-aligned, and adaptable to Wikimedia’s evolving ecosystem of knowledge projects.

To make this vision practical, the creation and maintenance of the directory could follow a structured framework:

Community-Governed Repository

  • The directory should be hosted on a Wikimedia-supported platform (e.g., a Wikidata-based structure).
  • Governance should follow Wikimedia’s collaborative model, with both technical contributors and community members participating in curating entries.

Standardized Metadata Schema

  • Each dataset or model entry should follow a standardized template including:
    • Model name, version, and source.
    • Associated tokenizer and training method.
    • Training datasets and evaluation benchmarks.
    • Reported accuracy, precision/recall, F1 score, or other relevant metrics.
    • Documented limitations and biases.
    • Specific Wikimedia task(s) supported.
  • This ensures consistency and comparability across models.
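The standardized template above could be prototyped as a plain record plus a schema check. All values below are invented for illustration; a real entry would live as a Wikibase item with these fields as statements.

```python
# One hypothetical directory entry following the metadata schema above.
model_entry = {
    "name": "news-headline-clusterer",        # invented model name
    "version": "0.1",
    "source": "https://huggingface.co/example/model",  # placeholder URL
    "tokenizer": "sentencepiece-multilingual",
    "training_method": "contrastive fine-tuning",
    "training_datasets": ["example-multilingual-headlines"],
    "benchmarks": {"clustering-f1": 0.87},    # invented figure
    "limitations": "weaker on low-resource languages",
    "wikimedia_tasks": ["clustering multilingual headlines"],
}

REQUIRED_FIELDS = ("name", "version", "source", "tokenizer",
                   "training_method", "training_datasets",
                   "benchmarks", "limitations", "wikimedia_tasks")

def validate(entry):
    """Return the schema fields missing from an entry (empty = complete)."""
    return [f for f in REQUIRED_FIELDS if f not in entry]

print(validate(model_entry))  # [] -> entry is complete
```

A check like this is what would make entries comparable across models and allow bots to flag incomplete directory items.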

Verification and Peer Review

  • Before inclusion, datasets and models should undergo community review to validate their provenance, licensing, and relevance to Wikimedia projects.
  • Peer review mechanisms, similar to Requests for Comment (RfC), can be applied to resolve disputes or assess controversial resources.

Traceability and Auditability

  • Each model should include references to training logs or published papers where possible.
  • Tools could be developed to automatically link model performance reports with their Wikimedia applications, enabling easy tracking of successes and failures.

Ongoing Maintenance and Updates

  • Models evolve over time; therefore, the directory should maintain version histories and changelogs.
  • Regular updates would ensure that deprecated or underperforming models are flagged and that new, community-endorsed models can be integrated seamlessly.

Ethical and Legal Safeguards

  • The framework must ensure that only datasets compliant with Wikimedia’s open knowledge and licensing principles are used.
  • Ethical considerations such as fairness, bias mitigation, and inclusivity should be documented for every resource.

By implementing this framework, Wikimedia would not only guarantee transparency in the use of AI but also create a living infrastructure where AI tools are responsibly cataloged, evaluated, and improved over time in alignment with the Wikimedia movement’s mission of free and reliable knowledge for all. We envision developing this database, to be called WikiMLkit, in RDF format as part of Wikibase Cloud, once the upcoming revisions of Wikibase have been implemented. We will be primarily focused on processing all the models on Hugging Face first, as they are available under free licenses. It will be very easy to process them as they are already assigned to corresponding tasks and somehow working. --Csisc (talk) 19:30, 22 September 2025 (UTC)Reply

Integration with Wikidata and other tools

I'm not clear on the desired workflow here, though one could imagine many of them.

  • What sorts of information, from what sources, clustered and organized in what ways, would be stored as Pulses?
  • How could editors annotate or revise this Pulse data?
  • Would this initially be stored in a database on WM Cloud?
  • What parts of this datastream would end up in Wikidata, using what properties and entities? Probably wants a Wikidata Project
  • What parts of this datastream would be accessible to news-editing scripts? Are there existing scripts or would these be new tools for news editors? Would the same tools work for Current Events contributors and Wikinews contributors on most projects?

More thoughts, workflows, and details would be helpful. And could inform which current newsies are interested in participating and developing the idea. Since Wikinews itself gave rise to this concept, it would be good to have wikinewsies from different languages (who are already discussing how to update their workflows) give feedback on the overall concept and how they could see using it. –SJ talk  22:38, 19 August 2025 (UTC)Reply

related discussion - news schedule

https://en.wikinews.org/wiki/Wikinews:Water_cooler/proposals#News_schedule FYI thanks Gryllida 04:14, 27 January 2026 (UTC)Reply