Research talk:Wikipedia Edit Types

From Meta, a Wikimedia project coordination wiki

Leave feedback on the Edit Types project / tool here (feel free to add a new section/topic with your feedback). Additional tracking of issues can be found on GitHub.

Badges and adding standardized tags to edit summaries

Hi there, this project seems really interesting. Are there any plans or ideas yet on how this will be used?

I think it could make it easier to quickly understand changes at a glance or to filter them (for example on the GlobalWatchlist). Edit types could also be used for certified badges awarded to users, similar to plain edit counts. Many other open, collaborative, contribution-based sites use gamification, feedback and contribution summaries as core features. While these could potentially encourage edits made just to increase some stats, they could also give a clearer view of one's contributions and substantially motivate both newcomers and existing editors (improving retention and increasing contributions).

Here is a screenshot from the talk page of the "Automated classification of edit types" project:

A mockup of edit type category icons is presented on top of an article history page of the "Automated classification of edit types" project.

Is something like this being considered?

I recently made a proposal concerning badges (and ranking lists) for MediaWiki software development contributions, so having something similar for editing could be (or become) relevant to it.

Moreover, I've been manually tagging my edits on English Wikipedia via edit summaries, using these tags:

  • "ce" for copy-editing, where content was only slightly changed and/or ancillary changes were made, such as adding a reference or subsection headers
  • "expanded" for adding new content
  • "updated with ..." for adding new content, previously added to a timeline, that brings the article more up to date
  • "rv" for reverts
  • "Please update with ..." when proposing how an article could be updated on a talk page

I don't really know why I did this, but I think having such tags could make edits clearer for others and ease/improve evaluation. Maybe this is relevant to the taxonomy, but I don't have a clear proposal there yet. Note that a single edit could fall under both the "ce" and "expanded" categories, with different parts of its changes belonging to each. I think the final version(s) of the Edit Types Taxonomy tool would need such granular accounting, as well as consideration of, e.g., the number of characters (excluding references!) added or removed.
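A minimal Python sketch of the "characters added or removed, excluding references" measure suggested above. It assumes references appear as plain `<ref>...</ref>` pairs or self-closing `<ref ... />` tags and uses a regex instead of a real wikitext parser, so references generated by templates are not caught. `strip_refs` and `char_delta_excluding_refs` are hypothetical helper names, not part of the Edit Types tool.

```python
import difflib
import re

# Matches self-closing <ref ... /> tags and <ref ...>...</ref> pairs.
# \b keeps the pattern from matching <references/>. A regex is a simplification;
# a real implementation would parse the wikitext.
REF_PATTERN = re.compile(r"<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>", re.DOTALL | re.IGNORECASE)


def strip_refs(wikitext: str) -> str:
    """Remove inline references from a revision's wikitext."""
    return REF_PATTERN.sub("", wikitext)


def char_delta_excluding_refs(old: str, new: str) -> tuple[int, int]:
    """Return (chars_added, chars_removed) between two revisions, ignoring references."""
    a, b = strip_refs(old), strip_refs(new)
    added = removed = 0
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added += j2 - j1
        if tag in ("delete", "replace"):
            removed += i2 - i1
    return added, removed


old = "Paris is a city.<ref>Source A</ref>"
new = "Paris is a large city.<ref name=\"b\">A much longer Source B</ref>"
print(char_delta_excluding_refs(old, new))
```

Here the longer reference text is ignored, so only the six characters of "large " count as added.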

Lastly, I tried the tool here, and it seems that comprehensive edits are only tagged as "change (1)" instead of, e.g., "syntax/ce (7), expanded (20kB), references (14), media-images (2), category (1), wikilinks (10)". Is the plain "change (1)" tag intended to be the only tag for comprehensive edits, or is this a work in progress that simply isn't more granular yet?

--Prototyperspective (talk) 23:40, 15 January 2022 (UTC)
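The kind of per-type summary asked about above (counts per element type rather than a single "change (1)") can be roughed out by counting a few wikitext constructs in each revision and reporting the net difference. This is a sketch under stated assumptions: the regexes only know the English `Category:`, `File:` and `Image:` prefixes, and `count_elements` / `summarize` are hypothetical names, not the tool's API. Net counts also hide in-place edits (replacing one link with another nets to zero), which is one reason a generic "change" label is useful.

```python
import re
from collections import Counter

# English-only namespace prefixes; other wikis use localized aliases.
PATTERNS = {
    "references": re.compile(r"<ref\b", re.IGNORECASE),  # opening and self-closing refs
    "category": re.compile(r"\[\[\s*Category\s*:", re.IGNORECASE),
    "media-images": re.compile(r"\[\[\s*(?:File|Image)\s*:", re.IGNORECASE),
    # Any other [[...]] link is treated as a wikilink.
    "wikilinks": re.compile(r"\[\[(?!\s*(?:Category|File|Image)\s*:)", re.IGNORECASE),
}


def count_elements(wikitext: str) -> Counter:
    """Count occurrences of each element type in one revision."""
    return Counter({name: len(p.findall(wikitext)) for name, p in PATTERNS.items()})


def summarize(old: str, new: str) -> str:
    """Describe net per-type changes, e.g. 'references (+1), wikilinks (+1)'."""
    before, after = count_elements(old), count_elements(new)
    parts = []
    for name in sorted(PATTERNS):
        delta = after[name] - before[name]
        if delta:
            parts.append(f"{name} ({delta:+d})")
    return ", ".join(parts)
```

For example, `summarize("Text [[Paris]].", "Text [[Paris]] and [[Lyon]].<ref>x</ref>")` gives `references (+1), wikilinks (+1)`.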

Thanks @Prototyperspective: for these thoughts / pointers!
As far as use cases go: the initial hope is for large-scale analytics, e.g., better quantifying the output of campaigns or detailing shifts in edit behavior on wikis. The reason is that large-scale analytics are more forgiving of bugs and the like while we continue to work through edge cases. Eventually, though, I can see a lot of other potential applications, like the mock-up you posted to help patrollers filter Recent Changes. Beyond assessing general interest, the main question there will probably be one of infrastructure: can we compute the diffs and edit types quickly enough to supplement RC efficiently?
Regarding the manual tags that you add to edit summaries: those sorts of labels probably belong to a later stage of the project, if it happens. To keep the project manageable and easily extendable across all Wikipedia languages, I'm focusing first on aspects that are very specific to wikitext syntax, e.g., Templates vs. Wikilinks. What you're describing is, I think, a bit more high-level: it would mean figuring out which combinations of the actions I'm trying to detect (Template change, Wikilink insert, etc.) define things like copy-editing or content expansion. Your edit summary tags will be very useful for that stage, though it's likely still at least several months out, and I'm less certain whether I will take it on because it will be hard to do effectively across different language editions.
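As a rough illustration of that later stage, the function below combines low-level action counts (keyed like `("Wikilink", "insert")`) and word counts into the higher-level tags from the first comment. The input format, the rules and the 50-word threshold are all hypothetical placeholders, not anything the project has defined; the rules deliberately let one edit be both "expanded" and "ce", as suggested in the first comment.

```python
from collections import Counter


def high_level_labels(actions: Counter, words_added: int, words_removed: int,
                      expand_threshold: int = 50) -> list[str]:
    """Map low-level edit actions to coarse labels (hypothetical rules)."""
    labels = []
    # Substantial net growth in prose counts as an expansion.
    if words_added - words_removed >= expand_threshold:
        labels.append("expanded")
    # Any removed words, or a small addition, counts as copy-editing.
    if words_removed or 0 < words_added < expand_threshold:
        labels.append("ce")
    # Counter returns 0 for action types that never occurred.
    if actions[("Reference", "insert")]:
        labels.append("references")
    if actions[("Wikilink", "insert")] or actions[("Wikilink", "change")]:
        labels.append("wikilinks")
    return labels


print(high_level_labels(Counter({("Template", "change"): 1}), 3, 1))
```

Hand-written rules like these are easy to inspect, but as noted above they would likely need tuning per language edition.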
As for the specific example of copyedit vs. expansion: we just started working on how best to describe Text changes in a more useful manner (see GitHub issue). The current thinking is probably to aim for reporting how much whitespace / punctuation / words / sentences / paragraphs were altered. This is already quite hard because many languages don't use whitespace to separate words or have different types of punctuation, but I agree it's important for trying to distinguish the potential impact of an edit (though obviously even changing a single word, or a single character in a word, can have a huge impact on meaning).
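To make the whitespace / punctuation / words / sentences / paragraphs idea concrete, here is a minimal Python sketch that tokenizes two plain-text versions (markup assumed already stripped) at each level and counts how many units differ, using `difflib`. The tokenizers are deliberately naive assumptions (sentence breaks after `.`, `!` or `?`; words as `\w+` runs), so, as noted above, the word level breaks down for languages written without spaces. `tokens` and `changed_counts` are hypothetical names.

```python
import difflib
import re
import unicodedata

LEVELS = ("whitespace", "punctuation", "word", "sentence", "paragraph")


def tokens(text: str, level: str) -> list[str]:
    """Split plain text into units at the given level (naive rules)."""
    if level == "paragraph":
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if level == "sentence":
        # Breaks after ., ! or ? plus whitespace; wrong for abbreviations and many scripts.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if level == "word":
        # \w+ runs: fails for languages written without spaces between words.
        return re.findall(r"\w+", text)
    if level == "punctuation":
        # Unicode general categories starting with "P" cover punctuation in many scripts.
        return [c for c in text if unicodedata.category(c).startswith("P")]
    if level == "whitespace":
        return [c for c in text if c.isspace()]
    raise ValueError(f"unknown level: {level}")


def changed_counts(old: str, new: str) -> dict[str, int]:
    """For each level, count units inserted, deleted or replaced between two texts."""
    result = {}
    for level in LEVELS:
        a, b = tokens(old, level), tokens(new, level)
        matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
        # A replaced span counts as the larger of its old and new lengths.
        result[level] = sum(
            max(i2 - i1, j2 - j1)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"
        )
    return result


print(changed_counts("The cat sat. It was happy.", "The black cat sat. It was happy."))
```

Inserting one word here touches one unit at every level except punctuation, which shows how a small copyedit still registers as a changed sentence and paragraph.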
Regarding the tool, it continues to be updated, so hopefully it is in a better state now (the biggest change since, I think, you tested it is that it now handles changes to nested content, e.g., a template within a reference, much better). "Change" will probably continue to be the generic label used when something that already exists in the wikitext is altered in some way (big or small); more details in this section. We are considering expanding this slightly (see GitHub issue), but Text is the main area I think I'll focus on here. I also just made it (hopefully) a bit easier to get a link to an example in the tool and share it here! --Isaac (WMF) (talk) 19:10, 24 January 2022 (UTC)
I created an issue on GitHub here that outlines the proposals and ideas from above in a more coherent way.
It makes a lot of sense to focus on these things first. Yes, these are edit types on a more abstract level. There may already be some Python libraries that can tell whether a significant amount of new text was added versus copyediting and moving (sometimes slightly changed) text around. Built from scratch, this is probably more difficult than it seems; but if existing libraries can do the bulk of the work, it may be no harder than it seems, and implementing a first working draft might even be easier than expected. I guess the complication is that the script would need to "understand" wiki syntax. With all the natural language processing tools available (Python, JavaScript, etc.), I'd imagine something useful for building this already exists (and a proof-of-concept version wouldn't need to work for all cases anyway).
More details can be found in that linked issue, including more info on why I think it could greatly vitalize Wikipedia. Thanks a lot for your in-depth reply and for your work on this really important project. Prototyperspective (talk) 20:15, 29 October 2022 (UTC)

Content-free feedback

Just saw your comment on phab:T265163. This is AWESOME! Great work. Very excited to see where it goes next. I have a bunch of ad-hoc hacky diff parsing and evaluation libraries on my laptop and I'd love to get rid of them. Enterprisey (talk) 00:36, 2 March 2022 (UTC)

@Enterprisey, thanks for the positive feedback! It was finally released as a Python package last week (PyPI), so it can easily be installed now. If you run into issues using it, don't hesitate to let me know. -- Isaac (WMF) (talk) 13:14, 8 March 2022 (UTC)

different namespaces

I was curious: is there an intention to treat different categories of pages differently? For instance, adding new content in the context of a discussion is very different from adding content in the context of a page that we're building up. This could partially be split out by namespace, but you may have to be a little more nuanced? (One example I was looking at is 186495772@frwiki.) Effeietsanders (talk) 08:02, 19 August 2023 (UTC)
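A first-cut namespace split like the one described can be sketched from MediaWiki's numbering convention, in which the main (article) namespace is 0 and talk namespaces have odd IDs. As the comment suggests, this is not nuanced enough on its own: discussion pages that live in even-numbered namespaces, such as noticeboards in the project namespace, would still come out as "other". `edit_context` is a hypothetical helper, not part of the Edit Types tool.

```python
def edit_context(namespace_id: int) -> str:
    """Coarse context for an edit, from its page's MediaWiki namespace ID."""
    if namespace_id < 0:
        # -1 (Special) and -2 (Media) are virtual namespaces without editable pages.
        raise ValueError("virtual namespaces have no editable pages")
    if namespace_id % 2 == 1:
        return "discussion"  # Talk (1), User talk (3), Project talk (5), ...
    if namespace_id == 0:
        return "content"  # main (article) namespace
    return "other"  # User (2), Project (4), File (6), Template (10), Category (14), ...


print([edit_context(ns) for ns in (0, 1, 4)])
```

A more nuanced version could add page-level signals on top of this, rather than relying on the namespace alone.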