Grants talk:IEG/Editor Interaction Data Extraction and Visualization

Clarifying roles and responsibilities

Hi Pine and Halfak (WMF), thanks for all your work on this proposal!

I'm wondering if you can share some more details about how you'd be dividing up roles and responsibilities for this project, as I understand it mostly rests on the 2 of you to do the labor. Specifically:

Pine, as your time is the sole "expense" this grant would fund, can you please share a bit more about what activities you'd be responsible for? I'm particularly wondering what things would be in your wheelhouse as a Research Analyst.
Aaron, I note that you are volunteering with your WMF staff account. Does that mean that this project needs WMF staff to do some of the extraction or other work in order to complete this project? What are the pieces of this project that only you can do (either as staff, or in your personal volunteer capacity)?

This feels like a new experimental case for IEG, and I want to make sure we understand it fully before marking the proposal eligible. Best wishes, Siko (WMF) (talk) 00:02, 4 October 2014 (UTC)[reply]

Hi Siko (WMF), some of this depends heavily on the scope of the project, and Aaron has just found another person who may work with us. I think it would be good for Aaron and me to meet to flesh out this proposal some more. Unfortunately my time to work on this is limited until the end of next week. This proposal could have used at least another week of discussion before submission, but we were up against the deadline, so we are still sorting out how the project will work.

Questions that Aaron and I should discuss:

Aaron's role: is he working in a WMF capacity, in a volunteer capacity, or both?
Pine's tasks and time expectations
Role of our third participant
Timelines and deliverables

Halfak (WMF) can you set up a time with me to answer these and other questions from Siko? Perhaps we could meet on Wednesday morning.

--Pine^✉ 00:36, 4 October 2014 (UTC)

Hey Siko (WMF) and Pine. I apologize for the late response. I've been under the weather this weekend and traveling this week.

This project is not an official part of my duties as WMF staff. My role on this project is that of a volunteer. I'll help Pine with what he needs to get the project up and running (e.g. edits to this proposal) and help him find collaborators (e.g. Fabian Flöck and HaithamS). I'll also work to produce some of the editor interaction datasets and consult with Pine and other volunteers/advisors about formats, distribution and APIs for editor interaction data. After typing this out, it's clear that I should have signed the proposal with my volunteer account (EpochFail). I'll go fix that right away. Sorry for the confusion. --Halfak (WMF) (talk) 03:32, 7 October 2014 (UTC)

Done --EpochFail (talk) 03:40, 7 October 2014 (UTC)

Thanks for this clarification, EpochFail, that's helpful! Pine, I expect the committee will want to better understand answers to address the other 3 bullet points on your list of questions when they review this proposal (particularly your tasks in each activity specified, and any gaps remaining in the team as it forms), so I'd encourage you to keep working towards adding these pieces into the proposal over coming days as you sort them out more clearly. Meanwhile we'll mark this proposal eligible so you can proceed towards review :) Best wishes, Siko (WMF) (talk) 20:37, 9 October 2014 (UTC)[reply]

Project meetings: we had one meeting today and are planning another one for Saturday. Hopefully by the end of Saturday we will have a reasonable project plan that all participants agree with and that the Committee can review. --Pine^✉ 18:55, 13 October 2014 (UTC)

Sensitivity of data

Hi Pine. I realize that my name is listed as an advisor on the project, but I would like to ask a few questions as this project is now being proposed for IEG. One of the major questions I have in mind is around data sensitivity. The proposal states that one of its main goals is to publish user interaction data. Do you think that some users might raise concerns (privacy, potential for misuse, etc.) about the published data? If so, what kind of concerns, and how might they conflict with the goals of this project? Thanks. --HaithamS (WMF) (talk) 22:07, 9 October 2014 (UTC)

Hi HaithamS (WMF), thanks for your question. The current plan is to use only public data, in much the same way that xtools uses public data. --Pine^✉ 20:03, 10 October 2014 (UTC)[reply]

Yes, the data is public, but I think some care should be taken when showing interaction examples involving specific individuals. It could be perceived as "picking on" individuals unfairly. I don't think we want to be showing actual usernames in those cases, User1, User2, etc, or other more descriptive but non-identifying names should be chosen. There may be situations (e.g. ArbCom) where the ability to use the tools to produce interaction logs with actual user names is perhaps appropriate but not in the "lab rat" situation. I guess I am saying that the software should always anonymise except when instructed to reveal the names of specific users when legitimately requested. Kerry Raymond (talk) 02:11, 18 October 2014 (UTC)[reply]
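As a rough illustration of the "User1, User2" suggestion above, a minimal Python sketch might look like the following. The function name `pseudonymise` and its `reveal` parameter are hypothetical and not part of any existing tool; the sketch assumes interactions arrive as plain (actor, actee) username pairs.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


def pseudonymise(
    pairs: Iterable[Tuple[str, str]],
    reveal: FrozenSet[str] = frozenset(),
) -> List[Tuple[str, str]]:
    """Replace usernames with User1, User2, ... in order of first appearance.

    Names listed in `reveal` are kept as-is, which models the case where
    showing specific users has been legitimately requested.
    """
    aliases: Dict[str, str] = {}

    def alias(name: str) -> str:
        if name in reveal:
            return name
        if name not in aliases:
            aliases[name] = f"User{len(aliases) + 1}"
        return aliases[name]

    return [(alias(actor), alias(actee)) for actor, actee in pairs]


# Hypothetical usernames, for illustration only.
interactions = [("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Carol")]
print(pseudonymise(interactions))
```

Note that stable aliases like these only hide names in the output; as pointed out elsewhere in this thread, anyone with the underlying public data could still re-identify users.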

Hi Kerry Raymond, we are still discussing this question regarding visualizations, but all of the data is public already so I can't think of anything that our visualizations would reveal that isn't already easily knowable on a per-user basis. Our visualizations are more about the broader scope of interactions on Wikipedia, and trying to make those larger scale interactions easier to understand. If someone wanted to visualize interactions about a particular editor, it is already easy to do so, and it seems to me that xtools already does this. Our visualizations are not revealing anything that isn't already public. If someone wants to troll a user, I doubt that these visualizations will be very useful to them. The worst that they could do is show a series of hostile interactions, and that is unlikely to be new information to anyone who personally reads user talk pages, or searches for warning templates or negative information on user talk pages using Wikipedia's normal search tools. Can you think of a case when these visualizations would harm privacy? I can't think of one. --Pine^✉ 18:55, 18 October 2014 (UTC)[reply]

In terms of identifying editors who are biting others this is similar to a project I ran on the English Wikipedia which had to be stopped because of negative feedback. We tried to avoid naming and shaming the people who were making incorrect deletion tags, but it was easy to track them down through the links and it became deeply contentious. So I would make sure you anonymised this. WereSpielChequers (talk) 04:15, 19 October 2014 (UTC)[reply]

I endorse what Kerry Raymond and WereSpielChequers have written above: please make sure everything is anonymised. Especially from the German perspective, which is particularly data-sensitive, there is a big difference between data that is theoretically publicly available and the actual processing of this data to generate conclusions and interpretations. The results are very easily understandable to everyone, while the raw data as such can only be interpreted by experts. In a nutshell: the opportunities of this idea are at the same time its risks. If you plan to deploy this on German-language projects, make sure everything is anonymised or there is an opt-out. Best,--Poupou l'quourouce (talk) 11:20, 19 October 2014 (UTC)

The concerns above are noted, and there appears to be consensus that the visualizations should be anonymized. I will make efforts to do this, for example by not including usernames in graphs. Even now, without this project in existence, it is possible for users to create similar graphs that do name users, and because of the nature of the data sets that we will produce it will be possible for others to create graphs that do name users. However, we can avoid creating graphs ourselves that name users, and additionally we can explicitly remind anyone who uses our data that using the data to harass users is prohibited by the Wikimedia terms of use. I will be happy to consult with anyone about the nature of visualizations as this project moves forward. One of the principal purposes of this project is to understand editor interactions as a system rather than at the level of individual users, and to hopefully encourage a more civil environment in the long run. Harassment is contrary to the long-term goals of this project. --Pine^✉ 18:17, 22 October 2014 (UTC)[reply]

Eligibility confirmed, round 2 2014

This Individual Engagement Grant proposal is under review!

We've confirmed your proposal is eligible for round 2 2014 review. Please feel free to ask questions and make changes to this proposal as discussions continue during this community comments period.

The committee's formal review for round 2 2014 begins on 21 October 2014, and grants will be announced in December. See the schedule for more details.

Questions? Contact us.

Jtud (WMF) (talk) 22:20, 9 October 2014 (UTC)

Analysis and visualization methods

"We will make editor interaction data easy to understand by using visualizations."

Who will generate these visualizations? I'm down for a good 2d visualization here and there, but I don't have much experience in network visualization. Who will do that work? --EpochFail (talk) 15:50, 13 October 2014 (UTC)[reply]

That task will probably be led by me, possibly in cooperation with Fabian or Haitham. --Pine^✉ 18:53, 13 October 2014 (UTC)

It would be good to understand if either of the others have actually confirmed they are willing and able to contribute to these tasks, Pine. My impression so far is that HaithamS (WMF) offered advice on the general idea of this project before it became a grant proposal, but I'm less sure how your recent conversations have gone with him about his role. Generating the visualizations can take significant volunteer time and that's well beyond scope of an advisor. Cheers, Siko (WMF) (talk) 22:32, 17 October 2014 (UTC)[reply]

Hi Siko (WMF), I am expecting little more than occasional advice from HaithamS. We already have a number of ideas for visualization methods that are less resource intensive than creating new tools from scratch. --Pine^✉ 19:01, 18 October 2014 (UTC)[reply]

As pretty as network visualisations can be, I don't know if it's actually massive networks that we need to visualise here. Further to my comments elsewhere on this page, I think we are probably most interested in visualising the interactions that have high individual significance or are cumulatively significant. Any visualisation I can think of would probably have time as the X axis, perhaps sometimes using a log scale to emphasise recent history over ancient history. Just to take a simple question: is reverting more common in 2014 (as a percentage of all edits) than it was in the past? Who is being reverted, newbies vs editors of varying levels of experience? Who is doing the reverting? Is that changing over time? Is the first edit a user makes to an article more likely to be reverted than subsequent edits? Is that changing over time? I think the game with visualisation is looking for correlations with editor attrition. I think the kinds of questions that might be better addressed with one of those network visualisations would involve visualising interaction against high-level categories. So do a layout based on distance between categories and colour them by the scoring of the interactions. That might show that certain categories were more prone to certain kinds of interaction patterns, e.g. POV issues around politics articles, reverting unsourced material in BLPs. Are disagreements more likely to occur in politics than geography? Kerry Raymond (talk) 03:38, 18 October 2014 (UTC)
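To make the "reverts as a percentage of all edits" question concrete, here is a small Python sketch. It assumes a hypothetical input of (year, was_reverted) pairs, one per edit; how reverts would actually be detected is outside its scope, and the function name `revert_share_by_year` is made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def revert_share_by_year(edits: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Return {year: percentage of that year's edits that were reverted}."""
    total: Counter = Counter()
    reverted: Counter = Counter()
    for year, was_reverted in edits:
        total[year] += 1
        if was_reverted:
            reverted[year] += 1
    # Every year in `total` has at least one edit, so no division by zero.
    return {year: 100.0 * reverted[year] / total[year] for year in sorted(total)}


# Made-up sample data, not real Wikipedia figures.
sample = [
    (2013, False), (2013, False), (2013, False), (2013, True),
    (2014, True), (2014, False),
]
print(revert_share_by_year(sample))
```

The resulting per-year series could then be plotted with time on the X axis, as suggested above; splitting the input by editor experience would give the newbie-versus-veteran comparison.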

Kerry Raymond, I'm stoked about your list of ideas as they serve as excellent use-cases for the data we'd like to produce. I want to make it much easier for you and others to pursue these types of questions. Our goal is to produce datasets that enable many types of visualizations and analysis methods. --EpochFail (talk) 19:24, 18 October 2014 (UTC)[reply]

Kerry Raymond one thing we are discussing is a separation of metrics from network visualization, since there can be analysis done that doesn't need to involve a huge amount of visualization. If network visualizations turn out to be easy then maybe we will do many of them, but if they turn out to be resource-intensive and produce little meaningful information then maybe we will go with simpler metrics and graphs. --Pine^✉ 19:31, 18 October 2014 (UTC)[reply]

At least for the part of intra-article visualizations, I plan to have a network graph of the users in the article, possibly only of the last X revisions, so that it will not be overly complex and still intuitive to understand, and to complement that with curves that show aggregated metrics over time alongside the graph. Another thing I would like to implement is an easy-to-use function to go from the more abstract graph representation back to the relevant content diffs to see what actually produced a certain edge in the graph.--Fabian Flöck (talk) 21:32, 18 October 2014 (UTC)[reply]
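One possible reading of "a network graph of the users in the article, possibly only of the last X revisions" is sketched below in Python. It assumes, purely for illustration, that each interaction event is an (actor, actee, timestamp) tuple and that one event corresponds to one revision; `edge_weights` is a hypothetical helper, not an existing function.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, str, int]  # (actor, actee, timestamp)


def edge_weights(events: List[Event], last_x: int) -> Counter:
    """Count directed actor->actee edges among the `last_x` most recent events."""
    if last_x <= 0:
        return Counter()
    recent = sorted(events, key=lambda event: event[2])[-last_x:]
    return Counter((actor, actee) for actor, actee, _ in recent)


# Hypothetical events for one article.
events = [("A", "B", 1), ("B", "A", 2), ("A", "B", 3)]
print(edge_weights(events, last_x=2))
```

The counter can be fed to any graph layout tool as a weighted edge list; keeping `last_x` small is what keeps the graph readable.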

Community notification

We're a bit behind here. It seems to me that the Wiki Researcher community will be our primary audience for this work. If so, here's a few places it seems that we should canvass:

wiki-research-l
Active editors of R:Index
Active users of R:Quarry

I'm sure I'm missing others, but this seems like a good place to start. --EpochFail (talk) 15:53, 13 October 2014 (UTC)[reply]

Woops. Nearly forgot about gendergap-l and Gender gap.

I suspect that, for newcomer interactions, hosts at en:WP:Teahouse would be interested too. --EpochFail (talk) 15:55, 13 October 2014 (UTC)[reply]

Formats

Hey folks. I figure now is as good of a time as any to start talking about formats for editor interaction datasets.

I propose that the core event of an editor interaction can be represented as a triple of:

<interaction> ::= <person> <person> <timestamp: int>

<person> ::= inst. of Human

<timestamp> ::= int

Since the wiki software represents persons as users -- registered and anonymous -- it seems clear that we need to simplify to:

<interaction> ::= <actor: user> <actee: user> <timestamp: int>

<user> ::= <registered user> | <anonymous user>

<registered user> ::= <id: int>

<anonymous user> ::= <text: str>

We'd also like to carry a payload of metadata about the event (was it positive or negative? what was the topic of conversation? etc.):

<interaction> ::= <user> <user> <timestamp: int> <meta>

<meta> ::= <type: str> ...

... ::= A relevant data structure for the type.

Now for an example:

revision 1

wikitext:

== His stolen watch. ==
The article is missing information about [...]

event (JSON): (none)

revision 2

wikitext:

== His stolen watch. ==
The article is missing information about [...]
: What information are you talking about?  Was his [...]

event (JSON):

{
  actor: {text: "123.123.123.123"},
  actee: {id: 987654},
  timestamp: 1984567890,
  meta: {
    type: "talk_page_section",
    section: {
      index: 1,
      title: "His stolen watch."
    },
    conversers: 2
  }
}

revision 3

wikitext:

== His stolen watch. ==
The article is missing information about [...]
: What information are you talking about?  Was his [...]
:: Yes it was.  There's an article in the [...]

event (JSON):

{
  actor: {id: 987654},
  actee: {text: "123.123.123.123"},
  timestamp: 1984567890,
  meta: {
    type: "talk_page_section",
    section: {
      index: 1,
      title: "His stolen watch."
    },
    conversers: 2
  }
}

--EpochFail (talk) 16:39, 13 October 2014 (UTC)
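For readers who prefer code to BNF, the proposed model could be mirrored in Python roughly as follows. This is only a sketch under assumptions: the class and function names (`RegisteredUser`, `AnonymousUser`, `Interaction`, `parse_interaction`) are hypothetical, and the sample event quotes its keys so that it parses as strict JSON.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass(frozen=True)
class RegisteredUser:
    id: int


@dataclass(frozen=True)
class AnonymousUser:
    text: str  # typically an IP address


User = Union[RegisteredUser, AnonymousUser]


@dataclass
class Interaction:
    actor: User
    actee: User
    timestamp: int
    meta: Dict[str, Any] = field(default_factory=dict)


def parse_user(doc: Dict[str, Any]) -> User:
    """Build a user from {"id": ...} (registered) or {"text": ...} (anonymous)."""
    if "id" in doc:
        return RegisteredUser(int(doc["id"]))
    return AnonymousUser(str(doc["text"]))


def parse_interaction(doc: Dict[str, Any]) -> Interaction:
    """Build an Interaction from a decoded JSON event."""
    return Interaction(
        actor=parse_user(doc["actor"]),
        actee=parse_user(doc["actee"]),
        timestamp=int(doc["timestamp"]),
        meta=dict(doc.get("meta", {})),
    )


raw = """
{"actor": {"text": "123.123.123.123"}, "actee": {"id": 987654},
 "timestamp": 1984567890,
 "meta": {"type": "talk_page_section",
          "section": {"index": 1, "title": "His stolen watch."},
          "conversers": 2}}
"""
print(parse_interaction(json.loads(raw)))
```

The `meta` payload is left as a plain dictionary because its structure is meant to vary by interaction type.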

I would suggest pulling out useful things from <meta> into top-level fields - like "positive<boolean>, topic<string>, etc". That would lend itself better to dumping into a database table and allowing people to query the dataset. But generating this interaction dataset seems to me like the best part of the proposal, I love it. As a side note, I would side with Nemo that it would be better to start on a different wiki. Start on something like etwiki and it'll be much faster to generate the dataset, then if it's useful, enwiki folks will be begging you to do it there as well. Also, in the time it would take to analyze enwiki, you could probably analyze a dozen small wikis and you'd have some very nice cross-wiki comparisons to make. Milimetric (WMF) (talk) 13:58, 17 October 2014 (UTC)[reply]

I'd suggest sitting down and deciding on the theoretical model, then the information model, before deciding on any data representation. Otherwise the risk is that the chosen representation isn't powerful enough or isn't efficient for the types of queries one might need to do. Kerry Raymond (talk) 02:40, 18 October 2014 (UTC)

I would agree that where the interaction happens does matter, and that all interactions have a timestamp for each party: in Wikipedia we have no actual single-point interactions (because they would be edit conflicts) but rather a set of individual actions at the same page separated in time. Kerry Raymond (talk) 03:45, 18 October 2014 (UTC)

Kerry Raymond, I'm not sure what you are talking about with a "theoretical model". The data model is a theoretical model. My goal in producing datasets is not to produce something that is "indexed", but rather something that might be processed or indexed in many different ways. "Power" and "efficiency" don't enter the equation here, only "expressiveness". This data model implies that all interactions involve two users and occur at a specific time. It can support one-many and many-many interactions through multiple records (e.g. [A-->B], [A-->C] == [A-->(B,C)]).

Also, I disagree that "where" the interaction happens doesn't matter. In general, the context of interaction tends to matter quite a bit. The data model I propose would include "actors" and "actees" which captures the asymmetry inherent in asynchronous editing. --EpochFail (talk) 19:32, 18 October 2014 (UTC)[reply]
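The "[A-->B], [A-->C] == [A-->(B,C)]" idea can be sketched in a few lines of Python. The helper name `expand` and the tuple layout are assumptions made for illustration; the point is only that one-many and many-many interactions decompose into pairwise records sharing a timestamp.

```python
from itertools import product
from typing import Iterable, List, Tuple

Record = Tuple[str, str, int]  # (actor, actee, timestamp)


def expand(actors: Iterable[str], actees: Iterable[str], timestamp: int) -> List[Record]:
    """Expand a many-to-many interaction into one pairwise record per actor/actee pair."""
    return [(actor, actee, timestamp) for actor, actee in product(actors, actees)]


# One actor addressing two actees at the same (hypothetical) timestamp.
print(expand(["A"], ["B", "C"], 1413660720))
```

Grouping records by actor and timestamp would recover the original one-many interaction, provided each group really came from a single action.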

Milimetric (WMF), thanks! We'll definitely start on smaller datasets, but I don't think that moving away from our native language is a good idea. One thing that we can do is process an article or a talk page at a time. Simplewiki isn't a bad place to work from once we are ready for medium-scale datasets.

EpochFail, I think we are in "violent agreement" that context *does* matter; something must have got lost in translation if you thought I was opposed. It sounds like we have a nomenclature disagreement on "model" happening too. But I'd certainly like to lift the conversation at least to the UML class level or similar (what are our entities, what are their relationships, what are the properties -- noting that most properties are just poor man's relationships with entities we didn't think interesting, which is often worth teasing out) before we dive into JSON representation. I wouldn't mind a few axioms. We've just agreed one, I think (that no two actions can happen on an article at the same time, and therefore the actions on an article are a total ordering, whereas actions in WP in general are a partial ordering). I think there's also a distinction between when things happen and when another user observes them to happen, which is important if we are to reason about causality, but I don't think we have data on what pages a logged-in user reads and when (do we? do we have any data on watchlists which might be some kind of proxy?) In my experience, arguing about models tends to tease out a lot of issues that nobody thought of in the first place and is generally to the benefit of the research. At the very least, it means there is a tighter and more explicit scoping of the project (these are the things we are considering, these are the things we are not considering). It means we are more able to write assertions and tests into our code that save time in debugging. Etc. Although I realise, from a context larger than just this grant application, that there are use cases, I agree they have not been spelled out very well here in this grant request. I think there is a danger in building a tool in isolation from use cases, as it may mean the tool ends up incapable of serving any use case (a solution looking for a problem, as they say).
Kerry Raymond (talk) 00:00, 19 October 2014 (UTC)

Kerry Raymond, it seems you have not noticed the en:Backus–Naur Form I used above to discuss the fundamental data model before getting into JSON. In this case, BNF is both more appropriate and more abstract than UML. If you look back at my original post, you'll notice that I started with a "perfect" data model that allowed for inst. of Human to be captured, then I stated how inst. of Human is impossible and how we could/should use <user> in the form of IP/user_id instead of Human. I think we're on the same page here.

You mention "building a tool in isolation", but rather, we are building a dataset format in the public. How is a public proposal with a well described (in BNF) data model "isolation"?

I think it is important to keep in mind that the above is a *proposal* for a data format. I brought this up on the talk page so that it wouldn't be developed in isolation. I really appreciate the discussion you have brought. I just want it clear that we agree on starting with abstractions (which I did) and this is not being developed in isolation (here we are not being isolated). --EpochFail (talk) 17:09, 19 October 2014 (UTC)[reply]

Re. moving meta-elements up, this is a problem for relational schemas since meta-elements can differ depending on the interaction type. We'd probably use a separate table for meta elements in a relational schema. I think that makes for an intuitive denormalization structure in the json. However, there is one clear case where we should move a field up: meta.type. Every interaction will have a type, so it can be in the main json object. --EpochFail (talk) 19:36, 18 October 2014 (UTC)[reply]
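A relational layout along these lines might look like the following Python/SQLite sketch. Table and column names (`interaction`, `interaction_meta`, `meta_key`, `meta_value`) and the `load` helper are hypothetical choices for illustration: `type` is promoted to the main table, and the remaining type-specific meta elements go into a separate key/value table.

```python
import json
import sqlite3
from typing import Any, Dict, Iterable

SCHEMA = """
CREATE TABLE interaction (
    id        INTEGER PRIMARY KEY,
    actor     TEXT NOT NULL,     -- JSON-encoded <user>
    actee     TEXT NOT NULL,     -- JSON-encoded <user>
    timestamp INTEGER NOT NULL,
    type      TEXT NOT NULL      -- promoted from meta.type
);
CREATE TABLE interaction_meta (
    interaction_id INTEGER NOT NULL REFERENCES interaction(id),
    meta_key       TEXT NOT NULL,
    meta_value     TEXT NOT NULL  -- JSON-encoded, structure depends on type
);
"""


def load(events: Iterable[Dict[str, Any]]) -> sqlite3.Connection:
    """Load events into an in-memory database using the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for event in events:
        meta = dict(event["meta"])  # copy so the caller's event is not modified
        cur = conn.execute(
            "INSERT INTO interaction (actor, actee, timestamp, type) VALUES (?, ?, ?, ?)",
            (json.dumps(event["actor"]), json.dumps(event["actee"]),
             event["timestamp"], meta.pop("type")),
        )
        conn.executemany(
            "INSERT INTO interaction_meta VALUES (?, ?, ?)",
            [(cur.lastrowid, key, json.dumps(value)) for key, value in meta.items()],
        )
    conn.commit()
    return conn


event = {
    "actor": {"text": "123.123.123.123"},
    "actee": {"id": 987654},
    "timestamp": 1984567890,
    "meta": {"type": "talk_page_section",
             "section": {"index": 1, "title": "His stolen watch."},
             "conversers": 2},
}
conn = load([event])
print(conn.execute("SELECT type, timestamp FROM interaction").fetchall())
```

Fields such as "positive" or "topic" could be promoted the same way if they turn out to apply to every interaction type.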

Use case missing

The scope is not defined anywhere, as all sections are tautological. One section says what we do is requested and the next says what is requested we do, then goals repeat the same and then several sections dive into implementation details. Please define what you actually are talking about. --Nemo 08:29, 17 October 2014 (UTC)[reply]

Not one of the original proponents, but I think the end game here is to reduce editor attrition resulting from unpleasant interactions. There is plenty of anecdotal evidence that it is interaction with other editors that drives people away, both the newbies and the experienced editors, but I don't know if we have any handle on the nature and patterns of such interactions. Is it one person being repeatedly (and perhaps deliberately) unpleasant to another over a long time? Is it a process being worn down drip by drip by a series of unpleasant interactions by a large number of people (probably mostly unintended)? Are there signs of revenge? You reverted my edit a month ago on article X, so I'll say something negative on a talk page about something you've contributed on an apparently unrelated article. So I think we need to find all the places at which editors can interact to build profiles of interaction. A hypothesis might be "Editor attrition follows a big argument around a single article, which is characterised by <some pattern of interaction>". If so, we could have software watching for such patterns and try to intervene to calm things down before editor attrition occurs. Kerry Raymond (talk) 02:35, 18 October 2014 (UTC)[reply]

English Wikipedia: no

Anything that begins with the English Wikipedia is 99% certain never to be expanded anywhere else. I oppose. Start with one or two wikis other than the English Wikipedia and I may believe that one day this will work for all languages; otherwise, just say you'll never go beyond en.wiki. --Nemo 08:29, 17 October 2014 (UTC)

I think the choice of language is effectively constrained by the languages spoken by the researchers. One would often need to compare the quantitative data with qualitative data. You can't judge whether an interaction is a friendly or unfriendly one if you don't understand that language. Also, many major research journals are published in English, so English examples are more useful for publication. I don't see a problem at this stage with being explicit and restricting the scope to en.WP. Indeed, I think for an initial foray into this space, it is better to narrow the focus. Kerry Raymond (talk) 02:19, 18 October 2014 (UTC)

That still gives you Meta, simple English and Commons as possible test places. WereSpielChequers (talk) 04:20, 19 October 2014 (UTC)[reply]

Agreed. Do we state anywhere that we'll only be doing English Wikipedia? --EpochFail (talk) 17:10, 19 October 2014 (UTC)[reply]

No, although the work could be adapted to other languages. We could likely include Simple English and much of Meta and Commons as well, but I think that starting with English Wikipedia makes sense, and we will likely remain only with English Wikipedia at least for the duration of this grant.

EpochFail, yes: «Hopefully the project will scale to all Wikipedias and perhaps Commons, but we will start with the English Wikipedia». I maintain that this will fail to be expanded anywhere else if you start with English Wikipedia. If the intention is to expand it somewhere else at some point, but the current composition of the team doesn't allow that, fix the team. --Nemo 06:27, 23 October 2014 (UTC)[reply]

I feel that it makes sense to start with the language that is best known to the current team and which happens to be the language of the largest Wikipedia. If we show good success with our work in the English language then we can consider future scope expansion by bringing on people who have native or near-native proficiency in additional languages. --Pine^✉ 09:13, 4 November 2014 (UTC)[reply]

Repeating your point doesn't help; I maintain that doesn't work. Also, WereSpielChequers' question wasn't answered. --Nemo 17:39, 4 November 2014 (UTC)[reply]

overlooked interaction type

Possibly it was overlooked because it was so obvious that it didn't need mentioning, but I would have thought editing the same article was an interaction. The current list only mentions reverts as an interaction of interest, but I can certainly build up a liking or disliking of another editor even when we are not actually reverting each other. OK, two editors editing the same article 5 years apart probably isn't a terribly interesting interaction, but 5 mins apart is. So, I think editing the same article is relevant, but its significance is modified by the time gaps between them. Again, I dislike at least one editor because of a series of unsourced edits they did some years ago across a large number of articles. So, the significance/strength of our interaction in any one article is low, but cumulatively I've grown to dislike them a lot. How does that play out if this editor and I come into close contact on another article? Am I now pre-disposed to be unfriendly toward them over an unrelated matter?

I suspect we need to be able to find all interactions involving a pair of editors and develop some kind of weighting to determine the likely "strength/significance" of that interaction as well as the "sentiment" of it (friendly/positive, or not). Specifically, I would think the distance in time and the distance apart in the text of the article are both relevant to the significance of the interaction. If years apart we edit different sections, it seems a very minimal interaction. If minutes apart we edit the same sentence, it seems a very significant interaction. For example, I often react to unsourced facts being added to articles on my watchlist by adding in a citation if I can easily find one. Clearly this fits the "close in time, close in text" as a significant interaction. I think that I am being helpful, and I hope the other person thinks so too, but maybe the other person perceives my actions as an implicit criticism of their contribution.
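One way to express "close in time, close in text" as a number is sketched below in Python. The exponential decay and both scale parameters (one day, 500 characters) are arbitrary assumptions chosen for illustration, not values proposed anywhere in this discussion; `significance` is a hypothetical function name.

```python
import math


def significance(
    seconds_apart: float,
    chars_apart: float,
    time_scale: float = 86400.0,  # assumed: one day
    text_scale: float = 500.0,    # assumed: roughly a paragraph
) -> float:
    """Weight between 0 and 1 that decays with distance in time and in text.

    Returns 1.0 when two edits touch the same place at the same moment.
    """
    time_factor = math.exp(-abs(seconds_apart) / time_scale)
    text_factor = math.exp(-abs(chars_apart) / text_scale)
    return time_factor * text_factor


minutes_apart_same_sentence = significance(seconds_apart=300, chars_apart=20)
years_apart_other_section = significance(seconds_apart=5 * 365 * 86400, chars_apart=5000)
print(minutes_apart_same_sentence, years_apart_other_section)
```

Different page types would presumably need different scales, or a different function altogether.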

Even with talk pages and voting pages, even if we aren't interacting directly on individual topics or votes, the fact that both of us edit that page suggests we may well be reading one another's remarks and forming opinions positive or negative about the other. Again, it's a weaker interaction than if we are going head-to-head with one supporting and one opposing on the same issue. So I think the weighting model probably has to be different for different types of pages. If you oppose my request for admin rights, I am probably going to take it more personally than if we disagree on a borderline case of notability when neither of us has contributed to the article.

So I would think we would want to be able to extract all possible interactions and then weight them rather than decide in advance that some aren't significant. The drip-drip-drip of growing anger/frustration may arise from lots of low-significance interactions. We probably want to calculate some kind of score based on all interactions to measure the extent of absolute interaction, as well as a score that then uses sentiment analysis on the interactions to determine the polarity of those interactions, which then leads to some sense of the feelings between the people. Of course, this assumes that interactions can cancel each other out. Does your revert of my edit yesterday get forgotten if you give me a barnstar today? Or would a "thanks" today be sufficient? I assume thanks is on the list of interactions? Kerry Raymond (talk) 03:11, 18 October 2014 (UTC)
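The two scores described above (extent of interaction and polarity) could be computed per pair of editors roughly as follows. This Python sketch assumes each interaction already carries a weight and a sentiment value in [-1, 1] from some upstream step; the function name `pair_scores` is hypothetical, and the weighted average deliberately bakes in the "interactions can cancel each other out" assumption questioned above.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Tuple

# (user_a, user_b, weight, sentiment in [-1, 1])
Scored = Tuple[str, str, float, float]


def pair_scores(interactions: Iterable[Scored]) -> Dict[FrozenSet[str], Tuple[float, float]]:
    """Return {pair: (extent, polarity)} for each unordered pair of editors.

    extent   = sum of weights (how much the two have interacted at all)
    polarity = weight-averaged sentiment (negative means mostly hostile)
    """
    extent: Dict[FrozenSet[str], float] = defaultdict(float)
    signed: Dict[FrozenSet[str], float] = defaultdict(float)
    for user_a, user_b, weight, sentiment in interactions:
        pair = frozenset((user_a, user_b))
        extent[pair] += weight
        signed[pair] += weight * sentiment
    return {
        pair: (extent[pair], signed[pair] / extent[pair] if extent[pair] else 0.0)
        for pair in extent
    }


# A revert yesterday (negative) followed by a barnstar today (positive).
print(pair_scores([("A", "B", 1.0, -1.0), ("B", "A", 1.0, 1.0)]))
```

In this example the polarity comes out as zero while the extent stays high, which is one way to distinguish "cancelled out" from "never interacted".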

I certainly agree with that. This interaction type is important although not that explicit. See also my comment in the intra-article interaction discussion below. We can surely extract that, it is more an issue of how to define what counts as an interaction and what doesn't and how to express it. That is a non-trivial task (How far away in time or textual distance do I have to edit for my edit to constitute an interaction with you? And what type will it be? Or how much will I weigh it?). What has been done in research so far (sorry, don't have the papers in mind) as far as I can recall was mostly building networks of co-editorship inter-article. Combining this with intra-article editorship (e.g. in the same section in the same day) is very interesting. --Fabian Flöck (talk) 15:21, 18 October 2014 (UTC)[reply]

In the same vein, I would strongly recommend extending the kinds of interactions described in the grant along these lines, and making the list more systematic:

Intra-page
- articles
  - antagonistic (delete/undo) = reverts
  - supportive (reintroductions/redeletes)
  - co-editing (editing in vicinity of each other (time/space) under certain constraints in an article)
- talk pages
  - non-user talk: replying to each other in a thread
  - user talk: posting on each other's talk pages
  - for discussion: other talk spaces

Inter-page
- Co-editing of articles and talk pages (constrained or weighted by time/space)

--Fabian Flöck (talk) 15:34, 18 October 2014 (UTC)

So after discussing this with EpochFail and Pine, I would say the following: 1. The inter-article co-editing (who edited the same article as someone else, maybe weighted or constrained by time) could be post-processed from the primary datasets on article interaction that we will produce. 2. The intra-article co-editing/collaboration is much trickier. a) Defining what counts as an interaction in an article (time/space-wise) if there is no editing of the same words is very subjective and ambiguous, and it is unclear whether the resulting dataset would be meaningful at all. b) Extracting this would mean a major adaptation of the algorithms we already have and would take many resources away from other tasks. So this is something that we would likely not pursue in this project, but it is definitely something worth working on. These kinds of co-editing relations in articles are complementary to the datasets we plan to generate and could extend them later on.
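As a rough illustration of point 1, the post-processing could look something like the Python sketch below. It assumes the primary dataset can be reduced to (editor, article, day) records; that record layout, the function name `coediting_edges` and the `max_gap_days` parameter are all hypothetical and not part of any existing tool.

```python
from collections import defaultdict
from itertools import combinations

def coediting_edges(edits, max_gap_days=None):
    """Count, per editor pair, the articles both have edited.

    edits: iterable of (editor, article, day_number) records (assumed layout).
    max_gap_days: if given, two edits only count when at most this far apart.
    """
    by_article = defaultdict(list)
    for editor, article, day in edits:
        by_article[article].append((editor, day))

    weights = defaultdict(int)
    for entries in by_article.values():
        pairs = set()
        for (e1, d1), (e2, d2) in combinations(entries, 2):
            if e1 == e2:
                continue
            if max_gap_days is not None and abs(d1 - d2) > max_gap_days:
                continue
            pairs.add(tuple(sorted((e1, e2))))
        # Each article contributes at most 1 to a pair's weight.
        for pair in pairs:
            weights[pair] += 1
    return dict(weights)

# Made-up sample records, for illustration only.
edits = [
    ("A", "House", 0), ("B", "House", 1), ("C", "House", 30),
    ("A", "Tree", 5), ("B", "Tree", 6),
]
unconstrained = coediting_edges(edits)
within_week = coediting_edges(edits, max_gap_days=7)
print(unconstrained)
print(within_week)
```

Counting each article once per pair is only one possible weighting; counting every pair of edits, or decaying the weight with the time gap, would be equally plausible choices.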

--Fabian Flöck (talk) 19:17, 18 October 2014 (UTC)

questions from rubin16

Hello, Pine! :) Could you please expand on the problem you want to solve? Who are the researchers with requests, what strategic objectives are involved, and what do you expect to change or introduce to the wiki community as a result of this research? rubin16 (talk) 13:11, 18 October 2014 (UTC)

PS: Have you seen that? rubin16 (talk) 13:11, 18 October 2014 (UTC)

Hi Rubin16, I am familiar with the graphs from HaithimS. There is a graph at the top of the main project page Grants:IEG/Editor Interaction Data Extraction and Visualization. We have expanded the information available under Grants:IEG/Editor Interaction Data Extraction and Visualization#Project idea to respond to your other questions. Thank you for asking. --Pine^✉ 21:03, 18 October 2014 (UTC)

Thanks, now it's more understandable for me :) rubin16 (talk) 16:28, 19 October 2014 (UTC)

Editor interaction data based on edit activity inside articles: Dataformats/-sets and Visualizations

Hi, so I'm gonna sketch out what I already discussed with Halfak and Pine on hangouts in terms of what I could provide to the project. (I didn't know exactly how to integrate it into the main article, so I'm starting my draft here; please move what you feel is relevant, or tell me.)

This covers intra-article interactions (so in one article at a time), although the sets of editors/nodes from single articles could later be merged to generate a graph for a whole category or even the whole of Wikipedia.

What we have so far:

We extended the wikiwho algorithm we wrote (see here) to generate relationship data between editors in an article based on edits.

Basic wikiwho authorship detection

So what is given from the original wikiwho algorithm is an output that tracks the authorship of single tokens of text and looks something like this (simplified):

(Legend: Under the tokens (words + special characters) of a revision you see the original author and the revision of origin for that token. The # means "deletion of the token 4 lines above (previous revision), by the revision indicated on the left".)

revID	editor	action description	Tokens ->

0	A	add
			There	is	a	house	on	a	hill	.
			A	A	A	A	A	A	A	A
			0	0	0	0	0	0	0	0

1	B	light deletion B->A, add		#				#
			There	was	a	house	on	the	hill	.	A	tree	was	standing	close	!
			A	B	A	A	A	B	A	A	B	B	B	B	B	B
			0	1	0	0	0	1	0	0	1	1	1	1	1	1

2	C	deletion C->B									#	#	#	#	#	#
			There	was	a	house	on	the	hill	.
			A	B	A	A	A	B	A	A
			0	1	0	0	0	1	0	0

3	D	full revert D->C, reintro. B
			There	was	a	house	on	the	hill	.	A	tree	was	standing	close	!
			A	B	A	A	A	B	A	A	B	B	B	B	B	B
			0	1	0	0	0	1	0	0	1	1	1	1	1	1

4	C	light delete C->B , add													#	#
			There	was	a	house	on	the	hill	.	A	tree	was	standing	nearby	.
			A	B	A	A	A	B	A	A	B	B	B	B	C	C
			0	1	0	0	0	1	0	0	1	1	1	1	4	4
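To make the layout of this output concrete, here is a minimal Python sketch that stores revision 4 of the example as parallel lists and counts surviving tokens per original author and per revision of origin. The list-based representation and the variable names are assumptions made for illustration; they are not the actual wikiwho output format.

```python
from collections import Counter

# Revision 4 from the example, copied by hand as three parallel lists.
tokens = "There was a house on the hill . A tree was standing nearby .".split()
authors = list("ABAAABAABBBBCC")                      # original author per token
origins = [0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 4, 4]  # revision of origin per token

assert len(tokens) == len(authors) == len(origins)

# How many tokens of the current text does each editor "own"?
share = Counter(authors)

# How many tokens of the current text stem from each revision?
by_origin = Counter(origins)

print(dict(share))
print(dict(by_origin))
```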

Interaction extraction

Now we can transform that output into explicit interactions between the editors, as shown in the table below.

There are 4 different types of interactions:

1. "delete" --> a token gets deleted; the deleting editor is the sender, the editor whose token was deleted is the receiver of the edge
2. "undo" --> undoing a deletion or a reintroduction of a token. The "undoer" is the sender, the editor getting her action undone is the receiver.
3. "reintroduction" --> the sender reintroduces content of the receiver that was previously deleted.
4. "redeletion" --> the receiver deleted content, it was subsequently reintroduced and the sender now deletes the content again.

The first two interactions are regarded as antagonistic from the sender towards the receiver and marked with a "-", the latter two (3.+4.) are taken to be supportive and marked with a "+". (We deliberately refrained from using "revert" here, as actually both the "antagonistic" actions are reverts of some sort. When making the translation to a "revert", it would be these two. Cf. different kinds of reverts.) This DOES NOT mean that every delete or undo conveys a negative sentiment from the sender to the receiver (a correction of a single word can be a simple friendly correction). It is just an aggregation and classification for easier handling.

"Weight" indicates how many tokens were affected by the action; "delay" is an optional indicator of how old the affected tokens were (e.g., in revision 4 author C deletes 2 tokens from author B that were introduced in revision 1, hence delay=4-1=3).

editor interactions derived from the above-listed example revisions
revision	sender	receiver	type	weight	delay
1	B	A	delete (-)	2	1
2	C	B	delete (-)	6	1
3	D	C	undo (-)	6	1
3	D	B	reintroduction (+)	6	2
4	C	B	delete (-)	2	3
4	C	D	undo (-)	2	1
4	C	C	redelete (+)	2	2

These can be computed rather efficiently with our algorithm using the full-history dumps. As for the format I will comment on the "formats" thread above.
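To illustrate how the weight and delay columns relate to the token-level output, here is a minimal Python sketch that derives "delete" interactions for one revision. It is not the wikiwho implementation: it assumes each token already carries its author and revision of origin and that the positions of the deleted tokens are known; `Token` and `delete_interactions` are made-up names.

```python
from collections import Counter
from typing import NamedTuple

class Token(NamedTuple):
    text: str
    author: str      # editor who originally added the token
    origin_rev: int  # revision in which the token first appeared

def delete_interactions(prev_tokens, deleted_positions, rev_id, editor):
    """Group the deleted tokens by (original author, revision of origin)."""
    counts = Counter()
    for pos in deleted_positions:
        tok = prev_tokens[pos]
        counts[(tok.author, tok.origin_rev)] += 1
    return [
        {"revision": rev_id, "sender": editor, "receiver": author,
         "type": "delete", "weight": n, "delay": rev_id - origin}
        for (author, origin), n in sorted(counts.items())
    ]

# Revision 0 by A: "There is a house on a hill ."
rev0 = [Token(w, "A", 0) for w in "There is a house on a hill .".split()]

# Revision 1 by B removes "is" (position 1) and the second "a" (position 5).
edges = delete_interactions(rev0, [1, 5], rev_id=1, editor="B")
print(edges)
```

For the first example revision this yields a single edge B -> A with weight 2 and delay 1, matching the first row of the table.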

The plan is to integrate the output format with the project's needs, and we would compute the interactions for the articles of whichever Wikipedias the project decides on.

Further extensions / variations of the interaction extraction

Variations:

The 2 (-) and 2 (+) interaction types can and should imho actually be aggregated to only (+) and (-) for graph visualization approaches (for sure) and for analytics as well (probably), as otherwise it gets too messy. For recording them, I'm not sure if that level of granularity is actually required.
...?
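The aggregation to (+) and (-) could be sketched in Python as follows. The rows are a hand-typed subset of the interaction table, and the `SIGN` mapping and the function name `signed_edges` are assumptions for illustration, not an existing format or API.

```python
from collections import defaultdict

# Map the four interaction types onto the two polarities.
SIGN = {"delete": "-", "undo": "-", "reintroduction": "+", "redelete": "+"}

# (revision, sender, receiver, type, weight)
rows = [
    (1, "B", "A", "delete", 2),
    (2, "C", "B", "delete", 6),
    (3, "D", "C", "undo", 6),
    (4, "C", "B", "delete", 2),
    (4, "C", "C", "redelete", 2),
]

def signed_edges(rows):
    """Sum weights per (sender, receiver, sign), dropping the finer type."""
    totals = defaultdict(int)
    for _rev, sender, receiver, kind, weight in rows:
        totals[(sender, receiver, SIGN[kind])] += weight
    return dict(totals)

edges = signed_edges(rows)
for (sender, receiver, sign), weight in sorted(edges.items()):
    print(sender, "->", receiver, sign, weight)
```

Keeping the fine-grained rows and aggregating only at analysis time would leave both options open.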

Possible extensions:

Apart from weight and delay of an interaction, we could also record more features per interaction:
- Did the sender's edit affect all of the actions of a specific former edit? ("action" defined as each distinct interaction created by an editor changing a word, corresponding to the "weights" in the table) Then we could mark it as "full undo" or "full restore". E.g., D's reintroduction of the 6 tokens deleted by C in our example would be a "full undo" (or "full revert" if you will), thus the entry in the table for revision 3, D->C would be marked with a "1" in an additional column "full".
- Did the edit create a revision identical to one seen before (known as "identity revert" in other contexts)? Then we could mark it in a new column/variable "identity revert".
- Other metadata relating to the changed tokens are imaginable, e.g., the average length of the tokens changed in the interaction, their total length, to what extent they were stopwords, etc.
More interaction types could be introduced (although I'm unsure re: the usefulness of the granularity and feel they might be very ambiguous/hard to define):
- E.g. if two editors add words next to each other (define "next to": could be in a paragraph, <40 chars apart...) without antagonizing each other (define how long afterwards), we could infer that they work together and add a "collaboration" tie or the like. This is related to Kerry's comment. (Although hers would also include looking at networks of co-editorship in different articles as far as I understood, which is a complementary extension one level up)
- ... ?
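The "identity revert" flag from the list above could, for instance, be computed by hashing each revision's text and checking whether the same hash has occurred before. A minimal Python sketch, assuming revisions arrive in chronological order as (rev_id, text) pairs; the function name is hypothetical and SHA-1 is just one possible choice of hash:

```python
import hashlib

def identity_reverts(revisions):
    """Map each reverting revision to the earliest revision with the same text."""
    first_seen = {}  # text digest -> first revision id with that text
    reverts = {}
    for rev_id, text in revisions:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in first_seen:
            reverts[rev_id] = first_seen[digest]
        else:
            first_seen[digest] = rev_id
    return reverts

# The five example revisions from the token table.
revisions = [
    (0, "There is a house on a hill ."),
    (1, "There was a house on the hill . A tree was standing close !"),
    (2, "There was a house on the hill ."),
    (3, "There was a house on the hill . A tree was standing close !"),
    (4, "There was a house on the hill . A tree was standing nearby ."),
]
print(identity_reverts(revisions))  # {3: 1}
```

Here revision 3 restores exactly the text of revision 1, so it is flagged as an identity revert to revision 1.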

Visualization

Using the interactions extracted as shown above, we are currently also working on a D3 (hence browser-based) visualization of the graph between editors in an article over time. So far it includes only the antagonistic edges. The implementation is based on the nice model proposed by Brandes et al. (paper here) and is a custom graph drawing approach for Wikipedia and negative edges. This is still very alpha, so I can only provide this screenshot so far (nodes are editors; it will also include a lot of meta-info on editors and edges). It also features a slider to navigate the network as it changes over revisions/time.

During the project I (with some researcher colleagues of mine) would probably work on that together with Pine, to see how we can integrate it with other visualizations.

--Fabian Flöck (talk) 15:01, 18 October 2014 (UTC)

testable hypotheses

There are a number of theories of changed editor interaction that it would be possible and very useful to have tested by this. Which of these do you think you can design into your project?

The post-2007 drift from fixing each other's work to templating it for hypothetical others to improve is driving away new editors.
The shift in recent years from adding citation needed where you think something is dubious to simply reverting unsourced edits is less likely to educate new editors into citing their work than previous interactions.
Editors who add unsourced information are much less likely to stay in the modern wiki than editors who add cited information.
Conversely, fixing someone's work and building on it is likely to improve retention and teach newbies. So, for example, can we test the theory that if you add sections or wikilinks to someone's article they are more likely to stay and/or start adding sections and links to their future articles than if you had simply templated their article as needing sections or links?
Edits that cause newbies edit conflicts are likely to drive them away. So any interaction on a brand new article such as templating or categorising it in the first few minutes as opposed to hours or days later is likely to cause edit conflicts unless that article was created with separate sections. There are two competing theories in the community: one is that it is important to inform newbies of problems before they log off because otherwise they won't be seen again, and the other is that we need to give new editors time to learn and that pushing them too hard on their first edits drives them away.
Women and men tend to respond differently to a negative interaction, men treating it as a challenge to be overcome, women as a place to be avoided. On this theory we should expect to see that some interactions have very different retention effects on men and women.
The problem is people owning articles and rejecting any edits to articles they have watchlisted or the problem is patrollers rejecting certain types of edit in a way that doesn't train newbies to cite edits. Forgive me if the article owning theory has already been debunked elsewhere, but from discussions in various places I think there are still people who believe that theory so it would be good to see it tested and debunked.
Joining an active wikiproject is more likely to result in people staying than if they join an inactive one (watched wikiprojects where there have been few recent edits but a talkpage request prompts fairly speedy response from people who are watchlisting it are probably a distinct intermediate group).
Our policy on EN wiki of treating those who edit war more harshly than those who vandalise is losing us editors. I.e., a block without warning for edit warring is one of those experiences, like an unsuccessful RFA, that leads to people leaving then or soon after. WereSpielChequers (talk) 04:52, 19 October 2014 (UTC)

No one can speak for WikiProject Editor Retention as there is no official leader, but I'm confident I can speak for most of us when saying that WereSpielChequers has a pretty good bead on the problems we face there, including the lack of real data. There are lots of other questions to be asked, but his above questions can help us either detect patterns, or debunk myths about them. Even at the top of the main page, I see a bit of presumption as to the current situation, but I'm hopeful that you can find formulas to extract information that are reliable and useful. My worry is that if we approach the data with preconceived ideas as to what the outcome will be, we will find only those outcomes. This isn't unique to Wikimedia projects by any means. Dennis Brown (talk) 19:27, 19 October 2014 (UTC)

Crosswiki

One flaw in previous studies is that they have allegedly missed the effect of other wikis, both in terms of people having negative interactions on one wiki then treating each other with distrust on another wiki, and of people who haven't yet grown beyond their home wiki having nowhere to go within the movement if they have a negative experience, whilst those who have already spread to editing multiple wikis have the option of staying engaged with the movement but at least temporarily withdrawing from one wiki. WereSpielChequers (talk) 05:13, 19 October 2014 (UTC)

Good point. --Nemo 08:20, 29 October 2014 (UTC)

Yes, thanks WSC. Aaron and I have discussed looking at Commons and other projects where the use of the English language is common. For us to attend to non-English interactions, we will likely need to bring on people with near-native or native fluency in those languages. That is a possibility for future expansion of the scope of this project. --Pine^✉ 09:10, 4 November 2014 (UTC)

I appreciate that not every editor has a unified account. But if they do, you don't need to understand their language to be able to differentiate between someone leaving the movement and leaving an individual wiki. Also, if both parties have unified accounts you can programmatically tell if they have interactions on other projects, though yes, you'd need to understand the language to know if the interactions were similar in tone. WereSpielChequers (talk) 22:46, 16 November 2014 (UTC)

Aggregated feedback from the committee for Editor Interaction Data Extraction and Visualization

Scoring criteria (see the rubric for background)	Score 1=weak alignment 10=strong alignment
(A) Impact potential Does it fit with Wikimedia's strategic priorities? Does it have potential for online impact? Can it be sustained, scaled, or adapted elsewhere after the grant ends?	7.5
(B) Innovation and learning Does it take an Innovative approach to solving a key problem? Is the potential impact greater than the risks? Can we measure success?	7.8
(C) Ability to execute Can the scope be accomplished in 6 months? How realistic/efficient is the budget? Do the participants have the necessary skills/experience?	7.8
(D) Community engagement Does it have a specific target community and plan to engage it often? Does it have community support? Does it support diversity?	7.5
Comments from the committee: A great set of data and tools for researchers, if ideally implemented. Could have impact if communicated carefully and convincingly to the community. Would need to make sure to present and communicate findings carefully, so that the community will welcome the results. Strong belief in Epochfail's ability to execute. Some risk of "naming and shaming". This should be avoided by publishing only anonymized data. European users might be very concerned about this and it would be important to address such concerns. Hard to see real impact in near-term. Proposal in this stage seems to be focused more on the infrastructure than results that could lead to visible changes of the Wikimedia projects. Some concern it will never be adapted beyond en.wiki. Some interesting data is likely to come from it, though ultimately it may show interesting trends that have nothing to do with new editors. Some risk that these measurements won't actually show trends we can act on.

Thank you for submitting this proposal. The committee is now deliberating based on these scoring results, and WMF is proceeding with its due-diligence. You are welcome to continue making updates to your proposal pages during this period. Funding decisions will be announced by early December. — ΛΧΣ²¹ 16:57, 13 November 2014 (UTC)[reply]

IEG proposal withdrawn for now; next steps for data sets and visualizations

WereSpielChequers and Kerry Raymond, in view of the light community support for the proposal, we are withdrawing it from this IEG round. However, pieces of this proposal will continue to move forward, although probably on a slower timeline.

Aaron and Fabian will work on editor interaction data sets on a volunteer basis. Pine may join this work in 2015.
Visualizations will be dropped from the work plan in the short term. Mockup visualizations may be produced in 2015 when editor interaction data sets are viable for creating meaningful visualizations.
Aaron and Pine will continue discussions about community uptake and use of research outcomes in general.
Depending on progress between now and March, an updated IEG proposal may be made in March 2015.

Thanks! --Pine^✉ 22:41, 17 November 2014 (UTC)

I agree that the data extraction needs to precede the visualisations. And probably the visualisations may be dependent on the hypothesis being tested. Also, without directing this comment at anyone specifically, some of the above discussion tried to increase the scope too much beyond what is achievable in a small and most likely under-resourced project. In terms of the concern about whether there would be an actionable outcome, perhaps limiting the scope to new contributors has a number of practical benefits because:

attrition of new contributors is a matter of concern to WMF
new users (assuming some definition of < X edits) only have a small number of edits, touch only a small number of articles, most likely only on a single wiki, and interact with only a small subset of the community, so there is less data to process if the focus is survival/attrition of new editors
having so few edits makes the patterns simpler to analyse
there are enough of them (an infinite supply it seems!) to generate useful quantities of data

The downside of studying new contributors is that we know little about them in terms of demographics, areas of interest, nature of edits, whereas we can know or deduce a lot more about long-standing contributors. On the other hand, those they interact with are probably long-standing players about whom we can know/deduce a lot more. So, we are probably more likely to be able to produce actionable results about:

the types of articles and the type of edits associated with new editors who do/don't survive
the characterisation of other editors with whom they interact

I am not sure that initially it is necessary to try to characterise the interactions as positive or negative, for two reasons. Firstly, because this is qualitative and it's hard to get the data. Secondly, because a simpler hypothesis (which I don't think has been tested) is whether new contributors are frightened away by collaborative editing. So it would be interesting to know whether new editor attrition is correlated with the volume of collaborative activity they experience. If new editor attrition is just as high among those who are "left alone", then it raises the question of whether interaction with others is the main issue. Kerry Raymond (talk) 01:16, 18 November 2014 (UTC)

Interesting and sensible proposal, that one thing is tested at a time. However, there is another hidden assumption in your proposal: that those who are left alone can only have the same or less "attrition". However, it's possible that some people are instead de-motivated by the lack of any interaction, so we have contrasting forces which may cancel each other out and be hard to gauge. --Nemo 12:12, 18 November 2014 (UTC)

added link to new visualization prototype

I added a link to the new visualization prototype we've done for editor disagreement interaction and the algorithm we use to extract the disagreement (and agreement) edges. --Fabian Flöck (talk) 16:43, 12 March 2015 (UTC)