Community Tech/Cross-wiki watchlist/Notes

Prelim analysis meeting, Dec 15th

There's an existing tool on Tool Labs -- [1]. It keeps track of your notifications, with filters to remove bot edits and self edits. We can't build on Crosswatch, though, because it's written in Python.

This idea has been around since 2005, but SUL finally makes it more feasible to match up cross-wiki accounts.

This will be a big design project -- designing the interface -- but it's not that difficult technically. We'll also need to work with Design Research.

The design needs to make it clear where you're going -- no surprises when you're sent somewhere.

Performance problem? Depends on how we implement it: via the API, or via backend access to the other databases. It will increase load, but probably not by a huge amount -- the number of people looking at watchlists is trivial compared to the number looking at articles. No caching issues; the watchlist is never cached.

Main thing to figure out: do this in JavaScript using the API, or make it part of core and do it all in the backend?
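
As a rough illustration of the "JS using the API" option, here's a minimal sketch (in Python rather than JavaScript for brevity, and assuming the caller already has a logged-in session on each wiki, or adds the wlowner/wltoken parameters) that pulls each wiki's watchlist through action=query&list=watchlist and merges the entries by timestamp:

  # Minimal sketch of the "client-side via API" approach: fetch each wiki's
  # watchlist with list=watchlist and merge the results by timestamp.
  # Assumes the session is logged in on every wiki (or that wlowner/wltoken
  # are added to the params); error handling and continuation are omitted.
  import requests

  WIKIS = [
      "https://en.wikipedia.org/w/api.php",
      "https://de.wikipedia.org/w/api.php",
      "https://www.wikidata.org/w/api.php",
  ]

  def fetch_watchlist(session, api_url, limit=50):
      params = {
          "action": "query",
          "list": "watchlist",
          "wllimit": limit,
          "wlprop": "ids|title|flags|user|comment|timestamp",
          "format": "json",
      }
      data = session.get(api_url, params=params).json()
      for item in data.get("query", {}).get("watchlist", []):
          item["wiki"] = api_url  # remember which wiki the entry came from
          yield item

  def merged_watchlist(session):
      entries = []
      for api_url in WIKIS:
          entries.extend(fetch_watchlist(session, api_url))
      # Newest first across all wikis -- the "integrated" view.
      return sorted(entries, key=lambda e: e["timestamp"], reverse=True)

  if __name__ == "__main__":
      with requests.Session() as s:  # would be a logged-in session in real use
          for entry in merged_watchlist(s)[:25]:
              print(entry["timestamp"], entry["wiki"], entry["title"])

The backend-in-core option would produce the same merged list, but server-side, without one round trip per wiki from the client.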

We'll have to develop this as a separate page; Special:Watchlist needs to keep working. It'll go through a process of test version, then beta feature, then switching out the existing Watchlist if and when the new version is considered better.

Dev summit, Jan 4th

Moriel suggested using Collaboration's cross-wiki notification API as a base. She realized it wouldn't be that easy, because we don't send watchlist items through notifications, but Ryan and Niharika think it may be possible to piggyback. Maybe set up an "invisible notification" that we can use.

Look at how Crosswatch handles filtering.

TCB has this on their wishlist as well: TCB notes, in German.

Talk page, Feb 2nd

Thoughts on the front-end, from User:° on Talk:Community Tech/Cross-wiki watchlist:

"With the recent activation of catwatch watchlists can become quite crowded. This will be even more the case with a global watchlist. Therefore a user should be able to have more than one watchlist, making it possible to group watched pages by individual criteria (for example watched categories, watched pages on german language wiki projects, watched pages on art, edited pages in the user namespace, ...)."
"... Something has to be implemented to make a global watchlist actually usable in real life (catwatch being important to be considered). This could be the "named watchlists" approach or the filters or something completely different. I prefer the named watchlists idea, but if filters are used, there should be a way to save your filters."
"Another thing: At the moment you can choose to have all watchlist changes mailed to you on a per wiki base, the named watchlists would provide an easy way to manage this on a per watchlist base (allowing for example to get emails for edits to watched pages, but not for categorizations; or to get emails for wikidata pages, but not Q-entries)."

We should think about saving/naming filtered watchlists. Separate mailings are probably outside our scope.
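
To make the "named watchlists" idea concrete, here's a small sketch of what a saved, named filter might look like as a data structure. All field names are hypothetical; the point is just what kind of criteria a saved view would need to capture:

  # Hypothetical shape of a saved, named watchlist filter ("named watchlist").
  # None means "don't filter on this dimension"; every name here is illustrative.
  from dataclasses import dataclass
  from typing import List, Optional

  @dataclass
  class NamedWatchlist:
      name: str                                # e.g. "German-language projects"
      wikis: Optional[List[str]] = None        # e.g. ["dewiki"]; must match how entries are tagged
      namespaces: Optional[List[int]] = None   # e.g. [0, 14] for articles + categories
      hide_bots: bool = True
      hide_minor: bool = False
      hide_categorization: bool = True         # the catwatch entries mentioned above

      def matches(self, entry: dict) -> bool:
          """True if an API watchlist entry belongs in this named view."""
          if self.wikis is not None and entry.get("wiki") not in self.wikis:
              return False
          if self.namespaces is not None and entry.get("ns") not in self.namespaces:
              return False
          if self.hide_bots and "bot" in entry:
              return False
          if self.hide_minor and "minor" in entry:
              return False
          if self.hide_categorization and entry.get("type") == "categorize":
              return False
          return True

Per-watchlist email preferences, as the commenter suggests, would just be more fields on the same structure if that ever comes back into scope.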

Investigation, Feb 8

From Niharika's investigation, T120853:

Existing tools:

1. Crosswatch on Tool Labs:

  • Allows filtering of minor/bot/registered/unregistered edits
  • Allows filtering by namespace and wiki
  • Inline diffs and links to jump directly to specific diffs
  • Filter to view results from "last x days" (see the API filter sketch after this list of tools)
  • Link for more documentation and project reports: T92955

2. Gwatch on wikimedia.de:

3. User scripts: https://en.wikipedia.org/wiki/User_talk:Yair_rand/interwikiwatchlist.js and https://en.wikipedia.org/wiki/User_talk:Yair_rand/interwikiwatchlist2.js — Lets you add your watchlist from one project to another project. The first version places the new watchlist on top of the existing one, while the second version integrates the entries from both into one watchlist.

4. A proof of concept by Lego to achieve this completely client-side by making API requests to each wiki: https://github.com/legoktm/xwiki-watchlist (> 2 years old, proceed with caution)

5. French wikipedia gadget: fr:MediaWiki:Gadget-GlobalWatchlist.js

6. Another gadget, based on Legoktm's tool (see 4): https://github.com/he7d3r/mw-gadget-CrossWikiWatchlist

7. An extension called GlobalWatchlist is in use by the Gamepedia wikis (see http://help.gamepedia.com/Special:Version). Couldn't find the source code though. We might want to talk to them about getting the code published.
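
As referenced under Crosswatch above, here's a rough sketch of how those filters map onto the standard list=watchlist API parameters (the values shown are examples; this assumes the same per-wiki API calls as the sketch in the Dec 15th notes):

  # Sketch: Crosswatch-style filters expressed as list=watchlist parameters.
  # wlshow, wlnamespace, and wlend are standard MediaWiki API parameters;
  # the default values chosen here are only examples.
  import datetime

  def watchlist_filter_params(hide_bots=True, hide_minor=False,
                              hide_anon=False, namespaces=None, days=7):
      """Translate filter choices into extra list=watchlist parameters."""
      show = []
      if hide_bots:
          show.append("!bot")       # hide bot edits
      if hide_minor:
          show.append("!minor")     # hide minor edits
      if hide_anon:
          show.append("!anon")      # hide unregistered (IP) editors
      params = {}
      if show:
          params["wlshow"] = "|".join(show)
      if namespaces:                # e.g. [0, 1] for main + talk namespaces
          params["wlnamespace"] = "|".join(str(ns) for ns in namespaces)
      if days:                      # the "last x days" filter
          cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=days)
          # wldir defaults to "older", so wlend is the older end of the window.
          params["wlend"] = cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
      return params

These would simply be merged into the params dict of the fetch sketch above; Crosswatch's inline diffs are a separate concern (see the dropdown-diffs note further down).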

Features requested:

  1. Implementation of a watchlist that will check for changes to watched articles on more than one wiki
  2. Covers Wikimedia projects at least (support for non-Wikimedia wikis is a nice-to-have) (from en:Wikipedia:Global,_cross-wiki,_integrated_watchlists#Features, ideas)
  3. Ability to switch between local and global watchlist
  4. Various filtering capabilities (possibly like what Crosswatch offers)
  5. Ability to group by project and order by time
  6. Consistent input methods (from mw:Micro Design Improvements/Watchlist UI)
  7. Choose your default watchlist view (from T35888)

Stuff to explore while looking into possible technical implementations:

  1. Crosswatch repo: https://phabricator.wikimedia.org/diffusion/TCRW/browse/
  2. Using a global database: T5525#1085235
  3. A completely front-end implementation: https://github.com/legoktm/xwiki-watchlist/blob/master/xwikiwatchlist.js
  4. Each wiki's watchlist has an RSS feed, which could possibly be used for generating a global watchlist (see the feed sketch below)
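
For the RSS idea in item 4, a minimal sketch of pulling per-wiki watchlist feeds via action=feedwatchlist (this assumes a logged-in session, or wlowner/wltoken appended to the URL; merging the feeds would be up to a feed parser):

  # Sketch: build per-wiki watchlist feed URLs (action=feedwatchlist) and
  # fetch them. A real implementation would parse the Atom/RSS entries and
  # interleave them by date; only the raw fetch is shown here.
  import requests

  WIKIS = [
      "https://en.wikipedia.org/w/api.php",
      "https://fr.wikipedia.org/w/api.php",
  ]

  def watchlist_feed_url(api_url, hours=72, feedformat="atom"):
      # hours is capped at 72 by the API.
      return f"{api_url}?action=feedwatchlist&feedformat={feedformat}&hours={hours}"

  def fetch_feeds(session):
      return {api: session.get(watchlist_feed_url(api)).text for api in WIKIS}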

Note on dropdown diffs, June 13

In the first set of wireframes, we included inline diffs which could be viewed directly on the watchlist page with a toggle. We've decided that's out of scope for the cross-wiki watchlist, but there is some work that's already been done on dropdown diffs.

Dropdown diffs of last edits was on the Community Wishlist Survey, coming in at #72. On the Phabricator ticket (T120775), Quiddity notes that there's an existing userscript that enables dropdown diffs. You can test this in your common.js; Quiddity's example of the code is at https://en.wikipedia.org/w/index.php?title=User:Quiddity/vector.js&diff=584276453&oldid=584156013
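
For reference, the API side of a dropdown diff is simple: given the revision IDs that the watchlist already returns, action=compare fetches the rendered diff HTML to drop into the page. A minimal sketch (the userscript linked above does its own version of this in JavaScript):

  # Sketch: fetch the diff HTML for a watchlist entry using action=compare.
  # old_revid and revid come from list=watchlist (wlprop=ids); the API returns
  # the diff table rows as HTML, which a dropdown widget would insert inline.
  import requests

  def fetch_diff_html(session, api_url, old_revid, revid):
      params = {
          "action": "compare",
          "fromrev": old_revid,
          "torev": revid,
          "format": "json",
          "formatversion": 2,
      }
      data = session.get(api_url, params=params).json()
      return data["compare"]["body"]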

If there's interest, I'm sure we'll see this idea coming back up for a vote on the next Wishlist Survey.

RFC meeting with Roan and Matt, Aug 2

Notes from Etherpad

  • https://phabricator.wikimedia.org/T126641 -- [RFC] Devise plan for a cross-wiki watchlist back-end
  • https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-07-20-21.00.log.html
  • https://meta.wikimedia.org/wiki/Community_Tech/Cross-wiki_watchlist
  • https://en.wikipedia.org/wiki/Special:RecentChanges
  • https://en.wikipedia.org/wiki/Special:Watchlist

English Wikipedia's RC table has around 7 million rows, and the rest of the wikis combined won't be bigger than that. Will the watchlist table be bigger?

Why do we want the megatable? So we don't have to worry about replicating across the shards. We want a table that says, for the global user Kaldari, here are all the watched pages regardless of wiki, and to join that against an RC table to see which ones have been updated. That's the minimum use case. You should be able to do the whole thing with one join query.
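
As a sketch of that minimum use case (every table and column name below is hypothetical -- nothing like this exists yet), the single join query might look roughly like:

  # Hypothetical single-join query against the proposed global "megatables".
  # globalwatchlist/gwl_* and global_recentchanges/grc_* are invented names,
  # loosely modeled on the existing per-wiki watchlist and recentchanges tables.
  ONE_JOIN_QUERY = """
  SELECT grc_wiki, grc_namespace, grc_title, grc_timestamp
  FROM globalwatchlist
  JOIN global_recentchanges
    ON grc_wiki = gwl_wiki
   AND grc_namespace = gwl_namespace
   AND grc_title = gwl_title
  WHERE gwl_user = %(global_user_id)s      -- CentralAuth global user ID
  ORDER BY grc_timestamp DESC
  LIMIT 50
  """

  def global_watchlist(cursor, global_user_id):
      # `cursor` is any DB-API cursor pointed at the (hypothetical) megatable DB.
      cursor.execute(ONE_JOIN_QUERY, {"global_user_id": global_user_id})
      return cursor.fetchall()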

Certain weird wiki-specific things won't be included, but the basics should be doable with one join. It's not clear whether it's possible to do it otherwise, since the databases are spread across the DB servers. Any app server can talk to any DB, but you'd have to do multiple queries, which would get you murdered by Ops. Maybe not if you stick to a limit of 10, but the megatable is preferred.

Concerns with the megatable:

  • You either have both the megatable and the local tables, or you have a megatable that replaces the use cases of the local table.
  • Replacing the local tables is super risky, but keeping both is redundant.
  • For RC, we could have a megatable -- there's already duplication there anyway, and it wouldn't be much worse. But for the watchlist it seems a bit insane to duplicate the whole thing.
  • At least we're not going to be writing to that table very often -- way less than RC.

Roan will run something later that sums up everything, to see how big it really is -- the benchmark is the English WP revision table. Are there limits on the size of a table? Probably a bad idea to have a table with 4 billion rows. It probably wouldn't break, but we'd be hesitant to ask for that.

Once expiring watchlist items is a thing, the size of these tables might go down. We should check in with Addshore and the TCB folks about watchlist expiry: https://phabricator.wikimedia.org/T100508. It's probably not going to go down, because people won't go through and weed things out, but the rate of growth would decrease. It may increase if we have Flow structured discussions -- a watchlisted village pump with structured discussions would cause watchlists to grow rapidly.

Are there alternatives to the megatable that wouldn't make Ops really upset? It's probably the best way to go for now; good to talk to Jaime some more. Looking at table sizes, which he knows how to do, he estimates the whole thing combined -- assuming enwiki is unreasonably large -- would only be about 200 GB of data, which could go on a normal DB server and all fit in memory. You could have a fake local RC table that's a black hole: it has a trigger, and when something is inserted, the trigger inserts it into the megatable.
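
A rough sketch of that black-hole-plus-trigger idea in MySQL terms: on the megatable DB server, the replicated copy of each wiki's recentchanges could be a BLACKHOLE table whose BEFORE INSERT trigger forwards rows into the megatable. Whether triggers actually fire for replicated inserts (and on the BLACKHOLE engine at all) is exactly the kind of thing to verify with Jaime; the table and column names are illustrative only.

  # Sketch only: DDL for the "fake local RC table" idea. The blackhole table
  # discards the row itself, while the trigger copies it into the hypothetical
  # global_recentchanges megatable. This would live on the megatable DB server.
  FAKE_LOCAL_RC = """
  CREATE TABLE enwiki_recentchanges (
    rc_timestamp  BINARY(14) NOT NULL,
    rc_namespace  INT NOT NULL,
    rc_title      VARBINARY(255) NOT NULL,
    rc_user       INT UNSIGNED NOT NULL
    -- ... the rest of the recentchanges columns ...
  ) ENGINE=BLACKHOLE;
  """

  FORWARDING_TRIGGER = """
  CREATE TRIGGER enwiki_rc_to_megatable
  BEFORE INSERT ON enwiki_recentchanges
  FOR EACH ROW
    INSERT INTO global_recentchanges
      (grc_wiki, grc_timestamp, grc_namespace, grc_title, grc_user)
    VALUES
      ('enwiki', NEW.rc_timestamp, NEW.rc_namespace, NEW.rc_title, NEW.rc_user);
  """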

We've heard there have been stability problems with CentralAuth? Is it a good idea to keep both copies? Maybe; in the short term it's definitely desirable. It's semi-clearly not going to cause problems, because the local copies already exist and the global copies are already on the server. We can defer figuring out how to migrate. There's also going to be an opt-in period, so both will exist simultaneously anyway.

On stability: yes, CentralAuth has had problems, because the CA database isn't sharded. It's scary to Jaime to host this DB and the CA DB on the same server -- if we screw up and the cross-wiki watchlist hammers the DB, we've broken login for everybody. Not a good idea. But then how do we do the lookups? If you have to do local-to-global translation, you need that info replicated. Maybe you could work around replication: it's not RC-caliber traffic, and while the CA data does change (users are created all the time), the ID mappings themselves are constant, so we don't have to worry about that.

We should probably have a new server (or set of servers) to host these megatables, with some high-end magic (talk to Jaime) that builds them. Initially, don't kill the local tables. Probably replicate enough of the CA database over into this DB so we can do the lookups without hosting the CA tables themselves.

Problems with migration: there's a tag filter on the RC page now, so you can look up RC entries only for VE edits, for example. It's difficult to see how that would work if the RC table doesn't exist locally. Should we see how big that table is? If you globalize the tag table, you'll infect everything -- the history table also has tag functionality, so that will go crazy quickly; history would pull in tags too.

Namespaces: every wiki has its own set of namespaces. That means you can't really do a namespace filter on the cross-wiki watchlist, because it doesn't translate across wikis. Can we do namespace filtering on a one-wiki watchlist in the new universe? RC stores the namespace as an integer, so that's okay.

What about minor and bot edits? Apart from the tag filter, all RC features are safe. For some, you might have to do fake joins against local tables manually, but that's okay as long as you're not filtering on them. The only things you're filtering on are things in the table itself, or tags -- so tags are the problem (see the sketch below).

What about bots? Does CA have a bot flag? There's an RC bot field, but it's per edit, not per user: if you're a user with the bot flag, you have the right to mark your edits as bot. Not all bot edits are necessarily from bots; some humans have a bot flag, like "flooders" on Wikidata, so they can mark some of their edits as bot. We don't think there's actually a UI feature that helps you do this; it's only supported via the API (flooding). https://www.wikidata.org/wiki/Wikidata:Flooders
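
To illustrate why tags are the problem: today a tag filter is just a join against the wiki's own change_tag table, roughly as below (using the 2016-era schema, where ct_tag holds the tag name). With RC only in a global megatable and no local RC table, there's no local rc_id left to join change_tag against, and globalizing change_tag drags the history tag machinery along with it.

  # Sketch: how the existing per-wiki RC tag filter works today. This is the
  # query shape that breaks if recentchanges exists only as a global megatable,
  # because change_tag stays per-wiki.
  TAG_FILTERED_RC = """
  SELECT rc_timestamp, rc_namespace, rc_title
  FROM recentchanges
  JOIN change_tag ON ct_rc_id = rc_id
  WHERE ct_tag = %(tag)s               -- e.g. 'visualeditor' for VE edits
  ORDER BY rc_timestamp DESC
  LIMIT 50
  """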

We agreed that we'll have to use global user IDs in the global watchlist, because nothing else makes sense. We have the global user table with global user IDs; the case we're worried about is a user that's not globalized. There are a few of those, and that would be bad -- they do exist. There's a class called CentralIdLookup (https://github.com/wikimedia/mediawiki/blob/master/includes/user/CentralIdLookup.php); it will either give you an answer or zero. The few users who are unattached wouldn't be a problem: just don't give them the feature. With the current DB schema, it's less easy than it should be to find the global ID, but in the RFC meeting they said it was something they should fix.
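
For context, here's a rough sketch of that "answer or zero" behavior expressed against the CentralAuth tables. The real code path is the CentralIdLookup PHP class linked above; the table and column names here are our reading of the centralauth schema, so verify before relying on them:

  # Sketch of the "central ID or zero" lookup that CentralIdLookup provides.
  # A user only counts as attached (globalized) on a wiki if there is a
  # matching localuser row; unattached users get 0 and simply don't get the
  # cross-wiki watchlist feature.
  ATTACHED_CENTRAL_ID = """
  SELECT gu_id
  FROM globaluser
  JOIN localuser ON lu_name = gu_name
  WHERE gu_name = %(username)s
    AND lu_wiki = %(wiki)s
  """

  def central_id_or_zero(cursor, username, wiki):
      cursor.execute(ATTACHED_CENTRAL_ID, {"username": username, "wiki": wiki})
      row = cursor.fetchone()
      return row[0] if row else 0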

Could we add the local user ID? Right now the ID-to-name mapping has to happen on the local wiki -- that's what Tim said was a stupid design flaw that we should fix. Make it a column and start populating it. Everyone said local user ID. Brad said he agreed we should add the local user ID, and possibly also the global ID, but the local ID alone would already be good.

We need to talk to Jaime -- he wrote on the task, and it's worth getting more detailed thoughts. We have to figure out what to do with the tag problem; if we keep the local RC tables (not that much data), then it wouldn't be a problem. We should also talk to Addshore to see where he's at with watchlist-related things. They were adding a new column with a name that could be sliced in different ways -- expiry, or potentially multiple watchlists. We don't know if they're still pursuing that, or a new props table? They've already checked in a change.

Adam wants a timestamp field? There's already one that shows the last time you saw the notification, so we don't send you another one. We should find out what's up with the watchlist schema changes; Ops will probably want to replicate the new structure after they make Adam's change.

Agreements:

  • Add local user ID to localuser table in centralauth DB. Probably global ID too.
  • Need global megatables for RC and watchlist
  • Colocating megatables with the CA tables is a bad idea, so perhaps replicate the localuser table to the megatable DB for lookup purposes

Open questions:

  • Do we remove local tables in favor of global tables? Hopefully not, because:
    • If we remove the local RC table, how do we keep the RC tag filter feature working?
    • If we remove the local watchlist table, how do we keep watchlists for unattached users working?
  • Can we clean out the watchlist table, e.g. by expiring inactive users' watchlist items? Maybe after expiration is implemented?
  • When is the wl_id schema change gonna happen?
  • Is the wl_timestamp thing still happening? What about the watchlist_props table?

Action items:

  • Roan to calculate total number of rows for both watchlist and RC.
  • Roan to talk to Jaime about details of DB plans, including megatables and watchlist schema change plans
  • Kaldari: Ping Adam Shorland (addshore; WMDE) and set up a meeting to talk about expiring watchlist and schema change plans
    • Pinged in the Phab task
  • Kaldari: File a task about the CA table changes --> CommTech will probably have to do the table changes (including a backfill maintenance script)
    • Done - T141951