Talk:WMDE Technical Wishes/Gendered Categories

From Meta, a Wikimedia project coordination wiki

Presumably this is heading towards a technical RfC at some point? I'm not sure what value this page adds ahead of that process. Certainly, anything that will require a database schema change in core must go through that workflow and will be the best way to get input on strengths and weaknesses, especially on performance/scalability concerns which I imagine you will have. Jdforrester (WMF) (talk) 15:05, 5 August 2019 (UTC)

Thanks for the reminder! I'd be happy to present this as a technical RfC as needed, but didn't feel that we had anything concrete enough. Implementation "B" seems far-reaching and heavy-handed, and I agree that we should have discussions about each of the hook integrations. Implementation "A", however, already has the precedent of Template:PhabT, which makes me think that the hard redirect side of the feature could be implemented without an RfC. The second part of the feature, my proposed workaround where an "alias" category can be used to permanently change labels on an article, feels like a big community discussion and a small technical discussion, but the discussions would be in that order regardless, so it's also not quite ready for an RfC. Please feel free to correct my assumptions, of course. Adamw (talk) 08:01, 6 August 2019 (UTC)
@Jdforrester (WMF): There's a bit more to say: we're converging on implementation "A", and I believe I found a safe migration path. If we recommend that early adopters of category hard redirects only tag articles by using a Lua module (here), then a potential rollback of the feature won't result in broken articles. If we have to roll back, we first toggle the aliasingEnabled boolean, which causes the module to look up category redirects and substitute the target category into article wikitext. Adamw (talk) 09:10, 26 August 2019 (UTC)
@Adamw: Converting Category inclusion into a complex Lua call feels like it has major performance concerns, which is exactly why you would want to do an RfC before proceeding, surely? Jdforrester (WMF) (talk) 14:50, 26 August 2019 (UTC)
Just to be clear, when the new redirect behavior is enabled, the Lua module only performs string concatenation. If the redirect behavior is deployed, adopted, and then undeployed, that's when the Lua module is switched over to the fallback mode in which it performs expensive redirect lookups. Aside from the Lua module, there are other performance concerns which may be worth a technical RfC such as the additional redirect lookups from inside the parser, so I'd say I'm still exploring whether the proposed implementation is scary enough to need extra review. One more consideration on my mind is that we probably want the community RfC ahead of any technical RfC, because it's possible that this will be rejected for user-facing reasons. Adamw (talk) 15:27, 26 August 2019 (UTC)
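The two modes described above can be sketched roughly as follows. This is a Python model, not the actual Lua module: `category_tag` and the `REDIRECTS` table are hypothetical names introduced for illustration, and only the idea of an `aliasingEnabled` switch comes from the discussion. It also assumes that the flag is true while the redirect feature is deployed and is switched off for a rollback.

```python
# Hypothetical redirect table standing in for the expensive lookup;
# in fallback mode a real module would have to query the wiki instead.
REDIRECTS = {"Theater directors": "Theatre directors"}


def category_tag(name: str, aliasing_enabled: bool) -> str:
    """Return the wikitext used to tag an article with a category."""
    if aliasing_enabled:
        # Feature deployed: the parser is expected to follow the redirect
        # itself, so this helper only concatenates strings.
        return "[[Category:" + name + "]]"
    # Fallback after a rollback: resolve the redirect here and emit the
    # target category, so articles stay in the right category.
    target = REDIRECTS.get(name, name)
    return "[[Category:" + target + "]]"
```

With the flag on, `category_tag("Theater directors", True)` emits the alias unchanged; with the flag off, the same call emits the target category instead.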

Multi-language categories

Multi-language categories on Commons seem like a closely related feature. The only thing that differs from a user experience perspective is the ability to see the category name in the user's language.

Unfortunately, proposal A in its current form cannot address this requirement by itself: there needs to be a way to indicate the language of a category. Since you consider proposal B risky, perhaps we can find some way to indicate from Wikidata the association between languages and Commons categories.

Even if you won't implement the per-user display as part of this project, please make sure that you don't take any technical decisions that would make it hard to implement this later on. Strainu (talk) 16:26, 10 August 2019 (UTC)

@Strainu: I had been imagining that proposal A would also serve for multi-language categories, so thank you for pointing out the challenges. The redirect categories would make it possible to tag an article with a few non-English category aliases, but I see what you mean: the desired feature would select automatically between category names based on the user's interface language, and we aren't planning to provide anything like that yet.
In theory it's possible to implement both features orthogonally, in a way that won't cause interference between the two, but that's very difficult to anticipate. I think the current proposal A does have some overlap with the (very reasonable) Wikidata lookup you're suggesting, because we would have to decide whether we should look up the Wikidata label for the aliased category, the label for the target category, or one as a fallback for the other. How do you feel about that, does it seem harmless or like a serious obstacle to deploying this redirect feature on Commons?
There was a related presentation at Wikimania which you might enjoy, 2019:Technology_outreach_%26_innovation/Let's_completely_change_how_wiki_links_work. Amir Aharoni suggests that MediaWiki page titles could be generated from Wikidata labels and descriptions, which might be especially useful on Commons. Extending the same idea to categories will give the multilingual category behavior you're looking for, I believe. Adamw (talk) 09:26, 26 August 2019 (UTC)

Database-level representation

How would either of these be represented at the ParserOutput and database levels? For example, if you ask the API for the categories of the American director's biographical article with prop=categories, will it say "Theater director" or "Theatre director" (or both, like with template redirects)? Similarly, if you use generator=categories, which title(s) will it generate? If you ask for list=categorymembers for "Theater director", will it return nothing, or only the Americans, or follow the redirect before returning the members?

For "Implementation B", I also note that the syntax you suggest is already being used for specifying a non-default sortkey. As for using Wikidata, keep in mind that non-Wikimedia wikis without a local Wikidata available do still exist. Anomie (talk) 18:02, 13 August 2019 (UTC)

@Anomie: I appreciate the encouragement to refine my proposal, it helped me find a flaw. Here's a summary of the implementation for proposal "A". We're abandoning proposal "B" so I won't go into detail about that.
  • The Parser will follow category redirects, and will store the redirect targets using ParserOutput#addCategory and finally mCategories as if they had been included directly.
  • The Parser will also store the redirect information into a new ParserOutput field, mCategoryRedirects. These will identify the source and target of each included redirecting category.
  • OutputPage will calculate category links by deduplicating the redirects against the list of categories, preferring redirect source categories. In other words, the aliased category name will show up in category links, rather than the target. Clicking on the link will redirect to the target category, of course.
  • The page query API prop=categories will list the mCategories which include target categories but not the aliases.
  • The generator=categories API for the main category "Theatre director" will return all pages using either the main or the alias category.
  • The generator=categories API for the alias category "Theater directors" will return the empty set. Ideally, we could rewrite to follow the redirect and return the same as "Theatre director", but there are no hooks in that API module.
  • Searching for incategory:Theatre director will return all pages.
  • Searching for incategory:Theater directors will rewrite the query to follow the redirect, and will return all pages just as if the main category had been specified instead.
I'm on the fence about whether it makes sense to index articles under both the source and target categories; please do share your thoughts about that. Adamw (talk) 13:55, 26 August 2019 (UTC)
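The deduplication step proposed for OutputPage could look something like the sketch below. It is a Python model under stated assumptions, not MediaWiki code: `category_links` is a hypothetical helper, the redirect data is modelled as (source, target) pairs, and picking the first alias when several point at the same target is an arbitrary choice that the proposal does not specify.

```python
def category_links(categories, category_redirects):
    """Return display names for category links, preferring aliases.

    categories: target category names, modelling mCategories.
    category_redirects: (source, target) pairs, modelling the proposed
    mCategoryRedirects field.
    """
    # Remember the first alias seen for each target category.
    alias_for = {}
    for source, target in category_redirects:
        alias_for.setdefault(target, source)

    links = []
    for category in categories:
        # Show the alias if the page used one, otherwise the category itself.
        name = alias_for.get(category, category)
        if name not in links:
            links.append(name)
    return links
```

For a page tagged through the alias "Theater directors", the link list would show the alias, while the stored category list still holds the target "Theatre directors".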
The fact that you're intending to have the categories list from action=parse (via ParserOutput) and the list from prop=categories be different seems likely to be a source of confusion. I suspect a feature request will be to have some way besides reparsing the page to get the alias used for each article, much like you can now get the sortkey, and returning that in the same format in all three modules (action=parse, prop=categories, list=categorymembers) would eliminate the potential confusion.
People may also want to query for just pages using a particular alias, but we could reasonably make that a second-class feature (see other things with the "Note: Due to miser mode, using this may result in fewer than $1limit results returned before continuing; in extreme cases, zero results may be returned." message) if we provide it at all.
I don't think that returning the empty set for the alias category would be much of a problem, assuming the target category is returned as "the" category from other queries. That would mean action=parse should return the target as "the" category with the alias used being indicated along the same lines as the sortkey, as mentioned above.
You seem to have confused generator=categories and list=categorymembers: where you said the former, you seem to have meant the latter. But if prop=categories returns the target category, then generator=categories would logically do the same.
You mentioned "there are no hooks in that API module" as a problem; is this being planned as an extension rather than being in core? I had assumed it would be in core, which would seem the cleaner implementation. Anomie (talk) 21:17, 26 August 2019 (UTC)
Great, it looks like action=parse should be extended to expose the new (and as yet unnamed) mCategoryRedirects field. What to return for prop=categories is still an open question; if we decide to develop something like I've proposed, I'll be sure to ask your opinion again on how to make the APIs consistent. Thanks for these suggestions!
Agreed that core is the right place to make the changes. We would be fixing a core bug, and once changed it would be destructive to let wikis revert simply by disabling an extension. That said, I do expect to guard the new feature with an internal feature flag to allow for rollbacks during a provisional period. Adamw (talk) 09:23, 27 August 2019 (UTC)
In case examples would help, here are some outputs of action=parse, prop=categories, and list=categorymembers. I'd envision those to remain largely the same after the change, just with "alias" included alongside "sortkey"/"sortkeyprefix" and such (when the relevant new value is added to clprop or cmprop, probably). Anomie (talk) 13:12, 27 August 2019 (UTC)
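For reference, the three requests being compared could be built as in the following Python sketch. The `action`, `prop`, `list`, `clprop`, and `cmprop` parameters are existing API parameters; the `alias` value is the hypothetical addition suggested above and does not exist today, and `build_requests` and the endpoint passed to it are placeholders for illustration.

```python
from urllib.parse import urlencode


def build_requests(endpoint, page, category):
    """Build the three API request URLs discussed above."""
    queries = {
        # Categories as computed by the parser for one page.
        "parse": {"action": "parse", "page": page, "prop": "categories"},
        # Categories recorded for a page; "alias" is a hypothetical clprop value.
        "prop_categories": {
            "action": "query", "prop": "categories",
            "titles": page, "clprop": "sortkey|alias",
        },
        # Members of a category; "alias" is a hypothetical cmprop value.
        "categorymembers": {
            "action": "query", "list": "categorymembers",
            "cmtitle": category, "cmprop": "title|sortkey|alias",
        },
    }
    return {
        name: endpoint + "?" + urlencode({**params, "format": "json"})
        for name, params in queries.items()
    }
```

Calling `build_requests("https://example.org/w/api.php", "Example", "Category:Theatre directors")` returns one URL per module, which makes it easy to compare what each one reports for the same page.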
BTW, something else to keep in mind: If an editor decides to make "Theater directors" no longer be a redirect to "Theatre directors", all the articles using the former would need to be reprocessed to update the categories along the same lines as if someone changes a template redirect. You'll need to make sure that happens correctly. Anomie (talk) 13:16, 27 August 2019 (UTC)
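As a rough illustration of that reprocessing requirement, the set of affected articles could be derived from the stored redirect information. This Python sketch assumes a hypothetical per-page record of (source, target) pairs and a hypothetical helper `pages_to_reparse`; how MediaWiki would actually track and schedule these updates is not covered here.

```python
def pages_to_reparse(page_redirects, changed_source):
    """Return the pages that used a redirecting category which has changed.

    page_redirects: mapping of page title to a list of (source, target)
    pairs recorded when the page was last parsed (hypothetical structure).
    changed_source: the category that stopped being a redirect, or whose
    redirect target changed.
    """
    return sorted(
        page
        for page, pairs in page_redirects.items()
        if any(source == changed_source for source, _target in pairs)
    )
```

Only pages that were tagged through the changed alias are selected; pages that use the target category directly are left alone.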