Community Wishlist Survey 2020/Wikisource/Template limits

From Meta, a Wikimedia project coordination wiki

  • Who would benefit: Every text on every Wikisource is potentially affected, but the main target is obviously texts with a lot of templates (long texts, texts with heavy formatting, or both).
  • Proposed solution: I'm not a dev, but I can imagine multiple solutions:
    • increase the limit (easy but maybe not a good idea in the long run) [struck: bad idea, see discussion below]
    • improve template expansion (it's strange that "small" templates, like the ones used for formatting, consume so much)
    • use something other than templates to format text
    • any other idea is welcome
  • More comments:
  • Phabricator tickets: not exactly the same but there is phab:T123844
  • Proposer: VIGNERON * discut. 09:28, 24 October 2019 (UTC)

Discussion

  • Would benefit all projects as pages that use a large number of templates, such as cite templates, often hit the limit and have to work round the problem. Keith D (talk) 23:44, 27 October 2019 (UTC)
  • For clarity, is this solely about the include size limit? (There are several other types of template limits.) Bawolff (talk) 23:14, 1 November 2019 (UTC)
    @Bawolff: What usually bites us is the post-expand include size limit. See e.g. s:Category:Pages where template include size is exceeded. Note that the problem is exacerbated by ugly templates that spit out oodles of data, but the underlying issue is that the Wikisourcen operate by transcluding together lots of smaller pages into one big page, so even well-designed templates and non-pathological cases will sometimes hit this limit. --Xover (talk) 12:02, 5 November 2019 (UTC)
  • @VIGNERON: unfortunately, various parser limits exist to protect our servers and users from pathologically slow pages. Relaxing them is not a good solution, so we can't accept this proposal as it is. However, if it were reformulated more generally like "do something about this problem", it might be acceptable. MaxSem (WMF) (talk) 19:32, 8 November 2019 (UTC)
    • @MaxSem (WMF): thanks for this input. And absolutely! Raising the limit is just one of the ideas I suggested; "do something about this problem" is exactly what this proposal is about. I struck the "increase the limit" suggestion and I can change other wording if needed. My end goal is just to be able to format text on Wikisource. And if you have any other suggestion, you're welcome 😉. VIGNERON * discut. 19:54, 8 November 2019 (UTC)
  • The problem here is that almost all content on large Wikisources is transcluded using ProofreadPage. I noticed that, as a result, all the code of templates placed on pages in the Page namespace is transcluded (counted towards the post-expand include size limit) twice. If you also consider that, apart from the templates, Wikisource pages have a lot of non-template content, you will see that Wikisource templates must be tiny, efficient, etc. Even a long CSS class name in an extensively used template might be a problem.
@Bawolff and MaxSem (WMF): So the question is whether this particular limit has to be the same for very large, high-traffic wikis like English Wikipedia as for small or medium, low-traffic wikis like Wikisource. I think that Wikisources would benefit a lot even from raising it by 25-50% (from 2 MB to 2.5-3 MB).
Another idea is based on the fact that the Wikisource page workflow is: create, verify, leave untouched for years. So if large transclusion pages hurt parser efficiency a lot, maybe the solution is to use less aggressive updates / more aggressive caching for them? I think that delayed updates would not be a big problem for Wikisource pages.
Just another idea: on plwikisource we have no pages hitting this limit at the moment, thanks to a workaround: for large pages we make userspace transclusions using the {{iwpages}} template, see here. Of course, very large pages may then kill users' browsers instead of killing servers. But I think this is acceptable if somebody really wants to see the whole Bible on a single page (we have had such requests...). Unfortunately, this mechanism is incompatible with the Cite extension (the transcluded parts contain references with colliding ids, but maybe this can be easily fixed?). Another disadvantage is that there are no dependencies on the userspace-transcluded parts of the page(s) (but maybe this is not a problem?). Ankry (talk) 20:04, 9 November 2019 (UTC)
Yeah, depending on exactly what performance issue that limit is trying to avoid, it is very likely a good idea to investigate whether that problem is actually relevant on the Wikisources. Once a page on Wikisource is finished, it is by definition an exception if it is ever edited again: after initial development the page is supposed to reflect the original printed book which, obviously, will not change. Even the big Wikisources are also tiny compared to enwp, so general resource consumption (RAM, CPU) during parsing has a vastly smaller multiplication factor. A single person could probably be reasonably expected to patrol all edits for a given 24-hour period on enWS without making it a full-time job (I do three days' worth of userExpLevel=unregistered;newcomer;learner recent changes on my lunch break). If we can run enwp with the current limit, it should be possible to handle all the Wikisourcen with even ten times that limit and barely be able to see it anywhere in Grafana.
Not that there can't be smarter solutions, of course. And I don't know enough about the MW architecture here to predict exactly what the limit is achieving, so it's entirely possible even a tiny change will melt the servers. But it's something that's worth investigating at least. --Xover (talk) 21:50, 9 November 2019 (UTC)
@Ankry and Xover: thanks a lot for these inputs. Raising the limit even a bit may be a good short-term solution, but I think what we need more is a long-term solution. I think the most urgent thing is to look further into all aspects of the problem to see what can be done and how. Cheers, VIGNERON * discut. 15:01, 12 November 2019 (UTC)
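
For illustration only, the arithmetic behind Ankry's observation (template output on Page-namespace pages being counted twice against the post-expand include size limit) can be sketched roughly as below. This is a simplified model, not how the parser actually accounts for sizes: the function names, the byte figures, and the assumption that plain text counts once while template output counts exactly twice are all hypothetical. The 2 MB and 2.5 MB limits are the figures quoted in the discussion.

```python
# Simplified, hypothetical model of the post-expand include size budget.
# Assumption (from the discussion, not from the parser source): plain text is
# counted once, template output from Page-namespace pages is counted twice.

LIMIT_BYTES = 2 * 1024 * 1024  # the 2 MB limit quoted in the discussion


def estimated_include_size(plain_bytes, template_bytes, template_factor=2):
    """Estimate the size counted against the limit for one transcluded work."""
    return plain_bytes + template_factor * template_bytes


def fits(plain_bytes, template_bytes, limit=LIMIT_BYTES, template_factor=2):
    """Return True if the estimated size stays within the given limit."""
    size = estimated_include_size(plain_bytes, template_bytes, template_factor)
    return size <= limit


if __name__ == "__main__":
    # Made-up example: 1.2 MB of plain text plus 0.5 MB of template output.
    plain, templates = 1_200_000, 500_000
    print(estimated_include_size(plain, templates))
    print(fits(plain, templates))                                  # 2 MB limit
    print(fits(plain, templates, limit=int(2.5 * 1024 * 1024)))    # 2.5 MB limit
```

Under these made-up numbers the page would exceed the current limit but fit under a limit raised by 25%, which is the kind of case the suggestion above is aimed at; real pages would need to be measured from the parser's own limit report.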