Consolidating vs breaking up

From Meta, a Wikimedia project coordination wiki

I was reading Cunctator's response to the suggestion that orange juice be deleted from Wikipedia. He said "it is always better to break up than to consolidate". I'm not so sure, and I have thought about it several times. If this has already been discussed somewhere, please put a link to that place. Which is better: six half-page articles about related issues, or one three-page article with the others redirecting to it? Several articles are good for our "article count", but I think we can work past that objective and do what is most useful to the public reading Wikipedia. Could we have an "optimum size" for an article, so that we begin to break it up once it has passed that threshold by some amount? AstroNomer

Looking at how Wikipedia has evolved architecturally, I think seeking a single "optimum size" for an average article would be misleading, because the only good answer is, "well, it depends." One could quip "articles should be long enough to describe their topic, but no longer," but that isn't very helpful in answering the question.

Personally, and as I've written provocatively before, I find short pages (less than a couple of paragraphs) to be undesirable. Others have written about what the "minimum" length is for an article to be considered an article; see Find or fix a stub, for instance. One could conclude that if one hasn't more than a paragraph's worth of content to write on a given topic, one ought to add it to an existing page rather than start a new one. Similarly, if a really long article contains a good solid run of several paragraphs that can be distinguished from the main topic - such as a very detailed "History of Foo" section within the "Foo" article - that would make a good topic to break out. If one followed this practice, then the rule could be "it is always better to break up than to consolidate", as consolidation makes sense mainly when you need to eliminate redundant (or vestigial) stub pages.

As to the more general question about appropriate page length... hmm. I think, looking at the existing articles, you can divide them into three rough 'styles' of articles: "Portal Pages", which are high level, generic umbrella topics that refer to a large number of sub-topics; "Subject Pages" which are the classic 'article' that has a single, specific focus to describe; and "Essay Pages" which go into great length explaining a theory or outlining in great detail a lot of aspects of the topic.

With Portal Pages, it feels like the desire is for "concise completeness". A good Portal Page would provide an informative introduction to the topic (more than just a dictionary definition, but not rambly). It would include sublinks to Essay Pages that expound on general aspects of the topic (e.g., "History of Foo", "Philosophy of Foo", or "The Foo Charter"), Subject Pages describing subtopics/fields/subdisciplines (e.g., "Foo engineering", "Archaeofoology", "Foo poetry", or "Foo commerce"), resource lists ("Listing of Foo Practitioners", "Listing of Foo Cooking Recipes", or "Listing of Noteworthy Foo Examples"), and related/allied topics (e.g., "Bar", "Buzzwords", or "Technical jargon", not to be confused with "Fool" or "Spoo"). Mature Portal Pages appear to be on the order of 2-3 "pages" (if printed out), with about half prose, half list of links. An important purpose of these pages is to provide jumping-off points into subtopics or allied topics. Sometimes these are provided as lengthy bullet lists, other times they're interspersed within prose, and sometimes they're provided in categorized groupings. Personally I like the third approach the best, but there are plenty of examples of each style, and they all seem to fulfil the purpose of providing jumping-off points adequately. Some Portal Pages strive to provide a complete list of all potential topics within them, whereas others narrow their list to only topics that have articles written for them. I tend to prefer the latter approach (as I've written in Improving Portal Pages), but there are pros and cons to each approach.

Subject Pages differ from Portal Pages in that they focus on the topic almost exclusively, and seek to scope out the "width and breadth" of it. Links are less important than they would be on Portal Pages, and tend to just be interspersed within the body of the article, perhaps with some "See also" links at the bottom to appropriate Portal Pages. A good, thorough Subject Page could be 2-4 pages, but would be mostly running prose. The length of the page obviously should correspond to the "importance" and "richness" of the subject. Subject Pages that run long may very well be good targets for breaking out further, perhaps even resulting in the Subject Page metamorphosing into a Portal Page. Or, perhaps the topic is already narrow enough, and the Subject Page could be further researched and grown into an Essay Page.

Essay Pages are like greatly elaborated Subject Pages. Whereas Subject Pages may strive to stick to "Just the Facts", Essay Pages deliberately delve into the "How"s, "Why"s, and "What-fer"s, and their length will bear little correlation to the "importance" of the subject. An essay on some obscure 16th century poet might run into the dozens of pages, whereas an essay on the Theory of Relativity might call it good with 3 pages and a goodly list of "recommended reading". The critically important requirement of an Essay Page is _research_ ; they are not mere opinion pieces (which belong in Talk areas or here on Meta). Assertions, ideas, thoughts, and extrapolations should be provided with backup data and evidence. A good Essay Page will also include a listing of source references and indications from whence various thoughts and ideas were derived (to allow other reviewers/editors to do fact and copyright checking).

I would imagine it's doubtful that one could make a hard distinction between these three types of pages, and that especially since one style can evolve into another in time, there's probably bunches that are in transition from one form to another. But in thinking about "How long should an article be?", perhaps this notion of three styles would be helpful in figuring out and planning where to go with an article.

Also, in my opinion, all three types of pages are vital to Wikipedia's success. Portal Pages are of obvious utility in an index/contents sense, to help you browse to the topic you're interested in. Subject Pages are good for answering basic factual questions, and are the "meat and potatoes" of an encyclopedia. Essay Pages are the real gems of an encyclopedia, though, in that they take the work beyond being merely a glorified dictionary and actually provide some scholarly enlightenment to the reader.

-- BryceHarrington - Sept 2002


There's been some discussion on Wikipedia-L about whether splitting long articles at their headings is better than providing in-page anchors. Either of these would allow parts of an article to be pointed at independently.

Maverick wrote:

[anchors] may not be subpages but they will most certainly lead to the creation of needlessly long pages. There is nothing more intimidating to edit than a huge page therefore we shouldn't encourage their creation.

While I agree that ease of editing is an important consideration, I believe that a split page is significantly more intimidating to edit than a long page. I've made a start on a list of points below. Matthew Woodcraft


Editors' point of view

  • Reasons to split
    • Long pages may have fuzzy topic boundaries
    • If the page covers a contentious subject (politics, religion, etc.), and the page is long/broad, arguments about the subject will get mixed in to the subject itself
    • Short pages have clearly defined subject boundaries
    • Long pages are intimidating to edit
    • The edit history can get inconveniently long


  • Reasons to consolidate
    • Moving text from one page to another requires two synchronised changes
    • Creating a new page is harder than adding a heading
    • Removing a page is much harder than removing a heading
    • Renaming a page is much harder than renaming a heading
    • The history for a series of related edits is visible in one place


Readers' point of view

  • Reasons to split
    • If pages are split, it's easier for other articles to link to the relevant bit (unless in-page anchors are implemented).
    • Short pages download more quickly


  • Reasons to consolidate
    • One long page is easier to export for use elsewhere, or make into a useful printed form, than a collection of separate pages.


Technical issues

  • Reasons to split
    • Articles start running into browser limitations at 32k [1]
    • Implementing in-page anchors is feature-creep in the software
    • Edit conflicts are less likely with shorter pages
  • Reasons to consolidate


[1] ...the 32k limit with many browsers running on MacOS, which hits 83 articles, 60 article talk pages, and a number of user: and wikipedia: pages (Brion VIBBER, October 2002)


Questions

Can we come up with a guideline for when a page is 'too long' (a number of words, or number of bytes)?

Can we come up with guidelines for what sort of headings are usefully split away, and what sort are better in the main article?

An example of when to merge:

  • When the minor pages just reiterate what is said in the main page -- for example, en:Gules just says it is a colour in heraldry, and the Heraldry article says that too.

The rule of thumb that I suggested on the mailing list is based on the following:
    • Don't subdivide an article with fewer than 10,000 bytes.
    • Look for ways to subdivide when the article is longer than 20,000 bytes.
    • When subdividing, look to do it in large chunks. The article that brought the issue to my attention was the one on Winnie the Pooh, which links to separate articles for each of the characters (Tigger, Eeyore, Kanga etc.), each with fewer than 600 bytes. Even if the article had met the above criteria for subdivision (which it didn't), it would have been more reasonable to split off Characters in Winnie the Pooh as an entire article. Eclecticology
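
For anyone who wants to try the rule of thumb on actual article text, here is a minimal sketch of the two byte thresholds above. The function name `split_advice`, the returned strings, and the choice to measure size as UTF-8 bytes are assumptions made for illustration; the rule itself says nothing about articles between 10,000 and 20,000 bytes, so that middle case is simply reported as a judgment call.

```python
def split_advice(text: str,
                 no_split_below: int = 10_000,
                 consider_split_above: int = 20_000) -> str:
    """Apply the byte-size rule of thumb to an article's text.

    Hypothetical helper: the name, the return strings, and the
    UTF-8 byte count are illustrative assumptions, not part of
    any wiki software.
    """
    size = len(text.encode("utf-8"))  # article size in bytes
    if size < no_split_below:
        # "Don't subdivide an article with fewer than 10,000 bytes."
        return "keep whole"
    if size > consider_split_above:
        # "Look for ways to subdivide when the article is longer
        # than 20,000 bytes", and do it in large chunks.
        return "look for large chunks to split off"
    # The rule of thumb is silent between the two thresholds.
    return "judgment call"


if __name__ == "__main__":
    stub = "Tigger is a character in Winnie the Pooh. " * 10
    print(len(stub.encode("utf-8")), split_advice(stub))
```

A character page of under 600 bytes, like the ones mentioned above, falls well below the first threshold, which matches the suggestion that such pages are better kept inside a larger article.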