Talk:Article count reform

From Meta, a Wikimedia project coordination wiki

I don't think many people will agree with my idea, but my proposal is to estimate the legitimate number of articles from the size of the database rather than count actual articles. The most important advantage is that it is really simple to calculate. The simplicity is important because we don't want to have to increase or decrease the number of articles in the future. Besides, it is proof against ad hoc techniques for manipulating the number of articles, like putting in a comma. -- TakuyaMurata 01:21 12 Mar 2003 (UTC)


What does "at least one vote from the first class wikipedia edition" mean? What is first class? I suppose I'm not in the first class. Arno

I haven't a clue about what it means either, and I've been around for a year. Eclecticology 07:51 12 Mar 2003 (UTC)

I think it just means "major language-versions" without being clear how to make that judgment. A better way would be "at least one user from # different language-wikipedias should participate." The number still has to be chosen arbitrarily. Without broad participation, though, this issue could spring up again as other language-versions grow, and we might have another round of discussions and votes. Tomos 09:02 12 Mar 2003 (UTC)

I still like my idea of an <ARTICLE> tag: any article carrying it would be included in the article count.

-fonzy


I assume all variants do not count REDIRECTions? It is not mentioned, as far as I can see, but several of the schemes may include them unless they are explicitly not counted.

Yes, IMHO redirects should always be excluded. --Eloquence

I'd like to see something like 100 bytes as a threshold, since most discussions at our pump involved a 100- or 200-byte threshold. The options proposed right now are maybe quite far from what some of us thought was the best solution.

Second point: I'd like the deadline (tomorrow) to be delayed a bit (by at least one more day). 24 hours is very short to translate the proposal if needed, and is also short to translate proposals made in our languages back into English, if the proposer does not speak English. This is doubly important because the server is basically stuck during our daytime, and mostly in the evening, when most of our contributions occur. If people are not able to connect today, they won't have the opportunity to comment on anything before the vote starts. This is a major problem.

anthere

Do we really need a translation? That would only further complicate the voting process. --Eloquence
We don't *need* a translation. But unless you want to restrict the voting process to English readers, we have to allow time for those who don't read it to ask questions and comment. I did not put any translation on the French wiki, but I roughly explained the object of the vote here, and pointed to that meta page for more information. However, if one person tells us she would like to participate but does not really manage to follow the process, it would be fair not to exclude her because of her poor ability to read English. It does not complicate 'anything' for English speakers; it is just up to us to manage this on a case-by-case basis. It just requires a little more time. This is doubly true with Wikipedia working so badly.

Yendred asks 'Why not use a different system for each Wikipedia?'

And what about this idea?

How will the voting for the Further Restrictions section work? Some people may think that more than one option should be adopted, e.g. the comma criterion and the non-whitespace criterion, but if each option is just going to get a numerical average then how do we choose? If the six options end up with 5.4, 3.6, 2.8, 1.9, 4.5, 3.4 respectively then which ones do we adopt? Ams80

If necessary we will have an additional voting stage. --Eloquence

This comment applies to the whole article size issue so I'm not sure where to put it; please feel free to move it somewhere less intrusive if you wish. There seems to be an issue with the compactness of different written languages. If we choose to define articles by being a certain size, e.g. 250 bytes, then perhaps there should be a scaling factor for each language. To do this we could take a passage in English with a certain size, say 2500 bytes, and get a native speaker of each language to translate it into their language. If in Japanese (for example) the translation is 1500 bytes then the criterion for an article in Japanese should be 150 bytes instead of 250 bytes. [Or, idea just came to me, find a very common text, War and Peace or a Dickens novel or even the Bible, and use that as the base comparison.]
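The scaling arithmetic in the comment above can be sketched roughly as follows. This is only an illustration: the function name and parameters are made up here, and it assumes the threshold scales linearly with the size of the translated reference passage.

```python
def scaled_threshold(base_threshold: int, reference_size: int, translated_size: int) -> int:
    """Scale a byte threshold by a language's compactness relative to the reference text.

    base_threshold  -- threshold chosen for the reference language (e.g. 250 bytes)
    reference_size  -- size in bytes of the reference passage (e.g. 2500 bytes of English)
    translated_size -- size in bytes of the same passage translated into the target language
    """
    if reference_size <= 0:
        raise ValueError("reference_size must be positive")
    # Linear scaling is an assumption; the proposal only gives one worked example.
    return round(base_threshold * translated_size / reference_size)


# The example from the comment: 2500 bytes of English become 1500 bytes of Japanese.
print(scaled_threshold(250, 2500, 1500))  # 150
```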


Thank you (whoever it was) for clarifying that the average to be used is the mean. I happen to think that the method used is a poor one, given that if an option is comprehensively ignored by everyone except a single supporter who rates it best, then it'll win. There's also the possibility of confusion as to whether "1" is the highest or lowest score one can give to an article - important in a multi-language situation, I would have thought. However, I'll be happy to be proved wrong on this - just noting my objection for the record... ;-) MyRedDice

Obviously, any voting system needs a certain participation rate to work properly. We will just use common sense here, but if we choose to use the same voting system in the future, we will have to define a minimum number of votes.
As for the order, 1=very good .. 6=very bad is the system used by many schools, so some people are already familiar with it. Lower=better is also reminiscent of approval voting, which we have previously used. -- Eloquence

Any objections to dropping the "comma" criterion and including the pros and contras in "Language-dependent punctuation"? The intention is the same, so why make it even more complicated? --Kurt Jansson 16:29 13 Mar 2003 (UTC)

At least one French user suggests keeping the comma method... could it be available as a first choice?


Someone also says he does not understand what this proposal, "Frequency Table by vigintiles", is. I fear I absolutely don't see it either. Explanations?

Basically this would sort the set of articles by size, then chop it up into 5% (1/20th) chunks and report the range of sizes in each chunk. So you might show a table something like:
        5% of pages are <= 100 bytes
        10% of pages are <= 200 bytes
        15% of pages are <= 350 bytes
        ....
        95% of pages are <= 32000 bytes
       100% of pages are <= 99999 bytes
(I made up the data). --Brion VIBBER 19:27 13 Mar 2003 (UTC)
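A rough sketch of how such a table could be computed, assuming the page sizes are available as a plain list of byte counts. The function name and the nearest-rank cut-off rule are illustrative choices, not anything the software is known to do.

```python
def vigintile_table(sizes):
    """Return [(percent, size_limit), ...] for 5%, 10%, ..., 100% of pages.

    Each row reads: `percent`% of pages are <= `size_limit` bytes.
    Uses the nearest-rank rule on the sorted sizes.
    """
    ordered = sorted(sizes)
    n = len(ordered)
    if n == 0:
        return []
    table = []
    for k in range(1, 21):
        # 1-based rank of the last page in the k-th 5% chunk (integer ceiling of k*n/20)
        rank = (k * n + 19) // 20
        table.append((k * 5, ordered[rank - 1]))
    return table


# Made-up data: 100 pages with sizes 1..100 bytes.
for percent, limit in vigintile_table(range(1, 101)):
    print(f"{percent:3d}% of pages are <= {limit} bytes")
```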



I really don't like this type of voting. Anybody who votes after many others have will be influenced by the preliminary results. This, IMO, is evil and can only lead to groupthink. Hiding the preliminary results from voters is the only way to go - but that requires a software solution. I for one am not voting because it is already clear that the non-zero option is going to win and my idea of having a Template:HEADINGARTICLECOUNT option set on a per wiki basis didn't make it on the ballot (I can't edit from home right now and my time at work is very limited for that sort of thing). So, why bother? --Maveric149


Transparency works both ways -- it may encourage groupthink, but it also lets you use strong votes to express dissent with options you dislike that seem to be winning, esp. in a preferential system. In my experience, hiding votes sounds like a good idea at first, but works badly in practice because what is often snobbishly called groupthink is really valuable information that you don't want to do without once you lose it. Should I bother reading this option if all people I trust have rejected it? Should I maybe give this a closer look if people I don't trust have buried it in negative votes? I also find it funny that the argument "groupthink" should be used against open voting, whereas we try to strive for consensus without voting whenever possible, a process which is much more likely to encourage groupthink ("Gee, I don't want to stop the consensus!").
Secrecy is necessary for real world votes, but here we can afford some openness. It's the wiki way. As for your idea, next time we do this we need to allow more time for suggesting the options, but I think that esp. with a preferential system that relies on high participation, we need to stop accepting options once we move to voting. --Eloquence 23:35 14 Mar 2003 (UTC)
In proper elections we have opinion polls and exit polls, so it's not wildly different to have public voting in terms of groupthink and strategy and the like.
A very good reason to have public voting is that on the internet it is easy to stuff ballot boxes if they are kept secret. IE, create a new account, vote with it, create a new account, vote with it, repeat until your option has won. When the voting records are public the possibility of ballot box stuffing is a lot easier to detect, though difficult to prove of course. MyRedDice

I'm assuming all this multiple voting is to create a curve, with the top three choices or so becoming the real ways of doing it. As far as democracy goes, I don't see what's wrong with it... I do want to know what's wrong with radio buttons and form-submitted data... no stepping on each other's toes with edits, and a larger voting pool, potentially. This way is so... 1985. -Stevertigo

Obviously wiki voting is very cumbersome. (Of course it reminds us that we really need a better way of handling edit conflicts.) However, before we implement anything in software, we need to agree on a process. Wiki allows us to experiment with various ideas without actually hacking something. "Just use forms" sounds simple enough, but the question is, how do we implement this in wiki syntax? Should votes get their own namespace? Should we support different types of voting? Who can create polls, when can options be added? Are polls announced somewhere? Do we maintain a vote history and if so, how comprehensive? How is the deadline determined? Should actions based on polls (e.g. page deletion) be taken automatically by the software? Should there be anonymous voting and if not, are there any other criteria? And so forth, and so on. --Eloquence 23:41 14 Mar 2003 (UTC)

Yours is a curious comment, Mav. Why are you not voting because the solution you don't like is not gonna make it? That's no justification for not voting, for not having your say. Especially if you really care about the outcome and are worried about w:groupthink. On the contrary, that's the perfect moment to clearly state your opinion. Who knows, some might follow you by habit :-) jk.
Let's make a deal... tomorrow, we both vote 6 for the 0 bytes option and both vote 1 for the 100 bytes. 250 and 500 are hopeless anyway. Then we just rely on a last vote at the right place...

OK all the above comments have made me decide to vote and retract my accusation of groupthink (exit polls in real-world elections are still evil, evil, I say). ;) --Maveric149


I still don't see where the community decided by consensus to accept the results of this vote. With that in mind, the talk of various problems with the voting procedure isn't too important. Anyway, here comes my vote, with the usual proviso that I'm providing my opinion for information and that my vote doesn't count if the vote results are to be accepted as authoritative. -- Toby 21:11 16 Mar 2003 (UTC)

Somehow the community said it would agree with the result by agreeing to vote and not claiming that this vote was a foolish idea (is "foolish" right here? I just mean nobody said "eh, voting, no way!"). It was an experiment, right? The outcome is not very important (though... I confess, I started a stub hunt); what is important is the lesson we learned: what is working, what is not, what is impractical, the delay before voting, the length of the voting period, whether this type of voting would be acceptable for any type of problem. Much to say.

I don't think that it's fair to say that the community agreed that the vote should count because it voted. People may instead believe that the vote would count and didn't want to be left out. That's the only reason that I vote in political elections, for example. But yes, it was an experiment, and an interesting one. -- Toby 02:48 23 Mar 2003 (UTC)

There is one thing I can say right now. If the second part of the vote appears to be rather consensual (rather), the first one obviously is not. When one computes the results for each proposal and applies some statistical interpretation to them, given the size of the sample, the statistics show that quite a number of results are not statistically different. Say... if options show results ranging (I did not check after the last votes) between 3.2 and 3.6, I think it is a *fantasy* to pretend that this is a "consensual" result. If the 0 vote had got an average of 2 and the 250 an average of 6, I would have said the outcome was quite acceptable. For a *vote*. Here I don't think it is. This type of situation should lead either to a revote between the two or three most prominent options, or to a rediscussion round, or to new proposals. There could be a minimum level for the first one to be accepted. In any case, such a voting technique is OK for this type of issue, but won't be for any major issue. ant


Since nobody else has, I just calculated the # of votes and points for each size option. I hope someone can check them; I'm sure I've made errors somewhere...

option, score, # of votes

0, 124, 39
5, 159, 36 
20, 145, 38
100, 128, 37
250, 149, 37
500, 171, 37
dyn, 179, 35 *
comp, 151, 33
  • I excluded "62.255.64.6" on "dynamic" as invalid. If counted, the numbers would be 185, 36.
  • I counted Toby's votes as valid, in spite of the reservation he expressed above. If not counted, the results would be:
0, 123, 38
100, 123, 36

(others omitted)

In any case, the results seem to indicate that zero is the most preferred choice by a small margin.
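For anyone who wants to check the arithmetic, here is a small sketch that turns the score and vote counts listed above into mean ratings. The dictionary layout and function name are just for illustration; lower means better, following the 1 = very good .. 6 = very bad scale used in this vote.

```python
# (score, number of votes) per size option, copied from the tally above.
TALLIES = {
    "0": (124, 39),
    "5": (159, 36),
    "20": (145, 38),
    "100": (128, 37),
    "250": (149, 37),
    "500": (171, 37),
    "dyn": (179, 35),   # excludes the vote from 62.255.64.6
    "comp": (151, 33),
}


def mean_ratings(tallies):
    """Mean rating per option: total score divided by number of votes."""
    return {option: score / votes for option, (score, votes) in tallies.items()}


means = mean_ratings(TALLIES)
best = min(means, key=means.get)  # lowest mean = most preferred
for option, mean in sorted(means.items(), key=lambda item: item[1]):
    print(f"{option:>5}: {mean:.2f}")
print("most preferred:", best)
```

Ranking the options this way shows how narrow the gap is between the zero option and the 100 bytes option.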

I don't know how to deal with the "further restrictions" since those options are not mutually exclusive (two or more can be implemented) and there was no discussion of how many we would use, etc. Tomos 04:22 18 Mar 2003 (UTC)


Here are the votes for the further restrictions. I have counted all votes:

  • No further restrictions: 32 votes, sum 160, average 5 (bad)
  • Comma: 38 votes, sum 212, average 5.6 (very bad to bad)
  • Language dependent punctuation: 35 votes, sum 166, average 4.7 (bad to rather bad)
  • Link: 36 votes, total 114, average 3.2 (rather good)
  • Stub flag: 34 votes, total 156, average 4.6 (bad to rather bad)
  • minimum edits: 32 votes, total 164, average 5.1 (bad)
  • minimum contributors: 34 votes, total 164, average 4.8 (bad)
  • two paragraphs: 32 votes, total 165, average 5.2 (bad)
  • <ARTICLE> tag: 34 votes, total 176, average 5.2 (bad)
  • independent systems: 30 votes, total 132, average 4.4 (rather bad to bad)
  • data base size: 30 votes, total 163, average 5.4 (bad to very bad)

The question is: what do we do with these? I see two options:

  • Choose the link option, since it scored much better than all other choices
  • Find a method that does give a clear decision that can also combine systems. The options "comma", "minimum edits", "two paragraphs", "<ARTICLE> tag" and "data base size" are being rejected (1) because they scored worse than "no further restrictions", the others are pitted once again, but now with the option of choosing any number. This is not as bad as it seems, as 'no further restrictions' is simply the choice of doing none of the others, while 'independent systems' is, I think, just an option of its own, but still the number of options will be quite large:
  1. No further restrictions
  2. Independent systems
  3. Language dependent punctuation
  4. At least one link
  5. Introduce a stub flag
  6. Have a minimum number of contributors
  7. Punctuation and at least one link
  8. Punctuation and stub flag
  9. Punctuation and minimum number of contributors
  10. Link and stub flag
  11. Link and contributors
  12. Stub flag and contributors
  13. Punctuation, link and stub flag
  14. Punctuation, link and contributors
  15. Punctuation, stub flag and contributors
  16. Link, stub flag and contributors
  17. Punctuation, link, stub flag and contributors
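The list above is the two standalone options plus every non-empty combination of the four remaining restrictions (2 + 15 entries). A minimal sketch that generates it mechanically, with the short option names chosen here for illustration:

```python
from itertools import combinations

# Options that stand on their own and are not combined with anything else.
STANDALONE = ["No further restrictions", "Independent systems"]
# Restrictions that survived the first round and may be combined freely.
COMBINABLE = ["Punctuation", "Link", "Stub flag", "Contributors"]


def ballot_options():
    """Return every ballot entry as a tuple of restriction names."""
    options = [(name,) for name in STANDALONE]
    for size in range(1, len(COMBINABLE) + 1):
        # combinations() keeps the order of COMBINABLE, matching the list above
        options.extend(combinations(COMBINABLE, size))
    return options


for number, option in enumerate(ballot_options(), start=1):
    print(f"{number}. {' + '.join(option)}")
```

Each additional restriction kept in the running would double the number of combinations, which is why the ballot grows so quickly.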

(1) I'm regretting that I gave '5' to "two paragraphs"; I have begun to like this option since then. My comfort is the thought that even if I had given it a 1, it would still be on the drop-out pile. Andre Engels 14:01 18 Mar 2003 (UTC)


It's quite simple: We will pick the best rated option from each stage and choose that one as the winner. The link option can be perfectly combined with the non-zero option and should make those people happy who objected to the non-zero option.

Thus, the result is:

1) An article is counted if, trimmed of all trailing whitespace (blanks, newlines etc.), it is longer than zero bytes (non-empty) AND

2) it contains at least one link.

As for whether we should abide by that result, yes, we should. Jimbo has decided that we will use voting to determine what to do with regard to the article count. Unless he judges the experiment a failure, this is what we will do.

I didn't notice Jimbo saying that. In any case, since you are (I think) the only person bothering to do the programming for this, I don't consider it a violation of sacred wiki principles that you chose to decide your actions by this vote. I just wouldn't want it to catch on. ^_^ -- Toby 02:48 23 Mar 2003 (UTC)

Please see my message to Wikipedia-L for further analysis and the full tabulated results for both stages. --Eloquence 14:22 18 Mar 2003 (UTC)


Actually: part 1 could be removed, since any article with at least one link will have [[ and ]], so it will necessarily be longer than zero bytes after removing whitespace. Andre Engels 16:06 18 Mar 2003 (UTC)
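A rough sketch of the rule as stated, to make the point about part 1 concrete. The function name, the `is_redirect` flag and the regular expression for a link are assumptions made for illustration; the real software may detect links and redirects differently.

```python
import re

# Assumption: a "link" is anything of the form [[...]] with at least one character inside.
LINK = re.compile(r"\[\[.+?\]\]")


def is_countable(text: str, is_redirect: bool = False) -> bool:
    """Apply the voted rule: non-empty after trimming trailing whitespace AND has a link."""
    if is_redirect:
        # Redirects are never counted, per the note in the vote header.
        return False
    non_empty = len(text.rstrip()) > 0           # part 1
    has_link = LINK.search(text) is not None     # part 2
    return non_empty and has_link


print(is_countable("Paris is the capital of [[France]]."))
print(is_countable("A stub with no links."))
print(is_countable("   \n\t"))
```

Under this definition of a link, any text that passes part 2 is at least five bytes long, so the part 1 check never changes the outcome.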


I do not agree with the interpretation of the voting recommending "zero bytes + at least one link". The whole "other restrictions" part scored worse than the "byte counting" part, which I would interpret as meaning that people preferred a "byte counting" over a "combined with other restrictions" model. But then, the models were not well defined in the first place. -- Schewek 17:10 18 Mar 2003 (UTC)

I think the interpretation is fairly obvious: Few people thought that "no other restrictions" besides the byte count would be a good idea (the score for that option being 5.1875). So it's clear that we should pick the best rated of the other restrictions. That's the reason the "no other restrictions" option was there in the first place: To determine whether we would pick any of the other options. --Eloquence 19:18 18 Mar 2003 (UTC)
True, there was the "no other restriction" rejection. But apparently none of the suggestions was attractive. - Well, such is life... -- Schewek 20:35 18 Mar 2003 (UTC)
That option proposed counting pages outside of article space, which doesn't make much sense. That at least is why I voted against it. If that was not the intention, then the results of the vote are invalid. --Brion VIBBER 02:23 19 Mar 2003 (UTC)
The vote header notes: "Please note: All article count proposals (including non-zero) do not count articles that contain nothing but whitespace (blanks, tabs, newlines), and do not count redirects." So your interpretation does not make sense. --Eloquence 10:12 19 Mar 2003 (UTC)
The contra list for that proposal clearly states, "we still must not count redirects, user pages, talk pages, etc." If that's a contra to "No further restrictions: Only the above size criterion should be used", then the only interpretation that I can find is that it is proposing that redirects, user pages, and talk pages be counted, which would be generally regarded as insufficiently useful a count -- hence the contra statement. Can you provide an alternate interpretation or explanation of what was intended? --Brion VIBBER 21:23 19 Mar 2003 (UTC)
I didn't write that counter argument, it is incorrect. Sorry, but I can't respond to every incorrect statement that is made in a discussion. I wrote the clarification on top of the vote, which should have been sufficient, but apparently was not. --Eloquence 18:10 20 Mar 2003 (UTC)

I've thought of the obvious way to find the best "middle position" on size of article - though it's too late for that now. Simply ask people to put down a number as their preferred minimum size, and then find the median (not mean) of all the votes and use that as the minimum size. I'll remember for next time.
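A tiny sketch of the median idea, using made-up ballots. The function name is hypothetical, and each voter is assumed to submit a single preferred minimum size in bytes.

```python
from statistics import mean, median


def preferred_minimum(ballots):
    """Return the median of the voters' preferred minimum sizes (in bytes)."""
    return median(ballots)


# Made-up ballots: one voter picks an extreme value.
ballots = [0, 0, 100, 250, 100000]
print("median:", preferred_minimum(ballots))
print("mean:  ", mean(ballots))
```

With these made-up numbers the single extreme ballot drags the mean far away from what most voters asked for, while the median stays at the middle voter's choice.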

That would make more sense. They should do that in political elections too ^_^. -- Toby 02:48 23 Mar 2003 (UTC)

I think we should have a vote as to whether we should accept the results of the vote!

  • Accept the vote outcome as an acceptable compromise
    • +: matter settled for the time being user:anthere :-)
    • -: voted solution may not be best
    • -: gives credence to voting as a decision-making process, which some disagree with
    • (1): MyRedDice
  • Do not accept the vote outcome
    • +: may find a better compromise
    • -: takes more time to reach decision
  • Accept the vote outcome where it indicates a clear consensus but not where the results are at all close
    • +: shows where voting is useful and where it isn't
    • -: some things may need to be gone over again
    • (1): Toby
I don't think we need such a vote. The results are reasonably clear even if some people misunderstood the "no restrictions" option. So do we need a vote on whether we need a vote to accept the results of the vote? And a vote on that one, too? Or can we just get on with trying to build an encyclopedia?--Eloquence 18:10 20 Mar 2003 (UTC)
Well, my vote is only on the assumption that this vote isn't considered binding either. ^_^ -- Toby 02:48 23 Mar 2003 (UTC)