Meta talk:Language proposal policy/Community draft/Archive 1


Unique and workable criterion

The following discussion is closed: The proposed criterion was not used.

I propose the following modification to the language policy in order to unify the criteria for allowing wikis.



  • What kinds of languages can have wikis?

Any that has a standardized writing system and at least 100 writers/readers worldwide to form a viable community and audience.



1.- A Wikimedia project is basically a writing project, so every language must have a written form, and that written form has to be standardized. Otherwise the project is impossible to build.

2.- Native speakers are no longer mandatory, so the languages could be natural, artificial or classical.

3.- Writers and readers are linked and interchangeable concepts. A writer could be a reader and vice versa.

[C.L.] unsigned by 01:56, 25 June 2008

"any that has a standardized writing system and at least 100 writers/readers worldwide to form a viable community and audience." - This appears to contradict (or at least make it difficult to fulfill) one of the messages in our Fundraising 2007:

“Wikimedia Argentina has a very ambitious agenda in front of us. Next week we are getting together with native tribes, hoping to stimulate the creation of Wikipedia in their own languages. There has been a lot of discussion between the tribes on how to write these oral languages. The irony is that we are not only introducing Wikipedia to them, but also introducing the wiki way of consensus decision making, so that they can figure out how to best do this,” he says.

(from Fundraising_2007/Testimonials/The_ideal_Wikipedian). In any case, what "standardized" means needs to be refined. - Hillgentleman 02:06, 25 June 2008 (UTC)
A standardized writing system is not a useful criterion. I have not counted, but my quick guess, looking at Special:Sitematrix, is that about 50% of the existing projects would fail this criterion. Moreover, "standardized writing system" is not a criterion that can be answered with yes or no, nor is it measurable. What counts as standardized? There's no easy answer.
My proposal can be found at User:Slomox/Languages. It asks for 1000 real speakers for a Wikipedia and has its own criteria for the other project types. Additionally, it names criteria for the activity of the test projects.
Speaker numbers are perfectly measurable, and so is test project activity. --::Slomox:: >< 02:24, 25 June 2008 (UTC)

The proposal seems OK to me. I don't think that deciding which languages are standardised need cause any problems if we specify what we mean. There should be something that scholars of the language see as a literary standard for it; if their literature does not make it clear that there is at least one standardised form, proposers should be able to show us dictionaries and grammars of the language, as well as some works of non-fiction, as examples of a written standard. If a language has more than one literary standard, as modern English does, the proposers should show us what policies they have to deal with this; the English Wikipedia, for instance, accepts both major written standards, which causes fewer difficulties than having to choose one would.

I completely agree with consideration 2 (although I would say that ;-) What matters is that people can understand what is written, not whether it is their first language. For hundreds of years, Latin was the main language used by academics in Europe even though it had no native speakers; under the current policy, had the Internet and Wikipedia existed in 1600, Latin would not have been allowed to have its own Wikipedia. Even today, Vicipaedia probably imparts much more knowledge around the world than many of the Wikipedias in smaller living languages, whose bilingual readers, just like the readers of Vicipaedia, are all able to get their information from wikis in other languages. Many, if not most, of en.wiki's readers do not speak English as their first language, and if every first-language speaker of English died tomorrow it would still be a useful project; it is useful because it is understood, not because it is anyone's first language. And finally, if the language subcommittee does not think that wikis in widely understood artificial languages conflict with the aims of the Wikipedia project, there is no reason for it to think that Wikipedias in "dead" languages which are taught in hundreds of schools and universities around the world conflict with them.

I would say that we need more than 100 writers/readers to make a viable Wikipedia. If the test projects can create 100 or more decent articles and attract 10 or more editors, then we will know that the language will be able to support a Wikipedia (obviously, far more people will edit the actual wiki than will edit the test project; the higher the profile of the project, the more people will edit it). LeighvsOptimvsMaximvs (talk) 19:39, 26 June 2008 (UTC)

I would decouple readers and writers

I don't think it is accurate to say that the numbers of readers and writers are necessarily closely linked, let alone interchangeable. I can easily imagine scripts that are widely read but that nobody, or nearly nobody, writes. -- Cimon Avaro 20:21, 26 June 2008 (UTC)

A little change

I would change it to "at least 100 literate people worldwide in the proposed language". That is the universe of potential editors/consumers (or potential writers/readers) of the project. All of us began by reading articles in Wikipedia before we tried to contribute. The crux is that we first had to be able to understand the written language (as consumers) before we could contribute (as editors). Without literate people, I think no Wikimedia project is viable.

Eligibility based on literacy (not native users), ISO codes for extinct languages

The following discussion is closed: Dovi gave kudos to the change to "literate people", and tweaked the ISO code requirements to meet GerardM's objections about extinct languages (GerardM continued to object).

I think #4 is excellent:

The proposal has a sufficient number of literate people worldwide to form a viable community and audience.

Kudos to whoever came up with this fantastic formulation. This is both correct and fair: what we want is to form "a viable community and audience" (otherwise a wiki environment is inappropriate), and what it takes to do that is literacy. No more, but also no less! This meets head-on much of the debate that was going on on the Foundation mailing list, and I tend to think a well-thought-out formulation like this will find a lot of support.

Regarding "a reasonable degree of recognition as determined by discussion" -- I think this formulation should be tightened up a bit, but I am not sure exactly how. Any good ideas? Since this is a community draft making suggestions to the language committee among others, I don't think we need to parenthetically note that "this requirement is being discussed by the language subcommittee."

I tweaked the text a bit, and added a sentence about classical languages to the existing sentence about artificial languages. Feel free to comment, modify, improve. Dovi 10:12, 27 June 2008 (UTC)

Regarding #2, the requirement for a valid ISO code, I agree and think most others do as well. The problem is interpretation, namely: Gerard's vocal objections to using an ISO code for a classical language in a modern environment. We should come up with a formulation that meets his objection. Dovi 06:46, 27 June 2008 (UTC)

I have added text to #2 that is meant to meet Gerard's objections. Please feel free to comment, modify, improve. Dovi 10:10, 27 June 2008 (UTC)
My objection is in two parts: historic content needs to be clearly marked as being different from modern writing in the historic language. This type of information will certainly be available in ISO 639-6. I understand that a request will also be considered for ISO 639-3. When people do not make the effort to apply for this code, when they reject this objective argument on the basis of "too much work" or "there is no need", then I am of the opinion that the proponents are not sufficiently motivated to get their language a Wikipedia. GerardM 08:49, 28 June 2008 (UTC)
Gerard, please explain what the different versions of ISO mean and why they are significant in your opinion to this discussion. No one is applying for the code because no one, besides yourself, considers it the least bit necessary. Please remember that your demands on this point were not supported by others in any previous discussions, not even by those opposed to classical languages! You are entitled to your opinions, but others are entitled to disagree.
In any case, the language will be applied "in the most appropriate way for a wiki project" even if it has not been used in a similar context before, a formulation that makes it fully clear ("marked") that it is being used in a new way and under different circumstances than its classical usage. Dovi 19:07, 28 June 2008 (UTC)

Localization requirements for final approval

The following discussion is closed: No change was made to the localization requirements.

Regarding #2:

Localisation statistics are available which describe the current availability of translations for the MediaWiki interface into different languages. The group statistics at BetaWiki are the most up-to-date information.

I have long thought that this is too harsh for small communities. The requirement here, according to the links, is for the 500 most used messages in MediaWiki. This greatly exaggerates the number of basic messages needed to get started. A much shorter list, including the messages that normally appear on screen when readers read articles and those encountered in basic editing, is all that should be required to start a wiki. The community can continue to translate more of the interface as time goes on. Dovi 07:09, 27 June 2008 (UTC)

You are wrong when you suggest that this list should be shorter. We had someone who was of this opinion go over it, and he came back with additional messages to be added. This list is substantially shorter than it used to be. This list has not changed in quite some time, and MediaWiki has substantially more messages now... GerardM 08:24, 28 June 2008 (UTC)
I think you miss the basic fact that a great many communities can get by initially with plenty of untranslated (English) messages, especially for things that are not extremely basic. They can translate the rest wiki-fashion as time goes on. The incredible, wonderful environment that you have made available in Betawiki makes this even easier than ever to do wiki-fashion over the longer term. Dovi 18:55, 28 June 2008 (UTC)
Given the limited amount of time it takes to translate the minimal required messages, it is not onerous to require them to be localised. There is also a requirement to create an initial corpus that allows us to check whether the language is indeed the language it is said to be, and that requirement takes time, so these two requirements should be worked on at the same time. Localisation is required because it is such a powerful way of engaging readers; it signals that we take their language seriously. Once the initial messages are localised for the first project, we expect further work on the localisation. It is for this reason that a subsequent project requires full localisation. GerardM 07:25, 30 June 2008 (UTC)
I agree with everything you say, except for what the word "minimal" should mean. Having done this myself once in the past (for an extant language in a new wiki, when things weren't yet centralized at Betawiki), I think 500 messages is far too large a task. A "minimal" show of seriousness should mean translating well under 100 messages, in itself still a fairly large task for a beginning community. Once again, I think Betawiki is awesome, and one of the greatest contributions ever to both Mediawiki and to multilingualism. Dovi 16:31, 30 June 2008 (UTC)
I too have translated thousands of messages, and it is onerous work. 500 messages are no small matter. But I guess they are not the more complex ones, with several lines of text and full sentences, but mostly one-word messages, so 500 seems an acceptable amount of work.
It would be interesting to see a live wiki with exactly these 500 messages translated: for example, the 500 most used messages in English and all the rest in Amharic by default. This would give a good impression of how it feels to use a wiki with only a basic translation. And it would let people who did not participate in choosing the 500 most used messages give feedback about messages which should be added to or removed from the list. Gerard, is it technically feasible to add such a function to Betawiki (or to create an extra wiki with this feature)? And is there any page which describes how the 500 most used messages were chosen?
And one last point: I don't like the requirement of full translation for the second project in a specific language. If the 500 most used messages are good enough for the first project, they should be acceptable for the second too. I did complete the translation of the core messages for nds, and I have to say, it's a horror. 2000 messages, or whatever the total is at the moment? It's really annoying. There are masses of EXIF messages which are almost untranslatable (I suspect that even many of the translations in big languages like German or French are actually nonsense, because the translators didn't understand the exact meaning of the EXIF properties [but that's no problem, because no user of the wikis ever reads the EXIF property section on image description pages anyway ;-)]; my own nds translation may be nonsense too, I don't know). If we want to set the bar higher for second projects, 80% would be more realistic, I guess. --::Slomox:: >< 14:09, 3 July 2008 (UTC)

As a practicality, I suggest we leave things as they are in that the "minimal" number of most-used messages is decided elsewhere, not in this policy. However, the minimal list should of course be open to discussion and community consensus at that location outside of this policy. Dovi 05:24, 8 July 2008 (UTC)

Because of comments on the Foundation mailing list, I am beginning to reconsider this. Perhaps the policy should clearly state that community consensus is needed to determine minimum requirements. I personally do not believe that interface translation need be a requirement at all. Better that it be done wiki-fashion as the project proceeds. Do others agree? But if others feel strongly that it must be a requirement, then the quantity of work that must be done cannot be left up to individuals without community input. Dovi 14:51, 22 July 2008 (UTC)

Artificial languages

The following discussion is closed: The proposed exceptional requirements for artificial languages were not implemented.

"If the proposal is for an artificial language such as Esperanto, it must have a reasonable degree of recognition as determined by discussion." This is unworkable. There are people who are vehemently against any artificial languages and there is as a consequence no discussion because we operate on the basis of full agreement. What I would like to see is an observable norm for artificial languages. I have suggested in the past:

  • Full localisation of MediaWiki and the WMF extensions
  • 250 full-length articles across many domains (no additional requirements such as quotes, as these say nothing about the language)
  • At least 5 active contributors to the project would also be a must. When a project outside the WMF exists that is in the MediaWiki format, it would be considered as positive (Lingua Franca Nova has one) and as a substitute for Incubator activity.

GerardM 08:43, 28 June 2008 (UTC)

Agree with need for observable norm, and I think your suggested requirements are within reason, though they might be lessened. Dovi 18:01, 1 July 2008 (UTC)
I never tire of stating it ;-) : take the number of speakers (literate people). If a constructed language has a considerable number of speakers (1000 is the number I have chosen as sufficient), it can easily meet the requirements named by Gerard (and will soon meet them once it has its own project), because speakers of constructed languages are much more eager to engage with the language than ordinary native speakers (that's because they have chosen to learn the language actively and deliberately, which is an act of commitment). But if you use the named criteria without any speaker requirement, a constructed language with only five speakers would be acceptable if all of them decided to participate in the project. --::Slomox:: >< 13:45, 3 July 2008 (UTC)
So you are saying we should add a "potential population" figure to Gerard's list of requirements. Agree. You might want to make it "per project" as discussed below. Dovi 15:35, 3 July 2008 (UTC)
Potential population? I'm not quite sure what is meant by "potential population". What I intended: we should add project-specific numbers of people fluently able to communicate in the requested language to the "Requisites for eligibility" section. This number should be independent of the status of the language (artificial, non-native speakers only, etc.). There should be no additional workload for specific types of languages (e.g. constructed languages can only get their own project if they do full localisation of MediaWiki instead of the 500 most used messages). Such additional rules would be a kind of discrimination against these languages. That was what I intended to suggest ;-) --::Slomox:: >< 19:57, 3 July 2008 (UTC)

All artificial languages?

The following discussion is closed: The proposal to disallow artistic languages was not implemented.

I reviewed the requirement for ancient languages (those with some modern use, in teaching, liturgy, etc.; it includes languages like Latin, Greek, Sanskrit and a few more), and I thought about artificial languages. Is it desirable to have Wikipedias in Tolkien's languages? Artificial languages can be classified into auxiliary languages, which are intended for actual human communication, whether international or scientific (Esperanto or Lojban), and aesthetic languages, which are intended only for an artistic function (mainly in literature: the Star Trek languages or Lingua Ignota, for example). If the community wishes to be restrictive, I think we should ban only the aesthetic languages and allow the other class of artificial languages.

Another idea is not to restrict at all, and to delete the restriction on ancient languages too. What is your opinion?

I added the ban on artlangs; see it.
I don't fully understand the comments here, but I don't think we are ready yet to mention "banning" anything in the text of the policy. Dovi 17:57, 1 July 2008 (UTC)

Some questions and comments

All ISO-639 codes are valid for wikis. What does this mean?

See discussion above. This is to counter Gerard's idea that an ISO code for a classical language may not be used for a Wikimedia project in that language. Dovi 02:29, 30 June 2008 (UTC)

"there must be an extensive body of works in that language"

Wikipedia and the other Wikimedia projects are about (among other goals) bringing the knowledge of the world to peoples lacking resources. We shouldn't exclude languages with few existing resources.

Agree. Dovi 02:29, 30 June 2008 (UTC)

"The MediaWiki interface is available in that language"

'500 most used' should be named directly in the policy.

Agree. As in the discussion above, I think the number should be much smaller. Dovi 02:29, 30 June 2008 (UTC)

"There is an active test project on the Multilingual Wikisource (Wikisource only), Beta Wikiversity (Wikiversity only), or the Incubator (all other projects at present). A project must start on one of these wikis."

The Low Saxon Wikinews test project, for example, runs on the Low Saxon Wikipedia because that's where the potential readers are. Nobody would find the test project in the Incubator, and the news would be old before anybody had read it. We should recommend using these incubating projects, but we shouldn't make it a mandatory criterion.

Agree. Dovi 02:29, 30 June 2008 (UTC)

"Artistic languages are not allowed."

I don't think this is wise. It's a criterion based on the origin of a language, not on the state of the language. It poses no problem for any existing language at the moment. But if some Middle-earth fans decide to establish a commune where they raise their children with Quenya and Sindarin (or the Black Speech of the Orcs) as native languages, it could become a problem. That's unlikely, but not impossible. And a good policy should cover all possible cases.

Agree. Dovi 02:29, 30 June 2008 (UTC)
Removing as per previous section too. Dovi 17:57, 1 July 2008 (UTC)

We should avoid criteria that are not decidable. Every criterion should be decidable, and it should be possible to answer the question "Is the criterion fulfilled?" with yes or no. For example: "The proposal has a sufficient number of literate people worldwide to form a viable community and audience." Some will argue that ten is sufficient; others may argue that one million is needed. (When Wikipedia went multilingual in 2001, there were a few who said Wikipedia should stay English-only to concentrate effort on the most useful and widespread language. So it seems there are even people who think that hundreds of millions are insufficient.) Who will decide whether language X has a sufficient number of literate people worldwide to form a viable community and audience? Nobody can decide this. But if you change the criterion to "The proposal has a number of at least 1000 literate people worldwide.", it is perfectly possible to take some census data or research by scholars, or just look it up on Wikipedia. It is decidable. Okay, numbers of literate people are harder to find than total speaker numbers, but it should be possible.

By the way, naming a specific number of speakers makes special rules for classical, extinct, dead, artificial, constructed, artistic or whatever languages unnecessary, because any language, whatever its origin or history, should be eligible if it has a considerable number of speakers. --::Slomox:: >< 22:06, 29 June 2008 (UTC)

Agree, but let's avoid the word "speakers". The current formulation "literate people" does much better justice to the reality of wikis. Dovi 02:29, 30 June 2008 (UTC)

Ancient languages

I guess we could be less restrictive with ancient languages. I suggest replacing the current requirement with another: whether they are well or poorly attested. Some ancient languages are well attested (they are viable), and others are not, not even their basic vocabulary (they are not viable).

I tend to think it has to do with ongoing, unbroken cultural significance. Akkadian and old Egyptian were lost until deciphered in modern times. Proto-Semitic is a hypothetical construct with nothing "attested" at all. Dovi 14:21, 2 July 2008 (UTC)
It all boils down to speakers (or "literate people"). If "number of people able to use the language in communication" is the main criterion, it's really easy. Proto-Semitic has zero users nowadays, and Akkadian and Old Egyptian both have zero literate people or speakers too (there are some scholars working on those languages, I guess, who have basic abilities in reading them or expressing themselves in them, but they won't be fluent speakers, writers or readers). Gothic has a Wikipedia, but the number of able or fluent speakers, writers and readers is very low, clearly fewer than 100 people, I guess. Klingon, which once had a Wikipedia, had 12 fluent speakers in 1996 according to the English Wikipedia article. All the constructed languages which have Wikipedias, except for Esperanto (these are Ido, Interlingua, Interlingue, Volapük, Lojban [did I forget any?]), have very low speaker numbers. No exact numbers are known, but I doubt that any of them has more than 100 fluent speakers. Unbroken cultural significance correlates with the number of able speakers (literate people). Latin has hundreds of thousands of literate people; Greek has at least some ten thousand literate people, and so does Sanskrit. Just ask for the number of speakers. No "unwanted" languages, such as joke languages, really dead languages, artistic languages or unsuccessful constructed languages, will ever have speaker numbers higher than perhaps a few hundred. There are not enough "weirdos" who learn "useless" languages to catapult a "useless" language into the range of thousands of speakers. With a minimum of 1000 speakers (literate people) you can easily sort out every "unwanted" language. Of course 1000 or 10,000 or even 100,000 speakers doesn't mean that these people are actually interested in creating a Wikipedia or any other project at all, but that is another point. Interest in creating a project is determined by test project activity. --::Slomox:: >< 13:33, 3 July 2008 (UTC)
I never understood why the LangSubcomm considers the goal of providing (open-source) knowledge to the human race to be decoupled from the "implementation" of ancient/classical languages for writing compendia and 'paediae (I do mean every one of them that has been taught as part of school curricula; consider Archaic Japanese, Ancient Japanese, Ancient Greek (the only dead language in the Incubator I have personally worked on so far), Old Norse, Middle English, Middle Dutch). Allow me to cite here verbatim a comment by Massimo Macconi that appeared in the "Latina Wikipedia Closing" discussion: "I observe that also the knowledge of an ancient language (e.g its grammar and vocabulary) is a form (I would say a precious one) of knowledge and therefore revive or maintain ancient languages does not go from my point of view against the Foundation's goals. Besides gives the possibility to connect in an easy way to a lot of Latin resources and this is, of course, another form of knowledge spread" (I italicized the "replaceable" words). To that I would like to add a somewhat related (though admittedly weaker) argument. The Foundation's goal is to provide knowledge to its users instead of demanding it from them as a prerequisite; but the way most ancient languages are documented in the Wikiprojects actually contradicts, methinks, this goal. The scattered info about them in Wikipedia, some "glosses" in Wiktionary, a series of dry rules in Wikiversity, and a few texts (in most cases so far) in Wikisource (generally about obsolete or trivial matters) "lead" to the sterile archiving of this corpus of knowledge, which cannot be used by anyone not already familiar with it, usually through sources that cannot be copylefted, such as real-life language composition classes or even copyrighted entertainment books (yes, I mean Harry Potter and Asterix translated into classical languages).
Speaking from experience, I would say that only by actively using a language oneself in a variety of contexts can one gain an appreciation of the classic works written in it and of its grammatical technicalities (let alone that in our case, we are talking about doing this the wiki way). Thus, for me, making it the main criterion for approving a test Wikipedia project (instead of the language's deadness) that (from now on, hopefully) a language has "a sufficient number of literate (or better, as SPQRobin proposed, fluent in any of its forms) people worldwide to form a viable community and audience" (granted, of course, that the language has a somewhat standard form, an ISO code, and a good level of localization) can certainly "remedy" the current "condition" (it is nevertheless still true that writers and readers should not be deemed entirely interchangeable concepts). Omnipaedista 21:14, 4 August 2008 (UTC)

Important normative considerations

Before we approve the draft, I suggest adding two rules to avoid unpleasant surprises:

1.- First, a rule on when the policy applies: I suggest that only proposals started after this policy takes effect be governed by its rules.

2.- Second, in case of doubt: if the proposal fulfills all the requirements, should we approve it?

I honestly don't understand what you are trying to say here. Please try to express yourself more clearly. Dovi 14:21, 2 July 2008 (UTC)
Please see the end of the draft.
The Ancient Greek Wikipedia experience is illustrative. It was requested when the policy allowed old languages; it was declared eligible; it had a dynamic community of editors (now, unfortunately and inevitably, dissolved) and a promising Incubator project (approved). Then the language committee required only that the last requirement (localization) be met. And finally came the sudden change of policy (no more classical languages), and the senseless rejection of the request. Isn't that unfair? That is the point of my message. What do you think?
It's my impression that the new policy will be less restrictive (well, of course the draft is not final; I'm judging by the current state of the draft and the direction of the discussion). For most proposals the new policy will make no difference or be more favorable. Or do you know of any proposals which could be negatively affected by the new policy? --::Slomox:: >< 14:33, 8 July 2008 (UTC)

Requirements for different Wikimedia projects

When talking about how much population and how big a potential community is needed for a wiki in a new language, we sometimes forget that different projects have different levels of viability. User:Slomox addresses this nicely at User:Slomox/Languages.

Not everything he says can be put into this draft (e.g. criticism of Wikiquote and Wikiversity as projects is not relevant here), but the basic idea is sound: a useful Wikipedia is far easier to sustain than a Wikiversity, and a useful Wikisource is far easier to sustain than a Wikinews.

This should be addressed in the policy, with gradations between projects for the number of participants, extant articles, etc. Dovi 17:13, 2 July 2008 (UTC)

It is really nice when you have this academic qualification. For someone who has to administer it, however, it is woolly; you want clean and clear criteria. If the criteria are not clear, then the basic rules should be simple, and it is then left to the administrator to use his common sense. If you want to define the rules to the millimeter, you will lose what makes the current policy function. It is not optimal, but it works. GerardM 20:08, 2 July 2008 (UTC)
Regarding "clear and clean" I agree. I am suggesting (like Slomox on his page) that we require different numbers for different projects. Numbers are clear. Dovi 02:54, 3 July 2008 (UTC)
By the way, I am indeed critical of Wikiquote and Wikiversity (in two quite different ways: for Wikiquote, I'm sure it is not useful, and for Wikiversity, I just don't know exactly what the aim of the project is). But of course I only mentioned this on User:Slomox/Languages because criteria for these two projects are lacking there, and without precise visions of the projects' goals I am unable to define criteria.
Well, the most obvious case that led me to define different criteria for the different projects is Wikinews. Wikipedia content is ageless, and source texts are ageless, just like quotes, dictionary entries, or Wikiversity seminars and learning materials; at least they only become outdated over a rather long span of time. But as the proverb says, nothing is older than yesterday's newspaper. Wikinews articles are of little interest to readers after a few days (this depends to some degree on the competition: if there are many good news sources in a specific language, news will be old after a few hours or by the next day; if Wikinews is the only news source for a lesser-resourced language, news can stay new for a week or even a month). A Wikinews loitering in the Incubator for months or even years will most likely die off, because nobody will find the news there. And writing news that nobody reads is really pointless and discouraging for editors. This is in addition to what Dovi already said.
The projects have very different aims and our rules should reflect that.
Gerard, you are speaking of the common sense of the deciding administrator (administrator in the general sense, not in the sense of sysop, I assume). I am no fan of common sense. Rules are better. (Well-defined) rules are clear and decidable, but common sense brings a certain degree of uncertainty, because common sense is not as common as the name may suggest. I explicitly say "well-defined rules", because badly defined rules which are executed regardless of being badly defined are a nightmare (I know what I am speaking of). But a well-defined rule always beats common sense. If common sense produces a decision adverse to a user, the user will scream "arbitrary decision! faulty!" If a rule produces a decision adverse to a user, you are in a much better position to defend it. You can just point to the rule. Common sense is vague; rules are clear. --::Slomox:: >< 13:06, 3 July 2008 (UTC)
I spoke of the special situation of Wikinews. My ideal vision of Wikinews would be editions of Wikinews in, e.g., African languages that have too few speakers to support commercial newspapers and other media of their own. Some thousand or ten thousand speakers, perhaps, with many monolingual speakers. Wikinews would produce news articles online, but the news would reach only a few speakers if it stayed online. So Wikinews would go into print. Well, basically just one Wikinewsie printing out a news digest on sheets of paper with his home PC and printer, and distributing these flyers in the neighborhood and the nearby villages where the language is spoken. Or perhaps doing a radio broadcast on a regional station (depending on the literacy rate among the speakers). Wikinews editions like that would be much more useful than even English Wikinews, which mostly recycles news freely available from other sources (of course I don't want to put down the work of Wikinews; it is still useful. But the English [German, French etc.] news market is full of good content; Wikinews cannot add much to it and cannot compete with professional media). English Wikinews is bringing news source number 101 to the people, but indigenous-language Wikinews editions can break new ground in bringing news to the people.
A print Wikinews would have no problem with being stuck in the Incubator. We should keep in mind alternative models like that, when thinking about requirements and conditions for new projects. --::Slomox:: >< 20:24, 3 July 2008 (UTC)

The language must have a valid ISO-639 1–3 code?

I am not sure this is a good idea, as this will prevent new Simple English wikis (like the Simple English Wikinews, which was rejected for this reason despite a strong majority of people commenting at the request page supporting the idea). Anonymous101 16:38, 3 July 2008 (UTC)

Well, Simple English has a valid ISO code: en. The requirement says "The language must have a valid ISO-639 1–3 code." This is fulfilled for Simple English. But this code, of course, is taken.
Well, I guess the case is outside the scope of this policy, because Simple is not a language edition of an existing project. It is better described as a fork of the English Wikipedia that has a different opinion about the target audience, but is still hosted by the foundation. Or something like that. I don't know. I think it would be best to ask the foundation to make a statement about the nature of this exception in the organisational structure of the Wikimedia projects. I think a clear statement by the foundation is the only way to sort out whether or not we want to have simple editions of other projects or simple editions of other languages. If it is intended to have simple editions of all English projects, we should add simple as an exception to the ISO code rule in this policy. If it is intended to have simple editions of all Wikipedia editions, we should handle simple projects as another type of sister project. If it is intended to have simple editions of all projects in all languages (I guess this is not intended), we should adapt all our policies to this. If it is not intended to add any simple projects other than the English Wikipedia one, we should keep it all as it is, and Simple would stay a historical exception. At the moment nothing has been said about the future of simple projects. We are only stabbing around in the dark. It needs a decision of the foundation on this. --::Slomox:: >< 19:45, 3 July 2008 (UTC)
We do have simple: and a few others but now no more can be created. Anonymous101 09:15, 4 July 2008 (UTC)
When a "simple" is to be had, we can technically register them as "en-x-simple". There is however a strong sentiment against "simple" projects. There have been people requesting the end of the simple English project and consequently it is not clear what the argument would be to allow for such simple projects. Thanks, GerardM 12:30, 4 July 2008 (UTC)

good draft

I guess the draft is now good. What do we need for the langcom to endorse it?

I still think it needs more cleaning up, but the bulk is good. As far as endorsing it, the only thing that might work is community support... It needs to be advertised. Dovi 03:27, 8 July 2008 (UTC)
You are not answering the question asked. And whether the draft is good is debatable. GerardM 05:06, 8 July 2008 (UTC)
"Debatable" -- so feel free to debate. By "good" I'm referring to the bulk, not to the latter sections that in my opinion need to be rewritten or deleted entirely. And even the bulk still needs more work.
I did answer the question to the best of my knowledge: Besides community support, can you think of anything that might get the language committee to endorse it, Gerard? Dovi 05:17, 8 July 2008 (UTC)
I don't think the draft is good as it is. In German there is the expression ein großer Wurf (literally "a great throw"). I don't know its English equivalent, but it is semantically related to Armstrong's "one giant leap". For example, the German Grundgesetz (German constitution) adopted in 1949 was a großer Wurf, giving Germany a working democratic system after the time of National Socialism. This draft is not a großer Wurf, because it is just a bit of rephrasing of the old policy. There are no real new ideas in it. --::Slomox:: >< 13:16, 8 July 2008 (UTC)
But there are definite policy changes in it. As far as new ideas, especially your own idea regarding different numbers for different projects, please see the section I just added below. Dovi 13:20, 8 July 2008 (UTC)

What has changed

I compared the draft with the policy; these are the differences at the moment:

No need for special codes for "Modern versions of classical languages".
"living native speakers" changed to "literate people".
New clause demanding ongoing literary use for classical languages.

That's it. The rest are changes of more formal nature with no effect on the outcome of proposals.

The only change with real positive value is that we dropped the demand for native speakers. The other changes are only a clarification of what was already policy before and a rebuff of one single user's opinion. --::Slomox:: >< 13:32, 8 July 2008 (UTC)

What remains

Besides some editing of the final sections, three outstanding discussions on this talk page still remain:

  1. "The proposal has a sufficient number of literate people worldwide to form a viable community and audience" -- So should we give actual numbers for this, as discussed above, or not? And should the requirement be more or less strict for different projects? (If we open a Wikipedia for a language that has 10,000 literate people in the entire world, does that mean we should also open a Wikinews for them?)
  2. "If the proposal is for an artificial language such as Esperanto, it must have a reasonable degree of recognition as determined by discussion." As Gerard correctly pointed out, this is highly contentious because significant numbers of people are dead-set against any artificial languages. How can this be tightened in a way that is measurable and clear?
  3. We also haven't resolved the "simple" issue as in discussion above.

Maybe it would be best to suggest various possible formulations here on the talk page, and then what seems best can be put into the draft. Dovi 13:18, 8 July 2008 (UTC)

Well, if we adopt a specific number of literates as a prerequisite, this will affect the whole policy. Many clauses could then be dropped completely, because they would no longer be needed (for example, ongoing literary use for classical languages: if there is a considerable number of literates, there of course is literature too. Every classical language is written.). The same is true for artificial languages. There is no more need for "recognition as determined by discussion"; just count the literates. So that clause should be eliminated too.
So it's not a question of what remains: if we decide to do that, we have to start over again.
I support it (of course I do). --::Slomox:: >< 13:43, 8 July 2008 (UTC)
As to the "simple" issue: I don't think we can resolve this issue here. It has to be discussed on a larger scale. --::Slomox:: >< 13:45, 8 July 2008 (UTC)
I think we should add an explicit statement that the "simple" issue is one that is undecided and left open by this policy, and will need to be resolved through community discussion in the future. I don't think we can decide it here and now. Dovi 14:19, 11 July 2008 (UTC)

"SIMPLE" Wikimedia projects, only if it is INTERNATIONAL LINGUA FRANCA

A simple version of a language only makes sense if it is currently the international LINGUA FRANCA, in an indisputable and absolute manner.

You want to hear my argument? Okay: just close Simple English Wikipedia. It's the easiest way. People who are willing to learn a foreign language as a lingua franca instead of using their native language in professional or educational spheres will learn English to a level that makes them capable of reading real English articles. For people who cannot read English well, effort should be concentrated on the Wikipedias in their native languages (or those languages they can read well). And Simple English is not distinct enough from normal English to be really useful. If it really served the "lingua franca" audience, Simple English Wikipedia would have more than the 31,000 articles it has at the moment. If it really were accepted by the "lingua franca" English speakers, it should have several hundred thousand articles. The "non-native English language community" is in the top ten of the biggest language communities of the world (depending on the estimate, anywhere between 8th place and the top position), but their Wikipedia is in 44th place among the biggest Wikipedias. My opinion: just close that project. Problem solved.
But aside from my opinion: this is not the right place to discuss this. Ask at foundation-l or create a page Future of Simple projects to discuss this. Get a general vote of the whole Wikimedia community on whether we want to have a family of simple projects or not. This question cannot be answered by this policy. --::Slomox:: >< 19:42, 10 July 2008 (UTC)

Requisites for all kinds of Wikipedias

See the proposal page; there is a small modification. Remember that the requisites are general for any kind of project. A Wikisource in Syriac is good. The policy has to foresee this possibility.

From Node_ue

If a language still has native speakers, rather than just having literate people, we should require at least two native speakers. We have had projects in the past built and run by people with only a very tentative command of the language in question, and this has led to trouble down the road (for example, mi.wp).

Also, despite constant requests on the talk page for the addition of some sort of numerical criterion regarding "potential readers" or "potential editors", I strongly reject this argument for modern natively-spoken languages. As long as a community can produce articles and has enough editors to meet the other requirements, I think it is sufficient.


Contradiction?

While reading this proposal, I found a contradiction in it:

  • Even if the language indicated by the code has not been used in a similar environment before, such as in the case of a language whose writing is not standardized[1] or of a classical language, the expectation is that the community will decide how to apply it in the most appropriate way for a wiki project.
    • ...with a reference to a case where people need to develop a written form of the language. This means that they are not literate in the language in question.


  • The proposal has a sufficient number of literate people worldwide to form a viable community and audience.
    • Requiring literate people, thus requiring a written form of the language.

The only case I see is where people are literate in a second language. Even so, this 1) can still exclude languages and 2) seems pointless to me.

Please clarify if it is not a contradiction.

Greetings, SPQRobin (inc!) 00:19, 20 July 2008 (UTC)

I understand "literate" as people able to express themselves properly in that language and to understand that language, not necessarily in written form. A Wikipedia consisting entirely of spoken articles, or perhaps of articles signed in a sign language on video, would be useful too, in my opinion. In my opinion "literate" expresses that we don't want abs-1 speakers being the only contributors to the Ambonese Wikipedia. Their language skills should be fluent and not just basic. Not necessarily native, but a contributor should be able to go to an Ambonese (or whatever language) village market and talk to the women at the stalls without stammering or having to talk with his hands and feet. If we need a different word to express that, I'm fine with changing it. Perhaps "fluent user of that language"? --::Slomox:: >< 12:03, 29 July 2008 (UTC)
I agree, but according to Wikipedia, literacy is "the ability to read and write, or the ability to use language to read, write, listen, and speak". So I changed the description. SPQRobin (inc!) 12:41, 29 July 2008 (UTC)

ISO 639

It seems to me that ISO 639 already tries not to include language variations, thus criteria 2 and 3 can (and I think should) be combined. My proposal:

  • The language should have an ample existing body of written texts
  • The language should be sufficiently distinct from other languages, not a language variant
As a default, we consider a language to have these two properties if and only if it has an ISO 639 code. Exceptions in either direction can be made if there are particularly good arguments.

- Andre Engels 07:45, 20 July 2008 (UTC)

The current policy is that a language must have an ISO 639 code in any case, exceptions can't be made. I think that requirement should stay.
But indeed, criterion 3 could be removed; it's mostly based on the ISO 639 code anyway.
SPQRobin (inc!) 11:38, 20 July 2008 (UTC)
I agree that an ISO code should be required. But I don't think that an ISO code is sufficient to accept a project. If we went by code alone, we would have to split up the nds-nl project into seven projects, because the nds-nl language area is represented by seven different codes. But the nds-nl Wikipedia project does very well, with no problems in understanding the seven respective varieties. The ISO distinctions obviously are overly fragmented. ISO has eleven codes for Low Saxon varieties, but for Wikipedia a division into three different Wikipedias would be sufficient to serve all readers perfectly. --::Slomox:: >< 13:47, 29 July 2008 (UTC)
While some insist that the existence of an ISO639 code should be an absolute pre-condition for accepting a project in a proposed language, this seems to be in the form of a categorical statement that does not provide any reasoning about why that should be the case. Acceptable projects without an established code are likely to be rare, but we should be able to proceed in some circumstances without having to resort to external bureaucracies. Eclecticology 06:31, 3 August 2008 (UTC)
I'm sure reasoning has been stated before about why we require an ISO 639 code. Its most attractive feature is that it's an absolute shutdown for a thousand and one bogus requests. It's like being able to use a speedy deletion criterion instead of AfD in Wikipedia. I can't imagine a viable project coming from a language that can't produce the minimal level of documentation needed for an ISO 639-3 code; we're not talking about Elvish or Klingon here, as they're sjn, qya and tlh, and we're not talking about any modern native language here, as ISO 639-3 is pretty exhaustive on that level. It's not terribly comprehensive on historical languages. If you don't have an ISO 639-3 code, you're talking about an obscure (or new) artificial language, a historical language, or a weird dialect of an existing language.
I think getting an ISO 639-3 code for those is probably far easier than getting a Wikimedia project set up. In the case where we actually want to set up a Wikimedia project in something that can't get an ISO 639-3 code, I see nothing wrong with going through the bureaucracy to set aside this guideline, and I think that level of effort is going to be (or should be) necessary to set up such a project in any case.--Prosfilaes 02:21, 15 August 2008 (UTC)
The most negative aspect of the proposal is that it is an absolute shutdown to the thousand and second request that is not bogus. If that legitimate request is so difficult for you to imagine then you have nothing to worry about. If it proves itself legitimate, the outside bureaucracy can come later. Eclecticology 20:02, 16 August 2008 (UTC)
Removing this requirement does have costs, though. It's an incredibly clear and unambiguous requirement, which means that proposals that fail it can be dropped quickly and with little arguing. You give no idea of what type of proposal might fail this requirement and yet be a valid Wikimedia language. It's not a natural language, and it's not one of the major constructed languages. Yoda-speak or Simple French? Maybe one of the more obscure historical languages? I think it a good thing that proposals like Yoda-speak get absolutely shut down at the start. They can either explain up front why, despite being unable to meet this simple requirement, they deserve a Wikimedia project, or the shit can wait to hit the fan until interwiki links are showing up on English Wikipedia, and then they can try to prevent their project from going the way of the Klingon and Toki Pona Wikipedias. (One of which would have failed this requirement.)
Furthermore, I think your argument is illogical at its heart; any requirement can be attacked by asking "but what of the worthy projects that fail it?" Without specifying what form this worthy project might take, it's pointless. I see no more reason on those grounds to reject this requirement as opposed to the other two.--Prosfilaes 06:02, 17 August 2008 (UTC)

Btw, I added the sentence "The ISO 639 codes already try to exclude dialects, so in most cases this requirement has already been fulfilled." SPQRobin (inc!) 09:06, 5 August 2008 (UTC)

Classical languages

For classical languages there is not to be a Wikipedia, but why do we have a Wikibooks? To me they are quite similar in their intended content. Also, I would think that in this case a Wikisource, Wikiquote and Wiktionary would have a restricted scope: only works, quotes and words that are from the language itself. - Andre Engels 07:57, 20 July 2008 (UTC)

A small modification has been made; see it. It could be improved.
Judging by Vicilibri, the purpose of Classical Wikibooks might well be specialized dictionaries (glossaries, in other words) containing translations of terms (modern or not) from various languages (modern or not) into the classical language at hand. Ideally, the task of compiling glossaries should already have been undertaken by the corresponding Wikipedia project or Wiktionary, but in practice this is not the case. So either the LangSubcomm should allow Wikibooks in ancient languages, or the Wikibookians should (in general) be "admonished" to compile this kind of dictionary (from ancient languages to modern ones and vice versa) with the help of Wiktionary; e.g. Old Prussian to Pāli or Middle Japanese to Ottoman Turkish. In either case, this would be a very useful reorganization of the linguistic corpus already contained in all the various Wikiprojects. Omnipaedista 15:15, 5 August 2008 (UTC)


Requiring the most used MediaWiki messages for the first wiki is good. But for the second wiki, they need to translate all other MediaWiki messages and the extensions used by Wikimedia.

I propose:

  • first wiki = most used MW messages
  • second wiki = other MW messages
  • third wiki = extensions used by WMF

So they have more time for translations. SPQRobin (inc!) 15:04, 24 July 2008 (UTC)

I agree. More flexibility is good.
I agree that this is an improvement. But why should less-used messages be a requirement for setting up a wiki at all? Better to let people develop a community, which can translate the messages as time goes on. By making it a condition, you will never have the community to translate those messages... Dovi 07:53, 25 July 2008 (UTC)
This assumes that the core messages are more important to a Wikipedia than the extensions... What nonsense. The SUL messages, for instance, are essential from the start. Consequently, totally rubbish. GerardM 10:09, 25 July 2008 (UTC)
I know that some extensions are more important than some parts of the core. But as far as I know there aren't any clear message groups to define this. I don't think it's a good idea to say "For the second wiki, translate messages X, Y, Z, etc and extensions X, Y, Z, etc and for the third wiki translate messages X, Y, Z, etc and extensions X, Y, Z, etc." Or we can switch the second and third wikis, but then we are probably also excluding important parts of the core... SPQRobin (inc!) 23:33, 25 July 2008 (UTC)
As it is clear how arbitrary this all is, I think this idea stinks. GerardM 07:44, 26 July 2008 (UTC)
Could be, but I can imagine that many people starting a wiki really don't like to translate hundreds or thousands of messages. More than 2 thousand messages have to be translated for a second wiki (see below). Would you like to translate them, just to open one wiki?
Maybe we can create two groups:
  • one of important extensions that users often see and the largest part of the core messages
  • a second one for extensions used by only some of the WMF wikis (and those like Oversight, CheckUser, ...) and core messages like those of RevisionDelete, and the (hundreds & almost unused!) EXIF messages and perhaps the "right-" messages and perhaps the "import-" messages.
Anyway, the current situation stinks more than any of these proposals. Certainly when I take a look at the EXIF messages... SPQRobin (inc!) 23:11, 26 July 2008 (UTC)
You have one thing wrong. When a first wiki exists, there is a need to continue with the localisation. This need exists because it is this that makes a Wiki usable / readable. In essence a second project should not have to do much localisation work at all. It should have been done already. It is only when the first wiki does not keep up the necessary localisation work that it may be considered a huge burden. GerardM 10:18, 27 July 2008 (UTC)
Yes, I totally agree, but
1) It seems to me that people are not sufficiently informed (actually, "advised") that they should continue the localisation. I think once their wiki is created, they will automatically stop translating and focus on the wiki.
2) And nevertheless, there are still too many messages that are rarely (or never) seen, as I said above. So the benefit of some parts is too low. SPQRobin (inc!) 16:04, 27 July 2008 (UTC)
I have 4500 edits on Betawiki (although I had already translated almost all messages before I started using Betawiki, working directly in the message file [well, it was MediaWiki version 1.4 or something like that then, so few messages from those times remain in the current system]). And I am still not finished with my nds translation. One third of the extension messages are still missing (after the removal of Flagged Revs; before that it was much more). That is after more than three years of continuously working on the translation. Of course I could have finished it much faster, but it's really not a fun job. I needed hours of research to investigate the actual meaning of some obscure EXIF messages (and I'm quite sure that many of the existing translations of those EXIF messages in other languages translate the words of the message, but not the actual technical meaning [who has the technical knowledge to understand what MediaWiki:Exif-spatialfrequencyresponse is about?]). I have been a Wikipedian and Wikimedian since 2004, but in all that time I have looked at EXIF data perhaps 10 times, maybe 20 times; I don't know exactly. At least it was not very often. And when I looked at EXIF data, it was always for the date of creation or perhaps one or two other important fields. Never in my whole wiki career have I seen anything like "Spatial frequency response" or "Color sequential area sensor" or most of the more than 200 other EXIF messages. It's useless to require them to be translated. Whoever created the list of the most used messages should post the usage-frequency data for all messages. That would make it possible to define a group of "less relevant messages" which could be excluded from the required messages. Requiring more than 2000 messages (2500 if most-used is not finished [which is the case for many existing projects]) to be translated is too much. --::Slomox:: >< 11:46, 29 July 2008 (UTC)
That reminds me of my experience with the Ancient Greek project. There are an awful lot of messages to translate; for smaller languages translating the interface is probably harder than creating a good Wikipedia. Before a first wiki in a language is created there will only be a few editors (editors will come in greater numbers after the first wiki in a language is created); having to translate all of those messages is really too much to expect and it is likely that before they are all done many editors will become bored or worn out, subsequently (and understandably) losing interest.
I think that it is more realistic to require only that the "Most used" messages are translated before a Wikipedia is created. When a Wikipedia exists there will be a large enough community to get all of the messages and extensions translated. If we require thousands of messages to be translated first, many projects with great potential will be killed off before they begin (aborted, perhaps), as editors will lose the will to continue with the work after spending months or even years translating rarely-used messages that many of the larger languages (e.g. Korean) have not translated either, without seeing the fruits of their labours. LeighvsOptimvsMaximvs (talk) 22:10, 31 July 2008 (UTC)


As of today (26 July), these are the numbers of messages:

  • Extensions used by Wikimedia: 666 msgs
    • (Flagged Revs - recently removed & not counted here: 252 msgs)
  • Core: 1946 msgs
    • Only most used: 486 msgs
    • Core without most used: 1460 msgs
    • EXIF: ~ 220 msgs
  • Core without most used, plus extensions used by Wikimedia (currently required for a second wiki): 2126
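The figures above can be cross-checked with a few lines of Python. This is a sketch only: the constants are the 26 July counts quoted in the list, and the three-tier split is SPQRobin's proposal from earlier in this thread, not an existing Betawiki or MediaWiki message group.

```python
# Message counts as of 26 July 2008, copied from the list above.
CORE_TOTAL = 1946
CORE_MOST_USED = 486
EXTENSIONS_WMF = 666  # excludes Flagged Revs (252 msgs), removed at the time

# Core messages that are not in the "most used" group.
core_rest = CORE_TOTAL - CORE_MOST_USED

# Current rule: a second wiki needs the rest of core plus the WMF extensions.
current_second_wiki = core_rest + EXTENSIONS_WMF

# Proposed tiers (illustrative grouping, not an official message group):
# each wiki adds one group on top of what the previous wiki required.
proposed_tiers = {
    "first wiki": CORE_MOST_USED,
    "second wiki": core_rest,
    "third wiki": EXTENSIONS_WMF,
}

print(f"core without most used: {core_rest}")  # 1460
print(f"second wiki (current rule): {current_second_wiki}")  # 2126
for wiki, count in proposed_tiers.items():
    print(f"{wiki}: {count} additional messages")
```

Under the tiered scheme, the 2126 messages currently required for a second wiki would be spread over the second (1460) and third (666) wikis; the total workload stays the same, only the order changes.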

Put a number of literate people

Let's set a number of literate people: a number that allows at least one wiki project to succeed.

I tentatively suggest one thousand literate people worldwide.

By fixing a number of literate people, we can solve all the problems of interpreting the policy.

If the number is one thousand, we would be excluding too many languages. There are very many languages with fewer than a thousand speakers worldwide. SPQRobin (inc!) 00:10, 20 July 2008 (UTC)
Agree. Plus all of these numbers are arbitrary. Do we really need to suggest concrete numbers at all? Dovi 15:22, 22 July 2008 (UTC)
Well, not a specific number, but why not a percentage for native languages? At least 10% of the speakers of a native language must be literate; that would form the critical mass necessary to make a project a success.
All numbers are clearly arbitrary. But we have the choice between either arbitrary specific numbers or arbitrary personal judgements of the people processing the proposals for new project. I like arbitrary numbers better than personal judgement.
A number of 1000 would exclude many languages. That's true. But obviously very few of these languages are able to create a meaningful encyclopedia. My proposal at User:Slomox/Languages explicitly states that languages with fewer speakers or literates are not excluded per se, but that we resort to common sense if the language has good arguments to be added but does not meet the criteria. At the moment there is just one active project ("activated" is a better word than "active"; the project has no activity but bot edits at the moment) with fewer than 1000 speakers: the Norfuk Wikipedia (pih:), Norfuk having about 600 speakers. The main advantage of a specific number is that we can easily exclude all types of unwanted languages, like artificial and extinct languages, which usually have fewer than 1000 speakers.
If a language has fewer than 1000 speakers, it's going down the road to extinction. A language facing extinction will hardly produce a useful Wikipedia. If there is a real effort to keep the language up, use it actively and maintain it, then they will surely create a useful test project, which would be a good argument for making an exception to the 1000-speaker rule. --::Slomox:: >< 10:48, 29 July 2008 (UTC)
The requirement of literate people contradicts another statement in the draft: "Even if the language indicated by the code has not been used in a similar environment before, such as in the case of a language whose writing is not standardized[1] or of a classical language, the expectation is that the community will decide how to apply it in the most appropriate way for a wiki project."
See #Contradiction? about this. SPQRobin (inc!) 11:32, 29 July 2008 (UTC)
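Slomox's threshold proposal in this section can be written down as a decidable rule, which is the property he argues for: a fixed number of literates as the default test, with a useful test project as the argument for an exception below it. This is a minimal sketch, assuming the tentative figure of 1,000 from the discussion; the `Proposal` fields, the example languages and the outcome strings are hypothetical, and no such rule was adopted.

```python
from dataclasses import dataclass

# Tentative threshold suggested in this section; an assumption, not policy.
LITERATE_THRESHOLD = 1000


@dataclass
class Proposal:
    # Hypothetical fields, for illustration only.
    language: str
    literates: int
    has_useful_test_project: bool


def assess(p: Proposal) -> str:
    """Apply the threshold rule, falling back to an exception case below it."""
    if p.literates >= LITERATE_THRESHOLD:
        return "meets criterion"
    if p.has_useful_test_project:
        # Below the threshold, a useful test project argues for an exception,
        # which is then decided case by case.
        return "exception may be considered"
    return "does not meet criterion"


for proposal in (
    Proposal("Language A", 25000, False),
    Proposal("Language B", 600, True),
    Proposal("Language C", 600, False),
):
    print(proposal.language, "->", assess(proposal))
```

The boundary here is inclusive (exactly 1,000 literates passes), a detail the thread leaves open.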

a few observations

  1. While having an ISO639 code is great as a prima facie criterion, it should not be an absolute one either for or against having a language. Expecting the person proposing a project in such a language to "obtain one" is a perversely unrealistic expectation if it means that the person must navigate his way through international bureaucracies. In practical terms most requested projects will already have such codes, so in most cases this will not be a problem.
  2. Having a body of literary works, except in the case of Wikisource, is not a meaningful criterion. Languages are mostly determined by their spoken form, not by their written one. For most languages the written form was only an afterthought grafted onto the language by European colonizers and proselytizers.
  3. Simple (and perhaps other) forms based on a definable subset of a language should be viewed as valuable additions to the general educational mission of Wikimedia. Again, I expect that these will be very few, but they still provide an important service if they have an editing community that is sufficiently strong. Eclecticology 06:44, 30 July 2008 (UTC)
  1. I disagree... Please discuss this at #ISO 639. And btw, obtaining a code is not "international bureaucracy", you must simply submit a form (though, there are disadvantages: it's only in English, many people complain about that, and it takes a very long time (about one year!) before your request is processed).
  2. The requirement of literary work is only for classical language (see # 4).
  3. Well, if the requirement of an ISO 639 code stays, this won't be possible... And I also disagree here, because I do not see much difference between "simple" English and the normal English Wikipedia. And then there were requests for e.g. a British English Wikipedia; at the end we will have dozens of English Wikipedias! SPQRobin (inc!) 07:38, 30 July 2008 (UTC)
1. I have added comments in the referenced section, as requested. You say that it is not a matter of a bureaucracy, but then proceed to describe a bureaucratic process. Perhaps then, if we go so far as to find adequate other reasons for having a project in an otherwise uncoded language, the language committee should undertake the necessary paperwork to establish a code for the project in question.
3. Not necessarily. "Simple" would be based on a language subset with relevant subcodes. The subset could be easily defined (e.g. based on the 1000 most common words of the language, or a Japanese without Kanji). There are valid pedagogical reasons for such subsets. The important thing for subset projects is definability. Even with a separate ISO code for British English I don't see much support for such a project. Can it even be easily defined without endless arguments about what is or isn't British English? Your mention of "dozens" of English Wikipedias seems to be based more on alarmist speculation than on any factual basis. For now, I find it difficult to imagine any viable English subset other than Simple, but I prefer to keep an open mind to the possibility that someone might present an interesting proposal. Eclecticology 07:02, 3 August 2008 (UTC)
In fact, if we want to speak about a defined English subset, it is Basic English.
If we cannot express in our meta codes what language it is, we have a problem. Solving that credibly requires cooperation with the relevant standards bodies. The current practice of using ISO-639-3 codes for our new projects removed the discussion of whether a language exists or is a mockery, like we had with our Siberian Wikipedia disaster. There are currently 7,000+ languages recognised in ISO-639-3, and ISO-639-6 allows for over 30,000 linguistic entities. ISO-639-3 allows only for languages, and this is what we concentrate on. When the discussion is about whether what we propose is a language, you get into highly political debates, on which you cannot reasonably expect a committee of the WMF to pronounce. There is a body that does have the expertise to determine these things, and as relying on it removes a lot of needless discussion and POV warring from the WMF, it is best to retain the requirement for recognition under ISO-639-3. Thanks, GerardM 06:13, 16 August 2008 (UTC)


What is needed for langcom to endorse the draft? The whole draft?

And if the current one is still incomplete, couldn't langcom adopt now the clauses that the community has already accepted? For example, replacing the "native" requirement with "fluent expression". Is it possible to implement the community consensus in stages, as each point of agreement emerges?

I don't know anymore where, but I remember that Pathoschild (langcom member + started this draft) said the community can decide about implementing this. Though, I haven't seen a time frame or something. SPQRobin (inc!) 02:12, 8 August 2008 (UTC)
This is at best a private initiative by Pathoschild. It is not at all accepted as such by other members of the langcom. I know several members who see it as a way for him to bypass the need for consensus in the committee. Thanks, GerardM 07:13, 10 August 2008 (UTC)
So the langcom has all power over creating new wikis... I thought Wikimedia was a community-based organisation? If that's true, the consensus among the community should be more important than the consensus among the few members of a committee. I hope I don't sound offensive or something (I can't find a correct word..), but the current situation just seems a bit strange... Greetings, SPQRobin (inc!) 15:39, 24 August 2008 (UTC)
GerardM and Pathoschild rarely agree on anything. They also happen to be the two most active members of the committee. I think that explains a lot.--Shanel 19:19, 24 August 2008 (UTC)
The language subcommittee is an advisory body to the Board of Trustees. The subcommittee recommends approval to the Board, and thereafter notifies system administrators about projects ready for creation. The subcommittee is not absolute; system administrators can create or refuse to create wikis without subcommittee approval (and this is occasionally done for special wikis), and the community may contact the Board of Trustees directly for approval.
The subcommittee reports to the Board, but is not directly subservient to the community. However, the subcommittee members are primarily community members, so that a community consensus without subcommittee consensus is very unlikely. Should it occur, the Board of Trustees can implement policy itself. If there was consistent community consensus for an objective policy, of course, there would be little need for a subcommittee in the first place.
(As for GerardM's statement, I know only of GerardM who sees it as "bypass[ing] the need for consensus in the committee". I think other members can speak for themselves.) —{admin} Pathoschild 19:27:11, 24 August 2008 (UTC)


So, we have some requirements for eligibility. Then we have two major requirements for final approval: Active test project and a localisation. The first one is quite obvious. But the second one is disputable. Some want more to be translated, and some want less to be translated. Therefore, I propose we make a little sub-policy which can be changed independently of the other requirements. We can for example then first decide to implement this community draft, but not yet include the localisation requirements. I've at least made a draft for that: User:SPQRobin/Sandbox. I hope this draft (both, actually) will be implemented soon... SPQRobin (inc!) 23:56, 9 August 2008 (UTC)

There are projects that do really well at localising in their language and there are languages that do not. Several of the bigger languages do not do as well as languages like Southern Baluchi. There is a distinct benefit to having a good localisation, and it benefits those we do all this for: our readers. We support our editors by requiring only a subset for the first project, but it is in the interest of the first project to maintain the drive for localisation. When someone complains, at the time a subsequent project is requested, that the localisation has not been completed, the problem is not with the policy but with the community of editors' lack of involvement in the localisation. Thanks, GerardM 06:03, 16 August 2008 (UTC)

Arabic, a non-native language

Reviewing the requirements of the current policy, I thought of the standard Arabic language, and the inevitable conclusion is that this language cannot meet the requirements. Standard Arabic is no longer spoken as a first language. It is based on religious Arabic, it is archaic, and it must be learned in school to be understood; its situation is similar to that of medieval Latin. The policy would therefore reject any project in this useful language.

On the other hand, there are several native languages, all daughters of classical Arabic, like Egyptian Arabic (or Masri; ISO 639-3: arz), whose proposal has been approved precisely because it has native speakers.

This is bogus. There is a standard Arabic code, and it is spoken by people in their daily life. GerardM 06:04, 16 August 2008 (UTC)
That code refers to the language of written documents in the Arab world. If we want to write about physics, that is the language used, and it is not native to anyone at all.
I am afraid this is a valid argument, nothing bogus about it. Modern Standard Arabic (which is equivalent to 'Classical Arabic') has no native speakers. It is used in written communication, in the media, in speeches and the like, but there is not a single Arab who has MSA as his/her first language or indeed uses it in everyday life. People's mother tongue is invariably a dialect, which is not just a different way of pronouncing MSA with a couple of slang expressions thrown in, but roughly as remote from MSA as Dutch is from German.-- 13:55, 8 November 2008 (UTC)

Fluent expression is already consensual

What clauses or criteria are already consensual?

  • "Fluent expression" instead of the "native" requirement. This is already consensual: there is no real opposition to this clause, and no arguments against it.

If you consider another criterion consensual, you can add it. What is the goal? We can ask langcom to implement these criteria in stages, as they emerge.

The main aim of the wikimedia project is to provide information and help learning. Information received through the medium of one's non-native language is as good as that received in one's mother tongue. What matters is not how many native speakers a language has, but how many people can understand it well enough to get something from it. I have said on another page that if every first-language speaker of English disappeared tomorrow, the English wikipedia would still be very useful, since there would still be people who could use it and get as much from it as a first-language speaker of English could. However, according to the current policy, if every first-language speaker of English were to die tomorrow and an English language wikipedia had not yet been created, a proposal for it would fail to meet the criteria. LeighvsOptimvsMaximvs (talk) 21:44, 19 August 2008 (UTC)
"if every first-language speaker of English were to die tomorrow and an English language wikipedia had not yet been created" doesn't strike me as an interesting alternate universe; if an English language Wikipedia had not been created, there probably would have been reasons for it not to be created. People go to non-native Wikipedias primarily because they are larger and more comprehensive than the one in their native tongue. No language without a Wikipedia has a chance to be useful in that way; people will continue going to one of the top 25-30 Wikipedias.--Prosfilaes 01:30, 20 August 2008 (UTC)
A rhetorical alternate universe does not have to be a plausible one ;-) ! The point is that the current language proposal policy is wrong in saying that a language needs a living native speaker community for it to have a viable audience. A viable audience consists of anyone who understands the language and wants to read Wikipedia articles in it. It is not important at all whether those people have the language in question as their first language. If there were no native English speakers there would still be a viable audience for an English Wikipedia. There are no native Latin speakers, yet there is still a viable audience; the Latin Wikipedia has many articles which are better than those in all but the largest wikis. Esperanto has hardly any native speakers, yet has a larger Wikipedia than languages as widely spoken as Arabic or Korean. Clearly a viable audience can be made up entirely of non-native speakers. LeighvsOptimvsMaximvs (talk) 00:41, 26 August 2008 (UTC)
A completely illogical alternate universe is a lousy rhetorical universe. IMO, all the non-native languages that have a viable audience have been created, and it's a safe and useful requirement to require other languages to get their Wikiprojects by special procedures.--Prosfilaes 18:29, 27 August 2008 (UTC)
I agree 100% with Prosfilaes. The majority of English speakers in the world are secondary speakers, not primary native ones. If all native English speakers were wiped off the map, English would still be one of the most widely spoken languages on earth.
The native requirement is, I think, not helpful. It's easy to calculate, and I understand that makes it attractive on the part of decision-makers who want easy and direct metrics to use for comparison. However, the numbers we most need to look at are the amount of potential producers and consumers in a language. Producers are people who write the language and would be willing and capable of contributing to our project. Consumers are people who would be willing to obtain information from the project in that particular language.
Consider Arabic, where even if there are no primary speakers (I don't know this to be a fact, I'm just referencing a conversation above) it's still a communications vector through which people of diverse Arabic dialects can communicate in common. Latin might be in a similar category, depending on how many people are fluent in it from religious usage. Esperanto too can help groups intercommunicate who otherwise would not be able to. --Whiteknight (meta) (Books) 13:54, 24 August 2008 (UTC)
It is said that the requirement for native speakers is no longer required. It is said to be a given that there is a consensus on this. This is however not the case. Consequently all follow on arguments are a house of cards.
When you want to have it demonstrated that sufficient ability exists for a language to be used in a Wikipedia, the content of the Incubator may provide enough of a corpus. Currently the effective requirement is that the texts are of sufficient size for us to deduce that they are indeed in that language. With this requirement added, there will be a need to provide bigger articles that demonstrate the ability to articulate the language in a more quantitative and qualitative way. GerardM 08:43, 26 August 2008 (UTC)
To be pedantic, even currently there is no absolute requirement for native speakers. The current requirement (in the case of proposals for a new Wikipedia) is for a language either to have native speakers or to be artificial. Andrew Dalby 20:17, 29 September 2008 (UTC)

How to measure

The subcommittee is currently divided on the change, because it would be very difficult to objectively measure it. Has there been any discussion regarding how "a sufficient worldwide number of people able to express themselves at a fluent level" would be measured? —{admin} Pathoschild 15:05:02, 24 August 2008 (UTC)

The sufficient part is unmeasurable. But sufficient is part of the current policy too, so that can't be a problem. people able to express themselves at a fluent level can be measured. Native speakers are people able to express themselves at a fluent level per se. If there are fewer native speakers than what would be sufficient, the proposers have to give sources about the number of non-native, but fluent users of the language. For example numbers about people teaching those languages (I guess it is fair to assume, that teachers are fluent). --::Slomox:: >< 22:53, 25 August 2008 (UTC)
You propose we continue using native speakers as a metric, but allow requesters to provide reliable sources about fluent speakers as a last resort? —{admin} Pathoschild 00:10:54, 26 August 2008 (UTC)
Remember the complete clause is "The proposal has a sufficient worldwide number of people able to express themselves at a fluent level, in the written, spoken or signed form", so fluent expression can be in any of those forms. In the case of modern spoken languages, that means not just the native speakers but also people who know them as second, third, etc. languages, and people who understand them only in written form (there are many people who understand written English but are unable to take part in a simple conversation; despite this, English is still useful for them!). For a classical language, it means people who are able to read or write the language (teachers or advanced students, clergy, etc.). And please remove "native" as a reference point. unsigned by 03:40, 26 August 2008 (UTC).
But nobody has explained how we'll count fluent speakers yet. Even if we assume every test project editor is fluent, that would only tell us that there are at least three or four fluent writers in the world, which is surely not a "sufficient worldwide number". Assuming native speakers are fluent (native speakers are routinely measured in databases such as Ethnologue) solves the problem for most living languages, but other languages (which the change is intended to accommodate) are left hanging. —{admin} Pathoschild 14:10:03, 26 August 2008 (UTC)
Written languages, such as ancient ones, can be measured easily: the number of teachers or scholars (active or retired) and students in a country can be measured, for example, with official educational statistics. Religious languages can be measured by their number of clergy. unsigned by 03:40, 26 August 2008 (UTC).
You forget that many languages are not taught in school. GerardM 18:46, 26 August 2008 (UTC)
Well, languages with no native speakers, not taught in educational institutions and not used by religious clergy are unlikely to be used by anybody. Perhaps constructed languages are an exception, but constructed languages in most cases have very tightly woven social networks, and the community usually knows well how many people are fluent in the language. --::Slomox:: >< 22:42, 26 August 2008 (UTC)
There are written languages with a large number of native speakers who learned the oral language as children, but who may never have learned how to write in that language. An example would be immigrants to, say, the US, where the language continues to be spoken in their home for at least a few years, but who never actually received an education in it, formally or informally. It would also apply to children born in the new country where the language was used at home in their early childhood, but very little later on. Both possibilities were in fact the case for various of my relatives (with respect to Yiddish). I also know people for whom this was the case with Italian. DGG 00:18, 5 September 2008 (UTC)

Ancient Greek

Languages that are considered dead, like Ancient Greek, would still be a problem because everything would have to be expressed with the vocabulary of that language. Adding new words to these languages will invalidate them as being understood to be these languages. Thanks, GerardM 08:43, 26 August 2008 (UTC)

This may be true for most ancient languages, but the Ancient Greek language, since so much academic work was done in it, does have sufficient resources for a Wikipedia. Sometimes we have to resort to slightly awkward phrases (as do many smaller modern languages), but not to anything that is not Ancient Greek. LeighvsOptimvsMaximvs (talk) 20:58, 26 August 2008 (UTC)
Can you cite some of that work? Esperanto has w:Monato, among others, and Latin has the continued support of the Vatican, but I'm not aware of any significant modern original writing in Ancient Greek.--Prosfilaes 18:29, 27 August 2008 (UTC)
I was referring to Greek writings from the Classical, Hellenistic and Roman periods, amongst which there are many more works in advanced mathematics, science, technology and the arts and humanities than in many modern languages, and far more (to the best of my knowledge) than in any other ancient language: a few examples are the works of Aristotle, Euclid, Philo of Byzantium and Hero of Alexandria. One can write about atoms, elements, various kinds of machines and all sorts of other subjects in Ancient Greek (I managed to write about computers and a little about televisions without using any words that are not attested in Liddell Scott's Greek lexicon; the word μηχανος-machine/mechanism comes in very handy). It is unique amongst ancient languages for its ability to communicate information about the arts and sciences thanks both to the amount of work written in it (and the nature of that work) and its unusual flexibility. And, apparently, more has been written in the language in modern times. LeighvsOptimvsMaximvs (talk) 21:28, 28 August 2008 (UTC)
Of course; one example is the outstanding book "Astronautilia", a 6575-line science fiction epic poem, an odyssey in classical Homeric Greek, written by Jan Křesadlo. Crazymadlover
Certainly an interesting artistic work, on the order of the version of Blue Suede Shoes in Sumerian, but hardly a work to show that the basis for a serious non-fiction project has been laid.--Prosfilaes 20:18, 27 August 2008 (UTC)
Well, there are news sites in Ancient Greek, such as Acropolis World News, and another one from Greece, as I remember. And if we talk about neologisms, the famous "Lexicon Recentis Latinitatis" doesn't bring anything new: when it refers to a new concept, it uses huge circumlocutions, long phrases (those aren't neologisms!!!), or it simply borrows the foreign word. Disappointing!!! Ancient Greek has always had the capacity to build compound words whose meaning is clear; Latin doesn't. Ancient Greek does it better. Crazymadlover
There is modern academic (or at least scholar-like) work published in Archaic Greek, actually. Greek Eastern Orthodox documents and theological texts are all written in a mixture of Koine and Katharevousa Greek (just as Catholic documents are written in a mixture of Classical, Vulgar, and Neo-Latin); moreover there are numerous philosophical/mathematical/scientific dissertations by Greek scholars that are written (from the 1790s until today) in Katharevousa and contain many established neologisms for modern concepts that are based on, or compatible with, Koine Greek, so practically all modern philosophical and scientific terminology is translated and ready to use. The "archaicity" of the language they use varies a lot, of course, but one could say that it resembles the relationship between Archaic Bokmål and Old Norwegian or that between Neo-Latin and Classical Latin; still, it serves the purpose we are talking about. It is also true that nowadays usually only writers with a theological or right-wing agenda write in it, but this is also irrelevant, since in an encyclopedia only the fact that a sufficient vocabulary exists matters, not the ideology that may have produced it (exactly as Revived Latin is historically related to the Catholic Church but Vicipaedia has nothing to do with its agenda; see also Icelandic, Flemish, or Revived Sanskrit neologisms that originated from a purist ideology but serve the purpose of being usable in a 'pedia). I can also cite some very interesting articles of the Grecan part of the Incubator dealing with very modern and technical concepts, where Late-Attic/Early-Koine Greek proves itself to be sufficient and self-contained. 
Finally, I reserve the right to postpone any links or citations to the modern scholarly works in Archaic Greek I've been talking about until my next post (alas, the most serious of them are scientific papers lying in Greek university libraries waiting to be digitised, so it is a bit hard to cite them right away in a public debate).
Omnipaedista 14:59, 28 August 2008 (UTC)

Please see this request. Crazymadlover

The links I promised. Ideological: [[1]] (Koine Greek messages from the Greek Ecumenical Patriarch to his flock), [[2]] (Katharevousa Greek message of an ex-President of Greece to his followers, published in a Katharevousian blog). Scholarly (sites are in Modern Greek): [[3]] (the Programme of the Greek National Foundation of Research for the digitisation of Medieval and Katharevousian Greek scholarly works and their accessibility on the internet), [[4]] (the official site of the Academy of Athens; AoA regularly publishes official neologisms for modern scientific concepts in both Kath. and Dhimotiki, e.g. alcohol > alkoole, black hole > melane ope, etc. There is also this report on the corresponding literature [[5]]). As for the practical application of the latter, see my [[6]] mathematical articles in the grc-Incubator, where most of the language is Attic/Koine, but when modern terminology was needed I used the conventions of the Academy, which are, in general, compatible with the Archaic semantics. Thus, I believe that the only reason for the "Grecan" 'pedia still being on "stand-by" is the native-speaker requirement of the current policy. I only hope that the recent rejection of the request for opening the project (for the third time) does not constitute an infamous precedent for new language proposals in general. Omnipaedista 20:14, 3 September 2008 (UTC)

Neither medieval nor Katharevousa Greek is Ancient Greek (grc). ISO 639-3 has a pending and probably successful application for Medieval Greek as grm[7], and Katharevousa is clearly modern Greek.--Prosfilaes 16:36, 4 September 2008 (UTC)
Contemporary Latin could be considered not as ancient as "classical Latin"; the point is that both languages have a millennia-old tradition and are used to write documents. Medieval Greek and Katharevousa are the continuation of this archaic tradition: not spoken natively by anybody, they are the archaic language with neologisms. Standard Modern Greek is Demotic (vernacular, familiar or native) Greek with some Katharevousian loanwords. Within the Katharevousa tradition, there are writers whose style could be classified as Koine-like and others as Attic-like, depending on their degree of archaicity. The basic idea of Katharevousa is to write in such a way that Xenophon or Plato could understand the text if they were resurrected.
The first link demonstrates that Ancient Greek is still the language of the Greek Church.
I disagree with the proposal of a separate code for Medieval Greek. It is basically Ancient Greek as used by the Byzantines, just as medieval Latin was for medieval western Europeans. Another matter is the language of the "Chronicle of Morea", which was the real vernacular spoken language, different from what is academically called "Byzantine or Medieval Greek" (archaic). Crazymadlover
Meseems I wasn't very clear as to why I gave the links above. The first one indicates that Koine is still in use by the Greek Church, which uses only medieval Greek (gkm) pronunciation, not medieval vocabulary (which is heavily based on Vulgar Latin) or grammar (which is much simpler than the grammar of Koine). The rest of the links just indicate that the coining of well-established Archaic Greek neologisms for modern concepts has never ceased since the emergence of Katharevousa in the 1790s, so if someone wants to write in Attic or Koine about a very modern concept that didn't exist before the Middle Ages, (s)he doesn't have to coin a term for it him-/her-self but can use the corresponding Katharevousian one, which, as I said above, fits the ancient semantics either as it is or with slight modifications that are thoroughly discussed in the corresponding talk pages of the grc-Project (whose articles, by the way, are all written based on the traditional pronunciation, morphophonology, and syntax of Late Attic/Early Koine, with the addition of some vocabulary from the Aeolic dialect when it is absolutely necessary). As for the question of whether scholarly works are still written in grc (apart from the theological ones), as I said, there are doctoral students who write in Kath., but in some cases their language is so archaic it is unintelligible to a Modern Greek speaker or even to someone who knows the Byzantine vernacular, as if they were writing in a form of Revived Koine (the analogy is again Neo-Latin contrasted with Vulgar Latin or, say, Archaic Italian). Omnipaedista 02:57, 5 September 2008 (UTC)
Just for the sake of completion and before this discussion gets archived, I'd like to add two links to news websites in grc (originally proposed to me by Crazymadlover): (non Greek writers), in grc (written by Greeks). --Omnipaedista 01:52, 18 January 2009 (UTC)

Willing is more important

Native status is really irrelevant to starting a project successfully: a lot of native languages (even with millions of native speakers) are failing now, and will fail in the future. The will of people is more important. Many people prefer to use a language other than their first one; this is a historical constant and will remain so. Prestige is the reason. Why? Because people find economic or cultural advantages in certain tongues that they don't find in others.

Willingness explains why Esperanto (artificial) or Latin (dead) are more successful than Hindi, Afrikaans or Cantonese (with millions of speakers). unsigned by 15:36, 27 August 2008.


I have edited the policy draft in response to Pathoschild's request for comments on foundation-l. My changes are:

  • Removal of the ISO 639 requirement. Technically, all that is needed is an RFC 4646 code (like zh-min-nan). The language code identifying the localisation is traditionally RFC 4646, as is the HTML "lang" attribute. And there is no reason why multiple wikis can't share a language code -- there are plenty of language group codes which provide for miscellaneous minor languages and dialects. They just have to have a unique domain name. The domain name does not have to be the same as the language code.
  • Reduction in the criteria for classical languages. Study of languages of literature should be encouraged, as I said on foundation-l.
  • Removal of the requirement to localise MediaWiki in the language in question. Many languages have a speaker population with 100% bilingualism, that is, all speakers of the minority language are also fluent in some dominant regional language. In such cases, it's not necessary for the interface to be localised in order for the wiki to thrive. Localisation is an onerous task; a bureaucratic hoop to jump through in order to prove that the requester is sufficiently motivated. I don't think it's necessary.

-- Tim Starling 02:25, 5 September 2008 (UTC)

I, personally, agree. Crazymadlover
Hi, I just saw Pathoschild's note about how work on this community draft has "stagnated." Any "stagnation" on my personal part is due to traveling overseas throughout August, but also to the fact that I thought the draft was more or less in a spirit I agreed with even before I left.
Before Tim's major overhaul (which I think is positive and will discuss in a moment), the draft was basically a modification of the existing policy. It took the basic elements of the policy that stands, and changed them enough so that the people working on it (myself included) could accept it. That was good enough for myself and some others, which is probably why it looked like there was some "stagnation."
Tim's overhaul is novel in that instead of modifying the current policy, it rewrites everything and even jettisons parts. I happen to agree with nearly everything in Tim's draft (largely because I agree with his overall philosophy on Wikimedia languages in general), and would be happy to see it become policy.
What is probably needed now is for other people who have worked on this page (or would like to) to comment on Tim's draft, and indicate which of the two versions they would like to work from. I would like to work from Tim's. Dovi 07:31, 5 September 2008 (UTC)
I was sceptical about the need of full localisation for the second project as I already mentioned earlier in this discussion. It was much too much work. But I do see need for at least basic localisation. If you say second language interface is good enough I'd say second language Wikipedia is good enough too. What's the use of a X Wikipedia when even the X Wikipedia project itself does not think language X is useful enough to present X content in X? It detracts from the earnestness of the project. It's like a webpage advertising PHP written in Perl or like buying a bag of "Original German Pretzels" with "Made in the Czech Republic" written in small letters at the back. It's like a Cadillac authorised car dealer driving a BMW.
Please, no projects without basic localisation. --::Slomox:: >< 14:45, 5 September 2008 (UTC)
The advantage (or I should rather say "difference", because it has disadvantages too) of ISO 639 is that ISO 639 does some presorting for us. They only accept "languages" (that is, sufficiently different language variants). With RFC 4646 it is easy to create codes for British and American English, or for Swiss German and "German German" (the main difference being [apart from administrative terms of the respective administrations] the use of ss instead of ß in Swiss German). ISO tries to accept only languages (sometimes they fail, but at least they try) that are so different that they are not, or not usually, mutually intelligible. With ISO, the requester had to give ISO proof of the distinctiveness of the language to convince them to assign a new code. With RFC 4646, _we_ have to prove that a variant is not distinct enough to deserve a project of its own (strictly, the requester has to give proof, but he _will_ give it, and we have to sort out whether his arguments are valid).
(On the other hand, I too know the disadvantages of ISO. For example, there was a request for a Westphalian Wikipedia which was declared eligible by Gerard, because in the case of Westphalian ISO assigned a new code, even though Westphalian is not so distinct that it deserves to be treated differently from Low Saxon. In Gerard's eyes ISO was a stronger argument than my arguments against it. It's hard to contest ISO, even when they are wrong.) --::Slomox:: >< 15:11, 5 September 2008 (UTC)

Objective versus subjective policy

Tim Starling's changes bring up the question of objectivity versus subjectivity.

The current policy was intended to be as objective and measurable as possible. By objective I mean that any reasonable person would reach the same conclusion, regardless of nationalism or bias. Does it have an ISO 639 code? The answer doesn't depend on how much you want or don't want the project to succeed. This ensures that subcommittee decisions are fair, unbiased, and universal. If someone disagrees with a subcommittee decision, they can point to the policy as the public document of principles and rules the subcommittee follows. Even so, there are occasional complaints that the subcommittee is unfair/evil/conspiratorial.

Subjective criteria require judgment and opinion, which means the approval of Wikimedia projects will depend on the arbitrary opinions and guesswork of the few subcommittee members. Examples of such criteria are "sufficiently unique" and "sufficient worldwide number of people able to express themselves at a fluent level". What is sufficient? The subcommittee would decide, on a case-by-case basis. If the subcommittee members personally don't like a request, there's nothing stopping them from interpreting the criteria a little more negatively (or conversely, more positively for requests they favour).

The current policy was intended to maximize objectivity and minimize subjectivity, while allowing some room for judgment in unusual cases. Even the subjective criteria were tempered with more objective measures (for example, "an active test project" is footnoted with "It is generally considered active if the analysis lists at least four active, not-grayed-out editors contributing meaningfully over multiple months"). However, the community draft is increasingly minimizing objective criteria and maximizing subjective criteria. This includes discarding the language code requirement (objective) and native user requirement (objective and measurable), and adding a fluent user requirement (subjective, no reliable measures or counts of fluency).

Do we really want a small group of Wikimedians approving or rejecting projects for debatable reasons based on their own interpretations of subjective requirements and their opinions and guesswork? If you think the subcommittee is unfair/evil/conspiratorial now, wait until they can rationalize the approval and rejection of any request at will. —{admin} Pathoschild 19:55:36, 05 September 2008 (UTC)

I would also add that even with all the objective criteria that are currently in place, we can still spend pages upon pages of emails arguing (contrary to popular belief, we actually disagree from time to time :O). I can't imagine what would happen if those criteria were not there. Even if there were no langcom and the approval process for new language requests were completely community-based, it would require a big investment of time and energy for anyone who wanted to participate. Objective criteria make things much easier for everyone involved.--Shanel 20:16, 5 September 2008 (UTC)
I agree that the policy should be more objective. Right at the start of this whole discussion I created User:Slomox/Languages with the aim of defining objective and measurable criteria.
But I don't think that fluent versus native makes it less objective. Native is not as objective as you may think. Native in its etymological sense means by birth. Well, at birth nobody speaks any language. We acquire it in the first years of our life. Another term for native language is mother tongue, that is, the language learned from the mother (parents). There are many people who learn to understand the language of their parents but do not actively speak it; they adopt the dominant language of their surroundings. Other people learn to speak their mother tongue when young, but after being transplanted into another linguistic setting some of them lose their mother tongue completely. Many migrants speak the language of their new home much better than their old native language. And take Hebrew. How many speakers of Hebrew are actually native speakers? I guess many of them are today, but a hundred years ago there were hardly any native speakers. Jews from all over the world with dozens of different native languages immigrated to Palestine and adopted Hebrew as a common language. According to Wikipedia, the British accepted Hebrew as one of three official languages of Palestine in 1922. I don't know exact numbers; I guess there were some native speakers of Hebrew in 1922, but hardly a sufficient number of them. If Wikipedia and the current policy had existed in 1922, a Hebrew Wikipedia would have been refused. Hebrew then had a status similar to Greek or Latin: used in ecclesiastical contexts for centuries, even millennia, but not spoken on a day-to-day basis. Fluent is much more objective (it's a basic principle of modern liberty that we are not determined by birth, but are free to choose). And I guess much statistical data about native languages actually stems from censuses that ask not about native languages but about preferred languages. Fluent speakers can be measured just as well as native speakers. "Sufficiently" is much more of a problem. 
--::Slomox:: >< 20:57, 5 September 2008 (UTC)
Your "objectivity" is a form of bureaucracy. You reject projects capriciously, and you discourage people who wish to participate in the project. Two of the rules you chose were especially arbitrary, which is why I removed them. Nobody has ever said that we should have a Wikipedia for every language in the SIL catalogue. If you think that you can objectively decide which of the SIL listed languages should be created, then why can't you objectively decide which of the non-listed languages can be created? I think we can decide on language proposals objectively, without the help of those two arbitrary rules. -- Tim Starling 03:31, 6 September 2008 (UTC)
Do we really need a policy at all, if we follow your logic? Should the community consider the requests on a case-by-case basis, as was done before the policy? Given that the requirements left are unqualified and subjective (the wiki must be "sufficiently unique" with a "sufficient worldwide number of people able to express themselves at a fluent level"), there is little difference between this policy and a few open guidelines. —{admin} Pathoschild 04:35:17, 06 September 2008 (UTC)

Pathoschild, as I said on foundation-l, strictly requiring an ISO [whichever] code is a bad bureaucratic measure and far from fair. You are moving the issue to ISO, which is a political, not an expert, organization. Even LangCom is able to make fairer decisions than ISO by relying on the judgment of LangCom members instead of the judgment of politicians at ISO. There are a lot of examples where ISO fails to be fair. I mentioned the South Slavic and Hainan areas on foundation-l, and I am sure I could find similar problems in a number of other places. --Millosh 01:05, 7 September 2008 (UTC)

Edit war

Hm. It seems that we need some experience from Wikipedia ;) Gerard, if you want to reach consensus, then reverting Tim's edits is not a good way to do that. If you don't want to do that, then we should vote on it. --Millosh 13:46, 6 September 2008 (UTC)

I do not vote on things that are demonstrably wrong. If you insist on voting anyway, does that by inference make it right? Thanks, GerardM 18:06, 8 September 2008 (UTC)
I think that you are wrong; you think that I am wrong. I would be completely content if your position got a majority, because we would then have a community-supported decision. --Millosh 02:44, 9 September 2008 (UTC)
In a consensus-building process we should talk about one issue at a time. And if we can't reach consensus, we should vote on that particular issue. --Millosh 02:44, 9 September 2008 (UTC)
The other option I can see is to move this discussion to experts. And if you have a better idea, I want to hear it. --Millosh 02:44, 9 September 2008 (UTC)

Proposal for principles

I think that it is possible to formulate simple, workable principles for creating new Wikimedia projects. So, I'll try to do that here: --Millosh 14:32, 6 September 2008 (UTC)

  • General criteria. --Millosh 14:32, 6 September 2008 (UTC)
    • General criteria cover all languages (natural, ancient and constructed). Art languages are outside the scope of this analysis, because supporting them runs against the primary goals of the Wikimedia Foundation. However, if such a language becomes a relevant medium for daily, cultural or scientific communication, it may be reconsidered and treated as a conlang. --Millosh 14:32, 6 September 2008 (UTC)
    • The project has to be educationally useful. --Millosh 14:32, 6 September 2008 (UTC)
    • The project has to have a significant number of contributors behind it. The language subcommittee decides how this is to be demonstrated. --Millosh 14:32, 6 September 2008 (UTC)
    • The language should have an ISO 639 (1-6) code, or be classified under one of them. --Millosh 14:32, 6 September 2008 (UTC)
    • The language has to be significantly different from any existing Wikimedia language. This means that it is not possible to make a (relatively) simple conversion engine between it and an existing language. --Millosh 14:32, 6 September 2008 (UTC)
      • It's not possible to make a correct conversion engine between en and en-fonipa, despite the fact that the difference is merely one of alphabet. Nor can en-US and en-UK be reliably interconverted; think bonnet and biscuit. On the other hand, Norwegian, Danish and Swedish can be interconverted relatively well.--Prosfilaes 02:14, 8 September 2008 (UTC)
        • Actually, a conversion engine between no, da and sv is not so simple (only a rather poor conversion engine exists, between Nynorsk and Bokmål). And even if it were possible, that would mean those three projects should be one, if we were discussing their creation now. We are not discussing existing projects. The goal of WMF is to spread knowledge, and if it is possible to make conversion engines between any number of languages, that would be a great thing: more people would work on one project instead of on several, rather than doing the same thing several times. BTW, you really don't know the linguistic situation if you think that it is easier to convert between the Scandinavian languages than from en-US to en-UK and vice versa :) Also, exception markup already exists (for Chinese, for example), so I don't see a problem with a small amount of it. Also, I would like to know how many English speakers are using IPA for writing. If there is a significant number of them and it really is not possible to make a conversion engine, I don't see a reason not to create such a project. --Millosh 15:10, 8 September 2008 (UTC)
    • A language which passes those criteria may apply for any Wikimedia project. It is up to the language subcommittee to decide (and/or to devise some kind of measurement) whether a community has enough human resources to build another Wikimedia project. --Millosh 14:32, 6 September 2008 (UTC)
  • Criteria related to endangered languages, languages without written standard and languages without official recognition. --Millosh 14:32, 6 September 2008 (UTC)
    • If a language is a natural one and it doesn't fulfill the general criteria, it has to pass the following criteria: --Millosh 14:32, 6 September 2008 (UTC)
    • It is a unique language. It is not possible to make a simple conversion engine from another existing language. --Millosh 14:32, 6 September 2008 (UTC)
    • The language subcommittee is responsible for analyzing whether the language is real or a hoax. If the language has an ISO 639 (1-6) code (or is classified under one of them; note that ISO 639-6 is not yet published), LsC would decide positively. If it doesn't, LsC should analyze the situation in cooperation with relevant scientists and institutions. --Millosh 14:32, 6 September 2008 (UTC)
    • Such a language may get Wikipedia, Wiktionary, Wikibooks and Wikiquote as separate projects or inside the Compendium (if created). --Millosh 14:37, 6 September 2008 (UTC)
  • Criteria for ancient and constructed languages. --Millosh 14:32, 6 September 2008 (UTC)
    • If a language is an ancient or constructed one and it doesn't fulfill the general criteria (e.g. Sumerian), it may get Wiktionary and Wikiquote as separate projects or inside the Compendium (if created). --Millosh 14:32, 6 September 2008 (UTC)
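The "conversion engine" test in the proposal above can be illustrated with a minimal Python sketch. It assumes a purely word-for-word, table-driven converter; the table `UK_TO_US` and the function `convert` are hypothetical and are not MediaWiki's actual variant converter. It shows the problem raised with bonnet: a word-level rule cannot tell the car part from the hat.

```python
import re

# Hypothetical UK -> US word table; a real converter would need far more entries.
UK_TO_US = {
    "bonnet": "hood",
    "colour": "color",
    "lorry": "truck",
}


def convert(text: str, table: dict) -> str:
    """Replace whole words using a fixed table, keeping an initial capital."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        target = table.get(word.lower())
        if target is None:
            return word
        return target.capitalize() if word[0].isupper() else target

    return re.sub(r"[A-Za-z]+", swap, text)


# Correct for the car sense of "bonnet"...
print(convert("Open the bonnet of the lorry.", UK_TO_US))
# ...but the same rule also rewrites the hat sense, which is wrong in US English too.
print(convert("She wore a bonnet.", UK_TO_US))
```

Fixing the second sentence needs either context-sensitive rules or per-case exception markup, which is the point at which such an engine stops being "(relatively) simple".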

Developing criteria for arguments

My proposal above contains a number of imprecise criteria (on the other hand, strict criteria are too restrictive). Because of that, LangCom should develop criteria for evaluating arguments.

For example, one criterion is "Is the project in that language educationally useful?". Arguments pro and contra may be very diverse.

For example, "Esperanto is not educationally useful because it doesn't have native speakers." or "Esperanto is useful because it is partially used as a lingua franca." I wouldn't disqualify either of those two arguments (or a lot of similar ones). If the arguments are true, they should be weighed together in some formalized way. The job of LangCom is to decide how strong particular arguments are and to create a formalized process for (dis)approving projects.

LangCom should gather such arguments (including arguments made by LangCom members) and classify them publicly. Thanks to this public work, interested parties (and other Wikimedians) would know where the process of (dis)approving a project stands. This is a level of transparency which is necessary for any Wikimedia body. LangCom shouldn't have the power to make decisions out of the community's sight, with personal opinions dominating the decision-making process.

I know that LangCom has two strict principles: an ISO 639-3 code and a level of localization. However, such strict principles are highly bureaucratic, given the number of examples which have already been mentioned. --Millosh 15:10, 6 September 2008 (UTC)

When you base your argument on the exception to the rule, does that prove that your argument is correct? Thanks, GerardM 18:08, 8 September 2008 (UTC)
As I have said to you a number of times, I don't have a problem with LangCom's general ability to make decisions related to WMF languages. The next step in its development should be to take into account that the linguistic situation of Homo sapiens is more complex than restrictive standards try to describe. And this is shown by the exceptions to those [restrictive] standards. --Millosh 02:58, 9 September 2008 (UTC)
Society is by nature a complex phenomenon. Trying to make strict rules when not all (or not most) of the variables are known is not the best idea. There is a wide range of clear situations: any living language with a written tradition will pass the existing LangCom criteria, and it should become a WMF project if the demands defined by LangCom are fulfilled; art languages will not pass LangCom criteria. However, there are a number of borderline cases, and those are what we are talking about now. --Millosh 02:58, 9 September 2008 (UTC)
When people are of the opinion that ISO has it wrong, they can talk to the maintainer of the ISO standard. The language committee can help them with that if we think it has merit. If people do not want to do this, they forfeit the argument. Remember, we are not in a rush; it typically takes quite some time to get the localisation and the Incubator project done as well. (GerardM, not signed)
I am a Wikimedian, not a part of ISO. What ISO is willing to do is up to it, and while I have a general interest in the work of that organization (as it is an international one), its particular work is not a matter of my interest. What interests me, as it does all Wikimedians, is Wikimedia and its bodies. I want an in-house solution for our problems, and I don't want us to depend on some external organization. --Millosh 06:25, 9 September 2008 (UTC)
How can you one moment argue for the BCP and the next argue that you are not interested in maintaining this "best practice"? When we publish our content, we need a language code that makes sense. If the WMF is to establish its own "languages", they do not fit in the wider world; hardly a best practice. I am sorry for you, but we are part of a wider world, and this wider world is where we are heard when we have a case to make. Thanks, GerardM 06:53, 9 September 2008 (UTC)
One thing is for, let's say, LangCom to send the IEEE a suggestion: "please add this as a language, Wikipedia is using it because of ..."; another is to spend enormous time and energy arguing with some politician from country XY who can't understand that there are two separate entities: language and ethnicity/nationality. --Millosh 07:59, 9 September 2008 (UTC)
Again, you are saying that I said something which I didn't. Wikimedia shouldn't establish its "own languages", because all of the languages which I mentioned have reasonably good linguistic descriptions; actually, it seems that a number of them are in the ISO 639-6 standard. Again, there are a lot of linguists and literature which describe everything we need. For most of them, the best source for a basic description and further reading is the English Wikipedia. --Millosh 07:59, 9 September 2008 (UTC)
For me, "the wider world" is the humans who speak different languages; but it seems that for you "the wider world" is various bureaucratic organizations which put stamps on their documents and declare what is a language and what is not. No one has the right to put such stamps, you should know that; and Wikimedia has the right to decide which language edition of which project it will support. --Millosh 07:59, 9 September 2008 (UTC)
And, in the end, we are at the edge of established standards because we are dealing with real languages and real people, not with fictional codes. And we are responsible for moving them forward. --Millosh 07:59, 9 September 2008 (UTC)

We will be told to talk to ISO by organisations like the IEEE. At ISO we will be talking to the people who maintain the standard; in the case of languages, that will be SIL. ISO 639-6 will explicitly not call anything a language; it will deal with "linguistic entities". It is at the level of ISO 639-3 that things will implicitly be considered languages.

Wikipedia in English is, as far as I am concerned, notorious for POV pushing regarding languages. It does not have any relation to what allows us to use meta tagging for our content. When nobody has the "right" to call something a language, how can the language committee do exactly that? What makes you think that either en.wikipedia or the language committee has the maturity to be considered a best practice?

We have been pushing the envelope of the best practice because we have adopted the ISO-639-3 for a number of years now. Your characterisation of our current practice is plain wrong. You assume that things are in a particular way and sadly you cannot talk from experience. GerardM 10:15, 9 September 2008 (UTC)

Some changes

I have made some changes in the following ways. 1.- Trying to reduce somewhat the criticized subjectivity. 2.- Establishing objective criteria for conlangs; the previous ones were really subjective, too. 3.- Putting back localization; it is really necessary to prevent people from being forced to understand English. Crazymadlover

Quite frankly, I have serious trouble evaluating all the major changes made time and again, when nearly the entire text changes or sometimes goes back to what it was before. It would be better to discuss first... (not just for your edits but for major changes made previously too).
There is no agreement on the current requirements for localization (which you put back in). Some (like Tim) think there should be no requirement at all (I sympathize). Some think minimal requirements are acceptable as a test of community will (I can live with this). Some think that the current requirements are good. But I personally think they are highly exaggerated. For goodness' sake, no one is being coerced into learning English. The contrary: We are giving people a real chance to build a community that can, over time and wiki style, translate as much of the interface as they deem fit. The current requirements make it necessary to translate many hundreds of messages that are not needed for doing basic work on a wiki and to get started building a community. My hunch is that minimal localization requirements are something everyone could live with. Dovi 09:15, 7 September 2008 (UTC)

RFC 4646 is not useful for our purposes

RFC 4646 is not usable for our purposes. It excludes many languages that we have already recognized, languages that are part of ISO-639-3. This RFC will not include any of the ISO-639-3 languages because it will be replaced by an RFC that will include them. The next RFC will include the ISO-639-3 recognised languages. Another problem with the RFC is that it recognises dialects and orthographies, none of which will be recognised by the WMF.

Consequently the only list that does supply us with languages that are recognized as such is ISO-639-3. RFC 4646 will be replaced and all ISO-639-3 languages will be included. There is no merit in replacing the ISO-639-3 standard with something that is this clearly dysfunctional for our purposes. GerardM 10:50, 8 September 2008 (UTC)

Hehehe, you are using the same tools again. RFCs are in a constant process of development, and a new RFC may state that it obsoletes an old one. Saying that something is based on RFC XXX means that it will be based on RFC YYY once the latter obsoletes the former. So this argument is no argument.
Can you tell me what a dialect is?
And, again, the only way to tell the browser that a page is written in some language is, AFAIK, by using BCP47. So every ISO-639-3 code will have to be translated into BCP47 even if we use the ISO standard.
Besides that, again, you may use whatever you want, but, LangCom is more relevant than ISO for WMF projects. So, please, try to think with your head, not with ISO's head. --Millosh 15:18, 8 September 2008 (UTC)
You've got it the wrong way around. Every entry of ISO-639-3 will end up recognized in the next RFC. For you ISO is political and consequently unacceptable; hey, think again, it is the only game in town that is of any use. GerardM 15:28, 8 September 2008 (UTC)
Being a political organization is not bad per se. It is bad because of a number of examples which I gave to you, and where it fails to describe linguistic reality. Which is a result of being a political organization.
The RFC is more flexible, and by using it we can avoid the restrictiveness of ISO notation. While such cases should be rare, it would be possible to create WMF projects in languages which ISO doesn't recognize because of political deals between governments. --Millosh 15:34, 8 September 2008 (UTC)
The RFC 4646-bis EXPLICITLY will include the ISO-639-3. It EXPLICITLY will not include any of the languages that would be new to the RFC that are in the ISO-639-3. There is no flexibility there. This has all been said before, now please explain why we should follow the RFC 4646.... Thanks, GerardM 18:11, 8 September 2008 (UTC)
I understand. For the first time I agree with you!!! RFC 4646 remains dependent on ISO codes; worse, on the obsolete ISO 639-2. All the language tags come from ISO, and it will continue to depend on what ISO has previously established. The RFC doesn't bring anything new. In fact, fewer languages would have access to Wikimedia, because they wouldn't be recognized. Crazymadlover
  1. RFC 4646 will be obsoleted by a new RFC and LangCom should move to the new RFC when it would be published. --Millosh 02:33, 9 September 2008 (UTC)
  2. The RFC is extensible, unlike ISO, so we may temporarily generate a code (even a private-use one) for a language if none exists in ISO 639-3. If one exists in ISO 639-3 but not in RFC 4646-bis, we may of course use the ISO 639-3 code, as a reasonable measure in the transition to the new standard. --Millosh 02:33, 9 September 2008 (UTC)
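To make the extensibility point concrete, here is a minimal Python sketch that parses a simplified subset of the RFC 4646 tag grammar (language, script, region, variants, private use; extended language subtags, extensions and grandfathered tags are ignored). The function name `parse_tag` and the example tag `hr-x-kajkav` are hypothetical illustrations, not codes anyone has registered or proposed. It shows two ways a tag can name something outside the registry: a private-use sequence after `x-`, and the primary-language range `qaa` to `qtz`, which is reserved for private use.

```python
import re

# Simplified subset of the RFC 4646 "langtag" production. It deliberately
# ignores extended language subtags, extensions and grandfathered tags.
LANGTAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)"
    r"(?:-x(?P<private>(?:-[a-z0-9]{1,8})+))?$"
)


def parse_tag(tag: str) -> dict:
    """Split a tag into subtags; tags are case-insensitive, so compare lowercased."""
    m = LANGTAG.match(tag.lower())
    if m is None:
        raise ValueError(f"not a tag in this simplified grammar: {tag!r}")
    language = m.group("language")
    return {
        "language": language,
        # Primary-language subtags qaa..qtz are reserved for private use.
        "private_language": len(language) == 3 and "qaa" <= language <= "qtz",
        "script": m.group("script"),
        "region": m.group("region"),
        "variants": [v for v in m.group("variants").split("-") if v],
        "private_use": [p for p in (m.group("private") or "").split("-") if p],
    }


# "hr-x-kajkav" is a hypothetical private-use tag, used only for illustration.
for tag in ["sr-Latn-RS", "de-CH-1996", "hr-x-kajkav", "qaa"]:
    print(tag, parse_tag(tag))
```

A private-use tag like this is only meaningful by local agreement, which is why it can serve as a stopgap rather than a replacement for a registered code.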

I am not arguing that we should use one or another system; I am arguing that we should think about our best interest. --Millosh 02:33, 9 September 2008 (UTC)

Also, you didn't respond to my question on foundation-l: if someone asks LangCom, let's say, for a project in Chakavian or Kajkavian, what will LangCom's answer be? Note that both languages (treated as "dialects") have a written tradition and a different language system from the Croatian standard, which is based on the Shtokavian language system. What would LangCom decide about the Tai-Kadai languages of Hainan? If you want, I can find a number of such examples. --Millosh 02:33, 9 September 2008 (UTC)

What codes would RFC 4646 use for these languages? Crazymadlover
How about, no? Why on Earth do we need two Croatian dialects? Why not American and Canadian wikipedias? We can declare those dialects as languages, too. Toss in Australian and New Zealand wikipedias, and we still aren't at the absurdity level of fragmenting Croatian. What a goal, to fragment the Croatian Wikis against the advice of professional linguists--and the designers of ISO 639-3 are known for splitting languages that other linguists would consider one! Weakening a smaller wiki by creating a couple of tiny and horribly geographically limited Wikis out of it... I'm not seeing how that's in our best interest at all. To do so against linguists' advice is foolish.--Prosfilaes 15:04, 10 September 2008 (UTC)
Your linguistic knowledge is at a very low level. Australian and New Zealand English have the same language system as American or British English. But there is a Scots Wikipedia, and Scots is a distant dialect of English. Chakavian and Kajkavian are distant from the Croatian standard, just as Torlakian is distant from the Serbian standard. --Millosh 11:38, 13 September 2008 (UTC)
I think blagnorth is a more correct term than language system; neither has meaning, but the first makes that clear. Scots is generally considered a language by linguists, which is why it has its own ISO 639-3 code. Not that it's exactly a useful wiki, but it falls within the general rules for permissible Wikis. Chakavian and Kajkavian are not considered languages by linguists, as shown by the lack of an ISO 639-3 code.
Honestly, I don't think all these tiny Wikipedias are helping at all. Wow! "The Louvre Museum (French: Musée du Louvre), in Paris, Fraunce, is the warld's maist veesited airt museum, a historic monument, an a naitional seembol." (Complete text of article on 9/14/2008) At which point all the readers of that article who actually want to know something about the Louvre hit the English iw link. (Okay, a few of the more educated ones hit the French iw link, to get the info from the horse's mouth.) But rather than have arguments over each and every language, we'll accept a general standard: ISO 639-3. (I'd rather use ISO 639-2 myself, but it's not worth fighting over.) I'm vastly opposed to moving that line so that we create even smaller, more specialized Wikipedias for spam and POV to breed on in the absence of real users, and where even the most dedicated users of the wiki go to larger wikis when they actually want to look something up.--Prosfilaes 19:58, 14 September 2008 (UTC)
Regulating issues like spam and POV is one thing; giving someone the possibility of having an encyclopedia (or whatever) in their native language is another. As I wrote below, there are possible principles on which we may base our decisions about which variety may have a separate project (standalone or inside the Compendium) and which should be handled in other ways. A completely different thing is trusting political institutions (including states), which have an interest in not "promoting" some varieties into languages while promoting others. As I have mentioned a couple of times, it is possible to make a conversion engine between the Serbian, Croatian and Bosnian standards, while it is not possible to make a conversion engine between any of them and the other three language systems in the same area (actually, it may be possible, but it would be a very complex task). --Millosh 15:08, 4 October 2008 (UTC)
Again, please use blagnorth instead of language system; it makes it easier for readers to realize that it has no established meaning. Or better, use real words that we can look up in an encyclopedia. I'm sure it's not really possible to make a conversion system between Serbian and Croatian, any more than you can do so between American and British. Heck, you can't make a conversion system between Southern and Bostonian that correctly translates the differences.
It's nice to wave about political institutions. Of course they're all evil. Of course, ISO 639-3 is written by a particular organization, the Summer Institute of Linguistics, but actually saying something about them might verge on slander, and might actually be debatable and even disprovable.
I want Wikimedia to help the creation of encyclopedias for everyone. But I want them to be real encyclopedias that people actually use. Wikipedias for languages that are supported by states and academies, that have standardized spelling and are taught in schools, are more likely to be large, successful Wikipedias that people actually use. Languages that are supported by a tiny group of people with little to no official support, less so. Languages/dialects that the Summer Institute of Linguistics, a group known among linguists for dividing language groups into more languages than just about anyone else, doesn't recognize come with a huge POV chip on their shoulder, no linguistic standards, and a community that's used to writing in a standard language and is likely to use that language's Wikipedia unless they're political. Not a win for Wikimedia or anyone.--Prosfilaes 19:26, 15 October 2008 (UTC)
While Kajkavian is moribund, Torlakian is not, though it has little written literature; Chakavian is not moribund and has a strong written tradition which lasted from the 10th century to the 20th century (active writing in Chakavian stopped around the 1920s). Although Chakavian speakers [Croats] haven't (yet) expressed an intention to create a separate project, a decision based on ISO 639-2/3 would rule out their project, even though it could exist inside the Compendium. (BTW, the POV on such a project would be the same as the POV on hr.wp, just as the POV on a Torlakian project would be the same as the POV on sr.wp. So, in that particular case nothing would be lost.) --Millosh 15:08, 4 October 2008 (UTC)

In fact, it seems that there is no merit in replacing ISO 639-3 with RFC 4646. Crazymadlover

Not to replace it, but to use it creatively. And RFC 4646 allows more creativity than ISO 639-3. Creative usage includes incorporating ISO 639-3 codes into the RFC 4646 layer. --Millosh 11:38, 13 September 2008 (UTC)


If we read the localization manual, we see that for the first project we only need the 500 most-used MediaWiki messages. This is a reasonable midpoint between full and no localization. The localization requirement can return.

500 messages is a huge prerequisite for a new, small project, and far too much to require as a test of community. Only a tiny fraction of these "most used" are needed for basic editing and navigation on the wiki. If we must have such a requirement, I suggest a maximum of 100 out of the 500 "most used." Dovi 18:57, 16 September 2008 (UTC)
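Both thresholds in this exchange (500 messages versus at most 100 of them) are the kind of objective, measurable criterion discussed earlier on this page. A minimal Python sketch of such a check follows; the message keys are hypothetical placeholders, not the real list of most-used MediaWiki messages, and the 100-message "core" subset is only an assumed stand-in for whatever subset might be chosen.

```python
def translated_count(translations: dict, required: list) -> int:
    """Count required message keys that have a non-empty translation."""
    return sum(1 for key in required if translations.get(key, "").strip())


def meets_requirement(translations: dict, required: list, minimum: int) -> bool:
    """True if at least `minimum` of the required messages are translated."""
    return translated_count(translations, required) >= minimum


# Hypothetical stand-ins for the 500 "most used" messages.
most_used = [f"message-{i}" for i in range(500)]
# Hypothetical core subset of 100 messages for basic editing and navigation.
core = most_used[:100]

# A test project that has translated 120 of the most-used messages.
translations = {key: "(translated text)" for key in most_used[:120]}

print(translated_count(translations, most_used))          # 120
print(meets_requirement(translations, most_used, 500))    # False
print(meets_requirement(translations, core, 100))         # True
```

The counting logic is the same under either proposal; the policy choice lies entirely in which list is passed as `required` and what `minimum` is.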

Location of test projects; different writing systems.

There is no agreement about where test projects should ultimately be located, and this has been the subject of some debate in the recent past. This proposal is meant to define prerequisites for new languages, but it will not be able to decide whether the Incubator or another place is the best place for test projects, nor should it. If the text is left with directions where to put test projects and no explanation that this is common practice and not policy, then people will end up treating it as a law carved in stone, and reject perfectly valid test projects that are not located where they think they should be located. That is why I completely oppose any attempt (especially the most recent one that gave no relevant explanation at all) to remove the italicized explanation.

One reason I oppose this so firmly is because, though it is rarely discussed, there are plenty of languages that cannot be supported on "regular" wikis. The case I am most familiar with is RTL languages, that can never be satisfactorily supported on LTR wikis such as the Commons, Meta, or the Incubator. So let's not have a policy that will provoke needless debate when, for fair reasons like this or others, a test project is not located where most people assume it should be.

Furthermore, while on the RTL-LTR topic, I see that the language on writing systems has been made much stronger (too strong in my opinion):

In all cases, this excludes regional dialects and different written forms of the same language.

Previously, the wording applied in general, to most cases but not to all cases.

However, in every wiki project where the same language uses both LTR and RTL writing systems, this policy has been a complete failure. I am familiar with Ladino and Kurdish, but there are several more. It simply doesn't work on a technical level. Therefore, the language here should once more be reduced to saying that writing systems get their own wikis only when there is a clear, demonstrable need, but in most cases should coexist on the same wiki. Dovi 19:08, 16 September 2008 (UTC)

Parts of this page make no sense

I realize that we are on an international project here and that parts of policy are going to be written by people for whom English is not a native language. It is an everyday reality, and it is acceptable that there are sometimes grammatical errors. However, on very rare occasions, someone writes something into a policy proposal (as User:Crazymadlover has done here) that I can't understand at all, in particular this paragraph [8]: "If the proposal is for a constructed language, this must be used at present as Engineered language or pretend to be used as international auxiliary one, additionally to have literary pieces written previously by people other than the creator of the language. They are excluded those which are only used at present with artistic purposes."

I'm sorry, but what does that even mean? Crazymadlover, if you wrote this paragraph in Spanish for me, maybe I could understand it better? --Node ue 22:24, 16 September 2008 (UTC)

Not approving, just trying to explain...
There are several reasons for constructed languages. Some are created, like Lojban, to explore how language affects how people think and interact; these are called engineered languages. Others are created as a way to help people communicate in a simpler (or at least neutral) language. These are called auxiliary languages. Artistic languages are languages like Sindarin and Klingon, that are for use in artistic works or to be works of art themselves. This policy excludes the artistic languages and demands the others have literary pieces written by people other than the creator of the language.
Personally, I find this arbitrary; why are engineered languages permitted and artistic languages excluded? Regional auxiliary languages (like a pan-Slavic language) are forgotten about completely. The "pretend to be used" is rude and silly; any auxiliary language worthy of a Wikimedia wiki has been used by two people without a common language to communicate, if only as penpals.--Prosfilaes 00:43, 17 September 2008 (UTC)
First, international auxiliary languages are not in all cases "world languages". If they can be used for communication among people with different languages, they are international regardless of whether their use is global or regional; pan-Slavic is not excluded a priori. Second, I changed the clause; it now reads "Expect to be used", though perhaps it could be "attempt to be used". About artlangs, I will answer later. Crazymadlover.

Ausbausprache - Abstandsprache - Dachsprache

We inevitably have to use scientific tools; one of them is the Ausbausprache - Abstandsprache - Dachsprache criterion.

Since GerardM criticized the subjectivity of the clause "Sufficiently unique", we can add scientific criteria to the draft. Crazymadlover

Where are we going to get the money to send researchers out to study the languages so they can apply that criterion? We don't use scientific tools at Wikipedia, we rely on reliable sources to pronounce on the issue. In this case, that's ISO 639-3.--Prosfilaes 23:05, 18 September 2008 (UTC)
We should not pretend to be capable to define what makes a language. We have an interest in the process, but defining them is ludicrous. GerardM 05:51, 19 September 2008 (UTC)
We should not pretend to be capable to define what makes a language That's true. But we are surely able to define which language variants are eligible to have their own version of one of Wikimedia's projects. And one thing I know for sure: We should not pretend that ISO is capable to define what makes a language. --::Slomox:: >< 09:01, 19 September 2008 (UTC)
No, we shouldn't even begin to pretend that a relatively well-funded group of linguists are best able to draw up a list of languages. What lunacy would that be?--Prosfilaes 03:21, 20 September 2008 (UTC)

What still remains

What do you think still needs to be done on the language proposal policy community draft before it can be approved? Remember, our personal views do not matter as much as the best arguments do. We can't afford to spend years bickering over minor details. Please reply on the talk page. Crazymadlover.

The aims of the proposal, I take it, are to provide clear guidelines for applicants and for the subcommittee and to make long disputes unnecessary. The proposal is simpler than the current policy, and -- to that extent at least -- better. I am commenting on a few sections with the thought that we might get a text that is even clearer and even less likely to provoke disputes.
  1. In all cases, this excludes regional dialects and different written forms of the same language. But there is no linguistic consensus about what are "regional dialects" and what is "the same language". The sentence would invite endless disputes and should be deleted.
  2. The subcommittee does not consider political differences ... This can hardly be true: the subcommittee must talk about them sometimes. It would be more honest to say: "Political differences do not justify a separate project."
  3. The proposal has a sufficient worldwide number of people able to express themselves at a fluent level ... Some editors above have suggested that we should specify a precise number. But there is no linguistic consensus on what that number should be. So, if we imposed a number for our own bureaucratic reasons, it would invite disputes. In reality, the existence of a sufficient potential community and audience will be demonstrated by the test project. So this sentence should remain as it is.
  4. If the proposal is for a extinct language, it must necessarily be well attested in writing. Additionally, for any kind of wikimedia projects, it must: Be considered classical, still have a wide cultural influence and have a large extant corpus of literature ... If the proposal is for a constructed language, this must have be intended for use as an international auxiliary language or be an Engineered language. There must also be literary pieces written by people other than the creator of the language. Languages which are currently used only for artistic purposes are excluded. These guidelines are too complicated and they stray from the real purposes of the Foundation. If a guideline is needed it should be something like this: "If the proposal is for a language without native speakers, it will need to be demonstrated that it is well attested in written texts, and is in active use as a special, auxiliary or learned language." I wondered whether to put "active use internationally"; but that would contradict the Foundation's aims, which don't have anything to do with national frontiers. The general requirement for "a sufficient worldwide number of people" is already adequate.
And thanks to those who have developed the proposed policy to this point! Andrew Dalby 21:13, 29 September 2008 (UTC)
I implemented your #4 argument; please review it! #1 and #2 need additional discussion. Crazymadlover
Agree with all four of your points. Regarding #1, I've already argued above that this should be a general guideline but not absolute rule in all cases. Dovi 18:55, 1 October 2008 (UTC)
Sorry, I missed your argument above! I agree, #1 would work fine if it is a non-binding guideline. Andrew Dalby 21:13, 1 October 2008 (UTC)
I cannot accept this proposal as a community draft. It is a draft by some people, which of course is okay, but I do not want to be considered represented by them, and definitely not with this proposal, which the name 'community draft' does imply. - Andre Engels 07:57, 3 October 2008 (UTC)
To be more precise, I have repeatedly and adamantly opposed the emphasis on translation of the MediaWiki interface as a criterion. Yet here it is not only kept in, but by extending the requirement to all rather than just the most used statements, and by basically removing any other requirements on the project (just keeping requirements on the language), it is made heavier and given more weight. - Andre Engels 08:05, 3 October 2008 (UTC)
So be bold and make the change! This was discussed above; you can continue the discussion and make any changes that reflect it. I also support ditching the requirement, though I would agree to truly minimal requirements in that area because a proposal with minimal ones would have a better chance of being accepted than one with none. Dovi 16:28, 4 October 2008 (UTC)

ISO 639-3 requirement

The addition of (if available) makes it a soft requirement. The consequence is that you invite an endless stream of discussions, discussions that are not welcome. When there is sufficient reason to recognise a language, this argument should be made at SIL, i.e. ISO. Thanks, GerardM 18:32, 2 October 2008 (UTC)

So in other words, you are of the opinion that the min-nan Wikipedia should not have been created, because its creation would need discussion? I don't think discussions are a bad thing. And insofar as they are a bad thing, we can say that the committee can just cut them off at a point saying "we have heard all the arguments, and this is our decision". I'd rather have a bit of arguments every now and again than to exclude projects by bureaucratic criteria - or to include unwanted projects on similar criteria, by the way. I would therefore say 'go by ISO 639-3 in principle, but allow well-argued exceptions in either direction'. - Andre Engels 07:54, 3 October 2008 (UTC)
I suggest adding this sentence: "If there is no valid ISO 639-3 code, the proposal will provide some of the material to present to the World Language Documentation Center, and an RFC 4646 code." Crazymadlover.
ISO 639-3 is about languages. SIL is the right organisation to deal with such requests. I am a member of the World Language Documentation Centre and so is a representative of SIL. When a language is accepted by SIL and it has been given a code, it will get a code under the RFC that will be valid at that time. Thanks, GerardM 13:19, 3 October 2008 (UTC)


Please define what "dialect" means before using it in a formal document. This sentence is unacceptable to me: "This generally excludes regional dialects and different written forms of the same language, especially when technical solutions make such alternative forms viable on the same wiki." --Millosh 09:32, 3 October 2008 (UTC)

Well, all Arabic varieties that have been recognized by ISO 639 are really just dialects, and I don't know which standard is currently applied here on Wikimedia to distinguish between a language and a dialect. I would suggest that any dialect could be considered a language if it has at least 4 or 5 complete published works in that "dialect/language" and some kind of formal organization that standardizes the rules of writing and script. Just as an example, Egyptian Arabic Wikipedia members are now asking for a Wiktionary just to standardize the unwritten dialect that has been accepted here as a language. --Chaos 17:49, 3 October 2008 (UTC)
They aren't just dialects as long as they have armies. (Old linguist quip; see w:dialect if you don't understand.) They are only dialects in that they are perceived as being dialects; in objective fact, there are many languages--Portuguese/Spanish/Italian, Serbian/Croatian/Montenegrin. No linguist would bat an eye at Egypt declaring Egyptian (Arabic) to be its own language.--Prosfilaes 00:38, 4 October 2008 (UTC)

I'll try to be more constructive on this issue and explain what a precise definition should describe: --Millosh 22:20, 3 October 2008 (UTC)

  1. If the varieties can be handled technically (i.e. if it is possible to build a full conversion engine), then a conversion engine should be used and no other project should be started. --Millosh 22:20, 3 October 2008 (UTC)
  2. If it is not possible to build a full conversion engine (either no sensible conversion engine can be built, or only a partial one), then the varieties have to be mutually intelligible to the level needed for reading one encyclopedia, and speakers of each variety have to be able to write the other variety at a basic level. If possible, some level of conversion should be applied. --Millosh 22:20, 3 October 2008 (UTC)
  3. In all other cases the variety should get a project (inside the Compendium or as a separate project, depending on the number of Internet users; a project in the Compendium may evolve into a full project if the number of Internet users increases significantly). --Millosh 22:20, 3 October 2008 (UTC)

The second case is the most problematic. In that case relevant linguists should be consulted. --Millosh 22:20, 3 October 2008 (UTC)
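Millosh's three cases amount to a small decision procedure. The Python sketch below encodes them under explicit assumptions: the three yes/no inputs stand for judgments that, as he notes, may require consulting linguists, and `separate_project_threshold` is a hypothetical parameter, since no number of Internet users was proposed. All names are illustrative, not part of any policy text.

```python
from enum import Enum


class Outcome(Enum):
    CONVERSION_ENGINE = "one wiki, full conversion engine"
    SHARED_WIKI = "one shared wiki, partial conversion where possible"
    COMPENDIUM = "own project inside the Compendium"
    SEPARATE_PROJECT = "own separate project"


def classify_variety(
    full_conversion_possible: bool,
    readable_as_encyclopedia: bool,
    writes_other_variety: bool,
    internet_users: int,
    separate_project_threshold: int,
) -> Outcome:
    """Hypothetical encoding of the three cases; inputs are human judgments."""
    # Case 1: a full conversion engine is possible, so no new project.
    if full_conversion_possible:
        return Outcome.CONVERSION_ENGINE
    # Case 2: no full conversion, but the varieties are intelligible enough
    # to read one encyclopedia and speakers can write the other variety
    # at a basic level.
    if readable_as_encyclopedia and writes_other_variety:
        return Outcome.SHARED_WIKI
    # Case 3: the variety gets a project; where it lives depends on the
    # number of Internet users (the threshold is an assumed parameter).
    if internet_users >= separate_project_threshold:
        return Outcome.SEPARATE_PROJECT
    return Outcome.COMPENDIUM
```

Note that case 2 is reduced here to two booleans, which hides exactly the part of the procedure that drew the most disagreement in this thread.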

I basically agree (see my comments above in a previous section). Just one thing: "Speakers of both varieties have to be able to write other variety at the basic level." I don't entirely agree that when speakers of both varieties can use the other variety at a basic level, there is no justification for a separate wiki. #2 is truly problematic, and I suggest instead leaving such cases open to community discussion. Dovi 16:32, 4 October 2008 (UTC)
  • There are, however, dialects that have an ISO code. Take Egyptian Arabic for example: it has an ISO code, but most of Egyptian society considers it a dialect of Arabic. There was a proposal to create a Wikipedia for it, and the proposal was approved. But on this wiki there is a lot of original research. The users there wrote some articles in the Latin alphabet instead of Arabic, using a Latin alphabet system that they invented on their own. Isn't that original research? They requested a Wiktionary to carry out what they call "standardization"! Bear in mind that there are many varieties of Arabic, but there are no defined rules for writing any of them.--Mohammed Ahmed 19:52, 11 October 2008 (UTC)

Revised version as of 3 October 2008

I think this is now a good, simple, clear policy, and much better than the version currently used by the Subcommittee. I would only suggest that the following sentence

  • "This generally excludes regional dialects and different written forms of the same language, especially when technical solutions make such alternative forms viable on the same wiki."

should be formatted in italics, as a guideline rather than a substantive rule. Maybe others disagree on this?

Yes, it is good to mention technical conversion as an option (as currently with Serbian and Kurdish, I guess?) It is also noteworthy that similar dialects/languages can sometimes coexist in a wiki even if automatic conversion is not possible: for example Norman (with templates announcing the dialect in which each article is written). Andrew Dalby 13:48, 6 October 2008 (UTC)
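To make "technical conversion" concrete, here is a minimal Python sketch of the easy direction for Serbian, Cyrillic to Latin, using the standard one-to-one correspondence of the 30 Serbian Cyrillic letters (three of which map to Latin digraphs). It is an illustration only, not MediaWiki's actual converter; the names are hypothetical. The reverse direction is harder, because a Latin digraph such as "nj" occasionally stands for two separate letters in some words, so a real engine needs exception handling.

```python
# Lowercase Serbian Cyrillic letters -> Serbian Latin (Gaj's alphabet).
_CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def cyrillic_to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin; other characters pass through."""
    out = []
    for ch in text:
        lower = ch.lower()
        latin = _CYR_TO_LAT.get(lower)
        if latin is None:
            out.append(ch)  # not a Serbian Cyrillic letter: keep as-is
        elif ch != lower:
            # Uppercase letter: capitalize only the first character, so
            # Љ -> "Lj". Fine for title case, wrong for ALL-CAPS text.
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)
```

For example, `cyrillic_to_latin("Љубљана")` gives `"Ljubljana"`; all-caps input such as `"ЉУБЉАНА"` would give `"LjUBLjANA"`, one of the details a production converter has to handle.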

Addressing new projects for languages that already have a project

The proposal does not address the procedure for new projects for languages that are already supported by the WMF.

For these requests there is a requirement to fully localise both the MediaWiki messages and the messages of the extensions as used by the WMF.
unsigned by GerardM on 10:12, 7 October 2008 (UTC).

It is redundant; the requirement is explained in the localization guide and requirements. Crazymadlover
The requesters' handbook is not policy, it is simply documentation. Nothing it says is binding in any way. The requirements should be mentioned in the policy itself. —Pathoschild 19:12:55, 07 October 2008 (UTC)
I added a section for this. Please review it and tell me your opinion. Crazymadlover
Fewer requirements for languages already supported by the WMF: it makes no sense to rediscuss all the requirements for a language that already has an approved project. Crazymadlover.
Full localisation is a requirement and it proves extremely positive for the languages that are properly supported. Indeed there is no sense in discussing this requirement. Thanks, GerardM 12:36, 8 October 2008 (UTC)

The comments here totally ignore all previous discussion on the talk page about localization requirements. Quite a few people think this should be dropped entirely, so that people can start on their wikis and translate the interface along the way wiki-style (I am aware that Gerard disagrees). Others (including myself) are willing to live with minimal requirements as a test of seriousness and potential success, but consider 500 to be a highly exaggerated number for "most used". Far fewer translations than this are needed for basic usage and editing on average wiki pages and edit pages for a beginning wiki. I have changed the text to reflect this: An entirely new language needs 100 of the "most used" done, while a language that already has a project needs the full set of 500 "most used" done. These numbers are nothing more than suggestions, of course.

It would be great if Gerard could set up a list of the 100 "most used" on Betawiki in addition to the 500 most used. Dovi 05:32, 8 October 2008 (UTC)

There are quite a number of people who argue for complete localisation from the start. The reduction to 500 was a downward compromise at the time. I am not in favour of a further reduction to 100 messages because the "500" were based on a needs analysis. Thanks, GerardM 12:36, 8 October 2008 (UTC)
FYI Betawiki does not consider the promotion to SVN for languages that have less than 50% of the most used messages localised. GerardM 12:45, 8 October 2008 (UTC)
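The thresholds in this exchange can be made concrete. The Python sketch below assumes the "most used" messages form a list ordered by frequency, so that "the 100 most used" means its first 100 entries; the message keys, function names and ordering are all hypothetical, and the numbers 100 and 500 are, as Dovi says, only suggestions. The 50% check mirrors GerardM's description of Betawiki practice as a necessary condition for being considered, not a guarantee of promotion.

```python
from __future__ import annotations


def translated_count(translated: set[str], keys: list[str]) -> int:
    """Number of message keys in `keys` that have a translation."""
    return sum(1 for key in keys if key in translated)


def meets_first_project_requirement(
    translated: set[str],
    most_used: list[str],
    language_has_project: bool,
    new_language_quota: int = 100,
) -> bool:
    """Suggested tiers: top `new_language_quota` messages for a new
    language, the full 'most used' list for a language with a project."""
    if language_has_project:
        required = most_used
    else:
        required = most_used[:new_language_quota]
    return translated_count(translated, required) == len(required)


def may_be_considered_for_svn(
    translated: set[str], most_used: list[str], cutoff: float = 0.5
) -> bool:
    """Necessary condition only: at least `cutoff` of the most used
    messages must be localised before promotion is considered."""
    if not most_used:
        raise ValueError("most_used must not be empty")
    return translated_count(translated, most_used) / len(most_used) >= cutoff
```

With a 500-message list, translating exactly the first 100 satisfies the suggested new-language tier but covers only 20% of the list, well below the 50% mark.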

Dovi, requirements that cover (I think) more than 90% of the basic localizations (three categories, I think) really are "a minimal requirement". Working alone, I moved the Serbian translations from ~60% to more than 90% in a week or two. Three people interested in a project are able to do the job in at most two weeks. --Millosh 14:44, 8 October 2008 (UTC)

Classical languages 2

The fact that extinct languages exist as a Wikipedia is only because they precede the language policy. It does not mean that new ones will be accepted! GerardM 05:48, 10 October 2008 (UTC)

That is your opinion! People consider classical languages acceptable. Crazymadlover

By "extinct" do you mean languages which are hardly ever used or learnt by people other than specialist scholars (such as, say, Hittite), or all languages that no longer have native speakers? The first kind should, of course, not be accepted by Wikimedia, but for the second kind there is a much stronger case to be made. The success that some of these languages have had suggests that quite a lot of people find them useful and learn something from them, so I think that the more widely known languages with no native speakers should be considered viable candidates for a Wikipedia. The exclusion of these languages from the current policy was based on an interpretation of the Wikimedia Foundation's mission with which not everyone agrees; perhaps a new policy should have a different interpretation. LeighvsOptimvsMaximvs (talk) 18:39, 10 October 2008 (UTC)

Classical forms of existing languages are definitely not the same thing as ancient extinct languages that are either ill-attested or don't have any descendants. It is not a coincidence that among the more prolific projects of the incubator are the Ottoman Turkish and the Ancient Greek ones. It is often true that the classical form of a language may have a more significant cultural influence (and prestige, if you like) on the world (or at least in large parts of the world) than myriad other modern languages and dialects that weren't "lucky" enough to have a long literary tradition. The typical examples, apart from the two languages of the 20-centuries old Roman Republic/Empire (yes, I'm counting its Eastern part, too), are Classical Hebrew and Classical Arabic; essentially, the languages that are used today in the Hebrew (he) and Arabic Wikipedias (ar), respectively, are just revived/varied forms of them. Since classical languages are still taught (from middle schools to colleges) in their respective spheres of influence, they are immensely more popular (in the sense that there is a large number of people who understand them) than any marginal constructed language, or regional modern dialect, and of course, than any other language with no influence on the modern world (outside the academic community), such as Ugaritic and Elamite. Omnipaedista 10:53, 11 October 2008 (UTC)
According to the respective Wikipedia article, we should add at least these prestigious ancient languages to your list, Omnipaedista: Sanskrit and Classical Chinese. Crazymadlover
I'm very curious about everyone's opinion here on the "delicate" matter of Middle English and Middle Dutch (a form of Dutch famous for its full-fledged case system). Both are considered archaic (if not classical) versions of modern-day languages, they both have test-projects in the incubator (though dormant), and the ancestral form of one of them already has its own Wikipedia (ang). One can argue that their cultural significance and their educational usefulness are enough to justify the existence/opening of their corresponding Wikiprojects. On the other hand, another one can argue that, even if languages with no native speakers are allowed, the difference between some modern languages, such as these, and their recent (few-centuries old) archaic varieties isn't that great to justify their separate treatment. In my opinion, the issue of enm & dum is just a minor-scale instance of a more general problem encountered in certain European countries: obsolete/written vs. vernacular/spoken form of a language (see Norwegian language struggle & Greek language question). Now, historically, such a harsh struggle never really occurred in the case of English and Dutch, since their vernacular counterparts easily prevailed, but this doesn't mean that there aren't people willing to form an editing community in a Wikiproject written in a more archaick versioun of an already existing one. Any comments? Omnipaedista 21:05, 11 October 2008 (UTC)
In some degree, we can say the same about Finnish. Crazymadlover.
I can't really comment on Middle Dutch, but I'd be wary of Middle English. Now, don't get me wrong: I would like to see a flourishing Middle English wiki, but I don't think it would do well. Middle English has a few problems that would need to be addressed.
Firstly, Middle English is a very broad range of dialects. The English language has undergone, for several reasons, a huge amount of change over the past 1500 years, much of it in the Middle English period: from what I have seen there is *far* more diversity between the dialects of Middle English than there is between the dialects of Ancient Greek. Now, a diverse range of dialects over a long period of time is fine if there is an agreed written standard with (more or less) agreed rules on spelling and grammar, like Attic for Ancient Greek or Old West Saxon for Old English; Middle English doesn't really have this. The nearest there is would be the English of Chaucer, which is close enough to Modern English to be read without much difficulty by a Modern English native speaker; indeed most English literature courses at British universities force students to read Middle English, which is done without a great deal of language teaching*. But although (or perhaps because) the nearest Middle English has to an accepted standard is so similar to Modern English, it has a lot of readers but very few writers. One might even compare it to a non-standard modern dialect of English: many can understand it, but few write in it. As has been seen, the Middle English test wiki has attracted few contributors, and I can't really see a proper wiki being much different.
Secondly, thanks in part to the dominance of French and Latin during the second half of the Middle Ages in England (thankfully, this is not the case for Old English), there has been very little serious prose (by "serious", I mean non-fiction written for non-artistic purposes) written in Middle English compared to many of the other, let's call them, "retired", languages which have been proposed as suitable candidates for a wikipedia. Therefore, I foresee problems when writing about academic subjects; more than when writing about them in Latin or Ancient Greek, and even Old English.
* I know that something similar could be said of Ancient Greek (or at least Classical Attic), which is very close to modern Greek (something which shows how much English has changed over the past 1000 years compared to Greek). However, there are some differences; Classical Greek is more often taught with prose composition and rules of spelling and grammar in mind; Middle English tends to be taught differently. And although I don't have any figures on hand, I can be quite certain that there have been far more non-Hellenophones who have learnt Ancient Greek than there have been non-Anglophones who have learnt Middle English, and who have therefore learnt it as a distinct language, completely separate from any other languages or dialects they know. To put it (hopefully) more clearly and less open to interpretation (lest someone take what I said the wrong way), nearly all people who know Middle English know another dialect which is very similar to it, whereas nowhere near all people who know Ancient Greek can say the same. LeighvsOptimvsMaximvs (talk)

Well, this is interesting. It means that apart from decoupling "Classical" from "Ancient", we should decouple "Artificially-Refined Archaic" from "Natural Classical having a standard version available". Examples of the former are Archaic Dutch, whose case system is considered artificially constructed by "Batavophones" (please confirm this if you are from the Netherlands and reading this), Middle English (since the nearest it has to an accepted standard is too similar to Modern English), Neo-Latin (refined from Vulgar Latin), and Katharevousa. --Parenthetically, I remark that Katharevousa (it means "refined, cleansing") is a constructed form of Greek consisting of (re-introduced) Attic Greek morphology and lexicon mingled with calque phrases and compound words from 19th-century French; Dhimotiki, on the other hand, is the natural evolution from Koine, having a very divergent morphophonology (compared to its ancestral forms) and a lexicon heavily influenced by other Balkan languages. Eventually what became known as Modern Greek is just a "compromise" between these two, and that is why Ancient Greek is not so unfamiliar to a modern Grecophone (not because there weren't many changes in Greek over the centuries)--. On the other hand, there are classical forms of modern languages that are "natural" (not constructed by 20th-century ideologically charged philologists) and that have a more-or-less standard prestigious dialect that can be used as a canon: Old English, Gothic, Old Church Slavonic, Old Norse (?), Latin, Ancient Greek, Sanskrit, Pāli, Ancient Japanese, Ancient Chinese, and Ottoman Turkish (?). In the current policy all three of these instances (I'll dub them Ancient, Classical, Refined) are treated exactly the same, despite their fine differences, thus allowing the "flourishing" of many misconceptions about "dead languages".
Of course, there is still a delicate matter with this new category of "Refined": languages such as Bokmål, Revived Prussian, and the Revived Brythonic ones don't fall easily into a category. Bokmål is perhaps the only known example of an "obsolete" European language that didn't "lose the battle" de jure, and so it is still in use and has wiki-projects in it. As for the Bryth. languages (such as British, Cornish, Pictish, and Cumbric) and Baltic Prus., though they are extinct, and have almost no descendants (unless we controversially count Breton/Welsh and Lithuanian), there are people (both amateurs and linguists) who are, currently, passionately reconstructing them (Old Prus. already has a test-project), and it is very likely that there will be requests for them, if the new policy passes, before we have clarified this point. Should we treat reconstructed Bryth. and Prus. the way we treat Volapük or the same as Klingon? Or if we don't consider them constructed, should we consider them Classical (Latin), Ancient (Hittite), or Refined (Neolatin)? Omnipaedista 13:01, 13 October 2008 (UTC)

As interesting as this discussion is (and I personally consider it absolutely fascinating), it is not relevant to the proposal. The proposal states that classical languages that are widely studied and of major cultural significance are at least eligible in principle. Whether any particular language will actually get a wiki, however, will depend on many of the fascinating questions asked above: How many people can express themselves fluently? What is its relationship to modern languages that already have their own wikis? Does it have an uninterrupted history of study and use (e.g. Latin), or was it extinct for many centuries until reconstructed by modern scholars (e.g. Ugaritic and Akkadian)? Etc. All of these questions are important and relevant to the discussion of individual languages when they come up, but they are not relevant to this policy. Instead of having the policy dig into all of these questions, let's just let the community discuss things with common sense whenever a new language of this type is proposed. Dovi 13:36, 13 October 2008 (UTC)
OK, Dovi, you're right (admittedly, we, the "palaeoglots", got carried away a bit =). On my part, I just wanted all these issues to be raised here (at least, once), not because I believe this is the place to be solved, but in order for the new policy to be as fair and objective as possible. There is a whole spectrum of individual cases (never been mentioned before), that we ought to have in mind, if we don't want any "serotinous" complaints or unpleasant surprises, as was the case with the previous (but still valid) policy. Omnipaedista 15:30, 13 October 2008 (UTC)

Languages already supported without ISO Code 639-3

I added a sentence to the initial proposal. The clause is for languages with active wikis whose code was invented: Võro, Tarantino, Cantonese, Min Nan, etc. You cannot demand higher requirements for languages that already have the backing of the WMF. They will keep using the same invented code they use now. An additional problem: imagine the technical chaos that would occur if they used different codes for new projects. Crazymadlover.

It is not acceptable. Codes should conform to the standards; that is a requirement for using the standard. It also does not conform to the BCP and the related RFCs. Thanks, GerardM 07:50, 19 October 2008 (UTC)
Then what are you suggesting? A massive migration of existing projects?
for example:
Move to
because new projects in Min Nan will have only the "nan" code.
If we follow your criterion, this is what we would have to do for consistent interwiki linking.
I am fine with the renaming of projects and using redirects or having a redirecting page. Most important is the renaming of the projects that are squatting on a wrong code. Thanks, GerardM 23:23, 19 October 2008 (UTC)

Approval Mechanism of Community draft

It seems that the major issues have been solved. This section is for deciding the approval mechanism of the present draft. Crazymadlover

Feel free to suggest the approval procedure:

On the one hand, we could have a vote, here or elsewhere. We could re-open discussion on the Foundation-l mailing list using the current version[9]. Or else, since it was already widely advertised and discussed, the Language Committee could take this as an acceptable, thought-out version with community consensus and simply adopt it. Dovi 20:41, 8 November 2008 (UTC)

3000 system messages!

My views on localization-as-a-prerequisite have already been stated above (regarding the requirement for the 500 most-used messages). Nevertheless, I consider it a reasonable compromise.

However, Pathoschild recently added a huge requirement for all subsequent languages: over 3000 (!) message translations (all messages plus all extension messages used in Wikimedia). To my mind this is way beyond anything even remotely reasonable. Dovi 09:39, 25 November 2008 (UTC)

Pathoschild did no such thing. The requirement for the localisation of all MediaWiki messages and all the messages used by the Wikimedia Foundation has been part of the language policy for a very long time. The rationale is that once the most used messages have been localised, the work needs to continue with the other messages as it is in the interest of the success of the projects in the language that the localisation is maintained and expanded. When the first project has reached a certain maturity, it will be mostly applying finishing touches to meet the requirement. GerardM 10:06, 25 November 2008 (UTC)
Gerard, that was a huge change of the policy draft as it stood. I didn't actually revert the change (I wouldn't do that without discussion), but I made what it actually means on a practical level far clearer. I will insist that the approximate numbers remain in the draft so that people can evaluate it properly. Dovi 10:59, 25 November 2008 (UTC)
You are wrong. There is no change in the policy. It has been clear practically that a lot of work needs to be done for a subsequent project when the primary project did not work on the localisation. The amount of work grows as MediaWiki grows. GerardM 14:23, 25 November 2008 (UTC)
You are always so sweet... I wrote that it was a change in the policy draft, and it was. Since you have decided unilaterally that numbers which may change should not be reflected in the draft of a policy, I will remove entirely the link to a list of messages that is always changing. The policy should not depend on constantly changing lists collected outside of a Wikimedia project. Instead, I will add wording stating that the number of messages to be translated will be decided by the community. Dovi 15:08, 25 November 2008 (UTC)
The software that is used on the WMF servers is decided by the WMF. Betawiki only reflects these numbers and the status of the localisation. It is normal not to include numbers that are variable. Nevertheless, it is over the top to replace what is current practice without discussion. If anything, you should get consensus first if you want to remove this. For your information, the compulsory nature of the localisation has dramatically improved usability for many languages. Consequently, your notion that localisation is a chore (true) that needs to be removed (debatable) is where you have to find consensus. Thanks, GerardM 15:25, 25 November 2008 (UTC)

Gerard, this is a community draft. Even though overall it does not differ greatly in spirit or in practice from the current policy, it nevertheless does not have to agree on all points. There was rough consensus (or at least accepted compromise) between those who drafted this page on the localization requirements (see plenty of discussion above) until the recent moves by Language Committee members.

Since you (as usual) brook no compromise nor any real discussion, I will revert the page back to what it stated before Pathoschild's edit. If and when there seems to be consensus on this talk page that his move was correct then the change can be made again. If you revert unilaterally once again I will consider it in bad faith. I will not revert further, but I think your domineering and uncivilized conduct in everything related to language issues deserves specific community discussion, in this forum or in another. Dovi 15:35, 25 November 2008 (UTC)

Just check this out and you will see that you are wrong and not willing to listen either. Sad. Thanks, GerardM 16:52, 25 November 2008 (UTC)

I was indeed wrong. The addition of full localization seems to have been a result of this message and this edit. I apologize to Gerard (sincerely) for the mistake.

It has indeed been in there for a while. In my opinion it goes far beyond what I thought was the talk page compromise, namely that the "most used messages" would provide a reasonable test of a wiki's viability. But apparently I didn't keep up with the text well enough on this point, and for that I can only blame myself. Sorry, Gerard. I will leave the text as it is unless there is further support here on the talk page (preferably in this section) for lesser requirements. Dovi 04:13, 26 November 2008 (UTC)


Given discussion over the past few months on the mailing lists, I suggest that the policy also contain a commitment to transparency. Unfortunately, there are people who are disappointed at the lack of complete transparency in the language process.

All discussion of each specific language proposal based on this policy, both community discussion and Language Committee communications, will be done in an open and transparent manner. No aspect of such discussion will be kept private.

Transparency is closely related to accountability. Accountability is essential for anyone who takes responsibility for community affairs. Among other things, it means not being able to say "I'm right because I'm right." Dovi 10:56, 25 November 2008 (UTC)

Implemented. Crazymadlover
There are known reasons why the communications on the language committee list are not public. The question is whether we lose expertise or restrict public access. Sadly, we have this quandary. Thanks, GerardM 18:38, 25 November 2008 (UTC)
Subcommittee discussion is archived to Language subcommittee/Archives, excepting two members who have not agreed to public archival. —Pathoschild 17:04:43, 25 November 2008 (UTC)
I had hoped for discussion before actual implementation... This is indeed about the two members who have not agreed to public archival. For a community draft, the question is whether the community accepts that lack of agreement or not. Let's see if there is any support for adding to the text or not. Dovi 04:17, 26 November 2008 (UTC)
A transparency clause has been added to the langcom page. --- Crazymadlover.
That section has been there all along; the issue Dovi raised is that two members do not agree to transparency. —Pathoschild 10:16:15, 18 December 2008 (UTC)
Dovi, I think this is not the correct place to discuss that; the langcom talk page is. Crazymadlover.

Any objection against Latin?

A proposal for Latin Wikinews has recently been rejected [10]. Can anybody please explain why Latin should not be included? If everybody agrees that Latin deserves to be represented, why not change the current policy to allow Latin? For example, it has been proposed to change the native-speakers requirement to a fluent-speakers requirement.--Nxx 02:35, 4 December 2008 (UTC)

The main objections are that classical languages can't have wikis other than 'quotes and 'sources, and that they can't render modern concepts. The first one can very easily be rejected once the current draft gets adopted. The second one is, i.m.o., de facto rejected as well: Vicipaedia has proved that Contemporary Latin (using lexical input from Vulgar Latin, Neo-Latin, Classical Greek, and the various modern Romance languages, modified to fit its Classical Latin core) can express pretty much everything. So, if you are among the people who want the Vicinuntii project to open, all you have to do is help promote the community draft; provided that it gets accepted and that the test project on the Incubator is successful (that is, it has a healthy editing community), there will be virtually no reason for a new Wn/la proposal to be rejected. Omnipaedista 15:57, 6 December 2008 (UTC)