File format policy

From Meta, a Wikimedia project coordination wiki

Original: File format policy

Whereas an essential part of the Wikimedia Foundation's mission is encouraging the development of free-content educational resources that may be created, used, and reused by a diverse community, without restriction, and because we believe that this mission requires thriving open formats and open standards on the web to allow the creation of content not subject to restrictions on creation, use, and reuse, it is resolved that all material, text, multimedia, or software, on Wikimedia Foundation projects must be in a format that is:

  1. Viewable or playable by existing free software tools
  2. Able to be created or edited by existing free software tools.
  3. Defined by an open standard, implementation, or specification not under proprietary control
  4. Not itself subject to material patent-related restrictions on use that are incompatible with free software, nor only able to be authored or viewed by software so restricted.
  5. Not encrypted or otherwise subject to technical protection measures incompatible with the permissions of free content licensing.

where "free software" is software under any licensing terms that meet the Free Software Definition.
where an independently-used subset of the format meets these criteria, even if some files in that format do not (as with PDF and encrypted PDF), files in that subset qualify as acceptable formats under the text of this resolution.

Reworked: File format policy

Whereas an essential part of the Wikimedia Foundation's mission is encouraging the development of free-content educational resources that may be created, used, and reused by a diverse community, without restriction, and because we believe that this mission requires thriving open formats and open standards on the web to allow the creation of content not subject to restrictions on creation, use, and reuse, it is resolved that all material, text, multimedia, or software, on Wikimedia Foundation projects must be in a format that is:

  1. Viewable or playable by existing free software tools
  2. Able to be created or edited by existing free software tools.
  3. Defined by an open standard, implementation, or specification not under proprietary control
  4. Not known to be currently or eventually subject to material patent-related restrictions on use that are incompatible with free software, nor otherwise only able to be authored or viewed by software so restricted.
  5. Not encrypted or otherwise subject to technical protection measures incompatible with the permissions of free content licensing.

where "Wikimedia projects" are the major projects hosted by the Wikimedia Foundation (http://wikimediafoundation.org/wiki/Our_projects): Wikipedia, Wiktionary, Wikinews, Wikiversity, Wikibooks, Wikisource, Wikispecies, Wikiquote, Wikimedia Commons, and MediaWiki

where "free software" is software under any licensing terms that meet the Free Software Definition

where an independently-used subset of the format meets these criteria, even if some files in that format do not (as with PDF and encrypted PDF), files in that subset qualify as acceptable formats under the text of this resolution.

Comments

4. Not itself subject to material patent-related restrictions on use that are incompatible with free software, nor only able to be authored or viewed by software so restricted.

Problematic because it fails to make clear that any patents whose restrictions have been waived for the time being need to have those restrictions waived until the patent expires.

The thing that jumps out at me is the unqualified use of "must". This policy would make it impossible to use content for which there are no free formats (not that I can think of any examples of such content at the moment). Is that intentional? A "where possible" could be added to get around it if it's not intentional. (I'm undecided on whether it would be good to completely ban such material or not.)

5. Not encrypted or otherwise subject to technical protection measures incompatible with the permissions of free content licensing.

Is this the anti-DRM clause? I think so, but I just want to confirm.
Is it right to include "software" in relation to this? Does it mean software that runs on our servers? (Because the projects don't really host software, except for MediaWiki.)

You first need to define what you mean by "Wikimedia Foundation projects". Is it only Wikipedia, Wikinews, etc.? Or does such a policy also apply to http://wikimediafoundation.org/ and Meta (which both host PowerPoint slides), the office wikis, which I'm assuming hold MS Word documents, and things like the Toolserver, which is stuffed full of non-free software?


Looking at the revised version, some thoughts that immediately occur:

  1. This needs to make clear whether it applies to only actual projects, all publicly-viewable wikis, or all wikis. It does: definition of "wikimedia projects"
  2. It doesn't allow the use of non-standardized fallback formats where a standard format is not easily usable for some of our viewers. Would it be inappropriate, for instance, to use <canvas> plus an ActiveX control to emulate it for IE?
  3. Can we use non-standard ad hoc microformats such as rel="nofollow" or X-Forwarded-For? (Obviously, "no" is not the right answer!) There are a lot of de facto standards that aren't under anyone's control at all that we rely on heavily, and that in fact everyone relies on heavily. If point 3 is kept at all, it needs to focus on free and interoperable implementations, not formal recognition by a standards body. Or if it does emphasize formal recognition by a standards body, it needs to say "where appropriate" or "if possible" or some similar dodge. Greg points out to me that it doesn't say "open standard", it says "open standard, implementation, or specification", so anything with an open implementation is fine.
  4. There's at least one non-standard data format we use extensively that has only a single full implementation (although admittedly a GPL one): wikitext. There may be others too. If the board wants to make wikitext specification and regularization a high priority, that would surely be a good thing, but the current wording would ban it outright. Again, point 3 is much too broad in practice. See above.
  5. "Not encrypted" is poor phrasing. It makes it sound like HTTPS might be against the rules.  :) The rest of the point clarifies the intent, but the wording should still be improved to begin with. Perhaps "encrypted or otherwise" should just be dropped from point 5.
  6. What does it mean to "be in a format", for our purposes? Does that mean that this restricts the storage format (seems irrelevant to the stated goal), that it requires everything that's made available to be made available in at least one free format, or that it requires absolutely nothing to be made available in non-free formats at all? The difference between the latter two points is significant (see point 2 of mine above, and other remarks on the mailing list). Also, does it make any distinction between things that are provided ephemerally, only likely to be used for viewing the site, and things that are likely to be downloaded and redistributed? I think that an ActiveX fallback for <canvas> would be unobjectionable, while an option for MP3 downloads of audio files (for users without OGG players) would be more objectionable: one is contributing to the widespread circulation of proprietary formats a lot more than the other, even if both increase compatibility and usability. —Simetrical (talk • contribs) 20:37, 19 October 2008 (UTC)
  7. Perhaps the word "open" in point 3 needs definition. Only OSI-style openness, or is full and public specification enough? Presumably Flash is not an "open" implementation just because it's freely downloadable, but also presumably ISO standards are meant to be "open" despite the fact that they can't even be freely reproduced, let alone modified.
Also, PDF is not necessarily a free format by any reasonable standard. To the best of my knowledge, it's maintained by Adobe, not any standards body; its patents are waived only if you properly implement it (no changes allowed: not up to OSI standards); and its patents are waived revocably (I think). We do not, in fact, currently allow PDF uploads of any kind on public wikis, as far as I know: we use DjVu for scanned documents, and HTML for general text use. (Although Wikibooks people have been asking for PDF export, I seem to vaguely recall.) It might be "free enough" that we'd be willing to use it, given how common and useful it is for some purposes, but it's probably a bad example anyway. —Simetrical (talk • contribs) 20:43, 19 October 2008 (UTC)
Well, okay, actually it's an ISO standard now, I had forgotten that. But it's still heavily patent-encumbered in theory, and in practice not quite unencumbered either (you must obey their developer policies to use it). Plus it has dependencies on particular non-free fonts, as I understand it. It's not a free format the way something like HTML or OGG is, anyway. —Simetrical (talk • contribs) 21:05, 19 October 2008 (UTC)
(modified various points based on re-reading it and some feedback from Greg —Simetrical (talk • contribs) 15:37, 20 October 2008 (UTC))
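
To make the distinction raised in point 6 concrete, here is a minimal sketch of the two stricter readings of "be in a format". It is only an illustration: the `FREE_FORMATS` set, the `Item` class, and both function names are hypothetical, and the set's contents are an assumption for the example rather than a ruling on whether any format meets the five criteria.

```python
from dataclasses import dataclass

# Hypothetical list of formats assumed to satisfy the policy's criteria.
# Illustrative only; the policy text does not enumerate formats.
FREE_FORMATS = {"ogg", "html", "djvu"}


@dataclass(frozen=True)
class Item:
    name: str
    formats: frozenset  # every format in which the item is made available


def at_least_one_free(item: Item) -> bool:
    """Weaker reading: the item is offered in at least one free format."""
    return any(fmt in FREE_FORMATS for fmt in item.formats)


def only_free(item: Item) -> bool:
    """Stricter reading: the item is offered in no non-free format at all."""
    return all(fmt in FREE_FORMATS for fmt in item.formats)


# An audio file offered as OGG with an MP3 download as a convenience option.
audio = Item("speech", frozenset({"ogg", "mp3"}))
print(at_least_one_free(audio))  # True
print(only_free(audio))          # False
```

Under these assumptions, the OGG-plus-MP3 case passes the weaker reading and fails the stricter one, which is exactly where the two interpretations diverge.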