Wikilegal/Lexicographical Data

From Meta, a Wikimedia project coordination wiki

Background

Lexicographical data is the information about a word provided in a dictionary, such as its meanings and pronunciations. This data is usually arranged in alphabetical order. The practice of selecting, arranging, and alphabetising words to compile a dictionary is called lexicography. This article examines which components of lexicographical works might be subject to copyright protection.

A dictionary has two main structures, the macrostructure and the microstructure. The macrostructure is the gross structure that lays out what is included in the dictionary, such as the preface, the table of contents, and similar overall organization, which varies from author to author. The microstructure refers to the individual entries under the macrostructure, for example, a word list of the dictionary where the author chooses which specific words to include or exclude. Within an individual entry, the lemma (plural: lemmata) is the heading of the word and is usually bolded. An author can also choose to provide context for meanings (such as how a word is used, or whether the word is archaic), which is called the pragmatic information in a dictionary. From a copyrightability perspective, the lemmata and pragmatic information can be thought of as an approach to expressing factual data.

As we detail below, dictionaries generally lack copyright in the alphabetical arrangement of their words, but the choices about what to include in the dictionary and its visual arrangement (the macrostructure and microstructure) may be copyrighted, which means that a typical dictionary probably cannot be photocopied in full. Many elements within the dictionary are not copyrightable, however, and can be copied individually, while other specific elements may be protected by copyright. In other words, many parts of a dictionary can be freely copied as factual information, but one generally cannot make an identical copy of the whole thing.

General rule of copyrightability of facts

Under the US copyright regime and other similar regimes around the world, facts are not copyrightable. The principle that facts are not protected was discussed in a landmark case, Feist Publications v. Rural Telephone Service. The decision shaped the way copyright protection is understood at present. The principle stems from the copyright regime’s underlying intention that an author’s creative and original expression should be rewarded with protection of that work, whereas working hard to gather or compile information does not receive copyright protection just for the effort. It follows that even though facts may not be copyrightable, the creative expression of those facts can be copyrighted. Facts can be expressed in multiple ways, and one of those ways is to arrange factual information in a creative manner.

Copyrightability

Most lexicographical data is factual in nature and, therefore, cannot be copyrighted as a standalone work. However, an author of a lexicographical work (i.e. a dictionary) can claim copyright protection on the organization or other unique and creative choices they make to present the data. These organizational and choice elements would be called the “expression” of the data under copyright law. Thus, the core question when looking at the elements of a dictionary is whether an element depends on the author’s creativity or not. If it does, it may be copyrighted, but if it doesn’t, then it’s not possible to copyright that element. In a typical dictionary, this means that the author’s way of writing the meaning of words may be copyrighted, but other elements such as the word list, the part of speech or grammar, and the pronunciation guide generally will not be.

The organization of words in alphabetical order typically will not be creative enough to be copyrighted, barring some very unusual choice or arrangement by the dictionary author.[1] To understand whether a compilation or arrangement of factual data can be copyrighted, arrangement theory is used. According to arrangement theory, the manner in which factual information is arranged decides whether the work will be awarded copyright protection. Creative arrangements will lead to copyright protection, while uncreative arrangements will not. The US Supreme Court, in Feist Publications v. Rural Telephone Service, held that an alphabetical telephone directory could not be copyrighted because it lacked a creative element. Because a dictionary is arranged using the same alphabetical scheme as a telephone book, its arrangement typically will not be creative enough to receive copyright protection.

A literary work is defined under US copyright law as “works, other than audiovisual works, expressed in words, numbers, or other verbal or numerical symbols or indicia, regardless of the nature of the material objects, such as books, periodicals, manuscripts, phonorecords, film, tapes, disks, or cards, in which they are embodied.”[2] Under this broad definition, the meaning of a word can be considered a literary work and can therefore qualify as copyrightable. Copyright protection will depend on the manner in which the author chooses to define a word: if an author defines a word in a creative manner, there can be a copyright on that definition. An analogy discussed by the Supreme Court in the Feist case was that of census takers. Just as census takers do not create data and simply record it, dictionary authors do not “create” meanings. The thin line that separates the two is that the author of a dictionary can exercise creativity in writing the meaning of a word. This means that other people can paraphrase a dictionary, but they cannot copy the exact wording of creative definitions.

For example, in James T. Richards v. Merriam-Webster, the court held that a dictionary is the result of creative processes that reflect the choices and opinions of the dictionary’s developers.[3] Therefore, verbatim copying of an entire dictionary can constitute infringement.

A meaning can be copyrighted whenever the author exercises creativity. However, there are two instances in which a meaning may not be protected by copyright.

The first scenario is when a meaning is so short that the creativity needed for copyright protection is not possible. Under the US copyright regime, whether short phrases can be copyrighted is an ongoing debate, but creativity and originality play a major role in deciding copyrightability. In a dictionary, meanings and other small elements can sometimes be deemed short phrases for the purposes of copyrightability. This can act as a major defense to infringement because short phrases are not generally thought of as creative.[4] Having said that, a definition of a word is, in most instances, not as short as a phrase like “Let’s go Thunder”[5] (which was not copyrightable), so dictionary definitions do not clearly fall under the short phrases exception. Prominent authorities on copyright law, Melville B. Nimmer & David Nimmer, have also discussed cases in which the copyrightability of a phrase like “Let’s go Thunder” was challenged, and concluded that exercising creativity in short phrases can be challenging. In Syrus v. Clay Bennett, the Tenth Circuit ruled that an ordinary phrase cannot be creative for the purposes of copyright protection. An ordinary phrase, in this sense, is a short phrase that a person might use in day-to-day conversation.

The second scenario is when an author defines a technical jargon term or a similar fixed or standard expression. Fixed expressions are words and phrases that are commonly used together and are defined in a specific, standard manner. An author cannot claim copyright on the meanings of these words because their nature leaves little or no room for creative choice in defining them. This is likely also true in countries other than the United States that use similar rules for determining the level of creativity necessary to obtain copyright.

Copyright in a word’s meaning will depend on these factors. Some words may not have copyrightable definitions because they are ordinary phrases or fixed expressions, while others will be creative enough to be copyrighted. It is therefore better practice not to copy a dictionary verbatim, and to select words to copy according to their nature and how they are commonly defined.

Copyrightability of other elements in a dictionary

As discussed above, a dictionary has other essential aspects as well, such as the lemmata, layout, pragmatic information, collocations and typology. Some of these aspects are copyrightable and some are not. The rule for judging the copyrightability of these parts is the same: whether the expression involves any creative and original elements.

A lemma is the bolded heading before the meaning of the word and cannot be copyrighted for two reasons. First, it is standard practice for authors to include the pronunciation and annotation of the word being defined, and second, an author does not “create” this information. Similarly, collocations are words that are conventionally used together. For example, an author cannot render “heavy rain” as “powerful rain.” The author therefore cannot exercise creativity and must follow the standard usage.

On the other hand, aspects of a dictionary such as the layout, pragmatic information and typology are copyrightable in nature. An author exercises discretion over how a reader will view the various elements associated with a word, and in configuring the artistic layout of a dictionary.

Conclusion

There can be a copyright on the definitions of words as long as they are creative. In many cases, it is safe to assume that a developer or an author would be meeting the creativity requirements, so a photocopy of an entire dictionary would most likely violate copyright. In particular, the likelihood of the definition of many words being copyrightable is high.

The table below, primarily based on a similar chart from Thierry Fontenelle’s paper “From Lexicography to Terminology,”[6] lays out the various elements of lexicography discussed above. It gives a better understanding of what in a dictionary may be copyrightable, before someone copies a work which uses protectable lexicographical data. A set of term definitions follows the table to help with understanding the terminology.

Terms and Lexicography
Copyrighted | Not Copyrighted
Microstructure and Macrostructure (unless required to be a certain way for a particular language so that the author has no choice about the arrangement) | Lemmata
Definitions with room for creativity | Definitions of jargon or words with fixed expressions
Pragmatic information | Grammatical information
Encyclopedic information and example sentences | Collocations and fixed expressions

Word in the Table | Meaning
Microstructure | The layout and arrangement of each entry within each category. For example, choosing to present the word bolded, followed by a pronunciation guide first, then grammatical information, then the definition, and then choosing to use an example sentence.
Lemmata | Headings of the definitions, usually the bolded part in the dictionary, including pronunciations and annotations.
Macrostructure | Gross structure of a dictionary (not the arrangement in an alphabetic manner). For example, the structure of inserting the table of contents, followed by entry lists and so on.
Collocations | Words which go together and have to be defined together in a dictionary. For example, “heavy rain” cannot be phrased as “powerful rain.”
Pragmatic information | Context provided for the meaning of a word. It tells the reader how the author wants to communicate meanings.
Fixed expressions | Terms and expressions that are essential to a dictionary and common to all dictionaries. For example, “all of a sudden” can only be defined in a specific manner, which is common to all dictionaries.

References

  1. Feist Publications v. Rural Telephone Service, Page 18, [***380].
  2. 17 U.S.C.A. § 101, Definitions, Literary Works. See https://www.law.cornell.edu/uscode/text/17/101.
  3. James T. Richards v. Merriam-Webster, 55 F.Supp.3d 205.
  4. Melville B. Nimmer & David Nimmer, Nimmer on Copyright, § 2.01[B].
  5. Syrus v. Clay Bennett.
  6. Thierry Fontenelle, From Lexicography to Terminology: a Cline, not a Dichotomy, Translation Centre for the Bodies of the European Union, Luxembourg.