European Commission copyright consultation/Data mining

From Meta, a Wikimedia project coordination wiki


The European Commission is considering modernizing European copyright laws. To get feedback and input on this modernization, the Commission has published a series of questions, and is looking to interested stakeholders (like our community) to answer them. This is a vital opportunity to participate in a dialogue that could have a major impact on copyright laws and the future of the free knowledge movement. More background is available from the European Commission.

We would like to prepare a draft response here, as a collaborative experiment. If we wish to respond, it will need to be finalized before the end of January 2014 (see the proposed timeline).

Welcome to the discussion! Please help by answering the questions below.

Text and data mining

Text and data mining/content mining/data analytics[1] are different terms used to describe increasingly important techniques used in particular by researchers for the exploration of vast amounts of existing texts and data (e.g., journals, web sites, databases etc.). Through the use of software or other automated processes, an analysis is made of relevant texts and data in order to obtain new insights, patterns and trends.

The texts and data used for mining are either freely accessible on the internet or accessible through subscriptions to e.g. journals and periodicals that give access to the databases of publishers. A copy is made of the relevant texts and data (e.g. in browser caches, in a computer's RAM, or on a computer's hard disk) prior to the actual analysis. Normally, it is considered that to mine protected works or other subject matter, it is necessary to obtain authorisation from the right holders for the making of such copies unless such authorisation can be implied (e.g. content accessible to the general public without restrictions on the internet, open access).

Some argue that the copies required for text and data mining are covered by the exception for temporary copies in Article 5.1 of Directive 2001/29/EC. Others consider that text and data mining activities should not even be seen as covered by copyright. None of this is clear, in particular since text and data mining does not consist only of a single method, but can be undertaken in several different ways. Important questions also remain as to whether the main problems arising in relation to this issue go beyond copyright (i.e. beyond the necessity or not to obtain the authorisation to use content) and relate rather to the need to obtain “access” to content (i.e. being able to use e.g. commercial databases).

A specific Working Group was set up on this issue in the framework of the "Licences for Europe" stakeholder dialogue. No consensus was reached among participating stakeholders on either the problems to be addressed or the results. At the same time, practical solutions to facilitate text and data mining of subscription-based scientific content were presented by publishers as an outcome of “Licences for Europe”[2]. In the context of these discussions, other stakeholders argued that no additional licences should be required to mine material to which access has been provided through a subscription agreement and considered that a specific exception for text and data mining should be introduced, possibly on the basis of a distinction between commercial and non-commercial.

Question 53

53) (a) [In particular if you are an end user/consumer or an institutional user:] Have you experienced obstacles, linked to copyright, when trying to use text or data mining methods, including across borders?

(b) [In particular if you are a service provider:] Have you experienced obstacles, linked to copyright, when providing services based on text or data mining methods, including across borders?

(c) [In particular if you are a right holder:] Have you experienced specific problems resulting from the use of text and data mining in relation to copyright protected content, including across borders?

Yes

  • Your name here

No

  • Your name here

No opinion

  • Your name here

Comments

Instructions: If yes or no, please explain.

  • ...

Proposed Foundation answer

I respectfully disagree with the premise of the Free Knowledge Advocacy Group EU's answer, so I propose the following significant change. -- LVilla (WMF) (talk) 03:44, 31 January 2014 (UTC)

The Wikimedia projects demonstrate both the risks of current data mining policy in the EU, and the success of the rest of the world's policies on databases and database rights.
The risks to us stem from the uncertainty around database and data mining rules in the EU. This makes it extremely difficult for the communities who are creating our data sources (such as Wikidata and Wikipedia) to understand when they can or cannot use a given data source, as facts that seem unprotectable on their face may implicate other rights that are not obvious from the data themselves.
On the flip side, Wikipedia (and soon Wikidata) is among the most widely mined and analyzed data sources on the planet. This has occurred because of our commitment to making this information freely available, and it demonstrates that creativity and innovation are compatible with a scheme that reduces barriers to participation rather than increasing "protection".

Question 54

54) If there are problems, how would they best be solved?

Responses

[Open question]


Proposed Foundation response

I respectfully disagree with the premise of the Free Knowledge Advocacy Group EU's answer, so I propose the following answer, based on Kaldari's comment above and the Creativity4Copyright answers. -- LVilla (WMF) (talk) 03:47, 31 January 2014 (UTC)

The EU should avoid creating new rights to protect previously unprotectable information, like the database right and the suggested data mining right. Instead, legislation should formally clarify that copyright does not prohibit data mining (or the use of databases), and that contracts and technical protection measures cannot be used to override that position.

Question 55

55) If your view is that a legislative solution is needed, what would be its main elements? Which activities should be covered and under what conditions?

Responses

[Open question]

  • ...

Proposed Foundation position

I respectfully disagree with the premise of the Free Knowledge Advocacy Group EU's answer, so I propose the following significant change, based in part on the C4C answer. -- LVilla (WMF) (talk) 04:02, 31 January 2014 (UTC)

As noted above, text and data mining should be specifically excepted and allowed, and the database directive should be repealed. In addition, technical protection measures (TPMs) and contracts should not be allowed to override these statutory decisions. This would ensure that vast amounts of information are broadly available to the public and to researchers, which Wikipedia's experience shows will lead to a variety of new uses and means of delivery.

Question 56

56) If your view is that a different solution is needed, what would it be?

Responses

[Open question]

  • ...

Proposed Foundation answer

Based on the Creativity4Copyright suggestions, I propose the following answer for the official Foundation response: -- LVilla (WMF) (talk) 02:51, 31 January 2014 (UTC)

Only a legislative approach can solve the issues faced. See our answers to questions 54 and 55.

Question 57

57) Are there other issues, unrelated to copyright, that constitute barriers to the use of text or data mining methods?

Responses

[Open question]

  • There are privacy concerns with data analysis. A lot of information about individuals can be uncovered through data analysis. There should be strict precautions to prevent human rights from being violated through data analysis. --NaBUru38 (talk) 14:40, 11 January 2014 (UTC)
  • ...

Proposed Foundation answer

Based on the comments above and the Creativity4Copyright suggestions, I propose the following answer for the official Foundation response: -- LVilla (WMF) (talk) 04:06, 31 January 2014 (UTC)

A variety of problems further complicate the use of text or data mining methods. Lack of clarity around privacy rules for data related to individuals, the use of contracts and technical protection measures to impede legally authorized access to information, and the use of proprietary or patent-encumbered data formats can all undermine the promise of data mining.

References

  1. For the purpose of the present document, the term “text and data mining” will be used.
  2. See the document “Licences for Europe – ten pledges to bring more content online”: http://ec.europa.eu/internal_market/copyright/docs/licences-for-europe/131113_ten-pledges_en.pdf