Wikilegal/A changing legal world for free knowledge

From Meta, a Wikimedia project coordination wiki

When the Wikimedia Foundation board passed a resolution in 2009 about the biographies of living persons, there were very few legal requirements for hosting Wikipedia. Biographies were important because false information about a specific person could hurt them and, in the worst case, might lead to a defamation case against an editor. The Foundation had duties to handle copyright violations under the DMCA and a very rare duty to handle criminal matters (primarily people occasionally trying to upload illegal child sexual content).

Fast-forward to 2023, and things look quite a bit different! While biographies and copyright remain the main source of issues, the range of laws, the expected role of the website host, and the kinds of things countries around the world will do to enforce their laws are quite different than they used to be.

This essay aims to give the reader an overall understanding of what legal duties and risks apply to any entity that hosts a major global website (and particularly one that’s primarily made of user-generated content) in 2023. It covers in turn 1) why more laws of more places apply to web hosts, 2) a non-comprehensive look at the sort of legal rights that apply to article subjects, and 3) a similarly non-comprehensive look at rights available to users and other legal compliance obligations for website hosting besides article subject complaints.

Applicable law

The law that applies to a given website is not always consistent or fully clear, and entire law school courses are regularly taught on how to determine applicable law in an international context. For the sake of this essay, we look at two aspects: Jurisdiction and Enforcement. Defined roughly, jurisdiction refers to the willingness of the courts in a country to hear legal cases, while enforcement refers to the capabilities that the courts and government agencies of a given country have to actually compel obedience to their decisions. Jurisdiction can be thought of as “the will to do something” while enforcement can be thought of as “the power to do something.”

Jurisdiction

Starting with jurisdiction: traditionally, it was based on physical geography. If someone running a business in Kentucky in the United States ordered something from Colombia and then wanted to file a lawsuit when it never arrived, there might be some complexity as to whether the matter could be heard in Kentucky or Colombia, but those would be the only two options on the table. If the Kentucky resident decided they wanted their case heard in the UK, the UK courts would dismiss the matter for lack of jurisdiction over the parties to the case.[1]

Initially, internet jurisdiction was based on the location of company HQs and on whether the company hosting the website had engaged in some sort of activity to “purposefully avail” itself of another place.[2] Absent some kind of specific targeting activity, simply hosting a website accessible globally did not mean that you could be sued globally. This was a reasonably effective way to let people feel safe following the law where they lived without needing to learn the law of every state on the planet.

However, over the years, most judges around the world have expanded the idea of “targeting” to include very general characteristics. For example, France generally accepts jurisdiction over legal disputes involving any material written in French as targeted at France, despite the fact that French is spoken officially in twenty-eight different countries (presuming the Wikipedia article on the topic is up to date as of this writing). Many judges have begun using the fact that a plaintiff indicates they were harmed as the basis of jurisdiction: thus the fact that a website like Wikipedia has an article about someone can be enough in many countries for their courts to decide they have jurisdiction if the article subject resides there.

Finally, a few recent laws bypass the traditional judicial jurisdiction issue entirely by including special provisions. These include the GDPR[3] in the EU, the DSA in the EU, India’s 2021 IT rules, Indonesia’s MR5 regulation, and likely several others, with more expected to come. Many of these laws base jurisdiction on a website having users within the territory the law covers, or on people within the territory being able to create a user account on the website. These laws greatly expand the traditional scope of internet jurisdiction and come unsettlingly close to a rule that simply hosting a website accessible around the world does make you subject to the laws of all the countries in the world.

Enforcement

While jurisdiction is all well and good, without enforcement power of some kind, a country’s courts that hear a case can only make recommendations to websites and hope that they will agree to comply voluntarily.

Enforcement is somewhat more ambiguous than jurisdiction. Historically, as with jurisdiction, it related to geographic territory. If you had your property (whether it be land, money, or physical goods) in a place, courts in that place could order it to be taken if they had heard a case against you and determined they had jurisdiction. If you were physically in a place, courts could order you arrested. These remain, even in the technology era, roughly the first port of call: places where money, equipment, or senior staff of a company are physically present tend to be places where that company needs to be conscious of the law because those things are at risk if they ignore the law of such a place. This also means that the more activities a company carries out in more places in the world, the more they must be conscious of the laws in those places.

What some countries have more recently added to their enforcement tools are website blocking and risks to users of websites. The former is fairly well-known and documented in the case of Wikipedia, although threats of blocking have increased in 2022-23, and several of the newer jurisdictional laws linked above also contemplate blocking as a potential penalty for website hosts that refuse to follow them.

Risks to users are somewhat more vague and can take several forms. In some cases, these are legal risks, typically accompanied by obligations for the web host to disclose information necessary to identify and sue a user, with the latter data obligations themselves being the subject of a jurisdictional and enforcement analysis. For example, Indonesia’s MR5, linked above, contains a requirement for websites subject to it to allow Indonesian law enforcement to access their data upon request. This has been widely criticized because of the risks it creates for the harassment or intimidation of users.[4] In other cases, there may be risks of harassment or reputational risks more generally, for example via doxing or hounding, or through questioning by government authorities if the user’s real identity can be found.

Combined, these broader jurisdiction and enforcement risks make it more difficult to effectively host a website in 2023, requiring more resources to protect users and to protect the reliability, accuracy, and quality of content hosted on a site.

Complaints by article subjects or recipients

The nature of complaints has also broadened in some important ways. Historically, most major websites were headquartered in the US and protected by its laws, jurisdiction was limited, and the US was the primary country regulating the internet. Because of the protections provided under US law, as noted at the start of this essay, for a long time, copyright was the primary issue for hosting websites: the host needed a DMCA address and a way to remove copyright violations. Everything else was not the website host’s problem[5] or was extremely rare in comparison to overall user contributions.

The right to be forgotten

However, several laws and court rulings have expanded both the subject matter of complaints by article subjects and the way those complaints are expected to be evaluated. Arguably the most significant has been the right to be forgotten and similar laws. While originally applying only to search results, the idea of the right to be forgotten has been applied in several contexts, including requiring some newspapers and archives to actually remove names or personal descriptions from digital copies of their archival material. In many cases, plaintiffs and courts around the world have combined the claims of defamation and privacy, offering an argument that reads something like “this material is false and hurts my reputation. Further, even if it’s not false, it’s too old to be relevant anymore and the harm to my reputation far outweighs the benefit to the public.” We have seen this type of claim with increasing frequency in cases related to Wikipedia, and courts have increasingly been sympathetic to it.[6] As Wikipedia continues to age and some articles become less relevant or their sources disappear from the internet,[7] it will be valuable for Wikipedia communities to have a process to evaluate article subject complaints for age of article, quality of still existing sources, and public importance in order to head off lawsuits of this nature. One could phrase this question as: are there times when notability expires?

“Public Order” and similar claims of government information control

Some countries have increasingly begun to view the internet as a source of potential discord and social unrest and have passed content laws that give their regulators the power to censor information. India’s IT rules, linked above, include a clause that the government can order content takedowns to protect the sovereignty of India. The EU-wide rules on preventing the dissemination of terrorist content online give some European authorities (many still being decided on as of this writing) the power to order rapid takedowns of content they deem to be terrorist content if it does not fit under exceptions for education and research.[8]

It is likely that the quality of these types of laws depends in large part on the quality of the agencies and courts enforcing them. If they are used very sparingly and can be effectively challenged when used inappropriately, these laws may be an appropriate exercise of government authority. On the other hand, if they are used indiscriminately and the courts are too deferential to whatever government agencies consider harmful, they will likely become a tool of inappropriate censorship. Wikipedia’s block in Turkey is a good example: Lower courts in Turkey ruled in favor of the Turkish government very quickly and, although the Turkish Constitutional Court ultimately ruled in favor of Wikipedia being accessible, it took 2.5 years to do it, during which time Wikipedia remained blocked.

In practice, hosting a website in 2023 means being contacted by a growing number of countries whose law enforcement authorities are monitoring information online and demanding takedowns of material they deem dangerous.

Legal Compliance

The other way that the laws around website hosting have changed is that there are now substantially more rules to follow, many of which are not related to receiving takedown and data demands. Many countries around the world have implemented additional requirements around online product design, user account data, and, more recently, general risk analysis. Many of these requirements are useful, help protect public safety, or help ensure that security and privacy are protected for users generally. In aggregate, they represent a significant new aspect of hosting a website with many users, one that requires new and different staffing and resources than in the past.

As examples, the privacy laws of most regions of the world as of this writing offer some kind of access and/or deletion right for the subjects of data. This means that hosting a website with user accounts requires having (and staffing) a system to answer requests from people about what personal data is held about them and being able to either delete it upon request or explain why it can’t be deleted. App stores are also increasingly requiring this sort of structure contractually, requiring that any app in the app store have various data and deletion options. This is broadly good for everyone, but does make it more expensive to host a website because of the volume of legal requests it generates.

Product design is increasingly subject to similar legal requirements: new features require various types of legal analysis to determine if they comply with relevant laws around data collection and use. Newer laws are also particularly focused on automation tools. Some of these respond to recent concerns around generative AI, while some of the older ones reflect a simpler worry that running too much automation without a method to catch and fix mistakes is bad practice that will unintentionally harm people.

Lastly, the DSA, linked above, is an entirely new legal system that requires broad risk assessments for the benefit of society by the services identified under it as very large platforms. The Foundation has written about what this means for Wikipedia. As these are new rules and will be subject to an entirely new auditing process, we are hopeful that they will produce beneficial ideas for mitigating present and future risks, but we are also aware that there may be some kinks to work out in the first few iterations of the process.

Conclusion

The aim of this essay is to help improve general understanding of the legal requirements and legal work that go into hosting a website with user-generated content and lots of users. Comparing 2023 to the early 2010s, what we see is that hosting a site is likely to result in having to pay attention to more countries around the world than a decade prior, even for the same activity. We also see that there are new legal claims, especially around the right to be forgotten, that can impact the legality of what is written online about a person and may require new evaluations of whether older content is still reliable and important for the public. And lastly, we’ve seen that the legal compliance work needed to maintain and develop a website has increased, with new requirements for hosting user accounts, designing software, and managing societal risks.

As a whole, many of these developments are concerning, though some also offer a promise of improved fairness and transparency across multiple platforms. We hope that this information is helpful for readers in understanding the current legal environment for hosting Wikipedia and the other Wikimedia projects.

  1. For the historically interested, the one exception was piracy on the high seas, in which all countries of the world had jurisdiction to prosecute pirates, if they could get their hands on them. See https://en.wikipedia.org/wiki/International_piracy_law
  2. For more detail on purposeful availment, see https://cyber.harvard.edu/property99/domain/Betsy.html
  3. GDPR was originally passed in 2016, which is arguably not recent in 2023, but it did not go into effect until 2018, and it has taken many years to see how the GDPR’s jurisdiction rules work. Many of them are still subject to debate and ambiguity.
  4. For example https://www.hrw.org/news/2021/05/21/indonesia-suspend-revise-new-internet-regulation
  5. In the US, this protection was the result of a combination of Constitutional protections under the US First Amendment and legislative protections from Section 230 of the Communications Decency Act.
  6. https://diff.wikimedia.org/2014/08/06/wikipedia-pages-censored-in-european-search-results/ and https://diff.wikimedia.org/2016/10/19/petition-right-to-be-forgotten/ An example of a direct case is detailed (without identifying the subject) in our blog at https://wikimediafoundation.org/news/2019/04/11/a-german-court-forced-us-to-remove-part-of-a-wikipedia-articles-history-heres-what-that-means/ where a German court found that our duty as hosting provider included removing article history of material that had been corrected.
  7. While the Internet Archive’s Wayback Machine is useful to prevent linked sources from disappearing, it has dubious value for legal cases. It’s very difficult to know if a source found on the Wayback Machine disappeared from its original source due to negligence, because the original source found it to be inaccurate, or perhaps even because the original source was already sued by the same person coming after Wikipedia and lost or gave up the case.
  8. While not linked in the jurisdiction section, the anti-terrorism regulations are another law written with a broad jurisdiction clause, applying to any website host with a substantial number of European users.