This week: why copyright questions pose an existential threat to major AI model developers, plus an interview with IP lawyer and former Connecticut State Senator Bill Aniskovich.
Did a friend or colleague forward this to you? Welcome! Sign up above for a free weekly digest of all you need to know at the intersection of AI, policy, and politics.
When fair use isn’t fair
Jodi Picoult, George R.R. Martin, John Grisham, and other creatives are suing OpenAI for copyright infringement. Do they have a shot? It’s more complex than you might think.
Industry giants like Google, Meta, and OpenAI train their AI models on billions of web pages, images, and books. Without this immense pool of training data, ChatGPT and other sophisticated generative AI tools wouldn't exist. However, a significant portion of this data is copyrighted, raising ethical and legal dilemmas: these companies harness the creative output of artists and authors without compensating them. While this strikes many people as neither fair nor equitable, AI companies argue that they stand on solid legal ground and point to the considerable benefits consumers enjoy as a result of this technology.
Copyright law provides creators with exclusive rights to their original creative expression. This enables creators to profit from their works, encouraging future art and innovation. At the same time, these rights are not all-encompassing, nor do they last forever. Crucially, the doctrine of "fair use" protects the use of copyrighted material for education, criticism, newsgathering, and other purposes that don't commercially undermine the creator's market.
AI companies claim that it’s “fair use” to train their models with copyrighted content. To defend this claim they’ve made two primary arguments:
Legal: Courts have supported other claims of fair use by digital services, including a prominent case that allowed Google Books to scrape millions of books as long as it made only snippets of text available to the public.
Technical: LLMs and similar models produce responses drawn only from learned patterns of uncopyrightable facts, concepts, and ideas, not the specific expression of any one source.
Creators, on the other hand, would argue:
Training AI models for the benefit of large for-profit corporations is far from the educational or newsworthy purposes for which fair use was intended.
Large models are now more than capable of creating works that compete commercially with the creators whose works were used to train the model.
These fair use arguments are winding their way through the courts, with major class-action lawsuits in publishing, music, and visual art in progress. In the meantime, AI companies are nervous about public opinion souring, political attacks, and punitive regulations. Labor unions and organizations like the Authors Guild are mobilizing against them, and Congress or the Executive Branch might design new copyright rules that would upend the status quo. We're already seeing some action in this direction: Biden's Executive Order calls on the US Copyright Office (USCO) to issue recommendations to the President on potential executive actions relating to copyright and AI, and the US Patent and Trademark Office is currently taking public comment on AI and copyright. Newer entrants like OpenAI and Anthropic look particularly vulnerable: their competitors Google and Meta have enormous proprietary datasets they could leverage if data access were restricted.
AI companies appear to be deploying a three-pronged strategy to deal with this existential threat:
Fight the claims in court.
Lobby heavily to ensure no copyright reform is enacted that would inhibit their ability to scrape the web to feed their models.
License or synthetically generate data, in case 1 or 2 don’t work out.
To better understand these strategies and risks, we spoke with IP lawyer, professor, and former Connecticut State Senator Bill Aniskovich.
Interview with IP Lawyer Bill Aniskovich
Alex & Greg: AI companies seem to think they’ll likely be protected from these copyright claims via existing precedent. Is this your view, or should they be concerned?
Copyright infringement claims are time-consuming, complicated, and expensive. And I am not sure that the new Copyright Claims Board offers any real relief for copyright holders. Not surprisingly, this reality tends to favor the big corporation and not the artist/creator. In addition, the state of the law has been that these cases turn on the application of the "fair use" doctrine, which has generally been good for developer platform companies. The Google Books case is the typical precedent to which folks point.
That said, I think the SCOTUS decision this past spring in Warhol* may very well turn the tables in favor of copyright owners. That case changes what constitutes a "transformative" use of someone else's work, and with the growth of platforms signing licensing agreements (e.g., Adobe Firefly, Shutterstock), I wouldn't be too comfortable if I were a platform developer.
* Ed.: The US Supreme Court ruled in a landmark case that Andy Warhol's silkscreen image, based on a photograph of Prince used without permission, was not fair use, even though Warhol artistically transformed the image, because it was licensed in the same market the original photograph served – magazine publishing – and shortchanged the photographer.
A & G: Last week OpenAI joined the ranks of Microsoft, Adobe, and several other large companies in paying the legal costs incurred by customers who face lawsuits over IP claims related to work they generate using the developer platform. What are the kinds of potential damages they’re likely to incur, and on what timeline?
OpenAI was late to the game of protecting consumers from infringement claims related to their use of generative AI apps and services. Not surprisingly, its “Copyright Shield” program is relatively limited, and that could be a problem for customers who want explicit protection against infringement claims.
Copyright infringement claims are no joke. As far as timelines go, the statute of limitations on these claims is three years from the date of infringement. Under 17 U.S.C. § 504, a copyright infringer may be liable for statutory damages, actual (or what we call compensatory) damages, and in some cases even the profits gained by the infringer. Statutory damages alone range from $750 to $30,000 per work. In addition, removing or altering watermarks or other copyright information triggers liability under the Digital Millennium Copyright Act (DMCA). It's not clear whether the open-source license claims being made against platforms over AI training data, or the "right of publicity" some states extend to an artist's "style," would reach customers. But probably not.
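Ed.: To put the statutory range in perspective, here's a back-of-envelope sketch. The per-work figures come from § 504(c); the corpus size is purely hypothetical, not a claim about any pending case.

```python
# Statutory damages range under 17 U.S.C. § 504(c), per infringed work.
STATUTORY_MIN = 750        # statutory minimum per work, in dollars
STATUTORY_MAX = 30_000     # statutory maximum per work (absent willfulness)

def exposure(num_works: int) -> tuple[int, int]:
    """Return the (low, high) statutory damages range for num_works."""
    return num_works * STATUTORY_MIN, num_works * STATUTORY_MAX

# Hypothetical: a training corpus containing 100,000 registered works.
low, high = exposure(100_000)
print(f"${low:,} to ${high:,}")  # $75,000,000 to $3,000,000,000
```

Even a modest slice of a web-scale training corpus puts potential exposure in the billions, which is why these suits are existential rather than a cost of doing business.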
A & G: Is there a ceiling on the number of lawsuits that can be brought against these companies?
Nope.
A & G: A few weeks ago, we wrote about the implications of an OpenAI-affiliated lobbyist corralling a coalition of think tanks, academics, and public interest groups to urge Congress not to pass any additional copyright regulations for AI companies. What would the potential impact of copyright reform be on these companies, and how much worse would it get for them if there were to be legislative action?
Well, you don't normally circle the advocacy group wagons if there's not a lot at stake. So it's safe to assume AI companies think it could get much worse if Congress embraces copyright reform. In addition to enhancing penalties under the DMCA, I think the Armageddon scenario for AI companies would be a comprehensive licensing requirement. For example, generative AI platforms are trained on huge archives of texts and images. These "data lakes" contain vast quantities of copyrighted material. A statutory requirement that companies obtain permission via mandatory licensing would be costly and time-consuming. Fighting these kinds of claims in court is preferable to the very public process of stopping legislative initiatives, especially if public opinion turns against the platform developers.
A & G: What executive actions could arise from the United States Copyright Office (USCO) recommendations; i.e., how much power does the executive branch really have here?
The language of the Biden Executive Order is extremely broad, bringing within its scope “any copyright and related issues…including the scope of protection for works produced using AI and the treatment of copyrighted works in AI training.”
While Congress and the federal courts have some ability to overturn executive orders, most scholars agree that the President has broad authority under the Article II power to "faithfully execute" the laws. Given the current climate on Capitol Hill and the influence of Big Tech, I would not be surprised to see the "will" of the Executive used here to accomplish virtually any imaginable AI regulation in the copyright context.
Bill Aniskovich is an Associate Professor of Law & Management and the Dean of the Tagliatela School of Business and Leadership at Albertus Magnus College in New Haven, CT, where he teaches courses in IP and cybersecurity law. He served seven terms in the Connecticut State Senate and practices law at Brenner, Saltzman, and Wallman LLP in New Haven. He received his JD from the University of Virginia Law School.
Of Note
Economy
Here’s what we know about generative AI’s impact on white-collar work (Financial Times)
Government
Thune, Klobuchar release bipartisan AI bill (The Hill)
Is Argentina the First AI Election? (The New York Times)
Biden, Xi Agree to Begin U.S.-China Dialogue on Risks of AI (The Information)
Google wants governments to form a ‘global AI corps’ (The Washington Post)
Technology
Why the Godfather of AI fears what he’s built (The New Yorker)
Our approach to responsible AI innovation (YouTube) YouTube will require creators to disclose any synthetic or altered content.
Microsoft is finally making custom chips — and they’re all about AI (The Verge)
Top VC Firms Sign Voluntary Commitments for Startups to Build AI Responsibly (Bloomberg)
Edith Piaf AI-Generated Biopic in the Works at Warner Music (Variety)
Nvidia Develops New AI Chips, Again, to Keep Selling to China (The Wall Street Journal)
GraphCast: AI model for faster and more accurate global weather forecasting (Google DeepMind)
Healthcare Without Health Workers: A Unicorn Pivots to an AI Doctor-in-a-Box (The Information)