AI & Copyright Litigation Updates
Copyright is a big deal in libraries. In 2023 and 2024, many authors and other creatives sued AI companies such as OpenAI, Meta, Anthropic, and Nvidia over claims related to copyright. Some of these people you may have heard of: John Grisham, Jodi Picoult, David Baldacci, Abdi Nazemian. There are even more you may not have heard of unless you follow AI litigation updates: Karissa Vacker, Mark Boyett, Stewart O’Nan, Brian Keene. Big names are alleging that their books, art, voices, music, and videos are being used to train AI software without their permission, in violation of their copyright, but the practice is purportedly affecting creators at all levels of fame and success. News agencies are also suing AI companies for allegedly using their protected content to train AI without permission.
AI training is the process by which sophisticated software analyzes large amounts of data to create a model that can predict patterns in that data. For text generators, the software identifies words or word parts (tokens) that are commonly used together in particular contexts, such as when certain creators discuss specific topics. Often, the models are then rigorously tested by human trainers, who monitor results and mitigate potential harms, for example by suppressing violent or explicit language. The software can then combine the tokenized data with those guidelines to generate outputs (answers) for users. As AI models are used to generate more and more content, a growing share of the available training data is itself data that AI tools created. The quality and diversity of training data are crucial, as they directly influence how well the AI can respond to unusual and varied requests. Therefore, human- or environment-created data are preferred to AI-generated data for training.
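For readers who like to see an idea in code, here is a toy sketch of what "tokens that are commonly used together" can mean. It is only an illustration under simplifying assumptions: it treats whole lowercase words as tokens and simply counts which word follows which, whereas real AI models use word parts and far more sophisticated statistical methods. The function names and the three-sentence sample corpus are made up for this example.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Simplifying assumption: a token is a lowercase, whitespace-separated word.
    return text.lower().split()


def train_bigram_model(corpus):
    # For every token, count which tokens immediately follow it in the corpus.
    model = defaultdict(Counter)
    for document in corpus:
        tokens = tokenize(document)
        for current_token, next_token in zip(tokens, tokens[1:]):
            model[current_token][next_token] += 1
    return model


def predict_next(model, token):
    # Return the most frequent follower of the token, or None if it was never seen.
    followers = model.get(token.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical miniature "training data".
corpus = [
    "the library lends books",
    "the library lends ebooks",
    "the library hosts events",
]

model = train_bigram_model(corpus)
print(predict_next(model, "library"))  # prints "lends"
```

In this tiny corpus, "lends" follows "library" twice and "hosts" only once, so the sketch predicts "lends". It also shows why the training data matters so much: the model can only echo patterns found in the text it was given.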
Over the past two years, I was overwhelmed trying to follow the constant news stories about people objecting to their content being used to train AI models. It seemed that every week there was a new headline-making case. But where are the rulings? What has happened to all the hubbub? Allow me to provide a quick and non-legal summary from a librarian’s perspective and then point you in the direction of some great resources if you are interested in following these cases as they unfold. First, we do have two pertinent rulings in AI-related cases: in March, the U.S. Court of Appeals for the D.C. Circuit ruled in Thaler v. Perlmutter that a non-human machine cannot be the author of a work submitted for copyright registration, and perhaps even more significantly, in February, the U.S. District Court for the District of Delaware ruled in Thomson Reuters v. Ross that Ross’s use of Thomson Reuters’s copyrighted works to train its AI legal research tool infringed the owner’s copyright and was not fair use. That is by no means the end of the story.
It is evident that we are in the case consolidation phase of many of these significant proceedings. With AI litigation, as Bruce Barcott explained in an article for Tech Policy Press, cases tend to originate either in the Northern District of California (NDCA), which is home to Silicon Valley and tends to favor the tech companies, or in the Southern District of New York (SDNY), where publishing houses traditionally do business. Several major cases have transferred court districts and have been consolidated into class action lawsuits, if they weren’t already. Even cases with different AI company defendants have been combined, such as an SDNY case that recently combined 12 other cases, some with OpenAI as the defendant and some with Microsoft as the defendant. Many cases are in the complaint and answer phase, such as The New York Times v. Microsoft & OpenAI (SDNY) and Concord Music Group, et al. v. Anthropic (NDCA), while others are in discovery, such as Sarah Andersen v. Stability AI (NDCA).
We will need to continue to be patient as these cases progress. In the meantime, there are some excellent resources created by experts that anyone with an interest can follow for updates. The AI Litigation Tracker by Peter Csathy and McKool Smith is about as current and comprehensive as it gets, and it is currently tracking 18 cases. Wired has a handy-dandy spreadsheet tracking 28 cases. Ropes & Gray and Kirkland & Ellis publish periodic updates about AI litigation. CourtListener tracks case documents and districts for all cases. If you want reporting that advocates for creators retaining their copyright, the Copyright Alliance has a lot of resources and reporting on active copyright litigation and rulings. Finally, if you want to see a selection of official court documents, I suggest visiting the cases of interest pages for SDNY and NDCA.