Katherine Klosek (ARL): Everyone's trickling in, welcome. All right, we're getting there. Welcome, everyone. Okay, I think we're there, so welcome, everyone. I'm Katherine Klosek, Director of Information Policy and Federal Relations at the Association of Research Libraries, ARL. I want to thank you all for being here, and thanks to Authors Alliance for co-hosting this timely conversation, as we're starting to see decisions made in the courts about fair use and generative AI. The Library Copyright Alliance put out principles on copyright and AI back in 2023, and we knew that litigation would shape the legal landscape for generative AI, and there is a lot more litigation pending in district courts. But today we're looking forward to hearing about the early decisions we've seen so far, and their takeaways for researchers, authors, and libraries, from our expert panelists: Jonathan Band, who is copyright counsel for ARL and the Library Copyright Alliance; Dave Hansen, Executive Director of Authors Alliance; and Yuanxiao Xu, Staff Attorney at Authors Alliance. Thanks again to those who submitted questions ahead of time for the panelists. If you're viewing live, you can use the Q&A function on Zoom to submit your questions for the panel, and we're going to leave time at the end for discussion. So, enough from me. Let's hear from our panelists.
Katherine Klosek (ARL): Yuanxiao, would you mind kicking us off by summarizing the cases, and maybe highlighting where the cases were consistent and where the judges diverged?

Yuanxiao Xu: Thank you. And thank you all for being here today. You've probably noticed that these disputes over AI training are forcing us to ask some very big questions: why do human authors create new works, and how do we advocate for these authors, especially the ones we're counting on to start creating in the future? As you listen and participate in today's discussion, I'd like to ask you to keep those questions front of mind, because the answers affect all current and future authors and creators. That is all of us.

At the center of these cases is fair use, a proud tradition of US copyright law since 1841. Fair use is what allows scholars to quote books and artists to create parodies. It's a First Amendment right designed to ensure that copyright serves human creativity instead of just monopolistic corporate interests.

Now, two opinions handed down just two days apart have become the first court decisions on whether training general-purpose large language models, or LLMs, can qualify as fair use. Let's be clear, first, that these are summary judgments. They are early-stage rulings, which means the courts had to interpret disputed facts in the light most favorable to the plaintiffs, and some of the judges' comments in the issued opinions are just musings, not even holdings. For example, in Kadrey v. Meta, the market dilution theory was never even raised by the plaintiffs in their pleadings, and its ten-page-long analysis, taking up a quarter of the entire opinion, had no bearing on the judge's ultimate holding at summary judgment. But even these musings matter, because how we interpret, critique, and respond to them will shape how copyright evolves for all authors and for the public.

I want to really thank Katherine for convening this panel. These are exactly the kinds of conversations we need right now. I also want to make sure we're all on the same page about what these cases are actually about, so I'll start by briefly explaining how large language models are trained, and then I'll walk through the two cases. When we talk about training a large language model like Claude or Meta's Llama, we're referring to the process of feeding huge amounts of text to an algorithm. The model uses this text to detect patterns in language and predict what comes next in a sentence. The result is not a searchable database. What's produced is an LLM made up of numerical weights and parameters, a system that can generate language based on statistical patterns, not stored passages.

Now, with that basic understanding in mind, let's look at the two lawsuits that are putting this process under legal scrutiny. The first case is Bartz v. Anthropic. Three authors are suing Anthropic for using their works to train Claude models, but they are not just suing for themselves. The court recently certified a class that could include millions of authors whose works were found in two shadow libraries, LibGen and PiLiMi. Whether class certification on that scale is problematic is a question we can maybe discuss later.
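For readers who want a concrete picture of the next-token training Yuanxiao describes, here is a toy sketch. This is emphatically not how Claude or Llama are built (they train neural networks with billions of parameters rather than counting tables), but it illustrates the core idea the courts engaged with: the trained artifact stores statistics about which tokens tend to follow which, not the passages themselves.

```python
from collections import Counter, defaultdict

def train(corpus_tokens):
    """Count which token follows which; the 'model' is just these statistics."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(model, prev):
    """Return the most frequent next token after `prev` in the training data."""
    return model[prev].most_common(1)[0][0]

# A tiny "corpus"; real training corpora contain trillions of tokens.
tokens = "the cat sat on the mat and the cat slept".split()
model = train(tokens)
print(predict(model, "the"))  # "cat" ("cat" follows "the" twice, "mat" once)
```

The original sentences cannot be read back out of `model`; only frequency patterns survive, which is the sense in which an LLM is not a searchable database.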
Yuanxiao Xu: The second case is Kadrey v. Meta. It involves 13 authors suing Meta for using their books to train Llama models. Both of these lawsuits focus exclusively on the training of LLMs, not their possible outputs, so not the specific text these AIs are capable of producing. The plaintiffs in both cases argue that the training itself is infringing, and that the training will eventually lead to a flood of AI-generated texts that indirectly compete with their original works.

Now that we've covered the basics, I want to turn to what has probably gotten the most public attention in both of these cases: the allegations of piracy. You've likely seen headlines claiming that Meta and Anthropic stole millions of books. Here's what we know from the pleadings in these cases. In Bartz, Anthropic allegedly obtained millions of books, some pirated, some lawfully purchased and then, according to the court, destructively scanned, and they used these books to build a massive internal library. From there, subsets were cleaned, tokenized, and used for model training. The court found it troubling that Anthropic couldn't confirm which of the seven million pirated titles were actually used in training. From the judge's perspective, at least some of the copying may have lacked a fair use justification, because those copies were never intended for training.
Yuanxiao Xu: What made matters worse for Anthropic is that it refused to explain its internal data usage during discovery. So the court didn't waste its time speculating about what these pirated books may have been used for, and simply said there's no justification to call it conclusively fair use. By contrast, Meta apparently used fewer pirated books, but, worse for them, they uploaded these pirated books through a peer-to-peer file-sharing system. This is a whole different red flag. The Kadrey court speculated that Meta's actions may have supported the piracy ecosystem itself, which could be relevant to secondary liability down the road.

But here's what's important to stress: piracy was not resolved in either opinion. No party asked the courts to decide whether the pirating was unlawful under traditional infringement theories. These questions remain open and will be dealt with later in litigation.

So what did the courts actually hold in these summary judgment opinions? Despite the piracy concerns, both courts agreed that training LLMs on books, as the plaintiffs described it, is highly transformative. Piracy was not a factor in the fair use analysis, at least at this stage. Both courts rejected the idea that plaintiffs are entitled to a new market for licensing books specifically for AI training. No such market currently exists, and even if it did, the courts said, copyright law doesn't require transformative uses to be licensed.
Yuanxiao Xu: And crucially, because the plaintiffs failed to identify any concrete market harm, both courts held that training LLMs qualified as a fair use. The courts also remind us that the purpose of copyright law is not to reward copyright holders; it is to benefit the public by promoting the creation and sharing of creative works.

But the two courts also diverged in their deeper assumptions about the AI technology. On the value of AI, the Kadrey court takes a very bleak view. It describes LLMs as low-quality imitators that will displace less famous human authors and reduce incentives for future creativity. It portrays LLMs as capable of autonomously flooding the market without human supervision. The Bartz court, on the other hand, believes that LLMs will likely enhance human creativity, helping under-resourced authors refine their writing or explore new ideas. Humans remain central to the Bartz court's analysis. And on market harm, the Kadrey court proposes a very novel theory of market dilution, suggesting that AI might harm writers by flooding the creative market with countless AI-generated books. The Bartz court did not believe machine-generated content represented the kind of competition that copyright law is designed to prevent.

If these cases make you feel uncomfortable, you are not alone. The discussion of AI training exposes some long-standing tensions that copyright law has failed to resolve. On the one hand, major rights holders like Amazon, Audible, Disney, Netflix, and major academic publishers are pushing us toward a pay-per-use model, one that maximizes control, restricts access, and monetizes every interaction we have with a creative work. At the same time, tech companies are building these transformative technologies behind closed doors, with minimal public oversight and no clear pathway for how the broader public might benefit from these tools beyond being treated as passive consumers. And caught in the middle are independent creators, researchers, educators, libraries, and everyday users, the very communities that fair use was meant to protect. Yet much of today's policy conversation is focused on crafting profit-sharing deals between tech giants and major rights holders, a solution that does little to address the deeper structural problems. Instead, in my opinion, it risks entrenching monopolies and pushing actual human authors' voices further to the margins.

Katherine Klosek (ARL): Thank you. That was really useful table setting, and you gave us some really useful perspective on the cases: what the courts actually said, where we are procedurally, and the questions that remain open. So I really appreciate that, and I'm sure we'll elaborate on these and other points as we go along. Turning to Jonathan: some copyright lawyers have said that these two decisions don't change anything about the fair use doctrine that authors, libraries, and others rely on.
Katherine Klosek (ARL): So tell us, then, what can authors, researchers, and librarians who support research and scholarship take away from these decisions?

Jonathan Band: Thank you. I'd say at the outset, just as Yuanxiao was saying, that these are just two decisions out of more than 40 cases being litigated right now in this area. And they're at an early phase; they're on summary judgment, and there's a long way forward in both of these cases. Dave will talk a little more about the class certification part of one of them. These cases are going to be appealed, and then I'm sure there will be petitions to the Supreme Court, and then it'll go back. Litigation in this area is going to continue for a long time. These are two of the first cases, so everyone's focusing on them, but it's important that we don't get all worked up, because there's going to be a lot more coming down the pike. At some point, five or ten years from now, we'll barely remember the names of these decisions, even though we're all so focused on them. What we'll remember is the name of the decision, or the several decisions, where the Supreme Court ultimately tackles some of these issues and resolves them. We will forget these district court decisions, unless these happen to be the cases the Supreme Court decides.

It's also important to remember that a major factor in how these cases ultimately get resolved is the position of the Administration, and so far it seems this Administration is generally viewing training, and the issues relating to training, as fair use. It wasn't explicitly stated in the AI Action Plan that came out last week, but President Trump, in a subsequent speech, basically said, to the extent one can understand what he said, that training has to be a fair use; it doesn't make sense to have any other approach. And that will have an influence on the courts going forward, no question. I don't think there's going to be a judge, or even the Supreme Court, that will ultimately feel they should be responsible for destroying this sector of the economy.

Now, there's a long way to go, and so I'm suggesting that what these cases decide doesn't matter that much; don't worry about it one way or the other. However, we have to deal with these cases until the Supreme Court decides. We're going to have to live with these lower court decisions, whether at the district court level or the court of appeals level. So there are two points I want to stress, and this gets to Katherine's question about fair use.

First, it's important to remember that these cases arose in the context of commercial AI firms: Anthropic and Meta. The analysis likely would be very different if you were dealing with non-commercial players, say, a scientific researcher at a university. They should be aware of what's going on in these decisions, but I'm not sure they should necessarily worry too much about them, because I think the analysis would be quite different, in particular with respect to the shadow libraries. In the Bartz case the court was very disturbed by the accessing of the shadow libraries and the downloading; in the Meta case, less so. But I would think in both cases it would be different if you were dealing with non-commercial researchers who were accessing the shadow libraries because they simply didn't have the budget to buy the books, or buy licenses for the books, and they were doing it for research purposes. So I think that's important to stress. However, to the extent that you're going to have joint ventures between academic researchers and commercial firms, I would say stay away from the shadow libraries, because it's very, very risky at this point to train on them. If you're purely academic, maybe not so much of a risk, but if you're either commercial or a joint venture, it just seems very risky to use the shadow libraries. And we're going to hear a lot more about that in future cases, because quite a few of the commercial firms did rely on them. We know Anthropic, Meta, and OpenAI all probably trained on shadow libraries. Google, on the other hand, probably didn't, because they have their own database from the Google Books case; but that remains to be seen.

And then, finally, the second point: market dilution, which we just heard about. Interestingly, there was a split between these two cases. Judge Alsup, in the Bartz case, essentially rejected it. He said it doesn't make any sense; it doesn't have anything to do with traditional copyright law. Even the Copyright Office, which had endorsed this market dilution theory, described it as uncharted territory, a new theory. And even Judge Chhabria, in the Kadrey v. Meta decision, acknowledged as much.
Jonathan Band: He said, yes, this is uncharted territory, but so is AI; the whole thing is uncharted, and we're making it up as we go. And so he was saying that market dilution could be an appropriate theory. We're going to be hearing a lot about market dilution going forward. I personally agree with Judge Alsup that it is very problematic from a traditional copyright perspective. The whole point of the fair use doctrine, and copyright generally, is to promote competition when you're not creating things that are substantially similar. And here, this flooding of the market would be with things that are not substantially similar to the original works. That's exactly what you're supposed to be able to do, so the market dilution theory really goes to the heart of that. It's also problematic in terms of proof. Even the judge who endorsed the theory talked about all the things you would need to do to prove it, and he acknowledged that the plaintiffs in this case, meaning well-known authors, aren't going to have their works harmed; they have a brand. The harm is going to be to the new people, the undiscovered people. And it seems to me that's just completely speculative, and proving it is going to be impossible.

This goes to Yuanxiao's point at the end of her presentation: there are a lot of issues here that ultimately have nothing to do with copyright law or fair use. These are going to be issues relating to cultural heritage, cultural promotion, and cultural policy, things that go way beyond copyright law. Maybe it's going to be important in the future to have more funding for young authors and young creators. It could be hard to imagine this Administration, or other administrations, really funding that kind of thing, but that might be what we really need. Copyright should not be used in place of a cultural policy, or a policy of promoting culture.

Katherine Klosek (ARL): Well, thanks, Jonathan. I think you maybe helped bring the temperature down a little, and reminded us that these cases have a way to go, and you touched on some practical takeaways for our audience as well, so I appreciate that. Dave, we'd love to hear from you. I know you've written about how getting a court to certify a class is a big deal. So what does it mean that Judge Alsup has said that authors can bring a class action in the Bartz case? Tell us about that.

Dave Hansen: Sure. Thank you, and thank you everybody for joining.
Dave Hansen: I was really pleased to see how many people are on the call. What I thought I would do is give you a short primer on class action law. Don't fall asleep on me; it's a lot more interesting in these cases than it might first appear, and it has really become a prominent issue, this question of a class being certified in the Bartz case, because the judge has already put his stamp of approval on that.

Lots of you have probably heard about class actions before. The idea is that a small group of plaintiffs can represent a larger group of people who are similarly situated and similarly harmed by some action someone else has taken, so they can sue on that group's behalf. Lots of you, for instance, probably own a car that at some point has been subject to a class action lawsuit, where the manufacturer did something wrong and had to fix it, and rather than having everybody who bought that car bring a separate lawsuit, there's a mechanism in the law that allows a handful of people to bring that suit on behalf of everyone and bring the issue to resolution. That's the basic function of a class action. One of the things we've seen across these AI suits, and there are over 40 filed, as Jonathan said, is that a large number of them are class action lawsuits, mostly brought by a handful of writers or other creators, along with law firms that are very experienced with class action lawsuits against big technology companies.
Dave Hansen: None of them has really progressed to the point of deciding whether that small group of creators can actually represent the whole body of other copyright holders they claim to, except for this Bartz lawsuit. And the Bartz lawsuit is a real unicorn compared to most of them. It was not filed that long ago relative to the timeline of many of these other cases; Bartz was filed in August of last year, we are less than a year into this suit, and we've already got a decision on summary judgment on the fair use issue, and we already have a decision on whether the class representatives and the class should be certified. A year may sound like a long time, but that's lightning fast compared to the speed all these other suits have gone, and it's caused some real problems. I have a lot of criticism of how this has progressed, because I don't think it's really allowed for adequate development of the facts or information needed to certify the class in a responsible way.

So that gets us to what has actually happened here. About a month and a half ago, or a little more than that now, the plaintiffs in this suit asked the judge to certify the class, meaning that the plaintiffs in this case would be officially allowed to represent all of the rights holders who have an issue in this suit where Anthropic trained on their books. As Yuanxiao has talked about, those books came from a number of different places. Some came from sources like LibGen and, how did you say it, PiLiMi? We were debating how to pronounce this other data set, but it's another kind of shadow library, a pirate site. And some of the books Anthropic actually purchased copies of, scanned them, and then used them for AI training.

In Judge Alsup's decision in this case, he separated the analysis and said: if you're training, particularly using those books that were purchased and scanned, that looks like fair use. But if you're just accumulating a "central library," the term he used, and I don't love his use of the term library there, this central data set of books obtained from LibGen and the other shadow library, and just holding on to it just in case, which is what it appears Anthropic did, holding on to these contents presumably for future research and development or future work on their models, that's where the court said it couldn't grant summary judgment yet. Technically, that still has to go to trial, is what it looks like, but all indications from the opinion are that the judge really does not believe that use is fair use.

So what happened is the court then certified this class, and said that the class includes all beneficial or legal copyright owners of the exclusive right to reproduce copies of any book in the versions of LibGen or PiLiMi downloaded by Anthropic. If you sit there for a minute and think about what is in these data sets: there are about seven million books represented, and the judge has now pulled in all legal or beneficial copyright owners. That means people who actually are the copyright holders, or people who get royalties from an exclusive licensing deal for those works, covering about seven million books.
Dave Hansen: And he certified that that class could be represented by three authors: Andrea Bartz, who writes thrillers; Charles Graeber, a nonfiction author who wrote books like The Good Nurse and The Breakthrough, published by Hachette; and a third author, Kirk Wallace Johnson, also a nonfiction author, along with what are known as their loan-out companies, which essentially just hold rights on their behalf. So those three authors in this suit now represent essentially the entirety of the publishing industry, and that includes academic authors, university presses, and probably some libraries that hold rights they've inherited or been gifted from authors. Those three authors represent them in bringing the rest of this suit to trial, and potentially a judgment against Anthropic, or, what is more likely to happen if this class certification stands, some sort of deal with Anthropic. The reason I think that's likely is that the level of damages at play in this suit is so astronomically large: with a certified class of seven million or so books, and copyright statutory damages that could be as high as $150,000 per work infringed, you're talking about billions and billions in potential liability. From Anthropic's standpoint, if they allow that to go to trial and it turns out very badly, that's the end of the company. They go bankrupt; they just have no capacity to handle that kind of liability.
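To put a rough number on "billions and billions," here is a back-of-the-envelope calculation, assuming approximately seven million works in the certified class and the statutory damages range in 17 U.S.C. § 504(c), which runs from $750 to $30,000 per work, up to $150,000 per work for willful infringement. The figures are illustrative only; actual per-work awards would be set by a jury.

```python
# Rough scale of statutory-damages exposure in Bartz v. Anthropic.
# Illustrative arithmetic only; per-work amounts from 17 U.S.C. § 504(c).
works = 7_000_000               # approximate number of books in the certified class
per_work_min = 750              # statutory minimum per work
per_work_willful_max = 150_000  # maximum per work for willful infringement

print(f"floor:   ${works * per_work_min:,}")          # $5,250,000,000
print(f"ceiling: ${works * per_work_willful_max:,}")  # $1,050,000,000,000
```

Even at the statutory floor the exposure exceeds five billion dollars, and the willful-infringement ceiling passes a trillion, which is why a negotiated settlement looks like the more plausible endgame.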
And so this 144 00:31:05.420 --> 00:31:29.739 Dave Hansen: decision could also mean that those class representatives, now representing again the entirety of the publishing industry, are going to try to negotiate a settlement agreement. So I have lots of criticisms of this. One of my main ones is that it's really troubling, and really impossible, I think, for those 3 authors to adequately 145 00:31:29.740 --> 00:31:53.359 Dave Hansen: represent the interests of this broad a group of creators and rights holders. They just have very, very different interests. I mean, if you talk to academic authors, for instance, I think many of them will have a very different perspective even on the very basis of the suit. For instance, the class includes many books written by AI researchers and text and data mining researchers who rely on the very same fair use rationale 146 00:31:53.360 --> 00:32:16.230 Dave Hansen: that Anthropic has asserted in this case. So that's one issue. Another issue is there's no real common set of issues; it's virtually impossible to identify with any level of precision all of the rights holders of these books. The closest analogy we have is the Google Books lawsuit from 147 00:32:16.450 --> 00:32:39.040 Dave Hansen: about 10 to 15 years ago, and in the midst of that suit Google attempted to craft a mechanism to identify rights holders and clear rights, and at that time they indicated they thought they were going to have to spend, I think it was, $34 million just to set up the organization to do that rights clearance work. 148 00:32:39.040 --> 00:33:02.059 Dave Hansen: So it's very expensive and very complicated. Congress has tried to resolve this issue a number of times with orphan works legislation; the Copyright Office has studied it. So lots of people have looked at this rights identification issue and found it very, very complicated and difficult.
And the class certification that we have here just sort of glosses over it, as if 149 00:33:02.060 --> 00:33:20.439 Dave Hansen: you just send out letters and post notices on the Internet, and that will resolve all of these issues. So that's the class certification issue. One thing I will note procedurally, to keep an eye out for, is that the class was certified, 150 00:33:20.700 --> 00:33:49.880 Dave Hansen: let's see, not last Thursday, but the Thursday before, and Anthropic does, under the Federal Rules of Civil Procedure, have the option to petition the 9th Circuit for an appeal. This is an appeal that happens midstream in the suit. They don't have to get permission from the judge, Judge Alsup in this case, but they do have to persuade the 9th Circuit to take that appeal. 151 00:33:50.080 --> 00:34:15.839 Dave Hansen: And I note that because the deadline for that is coming up very quickly; they get basically 2 weeks to file such a petition for appeal to the 9th Circuit. Given the stakes here, and what kind of liability this could mean for Anthropic, I would put the probability high that they would attempt such an appeal to get the 9th Circuit to take a quick look at this 152 00:34:16.050 --> 00:34:19.490 Dave Hansen: class certification issue, to see if they will unwind it. 153 00:34:21.340 --> 00:34:32.600 Katherine Klosek (ARL): Dave, thanks for unpacking that. I'll speak for myself: I usually think of pharmaceutical cases or the like when we talk about class actions. Obviously, there's a lot more at play here, so I really appreciate you explaining that. 154 00:34:32.600 --> 00:34:57.520 Katherine Klosek (ARL): Just a note, we did get a bunch of really good questions in. So maybe we can do our next round of prepared questions as a little bit of a lightning round, and then we can get to some of those audience questions. So, Yuanxiao, back to you. The Kadrey court invoked the theory of market dilution.
A few of you have already brought it up on today's webinar, and we heard about it in the U.S. Copyright Office report as well, the 3rd part of their report on copyright and AI, which was 155 00:34:57.520 --> 00:35:05.989 Katherine Klosek (ARL): released as a pre-publication. Can you just talk about that theory, explain it a little bit, and specifically why it can be problematic for research and scholarship? 156 00:35:06.910 --> 00:35:35.270 Yuanxiao Xu: I can't follow your instruction on a lightning round; there are so many problems with this, and I think we really need to pay attention to it. So first of all, we have been throwing around this term market dilution. What does it actually mean? It's actually a brand-new concept coined only this year. We see it first in the U.S. Copyright Office report, and now in Judge Chhabria's Kadrey decision, both using it to suggest that training AI models 157 00:35:35.270 --> 00:35:42.679 Yuanxiao Xu: might fail the 4th fair use factor, which asks whether the use causes harm to the market for the original work. 158 00:35:42.900 --> 00:36:08.179 Yuanxiao Xu: Traditionally, that market harm factor has focused on cognizable market harm, meaning direct substitution. So if I publish a book that lifts large parts of your book, and people start buying my book instead of yours, that's clearly problematic and not a behavior we want to encourage. That's a clear market substitution. 159 00:36:08.520 --> 00:36:23.829 Yuanxiao Xu: But market dilution stretches that logic very thin. Judge Chhabria puts it this way: even if the AI-generated work isn't a copy, if it's similar enough in subject matter or genre 160 00:36:24.050 --> 00:36:33.060 Yuanxiao Xu: that it competes with the original, then that's still enough market harm, because it indirectly substitutes for the original. 161 00:36:33.180 --> 00:36:47.089 Yuanxiao Xu: A favorite example, repeated both by the Copyright Office and by Judge Chhabria:
If a reader buys a romance novel written by an LLM instead of one written by a human author, that's substitution. 162 00:36:48.200 --> 00:36:56.910 Yuanxiao Xu: That's the essence of the market dilution theory, and, as you can see, it dramatically expands the monopoly power copyright holders could claim. 163 00:36:57.230 --> 00:37:16.330 Yuanxiao Xu: Honestly, when I first saw this, it made me wonder: if public libraries were invented today, would Judge Chhabria see them as market dilution too? After all, libraries present millions of books to readers and arguably reduce individual book sales. Would that be enough to claim market harm? 164 00:37:16.790 --> 00:37:24.500 Yuanxiao Xu: Judge Chhabria admits that market dilution is a novel theory, and that he's developing it specifically to address AI. 165 00:37:24.620 --> 00:37:39.399 Yuanxiao Xu: What he doesn't admit is how deeply flawed and dangerous this logic really is. There are at least 4 unfixable problems with this theory. First, many AI-generated works actually incorporate human authorship. 166 00:37:41.151 --> 00:37:53.450 Yuanxiao Xu: It's odd that this market dilution theory treats LLM-generated works as if they're machine-only creations, when in reality many involve substantial human input. 167 00:37:53.660 --> 00:38:12.089 Yuanxiao Xu: Books and essays created with the help of LLMs are often the result of carefully crafted prompts and multiple rounds of human refinement. These are not generic outputs, auto-generated and uploaded for sale. To call such a work a substitute for an existing work is misleading 168 00:38:12.090 --> 00:38:32.040 Yuanxiao Xu: in many cases. What we are looking at is a new kind of creative process. Market dilution theory ignores this nuance and reality.
It proposes to regulate human creativity, not because the expression is potentially infringing, but simply because an AI tool was part of the creative process. 169 00:38:32.200 --> 00:38:40.950 Yuanxiao Xu: That's not protecting the future of human authorship; it's penalizing up-and-coming authors who want to leverage new technologies. 170 00:38:41.220 --> 00:38:45.519 Yuanxiao Xu: Second, this market dilution theory ignores the law. 171 00:38:45.760 --> 00:38:54.610 Yuanxiao Xu: To prove copyright infringement, the law requires substantial similarity in protected expression, not just theme, tone, or genre. 172 00:38:54.760 --> 00:39:01.990 Yuanxiao Xu: Yet market dilution places liability on the AI model, even though the model is not substantially similar to the books. 173 00:39:02.380 --> 00:39:22.990 Yuanxiao Xu: So the model is not even prima facie infringing, but is still treated as harmful under this market dilution theory. In essence, proponents of market dilution theory are trying to rewrite copyright law, sidestepping carefully balanced rules to create a new form of liability for AI developers without statutory backing. 174 00:39:23.680 --> 00:39:44.079 Yuanxiao Xu: Third, market dilution imposes an impossible evidentiary burden on AI developers. The Kadrey court says AI developers can only escape liability if they conclusively prove their model doesn't now, and will never in the future, cause any decline in sales of a plaintiff's book used for training. 175 00:39:44.550 --> 00:40:07.889 Yuanxiao Xu: That is just outright impossible for anyone to prove conclusively. So let's imagine for a moment that that is not what Judge Chhabria actually meant, even though it is what he says in his opinion. His other, more concrete guidance on the kind of evidentiary showing needed is as follows, and I quote directly from his writing: 176 00:40:07.890 --> 00:40:18.250 Yuanxiao Xu: the proper comparison isn't to a world with no LLMs,
but to a world where LLMs weren't trained on copyrighted books. 177 00:40:18.680 --> 00:40:46.969 Yuanxiao Xu: That sounds more manageable until you realize what it means in practice: every time an AI company is sued, it must be prepared to produce a control model trained only on public domain material for comparison. This alone can create a legal environment where only the richest players can afford to participate. Startups and independent researchers will be priced out entirely by the duplicative effort required. 178 00:40:47.300 --> 00:41:04.950 Yuanxiao Xu: And lastly, and most egregiously, I think, this market dilution theory only serves a few already successful authors and big rights holders. Market dilution presents itself as protecting the incentive to create. 179 00:41:05.070 --> 00:41:19.020 Yuanxiao Xu: Judge Chhabria also expressed deep concern for lesser-known authors, the writers he says are the most vulnerable, as John already mentioned. But who actually benefits from applying this market dilution theory? 180 00:41:19.110 --> 00:41:36.429 Yuanxiao Xu: Only those who can, one, afford to hire economists as testifying experts at around $1,000 an hour to produce very fact-specific reports, and, two, show quantifiable loss of sales tied directly to the use of their work in AI training. 181 00:41:36.550 --> 00:41:51.770 Yuanxiao Xu: In other words, the beneficiaries of this theory are not emerging writers, and definitely not future authors. The very human creators and human creativity this theory claims to protect will have no use for it. 182 00:41:52.000 --> 00:42:00.779 Yuanxiao Xu: And let's go back to the romance novelist example. What is her best path forward under this market dilution theory? 183 00:42:01.370 --> 00:42:07.830 Yuanxiao Xu: Should she try to trick Meta into using her work for training and then sue them for lost sales?
184 00:42:07.940 --> 00:42:19.149 Yuanxiao Xu: Or should she aim to write books specifically to be licensed to AI companies for training, since Judge Chhabria thinks that AI will inevitably crowd out her market? 185 00:42:20.320 --> 00:42:24.700 Yuanxiao Xu: That seems to be the only realistic redress 186 00:42:25.270 --> 00:42:31.420 Yuanxiao Xu: market dilution proposes. But is that the kind of creativity we want copyright law to promote? 187 00:42:31.530 --> 00:42:51.569 Yuanxiao Xu: Market dilution is being sold as a way to protect the public and empower authors. But, like so much in today's copyright discourse, including things like the NO FAKES Act, it's a Trojan horse. It promises to protect everyday creators but only delivers profits to a handful of very select groups of people. 188 00:42:51.570 --> 00:43:06.960 Yuanxiao Xu: So to conclude: unless you are a copyright lawyer, an expert for hire, a super successful and established author, a big rights holder, or a rich tech company, you really have no reason to like this theory of market dilution. 189 00:43:07.610 --> 00:43:37.299 Katherine Klosek (ARL): Thanks for explaining that so clearly. Hopefully, when we all see or hear that term, we'll have a better understanding of what it means, and maybe our ears will perk up a little bit. So thanks for breaking that down. John, over to you with a question about legislation, calling a little bit of an audible. You were going to talk about the relationship between these cases and the recently introduced AI Accountability and Personal Data Protection Act, and we definitely want to hear about that. But there's also a question in the chat 190 00:43:37.300 --> 00:43:57.469 Katherine Klosek (ARL): about whether the President's remarks in the recent White House speech that you mentioned maybe give us an opportunity to codify that training generative AI models on copyrighted works is fair use.
So I'd love to hear your thoughts on the future and prospects of potential legislation in this area, and on the bills that have been introduced so far. 191 00:43:58.910 --> 00:44:09.240 Jonathan Band: So, with respect to the Hawley bill, it's completely unrealistic. It is so sweeping that 192 00:44:09.873 --> 00:44:12.696 Jonathan Band: you could never have 193 00:44:13.610 --> 00:44:20.000 Jonathan Band: any sort of generative AI whatsoever, whether it's 194 00:44:20.430 --> 00:44:40.309 Jonathan Band: a commercial entity or a nonprofit. It's just the way it's worded; it would just be so broad. And it is sort of the latest in a succession of bills. In the last Congress there were many bills talking about transparency, requiring 195 00:44:41.816 --> 00:44:53.250 Jonathan Band: AI developers to keep records and disclose what works they trained on and so forth, and all of them really 196 00:44:53.390 --> 00:45:00.729 Jonathan Band: would have the effect of narrowing or restricting the ability of 197 00:45:00.850 --> 00:45:04.520 Jonathan Band: new entrants in the field, and 198 00:45:04.800 --> 00:45:22.779 Jonathan Band: end up, basically, in a world where, as a practical matter, you could have Google and Meta; they'd be the only ones who could afford it. Smaller, newer companies like Anthropic or OpenAI can't do it, 199 00:45:22.890 --> 00:45:30.909 Jonathan Band: and certainly academic researchers can't do it. It would just be Google, Meta, and I guess Microsoft, 200 00:45:33.200 --> 00:45:56.049 Jonathan Band: and that's probably not a great result from a public policy point of view. Now, the question in the chat was sort of, okay, let's go the other direction: maybe now we could codify this, codify some of the decisions.
But of course, we only have parts of this; 201 00:45:56.240 --> 00:46:17.690 Jonathan Band: you have these 2 decisions, which coalesce in some respects and go in opposite directions in other respects. You would never get agreement on which parts you would want to codify, even of these 2 decisions, and as more time passes there'll be more decisions going in other directions. 202 00:46:17.890 --> 00:46:19.183 Jonathan Band: And so, 203 00:46:20.150 --> 00:46:34.569 Jonathan Band: yes, I suppose you could imagine a world where the President basically says, okay, this is what I want, and then all the Republicans fall in line, and then they'd need to get 204 00:46:34.690 --> 00:46:55.079 Jonathan Band: just a handful of Senate Democrats, and maybe they could. But given the politics, with copyright legislation as a practical matter, you'd need to get the 60 votes in favor; you'd have to get across the 205 00:46:55.280 --> 00:47:03.460 Jonathan Band: filibuster threshold. And even if you could conceivably convince 206 00:47:05.270 --> 00:47:18.700 Jonathan Band: the 8 or 9, or whatever the number of, Democrats on the merits, I think as a matter of principle they would oppose it. So one can 207 00:47:18.940 --> 00:47:25.900 Jonathan Band: talk about it, but I just don't see, as a practical matter, that it will ever happen, at least at this point.
208 00:47:26.020 --> 00:47:52.700 Jonathan Band: So we're going to have to just continue the slog through the courts. And because I have faith in the courts, at least on copyright matters, maybe not in other areas, and I have faith in fair use, I think ultimately we'll end up in a good place. But it's going to take a while, and along the way 209 00:47:52.780 --> 00:48:00.600 Jonathan Band: there'll be certain issues which might be problematic in certain areas, for example, the shadow library problem. 210 00:48:01.810 --> 00:48:31.220 Katherine Klosek (ARL): Thank you. I do love the way folks are thinking, though, about advocacy on this and where there might be opportunities. I want to point out that some of the questions are being answered in the Q&A itself, and I also think for some of these we could summarize them and provide written answers afterward. I also wanted to note that one of our questions was going to be about how folks can stay up to date on all these cases, but Rachel asked a 211 00:48:31.220 --> 00:48:56.080 Katherine Klosek (ARL): related but better question about what types of resources we can develop and share to facilitate conversations with scholars. So I think maybe that's something we can think about and work on, maybe some talking points for librarians or takeaways from the cases that you can use in those conversations. So if folks have thoughts on resources or support that would be useful, definitely let us know. But I think 212 00:48:56.080 --> 00:49:22.740 Katherine Klosek (ARL): that's a great suggestion. And going to keep it rolling with questions from the audience. Dave, would you like to take a crack at this one? Libraries and librarians share some interests with big tech and with content creators on copyright.
Historically, we often align with big tech, but AI raises a number of other issues, which Yuanxiao and John and Dave have really raised already. So what are the key points of difference between 213 00:49:22.740 --> 00:49:29.460 Katherine Klosek (ARL): library interests and big tech on copyright and those other policy areas? What might those areas be, Dave? 214 00:49:29.970 --> 00:49:37.889 Dave Hansen: Sure, I'd be happy to answer that. And on the resource question, my first thought of what to do there is to ask Rachael Samberg 215 00:49:37.890 --> 00:50:02.720 Dave Hansen: for her help to draft those things. So I thought this was a really interesting question, and I wanted to answer it because of the way it was framed. I do think that it often feels like the library community, and the community of authors who really care about the public interest, have to go and pick a side, right? It's like: do we care about the authors and creators and copyright holders of today, 216 00:50:02.720 --> 00:50:14.220 Dave Hansen: or are we aligning ourselves with big technology companies, because they seem to be driving a lot of the narrative around what happens with policy here? And 217 00:50:14.280 --> 00:50:40.880 Dave Hansen: I get why that happens. But from our perspective, at least my perspective, what I really care about is the ability of people to continue to create, to continue to do research uninhibited by overaggressive copyright law and other laws, while also maintaining the ability to actually publish those things and have them distributed. And copyright is an important 218 00:50:41.010 --> 00:51:03.890 Dave Hansen: part of that ecosystem as well.
As I'm looking at this space, I think one of the things that gets lost here is that we have these lawsuits against some very big companies: Meta and Google and OpenAI, which is backed by Microsoft. Anthropic is actually the smallest of any of these; they have a market cap of 60 billion dollars, 219 00:51:03.890 --> 00:51:12.279 Dave Hansen: which sounds really big until you compare it to some of those other companies. And the reality is that 220 00:51:12.280 --> 00:51:15.849 Dave Hansen: if we have a future that demands 221 00:51:15.850 --> 00:51:40.330 Dave Hansen: licensing for access to every single work for AI training, the only companies that benefit are the companies that can afford to either purchase those permissions or just obtain them in other ways. I mean, remember, some of these companies: Meta has millions of users who upload content for free every day, content that Meta is training on. Google has an incredible 222 00:51:40.330 --> 00:52:04.009 Dave Hansen: wealth of content being voluntarily given to it by users every day, and Google says in their terms of service they can use that to improve their services and their products, presumably including with AI as well. So if we think about these very, very large companies, with ready access to licensed, permissioned content 223 00:52:04.010 --> 00:52:09.899 Dave Hansen: and the ability to pay for additional content, if we think about that kind of environment, 224 00:52:10.100 --> 00:52:15.409 Dave Hansen: then I think it really behooves us to say, okay, 225 00:52:15.910 --> 00:52:35.540 Dave Hansen: great, they can do some of this, but we probably don't want to live in a world where only those companies can do this. We want researchers to be able to engage in this space; we want startup competitors to engage in this space. And so I think that aligns us not so much with big tech, but with freedom to research and fair use.
So 226 00:52:35.540 --> 00:52:51.599 Dave Hansen: I guess what I'm doing is sort of rejecting the premise of the question and saying it's those principles that I think are important to us, and if that happens to align with some of the positions that some of these tech companies are taking in litigation today, 227 00:52:51.730 --> 00:52:57.039 Dave Hansen: so be it. But I think it's a much longer-term perspective that we're taking. 228 00:52:57.990 --> 00:53:03.230 Katherine Klosek (ARL): Thanks, Dave. And Yuanxiao, you added some thoughts on this in the Q&A, but please elaborate. 229 00:53:03.530 --> 00:53:29.530 Yuanxiao Xu: Yeah, I wanted to add a concrete example of where big tech and the research community, the library community, may differ greatly, which is open access. Big tech has pretended to be releasing a lot of models under an open access structure, but what they are actually releasing are just the parameters and weights of a finished product, 230 00:53:29.530 --> 00:53:44.030 Yuanxiao Xu: and no human, no matter how great a programmer you are, can make sense of the weights and parameters of a trained AI model. You're just looking at nonsensical numerical sequences, 231 00:53:44.070 --> 00:54:06.450 Yuanxiao Xu: and there's nothing you can do as a researcher to make sense of how an open model like Llama can be used or fine-tuned further based on that information alone. A true open access model would at least include what training materials have been used and how the fine-tuning was done, 232 00:54:08.290 --> 00:54:24.569 Yuanxiao Xu: so the research community, in order to exercise more oversight and control over the kind of responsible AI that can be produced, would want to advocate for true open access models. 233 00:54:26.910 --> 00:54:54.700 Katherine Klosek (ARL): Thank you. A participant has pointed out that, in fact, the audience cannot see the Q&A or the answers.
So I think what we'll do is share the recording and transcript with everyone who registered, along with the questions and the answers that have been given. Oh, okay, some might be able to see them and some cannot. Conflicting information. Either way, we'll make sure everybody has all of the information that has been shared here. 234 00:54:55.407 --> 00:54:57.500 Katherine Klosek (ARL): Okay, sorry. 235 00:54:57.660 --> 00:55:18.310 Katherine Klosek (ARL): We are running out of time, and we have a lot of questions to get to. But, John, did you want to speak to this? We had a question submitted ahead of time about whether these decisions impact the potential for publishers and authors to receive compensation for their works. Do you want to touch on that? That seems important. 236 00:55:18.800 --> 00:55:20.133 Jonathan Band: Sure. So 237 00:55:22.920 --> 00:55:49.620 Jonathan Band: the authors and the publishers can always receive compensation, meaning there are currently licensing arrangements underway, and those are going to continue regardless of how these fair use cases come out, and even of the nuances of whether it's a shadow library or just on the open web, which is again 238 00:55:49.810 --> 00:55:52.660 Jonathan Band: a critical distinction. But 239 00:55:53.300 --> 00:56:06.370 Jonathan Band: there are all kinds of reasons why a large commercial firm would want to enter into a licensing arrangement, in part because, 240 00:56:06.430 --> 00:56:25.789 Jonathan Band: again, it's going to take 10 years or more, I would say, to know for sure what is fair use and what isn't. And so, if you have the money, why wouldn't you enter into a licensing arrangement with a few publishers 241 00:56:26.010 --> 00:56:33.980 Jonathan Band: who you think have useful content, and whose materials you want to train on? Also,
242 00:56:34.140 --> 00:56:52.649 Jonathan Band: you know, there's training and there's training. You have pure training, but then what happens if you want to use expression in the outputs, which could be very, very useful? Certainly you're going to want to enter into a licensing arrangement with 243 00:56:52.880 --> 00:57:07.859 Jonathan Band: a publisher whose work you're going to want to incorporate in your outputs, for example, the New York Times. I am confident that the New York Times is 244 00:57:07.860 --> 00:57:27.150 Jonathan Band: litigating with the AI firms, but the point is, before they filed litigation they were trying to work out a licensing arrangement. They couldn't work it out, and the American solution to that is you file suit. Now I'm sure they're still talking, and I wouldn't be surprised if they will ultimately 245 00:57:27.250 --> 00:57:34.509 Jonathan Band: reach an agreement, or at least the New York Times might reach an agreement with one company, 246 00:57:34.970 --> 00:57:53.989 Jonathan Band: and they will then make a lot of their content available not only for training but also for use in outputs, because that's what the companies really want to do, right? You want to be able to say in your results: the New York Times says such and such about, 247 00:57:54.290 --> 00:58:11.350 Jonathan Band: you know, the so-called agreement on tariffs between the U.S. and the EU, which of course is not really an agreement; it's an agreement to agree at some point in the future, with details to be worked out. But, you know,
248 00:58:13.230 --> 00:58:21.910 Jonathan Band: there's going to be a desire to be able to include not only the facts but also the expression, 249 00:58:22.367 --> 00:58:39.160 Jonathan Band: and, again, you don't need to do that with everyone; maybe just the New York Times, or just a few newspapers or a few other publishers. So certainly there'll be a desire to include the expression. And so, 250 00:58:39.660 --> 00:58:45.749 Jonathan Band: even with the training issue aside, there are a lot of opportunities for licensing. 251 00:58:46.360 --> 00:59:10.860 Katherine Klosek (ARL): Thank you. We have less than a minute left, so I will wrap us up. Unfortunately, that went by really quickly. I really appreciate all of the expertise that was shared here, and I'm really excited by all the questions. Like I said, we'll make sure to share those out, and you will hear more from Authors Alliance and ARL on these cases and these issues. Thanks again for your attention and your expertise. And yeah, more to come. Thank you.