Katherine Klosek (ARL): Everyone's trickling in, welcome. All right, we're getting there. Welcome, everyone. Okay, I think we're there, so welcome, everyone. I'm Katherine Klosek, Director of Information Policy and Federal Relations at the Association of Research Libraries, ARL. I want to thank you all for being here, and thanks to Authors Alliance for co-hosting this timely conversation, as we're starting to see decisions made in the courts about fair use and generative AI. The Library Copyright Alliance put out principles on copyright and AI back in 2023, and we knew that litigation would shape the legal landscape for generative AI, and there is a lot more litigation pending in district courts. But today we're looking forward to hearing about the early decisions we've seen so far, and their takeaways for researchers, authors, and libraries, from our expert panelists: Jonathan Band, who is copyright counsel for ARL and the Library Copyright Alliance; Dave Hansen, Executive Director of Authors Alliance; and Yuanxiao Xu, Staff Attorney at Authors Alliance. Thanks again to those who submitted questions ahead of time for the panelists. If you're viewing live, you can use the Q&A function on Zoom to submit your questions for the panel, and we're going to leave time at the end for discussion. So, enough from me. Let's hear from our panelists.
Katherine Klosek (ARL): Yuanxiao, would you mind kicking us off by summarizing the cases, and maybe highlighting where the cases were consistent and where the judges diverged?

Yuanxiao Xu: Thank you. And thank you all for being here today. You've probably noticed that these disputes over AI training are forcing us to ask some very big questions: why do human authors create new works, and how do we advocate for these authors, especially the ones we're counting on to start creating in the future? As you listen and participate in today's discussion, I'd like to ask you to keep those questions front of mind, because the answers affect all current and future authors and creators. That is all of us.

At the center of these cases is fair use, a proud tradition of US copyright law since 1841. Fair use is what allows scholars to quote books and artists to create parodies. It's a First Amendment right designed to ensure that copyright serves human creativity instead of just monopolistic corporate interests.

Now, two opinions handed down just two days apart have become the first court decisions on whether training general-purpose large language models, or LLMs, can qualify as fair use. Let's be clear, first, that these are summary judgments. They are early-stage rulings, which means the courts had to interpret disputed facts in the light most favorable to the plaintiffs, and some of the judges' comments in the issued opinions are just musings, not even holdings. For example, in Kadrey v. Meta, the market dilution theory was never even raised by the plaintiffs in their pleadings, and its ten-page-long analysis, taking up a quarter of the entire opinion, had no bearing on the judge's ultimate holding at summary judgment. But even these musings matter, because how we interpret, critique, and respond to them will shape how copyright evolves for all authors and for the public.

I want to really thank Katherine for convening this panel. These are exactly the kinds of conversations we need right now. I also want to make sure we're all on the same page about what these cases are actually about, so I'll start by briefly explaining how large language models are trained, and then I'll walk through the two cases. When we talk about training a large language model like Claude or Meta's Llama, we're referring to the process of feeding huge amounts of text to an algorithm. The model uses this text to detect patterns in language and predict what comes next in a sentence. The result is not a searchable database. What's produced is an LLM made up of numerical weights and parameters, a system that can generate language based on statistical patterns, not stored passages.

Now, with that basic understanding in mind, let's look at the two lawsuits that are putting this process under legal scrutiny. The first case is Bartz v. Anthropic. Three authors are suing Anthropic for using their works to train Claude models, but they are not just suing for themselves. The court recently certified a class that could include millions of authors whose works were found in two shadow libraries, LibGen and PiLiMi. Whether class certification on that scale is problematic is a question we can maybe discuss later.
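For readers who want a concrete picture of the next-token training Yuanxiao describes, here is a toy sketch. This is emphatically not how Claude or Llama are built (they train neural networks with billions of parameters rather than counting tables), but it illustrates the core idea the courts engaged with: the trained artifact stores statistics about which tokens tend to follow which, not the passages themselves.

```python
from collections import Counter, defaultdict

def train(corpus_tokens):
    """Count which token follows which; the 'model' is just these statistics."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(model, prev):
    """Return the most frequent next token after `prev` in the training data."""
    return model[prev].most_common(1)[0][0]

# A tiny "corpus"; real training corpora contain trillions of tokens.
tokens = "the cat sat on the mat and the cat slept".split()
model = train(tokens)
print(predict(model, "the"))  # "cat" ("cat" follows "the" twice, "mat" once)
```

The original sentences cannot be read back out of `model`; only frequency patterns survive, which is the sense in which an LLM is not a searchable database.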
Yuanxiao Xu: The second case is Kadrey v. Meta. It involves 13 authors suing Meta for using their books to train Llama models. Both of these lawsuits focus exclusively on the training of LLMs, not their possible outputs, so not the specific text these AIs are capable of producing. The plaintiffs in both cases argue that the training itself is infringing, and that the training will eventually lead to a flood of AI-generated texts that indirectly compete with their original works.

Now that we've covered the basics, I want to turn to what has probably gotten the most public attention in both of these cases: the allegations of piracy. You've likely seen headlines claiming that Meta and Anthropic stole millions of books. Here's what we know from the pleadings in these cases. In Bartz, Anthropic allegedly obtained millions of books, some pirated, some lawfully purchased and then, according to the court, destructively scanned, and they used these books to build a massive internal library. From there, subsets were cleaned, tokenized, and used for model training. The court found it troubling that Anthropic couldn't confirm which of the seven million pirated titles were actually used in training. From the judge's perspective, at least some of the copying may have lacked a fair use justification, because those copies were never intended for training.
Yuanxiao Xu: What made matters worse for Anthropic is that it refused to explain its internal data usage during discovery. So the court didn't waste its time speculating about what these pirated books may have been used for, and simply said there's no justification to call it conclusively fair use. By contrast, Meta apparently used fewer pirated books, but, worse for them, they uploaded these pirated books through a peer-to-peer file-sharing system. This is a whole different red flag. The Kadrey court speculated that Meta's actions may have supported the piracy ecosystem itself, which could be relevant to secondary liability down the road.

But here's what's important to stress: piracy was not resolved in either opinion. No party asked the courts to decide whether the pirating was unlawful under traditional infringement theories. These questions remain open and will be dealt with later in litigation.

So what did the courts actually hold in these summary judgment opinions? Despite the piracy concerns, both courts agreed that training LLMs on books, as the plaintiffs described it, is highly transformative. Piracy was not a factor in the fair use analysis, at least at this stage. Both courts rejected the idea that plaintiffs are entitled to a new market for licensing books specifically for AI training. No such market currently exists, and even if it did, the courts said, copyright law doesn't require transformative uses to be licensed.
Yuanxiao Xu: And crucially, because the plaintiffs failed to identify any concrete market harm, both courts held that training LLMs qualified as a fair use. The courts also remind us that the purpose of copyright law is not to reward copyright holders; it is to benefit the public by promoting the creation and sharing of creative works.

But the two courts also diverged in their deeper assumptions about the AI technology. On the value of AI, the Kadrey court takes a very bleak view. It describes LLMs as low-quality imitators that will displace less famous human authors and reduce incentives for future creativity. It portrays LLMs as capable of autonomously flooding the market without human supervision. The Bartz court, on the other hand, believes that LLMs will likely enhance human creativity, helping under-resourced authors refine their writing or explore new ideas. Humans remain central to the Bartz court's analysis. And on market harm, the Kadrey court proposes a very novel theory of market dilution, suggesting that AI might harm writers by flooding the creative market with countless AI-generated books. The Bartz court did not believe machine-generated content represented the kind of competition that copyright law is designed to prevent.

If these cases make you feel uncomfortable, you are not alone. The discussion of AI training exposes some long-standing tensions that copyright law has failed to resolve. On the one hand, major rights holders like Amazon, Audible, Disney, Netflix, and major academic publishers are pushing us toward a pay-per-use model, one that maximizes control, restricts access, and monetizes every interaction we have with a creative work. At the same time, tech companies are building these transformative technologies behind closed doors, with minimal public oversight and no clear pathway for how the broader public might benefit from these tools beyond being treated as passive consumers. And caught in the middle are independent creators, researchers, educators, libraries, and everyday users, the very communities that fair use was meant to protect. Yet much of today's policy conversation is focused on crafting profit-sharing deals between tech giants and major rights holders, a solution that does little to address the deeper structural problems. Instead, in my opinion, it risks entrenching monopolies and pushing actual human authors' voices further to the margins.

Katherine Klosek (ARL): Thank you. That was really useful table setting, and you gave us some really useful perspective on the cases: what the courts actually said, where we are procedurally, and the questions that remain open. So I really appreciate that, and I'm sure we'll elaborate on these and other points as we go along. Turning to Jonathan: some copyright lawyers have said that these two decisions don't change anything about the fair use doctrine that authors, libraries, and others rely on.
Katherine Klosek (ARL): So tell us, then, what can authors, researchers, and librarians who support research and scholarship take away from these decisions?

Jonathan Band: Thank you. I'd say at the outset, just as Yuanxiao was saying, that these are just two decisions out of more than 40 cases being litigated right now in this area. And they're at an early phase; they're on summary judgment, and there's a long way forward in both of these cases. Dave will talk a little more about the class certification part of one of them. These cases are going to be appealed, and then I'm sure there will be petitions to the Supreme Court, and then it'll go back. Litigation in this area is going to continue for a long time. These are two of the first cases, so everyone's focusing on them, but it's important that we don't get all worked up, because there's going to be a lot more coming down the pike. At some point, five or ten years from now, we'll barely remember the names of these decisions, even though we're all so focused on them. What we'll remember is the name of the decision, or the several decisions, where the Supreme Court ultimately tackles some of these issues and resolves them. We will forget these district court decisions, unless these happen to be the cases the Supreme Court decides.

It's also important to remember that a major factor in how these cases ultimately get resolved is the position of the Administration, and so far it seems this Administration is generally viewing training, and the issues relating to training, as fair use. It wasn't explicitly stated in the AI Action Plan that came out last week, but President Trump, in a subsequent speech, basically said, to the extent one can understand what he said, that training has to be a fair use; it doesn't make sense to have any other approach. And that will have an influence on the courts going forward, no question. I don't think there's going to be a judge, or even the Supreme Court, that will ultimately feel they should be responsible for destroying this sector of the economy.

Now, there's a long way to go, and so I'm suggesting that what these cases decide doesn't matter that much; don't worry about it one way or the other. However, we have to deal with these cases until the Supreme Court decides. We're going to have to live with these lower court decisions, whether at the district court level or the court of appeals level. So there are two points I want to stress, and this gets to Katherine's question about fair use.

First, it's important to remember that these cases arose in the context of commercial AI firms: Anthropic and Meta. The analysis likely would be very different if you were dealing with non-commercial players, say, a scientific researcher at a university. They should be aware of what's going on in these decisions, but I'm not sure they should necessarily worry too much about them, because I think the analysis would be quite different, in particular with respect to the shadow libraries. In the Bartz case the court was very disturbed by the accessing of the shadow libraries and the downloading; in the Meta case, less so. But I would think in both cases it would be different if you were dealing with non-commercial researchers who were accessing the shadow libraries because they simply didn't have the budget to buy the books, or buy licenses for the books, and they were doing it for research purposes. So I think that's important to stress. However, to the extent that you're going to have joint ventures between academic researchers and commercial firms, I would say stay away from the shadow libraries, because it's very, very risky at this point to train on them. If you're purely academic, maybe not so much of a risk, but if you're either commercial or a joint venture, it just seems very risky to use the shadow libraries. And we're going to hear a lot more about that in future cases, because quite a few of the commercial firms did rely on them. We know Anthropic, Meta, and OpenAI all probably trained on shadow libraries. Google, on the other hand, probably didn't, because they have their own database from the Google Books case; but that remains to be seen.

And then, finally, the second point: market dilution, which we just heard about. Interestingly, there was a split between these two cases. Judge Alsup, in the Bartz case, essentially rejected it. He said it doesn't make any sense; it doesn't have anything to do with traditional copyright law. Even the Copyright Office, which had endorsed this market dilution theory, described it as uncharted territory, a new theory. And even Judge Chhabria, in the Kadrey v. Meta decision, acknowledged as much.
Jonathan Band: He said, yes, this is uncharted territory, but so is AI; the whole thing is uncharted, and we're making it up as we go. And so he was saying that market dilution could be an appropriate theory. We're going to be hearing a lot about market dilution going forward. I personally agree with Judge Alsup that it is very problematic from a traditional copyright perspective. The whole point of the fair use doctrine, and copyright generally, is to promote competition when you're not creating things that are substantially similar. And here, this flooding of the market would be with things that are not substantially similar to the original works. That's exactly what you're supposed to be able to do, so the market dilution theory really goes to the heart of that. It's also problematic in terms of proof. Even the judge who endorsed the theory talked about all the things you would need to do to prove it, and he acknowledged that the plaintiffs in this case, meaning well-known authors, aren't going to have their works harmed; they have a brand. The harm is going to be to the new people, the undiscovered people. And it seems to me that's just completely speculative, and proving it is going to be impossible.

This goes to Yuanxiao's point at the end of her presentation: there are a lot of issues here that ultimately have nothing to do with copyright law or fair use. These are going to be issues relating to cultural heritage, cultural promotion, and cultural policy, things that go way beyond copyright law. Maybe it's going to be important in the future to have more funding for young authors and young creators. It could be hard to imagine this Administration, or other administrations, really funding that kind of thing, but that might be what we really need. Copyright should not be used in place of a cultural policy, or a policy of promoting culture.

Katherine Klosek (ARL): Well, thanks, Jonathan. I think you maybe helped bring the temperature down a little, and reminded us that these cases have a way to go, and you touched on some practical takeaways for our audience as well, so I appreciate that. Dave, we'd love to hear from you. I know you've written about how getting a court to certify a class is a big deal. So what does it mean that Judge Alsup has said that authors can bring a class action in the Bartz case? Tell us about that.

Dave Hansen: Sure. Thank you, and thank you everybody for joining.
Dave Hansen: I was really pleased to see how many people are on the call. What I thought I would do is give you a short primer on class action law. Don't fall asleep on me; it's a lot more interesting in these cases than it might first appear, and it has really become a prominent issue, this question of a class being certified in the Bartz case, because the judge has already put his stamp of approval on that.

Lots of you have probably heard about class actions before. The idea is that a small group of plaintiffs can represent a larger group of people who are similarly situated and similarly harmed by some action someone else has taken, so they can sue on that group's behalf. Lots of you, for instance, probably own a car that at some point has been subject to a class action lawsuit, where the manufacturer did something wrong and had to fix it, and rather than having everybody who bought that car bring a separate lawsuit, there's a mechanism in the law that allows a handful of people to bring that suit on behalf of everyone and bring the issue to resolution. That's the basic function of a class action. One of the things we've seen across these AI suits, and there are over 40 filed, as Jonathan said, is that a large number of them are class action lawsuits, mostly brought by a handful of writers or other creators, along with law firms that are very experienced with class action lawsuits against big technology companies.
Dave Hansen: None of them has really progressed to the point of deciding whether that small group of creators can actually represent the whole body of other copyright holders they claim to, except for this Bartz lawsuit. And the Bartz lawsuit is a real unicorn compared to most of them. It was not filed that long ago relative to the timeline of many of these other cases; Bartz was filed in August of last year, we are less than a year into this suit, and we've already got a decision on summary judgment on the fair use issue, and we already have a decision on whether the class representatives and the class should be certified. A year may sound like a long time, but that's lightning fast compared to the speed all these other suits have gone, and it's caused some real problems. I have a lot of criticism of how this has progressed, because I don't think it's really allowed for adequate development of the facts or information needed to certify the class in a responsible way.

So that gets us to what has actually happened here. About a month and a half ago, or a little more than that now, the plaintiffs in this suit asked the judge to certify the class, meaning that the plaintiffs in this case would be officially allowed to represent all of the rights holders who have an issue in this suit where Anthropic trained on their books. As Yuanxiao has talked about, those books came from a number of different places. Some came from sources like LibGen and, how did you say it, PiLiMi? We were debating how to pronounce this other data set, but it's another kind of shadow library, a pirate site. And some of the books Anthropic actually purchased copies of, scanned them, and then used them for AI training.

In Judge Alsup's decision in this case, he separated the analysis and said: if you're training, particularly using those books that were purchased and scanned, that looks like fair use. But if you're just accumulating a "central library," the term he used, and I don't love his use of the term library there, this central data set of books obtained from LibGen and the other shadow library, and just holding on to it just in case, which is what it appears Anthropic did, holding on to these contents presumably for future research and development or future work on their models, that's where the court said it couldn't grant summary judgment yet. Technically, that still has to go to trial, is what it looks like, but all indications from the opinion are that the judge really does not believe that use is fair use.

So what happened is the court then certified this class, and said that the class includes all beneficial or legal copyright owners of the exclusive right to reproduce copies of any book in the versions of LibGen or PiLiMi downloaded by Anthropic. If you sit there for a minute and think about what is in these data sets: there are about seven million books represented, and the judge has now pulled in all legal or beneficial copyright owners. That means people who actually are the copyright holders, or people who get royalties from an exclusive licensing deal for those works, covering about seven million books.
Dave Hansen: And he certified that that class could be represented by three authors: Andrea Bartz, who writes thrillers; Charles Graeber, a nonfiction author who wrote books like The Good Nurse and The Breakthrough, published by Hachette; and a third author, Kirk Wallace Johnson, also a nonfiction author, along with what are known as their loan-out companies, which essentially just hold rights on their behalf. So those three authors in this suit now represent essentially the entirety of the publishing industry, and that includes academic authors, university presses, and probably some libraries that hold rights they've inherited or been gifted from authors. Those three authors represent them in bringing the rest of this suit to trial, and potentially a judgment against Anthropic, or, what is more likely to happen if this class certification stands, some sort of deal with Anthropic. The reason I think that's likely is that the level of damages at play in this suit is so astronomically large: with a certified class of seven million or so books, and copyright statutory damages that could be as high as $150,000 per work infringed, you're talking about billions and billions in potential liability. From Anthropic's standpoint, if they allow that to go to trial and it turns out very badly, that's the end of the company. They go bankrupt; they just have no capacity to handle that kind of liability.
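To put a rough number on "billions and billions," here is a back-of-the-envelope calculation, assuming approximately seven million works in the certified class and the statutory damages range in 17 U.S.C. § 504(c), which runs from $750 to $30,000 per work, up to $150,000 per work for willful infringement. The figures are illustrative only; actual per-work awards would be set by a jury.

```python
# Rough scale of statutory-damages exposure in Bartz v. Anthropic.
# Illustrative arithmetic only; per-work amounts from 17 U.S.C. § 504(c).
works = 7_000_000               # approximate number of books in the certified class
per_work_min = 750              # statutory minimum per work
per_work_willful_max = 150_000  # maximum per work for willful infringement

print(f"floor:   ${works * per_work_min:,}")          # $5,250,000,000
print(f"ceiling: ${works * per_work_willful_max:,}")  # $1,050,000,000,000
```

Even at the statutory floor the exposure exceeds five billion dollars, and the willful-infringement ceiling passes a trillion, which is why a negotiated settlement looks like the more plausible endgame.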
And so this 144 00:31:05.420 --> 00:31:29.739 Dave Hansen: decision could also mean that those class representatives, now representing again the entirety of the publishing industry, are going to try to negotiate a settlement agreement. So I have lots of criticisms of this. One of my main ones is that it's really troubling, and really impossible, I think, for those 3 authors to adequately 145 00:31:29.740 --> 00:31:53.359 Dave Hansen: represent the interests of this broad a group of creators and rights holders. They just have very, very different interests. I mean, if you talk to academic authors, for instance, I think many of them will have a very different perspective even on the very basis of the suit. For instance, the class includes many books written by AI researchers and text and data mining researchers who rely on the very same fair use rationale 146 00:31:53.360 --> 00:32:16.230 Dave Hansen: that Anthropic has asserted in this case. So that's one issue. Another issue is there's no real common set of issues; it's virtually impossible to identify with any level of precision all of the rights holders of these books. The closest analogy we have is the Google Books lawsuit from 147 00:32:16.450 --> 00:32:39.040 Dave Hansen: about 10 to 15 years ago, and in the midst of that suit Google attempted to craft a mechanism to identify rights holders and clear rights, and at that time they indicated they thought they were going to have to spend, I think it was, $34 million just to set up the organization to do that rights clearance work. 148 00:32:39.040 --> 00:33:02.059 Dave Hansen: So it's very expensive and very complicated. Congress has tried to resolve this issue a number of times with orphan works legislation; the Copyright Office has studied it. So lots of people have looked at this rights identification issue and found it very, very complicated and difficult.
And the class certification that we have here just sort of glosses over it, as if 149 00:33:02.060 --> 00:33:20.439 Dave Hansen: you just send out letters and post notices on the Internet, and that will resolve all of these issues. So that's the class certification issue. One thing I will note procedurally, to keep an eye out for, is that the class was certified, 150 00:33:20.700 --> 00:33:49.880 Dave Hansen: let's see, not last Thursday, but the Thursday before, and Anthropic does, under the Federal Rules of Civil Procedure, have the option to petition the 9th Circuit for an appeal. This is an appeal that happens midstream in the suit. They don't have to get permission from the judge, Judge Alsup in this case, but they do have to persuade the 9th Circuit to take that appeal. 151 00:33:50.080 --> 00:34:15.839 Dave Hansen: And I note that because the deadline for that is coming up very quickly; they get basically 2 weeks to file such a petition for appeal to the 9th Circuit. Given the stakes here, and what kind of liability this could mean for Anthropic, I would put the probability high that they would attempt such an appeal to get the 9th Circuit to take a quick look at this 152 00:34:16.050 --> 00:34:19.490 Dave Hansen: class certification issue, to see if they will unwind it. 153 00:34:21.340 --> 00:34:32.600 Katherine Klosek (ARL): Dave, thanks for unpacking that. I'll speak for myself: I usually think of pharmaceutical cases or the like when we talk about class actions. Obviously, there's a lot more at play here, so I really appreciate you explaining that. 154 00:34:32.600 --> 00:34:57.520 Katherine Klosek (ARL): Just a note, we did get a bunch of really good questions in. So maybe we can do our next round of prepared questions as a little bit of a lightning round, and then we can get to some of those audience questions. So, Yuanxiao, back to you. The Kadrey court invoked the theory of market dilution.
A few of you have already brought it up on today's webinar, and we heard about it in the U.S. Copyright Office report as well, the 3rd part of their report on copyright and AI, which was 155 00:34:57.520 --> 00:35:05.989 Katherine Klosek (ARL): released as a pre-publication. Can you just talk about that theory, explain it a little bit, and specifically why it can be problematic for research and scholarship? 156 00:35:06.910 --> 00:35:35.270 Yuanxiao Xu: I can't follow your instruction on a lightning round; there are so many problems with this, and I think we really need to pay attention to it. So first of all, we have been throwing around this term market dilution. What does it actually mean? It's actually a brand-new concept coined only this year. We see it first in the U.S. Copyright Office report, and now in Judge Chhabria's Kadrey decision, both using it to suggest that training AI models 157 00:35:35.270 --> 00:35:42.679 Yuanxiao Xu: might fail the 4th fair use factor, which asks whether the use causes harm to the market for the original work. 158 00:35:42.900 --> 00:36:08.179 Yuanxiao Xu: Traditionally, that market harm factor has focused on cognizable market harm, meaning direct substitution. So if I publish a book that lifts large parts of your book, and people start buying my book instead of yours, that's clearly problematic and not a behavior we want to encourage. That's a clear market substitution. 159 00:36:08.520 --> 00:36:23.829 Yuanxiao Xu: But market dilution stretches that logic very thin. Judge Chhabria puts it this way: even if the AI-generated work isn't a copy, if it's similar enough in subject matter or genre 160 00:36:24.050 --> 00:36:33.060 Yuanxiao Xu: that it competes with the original, then that's still enough market harm, because it indirectly substitutes for the original. 161 00:36:33.180 --> 00:36:47.089 Yuanxiao Xu: A favorite example, repeated both by the Copyright Office and by Judge Chhabria:
If a reader buys a romance novel written by an LLM instead of one written by a human author, that's substitution. 162 00:36:48.200 --> 00:36:56.910 Yuanxiao Xu: That's the essence of the market dilution theory, and, as you can see, it dramatically expands the monopoly power copyright holders could claim. 163 00:36:57.230 --> 00:37:16.330 Yuanxiao Xu: Honestly, when I first saw this, it made me wonder: if public libraries were invented today, would Judge Chhabria see them as market dilution too? After all, libraries present millions of books to readers and arguably reduce individual book sales. Would that be enough to claim market harm? 164 00:37:16.790 --> 00:37:24.500 Yuanxiao Xu: Judge Chhabria admits that market dilution is a novel theory, and that he's developing it specifically to address AI. 165 00:37:24.620 --> 00:37:39.399 Yuanxiao Xu: What he doesn't admit is how deeply flawed and dangerous this logic really is. There are at least 4 unfixable problems with this theory. First, many AI-generated works actually incorporate human authorship. 166 00:37:41.151 --> 00:37:53.450 Yuanxiao Xu: It's odd that this market dilution theory treats LLM-generated works as if they're machine-only creations, when in reality many involve substantial human input. 167 00:37:53.660 --> 00:38:12.089 Yuanxiao Xu: Books and essays created with the help of LLMs are often the result of carefully crafted prompts and multiple rounds of human refinement. These are not generic outputs, auto-generated and uploaded for sale. To call such a work a substitute for an existing work is misleading 168 00:38:12.090 --> 00:38:32.040 Yuanxiao Xu: in many cases. What we are looking at is a new kind of creative process. Market dilution theory ignores this nuance and reality.
It proposes to regulate human creativity, not because the expression is potentially infringing, but simply because an AI tool was part of the creative process. 169 00:38:32.200 --> 00:38:40.950 Yuanxiao Xu: That's not protecting the future of human authorship; it's penalizing up-and-coming authors who want to leverage new technologies. 170 00:38:41.220 --> 00:38:45.519 Yuanxiao Xu: Second, this market dilution theory ignores the law. 171 00:38:45.760 --> 00:38:54.610 Yuanxiao Xu: To prove copyright infringement, the law requires substantial similarity in protected expression, not just theme, tone, or genre. 172 00:38:54.760 --> 00:39:01.990 Yuanxiao Xu: Yet market dilution places liability on the AI model, even though the model is not substantially similar to the books. 173 00:39:02.380 --> 00:39:22.990 Yuanxiao Xu: So the model is not even prima facie infringing, but is still treated as harmful under this market dilution theory. In essence, proponents of market dilution theory are trying to rewrite copyright law, sidestepping carefully balanced rules to create a new form of liability for AI developers without statutory backing. 174 00:39:23.680 --> 00:39:44.079 Yuanxiao Xu: Third, market dilution imposes an impossible evidentiary burden on AI developers. The Kadrey court says AI developers can only escape liability if they conclusively prove their model doesn't now, and will never in the future, cause any decline in sales of a plaintiff's book used for training. 175 00:39:44.550 --> 00:40:07.889 Yuanxiao Xu: That is just outright impossible for anyone to prove conclusively. So let's imagine for a moment that that is not what Judge Chhabria actually meant, even though it is what he says in his opinion. His other, more concrete guidance on the kind of evidentiary showing needed is as follows, and I quote directly from his writing: 176 00:40:07.890 --> 00:40:18.250 Yuanxiao Xu: the proper comparison isn't to a world with no LLMs,
but to a world where LLMs weren't trained on copyrighted books. 177 00:40:18.680 --> 00:40:46.969 Yuanxiao Xu: That sounds more manageable until you realize what it means in practice: every time an AI company is sued, it must be prepared to produce a control model trained only on public domain material for comparison. This alone can create a legal environment where only the richest players can afford to participate. Startups and independent researchers will be priced out entirely by the duplicative effort required. 178 00:40:47.300 --> 00:41:04.950 Yuanxiao Xu: And lastly, and most egregiously, I think, this market dilution theory only serves a few already successful authors and big rights holders. Market dilution presents itself as protecting the incentive to create. 179 00:41:05.070 --> 00:41:19.020 Yuanxiao Xu: Judge Chhabria also expressed deep concern for lesser-known authors, the writers he says are the most vulnerable, as John already mentioned. But who actually benefits from applying this market dilution theory? 180 00:41:19.110 --> 00:41:36.429 Yuanxiao Xu: Only those who can, one, afford to hire economists as testifying experts at around $1,000 an hour to produce very fact-specific reports, and, two, show quantifiable loss of sales tied directly to the use of their work in AI training. 181 00:41:36.550 --> 00:41:51.770 Yuanxiao Xu: In other words, the beneficiaries of this theory are not emerging writers, and definitely not future authors. The very human creators and human creativity this theory claims to protect will have no use for it. 182 00:41:52.000 --> 00:42:00.779 Yuanxiao Xu: And let's go back to the romance novelist example. What is her best path forward under this market dilution theory? 183 00:42:01.370 --> 00:42:07.830 Yuanxiao Xu: Should she try to trick Meta into using her work for training and then sue them for lost sales?
184 00:42:07.940 --> 00:42:19.149 Yuanxiao Xu: Or should she aim to write books specifically to be licensed to AI companies for training, since Judge Chhabria thinks that AI will inevitably crowd out her market? 185 00:42:20.320 --> 00:42:24.700 Yuanxiao Xu: That seems to be the only realistic redress 186 00:42:25.270 --> 00:42:31.420 Yuanxiao Xu: market dilution proposes. But is that the kind of creativity we want copyright law to promote? 187 00:42:31.530 --> 00:42:51.569 Yuanxiao Xu: Market dilution is being sold as a way to protect the public and empower authors. But, like so much in today's copyright discourse, including things like the NO FAKES Act, it's a Trojan horse. It promises to protect everyday creators but only delivers profits to a handful of very select groups of people. 188 00:42:51.570 --> 00:43:06.960 Yuanxiao Xu: So to conclude: unless you are a copyright lawyer, an expert for hire, a super successful and established author, a big rights holder, or a rich tech company, you really have no reason to like this theory of market dilution. 189 00:43:07.610 --> 00:43:37.299 Katherine Klosek (ARL): Thanks for explaining that so clearly. Hopefully, when we all see or hear that term, we'll have a better understanding of what it means, and maybe our ears will perk up a little bit. So thanks for breaking that down. John, over to you with a question about legislation, calling a little bit of an audible. You were going to talk about the relationship between these cases and the recently introduced AI Accountability and Personal Data Protection Act, and we definitely want to hear about that. But there's also a question in the chat 190 00:43:37.300 --> 00:43:57.469 Katherine Klosek (ARL): about whether the President's remarks in the recent White House speech that you mentioned maybe give us an opportunity to codify that training generative AI models on copyrighted works is fair use.
So I'd love to hear your thoughts on the future and prospects of potential legislation in this area, and on the bills that have been introduced so far. 191 00:43:58.910 --> 00:44:09.240 Jonathan Band: So, with respect to the Hawley bill, it's completely unrealistic. It is so sweeping that 192 00:44:09.873 --> 00:44:12.696 Jonathan Band: you could never have 193 00:44:13.610 --> 00:44:20.000 Jonathan Band: any sort of generative AI whatsoever, whether it's 194 00:44:20.430 --> 00:44:40.309 Jonathan Band: a commercial entity or a nonprofit. It's just the way it's worded; it would just be so broad. And it is sort of the latest in a succession of bills. In the last Congress there were many bills talking about transparency, requiring 195 00:44:41.816 --> 00:44:53.250 Jonathan Band: AI developers to keep records and disclose what works they trained on and so forth, and all of them really 196 00:44:53.390 --> 00:45:00.729 Jonathan Band: would have the effect of narrowing or restricting the ability of 197 00:45:00.850 --> 00:45:04.520 Jonathan Band: new entrants in the field, and 198 00:45:04.800 --> 00:45:22.779 Jonathan Band: end up, basically, in a world where, as a practical matter, you could have Google and Meta; they'd be the only ones who could afford it. Smaller, newer companies like Anthropic or OpenAI can't do it, 199 00:45:22.890 --> 00:45:30.909 Jonathan Band: and certainly academic researchers can't do it. It would just be Google, Meta, and I guess Microsoft, 200 00:45:33.200 --> 00:45:56.049 Jonathan Band: and that's probably not a great result from a public policy point of view. Now, the question in the chat was sort of, okay, let's go the other direction: maybe now we could codify this, codify some of the decisions.
But of course, we only have parts of this; 201 00:45:56.240 --> 00:46:17.690 Jonathan Band: you have these 2 decisions, which coalesce in some respects and go in opposite directions in other respects. You would never get agreement on which parts you would want to codify, even of these 2 decisions, and as more time passes there'll be more decisions going in other directions. 202 00:46:17.890 --> 00:46:19.183 Jonathan Band: And so, 203 00:46:20.150 --> 00:46:34.569 Jonathan Band: yes, I suppose you could imagine a world where the President basically says, okay, this is what I want, and then all the Republicans fall in line, and then they'd need to get 204 00:46:34.690 --> 00:46:55.079 Jonathan Band: just a handful of Senate Democrats, and maybe they could. But given the politics, with copyright legislation as a practical matter, you'd need to get the 60 votes in favor; you'd have to get across the 205 00:46:55.280 --> 00:47:03.460 Jonathan Band: filibuster threshold. And even if you could conceivably convince 206 00:47:05.270 --> 00:47:18.700 Jonathan Band: the 8 or 9, or whatever the number of, Democrats on the merits, I think as a matter of principle they would oppose it. So one can 207 00:47:18.940 --> 00:47:25.900 Jonathan Band: talk about it, but I just don't see, as a practical matter, that it will ever happen, at least at this point.
208 00:47:26.020 --> 00:47:52.700 Jonathan Band: So we're going to have to just continue the slog through the courts. And because I have faith in the courts, at least on copyright matters, maybe not in other areas, and I have faith in fair use, I think ultimately we'll end up in a good place. But it's going to take a while, and along the way 209 00:47:52.780 --> 00:48:00.600 Jonathan Band: there'll be certain issues which might be problematic in certain areas, for example, the shadow library problem. 210 00:48:01.810 --> 00:48:31.220 Katherine Klosek (ARL): Thank you. I do love the way folks are thinking, though, about advocacy on this and where there might be opportunities. I want to point out that some of the questions are being answered in the Q&A itself, and I also think for some of these we could summarize them and provide written answers afterward. I also wanted to note that one of our questions was going to be about how folks can stay up to date on all these cases, but Rachel asked a 211 00:48:31.220 --> 00:48:56.080 Katherine Klosek (ARL): related but better question about what types of resources we can develop and share to facilitate conversations with scholars. So I think maybe that's something we can think about and work on, maybe some talking points for librarians or takeaways from the cases that you can use in those conversations. So if folks have thoughts on resources or support that would be useful, definitely let us know. But I think 212 00:48:56.080 --> 00:49:22.740 Katherine Klosek (ARL): that's a great suggestion. And going to keep it rolling with questions from the audience. Dave, would you like to take a crack at this one? Libraries and librarians share some interests with big tech and with content creators on copyright.
Historically, we often align with big tech, but AI raises a number of other issues, which Yuanxiao and John and Dave have really raised already. So what are the key points of difference between 213 00:49:22.740 --> 00:49:29.460 Katherine Klosek (ARL): library interests and big tech on copyright and those other policy areas? What might those areas be, Dave? 214 00:49:29.970 --> 00:49:37.889 Dave Hansen: Sure, I'd be happy to answer that. And on the resource question, my first thought of what to do there is to ask Rachael Samberg 215 00:49:37.890 --> 00:50:02.720 Dave Hansen: for her help to draft those things. So I thought this was a really interesting question, and I wanted to answer it because of the way it was framed. I do think that it often feels like the library community, and the community of authors who really care about the public interest, have to go and pick a side, right? It's like: do we care about the authors and creators and copyright holders of today, 216 00:50:02.720 --> 00:50:14.220 Dave Hansen: or are we aligning ourselves with big technology companies, because they seem to be driving a lot of the narrative around what happens with policy here? And 217 00:50:14.280 --> 00:50:40.880 Dave Hansen: I get why that happens. But from our perspective, at least my perspective, what I really care about is the ability of people to continue to create, to continue to do research uninhibited by overaggressive copyright law and other laws, while also maintaining the ability to actually publish those things and have them distributed. And copyright is an important 218 00:50:41.010 --> 00:51:03.890 Dave Hansen: part of that ecosystem as well.
As I'm looking at this space, I think one of the things that gets lost here is that we have these lawsuits against some very big companies: Meta and Google and OpenAI, which is backed by Microsoft. Anthropic is actually the smallest of any of these; they have a market cap of 60 billion dollars, 219 00:51:03.890 --> 00:51:12.279 Dave Hansen: which sounds really big until you compare it to some of those other companies. And the reality is that 220 00:51:12.280 --> 00:51:15.849 Dave Hansen: if we have a future that demands 221 00:51:15.850 --> 00:51:40.330 Dave Hansen: licensing for access to every single work for AI training, the only companies that benefit are the companies that can afford to either purchase those permissions or just obtain them in other ways. I mean, remember, some of these companies: Meta has millions of users who upload content for free every day, content that Meta is training on. Google has an incredible 222 00:51:40.330 --> 00:52:04.009 Dave Hansen: wealth of content being voluntarily given to it by users every day, and Google says in their terms of service they can use that to improve their services and their products, presumably including with AI as well. So if we think about these very, very large companies, with ready access to licensed, permissioned content 223 00:52:04.010 --> 00:52:09.899 Dave Hansen: and the ability to pay for additional content, if we think about that kind of environment, 224 00:52:10.100 --> 00:52:15.409 Dave Hansen: then I think it really behooves us to say, okay, 225 00:52:15.910 --> 00:52:35.540 Dave Hansen: great, they can do some of this, but we probably don't want to live in a world where only those companies can do this. We want researchers to be able to engage in this space; we want startup competitors to engage in this space. And so I think that aligns us not so much with big tech, but with freedom to research and fair use.
So 226 00:52:35.540 --> 00:52:51.599 Dave Hansen: I guess what I'm doing is sort of rejecting the premise of the question and saying it's those principles that I think are important to us, and if that happens to align with some of the positions that some of these tech companies are taking in litigation today, 227 00:52:51.730 --> 00:52:57.039 Dave Hansen: so be it. But I think it's a much longer-term perspective that we're taking. 228 00:52:57.990 --> 00:53:03.230 Katherine Klosek (ARL): Thanks, Dave. And Yuanxiao, you added some thoughts on this in the Q&A, but please elaborate. 229 00:53:03.530 --> 00:53:29.530 Yuanxiao Xu: Yeah, I wanted to add a concrete example of where big tech and the research community, the library community, may differ greatly, which is open access. Big tech has pretended to be releasing a lot of models under an open access structure, but what they are actually releasing are just the parameters and weights of a finished product, 230 00:53:29.530 --> 00:53:44.030 Yuanxiao Xu: and no human, no matter how great a programmer you are, can make sense of the weights and parameters of a trained AI model. You're just looking at nonsensical numerical sequences, 231 00:53:44.070 --> 00:54:06.450 Yuanxiao Xu: and there's nothing you can do as a researcher to make sense of how an open model like Llama can be used or fine-tuned further based on that information alone. A true open access model would at least include what training materials have been used and how the fine-tuning was done, 232 00:54:08.290 --> 00:54:24.569 Yuanxiao Xu: so the research community, in order to exercise more oversight and control over the kind of responsible AI that can be produced, would want to advocate for true open access models. 233 00:54:26.910 --> 00:54:54.700 Katherine Klosek (ARL): Thank you. A participant has pointed out that, in fact, the audience cannot see the Q&A or the answers.
So I think what we'll do is share the recording and transcript with everyone who registered, along with the questions and the answers that have been given. Oh, okay, some might be able to see them and some cannot. Conflicting information. Either way, we'll make sure everybody has all of the information that has been shared here. 234 00:54:55.407 --> 00:54:57.500 Katherine Klosek (ARL): Okay, sorry. 235 00:54:57.660 --> 00:55:18.310 Katherine Klosek (ARL): We are running out of time, and we have a lot of questions to get to. But, John, did you want to speak to this? We had a question submitted ahead of time about whether these decisions impact the potential for publishers and authors to receive compensation for their works. Do you want to touch on that? That seems important. 236 00:55:18.800 --> 00:55:20.133 Jonathan Band: Sure. So 237 00:55:22.920 --> 00:55:49.620 Jonathan Band: the authors and the publishers can always receive compensation, meaning there are currently licensing arrangements underway, and those are going to continue regardless of how these fair use cases come out, and even of the nuances of whether it's a shadow library or just on the open web, which is again 238 00:55:49.810 --> 00:55:52.660 Jonathan Band: a critical distinction. But 239 00:55:53.300 --> 00:56:06.370 Jonathan Band: there are all kinds of reasons why a large commercial firm would want to enter into a licensing arrangement, in part because, 240 00:56:06.430 --> 00:56:25.789 Jonathan Band: again, it's going to take 10 years or more, I would say, to know for sure what is fair use and what isn't. And so, if you have the money, why wouldn't you enter into a licensing arrangement with a few publishers 241 00:56:26.010 --> 00:56:33.980 Jonathan Band: who you think have useful content, and whose materials you want to train on? Also,
242 00:56:34.140 --> 00:56:52.649 Jonathan Band: you know, there's training and there's training. You have pure training, but then what happens if you want to use expression in the outputs, which could be very, very useful? Certainly you're going to want to enter into a licensing arrangement with 243 00:56:52.880 --> 00:57:07.859 Jonathan Band: a publisher whose work you're going to want to incorporate in your outputs, for example, the New York Times. I am confident that the New York Times is 244 00:57:07.860 --> 00:57:27.150 Jonathan Band: litigating with the AI firms, but the point is, before they filed litigation they were trying to work out a licensing arrangement. They couldn't work it out, and the American solution to that is you file suit. Now I'm sure they're still talking, and I wouldn't be surprised if they will ultimately 245 00:57:27.250 --> 00:57:34.509 Jonathan Band: reach an agreement, or at least the New York Times might reach an agreement with one company, 246 00:57:34.970 --> 00:57:53.989 Jonathan Band: and they will then make a lot of their content available not only for training but also for use in outputs, because that's what the companies really want to do, right? You want to be able to say in your results: the New York Times says such and such about, 247 00:57:54.290 --> 00:58:11.350 Jonathan Band: you know, the so-called agreement on tariffs between the U.S. and the EU, which of course is not really an agreement; it's an agreement to agree at some point in the future, with details to be worked out. But, you know,
248 00:58:13.230 --> 00:58:21.910 Jonathan Band: there's going to be a desire to be able to include not only the facts but also the expression, 249 00:58:22.367 --> 00:58:39.160 Jonathan Band: and, again, you don't need to do that with everyone; maybe just the New York Times, or just a few newspapers or a few other publishers. So certainly there'll be a desire to include the expression. And so, 250 00:58:39.660 --> 00:58:45.749 Jonathan Band: even with the training issue aside, there are a lot of opportunities for licensing. 251 00:58:46.360 --> 00:59:10.860 Katherine Klosek (ARL): Thank you. We have less than a minute left, so I will wrap us up. Unfortunately, that went by really quickly. I really appreciate all of the expertise that was shared here, and I'm really excited by all the questions. Like I said, we'll make sure to share those out, and you will hear more from Authors Alliance and ARL on these cases and these issues. Thanks again for your attention and your expertise. And yeah, more to come. Thank you.