Internet Archive’s Open Library and Copyright Law: Second Addendum

This post is an update. Read the original post here, the first addendum here and the third addendum here.

In May and June of this year, we wrote about the copyright dispute between the Internet Archive and four major publishing companies, Hachette Book Group Inc., HarperCollins Publishers LLC, John Wiley & Sons Inc. and Penguin Random House LLC (the “Publishers”), arising from the Internet Archive’s National Emergency Library project, launched in March 2020 in response to the COVID-19 pandemic. On June 1, the Publishers, in coordination with the Association of American Publishers, filed a copyright infringement lawsuit targeting the organization’s “Open Library” and National Emergency Library. Hachette Book Group, Inc. v. Internet Archive, No. 1:20-cv-04160 (S.D.N.Y. June 1, 2020). The Publishers argued that the so-called “National Emergency Library” was tantamount to unilaterally enacting an emergency copyright act by private action. In response, the Internet Archive shut down the National Emergency Library ahead of schedule on June 16, 2020, but many issues remain to be litigated. In fact, the Publishers are seeking to close the Open Library permanently. Whatever the outcome of the litigation, it will have a critical impact on how works will be digitized and on the access that researchers, academics, students and all readers will have to digitized content going forward. This addendum addresses some of the key arguments raised by the litigation, all of which go to the heart of copyright ownership and fair use.

As background, the Internet Archive is a nonprofit organization that has digitally preserved more than 1.3 million books and historic documents, as well as approximately 400 billion pages of Internet content. The Internet Archive owns, or has obtained licenses from many prestigious libraries for, all the books it has digitized. Much of the concern arises from the fact that the Internet Archive lends each digital copy of a copyrighted work to only one patron at a time, mimicking the traditional library lending model. This practice is sometimes referred to as the “owned to loan” model. Works deemed in the public domain can be “read” without limitations. The Internet Archive notes that the Open Library, which it runs, is an accredited California State Library, though the Publishers doubt that this accreditation is justified.

The Publishers have considered Controlled Digital Lending (CDL) a manufactured legal paradigm without legal support since its inception. They also question the Digital Rights Management (DRM) tools that the Internet Archive uses to restrict copying and enforce the 14-day lending limit for borrowed books. When the Internet Archive waived the “one checkout at a time” requirement in the wake of the COVID-19 pandemic, however, this unilateral action stretched the existing limits of copyright law even further. At that point, the Publishers determined that the time had come to protect their copyrights and historic business model in court and to seek compensation for themselves and their authors for the distribution of their works. The Publishers are requesting destruction of the existing collection of digitized books in the Open Library, monetary damages for the use of their copyrighted books and an injunction against the Internet Archive’s digitization and lending practices.

The Publishers also challenge the applicability of the first sale doctrine to the Internet Archive’s business practices. The first sale doctrine, codified at 17 U.S.C. § 109, entitles the lawful “owner of a particular … lawfully made” copy of a copyrighted work, like a book, “to sell or otherwise dispose of the possession of that copy or phonorecord.”

As a practical matter, the destruction of the existing digital library is the most consequential relief the Publishers are seeking. The digitization of existing books has been litigated twice in the courts. In Authors Guild v. Google, Inc., the Second Circuit rejected the Authors Guild’s claim that the Google Books Project, which scanned and made searchable the collections of many university libraries, infringed copyright. 804 F.3d 202 (2d Cir. 2015). The Authors Guild asked the Supreme Court to review that decision, and it declined to do so. In Authors Guild, Inc. v. HathiTrust, decided a year earlier, the Second Circuit was likewise asked to rule on whether digitization of books is a fair use of copyrighted material. 755 F.3d 87 (2d Cir. 2014). The Court upheld HathiTrust’s right to maintain a full-text database to search for copyrighted and public domain books, stating that “the creation of a full-text searchable database is a quintessentially transformative use” and consistent with the purpose of copyright. In its opinion, the Court emphasized the importance of the public as the primary intended beneficiary of copyright law, “whose access to knowledge copyright seeks to advance by providing rewards for authorship.” The Court also approved, as fair use, HathiTrust’s service to make text available in formats accessible to print-disabled people. These cases suggest that courts are not likely to require the Internet Archive to destroy its existing library. In contrast, however, a study authored by the Copyright Office, entitled “Legal Issues in Mass Digitization” and published in October 2011, contains the statement “[t]he Section 108 exception does not contemplate mass digitization.” Because copyright law is very fact-specific, and given the almost year-long discovery process anticipated in the Internet Archive case, any outcome is possible.

The broader legal issue is whether the nonprofit library’s lending practices constitute fair use, and a win for the Publishers would have a chilling effect on both the Internet Archive and research. As a result, the list of institutions supporting the Internet Archive in this dispute is a who’s who of academia, including public and private institutions ranging from the Ivy League to community colleges, as well as certain high schools and local library systems. While a searchable database of copyrighted books, such as the one at issue in HathiTrust, addresses certain research requirements, it is not a substitute for the electronic access to the full text that the Internet Archive offers.

The Internet Archive relies on an opt-out mechanism through which authors or publishers can request that their works be withheld from public access, and it claims to honor such requests. The Publishers argue that copyright holders should not bear the burden of providing notice of these infringing uses where it is the Internet Archive, the website owner, rather than a third party, that is posting the infringing material.

Critically, and despite some interpretations to the contrary, even if the Publishers succeed on all their claims, the decision would be limited to books, at least as an initial matter. The Publishers specifically exclude the Internet Archive’s Wayback Machine (the irreplaceable repository of the Internet’s history) and public domain documents from the litigation, and the Internet Archive’s music, videos, software and lectures are not mentioned in it either.

In the end, a loss would not necessarily be a financial death knell for the Internet Archive. Regardless of the outcome, this case will have a critical impact on the financial model of the publishing industry. It is undisputed that the Publishers’ business model involves significant investment and risk in the marketing and publishing process, and they have aggressively entered the market to produce their own e-books. Although the Internet Archive’s PDFs are often not as high quality as, and may not be a direct substitute for, the Publishers’ more polished e-books, they are available to users at zero cost and therefore represent a major disruption to the Publishers’ business model. Not only do these copies devalue the authorized e-books among consumers, but more importantly, there is no economic model under which the Publishers can compete with a free service that has little out-of-pocket cost.

As of this writing, the Publishers have cited only 127 copyrighted books in the lawsuit, which, at the statutory maximum of $150,000 per work for willful infringement, translates to potential damages of approximately $19 million. While that sum is not insignificant, the Internet Archive acknowledges in the litigation that over the last ten years it has received donations, grants and other revenue totaling more than $100 million, and there is no indication the tech nonprofit’s revenue base is in jeopardy. In fact, with the continued financial support of libraries, it has the capacity to digitize 3,000 books from these libraries’ collections each day. It remains to be seen whether the Publishers will broaden their claim to include all the copyrighted books the Internet Archive has digitized, thus dramatically increasing the Internet Archive’s financial exposure. Alternatively, should the Internet Archive be required to pay fees to the copyright holders, its costs would increase. The more certain loser in the case is the education community that has come to rely on the no-cost digitized library to give students seamless access to educational resources.

In the meantime, the parties are waging a battle of words in their initial court filings and in the media. The Internet Archive brief states:

The Internet Archive does what libraries have always done: buy, collect, preserve, and share our common culture… Contrary to the publishers’ accusations, the Internet Archive and the hundreds of libraries and archives that support it are not pirates or thieves. They are librarians, striving to serve their patrons online just as they have done for centuries in the brick-and-mortar world.

Regardless of whether this litigation determines that Controlled Digital Lending (CDL) constitutes copyright infringement, this battle between new technology and existing channels of distribution will undoubtedly be a disruptive force in book publishing. Whether the Publishers win as the record companies did in A&M Records, Inc. v. Napster, Inc., 239 F.3d 1004 (9th Cir. 2001), lose as the movie industry did in Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417 (1984), or technology simply causes a market shift as it did from Blockbuster to Netflix, change is inevitable. As the Digital Millennium Copyright Act (DMCA) Section 104 Report issued by the Copyright Office states:

Time, space, effort and cost no longer act as barriers to the movement of [digital] copies, since digital copies can be transmitted nearly instantaneously anywhere in the world with minimal effort and negligible cost.

Whether this case progresses through the courts or, as some observers suggest is likely, the parties settle the dispute, we will keep you apprised of significant developments. In the meantime, look for related blog posts in the near future on Controlled Digital Lending and Digital Rights Management beyond the Internet Archive application.