Internet Archive’s Open Library and Copyright Law

With libraries at K-12 schools, universities and local communities closed, the Internet Archive provides an increasingly valuable resource for educators and distance learning. Before diving into this vast wealth of content, however, some caution regarding the copyright issues the Archive raises is warranted.

The Internet Archive will soon be 25 years old, but it remains a source of much debate among content creators, distributors (such as libraries and booksellers) and content users. The Internet Archive is a 501(c)(3) non-profit organization that in 1996 began archiving the Internet by making web content history readily accessible through the "Wayback Machine." Today it also provides digital versions of other published works, such as books and other text, audio recordings (including live concerts), videos, images, and software programs. Material can be browsed by age appropriateness, beginning with kindergarten. Books, however, have become its primary focus and the source of the most contentious copyright disputes.

Works in the Public Domain

Books and other content published before 1923 are in the public domain and can be freely digitized by the Internet Archive. These works can be used in whole or in part, for any purpose, by educators and others without further copyright concerns.

Fair Use Test

An untold number of works published after 1923 are protected by copyright. Determining which of those works is still subject to copyright is such a complex and time-consuming task that the better part of wisdom suggests that one should treat all such works as protected absent concrete information as to copyright status. As a result, the discussion of permitted uses of these works starts with an analysis of whether the intended use falls under the fair use exemption. In this case, however, the test requires a two-step analysis. First, is the digitization and distribution by the Internet Archive fair use, and second, is the end user’s use of the material fair use? As a reminder, there are four factors to be considered for fair use, and no one factor is dispositive.

  • The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes
    • While the Internet Archive is a nonprofit organization and much of its content is made available for educational purposes, there is no restriction on how end users may use material from the Archive.
    • For most uses in the nonprofit education context, whether face-to-face or online, this test should weigh in favor of fair use. For commercial or entertainment purposes, the use would face greater scrutiny as to whether it passes this prong of the test.
  • The nature of the copyrighted work
    • Is the work in the public domain or protected by copyright?  Everything in the Internet Archive, from books to music to newspaper articles, started as protected, copyrighted work. Full copyright protection for some works may have expired, and thus they may be in the public domain, and for others the copyright holders may have collaborated with or otherwise given the Internet Archive permission to digitize and publish their content. There is also more leeway for facts to be extracted from a news website on the Wayback Machine, for example, than there would be for excerpts from a work of fiction.
    • As noted above, users should assume that works created after 1923 are protected by copyright. In reality, since the 1909 Copyright Act required certain formalities, including notice and registration, as a precondition to full copyright protection, many post-1923 works were never registered or had their registrations lapse without renewal. However, it would be burdensome, if not impossible, for the user to determine if a work was not registered or if the copyright holder had consented to having its work freely available through the Internet Archive.
  • The amount and substantiality of the portion taken
    • The Internet Archive is publishing entire works, and where an assignment requires students to read the entire book, for example, both uses would fail this element of the test.
    • This factor was at issue in the prolonged litigation over the Google Books project. In the early 2000s Google started mass digitization of books, ultimately scanning some 25 million titles in an attempt to make works available to researchers. Many of the works were under copyright, and Google did not get permission from the copyright owners. The Authors Guild sued Google, and after many years of litigation and failed settlements, the appeals court found for Google on the grounds of fair use. It is key to note that, although Google scanned works in their entirety in order to make the entire work searchable, only snippets were actually available from Google. To obtain the entire work, researchers had to go to the copyright source.
  • The effect of the use upon the potential market
    • Making the full text of a book available online for free, or sending a link to a digitized copy of a book, can most certainly decrease e-book and hard copy sales.

The National Emergency Library

In response to the COVID-19 shutdowns of physical libraries, the Internet Archive has made all its digitized books freely available for download without the usual one-at-a-time restriction discussed below. This National Emergency Library has pitted the Authors Guild and the Association of American Publishers against libraries, universities and individuals who welcomed the move. (Regarding the furor caused among authors, see our earlier blog post, COVID-19 Causes Massive Copyright Fair Use Confusion.) The Internet Archive announced that the practice would be in effect until the end of June or the end of the US national emergency, whichever is later.

The Internet Archive did not consult with or get permission from the copyright holders for this unilateral action. Rather, it indicated that copyright holders can request, through a link it provided, that their content be removed from the site, and stated that such requests will be accommodated within 72 hours. The same link could also be used by copyright holders to request that their content be added to the National Emergency Library.

Controlled Digital Lending

The COVID-19 response is not the first time the Authors Guild has charged the Internet Archive with the “unauthorized copying, distribution, and display of books.” In January 2019 the Guild reacted negatively to Controlled Digital Lending (“CDL”), which the Internet Archive uses to replicate online the physical limitation of brick-and-mortar libraries, where only one user at a time can check out a given copy of a book. The Authors Guild said in a statement that CDL is a “recently invented legal theory that allows libraries to justify the scanning (or obtaining of scans) of print books and e-lending those digital copies to users without obtaining authorization from the copyright owners.” These sentiments are echoed by the United Kingdom’s Society of Authors. (Note that the United Kingdom does not have a comparable fair use test, and the Internet is, obviously, without borders.) It is also worth noting that, unlike physical libraries, the Internet Archive does not obtain a license to distribute e-books, so the analogy fails.

Some scholars argue in a white paper, however, “that it is fair use for libraries to scan or obtain scans of physical books that they own and loan those books through e-lending technologies, provided they apply certain restrictions akin to physical library loans, such as lending only one copy (either the digital copy or the physical copy) at a time and only for a defined loan period.” The Internet Archive has also gained support for its stance from the state of California and the University of California library system, as well as many other universities and authors.
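For readers who want to see how the "one copy at a time, for a defined loan period" restriction might work in practice, the following is a minimal sketch in Python. It is an illustration only, not the Internet Archive's actual system: the class name ControlledTitle, the 14-day loan period, and the borrower names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ControlledTitle:
    """One title lent under a hypothetical owned-to-loaned rule."""
    owned_copies: int                             # copies the library owns
    loan_period: timedelta = timedelta(days=14)   # assumed loan length
    loans: dict = field(default_factory=dict)     # borrower -> due date

    def _expire(self, now):
        # Drop loans whose defined loan period has ended.
        self.loans = {b: due for b, due in self.loans.items() if due > now}

    def checkout(self, borrower, now):
        """Return True if a copy was lent, False if none is available."""
        self._expire(now)
        if borrower in self.loans or len(self.loans) >= self.owned_copies:
            return False
        self.loans[borrower] = now + self.loan_period
        return True

    def checkin(self, borrower):
        # Early return of a copy frees it for the next borrower.
        self.loans.pop(borrower, None)


book = ControlledTitle(owned_copies=1)
start = datetime(2020, 4, 1)
first = book.checkout("alice", start)                     # copy is lent
second = book.checkout("bob", start + timedelta(days=1))  # refused, copy is out
third = book.checkout("bob", start + timedelta(days=15))  # lent, first loan expired
```

The point of the sketch is that the number of simultaneous loans never exceeds the number of copies owned. The National Emergency Library, by contrast, lifted that cap.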

In 2013 the U.S. District Court for the Southern District of New York held in Capitol Records, LLC v. ReDigi Inc. (934 F. Supp. 2d 640 (S.D.N.Y. 2013)), a decision the Second Circuit Court of Appeals affirmed in 2018, that reselling a digital file without the copyright holder’s permission is not fair use because the resales compete with the legitimate copyright holder’s sales. The court also rejected the first sale argument (the doctrine that allows the owner of a copy or phonorecord lawfully made under the Copyright Act to sell or otherwise dispose of the possession of that copy or phonorecord) and held ReDigi liable for both direct and contributory infringement since ReDigi knew or should have known its business would encourage infringement. It is doubtful that the “wartime exigency” of COVID-19 alters the fair use analysis as applied to the transfer of digital files. Difficult as COVID-19 is for everyone, the rule of law, including copyright law, has not been suspended. There is also, as noted above, the question of whether the end user has any liability.

While this dispute continues, the Internet Archive has maintained its position on CDL and, at least temporarily, has increased access to copyrighted material. Technology advancements continue to test the boundaries of copyright law, and there are no clear answers as to how copyright law will ultimately address these new questions. Adapting to the post-COVID-19 world is likely to bring these challenges to the fore more quickly. Stay tuned.