Data Poisoning

To train generative AI algorithms, AI companies scrape vast amounts of information from the Internet, threatening creators’ intellectual property rights. Although some AI companies have given copyright owners the option to limit which works are used to train AI algorithms, few avenues remain for protecting copyrighted works from AI. Copyright infringement litigation currently offers uncertain outcomes, and damage awards may not fully compensate creators for the harm after the fact. Technological solutions may therefore be more efficient, particularly until the courts clarify what constitutes actionable infringement or fair use.

In response to scraping, researchers at the University of Chicago developed Nightshade, a data poisoning tool that embeds misleading information into an image’s underlying data. Data poisoning does not perceptibly alter a poisoned image’s appearance, but it prevents AI tools from correctly identifying what the image depicts. As an AI tool ingests more poisoned images, its ability to generate images that accurately respond to a textual prompt degrades. For example, once enough poisoned images of a hat enter an AI algorithm’s training data, the algorithm may generate a cake when prompted to generate a hat. Because removing poisoned images from a generative AI model is time-consuming and costly, data poisoning may encourage AI companies to respect intellectual property rights when developing their algorithms.
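The mechanics can be illustrated with a minimal sketch. Nightshade’s actual method perturbs images so that a text-to-image model’s feature extractor associates them with an unrelated concept; the sketch below shows the same underlying idea in simplified form, using a targeted adversarial perturbation against an off-the-shelf image classifier rather than Nightshade’s implementation. The `target_class` value (the decoy concept, e.g., “cake”) and the `epsilon` bound on pixel changes are illustrative parameters, not values from the tool itself.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

# A pretrained classifier stands in for the labeling/feature pipeline a
# generative model's training process might rely on. (Nightshade itself
# targets text-to-image feature extractors; this is a simplification.)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

to_tensor = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),          # pixel values scaled to [0, 1]
])
normalize = transforms.Normalize(   # ImageNet statistics expected by ResNet
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
)

def poison(image_path: str, target_class: int,
           epsilon: float = 4 / 255, steps: int = 40,
           step_size: float = 1 / 255) -> torch.Tensor:
    """Nudge an image toward a decoy class while keeping every pixel
    within epsilon of the original, i.e., visually negligible."""
    x = to_tensor(Image.open(image_path).convert("RGB")).unsqueeze(0)
    x_adv = x.clone()
    target = torch.tensor([target_class])

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(normalize(x_adv)), target)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            # Step down the loss toward the decoy label, then clamp the
            # cumulative change so the perturbation stays imperceptible.
            x_adv = x_adv - step_size * grad.sign()
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv  # poisoned image tensor, ready to be saved and published
```

The key property is the clamp to `epsilon`: each pixel changes so little that a person sees the original image, while a model trained on enough such images learns to associate the image’s true subject with the decoy concept.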