Study Finds AI Image Generators Copy Images From Training Data

AI image-generation tools such as Stable Diffusion can memorize training images and generate near-identical copies of them, Gizmodo reports.

According to the paper, the researchers extracted more than a thousand training samples from the models, including photographs of people, film frames, company logos and other images. They found that the AI can generate near-exact copies of these images, differing only in small alterations such as added noise.

As an example, they cited a photograph of American preacher Anne Graham Lotz, taken from Wikipedia. When they entered the prompt “Anne Graham Lotz” into Stable Diffusion, the AI produced the same image with added noise.

The researchers measured the pixel-wise distance between the two images and found them virtually identical.
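A pixel-wise comparison of this kind can be sketched in a few lines. The example below is only an illustration: the article does not say which metric the researchers used, so the root-mean-square difference, the toy four-pixel "images" and the `THRESHOLD` value are all assumptions.

```python
import math

def normalized_l2(a, b):
    """Root-mean-square difference between two equal-sized images.

    Each image is a flat list of pixel values scaled to [0, 1].
    Returns 0.0 for identical images and 1.0 for maximally different ones.
    """
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and the same size")
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(squared / len(a))

# Toy data (assumed for illustration, not taken from the study).
original = [0.0, 0.5, 1.0, 0.25]
noisy = [0.02, 0.48, 0.99, 0.27]    # same picture with slight noise
unrelated = [1.0, 0.0, 0.0, 1.0]

THRESHOLD = 0.1  # hypothetical cutoff for "virtually identical"

print(normalized_l2(original, noisy) < THRESHOLD)      # True
print(normalized_l2(original, unrelated) < THRESHOLD)  # False
```

A small distance means the generated image differs from the training image by little more than noise.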

The process of finding duplicates proved straightforward. The researchers repeatedly fed the model the same prompt. When the generator produced identical images, they manually searched the training set for the same picture.
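The repeated-prompt check can be illustrated with a toy simulation. This is a rough sketch, not the researchers' code: `fake_generate` is a hypothetical stand-in for an image generator, and the sample count, distance threshold and `min_matches` value are assumed for the example.

```python
import math
import random

def distance(a, b):
    """Root-mean-square pixel difference between two images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# A pretend "training image" that the toy generator has memorized.
_base_rng = random.Random(12345)
MEMORIZED = [_base_rng.random() for _ in range(16)]

def fake_generate(prompt, seed):
    """Hypothetical generator stub (not a real API)."""
    rng = random.Random(seed)
    if prompt == "memorized prompt":
        # Reproduce the memorized image with slight noise.
        return [p + rng.uniform(-0.01, 0.01) for p in MEMORIZED]
    # Otherwise produce a fresh random image for every seed.
    return [rng.random() for _ in range(len(MEMORIZED))]

def looks_memorized(prompt, samples=20, threshold=0.05, min_matches=10):
    """Flag a prompt when many of its generations are near-identical."""
    images = [fake_generate(prompt, seed) for seed in range(samples)]
    for i, img in enumerate(images):
        matches = sum(
            1
            for j, other in enumerate(images)
            if j != i and distance(img, other) < threshold
        )
        if matches >= min_matches:
            return True
    return False

print(looks_memorized("memorized prompt"))  # True
print(looks_memorized("a new prompt"))      # False
```

Prompts flagged this way would still need the manual step described above: looking up the matching picture in the training set.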

Duplicate images returned by Stable Diffusion. Data: Extracting Training Data from Diffusion Models.

The researchers noted that the memorization effect is rare. In total, they tested around 300,000 prompts. The analysis showed the memorization rate of generators to be just 0.03%.

Moreover, Stable Diffusion copies images less often than the other models tested. The researchers attribute this to deduplication of the training dataset.
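Deduplication in its simplest form means dropping repeated copies of an image before training. The article does not describe how the dataset was actually deduplicated, so the sketch below assumes exact byte-for-byte matching via a hash. That is a deliberate simplification, since it would miss resized or re-encoded near-duplicates.

```python
import hashlib

def deduplicate(images):
    """Keep the first occurrence of each byte-identical image."""
    seen = set()
    unique = []
    for data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique

# Toy dataset: byte strings stand in for image files (illustrative only).
dataset = [b"cat-photo", b"logo", b"cat-photo", b"film-frame", b"logo"]

print(len(deduplicate(dataset)))  # 3
```

The fewer times a picture appears in the training set, the less opportunity a model has to memorize it.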

Google's Imagen model is more prone to copying.

“The warning is that the model should generalize and generate new images, not output a memorized version,” said co-author Vikash Sehwag.

The study also found that as AI generators scale, the memorization effect will increase.

“Whatever new model comes out, bigger and more powerful, the memorization risks will be far higher than today,” said co-author Eric Wallace.

The researchers argue that the ability of diffusion models to reproduce content may fuel copyright disputes. According to Florian Tramèr, a computer science professor at ETH Zurich, many companies grant users licenses to share and monetize AI-generated images. However, if a generator reproduces a copyrighted work, this could lead to conflicts.

The study was conducted by researchers from Google, DeepMind, ETH Zurich, Princeton University, and the University of California, Berkeley.

Earlier, in January, a group of artists filed a lawsuit against developers of AI generators over possible copyright infringement.
