{"id":73475,"date":"2023-02-02T13:32:08","date_gmt":"2023-02-02T11:32:08","guid":{"rendered":"https:\/\/forklog.com\/en\/?p=73475"},"modified":"2025-09-09T14:21:44","modified_gmt":"2025-09-09T11:21:44","slug":"study-finds-ai-image-generators-copy-images-from-training-data","status":"publish","type":"post","link":"https:\/\/forklog.com\/en\/study-finds-ai-image-generators-copy-images-from-training-data\/","title":{"rendered":"Study Finds AI Image Generators Copy Images From Training Data"},"content":{"rendered":"<p>AI image-generation tools such as Stable Diffusion memorize training images and can generate near-identical copies, Gizmodo reports.<\/p>\n<p>According to the paper, the researchers extracted more than a thousand training samples from the models, including photographs of people, film frames, company logos and other images. They found that the models can reproduce these images almost exactly, with only small alterations such as added noise.<\/p>\n<p>As an example, they cited a photograph of American preacher Anne Graham Lotz, taken from Wikipedia. 
When they entered the prompt &#8220;Anne Graham Lotz&#8221; into Stable Diffusion, the AI produced the same image with added noise.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p lang=\"en\" dir=\"ltr\">Models such as Stable Diffusion are trained on copyrighted, trademarked, private, and sensitive images.<\/p>\n<p>Yet, our new paper shows that diffusion models memorize images from their training data and emit them at generation time.<\/p>\n<p>Paper: <a href=\"https:\/\/t.co\/LQuTtAskJ9\">https:\/\/t.co\/LQuTtAskJ9<\/a> <\/p>\n<p>\ud83d\udc47[1\/9] <a href=\"https:\/\/t.co\/ieVqkOnnoX\">pic.twitter.com\/ieVqkOnnoX<\/a><\/p>\n<p>\u2014 Eric Wallace (@Eric_Wallace_) <a href=\"https:\/\/twitter.com\/Eric_Wallace_\/status\/1620449934863642624?ref_src=twsrc%5Etfw\">January 31, 2023<\/a><\/p><\/blockquote>\n<p> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>The researchers measured the pixel-wise distance between the two images and found them virtually identical.<\/p>\n<p>The process of finding duplicates proved straightforward. The researchers repeatedly fed the model the same prompt. 
When the generator produced identical images, they manually searched the training set for the same picture.<\/p>\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"316\" src=\"https:\/\/forklog.com\/wp-content\/uploads\/3050ada10e98c2974a2f11a66434174f-1024x316.webp\" alt=\"Duplicate images returned by the Stable Diffusion model\" class=\"wp-image-196946\" srcset=\"https:\/\/forklog.com\/wp-content\/uploads\/3050ada10e98c2974a2f11a66434174f-1024x316.webp 1024w, https:\/\/forklog.com\/wp-content\/uploads\/3050ada10e98c2974a2f11a66434174f-300x93.webp 300w, https:\/\/forklog.com\/wp-content\/uploads\/3050ada10e98c2974a2f11a66434174f-768x237.webp 768w, https:\/\/forklog.com\/wp-content\/uploads\/3050ada10e98c2974a2f11a66434174f-1536x474.webp 1536w, https:\/\/forklog.com\/wp-content\/uploads\/3050ada10e98c2974a2f11a66434174f.webp 1672w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Discovered duplicates. Data: Extracting Training Data from Diffusion Models.<\/figcaption><\/figure>\n<p>The researchers noted that the memorization effect is rare. In total, they tested around 300,000 prompts. The analysis showed the memorization rate of generators to be just 0.03%.<\/p>\n<p>Moreover, Stable Diffusion copies images less often than any other model. 
The researchers attribute this to deduplication of the training dataset.<\/p>\n<p>The Imagen algorithm from Google is more prone to copying.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cThe warning is that the model should generalize and generate new images, not output a memorized version,\u201d said co-author Vikash Sehwag.<\/p>\n<\/blockquote>\n<p>The study also found that as AI generators scale, the memorization effect will increase.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cNo matter how new a model is, bigger and more powerful, the memorization risks will be far higher than today,\u201d said co-author Eric Wallace.<\/p>\n<\/blockquote>\n<p>The researchers argue that the diffusion generators&#8217; ability to reproduce content may fuel copyright disputes. According to Florian Tram\u00e8r, a computer science professor at ETH Zurich, many companies provide licenses to share and monetize AI images. However, if a generator reproduces a copyrighted work, this could lead to conflicts.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p lang=\"en\" dir=\"ltr\">Most images we extract are copyrighted. Very few (eg. the picture in Eric&#8217;s tweet) allow for free re-distribution (with attribution).<br \/>Not a lawyer, so I don&#8217;t know what this implies. <br \/>But you likely can&#8217;t make the (common) argument that these models don&#8217;t copy training data! 
<a href=\"https:\/\/t.co\/vVEahLA13C\">pic.twitter.com\/vVEahLA13C<\/a><\/p>\n<p>\u2014 Florian Tram\u00e8r (@florian_tramer) <a href=\"https:\/\/twitter.com\/florian_tramer\/status\/1620453260149796870?ref_src=twsrc%5Etfw\">January 31, 2023<\/a><\/p><\/blockquote>\n<p> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>The study was conducted by researchers from Google, DeepMind, ETH Zurich, Princeton University, and the University of California, Berkeley.<\/p>\n<p>Earlier in January, a group of artists <a href=\"https:\/\/forklog.com\/en\/news\/artists-sue-stability-ai-and-midjourney-over-training-their-image-generation-algorithms-on-their-images\">filed suit<\/a> against AI-generator developers over possible copyright infringement.<\/p>\n<p>Follow ForkLog AI on Telegram: <a href=\"https:\/\/t.me\/forklogAI\" target=\"_blank\" rel=\"noopener nofollow\" title=\"\">ForkLog AI<\/a> \u2014 all the news from the AI world!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI image-generation tools such as Stable Diffusion recall training images and generate near-identical 
copies.<\/p>\n","protected":false},"author":1,"featured_media":73476,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"select":"1","news_style_id":"1","cryptorium_level":"","_short_excerpt_text":"","creation_source":"","_metatest_mainpost_news_update":false,"footnotes":""},"categories":[3],"tags":[438,1760,167],"class_list":["post-73475","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news-and-analysis","tag-artificial-intelligence","tag-generative-ai","tag-research"],"aioseo_notices":[],"amp_enabled":true,"views":"94","promo_type":"1","layout_type":"1","short_excerpt":"","is_update":"","_links":{"self":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/73475","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/comments?post=73475"}],"version-history":[{"count":1,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/73475\/revisions"}],"predecessor-version":[{"id":73477,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/73475\/revisions\/73477"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media\/73476"}],"wp:attachment":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media?parent=73475"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/categories?post=73475"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/tags?post=73475"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}