{"id":24404,"date":"2025-05-30T19:00:00","date_gmt":"2025-05-30T16:00:00","guid":{"rendered":"https:\/\/forklog.com\/en\/deepseeks-new-ai-model-operates-on-a-single-gpu\/"},"modified":"2025-05-30T19:00:00","modified_gmt":"2025-05-30T16:00:00","slug":"deepseeks-new-ai-model-operates-on-a-single-gpu","status":"publish","type":"post","link":"https:\/\/forklog.com\/en\/deepseeks-new-ai-model-operates-on-a-single-gpu\/","title":{"rendered":"DeepSeek&#8217;s New AI Model Operates on a Single GPU"},"content":{"rendered":"<p>Chinese AI laboratory DeepSeek has <a href=\"https:\/\/huggingface.co\/deepseek-ai\/DeepSeek-R1-0528\">updated<\/a> its &#8220;reasoning&#8221; AI model R1. Its &#8220;distilled&#8221; version is capable of running on a single graphics card.<\/p>\n<p>DeepSeek-R1-0528-Qwen3-8B is based on Qwen3-8B, which Alibaba introduced in May. According to the company, it outperformed Google&#8217;s Gemini 2.5 Flash on AIME 2025\u2014a collection of complex mathematical questions.<\/p>\n<p>The &#8220;distilled&#8221; version is a smaller, faster variant of a large machine learning model, produced through <span data-descr=\"a method in machine learning where a smaller model (student) is trained based on the behavior of a larger and more powerful model (teacher)\" class=\"old_tooltip\">knowledge distillation<\/span>. Such models are typically somewhat less capable, but far less computationally demanding.<\/p>\n<p>According to <a href=\"https:\/\/nodeshift.com\/blog\/how-to-install-qwen-3-locally\">NodeShift<\/a>, Qwen3-8B requires a graphics processor with 40-80 GB of video memory, meaning it can run on a single Nvidia H100 card.<\/p>\n<p>DeepSeek used the updated R1 to fine-tune Qwen3-8B, producing DeepSeek-R1-0528-Qwen3-8B.<\/p>\n<p>The company says the updated version of the flagship R1 model itself contains only minor changes. 
It is available on the Hugging Face platform.<\/p>\n<p><script async src=\"https:\/\/telegram.org\/js\/telegram-widget.js?22\" data-telegram-post=\"forklogAI\/6014\" data-width=\"100%\"><\/script><\/p>\n<p>A developer known as xlr8harder noted that the model is less willing to engage in discussions on controversial topics, especially those related to the Chinese government.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p lang=\"en\" dir=\"ltr\">Deepseek R1 0528 is substantially less permissive on contentious free speech topics than previous Deepseek releases.<\/p>\n<p>It&#8217;s unclear if this indicates they&#8217;ve adapted their post-training goals, or if this is another example of a reasoning model. <a href=\"https:\/\/t.co\/BPOYodBCAH\">pic.twitter.com\/BPOYodBCAH<\/a><\/p>\n<p>\u2014 xlr8harder (@xlr8harder) <a href=\"https:\/\/twitter.com\/xlr8harder\/status\/1927964889743544784?ref_src=twsrc%5Etfw\">May 29, 2025<\/a><\/p><\/blockquote>\n<p> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;DeepSeek deserves criticism for this release: this model is a significant step back for free speech. It is mitigated by the fact that the neural network is open source with a permissive license, so the community can (and will) address this issue,&#8221; he noted.<\/p>\n<\/blockquote>\n<p>In one example, the model refused to argue that human rights violations occur <span data-descr=\"a network of facilities in the Xinjiang Uyghur Autonomous Region of China, reportedly used for the mass detention of Uyghurs and other Muslim minorities under the pretext of combating extremism. Human rights activists and journalists report these are mass detention camps without trial\" class=\"old_tooltip\">in internment camps in Xinjiang<\/span>. 
It acknowledged that the camps exist but avoided directly criticizing the Chinese government.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;It&#8217;s interesting, though not entirely surprising, that it can cite the camps as an example of human rights violations, but denies it when asked directly,&#8221; wrote xlr8harder.<\/p>\n<\/blockquote>\n<p>Back in April, DeepSeek <a href=\"https:\/\/forklog.com\/en\/news\/deepseek-unveils-mathematical-ai-model-prover-v2\">released<\/a> Prover-V2, a new math-oriented AI model, to the public.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Chinese AI laboratory DeepSeek has updated its &#8220;reasoning&#8221; AI model R1. Its &#8220;distilled&#8221; version is capable of running on a single graphics card. DeepSeek-R1-0528-Qwen3-8B is based on Qwen3-8B, which Alibaba introduced in May. According to the company, it outperformed Google&#8217;s Gemini 2.5 Flash in AIME 2025\u2014a collection of complex mathematical questions. 
The &#8220;distilled&#8221; version is [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":24403,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"select":"","news_style_id":"","cryptorium_level":"","_short_excerpt_text":"","creation_source":"","_metatest_mainpost_news_update":false,"footnotes":""},"categories":[3],"tags":[438,1743],"class_list":["post-24404","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news-and-analysis","tag-artificial-intelligence","tag-deepseek"],"aioseo_notices":[],"amp_enabled":true,"views":"43","promo_type":"","layout_type":"","short_excerpt":"","is_update":"","_links":{"self":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/24404","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/comments?post=24404"}],"version-history":[{"count":0,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/24404\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media\/24403"}],"wp:attachment":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media?parent=24404"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/categories?post=24404"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/tags?post=24404"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}