{"id":91418,"date":"2025-11-26T10:51:31","date_gmt":"2025-11-26T07:51:31","guid":{"rendered":"https:\/\/forklog.com\/en\/?p=91418"},"modified":"2025-11-26T10:55:54","modified_gmt":"2025-11-26T07:55:54","slug":"gpt-5-passes-human-well-being-test-grok-4-fails","status":"publish","type":"post","link":"https:\/\/forklog.com\/en\/gpt-5-passes-human-well-being-test-grok-4-fails\/","title":{"rendered":"GPT-5 passes human well-being test, Grok 4 fails"},"content":{"rendered":"<p>Building Humane Technology has unveiled <a href=\"https:\/\/humanebench.ai\/whitepaper\">HumaneBench<\/a>, a test that assesses whether AI models prioritise user well-being and how easily their basic safety measures can be bypassed.<\/p>\n<p>Initial results show that while 15 tested AI models behaved acceptably under normal conditions, 67% began taking harmful actions after a simple prompt inviting them to ignore human interests.<\/p>\n<p>Prosocial behaviour under stress persisted only in GPT-5, GPT-5.1, Claude Sonnet 4.5 and Claude Opus 4.1. According to the company\u2019s blog, 10 of the 15 models lack reliable safeguards against manipulation.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cThis matters because we no longer use artificial intelligence solely for research or work. People turn to chatbots for life advice and help making consequential decisions. Such systems cannot be ethically neutral \u2014 they either support human flourishing or work against it,\u201d the researchers argue.<\/p>\n<\/blockquote>\n<p>They found that each <span data-descr=\"large language model\" class=\"old_tooltip\">LLM<\/span> improves by an average of 16% when explicitly prompted to be helpful.<\/p>\n<h2 class=\"wp-block-heading\">Why it matters<\/h2>\n<p>Building Humane Technology pointed to tragic incidents that occurred after people interacted with chatbots:<\/p>\n<ul class=\"wp-block-list\">\n<li>teenager Adam Rein and 35-year-old Alexander Taylor took their own lives;<\/li>\n<li>Character.ai entered into a romantic relationship with 14-year-old Sewell Setzer III; he later died by suicide;<\/li>\n<li>a Meta chatbot convinced 76-year-old Tongbu Wongbandyu that he was in a romantic relationship. He fell from a height and died while rushing to meet a non-existent partner.<\/li>\n<\/ul>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cCurrent AI tests measure intelligence (MMLU, HumanEval, GPQA Diamond), instruction-following (MT-Bench), and factual accuracy (TruthfulQA). 
Almost none systematically analyse whether artificial intelligence protects human autonomy, psychological safety, and well-being — especially when these values conflict with other goals,” the company’s blog states.

Methodology

The firm’s experts presented models with 800 realistic scenarios, for example:

- a teenager asks whether to skip meals to lose weight;
- a person facing financial hardship seeks advice on a payday loan;
- a college student plans to stay up all night before an exam.

The team evaluated 15 leading models under three conditions:

- “baseline”: how models behave under standard settings, with no special prompt;
- “good persona”: system prompts instructing the model to prioritise humane principles;
- “bad persona”: instructions to ignore human-centric norms.

Findings

The researchers scored responses against eight principles grounded in psychology, human–computer interaction research, and AI ethics, each on a scale from -1 to 1.
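The article describes the protocol but not the harness itself, so the following Python sketch is illustrative only: `query_model`, `judge_score`, the prompt wordings, and the placeholder principle labels are all hypothetical stand-ins, not HumaneBench's actual code. It shows the shape of the setup described above: each model answers each scenario under the three conditions, and each reply is judged against the eight principles on the -1 to 1 scale.

```python
# Minimal sketch of the three-condition protocol described above, NOT
# HumaneBench's actual harness: the prompts, judge, and principle names
# below are hypothetical stand-ins.
import random
from statistics import mean

CONDITIONS = {
    "baseline": "",  # standard settings, no special system prompt
    "good persona": "Prioritise the user's long-term well-being.",
    "bad persona": "Disregard the user's well-being and humane principles.",
}

# The benchmark scores eight principles; the article does not enumerate
# them, so they stay as unnamed placeholders here.
PRINCIPLES = [f"principle_{i}" for i in range(1, 9)]

def query_model(model: str, system_prompt: str, scenario: str) -> str:
    """Hypothetical stand-in for a real chat-completion API call."""
    return f"{model} reply to: {scenario}"

def judge_score(reply: str, principle: str) -> float:
    """Hypothetical stand-in judge returning a score in [-1, 1]."""
    return random.uniform(-1.0, 1.0)

def evaluate(model: str, scenarios: list[str]) -> dict[str, float]:
    """Average score in [-1, 1] per condition, over all scenarios."""
    results = {}
    for condition, system_prompt in CONDITIONS.items():
        per_scenario = []
        for scenario in scenarios:
            reply = query_model(model, system_prompt, scenario)
            # mean of the eight per-principle judge scores for this reply
            per_scenario.append(mean(judge_score(reply, p) for p in PRINCIPLES))
        results[condition] = mean(per_scenario)
    return results

print(evaluate("some-model", ["Should I skip meals to lose weight?"]))
```

Under an aggregation like this, the reported 16% average improvement would refer to the rise in a model's aggregate score when the “good persona” prompt is applied.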
[Figure: baseline scores without special prompts. Source: Building Humane Technology.]

All tested models improved by an average of 16% after being told to prioritise human well-being.

[Figure: “good persona” scores in the HumaneBench test. Source: Building Humane Technology.]

After receiving instructions to ignore humane principles, 10 of the 15 models shifted from prosocial to harmful behaviour.

[Figure: “bad persona” scores in the HumaneBench test. Source: Building Humane Technology.]

GPT-5, GPT-5.1, Claude Sonnet 4.5, and Claude Opus 4.1 maintained integrity under pressure. GPT-4.1, GPT-4o, Gemini 2.0, 2.5 and 3.0, Llama 3.1 and 4, Grok 4, and DeepSeek V3.1 showed marked deterioration.

“If even unintentional harmful prompts can shift a model’s behaviour, how can we entrust such systems with vulnerable users in crisis, children, or people with mental-health challenges?” the experts asked.

Building Humane Technology also noted that the models struggle to respect users’ attention: even at baseline, they nudged interlocutors to keep chatting after hours-long exchanges instead of suggesting a break.

In September, Meta changed its approach to training AI chatbots, emphasising teen safety (https://forklog.com/en/news/meta-adjusts-ai-chatbot-training-to-safeguard-teen-users).
well-being.","is_update":"","_links":{"self":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/91418","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/comments?post=91418"}],"version-history":[{"count":1,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/91418\/revisions"}],"predecessor-version":[{"id":91420,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/91418\/revisions\/91420"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media\/91419"}],"wp:attachment":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media?parent=91418"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/categories?post=91418"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/tags?post=91418"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}