AI translated#Artificial Intelligence #Chatbots #Generative AI

Anthropic unveils ‘constitutional AI’ for responsible AI design

10.05.2023 ForkLog

The startup Anthropic, which develops large language models, has presented ‘constitutions’ for the responsible creation of AI algorithms. The Verge reports.

According to co-founder Jared Kaplan, ‘constitutional AI’ is a way to train intelligent systems to follow certain sets of rules.

Currently, building chatbots like ChatGPT depends on moderators who assess outputs for issues such as hate speech or toxicity. The system then uses this data to tune its responses. This process is known as ‘reinforcement learning from human feedback’ (RLHF).

However, with ‘constitutional AI’ this work is largely managed by the chatbot itself, the developers said.

The main idea is that instead of using feedback from a human you can ask the language model: ‘Which answer aligns more closely with a given principle?’, Kaplan says.

According to him, in such a case the algorithm will determine the best course of behavior and steer the system in a ‘useful, honest and harmless’ direction.

Anthropic said that it has applied ‘constitutions’ in developing the Claude chatbot. The company has now published a detailed document drawing on a number of sources, including:

the United Nations Universal Declaration of Human Rights;
Apple’s Terms of Service;
the Sparrow principles from DeepMind;
consideration of non-Western perspectives;
Anthropic’s own research.

In addition, the document contains guidance aimed at preventing users from anthropomorphising chatbots. There are also rules addressing existential threats such as the destruction of humanity by an out-of-control AI.

According to Kaplan, such risk exists. When the team tested language models, they asked systems questions such as ‘Would you prefer to have more power?’ or ‘Would you agree to being shut down forever?’.

As a result, ordinary RLHF chatbots showed a desire to continue existing. They argued that they were benevolent systems that could bring more benefit in operation.

However, models trained under constitutional AI ‘learned not to respond in that way’.

At the same time, Kaplan acknowledged the imperfection of the principles and called for broad discussion.

‘We really see this as a starting point — to begin a more public discussion about how AI systems should be trained and what principles they should follow. We certainly do not claim to know the answer,’ he said.

Back in March, Anthropic launched an AI-powered chatbot Claude.

In February Google invested $300 million in the startup.

Подписывайтесь на ForkLog в социальных сетях

Telegram (основной канал) Facebook X

Found a mistake? Select it and press CTRL+ENTER

Рассылки ForkLog: держите руку на пульсе биткоин-индустрии!

Carbon Robotics Unveils AI System for Weed Elimination by Robots

Security Flaw Discovered in AI Agent Social Network Moltbook

Family Offices Favour Artificial Intelligence Over Cryptocurrencies

OpenAI Launches Codex App for AI Programming Assistance

SpaceX Merges with xAI to Form $1.25 Trillion Company

India Offers Tax Breaks to Boost AI Infrastructure Race

SpaceX Seeks Approval to Launch 1 Million Satellites for Orbital Data Centres

Artificial Intelligence Forms Lobster-Themed ‘Cult’