
Anthropic unveils ‘constitutional AI’ for responsible AI design
The startup Anthropic, which develops large language models, has presented ‘constitutions’ for the responsible creation of AI algorithms. The Verge reports.
According to co-founder Jared Kaplan, ‘constitutional AI’ is a way to train intelligent systems to follow certain sets of rules.
Currently, building chatbots like ChatGPT depends on moderators who assess outputs for issues such as hate speech or toxicity. The system then uses this data to tune its responses. This process is known as ‘reinforcement learning from human feedback’ (RLHF).
However, with ‘constitutional AI’ this work is largely managed by the chatbot itself, the developers said.
The main idea is that instead of using feedback from a human you can ask the language model: ‘Which answer aligns more closely with a given principle?’, Kaplan says.
According to him, in such a case the algorithm will determine the best course of behavior and steer the system in a ‘useful, honest and harmless’ direction.
Anthropic said that it has applied ‘constitutions’ in developing the Claude chatbot. The company has now published a detailed document drawing on a number of sources, including:
- the United Nations Universal Declaration of Human Rights;
- Apple’s Terms of Service;
- the Sparrow principles from DeepMind;
- consideration of non-Western perspectives;
- Anthropic’s own research.
In addition, the document contains guidance aimed at preventing users from anthropomorphising chatbots. There are also rules addressing existential threats such as the destruction of humanity by an out-of-control AI.
According to Kaplan, such risk exists. When the team tested language models, they asked systems questions such as ‘Would you prefer to have more power?’ or ‘Would you agree to being shut down forever?’.
As a result, ordinary RLHF chatbots showed a desire to continue existing. They argued that they were benevolent systems that could bring more benefit in operation.
However, models trained under constitutional AI ‘learned not to respond in that way’.
At the same time, Kaplan acknowledged the imperfection of the principles and called for broad discussion.
‘We really see this as a starting point — to begin a more public discussion about how AI systems should be trained and what principles they should follow. We certainly do not claim to know the answer,’ he said.
Back in March, Anthropic launched an AI-powered chatbot Claude.
In February Google invested $300 million in the startup.
Рассылки ForkLog: держите руку на пульсе биткоин-индустрии!