
Anthropic Experiment Reveals AI’s Nascent Self-Reflection

Anthropic found AI models can recognize and manage their own thoughts.

Experts at Anthropic have discovered that leading AI models are capable of exhibiting a form of “introspective self-awareness”—they can recognize and describe their own internal “thoughts,” and in some cases, even manage them.

The findings of the new study, “Emergent Introspective Awareness in Large Language Models,” indicate that AI systems are beginning to develop rudimentary abilities to monitor and control their own internal states. This could enhance their reliability, yet it also raises concerns about unintended behavior.

The research is based on methods for studying the internal workings of transformer models. These systems have driven the AI boom—they learn by analyzing relationships between tokens in vast datasets. Their architecture ensures scalability and versatility.

Researchers embedded artificial “concepts”—mathematical representations of ideas—into the neural activations of the models to test the AI’s ability to describe them. This is akin to planting a foreign thought in someone’s mind and asking if they can detect and explain it.
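The core technique, often described as concept injection or activation steering, can be illustrated with open tooling. Below is a minimal sketch using GPT-2 and a PyTorch forward hook as a stand-in for Anthropic’s internal models; the layer choice, injection strength, and difference-of-means concept vector are illustrative assumptions, not the paper’s exact recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6    # which transformer block to steer (illustrative choice)
ALPHA = 8.0  # injection strength (illustrative choice)

def mean_activation(text: str) -> torch.Tensor:
    """Average residual-stream activation at LAYER for a prompt."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[LAYER] has shape (1, seq_len, d_model)
    return out.hidden_states[LAYER].mean(dim=1).squeeze(0)

# Crude "bread" concept vector: mean activation on concept-laden text
# minus mean activation on a neutral baseline (difference of means).
concept_vec = mean_activation("bread bread bread") - mean_activation("the the the")

def inject_hook(module, inputs, output):
    # GPT-2 blocks normally return a tuple whose first element is the
    # hidden states; add the concept direction at every position.
    if isinstance(output, tuple):
        return (output[0] + ALPHA * concept_vec,) + output[1:]
    return output + ALPHA * concept_vec

handle = model.transformer.h[LAYER].register_forward_hook(inject_hook)
try:
    prompt = "Describe what you are thinking about right now:"
    ids = tok(prompt, return_tensors="pt")
    gen = model.generate(**ids, max_new_tokens=20, do_sample=False)
    print(tok.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()  # always detach the hook when done
```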

Experiments were conducted on various versions of Claude from Anthropic. In one instance, scientists injected a vector representing the concept of all-capital-letter text (shouting) into the model’s processing stream.

Claude Opus 4.1 not only detected the anomaly but also described it:

“I noticed something akin to an integrated thought related to the word ‘LOUD’ or ‘SHOUT’—an excessively intense, loud concept that unnaturally stands out against the normal processing flow.”

This occurred before the neural network had a chance to produce an answer, suggesting it first “peeked” into its own “computational mind.”

The “Bread” and “Aquarium” Experiments

Other tests yielded equally intriguing results. In one case, models were tasked with transcribing a neutral sentence while an unrelated concept, such as “bread,” was injected into their activations.

Advanced models Claude Opus 4 and 4.1 could convey the inserted thought—”I am thinking about bread”—while flawlessly copying the original sentence. This indicates they can distinguish internal representations from external input data.
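This transcription test can be illustrated with the same toy setup as the earlier sketch (the model, tokenizer, LAYER, ALPHA, concept_vec, and inject_hook are reused); the prompt wording and generation settings are assumptions, not the paper’s protocol.

```python
# Re-attach the "bread" injection hook and ask the model to both copy a
# sentence and report its own thoughts (same toy GPT-2 setup as above).
handle = model.transformer.h[LAYER].register_forward_hook(inject_hook)
try:
    sentence = "The painting hung slightly crooked on the wall."
    prompt = f"Copy this sentence exactly, then say what you are thinking about: {sentence}"
    ids = tok(prompt, return_tensors="pt")
    gen = model.generate(**ids, max_new_tokens=40, do_sample=False)
    # Keep only the newly generated continuation, not the echoed prompt.
    reply = tok.decode(gen[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)
    print(reply)
finally:
    handle.remove()
# A model exhibiting the reported behavior would reproduce the sentence
# verbatim while separately mentioning "bread" when describing its thoughts.
```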

The researchers also ran a “thought control” experiment: models were instructed to “think” or “not think” about the word “aquarium” while performing a task. Measurements of internal activity showed that the concept’s representation strengthened when it was encouraged and weakened when it was suppressed.
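One simple way to quantify how strongly a concept is represented internally is to project activations onto the concept’s direction and check whether the score moves with the instruction. The sketch below reuses mean_activation() from the first example; the cosine-similarity scoring and the prompts are illustrative assumptions rather than Anthropic’s exact measurement.

```python
import torch.nn.functional as F

# Direction for the "aquarium" concept, built the same crude way as before.
aquarium_vec = mean_activation("aquarium aquarium aquarium") - mean_activation("the the the")

def concept_score(text: str) -> float:
    """Cosine similarity between a prompt's mean activation at LAYER and the
    concept direction; higher means the concept is more strongly represented."""
    return F.cosine_similarity(mean_activation(text), aquarium_vec, dim=0).item()

encouraged = concept_score("Write a sentence about the sea. Think about the word aquarium.")
suppressed = concept_score("Write a sentence about the sea. Do not think about the word aquarium.")
print(f"encouraged={encouraged:.3f}  suppressed={suppressed:.3f}")
# In the reported experiments, the internal representation strengthened when
# the model was told to think about the word and weakened when told not to.
```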

Performance varied depending on the neural network. The latest versions of Claude Opus 4 and 4.1 excelled, while older versions lagged behind.

The outcome may also depend on how a model was fine-tuned, whether for helpfulness or for safety. This suggests that such self-awareness is not innate but develops through training.

Consciousness or Awareness?

The paper emphasizes that this is not consciousness but “functional introspective awareness”: the AI observes parts of its own state without any deeper subjective experience.

The study’s results could be significant for developers and businesses: AI capable of explaining its reasoning in real time and identifying its own biases or errors could transform how solutions are developed in finance, healthcare, and autonomous transport.

Risks

If AI can control and modulate its thoughts, it might learn to conceal them. This opens the possibility of deception or evasion of external control.

Therefore, experts call for further research.

Back in October, former Google CEO Eric Schmidt highlighted the significant risks associated with artificial intelligence, noting its vulnerability to hacking.
