Anthropic CEO wants to open the black box of AI models by 2027


Anthropic CEO Dario Amodei published an essay Thursday highlighting how little researchers understand about the inner workings of the world's leading AI models. To address that, Amodei set an ambitious goal for Anthropic to reliably detect most AI model problems by 2027.

Amodei acknowledges the challenge ahead. In "The Urgency of Interpretability," the CEO says Anthropic has made early breakthroughs in tracing how models arrive at their answers, but stresses that far more research is needed to decode these systems as they grow more powerful.

“I am very concerned about deploying such systems without a better handle on interpretability,” Amodei wrote in the essay. “These systems will be absolutely central to the economy, technology, and national security, and will be capable of so much autonomy that I consider it basically unacceptable for humanity to be totally ignorant of how they work.”

Anthropic is one of the pioneering companies in mechanistic interpretability, a field that aims to open the black box of AI models and understand why they make the decisions they do. Despite rapid improvements in the performance of the tech industry's AI models, we still know relatively little about how these systems arrive at their decisions.

For example, OpenAI recently launched new reasoning AI models, o3 and o4-mini, that perform better on some tasks but also hallucinate more than its other models. The company doesn't know why that happens.

“When a generative AI system does something, like summarize a financial document, we have no idea, at a specific or precise level, why it makes the choices it does: why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate,” Amodei wrote in the essay.

In the essay, Amodei notes that Anthropic co-founder Chris Olah says AI models are “grown more than they are built.” In other words, AI researchers have found ways to make AI models smarter, but they don't entirely know why.

In the essay, Amodei says it could be dangerous to reach AGI, or as he calls it, “a country of geniuses in a data center,” without understanding how these models work. In a previous essay, Amodei said the tech industry could reach such a milestone by 2026 or 2027, but he believes we are much further out from fully understanding these AI models.

In the long term, Amodei says Anthropic would essentially like to conduct “brain scans” or “MRIs” of state-of-the-art AI models. These checkups would help identify a wide range of issues in AI models, including their tendencies to lie or seek power, or other weaknesses, he says. This could take five to 10 years to achieve, but such measures will be necessary to test and deploy Anthropic's future AI models, he added.

Anthropic has made a few research breakthroughs that have allowed it to better understand how its AI models work. For example, the company recently found ways to trace an AI model's thinking pathways through what it calls circuits. Anthropic identified one circuit that helps AI models understand which U.S. cities are located in which U.S. states. The company has only found a handful of these circuits but estimates there are millions within AI models.

Anthropic has been investing in interpretability research itself, and recently made its first investment in a startup working on interpretability. While interpretability is largely seen as a safety field today, Amodei notes that, eventually, explaining how AI models arrive at their answers could offer a commercial advantage.

In the essay, Amodei called on OpenAI and Google DeepMind to increase their research efforts in the field. Beyond the friendly nudge, the Anthropic CEO asked governments to impose “light-touch” regulations to encourage interpretability research, such as requirements for companies to disclose their safety and security practices. In the essay, Amodei also says the United States should put export controls on chips to China, in order to limit the likelihood of an out-of-control global AI race.

Anthropic has always stood out from OpenAI and Google for its focus on safety. While other tech companies pushed back on California's controversial AI safety bill, SB 1047, Anthropic issued modest support and recommendations for the bill, which would have set safety reporting standards for frontier AI model developers.

In this case, Anthropic appears to be pushing for an industry-wide effort to better understand AI models, not just to increase their capabilities.
