If you’re looking for a new reason to be nervous about artificial intelligence, try this: Some of the smartest humans in the world are struggling to create tests that AI systems can’t pass.
For years, AI systems have been measured by subjecting new models to a variety of standardized benchmark tests. Many of these tests consisted of difficult SAT-caliber problems in areas such as math, science and logic. Comparing model scores over time served as a rough measure of AI progress.
But AI systems eventually became too good at these tests, so new, more difficult tests were created — often with the kinds of questions that graduate students might encounter on their exams.
But those tests aren’t holding up well either. New models from OpenAI, Google and Anthropic have scored high on many of these Ph.D.-level challenges, limiting the tests’ usefulness and raising an unnerving question: Are AI systems getting too smart for us to measure?
This week, researchers at the Center for AI Safety and Scale AI are releasing a possible answer to that question: a new assessment, called “Humanity’s Last Exam,” which they say is the toughest test ever given to AI systems.
Humanity’s Last Exam is the brainchild of Dan Hendrycks, a well-known AI safety researcher and director of the Center for AI Safety. (The original name of the test, “Humanity’s Last Stand”, was abandoned as too dramatic.)