MLE Examples - Search News

Scientists design new 'AGI benchmark' that indicates whether any future AI model could cause 'catastrophic harm'

OpenAI scientists have designed MLE-bench — a compilation of 75 extremely difficult tests that can assess whether a future advanced AI agent is capable of modifying its own code and improving itself.
