Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%.

brianpeiris@lemmy.ca · edit-2 3 months ago

lath@lemmy.world · 3 months ago

Biased study. Take any average person off the streets and shove this thing in their face. That 100% notion will go down fast.

tomalley8342@lemmy.world · 3 months ago

They didn’t say “100% of humans can solve this benchmark”, they said “humans can solve 100% of this benchmark”.

lath@lemmy.world · 3 months ago

“Humans score 100%. Frontier AI scores 0.26%.”

The title deals in absolutes.

davidgro@lemmy.world · edit-2 3 months ago

Those are the high scores.

lath@lemmy.world · 3 months ago

🤔 So this is a visual comparison between peak performance of some humans and peak performance of current LLMs in a controlled environment?

pulsewidth@lemmy.world · 3 months ago

Pretty defensive there. It’s not even a study.

lath@lemmy.world · 3 months ago

If it studies something, it’s a study. If you sense defensiveness, you’re assuming aggression. If you feel bias in one direction, someone else can feel bias in the other. If there’s an action, there’s a reaction.

pulsewidth@lemmy.world · 3 months ago

“If there’s an action, there’s a reaction.”

Sort of like how, when people outsource all their critical thinking to AI, their capacity for critical thinking atrophies?

brianpeiris@lemmy.ca · edit-2 3 months ago

ARC-AGI-3 launch event - Shared live and publicly on March 25 from Y Combinator HQ in San Francisco, featuring a fireside conversation between François Chollet (creator, ARC-AGI) and Sam Altman (CEO, OpenAI) on measuring intelligence on the path to AGI.

François Chollet is a software engineer, artificial intelligence researcher, and former Senior Staff Engineer at Google. Chollet is the creator of the Keras deep-learning library released in 2015.
