🏢 Snowflake AI Research
In Case You Missed It: ARC 'Challenge' Is Not That Challenging
·2565 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Snowflake AI Research
LLM evaluation on multiple-choice questions is flawed; considering all options simultaneously, not individually, reveals much higher accuracy and challenges existing benchmark rankings.