🏢 University College London

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

20 November 2024 · 2774 words · 14 mins

AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models

The BALROG benchmark rigorously evaluates the abilities of LLMs and VLMs in complex games, revealing their strengths and weaknesses in long-term planning and decision-making and highlighting the need for improved vis…