🏢 University College London
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
2774 words · 14 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
The BALROG benchmark rigorously evaluates the abilities of LLMs and VLMs in complex games, revealing their strengths and weaknesses in long-term planning and decision-making and highlighting the need for improved vis…