
🏢 Australian Institute for Machine Learning, University of Adelaide

Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles

26 September 2024 · 1675 words · 8 mins

Natural Language Processing · Large Language Models

SPLAT, a new benchmark using situation puzzles, effectively evaluates and elicits lateral thinking in LLMs through a multi-turn player-judge framework, revealing significant performance improvements o…