
🏢 Australian Institute for Machine Learning, University of Adelaide

Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles
1675 words · 8 mins
Natural Language Processing · Large Language Models
SPLAT, a new benchmark using situation puzzles, effectively evaluates and elicits lateral thinking in LLMs through a multi-turn player-judge framework, revealing significant performance improvements o…