🏢 The Chinese University of Hong Kong, Shenzhen

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
2423 words · 12 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models
RealCritic is a new benchmark that evaluates language models' critique abilities with a closed-loop methodology. Its results show advanced reasoning models performing best in self-critique and iterative critique.