🏢 The Chinese University of Hong Kong, Shenzhen
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
·2423 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
RealCritic: A new benchmark that evaluates language models’ critique abilities with a closed-loop methodology and shows that advanced reasoning models outperform others in self-critique and iterative critique.