🏢 Sea AI Lab, Singapore

Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
2704 words · 13 mins
Natural Language Processing · Large Language Models
Chain of Preference Optimization (CPO) dramatically improves LLM reasoning by leveraging ToT’s search tree for efficient fine-tuning, achieving similar or better performance with significantly reduced…