🏢 Sea AI Lab, Singapore
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
2704 words · 13 mins
Natural Language Processing
Large Language Models
Chain of Preference Optimization (CPO) substantially improves LLM reasoning by fine-tuning on the search tree that Tree of Thought (ToT) builds. It achieves similar or better performance with significantly reduced…
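The sketch below shows one way to read per-step preference pairs off a ToT search tree and score them with a DPO-style loss. It assumes, based on our reading of the CPO paper, that thoughts kept on the final reasoning path are preferred and that their pruned siblings at the same step are dispreferred. The `Node` class, the `on_final_path` flag, the `preference_pairs` and `dpo_loss` helpers, and `beta=0.1` are hypothetical illustrations, not the authors' code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One thought in a (hypothetical) ToT search tree."""
    thought: str
    children: list = field(default_factory=list)
    # Hypothetical flag: True if the ToT search kept this thought in the chosen chain.
    on_final_path: bool = False


def preference_pairs(node, prefix=""):
    """Collect (context, preferred, dispreferred) triples step by step.

    At each step, a thought on the final path is paired against each sibling
    the search pruned. Only chosen thoughts are expanded, so every context is
    the prompt plus the earlier preferred thoughts.
    """
    chosen = [c for c in node.children if c.on_final_path]
    rejected = [c for c in node.children if not c.on_final_path]
    pairs = []
    for win in chosen:
        for lose in rejected:
            pairs.append((prefix, win.thought, lose.thought))
        pairs.extend(preference_pairs(win, prefix + win.thought + "\n"))
    return pairs


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss -log(sigmoid(margin)) for one preference pair.

    Inputs are sequence log-probabilities of the preferred (w) and
    dispreferred (l) thought under the policy and a frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    x = -margin
    # Numerically stable softplus(x) = log(1 + exp(x)) = -log(sigmoid(margin)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

In an actual fine-tuning run, the four log-probabilities would come from the policy and a frozen reference model evaluated on each (context, thought) pair. Here they are plain floats so the loss can be checked by hand: with all inputs equal, the margin is zero and the loss is log 2.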