🏢 Case.edu

Thinking Preference Optimization
·5794 words·28 mins
AI Generated · 🤗 Daily Papers · Machine Learning · Deep Learning · 🏢 Case.edu
ThinkPO improves LLM reasoning by training models to prefer longer chain-of-thought (CoT) responses over shorter ones, boosting performance without requiring new data.