🏢 Case.edu
Thinking Preference Optimization
5794 words · 28 mins
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
ThinkPO improves LLM reasoning by training the model to prefer longer chain-of-thought (CoT) responses, boosting performance without requiring new data.