
🏢 Qiyuan Tech

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
1730 words · 9 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models
Light-R1 trains long chain-of-thought (CoT) models from scratch using curriculum SFT, DPO, and RL, achieving state-of-the-art (SOTA) performance and strong generalization with limited resources.
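One of the three training stages named above is DPO (Direct Preference Optimization). This summary does not describe Light-R1's DPO settings, so the sketch below shows only the standard per-example DPO objective as background. The function name `dpo_loss`, the `beta=0.1` default, and the example log-probabilities are illustrative assumptions, not values taken from the paper. Each argument is assumed to be the summed token log-probability of a full response under either the trained policy or the frozen reference model.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard per-example DPO loss: -log sigmoid(beta * margin).

    Hypothetical helper for illustration; beta=0.1 is an assumed default.
    """
    # How much more the policy favors the preferred response than the reference does
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    # The same log-ratio for the dispreferred response
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


if __name__ == "__main__":
    # Policy shifted probability toward the chosen response, so the loss falls below log(2)
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

With identical policy and reference log-probabilities the margin is zero and the loss equals log 2. The loss falls as the policy raises the preferred response's likelihood relative to the reference. In practice this value is averaged over a batch of preference pairs.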