🏢 Qiyuan Tech
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
1730 words · 9 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
Light-R1 trains long-COT models from scratch using curriculum SFT, DPO, and RL, achieving SOTA performance and strong generalization with limited resources.