Skip to main content

🏢 Beijing Jiaotong University

OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
·2034 words·10 mins· loading · loading
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing Jiaotong University
OpenRFT adapts generalist reasoning models for domain-specific tasks using reinforcement fine-tuning, overcoming data scarcity and lack of reasoning step data via question augmentation, synthesized re…
o1-Coder: an o1 Replication for Coding
·1672 words·8 mins· loading · loading
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing Jiaotong University
O1-CODER replicates OpenAI’s o1 model for coding, integrating reinforcement learning and Monte Carlo Tree Search to enhance System-2 thinking and generate high-quality code with reasoning steps.