Reinforcement Learning
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
·3814 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 ByteDance Seed
This paper enhances Reinforcement Learning from Human Feedback (RLHF) by tackling reward hacking and response diversity issues through improved data construction methods.
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
·1719 words·9 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 VNU University of Science, Vietnam
RL fine-tuning enhances reasoning in small LLMs, achieving competitive performance with limited resources, despite optimization & length challenges.
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
·3349 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Tsinghua University
DAPO: Open-sources an LLM reinforcement learning system that achieves SOTA AIME scores, fostering reproducible research at scale.
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
·4375 words·21 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Carnegie Mellon University
Meta reinforcement fine-tuning optimizes how LLMs spend test-time compute, letting them reason more efficiently!
Learning from Failures in Multi-Attempt Reinforcement Learning
·1948 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Cambridge
Multi-attempt RL refines LLMs, significantly boosting accuracy on math tasks by enabling them to learn from failures through user feedback.
Language Models can Self-Improve at State-Value Estimation for Better Search
·2765 words·13 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Georgia Institute of Technology
Self-Taught Lookahead improves LLM search via self-supervision, matching costly methods at a fraction of the compute!
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
·3403 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Illinois Urbana-Champaign
MultiAgentBench: A benchmark for evaluating collaboration and competition in LLM agents across diverse, interactive scenarios with novel metrics and protocols.
Self-rewarding correction for mathematical reasoning
·3488 words·17 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Illinois Urbana-Champaign
LLMs can now reason and correct themselves using self-generated data, achieving performance on par with approaches that rely on external reward models!
Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
·3383 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Microsoft
DVPO: A lean RLHF framework that decouples value & policy optimization with global value guidance, cutting GPU memory usage by 40% and training time by 35%.
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
·1243 words·6 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Noah's Ark Lab, Huawei Technologies France
TAG: A decentralized framework for scalable multi-agent hierarchical reinforcement learning.
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
·1911 words·9 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 UC Santa Barbara
MLGym: A new framework & benchmark for advancing AI research agents.
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
·3688 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Microsoft Research Asia
Logic-RL unlocks LLM reasoning via rule-based reinforcement learning, generalizing to math problems after training on logic puzzles.
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
·4758 words·23 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Nanjing University
AdaptiveStep: Automatically divides reasoning into steps based on model confidence, enhancing process reward model (PRM) training & performance.
S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
·3894 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Tencent
S²R: Teaches LLMs to self-verify and self-correct, boosting reasoning with efficient reinforcement learning.
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
·4399 words·21 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 AIRI
MIKASA, a new benchmark for memory-intensive reinforcement learning, provides a unified framework for evaluating memory capabilities in diverse scenarios, including complex robotic manipulation tasks.
Agency Is Frame-Dependent
·400 words·2 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Google DeepMind
Agency, a key concept in AI, is shown to be relative to the observer’s perspective (frame-dependent), challenging traditional binary definitions and necessitating a more nuanced approach for AI system…
Improving Transformer World Models for Data-Efficient RL
·2775 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Google DeepMind
AI agents now master complex tasks with improved Transformer World Models, achieving a new state-of-the-art in data-efficient reinforcement learning.
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
·3269 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Waterloo
AceCoder uses automated test-case synthesis to create a large-scale dataset for training reward models, enabling effective reinforcement learning to significantly boost code generation model performance.
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
·3632 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 AIRI
SRMT: Shared Recurrent Memory Transformer boosts multi-agent coordination by implicitly sharing information via a global memory, significantly outperforming baselines in complex pathfinding tasks.