Reinforcement Learning
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
·3814 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 ByteDance Seed
This paper enhances Reinforcement Learning from Human Feedback (RLHF) by tackling reward hacking and response diversity issues through improved data construction methods.
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
·1719 words·9 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 VNU University of Science, Vietnam
RL fine-tuning enhances reasoning in small LLMs, achieving competitive performance with limited resources, despite optimization & length challenges.
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
·3349 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Tsinghua University
DAPO: Open-sources an LLM reinforcement learning system that achieves SOTA AIME scores, fostering reproducible research at scale.
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
·4375 words·21 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Carnegie Mellon University
Meta reinforcement fine-tuning optimizes how LLMs spend test-time compute, letting them reason more efficiently!
Learning from Failures in Multi-Attempt Reinforcement Learning
·1948 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Cambridge
Multi-attempt RL refines LLMs, significantly boosting accuracy on math tasks by enabling them to learn from failures through user feedback.
Language Models can Self-Improve at State-Value Estimation for Better Search
·2765 words·13 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Georgia Institute of Technology
Self-Taught Lookahead improves LLM search via self-supervision, matching costly methods at a fraction of the compute!
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
·3403 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Illinois Urbana-Champaign
MultiAgentBench: A benchmark for evaluating collaboration and competition in LLM agents across diverse, interactive scenarios with novel metrics and protocols.
Self-rewarding correction for mathematical reasoning
·3488 words·17 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Illinois Urbana-Champaign
LLMs can now reason and correct themselves using self-generated data, achieving performance on par with approaches that rely on external reward models!
Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
·3383 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Microsoft
DVPO: A lean RLHF framework that decouples value & policy optimization with global value guidance, cutting GPU memory usage by 40% and training time by 35%.
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
·1243 words·6 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Noah's Ark Lab, Huawei Technologies France
TAG: A decentralized framework for scalable multi-agent hierarchical reinforcement learning.
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
·1911 words·9 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 UC Santa Barbara
MLGym: A new framework & benchmark for advancing AI research agents.
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
·3688 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Microsoft Research Asia
Logic-RL unlocks LLM reasoning via rule-based reinforcement learning, generalizing to math problems after training on logic puzzles.
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
·4758 words·23 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Nanjing University
AdaptiveStep: Automatically divides reasoning into steps based on model confidence, enhancing process reward model (PRM) training & performance.
S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
·3894 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Tencent
S²R: Teaches LLMs to self-verify and self-correct, boosting reasoning with efficient reinforcement learning.
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
·4399 words·21 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 AIRI
MIKASA, a new benchmark for memory-intensive reinforcement learning, provides a unified framework for evaluating memory capabilities in diverse scenarios, including complex robotic manipulation tasks.
Agency Is Frame-Dependent
·400 words·2 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Google DeepMind
Agency, a key concept in AI, is shown to be relative to the observer’s perspective (frame-dependent), challenging traditional binary definitions and necessitating a more nuanced approach for AI system…
Improving Transformer World Models for Data-Efficient RL
·2775 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Google DeepMind
AI agents now master complex tasks with improved Transformer World Models, achieving a new state-of-the-art in data-efficient reinforcement learning.
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
·3269 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Waterloo
AceCoder uses automated test-case synthesis to create a large-scale dataset for training reward models, enabling effective reinforcement learning to significantly boost code generation model performance.
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
·3632 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 AIRI
SRMT: Shared Recurrent Memory Transformer boosts multi-agent coordination by implicitly sharing information via a global memory, significantly outperforming baselines in complex pathfinding tasks.