Reinforcement Learning

Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity
·1771 words·9 mins
Machine Learning Reinforcement Learning 🏢 University of Toronto
ExPerior leverages expert demonstrations to enhance online decision-making, even when experts use hidden contextual information unseen by the learner.
Seek Commonality but Preserve Differences: Dissected Dynamics Modeling for Multi-modal Visual RL
·2815 words·14 mins
Machine Learning Reinforcement Learning 🏢 Peking University
Dissected Dynamics Modeling (DDM) excels at multi-modal visual reinforcement learning by cleverly separating and integrating common and unique features across different sensory inputs for more accurat…
Scalable Kernel Inverse Optimization
·1850 words·9 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Delft Center for Systems and Control
Scalable Kernel Inverse Optimization (KIO) efficiently learns unknown objective functions from data using kernel methods and a novel Sequential Selection Optimization (SSO) algorithm, enabling applica…
Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning
·1664 words·8 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Peking University
Scalable MAPPO-L: Decentralized training with local interactions ensures safe, high-reward multi-agent systems, even with limited communication.
Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs
·2646 words·13 mins
Reinforcement Learning 🏢 University of Hong Kong
ArithTreeRL, a novel reinforcement learning approach, generates optimized arithmetic tree structures for adders and multipliers, significantly improving computational efficiency and reducing hardware …
Sample-Efficient Constrained Reinforcement Learning with General Parameterization
·263 words·2 mins
Machine Learning Reinforcement Learning 🏢 Indian Institute of Technology Kanpur
The Accelerated Primal-Dual Natural Policy Gradient (PD-ANPG) algorithm achieves a sample complexity matching the theoretical lower bound for general parameterized CMDPs, improving state-of-the-art by a factor…
Sample-Efficient Agnostic Boosting
·1303 words·7 mins
Machine Learning Reinforcement Learning 🏢 Amazon
Agnostic boosting gets a major efficiency upgrade! A new algorithm leverages sample reuse to drastically reduce the data needed for accurate learning, closing the gap with computationally expensive al…
Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning
·406 words·2 mins
Reinforcement Learning 🏢 University of Washington
This paper reveals that estimating only policy differences, while effective in bandits, is insufficient for tabular reinforcement learning. However, it introduces a novel algorithm achieving near-opti…
Safety through feedback in Constrained RL
·2526 words·12 mins
Machine Learning Reinforcement Learning 🏢 Singapore Management University
Reinforcement Learning from Safety Feedback (RLSF) efficiently infers cost functions from trajectory-level feedback, enabling safe policy learning in complex environments.
Safe and Efficient: A Primal-Dual Method for Offline Convex CMDPs under Partial Data Coverage
·1556 words·8 mins
AI Generated Machine Learning Reinforcement Learning 🏢 ShanghaiTech University
A novel primal-dual method improves the efficiency of offline safe reinforcement learning for convex CMDPs by using uncertainty parameters, achieving a sample complexity of O(1/((1-γ)√n)).
ROIDICE: Offline Return on Investment Maximization for Efficient Decision Making
·2572 words·13 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Korea University
ROIDICE: A novel offline reinforcement learning algorithm maximizes Return on Investment (ROI) by formulating the problem as linear fractional programming, yielding superior return-cost trade-offs.
Robust Reinforcement Learning with General Utility
·352 words·2 mins
AI Generated Machine Learning Reinforcement Learning 🏢 University of Maryland College Park
This paper introduces robust reinforcement learning with general utility, providing novel algorithms with convergence guarantees for training robust policies under environmental uncertainty, significa…
Robust Reinforcement Learning from Corrupted Human Feedback
·2358 words·12 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Georgia Tech
R³M enhances reinforcement learning from human feedback by robustly handling corrupted preference labels, consistently learning the underlying reward and identifying outliers with minimal computationa…
Robot Policy Learning with Temporal Optimal Transport Reward
·2204 words·11 mins
Machine Learning Reinforcement Learning 🏢 McGill University
Temporal Optimal Transport (TemporalOT) reward enhances robot policy learning by incorporating temporal order information into Optimal Transport (OT)-based proxy rewards, leading to improved accuracy …
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
·435 words·3 mins
Machine Learning Reinforcement Learning 🏢 University of Wisconsin-Madison
First sample-efficient algorithm for LMDPs without separation assumptions, achieving near-optimal guarantees via novel off-policy evaluation.
Risk-sensitive control as inference with Rényi divergence
·1494 words·8 mins
Machine Learning Reinforcement Learning 🏢 University of Tokyo
Risk-sensitive control is recast as inference using Rényi divergence, yielding new algorithms and revealing equivalences between seemingly disparate methods.
RGMDT: Return-Gap-Minimizing Decision Tree Extraction in Non-Euclidean Metric Space
·2296 words·11 mins
Machine Learning Reinforcement Learning 🏢 the George Washington University
RGMDT algorithm extracts high-performing, interpretable decision trees from deep RL policies, guaranteeing near-optimal returns with size constraints, and extending to multi-agent settings.
Reward Machines for Deep RL in Noisy and Uncertain Environments
·2032 words·10 mins
Machine Learning Reinforcement Learning 🏢 University of Toronto
Deep RL agents can now effectively learn complex tasks even with noisy, uncertain sensor readings by exploiting the structure of Reward Machines.
Rethinking Optimal Transport in Offline Reinforcement Learning
·2060 words·10 mins
AI Generated Machine Learning Reinforcement Learning 🏢 AIRI
Offline RL enhanced via Optimal Transport: A new algorithm stitches best expert behaviors for efficient policy extraction.
Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
·1788 words·9 mins
Machine Learning Reinforcement Learning 🏢 Peking University
Reinforcement learning paradigms exhibit a representation complexity hierarchy: models are easiest, then policies, and value functions are hardest to approximate.