
Reinforcement Learning

Is Value Learning Really the Main Bottleneck in Offline RL?
·2601 words·13 mins
Machine Learning Reinforcement Learning 🏢 UC Berkeley
Offline RL often lags behind imitation learning, but this paper reveals that policy learning and generalization, not value function learning, are often the main bottlenecks.
Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?
·3014 words·15 mins
Machine Learning Reinforcement Learning 🏢 National University of Defense Technology
Decision Mamba (DeMa) outperforms Decision Transformer (DT) in offline RL trajectory optimization, using 30% fewer parameters in Atari and a quarter of the parameters in MuJoCo, demonstrating the efficacy of Mamba’s linear-time sequence modeling.
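Both models cast offline RL as conditional sequence modeling over trajectories. A minimal sketch of that shared framing (the function name, shapes, and undiscounted returns are illustrative assumptions, not the paper’s code):

```python
# Hedged sketch: the (return-to-go, state, action) token stream that
# Decision Transformer / Decision Mamba style models fit autoregressively.
import numpy as np

def to_rtg_sequence(states, actions, rewards):
    """states: (T, s_dim); actions: (T, a_dim); rewards: (T,) numpy arrays."""
    rtg = np.cumsum(rewards[::-1])[::-1]  # return-to-go from each timestep
    # interleave (rtg_t, s_t, a_t) triples; the sequence model predicts a_t
    # conditioned on the tokens preceding it
    return [(rtg[t], states[t], actions[t]) for t in range(len(rewards))]
```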
Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning
·1726 words·9 mins
Reinforcement Learning 🏢 Microsoft Research
Offline imitation learning achieves surprisingly strong performance, matching online methods’ efficiency under certain conditions, contradicting prior assumptions.
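As background for the horizon analysis, behavior cloning itself is just supervised regression on expert state-action pairs. A minimal sketch, with a placeholder architecture and hyperparameters rather than the paper’s setup:

```python
# Minimal behavior-cloning sketch: fit a policy to expert (state, action)
# pairs by supervised regression. Network width, loss, and optimizer are
# placeholder assumptions.
import torch
import torch.nn as nn

def behavior_cloning(states, actions, epochs=100, lr=1e-3):
    """states: (N, s_dim) float tensor; actions: (N, a_dim) float tensor."""
    policy = nn.Sequential(
        nn.Linear(states.shape[1], 256), nn.ReLU(),
        nn.Linear(256, actions.shape[1]),
    )
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(policy(states), actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```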
Inverse Factorized Soft Q-Learning for Cooperative Multi-agent Imitation Learning
·3040 words·15 mins
Machine Learning Reinforcement Learning 🏢 Singapore Management University
New multi-agent imitation learning algorithm (MIFQ) leverages inverse soft Q-learning and factorization for stable, efficient training, achieving state-of-the-art results on challenging benchmarks.
Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents
·2372 words·12 mins
Machine Learning Reinforcement Learning 🏢 Computer Science Department, TU Darmstadt
Successive Concept Bottleneck Agents (SCoBots) improve reinforcement learning by integrating interpretable layers, enabling concept-level inspection and human-in-the-loop revisions to fix misalignment.
Integrating Suboptimal Human Knowledge with Hierarchical Reinforcement Learning for Large-Scale Multiagent Systems
·2222 words·11 mins
AI Generated Machine Learning Reinforcement Learning 🏢 University of Wollongong
Hierarchical Human Knowledge-guided MARL (hhk-MARL) framework accelerates large-scale multi-agent training by integrating suboptimal human knowledge, significantly improving performance and scalability.
Incremental Learning of Retrievable Skills For Efficient Continual Task Adaptation
·2821 words·14 mins
Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University
IsCiL: a novel adapter-based continual imitation learning framework that efficiently adapts to new tasks by incrementally learning and retrieving reusable skills.
In-Trajectory Inverse Reinforcement Learning: Learn Incrementally From An Ongoing Trajectory
·1427 words·7 mins
Machine Learning Reinforcement Learning 🏢 Pennsylvania State University
MERIT-IRL: First in-trajectory IRL framework learns reward & policy incrementally from ongoing trajectories, guaranteeing sub-linear regret.
Improving Environment Novelty Quantification for Effective Unsupervised Environment Design
·2893 words·14 mins
Reinforcement Learning 🏢 Singapore Management University
Boosting AI generalization: CENIE framework quantifies environment novelty via state-action coverage, enhancing unsupervised environment design for robust generalization.
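A hedged illustration of the coverage idea: score a candidate environment by how much of its visited state-action space falls outside the agent’s past experience. The uniform binning below is a stand-in for CENIE’s actual estimator, and all names are assumptions:

```python
# Hedged sketch of coverage-based novelty quantification. The uniform
# binning scheme is illustrative, not CENIE's estimator.
import numpy as np

def coverage_novelty(seen_sa, new_sa, bins=10):
    """seen_sa, new_sa: (N, d) arrays of state-action vectors in [0, 1]^d."""
    def to_bins(sa):
        return {tuple(row) for row in np.floor(sa * bins).astype(int)}
    covered, visited = to_bins(seen_sa), to_bins(new_sa)
    # novelty = fraction of the candidate's bins the agent has never covered
    return len(visited - covered) / max(len(visited), 1)
```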
Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn
·3413 words·17 mins
Machine Learning Reinforcement Learning 🏢 Université De Montréal
Deep RL agents often suffer from instability due to the ‘chain effect’ of value and policy churn; this paper introduces CHAIN, a novel method to reduce this churn, thereby improving DRL performance and training stability.
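Churn here refers to how predictions on states outside the update batch drift after each gradient step. A simplified probe (an illustration, not CHAIN itself) might measure it as the fraction of held-out states whose greedy action flips:

```python
# Simplified churn probe: after a gradient step, measure how often the
# greedy action under the Q-network changed on held-out reference states.
import torch

@torch.no_grad()
def policy_churn(q_before, q_after, ref_states):
    """q_*: modules mapping (B, s_dim) -> (B, n_actions) values."""
    flipped = q_before(ref_states).argmax(-1) != q_after(ref_states).argmax(-1)
    return flipped.float().mean().item()  # fraction of flipped greedy actions
```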
Improved Regret of Linear Ensemble Sampling
·1286 words·7 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Seoul National University
Linear ensemble sampling achieves a state-of-the-art regret bound of Õ(d³/²√T) with a logarithmic ensemble size, closing the theory-practice gap in linear bandit algorithms.
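For intuition, linear ensemble sampling maintains a small ensemble of perturbed least-squares estimates over a shared Gram matrix and acts greedily under a uniformly sampled member each round. A sketch under assumed hyperparameters (perturbation scale, ensemble size, and regularizer are placeholders; the paper’s analysis fixes them differently):

```python
# Illustrative linear ensemble sampling for a finite-arm linear bandit.
import numpy as np

def linear_ensemble_sampling(arms, reward_fn, T, m=10, lam=1.0, noise=0.1):
    """arms: (K, d) feature matrix; reward_fn(x) -> noisy scalar reward."""
    rng = np.random.default_rng(0)
    d = arms.shape[1]
    V = lam * np.eye(d)                    # shared regularized Gram matrix
    b = rng.normal(0.0, noise, (m, d))     # one perturbed target per member
    for _ in range(T):
        theta = np.linalg.solve(V, b[rng.integers(m)])  # sampled member
        x = arms[np.argmax(arms @ theta)]               # greedy action
        r = reward_fn(x)
        V += np.outer(x, x)
        # every member absorbs the reward plus its own fresh perturbation
        b += x * (r + rng.normal(0.0, noise, m))[:, None]
    return V, b
```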
Improved learning rates in multi-unit uniform price auctions
·442 words·3 mins
AI Generated Machine Learning Reinforcement Learning 🏢 University of Oxford
New modeling of bid space in multi-unit uniform price auctions achieves regret of Õ(K⁴/³T²/³) under bandit feedback, improving over prior work and closing the gap with discriminatory pricing.
Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms
·1596 words·8 mins
Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology
This paper significantly improves Bayes regret bounds for hierarchical Bayesian bandit algorithms, achieving logarithmic regret in finite action settings and enhanced bounds in multi-task linear and combinatorial semi-bandit settings.
Implicit Curriculum in Procgen Made Explicit
·1613 words·8 mins
Reinforcement Learning 🏢 National University of Singapore
C-Procgen reveals implicit curriculum in Procgen’s multi-level training, showing learning shifts gradually from easy to hard contexts.
Identifying Selections for Unsupervised Subtask Discovery
·3702 words·18 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University
This paper introduces seq-NMF, a novel method for unsupervised subtask discovery in reinforcement learning that leverages selection variables to enhance generalization and data efficiency.
Identifying Latent State-Transition Processes for Individualized Reinforcement Learning
·2375 words·12 mins
Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University
This study introduces a novel framework for individualized reinforcement learning, guaranteeing the identifiability of latent factors influencing state transitions and providing a practical method for estimating these latent factors.
Hybrid Reinforcement Learning Breaks Sample Size Barriers In Linear MDPs
·1464 words·7 mins
Machine Learning Reinforcement Learning 🏢 University of Pennsylvania
Hybrid RL algorithms achieve sharper error/regret bounds than existing offline/online RL methods in linear MDPs, improving sample efficiency without stringent assumptions on behavior policy quality.
How to Solve Contextual Goal-Oriented Problems with Offline Datasets?
·2005 words·10 mins
Machine Learning Reinforcement Learning 🏢 Microsoft Research
CODA: A novel method for solving contextual goal-oriented problems with offline datasets, using unlabeled trajectories and context-goal pairs to create a fully labeled dataset, outperforming other baselines.
How Does Variance Shape the Regret in Contextual Bandits?
·334 words·2 mins
Machine Learning Reinforcement Learning 🏢 MIT
Low reward variance drastically improves contextual bandit regret, defying minimax assumptions and highlighting the crucial role of eluder dimension.
How does Inverse RL Scale to Large State Spaces? A Provably Efficient Approach
·1501 words·8 mins
Machine Learning Reinforcement Learning 🏢 Politecnico Di Milano
CATY-IRL: A novel, provably efficient algorithm solves Inverse Reinforcement Learning’s scalability issues for large state spaces, improving upon state-of-the-art methods.