
Reinforcement Learning

Is Value Learning Really the Main Bottleneck in Offline RL?
·2601 words·13 mins
Machine Learning Reinforcement Learning 🏢 UC Berkeley
Offline RL often lags behind imitation learning, but this paper reveals that policy learning and generalization, not value function learning, are often the main bottlenecks.
Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?
·3014 words·15 mins
Machine Learning Reinforcement Learning 🏢 National University of Defense Technology
Decision Mamba (DeMa) outperforms Decision Transformer (DT) in offline RL trajectory optimization, using 30% fewer parameters in Atari and a quarter of the parameters in MuJoCo, demonstrating the efficacy of Mamba’s linear-time sequence modeling.
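Both models cast offline RL as conditional sequence modeling over trajectories. A minimal sketch of that shared framing (the function name, shapes, and undiscounted returns are illustrative assumptions, not the paper’s code):

```python
# Hedged sketch: the (return-to-go, state, action) token stream that
# Decision Transformer / Decision Mamba style models fit autoregressively.
import numpy as np

def to_rtg_sequence(states, actions, rewards):
    """states: (T, s_dim); actions: (T, a_dim); rewards: (T,) numpy arrays."""
    rtg = np.cumsum(rewards[::-1])[::-1]  # return-to-go from each timestep
    # interleave (rtg_t, s_t, a_t) triples; the sequence model predicts a_t
    # conditioned on the tokens preceding it
    return [(rtg[t], states[t], actions[t]) for t in range(len(rewards))]
```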
Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning
·1726 words·9 mins
Reinforcement Learning 🏢 Microsoft Research
Offline imitation learning achieves surprisingly strong performance, matching online methods’ efficiency under certain conditions, contradicting prior assumptions.
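As background for the horizon analysis, behavior cloning itself is just supervised regression on expert state-action pairs. A minimal sketch, with a placeholder architecture and hyperparameters rather than the paper’s setup:

```python
# Minimal behavior-cloning sketch: fit a policy to expert (state, action)
# pairs by supervised regression. Network width, loss, and optimizer are
# placeholder assumptions.
import torch
import torch.nn as nn

def behavior_cloning(states, actions, epochs=100, lr=1e-3):
    """states: (N, s_dim) float tensor; actions: (N, a_dim) float tensor."""
    policy = nn.Sequential(
        nn.Linear(states.shape[1], 256), nn.ReLU(),
        nn.Linear(256, actions.shape[1]),
    )
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(policy(states), actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```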
Inverse Factorized Soft Q-Learning for Cooperative Multi-agent Imitation Learning
·3040 words·15 mins
Machine Learning Reinforcement Learning 🏢 Singapore Management University
New multi-agent imitation learning algorithm (MIFQ) leverages inverse soft Q-learning and factorization for stable, efficient training, achieving state-of-the-art results on challenging benchmarks.
Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents
·2372 words·12 mins
Machine Learning Reinforcement Learning 🏢 Computer Science Department, TU Darmstadt
Successive Concept Bottleneck Agents (SCoBots) improve reinforcement learning by integrating interpretable layers, enabling concept-level inspection and human-in-the-loop revisions to fix misalignment.
Integrating Suboptimal Human Knowledge with Hierarchical Reinforcement Learning for Large-Scale Multiagent Systems
·2222 words·11 mins
AI Generated Machine Learning Reinforcement Learning 🏢 University of Wollongong
Hierarchical Human Knowledge-guided MARL (hhk-MARL) framework accelerates large-scale multi-agent training by integrating suboptimal human knowledge, significantly improving performance and scalability.
Incremental Learning of Retrievable Skills For Efficient Continual Task Adaptation
·2821 words·14 mins
Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University
IsCiL: a novel adapter-based continual imitation learning framework that efficiently adapts to new tasks by incrementally learning and retrieving reusable skills.
In-Trajectory Inverse Reinforcement Learning: Learn Incrementally From An Ongoing Trajectory
·1427 words·7 mins
Machine Learning Reinforcement Learning 🏢 Pennsylvania State University
MERIT-IRL: First in-trajectory IRL framework learns reward & policy incrementally from ongoing trajectories, guaranteeing sub-linear regret.
Improving Environment Novelty Quantification for Effective Unsupervised Environment Design
·2893 words·14 mins
Reinforcement Learning 🏢 Singapore Management University
Boosting AI generalization: CENIE framework quantifies environment novelty via state-action coverage, enhancing unsupervised environment design for robust generalization.
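A hedged illustration of the coverage idea: score a candidate environment by how much of its visited state-action space falls outside the agent’s past experience. The uniform binning below is a stand-in for CENIE’s actual estimator, and all names are assumptions:

```python
# Hedged sketch of coverage-based novelty quantification. The uniform
# binning scheme is illustrative, not CENIE's estimator.
import numpy as np

def coverage_novelty(seen_sa, new_sa, bins=10):
    """seen_sa, new_sa: (N, d) arrays of state-action vectors in [0, 1]^d."""
    def to_bins(sa):
        return {tuple(row) for row in np.floor(sa * bins).astype(int)}
    covered, visited = to_bins(seen_sa), to_bins(new_sa)
    # novelty = fraction of the candidate's bins the agent has never covered
    return len(visited - covered) / max(len(visited), 1)
```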
Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn
·3413 words·17 mins
Machine Learning Reinforcement Learning 🏢 Université De Montréal
Deep RL agents often suffer from instability due to the ‘chain effect’ of value and policy churn; this paper introduces CHAIN, a novel method to reduce this churn, thereby improving DRL performance and training stability.
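Churn here refers to how predictions on states outside the update batch drift after each gradient step. A simplified probe (an illustration, not CHAIN itself) might measure it as the fraction of held-out states whose greedy action flips:

```python
# Simplified churn probe: after a gradient step, measure how often the
# greedy action under the Q-network changed on held-out reference states.
import torch

@torch.no_grad()
def policy_churn(q_before, q_after, ref_states):
    """q_*: modules mapping (B, s_dim) -> (B, n_actions) values."""
    flipped = q_before(ref_states).argmax(-1) != q_after(ref_states).argmax(-1)
    return flipped.float().mean().item()  # fraction of flipped greedy actions
```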
Improved Regret of Linear Ensemble Sampling
·1286 words·7 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Seoul National University
Linear ensemble sampling achieves a state-of-the-art regret bound of Õ(d³/²√T) with a logarithmic ensemble size, closing the theory-practice gap in linear bandit algorithms.
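For intuition, linear ensemble sampling maintains a small ensemble of perturbed least-squares estimates over a shared Gram matrix and acts greedily under a uniformly sampled member each round. A sketch under assumed hyperparameters (perturbation scale, ensemble size, and regularizer are placeholders; the paper’s analysis fixes them differently):

```python
# Illustrative linear ensemble sampling for a finite-arm linear bandit.
import numpy as np

def linear_ensemble_sampling(arms, reward_fn, T, m=10, lam=1.0, noise=0.1):
    """arms: (K, d) feature matrix; reward_fn(x) -> noisy scalar reward."""
    rng = np.random.default_rng(0)
    d = arms.shape[1]
    V = lam * np.eye(d)                    # shared regularized Gram matrix
    b = rng.normal(0.0, noise, (m, d))     # one perturbed target per member
    for _ in range(T):
        theta = np.linalg.solve(V, b[rng.integers(m)])  # sampled member
        x = arms[np.argmax(arms @ theta)]               # greedy action
        r = reward_fn(x)
        V += np.outer(x, x)
        # every member absorbs the reward plus its own fresh perturbation
        b += x * (r + rng.normal(0.0, noise, m))[:, None]
    return V, b
```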
Improved learning rates in multi-unit uniform price auctions
·442 words·3 mins
AI Generated Machine Learning Reinforcement Learning 🏢 University of Oxford
New modeling of bid space in multi-unit uniform price auctions achieves regret of Õ(K⁴/³T²/³) under bandit feedback, improving over prior work and closing the gap with discriminatory pricing.
Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms
·1596 words·8 mins
Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology
This paper significantly improves Bayes regret bounds for hierarchical Bayesian bandit algorithms, achieving logarithmic regret in finite action settings and enhanced bounds in multi-task linear and combinatorial semi-bandit settings.
Implicit Curriculum in Procgen Made Explicit
·1613 words·8 mins
Reinforcement Learning 🏢 National University of Singapore
C-Procgen reveals implicit curriculum in Procgen’s multi-level training, showing learning shifts gradually from easy to hard contexts.
Identifying Selections for Unsupervised Subtask Discovery
·3702 words·18 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University
This paper introduces seq-NMF, a novel method for unsupervised subtask discovery in reinforcement learning that leverages selection variables to enhance generalization and data efficiency.
Identifying Latent State-Transition Processes for Individualized Reinforcement Learning
·2375 words·12 mins
Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University
This study introduces a novel framework for individualized reinforcement learning, guaranteeing the identifiability of latent factors influencing state transitions and providing a practical method for estimating these latent factors.
Hybrid Reinforcement Learning Breaks Sample Size Barriers In Linear MDPs
·1464 words·7 mins
Machine Learning Reinforcement Learning 🏢 University of Pennsylvania
Hybrid RL algorithms achieve sharper error/regret bounds than existing offline/online RL methods in linear MDPs, improving sample efficiency without stringent assumptions on behavior policy quality.
How to Solve Contextual Goal-Oriented Problems with Offline Datasets?
·2005 words·10 mins
Machine Learning Reinforcement Learning 🏢 Microsoft Research
CODA: A novel method for solving contextual goal-oriented problems with offline datasets, using unlabeled trajectories and context-goal pairs to create a fully labeled dataset, outperforming other baselines.
How Does Variance Shape the Regret in Contextual Bandits?
·334 words·2 mins
Machine Learning Reinforcement Learning 🏢 MIT
Low reward variance drastically improves contextual bandit regret, defying minimax assumptions and highlighting the crucial role of eluder dimension.
How does Inverse RL Scale to Large State Spaces? A Provably Efficient Approach
·1501 words·8 mins
Machine Learning Reinforcement Learning 🏢 Politecnico Di Milano
CATY-IRL: A novel, provably efficient algorithm solves Inverse Reinforcement Learning’s scalability issues for large state spaces, improving upon state-of-the-art methods.