Reinforcement Learning

On Divergence Measures for Training GFlowNets

26 September 2024·2110 words·10 mins

AI Generated Machine Learning Reinforcement Learning 🏢 School of Applied Mathematics

Researchers enhanced Generative Flow Network training by introducing variance-reducing control variates for divergence-based learning objectives, accelerating convergence and improving accuracy.

Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression

26 September 2024·2487 words·12 mins

Machine Learning Reinforcement Learning 🏢 Tsinghua University

Offline RL agents often fail in real-world scenarios due to unseen test states. SCAS, a novel method, simultaneously corrects OOD states to high-value, in-distribution states and suppresses risky OOD …

Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff

26 September 2024·592 words·3 mins

AI Generated Machine Learning Reinforcement Learning 🏢 MIT

LOLIPOP: A novel algorithm achieving near-optimal regret for offline contextual Markov Decision Processes (CMDPs) using only O(H log T) offline density estimation oracle calls.

Offline Behavior Distillation

26 September 2024·1729 words·9 mins

AI Generated Machine Learning Reinforcement Learning 🏢 School of Computer Science, University of Sydney

This paper introduces Offline Behavior Distillation (OBD) to synthesize compact expert behavioral data from massive sub-optimal RL data, enabling faster policy learning.

Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation

26 September 2024·6706 words·32 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Johns Hopkins University

DARAIL, a novel algorithm, tackles off-dynamics reinforcement learning by combining reward modification with imitation learning to transfer a learned policy from a source to a target domain. This app…

Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality

26 September 2024·1532 words·8 mins

AI Generated Machine Learning Reinforcement Learning 🏢 University of Illinois Urbana-Champaign

Model-free policy gradient methods using occupancy functions are developed for online and offline RL, achieving computational efficiency and handling arbitrary data distributions.

OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning

26 September 2024·2351 words·12 mins

Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University

OASIS, a novel data-centric approach, shapes offline data distributions toward safer, higher-reward policies using a conditional diffusion model, outperforming existing offline safe RL methods.

Normalization and effective learning rates in reinforcement learning

26 September 2024·2714 words·13 mins

Machine Learning Reinforcement Learning 🏢 Google DeepMind

Normalize-and-Project (NaP) boosts reinforcement learning by stabilizing layer normalization, preventing plasticity loss, and enabling effective learning rate control.

Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset

26 September 2024·4994 words·24 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Google DeepMind

AI models struggle with changing data; this paper introduces Soft Resets, a novel learning approach that uses an adaptive drift to gracefully guide parameters toward initialization, improving adaptabi…

No-Regret Bandit Exploration based on Soft Tree Ensemble Model

26 September 2024·1480 words·7 mins

AI Generated Machine Learning Reinforcement Learning 🏢 LY Corporation

A novel stochastic bandit algorithm using soft tree ensemble models achieves lower cumulative regret than existing ReLU-based neural bandit algorithms, offering a constrained yet effective hypothesis …

No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO

26 September 2024·5380 words·26 mins

Machine Learning Reinforcement Learning 🏢 CLAIRE, EPFL

Deep RL agents trained under non-stationarity suffer performance collapse due to representation degradation; this work reveals this in PPO and introduces Proximal Feature Optimization (PFO) to mitigat…

No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery

26 September 2024·4811 words·23 mins

AI Generated Machine Learning Reinforcement Learning 🏢 University of Oxford

AI agents learn better with well-designed training environments. This paper reveals flaws in current environment-selection methods and introduces Sampling for Learnability (SFL), a new approach that …

NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks

26 September 2024·4139 words·20 mins

Machine Learning Reinforcement Learning 🏢 INESC-ID

NeuralSolver: A novel recurrent solver efficiently and consistently extrapolates algorithms from smaller problems to larger ones, handling various problem sizes.

NeoRL: Efficient Exploration for Nonepisodic RL

26 September 2024·1407 words·7 mins

Reinforcement Learning 🏢 ETH Zurich

NeoRL: A novel nonepisodic RL algorithm that guarantees optimal average cost with sublinear regret for nonlinear systems.

Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs

26 September 2024·308 words·2 mins

Machine Learning Reinforcement Learning 🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China

Near-optimal dynamic regret is achieved for adversarial linear mixture MDPs with unknown transitions, bridging occupancy-measure and policy-based methods for superior performance.

Near-Optimal Distributionally Robust Reinforcement Learning with General $L_p$ Norms

26 September 2024·556 words·3 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Ecole Polytechnique

This paper presents near-optimal sample complexity bounds for solving distributionally robust reinforcement learning problems with general $L_p$ norms, showing robust RL can be more sample-efficient than…

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

26 September 2024·1906 words·9 mins

Machine Learning Reinforcement Learning 🏢 Google DeepMind

New distributional RL algorithm (DCFP) achieves near-minimax optimality for return distribution estimation in the generative model regime.

N-agent Ad Hoc Teamwork

26 September 2024·3605 words·17 mins

AI Generated Machine Learning Reinforcement Learning 🏢 University of Texas at Austin

New algorithm, POAM, excels at multi-agent cooperation by adapting to diverse and changing teammates in dynamic scenarios.

Multi-Reward Best Policy Identification

26 September 2024·4494 words·22 mins

Machine Learning Reinforcement Learning 🏢 Ericsson AB

This paper introduces efficient algorithms, MR-NaS and DBMR-BPI, for identifying optimal policies across multiple reward functions in reinforcement learning, achieving competitive performance with the…

Multi-Agent Imitation Learning: Value is Easy, Regret is Hard

26 September 2024·1706 words·9 mins

AI Theory Reinforcement Learning 🏢 Carnegie Mellon University

In multi-agent imitation learning, achieving regret equivalence is harder than value equivalence; this paper introduces novel algorithms that efficiently minimize the regret gap under various assumpti…