Reinforcement Learning
On Divergence Measures for Training GFlowNets
·2110 words·10 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 School of Applied Mathematics
Researchers enhanced Generative Flow Network training by introducing variance-reducing control variates for divergence-based learning objectives, accelerating convergence and improving accuracy.
Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression
·2487 words·12 mins·
Machine Learning
Reinforcement Learning
🏢 Tsinghua University
Offline RL agents often fail in real-world scenarios due to unseen test states. SCAS, a novel method, simultaneously corrects OOD states to high-value, in-distribution states and suppresses risky OOD …
Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff
·592 words·3 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 MIT
LOLIPOP: A novel algorithm achieving near-optimal regret for offline contextual Markov Decision Processes (CMDPs) using only O(H log T) offline density estimation oracle calls.
Offline Behavior Distillation
·1729 words·9 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 School of Computer Science, University of Sydney
This paper introduces Offline Behavior Distillation (OBD) to synthesize compact expert behavioral data from massive sub-optimal RL data, enabling faster policy learning.
Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation
·6706 words·32 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Johns Hopkins University
DARAIL, a novel algorithm, tackles off-dynamics reinforcement learning by combining reward modification with imitation learning to transfer a learned policy from a source to a target domain. This app…
Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality
·1532 words·8 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 University of Illinois Urbana-Champaign
Model-free policy gradient methods using occupancy functions are developed for online and offline RL, achieving computational efficiency and handling arbitrary data distributions.
OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning
·2351 words·12 mins·
Machine Learning
Reinforcement Learning
🏢 Carnegie Mellon University
OASIS, a novel data-centric approach, shapes offline data distributions toward safer, higher-reward policies using a conditional diffusion model, outperforming existing offline safe RL methods.
Normalization and effective learning rates in reinforcement learning
·2714 words·13 mins·
Machine Learning
Reinforcement Learning
🏢 Google DeepMind
Normalize-and-Project (NaP) boosts reinforcement learning by stabilizing layer normalization, preventing plasticity loss, and enabling effective learning rate control.
Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset
·4994 words·24 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Google DeepMind
AI models struggle with changing data; this paper introduces Soft Resets, a novel learning approach that uses an adaptive drift to gracefully guide parameters toward initialization, improving adaptabi…
No-Regret Bandit Exploration based on Soft Tree Ensemble Model
·1480 words·7 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 LY Corporation
A novel stochastic bandit algorithm using soft tree ensemble models achieves lower cumulative regret than existing ReLU-based neural bandit algorithms, offering a constrained yet effective hypothesis …
No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
·5380 words·26 mins·
Machine Learning
Reinforcement Learning
🏢 CLAIRE, EPFL
Deep RL agents trained under non-stationarity suffer performance collapse due to representation degradation; this work reveals this in PPO and introduces Proximal Feature Optimization (PFO) to mitigat…
No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery
·4811 words·23 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 University of Oxford
AI agents learn better with well-designed training environments. This paper reveals flaws in current environment-selection methods and introduces Sampling for Learnability (SFL), a new approach that …
NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks
·4139 words·20 mins·
Machine Learning
Reinforcement Learning
🏢 INESC-ID
NeuralSolver: A novel recurrent solver efficiently and consistently extrapolates algorithms from smaller problems to larger ones, handling various problem sizes.
NeoRL: Efficient Exploration for Nonepisodic RL
·1407 words·7 mins·
Reinforcement Learning
🏢 ETH Zurich
NeoRL: A novel nonepisodic RL algorithm that guarantees optimal average cost with sublinear regret for nonlinear systems.
Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs
·308 words·2 mins·
Machine Learning
Reinforcement Learning
🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China
Near-optimal dynamic regret is achieved for adversarial linear mixture MDPs with unknown transitions, bridging occupancy-measure and policy-based methods for superior performance.
Near-Optimal Distributionally Robust Reinforcement Learning with General $L_p$ Norms
·556 words·3 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Ecole Polytechnique
This paper presents near-optimal sample complexity bounds for solving distributionally robust reinforcement learning problems with general $L_p$ norms, showing robust RL can be more sample-efficient than…
Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
·1906 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 Google DeepMind
A new distributional RL algorithm (DCFP) achieves near-minimax optimality for return distribution estimation in the generative model regime.
N-agent Ad Hoc Teamwork
·3605 words·17 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 University of Texas at Austin
A new algorithm, POAM, excels at multi-agent cooperation by adapting to diverse and changing teammates in dynamic scenarios.
Multi-Reward Best Policy Identification
·4494 words·22 mins·
Machine Learning
Reinforcement Learning
🏢 Ericsson AB
This paper introduces efficient algorithms, MR-NaS and DBMR-BPI, for identifying optimal policies across multiple reward functions in reinforcement learning, achieving competitive performance with the…
Multi-Agent Imitation Learning: Value is Easy, Regret is Hard
·1706 words·9 mins·
AI Theory
Reinforcement Learning
🏢 Carnegie Mellon University
In multi-agent imitation learning, achieving regret equivalence is harder than value equivalence; this paper introduces novel algorithms that efficiently minimize the regret gap under various assumpti…