Spotlight Reinforcement Learning
2024
Exclusively Penalized Q-learning for Offline Reinforcement Learning
·2010 words·10 mins·
Reinforcement Learning
🏢 UNIST
EPQ, a novel offline RL algorithm, significantly reduces underestimation bias by selectively penalizing states prone to errors, improving performance over existing methods.
Diffusion for World Modeling: Visual Details Matter in Atari
·2473 words·12 mins·
Reinforcement Learning
🏢 University of Geneva
DIAMOND, a novel reinforcement learning agent using a diffusion world model, achieves state-of-the-art performance on the Atari 100k benchmark by leveraging visual details often ignored by discrete la…
DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning
·2324 words·11 mins·
Reinforcement Learning
🏢 University of California, San Diego
DiffTORI leverages differentiable trajectory optimization for superior deep reinforcement and imitation learning, outperforming prior state-of-the-art methods on high-dimensional robotic tasks.
Can Learned Optimization Make Reinforcement Learning Less Difficult?
·3614 words·17 mins·
Reinforcement Learning
🏢 University of Oxford
Learned optimizer OPEN tackles RL’s non-stationarity, plasticity loss, and exploration using meta-learning, significantly outperforming traditional and other learned optimizers.
Bigger, Regularized, Optimistic: scaling for compute and sample efficient continuous control
·3405 words·16 mins·
Reinforcement Learning
🏢 Warsaw University of Technology
BRO (Bigger, Regularized, Optimistic) achieves state-of-the-art sample efficiency in continuous control by scaling critic networks and using strong regularization with optimistic exploration.
Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability
·348 words·2 mins·
Reinforcement Learning
🏢 Massachusetts Institute of Technology
This paper presents a novel unified framework for deriving information-theoretic lower bounds for bandit learnability, unifying classical methods with interactive learning techniques and introducing a…
Adversarial Environment Design via Regret-Guided Diffusion Models
·2707 words·13 mins·
Reinforcement Learning
🏢 Seoul National University
Regret-Guided Diffusion Models enhance unsupervised environment design by generating challenging, diverse training environments that improve agent robustness and zero-shot generalization.
A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning
·2749 words·13 mins·
Reinforcement Learning
🏢 Microsoft Research
On-policy deep RL agents suffer from plasticity loss; this paper introduces ‘regenerative’ methods that consistently mitigate it, improving performance in challenging environments.