
Reinforcement Learning

A theoretical case-study of Scalable Oversight in Hierarchical Reinforcement Learning
·414 words·2 mins
Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University
Bounded human feedback hinders large AI model training. This paper introduces hierarchical reinforcement learning to enable scalable oversight, efficiently acquiring feedback and learning optimal poli…
A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning
·2749 words·13 mins
Reinforcement Learning 🏢 Microsoft Research
On-policy deep RL agents suffer from plasticity loss, but this paper introduces ‘regenerative’ methods that consistently mitigate it, improving performance in challenging environments.
A Structure-Aware Framework for Learning Device Placements on Computation Graphs
·1503 words·8 mins
Machine Learning Reinforcement Learning 🏢 Intel Labs
HSDAG, a novel framework for learning optimal device placements of neural networks, boosts inference speed by up to 58.2%.
A Simple Framework for Generalization in Visual RL under Dynamic Scene Perturbations
·6610 words·32 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Ewha Womans University
SimGRL: a novel framework that boosts the generalization of visual reinforcement learning by mitigating imbalanced saliency and observational overfitting through a feature-level frame stack and shifted random o…
A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation
·317 words·2 mins
Machine Learning Reinforcement Learning 🏢 UC Los Angeles
MQL-UCB: near-optimal reinforcement learning with low policy-switching cost, balancing exploration and exploitation under general function approximation.
A Method for Evaluating Hyperparameter Sensitivity in Reinforcement Learning
·1860 words·9 mins
Machine Learning Reinforcement Learning 🏢 University of Alberta
A new empirical methodology quantifies how much the performance of reinforcement learning algorithms relies on per-environment hyperparameter tuning, enabling better algorithm design.
A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays
·484 words·3 mins
Machine Learning Reinforcement Learning 🏢 Churney ApS
A new best-of-both-worlds bandit algorithm tolerates arbitrary excessive delays, overcoming limitations of prior work that required prior knowledge of the maximal delay and suffered linear regret dependence…