Reinforcement Learning

A theoretical case-study of Scalable Oversight in Hierarchical Reinforcement Learning

26 September 2024·414 words·2 mins

Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University

Bounded human feedback hinders large AI model training. This paper applies hierarchical reinforcement learning to enable scalable oversight, efficiently acquiring feedback and learning optimal poli…

A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning

26 September 2024·2749 words·13 mins

Reinforcement Learning 🏢 Microsoft Research

On-policy deep RL agents suffer from plasticity loss; this paper introduces ‘regenerative’ methods that consistently mitigate it, improving performance in challenging environments.

A Structure-Aware Framework for Learning Device Placements on Computation Graphs

26 September 2024·1503 words·8 mins

Machine Learning Reinforcement Learning 🏢 Intel Labs

Learn optimal device placement for neural networks with HSDAG, a novel framework boosting inference speed by up to 58.2%!

A Simple Framework for Generalization in Visual RL under Dynamic Scene Perturbations

26 September 2024·6610 words·32 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Ewha Womans University

SimGRL: A novel framework boosts visual reinforcement learning’s generalization by mitigating imbalanced saliency and observational overfitting through a feature-level frame stack and shifted random o…

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

26 September 2024·317 words·2 mins

Machine Learning Reinforcement Learning 🏢 UC Los Angeles

MQL-UCB: Near-optimal reinforcement learning with low policy switching cost, solving the exploration-exploitation dilemma for complex models.

A Method for Evaluating Hyperparameter Sensitivity in Reinforcement Learning

26 September 2024·1860 words·9 mins

Machine Learning Reinforcement Learning 🏢 University of Alberta

A new empirical methodology quantifies how much a reinforcement learning algorithm's performance relies on per-environment hyperparameter tuning, enabling better algorithm design.

A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays

26 September 2024·484 words·3 mins

Machine Learning Reinforcement Learning 🏢 Churney ApS

New best-of-both-worlds bandit algorithm tolerates arbitrary excessive delays, overcoming limitations of prior work that required prior knowledge of maximal delay and suffered linear regret dependence…