
Reinforcement Learning

Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning
·2208 words·11 mins
Reinforcement Learning 🏒 Chinese University of Hong Kong
UNICORN, a unified framework, reveals that existing offline meta-reinforcement learning algorithms optimize variations of mutual information, leading to improved generalization.
Time-Constrained Robust MDPs
·10005 words·47 mins
AI Generated Machine Learning Reinforcement Learning 🏒 IRT Saint-Exupéry
Time-Constrained Robust MDPs (TC-RMDPs) improve reinforcement learning by addressing limitations of traditional methods, offering a novel framework for handling real-world uncertainties and yielding m…
Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox
·1315 words·7 mins
Reinforcement Learning 🏒 Université Paris-Saclay
A novel Thompson Sampling variant achieves polynomial regret for combinatorial bandits, solving a key limitation of existing methods and offering significantly improved performance.
The Value of Reward Lookahead in Reinforcement Learning
·1360 words·7 mins
Reinforcement Learning 🏒 CREST, ENSAE, IP Paris
Reinforcement learning agents can achieve significantly higher rewards by using advance knowledge of future rewards; this paper mathematically analyzes this advantage by computing the worst-case perfo…
The Surprising Ineffectiveness of Pre-Trained Visual Representations for Model-Based Reinforcement Learning
·2250 words·11 mins
Machine Learning Reinforcement Learning 🏒 Bosch Center for Artificial Intelligence
Contrary to expectations, pre-trained visual representations surprisingly don’t improve model-based reinforcement learning’s sample efficiency or generalization; data diversity and network architectu…
The surprising efficiency of temporal difference learning for rare event prediction
·1614 words·8 mins
Machine Learning Reinforcement Learning 🏒 Courant Institute of Mathematical Sciences, New York University
TD learning surprisingly outperforms Monte Carlo methods for rare event prediction in Markov chains, achieving relative accuracy with polynomially many, rather than exponentially many, observed transitions.
The Sample-Communication Complexity Trade-off in Federated Q-Learning
·1654 words·8 mins
Reinforcement Learning 🏒 Carnegie Mellon University
Federated Q-learning achieves optimal sample & communication complexities simultaneously via Fed-DVR-Q, a novel algorithm.
The Power of Resets in Online Reinforcement Learning
·233 words·2 mins
Reinforcement Learning 🏒 Google Research
Leveraging local simulator resets in online reinforcement learning dramatically improves sample efficiency, especially for high-dimensional problems with general function approximation.
The Limits of Transfer Reinforcement Learning with Latent Low-rank Structure
·1762 words·9 mins
Machine Learning Reinforcement Learning 🏒 Cornell University
This paper presents computationally efficient transfer reinforcement learning algorithms that remove the dependence on state/action space sizes while achieving minimax optimality.
The Ladder in Chaos: Improving Policy Learning by Harnessing the Parameter Evolving Path in A Low-dimensional Space
·2918 words·14 mins
Machine Learning Reinforcement Learning 🏒 College of Intelligence and Computing, Tianjin University
Deep RL policy learning is improved by identifying and boosting key parameter update directions using a novel temporal SVD analysis, leading to more efficient and effective learning.
The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning
·2452 words·12 mins
Machine Learning Reinforcement Learning 🏒 University of Oxford
Offline model-based RL methods fail as dynamics models improve; this paper reveals the 'edge-of-reach' problem causing this and introduces RAVL, a simple solution ensuring robust performance.
The Dormant Neuron Phenomenon in Multi-Agent Reinforcement Learning Value Factorization
·2766 words·13 mins
Machine Learning Reinforcement Learning 🏒 Xiamen University
ReBorn revitalizes multi-agent reinforcement learning by tackling dormant neurons, boosting network expressivity and learning efficiency.
The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize
·1378 words·7 mins
Reinforcement Learning 🏒 Cornell University
Unlocking the mysteries of stochastic approximation with constant stepsize, this paper reveals how memory and nonlinearity interact to create bias, providing novel analysis and solutions for more accu…
Test Where Decisions Matter: Importance-driven Testing for Deep Reinforcement Learning
·3658 words·18 mins
AI Generated Machine Learning Reinforcement Learning 🏒 Graz University of Technology
Prioritize crucial decisions in deep RL policy testing with a novel model-based method for rigorous state importance ranking, enabling efficient safety and performance verification.
Temporal-Difference Learning Using Distributed Error Signals
·2668 words·13 mins
AI Generated Machine Learning Reinforcement Learning 🏒 University of Toronto
Artificial Dopamine (AD) algorithm achieves comparable performance to backpropagation methods in complex RL tasks by using only synchronously distributed per-layer TD errors, demonstrating the suffici…
Taming Heavy-Tailed Losses in Adversarial Bandits and the Best-of-Both-Worlds Setting
·418 words·2 mins
Machine Learning Reinforcement Learning 🏒 Virginia Tech
This paper proposes novel algorithms achieving near-optimal regret in adversarial and logarithmic regret in stochastic multi-armed bandit settings with heavy-tailed losses, relaxing strong assumptions…
Taming 'data-hungry' reinforcement learning? Stability in continuous state-action spaces
·358 words·2 mins
Machine Learning Reinforcement Learning 🏒 New York University
Reinforcement learning achieves unprecedentedly fast convergence rates in continuous state-action spaces by leveraging novel stability properties of Markov Decision Processes.
Symmetric Linear Bandits with Hidden Symmetry
·1466 words·7 mins
Machine Learning Reinforcement Learning 🏒 University of Warwick
Researchers unveil a novel algorithm for high-dimensional symmetric linear bandits, achieving a regret bound of O(d^(2/3)T^(2/3)log(d)), surpassing limitations of existing approaches that assume expli…
Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning
·2209 words·11 mins
Machine Learning Reinforcement Learning 🏒 TTI-Chicago
This paper introduces Subwords as Skills (SaS), a fast and efficient skill extraction method for sparse-reward reinforcement learning that uses tokenization. SaS enables 1000x faster skill extraction…
Sub-optimal Experts mitigate Ambiguity in Inverse Reinforcement Learning
·2049 words·10 mins
AI Generated Machine Learning Reinforcement Learning 🏒 Politecnico di Milano
Sub-optimal expert data improves Inverse Reinforcement Learning by significantly reducing ambiguity in reward function estimation.