Reinforcement Learning
Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning
·2208 words·11 mins·
Reinforcement Learning
🏢 Chinese University of Hong Kong
UNICORN, a unified framework, reveals that existing offline meta-reinforcement learning algorithms optimize variations of mutual information, leading to improved generalization.
Time-Constrained Robust MDPs
·10005 words·47 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 IRT Saint-Exupéry
Time-Constrained Robust MDPs (TC-RMDPs) improve reinforcement learning by addressing limitations of traditional methods, offering a novel framework for handling real-world uncertainties and yielding m…
Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox
·1315 words·7 mins·
Reinforcement Learning
🏢 Université Paris-Saclay
A novel Thompson Sampling variant achieves polynomial regret for combinatorial bandits, solving a key limitation of existing methods and offering significantly improved performance.
The Value of Reward Lookahead in Reinforcement Learning
·1360 words·7 mins·
Reinforcement Learning
🏢 CREST, ENSAE, IP Paris
Reinforcement learning agents can achieve significantly higher rewards by using advance knowledge of future rewards; this paper mathematically analyzes this advantage by computing the worst-case perfo…
The Surprising Ineffectiveness of Pre-Trained Visual Representations for Model-Based Reinforcement Learning
·2250 words·11 mins·
Machine Learning
Reinforcement Learning
🏢 Bosch Center for Artificial Intelligence
Contrary to expectations, pre-trained visual representations don’t improve model-based reinforcement learning’s sample efficiency or generalization; data diversity and network architectu…
The surprising efficiency of temporal difference learning for rare event prediction
·1614 words·8 mins·
Machine Learning
Reinforcement Learning
🏢 Courant Institute of Mathematical Sciences, New York University
TD learning surprisingly outperforms Monte Carlo methods for rare event prediction in Markov chains, achieving relative accuracy with polynomially, instead of exponentially, many observed transitions.
The Sample-Communication Complexity Trade-off in Federated Q-Learning
·1654 words·8 mins·
Reinforcement Learning
🏢 Carnegie Mellon University
Federated Q-learning achieves optimal sample and communication complexities simultaneously via Fed-DVR-Q, a novel algorithm.
The Power of Resets in Online Reinforcement Learning
·233 words·2 mins·
Reinforcement Learning
🏢 Google Research
Leveraging local simulator resets in online reinforcement learning dramatically improves sample efficiency, especially for high-dimensional problems with general function approximation.
The Limits of Transfer Reinforcement Learning with Latent Low-rank Structure
·1762 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 Cornell University
This paper presents computationally efficient transfer reinforcement learning algorithms that remove the dependence on state/action space sizes while achieving minimax optimality.
The Ladder in Chaos: Improving Policy Learning by Harnessing the Parameter Evolving Path in A Low-dimensional Space
·2918 words·14 mins·
Machine Learning
Reinforcement Learning
🏢 College of Intelligence and Computing, Tianjin University
Deep RL policy learning is improved by identifying and boosting key parameter update directions using a novel temporal SVD analysis, leading to more efficient and effective learning.
The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning
·2452 words·12 mins·
Machine Learning
Reinforcement Learning
🏢 University of Oxford
Offline model-based RL methods fail as dynamics models improve; this paper identifies the ‘edge-of-reach’ problem as the cause and introduces RAVL, a simple solution ensuring robust performance.
The Dormant Neuron Phenomenon in Multi-Agent Reinforcement Learning Value Factorization
·2766 words·13 mins·
Machine Learning
Reinforcement Learning
🏢 Xiamen University
ReBorn revitalizes multi-agent reinforcement learning by tackling dormant neurons, boosting network expressivity and learning efficiency.
The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize
·1378 words·7 mins·
Reinforcement Learning
🏢 Cornell University
Unlocking the mysteries of stochastic approximation with constant stepsize, this paper reveals how memory and nonlinearity interact to create bias, providing novel analysis and solutions for more accu…
Test Where Decisions Matter: Importance-driven Testing for Deep Reinforcement Learning
·3658 words·18 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Graz University of Technology
Prioritize crucial decisions in deep RL policy testing with a novel model-based method for rigorous state importance ranking, enabling efficient safety and performance verification.
Temporal-Difference Learning Using Distributed Error Signals
·2668 words·13 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 University of Toronto
The Artificial Dopamine (AD) algorithm achieves performance comparable to backpropagation methods in complex RL tasks by using only synchronously distributed per-layer TD errors, demonstrating the suffici…
Taming Heavy-Tailed Losses in Adversarial Bandits and the Best-of-Both-Worlds Setting
·418 words·2 mins·
Machine Learning
Reinforcement Learning
🏢 Virginia Tech
This paper proposes novel algorithms achieving near-optimal regret in adversarial and logarithmic regret in stochastic multi-armed bandit settings with heavy-tailed losses, relaxing strong assumptions…
Taming 'data-hungry' reinforcement learning? Stability in continuous state-action spaces
·358 words·2 mins·
Machine Learning
Reinforcement Learning
🏢 New York University
Reinforcement learning achieves unprecedentedly fast convergence rates in continuous state-action spaces by leveraging novel stability properties of Markov Decision Processes.
Symmetric Linear Bandits with Hidden Symmetry
·1466 words·7 mins·
Machine Learning
Reinforcement Learning
🏢 University of Warwick
Researchers unveil a novel algorithm for high-dimensional symmetric linear bandits, achieving a regret bound of O(d^(2/3)T^(2/3)log(d)), surpassing limitations of existing approaches that assume expli…
Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning
·2209 words·11 mins·
Machine Learning
Reinforcement Learning
🏢 TTI-Chicago
This paper introduces Subwords as Skills (SaS), a fast and efficient skill extraction method for sparse-reward reinforcement learning that uses tokenization. SaS enables 1000x faster skill extraction…
Sub-optimal Experts mitigate Ambiguity in Inverse Reinforcement Learning
·2049 words·10 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Politecnico Di Milano
Sub-optimal expert data improves Inverse Reinforcement Learning by significantly reducing ambiguity in reward function estimation.