Reinforcement Learning
Can Learned Optimization Make Reinforcement Learning Less Difficult?
·3614 words·17 mins·
Reinforcement Learning
🏢 University of Oxford
Learned optimizer OPEN tackles RL’s non-stationarity, plasticity loss, and exploration challenges through meta-learning, significantly outperforming both traditional and other learned optimizers.
C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory
·1787 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 Tsinghua University
C-GAIL stabilizes Generative Adversarial Imitation Learning by applying control theory, resulting in faster convergence, reduced oscillation, and better expert policy matching.
Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models
·2078 words·10 mins·
Machine Learning
Reinforcement Learning
🏢 Genentech
BRAID: A novel, conservative fine-tuning method surpasses offline design optimization by cleverly combining generative diffusion models with reward models, preventing over-optimization and generating …
Boosting Sample Efficiency and Generalization in Multi-agent Reinforcement Learning via Equivariance
·3386 words·16 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 University of Maryland
Equivariant Graph Neural Networks boost multi-agent reinforcement learning by improving sample efficiency and generalization, overcoming inherent exploration biases.
Bigger, Regularized, Optimistic: scaling for compute and sample efficient continuous control
·3405 words·16 mins·
Reinforcement Learning
🏢 Warsaw University of Technology
BRO (Bigger, Regularized, Optimistic) achieves state-of-the-art sample efficiency in continuous control by scaling critic networks and using strong regularization with optimistic exploration.
Beyond task diversity: provable representation transfer for sequential multitask linear bandits
·1405 words·7 mins·
Machine Learning
Reinforcement Learning
🏢 University of Arizona
Lifelong learning in linear bandits gets a boost! A new algorithm, BOSS, achieves low regret without the usual ‘task diversity’ assumption, opening doors for more realistic sequential multi-task lear…
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
·2825 words·14 mins·
Machine Learning
Reinforcement Learning
🏢 Carnegie Mellon University
BECAUSE: a novel algorithm for generalizable offline model-based reinforcement learning that leverages bilinear causal representation to mitigate objective mismatch caused by confounders in offline da…
Beating Adversarial Low-Rank MDPs with Unknown Transition and Bandit Feedback
·355 words·2 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Google Research
New algorithms conquer adversarial low-rank MDPs, improving regret bounds for unknown transitions and bandit feedback.
Bandits with Ranking Feedback
·1499 words·8 mins·
Machine Learning
Reinforcement Learning
🏢 Politecnico Di Milano
This paper introduces ‘bandits with ranking feedback,’ a novel bandit variation providing ranked feedback instead of numerical rewards. It proves instance-dependent cases require superlogarithmic reg…
Bandits with Abstention under Expert Advice
·2058 words·10 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Alan Turing Institute
The Confidence-Rated Bandits with Abstentions (CBA) algorithm significantly improves reward bounds for prediction with expert advice by strategically leveraging an abstention action.
Balancing Context Length and Mixing Times for Reinforcement Learning at Scale
·1724 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 IBM Research
Longer context in RL boosts generalization but slows down learning; this paper reveals the crucial tradeoff and offers theoretical insights.
Avoiding Undesired Future with Minimal Cost in Non-Stationary Environments
·2100 words·10 mins·
Machine Learning
Reinforcement Learning
🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China
AUF-MICNS: A novel sequential method efficiently solves the problem of avoiding undesired futures by dynamically updating influence relations in non-stationary environments while minimizing action costs.
Autoregressive Policy Optimization for Constrained Allocation Tasks
·2331 words·11 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Munich Center for Machine Learning
PASPO: a novel autoregressive policy optimization method for constrained allocation tasks that guarantees constraint satisfaction and outperforms existing methods.
Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability
·348 words·2 mins·
Reinforcement Learning
🏢 Massachusetts Institute of Technology
This paper presents a unified framework for deriving information-theoretic lower bounds for bandit learnability, combining classical methods with interactive learning techniques and introducing a…
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning
·1937 words·10 mins·
Machine Learning
Reinforcement Learning
🏢 University of Oxford
Reinforcement learning agents achieve emergent cultural accumulation by balancing social and independent learning, outperforming single-lifetime agents.
An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning
·2513 words·12 mins·
Machine Learning
Reinforcement Learning
🏢 Sun Yat-Sen University
This work introduces PDOA, an offline adaptation framework for constrained multi-objective RL, using demonstrations instead of manually designed preferences to infer optimal policies while satisfying …
An Analytical Study of Utility Functions in Multi-Objective Reinforcement Learning
·210 words·1 min·
Machine Learning
Reinforcement Learning
🏢 Artificial Intelligence Research Institute (IIIA-CSIC)
This paper provides novel theoretical analyses of utility functions in MORL, characterizing preferences and the utility functions that guarantee optimal policies.
An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints
·1703 words·8 mins·
Machine Learning
Reinforcement Learning
🏢 Seoul National University
An adaptive algorithm achieves tight regret bounds for infinitely many-armed bandits under generalized rotting constraints, addressing the challenge of rewards that decrease over time.
Amortizing intractable inference in diffusion models for vision, language, and control
·5979 words·29 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Mila, Université De Montréal
Amortized sampling from complex posteriors using diffusion models is achieved via a novel data-free learning objective, Relative Trajectory Balance (RTB). RTB’s asymptotic correctness is proven, offe…
Amortized Planning with Large-Scale Transformers: A Case Study on Chess
·3346 words·16 mins·
Machine Learning
Reinforcement Learning
🏢 Google DeepMind
Large-scale transformers achieve grandmaster-level chess play via supervised learning on a new 10M game benchmark dataset, demonstrating impressive generalization beyond memorization.