Reinforcement Learning

Amortized Active Causal Induction with Deep Reinforcement Learning
·3383 words·16 mins
AI Generated Machine Learning Reinforcement Learning 🏢 University of Oxford
CAASL: An amortized active intervention design policy trained via reinforcement learning, enabling adaptive, real-time causal graph inference without likelihood access.
Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits
·1649 words·8 mins
Machine Learning Reinforcement Learning 🏢 National University of Singapore
PSɛBAI+ is a near-optimal algorithm for best arm identification in piecewise stationary linear bandits, efficiently detecting changepoints and aligning contexts for improved accuracy and minimal sampl…
Aligning Individual and Collective Objectives in Multi-Agent Cooperation
·1828 words·9 mins
Machine Learning Reinforcement Learning 🏢 University of Manchester
AI agents learn to cooperate effectively even when individual and group goals clash using the new Altruistic Gradient Adjustment (AgA) algorithm.
Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control
·1969 words·10 mins
Machine Learning Reinforcement Learning 🏢 Tsinghua University
Efficient Diffusion Alignment (EDA) leverages pretrained diffusion models and Q-functions for efficient continuous control, exceeding all baselines with minimal annotation.
Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning
·2127 words·10 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Washington State University
WSAC, a novel algorithm, robustly optimizes safe offline RL policies using adversarial training, guaranteeing improved performance over reference policies with limited data.
Adversarial Environment Design via Regret-Guided Diffusion Models
·2707 words·13 mins
Reinforcement Learning 🏢 Seoul National University
Regret-Guided Diffusion Models enhance unsupervised environment design by generating challenging, diverse training environments that improve agent robustness and zero-shot generalization.
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
·3089 words·15 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Georgia Institute of Technology
Adaptive Preference Scaling boosts Reinforcement Learning from Human Feedback by using a novel loss function that adapts to varying preference strengths, resulting in improved policy performance and s…
Adaptive Exploration for Data-Efficient General Value Function Evaluations
·2591 words·13 mins
Machine Learning Reinforcement Learning 🏢 McGill University
GVFExplorer: An adaptive behavior policy efficiently learns multiple GVFs by minimizing return variance, optimizing data usage and reducing prediction errors.
Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning
·3193 words·15 mins
Machine Learning Reinforcement Learning 🏢 KAIST
Q-Aided Conditional Supervised Learning (QCS) effectively combines the stability of return-conditioned supervised learning with the stitching ability of Q-functions, achieving superior offline reinfor…
Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps
·2522 words·12 mins
AI Generated Machine Learning Reinforcement Learning 🏢 University of Oxford
Adam-Rel: A novel optimizer for RL, dramatically improves performance by resetting Adam’s timestep to 0 after target network updates, preventing large, suboptimal changes.
Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning
·1595 words·8 mins
Machine Learning Reinforcement Learning 🏢 McGill University
Distributional RL’s sensitivity to high-frequency decisions is unveiled, with new algorithms solving existing performance issues in continuous-time RL.
Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
·1775 words·9 mins
Machine Learning Reinforcement Learning 🏢 Univ. Grenoble Alpes
First tractable algorithm achieves minimax optimal regret in average-reward MDPs, solving a major computational challenge in reinforcement learning.
Achieving Constant Regret in Linear Markov Decision Processes
·1852 words·9 mins
AI Generated Machine Learning Reinforcement Learning 🏢 MIT
Cert-LSVI-UCB achieves constant regret in RL with linear function approximation, even under model misspecification, using a novel certified estimator.
Achieving $\tilde{O}(1/\epsilon)$ Sample Complexity for Constrained Markov Decision Process
·390 words·2 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology
Constrained Markov Decision Processes (CMDPs) get an improved sample complexity bound of Õ(1/ε) via a new algorithm, surpassing the existing O(1/ε²) bound.
Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation
·1668 words·8 mins
AI Generated Machine Learning Reinforcement Learning 🏢 University of Massachusetts
STAR framework leverages state abstraction for consistent, low-variance off-policy evaluation in reinforcement learning, outperforming existing methods.
A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective
·2287 words·11 mins
Machine Learning Reinforcement Learning 🏢 Zhejiang University
A2PO: A novel offline RL method tackles constraint conflicts in mixed-quality datasets by disentangling behavior policies with a conditional VAE and optimizing advantage-aware constraints, achieving s…
A Unifying Normative Framework of Decision Confidence
·1353 words·7 mins
Machine Learning Reinforcement Learning 🏢 University of Washington
New normative framework for decision confidence models diverse tasks by incorporating rewards, priors, and uncertainty, outperforming existing methods.
A Unified Principle of Pessimism for Offline Reinforcement Learning under Model Mismatch
·1838 words·9 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Department of Electrical and Computer Engineering, University of Central Florida
Unified pessimism principle in offline RL conquers data sparsity & model mismatch, achieving near-optimal performance across various divergence models.
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
·1965 words·10 mins
AI Generated Machine Learning Reinforcement Learning 🏢 KAIST
A unified confidence sequence (CS) construction for generalized linear models (GLMs) achieves state-of-the-art regret bounds for contextual bandits, notably a poly(S)-free regret for logistic bandits.
A Tractable Inference Perspective of Offline RL
·2824 words·14 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Peking University
Trifle: Tractable inference for Offline RL achieves state-of-the-art results by using tractable generative models to overcome the inference-time suboptimality of existing sequence modeling approaches.