Reinforcement Learning
Amortized Active Causal Induction with Deep Reinforcement Learning
·3383 words·16 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 University of Oxford
CAASL: An amortized active intervention design policy trained via reinforcement learning, enabling adaptive, real-time causal graph inference without likelihood access.
Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits
·1649 words·8 mins·
Machine Learning
Reinforcement Learning
🏢 National University of Singapore
PSεBAI+ is a near-optimal algorithm for best arm identification in piecewise stationary linear bandits, efficiently detecting changepoints and aligning contexts for improved accuracy and minimal sampl…
Aligning Individual and Collective Objectives in Multi-Agent Cooperation
·1828 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 University of Manchester
AI agents learn to cooperate effectively even when individual and group goals clash using the new Altruistic Gradient Adjustment (AgA) algorithm.
Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control
·1969 words·10 mins·
Machine Learning
Reinforcement Learning
🏢 Tsinghua University
Efficient Diffusion Alignment (EDA) leverages pretrained diffusion models and Q-functions for efficient continuous control, exceeding all baselines with minimal annotation.
Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning
·2127 words·10 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Washington State University
WSAC, a novel algorithm, robustly optimizes safe offline RL policies using adversarial training, guaranteeing improved performance over reference policies with limited data.
Adversarial Environment Design via Regret-Guided Diffusion Models
·2707 words·13 mins·
Reinforcement Learning
🏢 Seoul National University
Regret-Guided Diffusion Models enhance unsupervised environment design by generating challenging, diverse training environments that improve agent robustness and zero-shot generalization.
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
·3089 words·15 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Georgia Institute of Technology
Adaptive Preference Scaling boosts Reinforcement Learning from Human Feedback by using a novel loss function that adapts to varying preference strengths, resulting in improved policy performance and s…
Adaptive Exploration for Data-Efficient General Value Function Evaluations
·2591 words·13 mins·
Machine Learning
Reinforcement Learning
🏢 McGill University
GVFExplorer: An adaptive behavior policy efficiently learns multiple GVFs by minimizing return variance, optimizing data usage and reducing prediction errors.
Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning
·3193 words·15 mins·
Machine Learning
Reinforcement Learning
🏢 KAIST
Q-Aided Conditional Supervised Learning (QCS) effectively combines the stability of return-conditioned supervised learning with the stitching ability of Q-functions, achieving superior offline reinfor…
Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps
·2522 words·12 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 University of Oxford
Adam-Rel: A novel optimizer for RL that dramatically improves performance by resetting Adam’s timestep to 0 after target network updates, preventing large, suboptimal changes.
Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning
·1595 words·8 mins·
Machine Learning
Reinforcement Learning
🏢 McGill University
Distributional RL’s sensitivity to high-frequency decisions is unveiled, with new algorithms solving existing performance issues in continuous-time RL.
Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
·1775 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 Univ. Grenoble Alpes
First tractable algorithm achieves minimax optimal regret in average-reward MDPs, solving a major computational challenge in reinforcement learning.
Achieving Constant Regret in Linear Markov Decision Processes
·1852 words·9 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 MIT
Cert-LSVI-UCB achieves constant regret in RL with linear function approximation, even under model misspecification, using a novel certified estimator.
Achieving $\tilde{O}(1/\epsilon)$ Sample Complexity for Constrained Markov Decision Process
·390 words·2 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Hong Kong University of Science and Technology
Constrained Markov Decision Processes (CMDPs) get an improved sample complexity bound of Õ(1/ε) via a new algorithm, surpassing the existing O(1/ε²) bound.
Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation
·1668 words·8 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 University of Massachusetts
STAR framework leverages state abstraction for consistent, low-variance off-policy evaluation in reinforcement learning, outperforming existing methods.
A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective
·2287 words·11 mins·
Machine Learning
Reinforcement Learning
🏢 Zhejiang University
A2PO: A novel offline RL method tackles constraint conflicts in mixed-quality datasets by disentangling behavior policies with a conditional VAE and optimizing advantage-aware constraints, achieving s…
A Unifying Normative Framework of Decision Confidence
·1353 words·7 mins·
Machine Learning
Reinforcement Learning
🏢 University of Washington
New normative framework for decision confidence models diverse tasks by incorporating rewards, priors, and uncertainty, outperforming existing methods.
A Unified Principle of Pessimism for Offline Reinforcement Learning under Model Mismatch
·1838 words·9 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Department of Electrical and Computer Engineering, University of Central Florida
A unified pessimism principle for offline RL addresses data sparsity and model mismatch, achieving near-optimal performance across various divergence models.
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
·1965 words·10 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 KAIST
A unified confidence sequence (CS) construction for generalized linear models (GLMs) achieves state-of-the-art regret bounds for contextual bandits, notably a poly(S)-free regret for logistic bandits.
A Tractable Inference Perspective of Offline RL
·2824 words·14 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Peking University
Trifle: Tractable inference for Offline RL achieves state-of-the-art results by using tractable generative models to overcome the inference-time suboptimality of existing sequence modeling approaches.