Reinforcement Learning

Diffusion for World Modeling: Visual Details Matter in Atari
·2473 words·12 mins
Reinforcement Learning 🏒 University of Geneva
DIAMOND, a novel reinforcement learning agent using a diffusion world model, achieves state-of-the-art performance on the Atari 100k benchmark by leveraging visual details often ignored by discrete la…
Diffusion Actor-Critic with Entropy Regulator
·2005 words·10 mins
AI Generated Machine Learning Reinforcement Learning 🏒 Tsinghua University
DACER, a novel online RL algorithm, uses diffusion models to learn complex policies and adaptively balances exploration-exploitation via entropy estimation, achieving state-of-the-art performance on M…
DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning
·2324 words·11 mins
Reinforcement Learning 🏒 University of California, San Diego
DiffTORI leverages differentiable trajectory optimization for superior deep reinforcement and imitation learning, outperforming prior state-of-the-art methods on high-dimensional robotic tasks.
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning
·2460 words·12 mins
AI Generated Machine Learning Reinforcement Learning 🏒 University of Southern Denmark
MOMBO: a novel offline reinforcement learning algorithm that uses deterministic uncertainty propagation for faster convergence and tighter suboptimality bounds.
Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time
·1494 words·8 mins
Machine Learning Reinforcement Learning 🏒 University of Wisconsin-Madison
This paper presents an efficient algorithm to compute near-optimal deterministic policies for constrained reinforcement learning problems, solving a 25-year-old computational complexity challenge.
Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers
·3268 words·16 mins
Machine Learning Reinforcement Learning 🏒 University of Alberta
Deep RL excels in simulated robotics, but struggles with real-world limitations like limited computational resources. This paper introduces Action Value Gradient (AVG), a novel incremental deep polic…
Decomposed Prompt Decision Transformer for Efficient Unseen Task Generalization
·2344 words·12 mins
Machine Learning Reinforcement Learning 🏒 Wuhan University
Decomposed Prompt Decision Transformer (DPDT) efficiently learns prompts for unseen tasks using a two-stage paradigm, achieving superior performance in multi-task offline reinforcement learning.
Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling
·1853 words·9 mins
Machine Learning Reinforcement Learning 🏒 School of Artificial Intelligence, Jilin University
Decision Mamba-Hybrid (DM-H) accelerates in-context RL for long-term tasks by cleverly combining the strengths of Mamba’s linear long-term memory processing and transformer’s high-quality predictions,…
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL
·2365 words·12 mins
AI Generated Machine Learning Reinforcement Learning 🏒 School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
Decision Mamba, a novel offline RL model, leverages a multi-grained state space model and self-evolution regularization to overcome challenges with out-of-distribution data and noisy labels, achieving…
Decentralized Noncooperative Games with Coupled Decision-Dependent Distributions
·1853 words·9 mins
Machine Learning Reinforcement Learning 🏒 Hong Kong University of Science and Technology
Decentralized noncooperative games with coupled decision-dependent distributions are analyzed, providing novel equilibrium concepts, uniqueness conditions, and a decentralized algorithm with sublinear…
Controlled maximal variability along with reliable performance in recurrent neural networks
·2025 words·10 mins
Machine Learning Reinforcement Learning 🏒 Universitat Pompeu Fabra
NeuroMOP, a novel neural principle, maximizes neural variability while ensuring reliable performance in recurrent neural networks, offering new insights into brain function and artificial intelligence…
Contextual Multinomial Logit Bandits with General Value Functions
·300 words·2 mins
Machine Learning Reinforcement Learning 🏒 University of Iowa
Contextual MNL bandits are revolutionized with general value functions, offering enhanced algorithms for stochastic and adversarial settings, surpassing previous results in accuracy and efficiency.
Contextual Bilevel Reinforcement Learning for Incentive Alignment
·3140 words·15 mins
Machine Learning Reinforcement Learning 🏒 ETH Zurich
Contextual Bilevel Reinforcement Learning (CB-RL) tackles real-world strategic decision-making where optimal policies depend on environmental configurations and exogenous events, proposing a stochasti…
Constrained Latent Action Policies for Model-Based Offline Reinforcement Learning
·2359 words·12 mins
Machine Learning Reinforcement Learning 🏒 Machine Learning Research Lab, Volkswagen Group
Constrained Latent Action Policies (C-LAP) revolutionizes offline reinforcement learning by jointly modeling state-action distributions, implicitly constraining policies to improve efficiency and redu…
Confident Natural Policy Gradient for Local Planning in q_π-realizable Constrained MDPs
·227 words·2 mins
Machine Learning Reinforcement Learning 🏒 University of Alberta
Confident-NPG-CMDP: the first primal-dual algorithm to achieve polynomial sample complexity for solving constrained Markov decision processes (CMDPs) with function approximation and a local access model.
Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning
·3934 words·19 mins
AI Generated Machine Learning Reinforcement Learning 🏒 UC Berkeley
Goal-conditioned RL gets a temporal upgrade with compositional DFAs (cDFAs), enabling zero-shot generalization and faster policy specialization via novel graph neural network embeddings and reach-avoi…
CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework
·2041 words·10 mins
Machine Learning Reinforcement Learning 🏒 Worcester Polytechnic Institute
CE-NAS: A novel framework minimizes the carbon footprint of Neural Architecture Search by dynamically allocating GPU resources based on predicted carbon intensity, achieving state-of-the-art results w…
Causal Imitation for Markov Decision Processes: a Partial Identification Approach
·1601 words·8 mins
Machine Learning Reinforcement Learning 🏒 Columbia University
This paper presents novel causal imitation learning algorithms using partial identification to achieve expert performance even when unobserved confounders affect Markov Decision Processes.
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification
·2257 words·11 mins
AI Generated Machine Learning Reinforcement Learning 🏒 Independent / FAR Labs
RLHF’s KL regularization fails to prevent ‘catastrophic Goodhart’ (policies achieving high proxy reward but low actual utility) when reward errors have heavy tails.
Carrot and Stick: Eliciting Comparison Data and Beyond
·1825 words·9 mins
Machine Learning Reinforcement Learning 🏒 Harvard University
Truthful comparison data is hard to obtain without ground truth. This paper presents novel peer prediction mechanisms using bonus-penalty payments that incentivize truthful comparisons, even in networ…