Reinforcement Learning

Diffusion for World Modeling: Visual Details Matter in Atari

26 September 2024·2473 words·12 mins

Reinforcement Learning 🏢 University of Geneva

DIAMOND, a novel reinforcement learning agent using a diffusion world model, achieves state-of-the-art performance on the Atari 100k benchmark by leveraging visual details often ignored by discrete la…

Diffusion Actor-Critic with Entropy Regulator

26 September 2024·2005 words·10 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Tsinghua University

DACER, a novel online RL algorithm, uses diffusion models to learn complex policies and adaptively balances exploration-exploitation via entropy estimation, achieving state-of-the-art performance on M…

DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning

26 September 2024·2324 words·11 mins

Reinforcement Learning 🏢 University of California, San Diego

DiffTORI leverages differentiable trajectory optimization for superior deep reinforcement and imitation learning, outperforming prior state-of-the-art methods on high-dimensional robotic tasks.

Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning

26 September 2024·2460 words·12 mins

AI Generated Machine Learning Reinforcement Learning 🏢 University of Southern Denmark

MOMBO: a novel offline reinforcement learning algorithm that uses deterministic uncertainty propagation for faster convergence and tighter suboptimality bounds.

Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time

26 September 2024·1494 words·8 mins

Machine Learning Reinforcement Learning 🏢 University of Wisconsin-Madison

This paper presents an efficient algorithm to compute near-optimal deterministic policies for constrained reinforcement learning problems, solving a 25-year-old computational complexity challenge.

Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers

26 September 2024·3268 words·16 mins

Machine Learning Reinforcement Learning 🏢 University of Alberta

Deep RL excels in simulated robotics, but struggles with real-world limitations like limited computational resources. This paper introduces Action Value Gradient (AVG), a novel incremental deep polic…

Decomposed Prompt Decision Transformer for Efficient Unseen Task Generalization

26 September 2024·2344 words·12 mins

Machine Learning Reinforcement Learning 🏢 Wuhan University

Decomposed Prompt Decision Transformer (DPDT) efficiently learns prompts for unseen tasks using a two-stage paradigm, achieving superior performance in multi-task offline reinforcement learning.

Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling

26 September 2024·1853 words·9 mins

Machine Learning Reinforcement Learning 🏢 School of Artificial Intelligence, Jilin University

Decision Mamba-Hybrid (DM-H) accelerates in-context RL for long-term tasks by cleverly combining the strengths of Mamba’s linear long-term memory processing and transformer’s high-quality predictions,…

Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL

26 September 2024·2365 words·12 mins

AI Generated Machine Learning Reinforcement Learning 🏢 School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)

Decision Mamba, a novel offline RL model, leverages a multi-grained state space model and self-evolution regularization to overcome challenges with out-of-distribution data and noisy labels, achieving…

Decentralized Noncooperative Games with Coupled Decision-Dependent Distributions

26 September 2024·1853 words·9 mins

Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology

Decentralized noncooperative games with coupled decision-dependent distributions are analyzed, providing novel equilibrium concepts, uniqueness conditions, and a decentralized algorithm with sublinear…

Controlled maximal variability along with reliable performance in recurrent neural networks

26 September 2024·2025 words·10 mins

Machine Learning Reinforcement Learning 🏢 Universitat Pompeu Fabra

NeuroMOP, a novel neural principle, maximizes neural variability while ensuring reliable performance in recurrent neural networks, offering new insights into brain function and artificial intelligence…

Contextual Multinomial Logit Bandits with General Value Functions

26 September 2024·300 words·2 mins

Machine Learning Reinforcement Learning 🏢 University of Iowa

Contextual MNL bandits are extended to general value functions, with algorithms for stochastic and adversarial settings that improve on previous results in accuracy and efficiency.

Contextual Bilevel Reinforcement Learning for Incentive Alignment

26 September 2024·3140 words·15 mins

Machine Learning Reinforcement Learning 🏢 ETH Zurich

Contextual Bilevel Reinforcement Learning (CB-RL) tackles real-world strategic decision-making where optimal policies depend on environmental configurations and exogenous events, proposing a stochasti…

Constrained Latent Action Policies for Model-Based Offline Reinforcement Learning

26 September 2024·2359 words·12 mins

Machine Learning Reinforcement Learning 🏢 Machine Learning Research Lab, Volkswagen Group

Constrained Latent Action Policies (C-LAP) revolutionizes offline reinforcement learning by jointly modeling state-action distributions, implicitly constraining policies to improve efficiency and redu…

Confident Natural Policy Gradient for Local Planning in q_π-realizable Constrained MDPs

26 September 2024·227 words·2 mins

Machine Learning Reinforcement Learning 🏢 University of Alberta

Confident-NPG-CMDP: First primal-dual algorithm achieving polynomial sample complexity for solving constrained Markov decision processes (CMDPs) using function approximation and local access model.

Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning

26 September 2024·3934 words·19 mins

AI Generated Machine Learning Reinforcement Learning 🏢 UC Berkeley

Goal-conditioned RL gets a temporal upgrade with compositional DFAs (cDFAs), enabling zero-shot generalization and faster policy specialization via novel graph neural network embeddings and reach-avoi…

CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework

26 September 2024·2041 words·10 mins

Machine Learning Reinforcement Learning 🏢 Worcester Polytechnic Institute

CE-NAS: A novel framework minimizes the carbon footprint of Neural Architecture Search by dynamically allocating GPU resources based on predicted carbon intensity, achieving state-of-the-art results w…

Causal Imitation for Markov Decision Processes: a Partial Identification Approach

26 September 2024·1601 words·8 mins

Machine Learning Reinforcement Learning 🏢 Columbia University

This paper presents novel causal imitation learning algorithms using partial identification to achieve expert performance even when unobserved confounders affect Markov Decision Processes.

Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification

26 September 2024·2257 words·11 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Independent / FAR Labs

RLHF’s KL regularization fails to prevent ‘catastrophic Goodhart’—policies achieving high proxy reward but low actual utility—when reward errors have heavy tails.

Carrot and Stick: Eliciting Comparison Data and Beyond

26 September 2024·1825 words·9 mins

Machine Learning Reinforcement Learning 🏢 Harvard University

Truthful comparison data is hard to obtain without ground truth. This paper presents novel peer prediction mechanisms using bonus-penalty payments that incentivize truthful comparisons, even in networ…