Reinforcement Learning

Hierarchical Programmatic Option Framework

26 September 2024·5774 words·28 mins

AI Generated Machine Learning Reinforcement Learning 🏢 National Taiwan University

Hierarchical Programmatic Option framework (HIPO) uses human-readable programs as options in reinforcement learning to solve long, repetitive tasks with improved interpretability and generalization.

GUIDE: Real-Time Human-Shaped Agents

26 September 2024·2015 words·10 mins

Machine Learning Reinforcement Learning 🏢 Duke University

GUIDE: Real-time human-shaped AI agents achieve up to 30% higher success rates using continuous human feedback, boosted by a parallel training model that mimics human input for continued improvement.

GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning

26 September 2024·3982 words·19 mins

Machine Learning Reinforcement Learning 🏢 KAIST

Generative Trajectory Augmentation (GTA) significantly boosts offline reinforcement learning by generating high-reward trajectories using a conditional diffusion model, enhancing algorithm performance…

Grounded Answers for Multi-agent Decision-making Problem through Generative World Model

26 September 2024·2428 words·12 mins

Machine Learning Reinforcement Learning 🏢 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence

Generative world models enhance multi-agent decision-making by simulating trial-and-error learning, improving answer accuracy and explainability.

Graph Diffusion Policy Optimization

26 September 2024·2821 words·14 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Zhejiang University

GDPO: A novel method optimizes graph diffusion models for any objective using reinforcement learning, achieving state-of-the-art performance in diverse graph generation tasks.

Going Beyond Heuristics by Imposing Policy Improvement as a Constraint

26 September 2024·3576 words·17 mins

Machine Learning Reinforcement Learning 🏢 National Taiwan University

HEPO, a novel constrained optimization method, consistently surpasses heuristic-trained policies in reinforcement learning by ensuring policy improvement over heuristics, regardless of heuristic quali…

Goal-Conditioned On-Policy Reinforcement Learning

26 September 2024·2363 words·12 mins

Machine Learning Reinforcement Learning 🏢 National University of Defense Technology

GCPO: a novel on-policy goal-conditioned reinforcement learning framework tackles limitations of existing HER-based methods by effectively addressing multi-goal Markovian and non-Markovian reward prob…

Goal Reduction with Loop-Removal Accelerates RL and Models Human Brain Activity in Goal-Directed Learning

26 September 2024·1872 words·9 mins

Reinforcement Learning 🏢 Indiana University Bloomington

Goal Reduction with Loop-Removal accelerates Reinforcement Learning (RL) and accurately models human brain activity during goal-directed learning by efficiently deriving subgoals from distant original…

Global Rewards in Restless Multi-Armed Bandits

26 September 2024·2031 words·10 mins

Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University

Restless multi-armed bandits with global rewards (RMAB-G) are introduced, extending the model to handle non-separable rewards and offering novel index-based and adaptive policies that outperform exist…

Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search

26 September 2024·4163 words·20 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Aalto University

LLMs guided by Monte Carlo Tree Search generate precise, efficient Python code as world models for model-based reinforcement learning, significantly improving sample efficiency and inference speed.

Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization

26 September 2024·3205 words·16 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Institute of Automation, Chinese Academy of Sciences

CP3ER, a novel consistency policy with prioritized proximal experience regularization, significantly boosts sample efficiency and stability in visual reinforcement learning, achieving state-of-the-art…

Generalized Linear Bandits with Limited Adaptivity

26 September 2024·341 words·2 mins

Reinforcement Learning 🏢 Stanford University

This paper introduces two novel algorithms that achieve optimal regret in generalized linear contextual bandits despite limited policy updates, a significant advancement for real-world applications.

Gaussian Process Bandits for Top-k Recommendations

26 September 2024·1799 words·9 mins

Machine Learning Reinforcement Learning 🏢 University of Massachusetts Amherst

GP-TopK: A novel contextual bandit algorithm uses Gaussian processes with a Kendall kernel for efficient & accurate top-k recommendations, even with limited feedback.

Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning

26 September 2024·1382 words·7 mins

Machine Learning Reinforcement Learning 🏢 HSE University

This paper delivers non-asymptotic accuracy bounds for confidence intervals in linear stochastic approximation, leveraging a novel multiplier bootstrap method.

Functional Bilevel Optimization for Machine Learning

26 September 2024·1884 words·9 mins

Reinforcement Learning 🏢 University of Grenoble Alpes

Functional Bilevel Optimization tackles the ambiguity of using neural networks in bilevel optimization by minimizing the inner objective over a function space, leading to scalable & efficient algorith…

From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning

26 September 2024·3972 words·19 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Beihang University

TTCT translates natural language constraints into effective training signals for safe reinforcement learning, enabling agents to learn safer policies with lower violation rates and zero-shot transfer …

Foundations of Multivariate Distributional Reinforcement Learning

26 September 2024·1558 words·8 mins

Machine Learning Reinforcement Learning 🏢 Google DeepMind

First oracle-free, computationally tractable algorithms for provably convergent multivariate distributional RL are introduced, achieving convergence rates matching scalar settings and offering insight…

Focus On What Matters: Separated Models For Visual-Based RL Generalization

26 September 2024·2557 words·13 mins

Machine Learning Reinforcement Learning 🏢 Department of Computer Science, Tongji University

SMG (Separated Models for Generalization) enhances visual RL generalization by disentangling task-relevant and irrelevant visual features via cooperative reconstruction, achieving state-of-the-art per…

FlexPlanner: Flexible 3D Floorplanning via Deep Reinforcement Learning in Hybrid Action Space with Multi-Modality Representation

26 September 2024·3516 words·17 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Dept. of CSE & School of AI & MoE Key Lab of AI, Shanghai Jiao Tong University

FlexPlanner: Deep reinforcement learning solves flexible 3D floorplanning, improving wirelength and alignment significantly.

Fixed Confidence Best Arm Identification in the Bayesian Setting

26 September 2024·1424 words·7 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Università degli Studi di Milano

Bayesian best-arm identification algorithm achieves near-optimal sample complexity by incorporating an early-stopping criterion.