Reinforcement Learning

Streaming Bayes GFlowNets

26 September 2024·2088 words·10 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Getulio Vargas Foundation

SB-GFlowNets: Streaming Bayesian inference is now efficient and accurate using GFlowNets, enabling real-time model updates for large, sequential datasets.

Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

26 September 2024·1460 words·7 mins

Machine Learning Reinforcement Learning 🏢 CREST, ENSAE

Incentive-aware algorithm achieves low regret in strategic multi-armed bandits under debt-free reporting, establishing a truthful equilibrium among arms.

Stochastic contextual bandits with graph feedback: from independence number to MAS number

26 September 2024·289 words·2 mins

Machine Learning Reinforcement Learning 🏢 New York University

Contextual bandits with graph feedback achieve near-optimal regret by leveraging a novel graph-theoretic quantity that interpolates between independence and maximum acyclic subgraph numbers, depending…

Statistical Efficiency of Distributional Temporal Difference Learning

26 September 2024·295 words·2 mins

Reinforcement Learning 🏢 Peking University

Researchers achieve minimax optimal sample complexity bounds for distributional temporal difference learning, enhancing reinforcement learning algorithm efficiency.

State-free Reinforcement Learning

26 September 2024·357 words·2 mins

Machine Learning Reinforcement Learning 🏢 Boston University

State-free Reinforcement Learning (SFRL) framework eliminates the need for state-space information in RL algorithms, achieving regret bounds independent of the state space size and adaptive to the rea…

State Chrono Representation for Enhancing Generalization in Reinforcement Learning

26 September 2024·2535 words·12 mins

Machine Learning Reinforcement Learning 🏢 University of California, Santa Barbara

State Chrono Representation (SCR) enhances reinforcement learning generalization by incorporating extensive temporal information and cumulative rewards into state representations, improving performanc…

SPRINQL: Sub-optimal Demonstrations driven Offline Imitation Learning

26 September 2024·3010 words·15 mins

Machine Learning Reinforcement Learning 🏢 Singapore Management University

SPRINQL: Sub-optimal Demonstrations for Offline Imitation Learning

SPO: Sequential Monte Carlo Policy Optimisation

26 September 2024·3026 words·15 mins

Machine Learning Reinforcement Learning 🏢 University of Amsterdam

SPO: A novel model-based RL algorithm leverages parallelisable Sequential Monte Carlo planning for efficient and robust policy improvement in both discrete and continuous environments.

Speculative Monte-Carlo Tree Search

26 September 2024·1943 words·10 mins

Machine Learning Reinforcement Learning 🏢 Pennsylvania State University

Speculative MCTS accelerates AlphaZero training by implementing speculative execution, enabling parallel processing of future moves and reducing latency by up to 5.8x.

Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees

26 September 2024·2502 words·12 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Seoul National University

SRCPO: a novel spectral risk measure-constrained RL algorithm guaranteeing convergence to a global optimum, outperforming existing methods in continuous control tasks.

Sparsity-Agnostic Linear Bandits with Adaptive Adversaries

26 September 2024·336 words·2 mins

Machine Learning Reinforcement Learning 🏢 National University of Singapore

SparseLinUCB: First sparse regret bounds for adversarial action sets with unknown sparsity, achieving superior performance over existing methods!

Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs

26 September 2024·1956 words·10 mins

Reinforcement Learning 🏢 University of Wisconsin-Madison

This paper achieves minimax-optimal bounds for learning near-optimal policies in average-reward MDPs, addressing a long-standing open problem in reinforcement learning.

Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding

26 September 2024·391 words·2 mins

Machine Learning Reinforcement Learning 🏢 Zhejiang University

SDEPO, a new natural policy gradient algorithm, efficiently solves zero-sum Markov games with continuous state spaces, achieving near-optimal convergence independent of state space cardinality.

Solving Minimum-Cost Reach Avoid using Reinforcement Learning

26 September 2024·2253 words·11 mins

AI Generated Machine Learning Reinforcement Learning 🏢 MIT

RC-PPO: Reinforcement learning solves minimum-cost reach-avoid problems with up to 57% lower costs!

Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

26 September 2024·447 words·3 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Google DeepMind

Stochastic gradient bandit algorithms now guaranteed to globally converge, using ANY constant learning rate!

SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents

26 September 2024·2849 words·14 mins

Machine Learning Reinforcement Learning 🏢 Khoury College of Computer Sciences, Northeastern University

SleeperNets: A universal backdoor attack against RL agents, achieving a 100% attack success rate across diverse environments while preserving benign performance.

Skill-aware Mutual Information Optimisation for Zero-shot Generalisation in Reinforcement Learning

26 September 2024·5509 words·26 mins

AI Generated Machine Learning Reinforcement Learning 🏢 University of Edinburgh

Skill-aware Mutual Information optimization enhances RL agent generalization across diverse tasks by distinguishing context embeddings based on skills, leading to improved zero-shot performance and ro…

SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions

26 September 2024·2028 words·10 mins

Machine Learning Reinforcement Learning 🏢 University of Texas at Austin

SkiLD, a novel unsupervised skill discovery method, uses state factorization and a new objective function to learn skills inducing diverse interactions between state factors, outperforming existing me…

Simplifying Latent Dynamics with Softly State-Invariant World Models

26 September 2024·2423 words·12 mins

Machine Learning Reinforcement Learning 🏢 Max Planck Institute for Biological Cybernetics

This paper introduces the Parsimonious Latent Space Model (PLSM), a novel world model that regularizes latent dynamics to improve action predictability, enhancing RL performance.

Simplifying Constraint Inference with Inverse Reinforcement Learning

26 September 2024·1653 words·8 mins

Machine Learning Reinforcement Learning 🏢 University of Toronto

This paper simplifies constraint inference in reinforcement learning, demonstrating that standard inverse RL methods can effectively infer constraints from expert data, surpassing complex, previously …