Spotlight Reinforcement Learning

Variational Delayed Policy Optimization

26 September 2024·1922 words·10 mins

Reinforcement Learning 🏢 University of Southampton

VDPO: A novel framework for delayed reinforcement learning achieving 50% sample efficiency improvement without compromising performance.

Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

26 September 2024·2208 words·11 mins

Reinforcement Learning 🏢 Chinese University of Hong Kong

UNICORN: a unified framework reveals that existing offline meta-reinforcement learning algorithms optimize variations of mutual information, leading to improved generalization.

Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox

26 September 2024·1315 words·7 mins

Reinforcement Learning 🏢 Université Paris-Saclay

A novel Thompson Sampling variant achieves polynomial regret for combinatorial bandits, solving a key limitation of existing methods and offering significantly improved performance.

The Value of Reward Lookahead in Reinforcement Learning

26 September 2024·1360 words·7 mins

Reinforcement Learning 🏢 CREST, ENSAE, IP Paris

Reinforcement learning agents can achieve significantly higher rewards by using advance knowledge of future rewards; this paper mathematically analyzes this advantage by computing the worst-case perfo…

The Power of Resets in Online Reinforcement Learning

26 September 2024·233 words·2 mins

Reinforcement Learning 🏢 Google Research

Leveraging local simulator resets in online reinforcement learning dramatically improves sample efficiency, especially for high-dimensional problems with general function approximation.

The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize

26 September 2024·1378 words·7 mins

Reinforcement Learning 🏢 Cornell University

Unlocking the mysteries of stochastic approximation with constant stepsize, this paper reveals how memory and nonlinearity interact to create bias, providing novel analysis and solutions for more accu…

Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs

26 September 2024·2646 words·13 mins

Reinforcement Learning 🏢 University of Hong Kong

ArithTreeRL, a novel reinforcement learning approach, generates optimized arithmetic tree structures for adders and multipliers, significantly improving computational efficiency and reducing hardware …

Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning

26 September 2024·406 words·2 mins

Reinforcement Learning 🏢 University of Washington

This paper reveals that estimating only policy differences, while effective in bandits, is insufficient for tabular reinforcement learning. However, it introduces a novel algorithm achieving near-opti…

Rethinking Exploration in Reinforcement Learning with Effective Metric-Based Exploration Bonus

26 September 2024·2904 words·14 mins

Reinforcement Learning 🏢 University of Macau

Effective Metric-based Exploration Bonus (EME) enhances reinforcement learning exploration by using a robust metric for state discrepancy and a dynamically adjusted scaling factor based on reward mode…

Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers

26 September 2024·4163 words·20 mins

Reinforcement Learning 🏢 University of Illinois Urbana-Champaign

Boost online finetuning of Decision Transformers by adding TD3 gradients, especially when pretrained with low-reward data.

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

26 September 2024·2411 words·12 mins

Reinforcement Learning 🏢 University of Washington

VPL: a novel multimodal RLHF approach that personalizes AI by inferring user-specific latent preferences, enabling accurate reward modeling and improved policy alignment for diverse populations.

Optimizing Automatic Differentiation with Deep Reinforcement Learning

26 September 2024·2461 words·12 mins

Reinforcement Learning 🏢 Forschungszentrum Jülich & RWTH Aachen

Deep reinforcement learning optimizes automatic differentiation, achieving up to 33% improvement in Jacobian computation by finding efficient elimination orders.

NeoRL: Efficient Exploration for Nonepisodic RL

26 September 2024·1407 words·7 mins

Reinforcement Learning 🏢 ETH Zurich

NeoRL: a novel nonepisodic RL algorithm that guarantees optimal average cost with sublinear regret for nonlinear systems.

Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning

26 September 2024·2182 words·11 mins

Reinforcement Learning 🏢 Criteo AI Lab

Logarithmic Smoothing enhances pessimistic offline contextual bandit algorithms by providing tighter concentration bounds for improved policy evaluation, selection and learning.

Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning

26 September 2024·1726 words·9 mins

Reinforcement Learning 🏢 Microsoft Research

Offline imitation learning achieves surprisingly strong performance, matching online methods’ efficiency under certain conditions, contradicting prior assumptions.

Implicit Curriculum in Procgen Made Explicit

26 September 2024·1613 words·8 mins

Reinforcement Learning 🏢 National University of Singapore

C-Procgen reveals implicit curriculum in Procgen’s multi-level training, showing learning shifts gradually from easy to hard contexts.

Goal Reduction with Loop-Removal Accelerates RL and Models Human Brain Activity in Goal-Directed Learning

26 September 2024·1872 words·9 mins

Reinforcement Learning 🏢 Indiana University Bloomington

Goal Reduction with Loop-Removal accelerates Reinforcement Learning (RL) and accurately models human brain activity during goal-directed learning by efficiently deriving subgoals from distant original…

Generalized Linear Bandits with Limited Adaptivity

26 September 2024·341 words·2 mins

Reinforcement Learning 🏢 Stanford University

This paper introduces two novel algorithms, achieving optimal regret in generalized linear contextual bandits despite limited policy updates, a significant advancement for real-world applications.

Functional Bilevel Optimization for Machine Learning

26 September 2024·1884 words·9 mins

Reinforcement Learning 🏢 University of Grenoble Alpes

Functional Bilevel Optimization tackles the ambiguity of using neural networks in bilevel optimization by minimizing the inner objective over a function space, leading to scalable & efficient algorith…

Extensive-Form Game Solving via Blackwell Approachability on Treeplexes

26 September 2024·2500 words·12 mins

Reinforcement Learning 🏢 Columbia University

First algorithmic framework for Blackwell approachability on treeplexes, enabling stepsize-invariant EFG solvers with state-of-the-art convergence rates.