
Reinforcement Learning

Parallelizing Model-based Reinforcement Learning Over the Sequence Length
·2553 words·12 mins
Machine Learning Reinforcement Learning 🏢 Zhejiang University
The PaMoRL framework boosts model-based reinforcement learning speed by parallelizing the model and policy learning stages over the sequence length while maintaining high sample efficiency.
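To make the parallelization idea concrete, here is a minimal sketch (not PaMoRL's actual model) of why a recurrence over the sequence length admits a parallel prefix scan: the per-step affine updates compose associatively, so the sequential loop can be replaced by an O(log T)-depth scan on parallel hardware. The coefficients and function names below are toy stand-ins.

```python
import numpy as np

def combine(first, second):
    """Compose two affine maps h -> a*h + b (first applied before second)."""
    a1, b1 = first
    a2, b2 = second
    return a2 * a1, a2 * b1 + b2

def recurrence_sequential(a, b, h0=0.0):
    h, out = h0, []
    for at, bt in zip(a, b):
        h = at * h + bt                  # h_t = a_t * h_{t-1} + b_t
        out.append(h)
    return np.array(out)

def recurrence_scan(a, b, h0=0.0):
    # Prefix scan over `combine`; because `combine` is associative, a parallel
    # implementation (e.g. a Blelloch scan) can evaluate it in O(log T) depth.
    acc, out = (1.0, 0.0), []
    for at, bt in zip(a, b):
        acc = combine(acc, (at, bt))
        out.append(acc[0] * h0 + acc[1])
    return np.array(out)

a, b = np.random.default_rng(0).normal(size=(2, 16))
print(np.allclose(recurrence_sequential(a, b), recurrence_scan(a, b)))  # True
```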
Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL
·2419 words·12 mins
Machine Learning Reinforcement Learning 🏢 University of California, Berkeley
Leveraging simulation for real-world RL is often hampered by the sim-to-real gap. This paper shows that instead of directly transferring policies, transferring exploratory policies from simulation d…
Oracle-Efficient Reinforcement Learning for Max Value Ensembles
·1715 words·9 mins
Machine Learning Reinforcement Learning 🏢 University of Pennsylvania
Boost RL performance in large state spaces by efficiently learning a policy competitive with the best combination of existing base policies!
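A rough sketch of the max-following idea in this setting: at each state, act with whichever base policy has the highest estimated value there. The value estimators and policies below are hypothetical placeholders, not the paper's oracle-efficient construction.

```python
import numpy as np

def max_following_action(state, base_policies, value_estimates):
    """Pick the action of the base policy whose estimated value at `state` is largest.

    base_policies: list of callables state -> action
    value_estimates: list of callables state -> estimated value of following that policy
    """
    values = [v(state) for v in value_estimates]
    return base_policies[int(np.argmax(values))](state)

# Toy usage: two constant policies on a 1-D state; the estimator prefers the
# policy whose "goal" is closer to the current state.
policies = [lambda s: 0, lambda s: 1]
values = [lambda s: -abs(s - 0.0), lambda s: -abs(s - 5.0)]
print(max_following_action(3.0, policies, values))  # -> 1
```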
Optimizing over Multiple Distributions under Generalized Quasar-Convexity Condition
·331 words·2 mins
Machine Learning Reinforcement Learning 🏢 Peking University
This paper proposes ‘generalized quasar-convexity’ to optimize problems with multiple probability distributions, offering adaptive algorithms with superior iteration complexities compared to existing …
Optimizing Automatic Differentiation with Deep Reinforcement Learning
·2461 words·12 mins
Reinforcement Learning 🏢 Forschungszentrum Jülich & RWTH Aachen
Deep reinforcement learning optimizes automatic differentiation, achieving up to 33% improvement in Jacobian computation by finding efficient elimination orders.
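For intuition on why the elimination order matters, here is a hedged sketch of vertex elimination on a linearized computational graph: removing an intermediate vertex costs |predecessors| × |successors| multiplications and rewires the graph around it. The paper's RL agent searches over such orders; the tiny graph and scoring function below are illustrative assumptions.

```python
def elimination_cost(edges, order):
    """Total multiplications for eliminating intermediate vertices in `order`.

    edges: iterable of directed (u, w) pairs of the linearized computational graph.
    """
    edges = set(edges)
    total = 0
    for v in order:
        preds = {u for (u, w) in edges if w == v}
        succs = {w for (u, w) in edges if u == v}
        total += len(preds) * len(succs)                    # fill-in multiplications
        edges -= {(u, v) for u in preds} | {(v, w) for w in succs}
        edges |= {(u, w) for u in preds for w in succs}     # rewire around v
    return total

# Toy graph: inputs 0,1 -> intermediate 2 -> intermediate 3 -> output 4, plus a skip edge 2 -> 4.
g = {(0, 2), (1, 2), (2, 3), (2, 4), (3, 4)}
print(elimination_cost(g, [2, 3]), elimination_cost(g, [3, 2]))  # 6 vs. 3: order matters
```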
Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL
·2843 words·14 mins
Machine Learning Reinforcement Learning 🏢 Nanjing University of Aeronautics and Astronautics
This paper introduces OCR-CFT, a novel method for general offline-to-online RL, achieving stable and efficient performance improvements by addressing evaluation and improvement mismatches through optimistic critic reconstruction and constrained fine-tuning.
Optimal Top-Two Method for Best Arm Identification and Fluid Analysis
·2219 words·11 mins
Machine Learning Reinforcement Learning 🏢 TIFR Mumbai
The Optimal Top-Two Algorithm solves the best arm identification problem with improved sample efficiency and lower computational cost, achieving asymptotic optimality.
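As a rough illustration of the top-two template (not the paper's exact allocation or stopping rule), the sketch below posterior-samples a leader and, with probability 1 − β, resamples for a distinct challenger. Gaussian rewards and the resampling cap are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.3, 0.5])
counts = np.ones(len(true_means))
sums = rng.normal(true_means)                      # one initial pull per arm

def posterior_argmax():
    return int(np.argmax(rng.normal(sums / counts, 1.0 / np.sqrt(counts))))

def top_two_arm(beta=0.5):
    leader = posterior_argmax()
    if rng.random() < beta:
        return leader
    for _ in range(100):                           # cap resampling to keep the sketch fast
        challenger = posterior_argmax()
        if challenger != leader:
            return challenger
    return leader

for _ in range(1000):
    arm = top_two_arm()
    counts[arm] += 1
    sums[arm] += rng.normal(true_means[arm])

print(int(np.argmax(sums / counts)))               # typically identifies arm 2
```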
Optimal Multi-Fidelity Best-Arm Identification
·2446 words·12 mins
Machine Learning Reinforcement Learning 🏢 Politecnico Di Milano
A new algorithm for multi-fidelity best-arm identification achieves asymptotically optimal cost complexity, offering significant improvements over existing methods.
Optimal Design for Human Preference Elicitation
·1485 words·7 mins
Machine Learning Reinforcement Learning 🏢 University of Wisconsin-Madison
Dope: Efficient algorithms optimize human preference elicitation for learning to rank, minimizing ranking loss and prediction error with absolute and ranking feedback models.
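A minimal sketch of the optimal-design flavor of this problem, assuming a greedy D-optimal criterion: repeatedly pick the preference query (feature vector) that most increases the log-determinant of the information matrix. The candidate features, ridge term, and greedy rule are illustrative assumptions, not Dope's actual design.

```python
import numpy as np

def greedy_d_optimal(candidates, budget, ridge=1e-3):
    """Greedily select `budget` queries (repeats allowed) to maximize log det of the information matrix."""
    d = candidates.shape[1]
    info = ridge * np.eye(d)
    chosen = []
    for _ in range(budget):
        gains = [np.linalg.slogdet(info + np.outer(x, x))[1] for x in candidates]
        best = int(np.argmax(gains))
        info += np.outer(candidates[best], candidates[best])
        chosen.append(best)
    return chosen

queries = np.random.default_rng(4).normal(size=(50, 3))   # hypothetical query features
print(greedy_d_optimal(queries, budget=5))
```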
Optimal Batched Best Arm Identification
·1734 words·9 mins
AI Generated Machine Learning Reinforcement Learning 🏢 National University of Singapore
Tri-BBAI & Opt-BBAI achieve optimal asymptotic and near-optimal non-asymptotic sample & batch complexities in batched best arm identification.
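For a feel of the batched setting, here is a hedged sketch of plain batched successive elimination (Tri-BBAI and Opt-BBAI use sharper allocations and thresholds): every surviving arm is pulled a fixed number of times per batch, then arms whose confidence interval falls below the current best are dropped.

```python
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.2, 0.4, 0.5, 0.8])        # Bernoulli arms (toy instance)

def batched_elimination(pulls_per_batch=200, batches=5, delta=0.1):
    active = list(range(len(true_means)))
    sums = np.zeros(len(true_means))
    counts = np.zeros(len(true_means))
    for _ in range(batches):
        for a in active:                            # all pulls in a batch are committed upfront
            sums[a] += rng.binomial(pulls_per_batch, true_means[a])
            counts[a] += pulls_per_batch
        means = sums[active] / counts[active]
        radius = np.sqrt(np.log(2 * len(true_means) / delta) / (2 * counts[active]))
        active = [a for a, keep in zip(active, means + radius >= (means - radius).max()) if keep]
        if len(active) == 1:
            break
    return active

print(batched_elimination())                        # typically [3]
```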
Opponent Modeling with In-context Search
·2301 words·11 mins
Machine Learning Reinforcement Learning 🏢 Tencent AI Lab
Opponent Modeling with In-context Search (OMIS) leverages in-context learning and decision-time search for stable and effective opponent adaptation in multi-agent environments.
Opponent Modeling based on Subgoal Inference
·2148 words·11 mins
Machine Learning Reinforcement Learning 🏢 Peking University
Opponent modeling based on subgoal inference (OMG) outperforms existing methods by inferring opponent subgoals, enabling better generalization to unseen opponents in multi-agent environments.
Operator World Models for Reinforcement Learning
·388 words·2 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Istituto Italiano Di Tecnologia
POWR: a novel RL algorithm using operator world models and policy mirror descent achieves global convergence with improved sample efficiency.
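The policy improvement step POWR builds on is policy mirror descent, whose multiplicative update is easy to sketch. The Q-table below stands in for what the operator world model would supply and is purely illustrative.

```python
import numpy as np

def mirror_descent_step(pi, Q, eta=1.0):
    """One policy mirror descent update: pi_new(a|s) ∝ pi(a|s) * exp(eta * Q(s, a))."""
    unnormalized = pi * np.exp(eta * Q)
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)

pi = np.full((2, 3), 1.0 / 3.0)                         # uniform policy: 2 states, 3 actions
Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])                         # stand-in action values
print(np.round(mirror_descent_step(pi, Q), 3))
```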
OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators
·2594 words·13 mins
Machine Learning Reinforcement Learning 🏢 Stanford University
OPERA: A new algorithm intelligently blends multiple offline policy evaluation estimators for more accurate policy performance estimates.
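To illustrate the re-weighting idea (not OPERA's actual estimator, which solves a constrained least-squares problem over the estimators' joint error structure), the sketch below blends several off-policy estimates with inverse-variance weights derived from bootstrap replicates.

```python
import numpy as np

def aggregate(estimates, bootstrap_replicates):
    """Blend K point estimates using inverse-variance weights from (B, K) bootstrap replicates."""
    variances = bootstrap_replicates.var(axis=0) + 1e-8
    weights = (1.0 / variances) / (1.0 / variances).sum()
    return float(weights @ estimates), weights

estimates = np.array([1.10, 0.95, 1.30])            # e.g. IS, doubly robust, model-based (toy numbers)
replicates = np.random.default_rng(2).normal(estimates, [0.30, 0.05, 0.50], size=(500, 3))
print(aggregate(estimates, replicates))             # leans toward the low-variance estimator
```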
Online Posterior Sampling with a Diffusion Prior
·1905 words·9 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Adobe Research
This paper introduces efficient approximate posterior sampling for contextual bandits using diffusion model priors, improving Thompson sampling’s performance and expressiveness.
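As a point of reference for what the diffusion prior replaces, here is a sketch of Thompson sampling in a linear contextual bandit with a conjugate Gaussian prior, so the posterior update stays closed-form; the action set, noise level, and horizon are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
actions = [rng.normal(size=d) for _ in range(5)]        # fixed toy action features
theta_true = rng.normal(size=d)
precision, mean_term = np.eye(d), np.zeros(d)           # Gaussian posterior statistics

for t in range(200):
    cov = np.linalg.inv(precision)
    theta_sample = rng.multivariate_normal(cov @ mean_term, cov)   # posterior sample
    x = max(actions, key=lambda a: a @ theta_sample)               # act greedily w.r.t. the sample
    reward = x @ theta_true + rng.normal(scale=0.1)
    precision += np.outer(x, x)                                    # Bayesian linear regression update
    mean_term += reward * x

posterior_mean = np.linalg.solve(precision, mean_term)
print(int(np.argmax([a @ posterior_mean for a in actions])),       # identified best action
      int(np.argmax([a @ theta_true for a in actions])))           # true best action
```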
Online Control with Adversarial Disturbance for Continuous-time Linear Systems
·1592 words·8 mins
Machine Learning Reinforcement Learning 🏢 Tsinghua University
This paper presents a novel two-level online control algorithm that learns to control continuous-time linear systems under adversarial disturbances, achieving sublinear regret.
On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games
·2014 words·10 mins
Machine Learning Reinforcement Learning 🏢 Yale University
New reinforcement learning model clarifies the role of information structure in partially-observable sequential decision-making problems, proving an upper bound on learning complexity.
On the Minimax Regret for Contextual Linear Bandits and Multi-Armed Bandits with Expert Advice
·360 words·2 mins
Machine Learning Reinforcement Learning 🏢 University of Tokyo
This paper provides novel algorithms and matching lower bounds for multi-armed bandits with expert advice and contextual linear bandits, resolving open questions and advancing theoretical understanding.
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
·299 words·2 mins
Machine Learning Reinforcement Learning 🏢 University of Illinois Urbana-Champaign
This paper tackles the ‘curse of horizon’ in off-policy evaluation for partially observable Markov decision processes (POMDPs) by proposing novel coverage assumptions, enabling polynomial estimation e…
On the Complexity of Teaching a Family of Linear Behavior Cloning Learners
·1819 words·9 mins
Machine Learning Reinforcement Learning 🏢 University of Washington
A novel algorithm, TIE, optimally teaches a family of linear behavior cloning learners, achieving the instance-optimal teaching dimension while providing an efficient approximation for larger action spaces.