
Reinforcement Learning

Parallelizing Model-based Reinforcement Learning Over the Sequence Length
·2553 words·12 mins
Machine Learning Reinforcement Learning 🏢 Zhejiang University
The PaMoRL framework boosts model-based reinforcement learning speed by parallelizing the model and policy learning stages over the sequence length while maintaining high sample efficiency.
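To make the parallelization idea concrete, here is a minimal sketch (not PaMoRL's actual model) of why a recurrence over the sequence length admits a parallel prefix scan: the per-step affine updates compose associatively, so the sequential loop can be replaced by an O(log T)-depth scan on parallel hardware. The coefficients and function names below are toy stand-ins.

```python
import numpy as np

def combine(first, second):
    """Compose two affine maps h -> a*h + b (first applied before second)."""
    a1, b1 = first
    a2, b2 = second
    return a2 * a1, a2 * b1 + b2

def recurrence_sequential(a, b, h0=0.0):
    h, out = h0, []
    for at, bt in zip(a, b):
        h = at * h + bt                  # h_t = a_t * h_{t-1} + b_t
        out.append(h)
    return np.array(out)

def recurrence_scan(a, b, h0=0.0):
    # Prefix scan over `combine`; because `combine` is associative, a parallel
    # implementation (e.g. a Blelloch scan) can evaluate it in O(log T) depth.
    acc, out = (1.0, 0.0), []
    for at, bt in zip(a, b):
        acc = combine(acc, (at, bt))
        out.append(acc[0] * h0 + acc[1])
    return np.array(out)

a, b = np.random.default_rng(0).normal(size=(2, 16))
print(np.allclose(recurrence_sequential(a, b), recurrence_scan(a, b)))  # True
```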
Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL
·2419 words·12 mins
Machine Learning Reinforcement Learning 🏢 University of California, Berkeley
Leveraging simulation for real-world RL is often hampered by the sim-to-real gap. This paper shows that instead of directly transferring policies, transferring exploratory policies from simulation d…
Oracle-Efficient Reinforcement Learning for Max Value Ensembles
·1715 words·9 mins
Machine Learning Reinforcement Learning 🏢 University of Pennsylvania
Boost RL performance in large state spaces by efficiently learning a policy competitive with the best combination of existing base policies!
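A rough sketch of the max-following idea in this setting: at each state, act with whichever base policy has the highest estimated value there. The value estimators and policies below are hypothetical placeholders, not the paper's oracle-efficient construction.

```python
import numpy as np

def max_following_action(state, base_policies, value_estimates):
    """Pick the action of the base policy whose estimated value at `state` is largest.

    base_policies: list of callables state -> action
    value_estimates: list of callables state -> estimated value of following that policy
    """
    values = [v(state) for v in value_estimates]
    return base_policies[int(np.argmax(values))](state)

# Toy usage: two constant policies on a 1-D state; the estimator prefers the
# policy whose "goal" is closer to the current state.
policies = [lambda s: 0, lambda s: 1]
values = [lambda s: -abs(s - 0.0), lambda s: -abs(s - 5.0)]
print(max_following_action(3.0, policies, values))  # -> 1
```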
Optimizing over Multiple Distributions under Generalized Quasar-Convexity Condition
·331 words·2 mins
Machine Learning Reinforcement Learning 🏢 Peking University
This paper proposes ‘generalized quasar-convexity’ to optimize problems with multiple probability distributions, offering adaptive algorithms with superior iteration complexities compared to existing …
Optimizing Automatic Differentiation with Deep Reinforcement Learning
·2461 words·12 mins
Reinforcement Learning 🏢 Forschungszentrum Jülich & RWTH Aachen
Deep reinforcement learning optimizes automatic differentiation, achieving up to 33% improvement in Jacobian computation by finding efficient elimination orders.
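For intuition on why the elimination order matters, here is a hedged sketch of vertex elimination on a linearized computational graph: removing an intermediate vertex costs |predecessors| × |successors| multiplications and rewires the graph around it. The paper's RL agent searches over such orders; the tiny graph and scoring function below are illustrative assumptions.

```python
def elimination_cost(edges, order):
    """Total multiplications for eliminating intermediate vertices in `order`.

    edges: iterable of directed (u, w) pairs of the linearized computational graph.
    """
    edges = set(edges)
    total = 0
    for v in order:
        preds = {u for (u, w) in edges if w == v}
        succs = {w for (u, w) in edges if u == v}
        total += len(preds) * len(succs)                    # fill-in multiplications
        edges -= {(u, v) for u in preds} | {(v, w) for w in succs}
        edges |= {(u, w) for u in preds for w in succs}     # rewire around v
    return total

# Toy graph: inputs 0,1 -> intermediate 2 -> intermediate 3 -> output 4, plus a skip edge 2 -> 4.
g = {(0, 2), (1, 2), (2, 3), (2, 4), (3, 4)}
print(elimination_cost(g, [2, 3]), elimination_cost(g, [3, 2]))  # 6 vs. 3: order matters
```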
Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL
·2843 words·14 mins
Machine Learning Reinforcement Learning 🏢 Nanjing University of Aeronautics and Astronautics
This paper introduces OCR-CFT, a novel method for general offline-to-online RL, achieving stable and efficient performance improvements by addressing evaluation and improvement mismatches through optimistic critic reconstruction and constrained fine-tuning.
Optimal Top-Two Method for Best Arm Identification and Fluid Analysis
·2219 words·11 mins
Machine Learning Reinforcement Learning 🏢 TIFR Mumbai
The Optimal Top-Two Algorithm solves the best arm identification problem with improved sample efficiency and lower computational cost, achieving asymptotic optimality.
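As a rough illustration of the top-two template (not the paper's exact allocation or stopping rule), the sketch below posterior-samples a leader and, with probability 1 − β, resamples for a distinct challenger. Gaussian rewards and the resampling cap are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.3, 0.5])
counts = np.ones(len(true_means))
sums = rng.normal(true_means)                      # one initial pull per arm

def posterior_argmax():
    return int(np.argmax(rng.normal(sums / counts, 1.0 / np.sqrt(counts))))

def top_two_arm(beta=0.5):
    leader = posterior_argmax()
    if rng.random() < beta:
        return leader
    for _ in range(100):                           # cap resampling to keep the sketch fast
        challenger = posterior_argmax()
        if challenger != leader:
            return challenger
    return leader

for _ in range(1000):
    arm = top_two_arm()
    counts[arm] += 1
    sums[arm] += rng.normal(true_means[arm])

print(int(np.argmax(sums / counts)))               # typically identifies arm 2
```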
Optimal Multi-Fidelity Best-Arm Identification
·2446 words·12 mins
Machine Learning Reinforcement Learning 🏢 Politecnico Di Milano
A new algorithm for multi-fidelity best-arm identification achieves asymptotically optimal cost complexity, offering significant improvements over existing methods.
Optimal Design for Human Preference Elicitation
·1485 words·7 mins
Machine Learning Reinforcement Learning 🏢 University of Wisconsin-Madison
Dope: Efficient algorithms optimize human preference elicitation for learning to rank, minimizing ranking loss and prediction error with absolute and ranking feedback models.
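A minimal sketch of the optimal-design flavor of this problem, assuming a greedy D-optimal criterion: repeatedly pick the preference query (feature vector) that most increases the log-determinant of the information matrix. The candidate features, ridge term, and greedy rule are illustrative assumptions, not Dope's actual design.

```python
import numpy as np

def greedy_d_optimal(candidates, budget, ridge=1e-3):
    """Greedily select `budget` queries (repeats allowed) to maximize log det of the information matrix."""
    d = candidates.shape[1]
    info = ridge * np.eye(d)
    chosen = []
    for _ in range(budget):
        gains = [np.linalg.slogdet(info + np.outer(x, x))[1] for x in candidates]
        best = int(np.argmax(gains))
        info += np.outer(candidates[best], candidates[best])
        chosen.append(best)
    return chosen

queries = np.random.default_rng(4).normal(size=(50, 3))   # hypothetical query features
print(greedy_d_optimal(queries, budget=5))
```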
Optimal Batched Best Arm Identification
·1734 words·9 mins
AI Generated Machine Learning Reinforcement Learning 🏢 National University of Singapore
Tri-BBAI & Opt-BBAI achieve optimal asymptotic and near-optimal non-asymptotic sample & batch complexities in batched best arm identification.
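For a feel of the batched setting, here is a hedged sketch of plain batched successive elimination (Tri-BBAI and Opt-BBAI use sharper allocations and thresholds): every surviving arm is pulled a fixed number of times per batch, then arms whose confidence interval falls below the current best are dropped.

```python
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.2, 0.4, 0.5, 0.8])        # Bernoulli arms (toy instance)

def batched_elimination(pulls_per_batch=200, batches=5, delta=0.1):
    active = list(range(len(true_means)))
    sums = np.zeros(len(true_means))
    counts = np.zeros(len(true_means))
    for _ in range(batches):
        for a in active:                            # all pulls in a batch are committed upfront
            sums[a] += rng.binomial(pulls_per_batch, true_means[a])
            counts[a] += pulls_per_batch
        means = sums[active] / counts[active]
        radius = np.sqrt(np.log(2 * len(true_means) / delta) / (2 * counts[active]))
        active = [a for a, keep in zip(active, means + radius >= (means - radius).max()) if keep]
        if len(active) == 1:
            break
    return active

print(batched_elimination())                        # typically [3]
```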
Opponent Modeling with In-context Search
·2301 words·11 mins
Machine Learning Reinforcement Learning 🏢 Tencent AI Lab
Opponent Modeling with In-context Search (OMIS) leverages in-context learning and decision-time search for stable and effective opponent adaptation in multi-agent environments.
Opponent Modeling based on Subgoal Inference
·2148 words·11 mins
Machine Learning Reinforcement Learning 🏢 Peking University
Opponent modeling based on subgoal inference (OMG) outperforms existing methods by inferring opponent subgoals, enabling better generalization to unseen opponents in multi-agent environments.
Operator World Models for Reinforcement Learning
·388 words·2 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Istituto Italiano Di Tecnologia
POWR: a novel RL algorithm using operator world models and policy mirror descent achieves global convergence with improved sample efficiency.
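The policy improvement step POWR builds on is policy mirror descent, whose multiplicative update is easy to sketch. The Q-table below stands in for what the operator world model would supply and is purely illustrative.

```python
import numpy as np

def mirror_descent_step(pi, Q, eta=1.0):
    """One policy mirror descent update: pi_new(a|s) ∝ pi(a|s) * exp(eta * Q(s, a))."""
    unnormalized = pi * np.exp(eta * Q)
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)

pi = np.full((2, 3), 1.0 / 3.0)                         # uniform policy: 2 states, 3 actions
Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])                         # stand-in action values
print(np.round(mirror_descent_step(pi, Q), 3))
```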
OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators
·2594 words·13 mins
Machine Learning Reinforcement Learning 🏢 Stanford University
OPERA: A new algorithm intelligently blends multiple offline policy evaluation estimators for more accurate policy performance estimates.
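To illustrate the re-weighting idea (not OPERA's actual estimator, which solves a constrained least-squares problem over the estimators' joint error structure), the sketch below blends several off-policy estimates with inverse-variance weights derived from bootstrap replicates.

```python
import numpy as np

def aggregate(estimates, bootstrap_replicates):
    """Blend K point estimates using inverse-variance weights from (B, K) bootstrap replicates."""
    variances = bootstrap_replicates.var(axis=0) + 1e-8
    weights = (1.0 / variances) / (1.0 / variances).sum()
    return float(weights @ estimates), weights

estimates = np.array([1.10, 0.95, 1.30])            # e.g. IS, doubly robust, model-based (toy numbers)
replicates = np.random.default_rng(2).normal(estimates, [0.30, 0.05, 0.50], size=(500, 3))
print(aggregate(estimates, replicates))             # leans toward the low-variance estimator
```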
Online Posterior Sampling with a Diffusion Prior
·1905 words·9 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Adobe Research
This paper introduces efficient approximate posterior sampling for contextual bandits using diffusion model priors, improving Thompson sampling’s performance and expressiveness.
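As a point of reference for what the diffusion prior replaces, here is a sketch of Thompson sampling in a linear contextual bandit with a conjugate Gaussian prior, so the posterior update stays closed-form; the action set, noise level, and horizon are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
actions = [rng.normal(size=d) for _ in range(5)]        # fixed toy action features
theta_true = rng.normal(size=d)
precision, mean_term = np.eye(d), np.zeros(d)           # Gaussian posterior statistics

for t in range(200):
    cov = np.linalg.inv(precision)
    theta_sample = rng.multivariate_normal(cov @ mean_term, cov)   # posterior sample
    x = max(actions, key=lambda a: a @ theta_sample)               # act greedily w.r.t. the sample
    reward = x @ theta_true + rng.normal(scale=0.1)
    precision += np.outer(x, x)                                    # Bayesian linear regression update
    mean_term += reward * x

posterior_mean = np.linalg.solve(precision, mean_term)
print(int(np.argmax([a @ posterior_mean for a in actions])),       # identified best action
      int(np.argmax([a @ theta_true for a in actions])))           # true best action
```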
Online Control with Adversarial Disturbance for Continuous-time Linear Systems
·1592 words·8 mins
Machine Learning Reinforcement Learning 🏢 Tsinghua University
This paper presents a novel two-level online control algorithm that learns to control continuous-time linear systems under adversarial disturbances, achieving sublinear regret.
On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games
·2014 words·10 mins
Machine Learning Reinforcement Learning 🏢 Yale University
New reinforcement learning model clarifies the role of information structure in partially-observable sequential decision-making problems, proving an upper bound on learning complexity.
On the Minimax Regret for Contextual Linear Bandits and Multi-Armed Bandits with Expert Advice
·360 words·2 mins
Machine Learning Reinforcement Learning 🏢 University of Tokyo
This paper provides novel algorithms and matching lower bounds for multi-armed bandits with expert advice and contextual linear bandits, resolving open questions and advancing theoretical understanding.
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
·299 words·2 mins
Machine Learning Reinforcement Learning 🏢 University of Illinois Urbana-Champaign
This paper tackles the ‘curse of horizon’ in off-policy evaluation for partially observable Markov decision processes (POMDPs) by proposing novel coverage assumptions, enabling polynomial estimation e…
On the Complexity of Teaching a Family of Linear Behavior Cloning Learners
·1819 words·9 mins
Machine Learning Reinforcement Learning 🏢 University of Washington
A novel algorithm, TIE, optimally teaches a family of linear behavior cloning learners, achieving the instance-optimal teaching dimension while providing an efficient approximation for larger action spaces.