Spotlight Reinforcement Learning
2024
Variational Delayed Policy Optimization
·1922 words·10 mins·
Reinforcement Learning
🏢 University of Southampton
VDPO: A novel framework for delayed reinforcement learning achieving 50% sample efficiency improvement without compromising performance.
Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning
·2208 words·11 mins·
Reinforcement Learning
🏢 Chinese University of Hong Kong
UNICORN: a unified framework reveals that existing offline meta-reinforcement learning algorithms optimize variations of mutual information, leading to improved generalization.
Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox
·1315 words·7 mins·
Reinforcement Learning
🏢 Université Paris-Saclay
A novel Thompson Sampling variant achieves polynomial regret for combinatorial bandits, solving a key limitation of existing methods and offering significantly improved performance.
The Value of Reward Lookahead in Reinforcement Learning
·1360 words·7 mins·
Reinforcement Learning
🏢 CREST, ENSAE, IP Paris
Reinforcement learning agents can achieve significantly higher rewards by using advance knowledge of future rewards; this paper mathematically analyzes this advantage by computing the worst-case perfo…
The Power of Resets in Online Reinforcement Learning
·233 words·2 mins·
Reinforcement Learning
🏢 Google Research
Leveraging local simulator resets in online reinforcement learning dramatically improves sample efficiency, especially for high-dimensional problems with general function approximation.
The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize
·1378 words·7 mins·
Reinforcement Learning
🏢 Cornell University
Unlocking the mysteries of stochastic approximation with constant stepsize, this paper reveals how memory and nonlinearity interact to create bias, providing novel analysis and solutions for more accu…
Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs
·2646 words·13 mins·
Reinforcement Learning
🏢 University of Hong Kong
ArithTreeRL, a novel reinforcement learning approach, generates optimized arithmetic tree structures for adders and multipliers, significantly improving computational efficiency and reducing hardware …
Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning
·406 words·2 mins·
Reinforcement Learning
🏢 University of Washington
This paper reveals that estimating only policy differences, while effective in bandits, is insufficient for tabular reinforcement learning. However, it introduces a novel algorithm achieving near-opti…
Rethinking Exploration in Reinforcement Learning with Effective Metric-Based Exploration Bonus
·2904 words·14 mins·
Reinforcement Learning
🏢 University of Macau
Effective Metric-based Exploration Bonus (EME) enhances reinforcement learning exploration by using a robust metric for state discrepancy and a dynamically adjusted scaling factor based on reward mode…
Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers
·4163 words·20 mins·
Reinforcement Learning
🏢 University of Illinois Urbana-Champaign
Adding TD3 gradients boosts online finetuning of Decision Transformers, especially when they are pretrained on low-reward data.
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
·2411 words·12 mins·
Reinforcement Learning
🏢 University of Washington
VPL: a novel multimodal RLHF method that personalizes AI by inferring user-specific latent preferences, enabling accurate reward modeling and improved policy alignment for diverse populations.
Optimizing Automatic Differentiation with Deep Reinforcement Learning
·2461 words·12 mins·
Reinforcement Learning
🏢 Forschungszentrum Jülich & RWTH Aachen
Deep reinforcement learning optimizes automatic differentiation, achieving up to 33% improvement in Jacobian computation by finding efficient elimination orders.
NeoRL: Efficient Exploration for Nonepisodic RL
·1407 words·7 mins·
Reinforcement Learning
🏢 ETH Zurich
NeoRL: a novel nonepisodic RL algorithm that guarantees optimal average cost with sublinear regret for nonlinear systems.
Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning
·2182 words·11 mins·
Reinforcement Learning
🏢 Criteo AI Lab
Logarithmic Smoothing enhances pessimistic offline contextual bandit algorithms by providing tighter concentration bounds for improved policy evaluation, selection and learning.
Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning
·1726 words·9 mins·
Reinforcement Learning
🏢 Microsoft Research
Offline imitation learning achieves surprisingly strong performance, matching online methods’ efficiency under certain conditions, contradicting prior assumptions.
Implicit Curriculum in Procgen Made Explicit
·1613 words·8 mins·
Reinforcement Learning
🏢 National University of Singapore
C-Procgen reveals implicit curriculum in Procgen’s multi-level training, showing learning shifts gradually from easy to hard contexts.
Goal Reduction with Loop-Removal Accelerates RL and Models Human Brain Activity in Goal-Directed Learning
·1872 words·9 mins·
Reinforcement Learning
🏢 Indiana University Bloomington
Goal Reduction with Loop-Removal accelerates Reinforcement Learning (RL) and accurately models human brain activity during goal-directed learning by efficiently deriving subgoals from distant original…
Generalized Linear Bandits with Limited Adaptivity
·341 words·2 mins·
Reinforcement Learning
🏢 Stanford University
This paper introduces two novel algorithms that achieve optimal regret in generalized linear contextual bandits despite limited policy updates, a significant advancement for real-world applications.
Functional Bilevel Optimization for Machine Learning
·1884 words·9 mins·
Reinforcement Learning
🏢 University of Grenoble Alpes
Functional Bilevel Optimization tackles the ambiguity of using neural networks in bilevel optimization by minimizing the inner objective over a function space, leading to scalable & efficient algorith…
Extensive-Form Game Solving via Blackwell Approachability on Treeplexes
·2500 words·12 mins·
Reinforcement Learning
🏢 Columbia University
First algorithmic framework for Blackwell approachability on treeplexes, enabling stepsize-invariant EFG solvers with state-of-the-art convergence rates.