TL;DR#
Reinforcement learning (RL) often relies on simulators, yet existing algorithms struggle to use them efficiently, especially in high-dimensional domains that require general function approximation. This is primarily because online RL algorithms do not exploit the additional information a simulator provides. This paper focuses on 'online reinforcement learning with local simulator access' (RLLS), a protocol in which the agent can reset the simulator to previously observed states.
The paper introduces two novel algorithms, SimGolf and RVFS, that leverage RLLS. SimGolf is shown to be sample-efficient under relaxed representation conditions for MDPs with low coverability. RVFS, a computationally efficient algorithm, achieves theoretical guarantees under the stronger assumption of pushforward coverability. Both algorithms demonstrate that RLLS can unlock statistical guarantees that were previously unattainable, solving notoriously difficult problems such as the exogenous block MDP (ExBMDP).
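To make the access model concrete, here is a minimal sketch (not from the paper) of a local-simulator wrapper in which the agent may restart an episode from any state it has already encountered. The environment interface, including the `set_state` hook, is a hypothetical illustration of what "reset to previously seen states" assumes.

```python
import copy


class LocalSimulator:
    """Hypothetical wrapper giving an agent local simulator access:
    in addition to ordinary online interaction, the agent may reset
    to any state it has previously visited."""

    def __init__(self, env):
        self.env = env
        self.visited = []  # snapshots of previously encountered states

    def reset(self):
        state = self.env.reset()
        self.visited.append(copy.deepcopy(state))
        return state

    def step(self, action):
        state, reward, done = self.env.step(action)
        self.visited.append(copy.deepcopy(state))
        return state, reward, done

    def reset_to(self, index):
        """Local-simulator query: restart from a previously seen state."""
        state = copy.deepcopy(self.visited[index])
        self.env.set_state(state)  # assumes the simulator exposes a state-setting hook
        return state
```

Standard online RL only ever calls `reset()` and `step()`; the extra `reset_to()` query is what distinguishes the RLLS protocol studied in the paper.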
Key Takeaways#
Why does it matter?#
This paper matters for researchers in reinforcement learning because it demonstrates how local simulators, a common tool in RL practice, can significantly improve the sample efficiency of algorithms, especially those that rely on complex function approximation. The findings challenge existing assumptions about the limitations of online RL and open new avenues for algorithm design and theoretical analysis, potentially leading to more efficient and robust RL systems.