🏢 Department of Computer Science, University of British Columbia

First-Explore, then Exploit: Meta-Learning to Solve Hard Exploration-Exploitation Trade-Offs
3099 words · 15 mins
Machine Learning · Reinforcement Learning
Meta-RL agents often fail to explore effectively in environments where optimal behavior requires sacrificing immediate rewards for greater future gains. First-Explore, a novel method, tackles this by…