🏢 Department of Computer Science, University of British Columbia
First-Explore, then Exploit: Meta-Learning to Solve Hard Exploration-Exploitation Trade-Offs
3099 words · 15 mins
Machine Learning
Reinforcement Learning
Meta-RL agents often fail to explore effectively in environments where optimal behavior requires sacrificing immediate rewards for greater future gains. First-Explore, a novel method, tackles this by…