Exclusively Penalized Q-learning for Offline Reinforcement Learning
2010 words · 10 mins
Reinforcement Learning 🏢 UNIST
EPQ is a novel offline RL algorithm that reduces underestimation bias by penalizing only the states prone to estimation errors, improving performance over existing methods.