Exclusively Penalized Q-learning for Offline Reinforcement Learning

26 September 2024 · 2010 words · 10 mins

Reinforcement Learning 🏢 UNIST

EPQ is a novel offline RL algorithm that reduces underestimation bias by penalizing only the states prone to estimation errors, which improves performance over existing offline RL methods.