🏢 UNIST
Exclusively Penalized Q-learning for Offline Reinforcement Learning
Reinforcement Learning
EPQ is an offline RL algorithm that reduces underestimation bias by penalizing only the states prone to estimation errors, which improves performance over existing methods.