Skip to main content

🏢 Univ. Grenoble Alpes

Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
·1775 words·9 mins· loading · loading
Machine Learning Reinforcement Learning 🏢 Univ. Grenoble Alpes
First tractable algorithm achieves minimax optimal regret in average-reward MDPs, solving a major computational challenge in reinforcement learning.