🏢 Ericsson AB

Multi-Reward Best Policy Identification

26 September 2024·4494 words·22 mins

Machine Learning Reinforcement Learning

This paper introduces efficient algorithms, MR-NaS and DBMR-BPI, for identifying optimal policies across multiple reward functions in reinforcement learning, achieving competitive performance with the…