🏢 Ericsson AB

Multi-Reward Best Policy Identification
4494 words · 22 mins
Machine Learning · Reinforcement Learning · 🏢 Ericsson AB
This paper introduces efficient algorithms, MR-NaS and DBMR-BPI, for identifying optimal policies across multiple reward functions in reinforcement learning, achieving competitive performance with the…