🏢 Ericsson AB
Multi-Reward Best Policy Identification
4494 words · 22 mins
Machine Learning
Reinforcement Learning
This paper introduces efficient algorithms, MR-NaS and DBMR-BPI, for identifying optimal policies across multiple reward functions in reinforcement learning, achieving competitive performance with the…