TL;DR#
Multi-objective reinforcement learning (MORL) uses utility functions to aggregate multiple objectives into a single scalar, yet these functions are poorly understood theoretically. Existing MORL algorithms assume that an optimal policy exists for every utility function and that every preference can be represented by a utility function; neither assumption holds in general. This limits both algorithm design and the understanding of MORL solution concepts.
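To make the setup concrete, here is a minimal sketch (not from the paper) of how a utility function aggregates a vector-valued return into a scalar the agent can maximize. The objective names and the two example utilities below are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

# Vector-valued return: one entry per objective,
# e.g. (task reward, negative energy cost, safety margin).
vector_return = np.array([12.0, -3.5, 0.8])

def linear_utility(v: np.ndarray, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted-sum utility: one common (but not the only) way to scalarize."""
    return float(np.dot(weights, v))

def worst_case_utility(v: np.ndarray) -> float:
    """A nonlinear utility (here: the value of the worst objective).
    For some utility functions an optimal policy need not exist;
    characterizing when it does is part of the paper's contribution."""
    return float(np.min(v))

# The agent maximizes the scalar produced by the chosen utility.
print(linear_utility(vector_return))      # 5.11
print(worst_case_utility(vector_return))  # -3.5
```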
This research addresses these gaps by formally characterizing the utility functions for which an optimal policy is guaranteed to exist. It introduces new notions of preference relations and utility maximization for MORL and gives conditions under which a preference relation can be represented by a utility function. This theoretical groundwork supports the development of more efficient and reliable MORL algorithms.
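Roughly, and using standard decision-theoretic notation rather than the paper's own, "representing a preference by a utility function" means the following, where $v^{\pi}$ denotes the vector return of policy $\pi$:

```latex
% A preference relation \succsim over policies is represented by a
% utility function u on vector returns when, for all policies \pi, \pi':
\pi \succsim \pi' \iff u\!\left(v^{\pi}\right) \ge u\!\left(v^{\pi'}\right)
```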
Key Takeaways#
Why does it matter?#
This paper is crucial for advancing multi-objective reinforcement learning (MORL). By rigorously analyzing utility functions, it addresses fundamental theoretical gaps and paves the way for more efficient and reliable MORL algorithms. Its findings are relevant to a wide range of applications and motivate further research into the theoretical foundations of MORL, as well as algorithms that exploit these results.