🏢 Criteo AI Lab

Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning

26 September 2024·2182 words·11 mins

Reinforcement Learning 🏢 Criteo AI Lab

Logarithmic Smoothing enhances pessimistic offline contextual bandit algorithms by providing tighter concentration bounds for improved policy evaluation, selection and learning.