Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning
2182 words · 11 mins
Reinforcement Learning 🏢 Criteo AI Lab
Logarithmic Smoothing improves pessimistic offline contextual bandit algorithms by providing tighter concentration bounds, which lead to better off-policy evaluation, selection, and learning.